Users may encounter problems when working with SQL Server, but they can be avoided with the right know-how. This blog article investigates serious problems of the type “My application has stopped working.” Frank Sander analyzes the most frequent cases and provides suggestions on how they can be resolved, avoided or limited.
Problem 1: A Downtime-Critical Database is Inconsistent
The daily maintenance plan for backup and database consistency checking reported a consistency error during implementation. The database in question was used actively in a production workflow. Although otherwise standard, recovery of the last consistent full backup in connection with all relevant transaction log backups needed to be avoided for reasons of time.
The first step was to identify the table containing the inconsistent data. Legacy backups were used to restore its content in a test environment. Then the same table was emptied in the productive environment and filled with the datasets from the test environment.
A maintenance plan to check for database inconsistencies is absolutely crucial. There are different ways of preventing data loss, provided the results are checked properly and countermeasures are taken quickly. Moreover, databases that require guaranteed high availability must be operated – irrespective of any inconsistencies – within a suitable high-availability solution. SQL Server offers two useful options with its AlwaysOn Failover Cluster and the AlwaysOn availability groups, which were released with version 2012. Ultimately, the specific requirements will determine which of these high-availability solutions is used.
Problem 2: A Data loss-critical Database has Been Inconsistent for at Least 3 Days
Microsoft recommends that inconsistent databases be restored using the last consistent full backup, possibly with other differential backups and/or transaction log backups. But the database was operated in simple recovery model only, so using this kind of recovery solution would have meant the loss of at least three days of data. A data loss-free repair of the database using
was not possible due to the error message from the database consistency check.
The solution to repair the database with the lowest possible loss of data was therefore to use
with downstream restoration of referential integrity.
As in the first example, this case clearly demonstrates how crucial it is to conduct regular database consistency checks, at best combined with full backups. Action must be taken as quickly as possible if there is an error message, especially when the simple recovery model is used. Otherwise the potential data loss will get bigger and bigger. Equally essential is the clarification of how much data loss is tolerable. The recovery model and backup concept will then need to be adjusted accordingly.
Problem 3: The Hard Disk Partition for Transaction Logs Needs to be Expanded Continuously
The transaction log is getting steadily bigger. A maintenance plan to implement transaction log backups and a suitable job for the SQL Server Agents were scheduled, but the job kept aborting with an error message. Execution of the job was not monitored.
The maintenance plan was corrected. The size of the transaction log was reduced successfully once it had been backed up. A comment in this regard: Although reducing the size of the transaction log is unproblematic, reducing the size of the data file will have performance-critical implications in most cases. It should never be executed regularly as part of maintenance plans.
The most critical aspect of this example is the question “What would happen if the transaction log backup is needed?” Maintenance plans, i.e. their execution as SQL Server Agent jobs, must be monitored on all accounts. For this purpose, SQL Server offers a number of features, including e-mail notification in the event of errors. Monitoring the status of the SQL Server Agent service is also essential, as neither jobs nor e-mails can be sent if the service is disabled.
Problem 4: After a Failover, Individual Logins are Invalid when AlwaysOn Availability Groups are in Use
Logins have a SID. What’s more, one database user is always assigned to them on one or several database/s. To model this scenario, the login SID is stored for the database user (= user assignment). To access the system, the user will need a login and a database that has a database user as described above. This is the case for instance 1 in the diagram below.
Let’s assume that an availability group is set up for database 1 on instance 1, while instance 2 manages a secondary replica of DB 1. In this case, all logins with permissions for DB 1 must also be created on instance 2, as their information is stored in the master database and not in the user database. A SID will be generated automatically for all password-protected SQL Logins if the GUI is used for this purpose, and in most cases it will differ from the original SID. However, the original SID for the user database John.Doe is shared during database mirroring. This is not a problem as long as instance 1 manages the primary replica and it is used exclusively for all accesses to DB 1. But John.Doe will not have a valid assignment and will be unable to log in if instance 1 experiences a scheduled (e.g. for an update) or unscheduled failure and the database DB 1 on instance 2 becomes the primary replica.
It’s worth noting that Windows logins take the login SID from the AD. This means that a Windows login created on an instance managing a secondary replica will have the same SID as the login for the instance containing the primary replica. So in this case the problem described above simply cannot occur.
The creation of SQL logins on the secondary replicas must include the SID of the login for the primary instance. After failover, alignment of the user assignments in the standard way, used above all for migrations
is not sufficient. Although login on instance 2 would work initially, as the login-SID saved for the user John.Doe was changed to 987654, the change would also be mirrored to instance 1, and the same problem would arise again in the event of a failover on instance 1.
Problem 5: The existing instances are unavailable during installation of another instance of SQL Server within a highly productive operating system environment
There were already two instances of SQL Server 2012 and 2014 (each), and SQL Server 2012 was installed. This installation overwrote a number of .dll’s in the SQL Server browser, which caused the service to crash.
A repair installation using the SQL Server 2014 medium reliably fixed these problems.
The use of several SQL Server versions in an operating system environment is not advisable, especially for productive systems. On no account should more recent and then older versions of SQL Server be installed in an operating system environment. If one or several instances are already running in a relevant operating system environment, installation should take place within the maintenance window for these instances and of course only after production of database backups.
Database consistency checks are absolutely essential to keep SQL Server up and running smoothly. Conduct them with every full backup, monitor the result of job completion and the status of SQL Server services, and use the checksums for page verification in your databases.
Looking for More Details?
Are you experiencing other problems with your SQL Server? Do you want to know more about the latest version?
SoftwareONE LifeHacks: Automatic Scheduling of Breaks
Today I’d like to introduce you to an altogether pleasant Outlook life hack. Say goodbye to the hectic of gulping a coffee between back-to-back meetings. Instead you can enjoy a welcome break that is automatically included in your calendar planning.