Wednesday 20 March 2013

SQL 2008 - Debugging failed backups

Here are some steps I followed today on SQL 2008 concerning some failed backups (not my server)...

Maintenance plan history

Viewing the history of the Maintenance plan shows 2 things
  1. The Success/Failure of the plan
  2. Which Task it has failed on

Detail concerning a failure is buried in a single line error message for each Maintenance Plan task.

Maintenance plan errors tend to obfuscate the true cause as a Maintenance Plan is an Integration Services wrapper around database management tasks.

 " Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.  End ...  The package execution fa...  The step failed. "

A SSIS developer would understand these errors, a DBA maybe not so.

If a task was backing up mulitple databases and the final one failed, there is a lot of text to wade through. Creating log files on our Maintenance plans is the way to go to log at this level.


SQL Server Agent History

Again the agent history hides the details we are after.
When a maintenance plan is scheduled, an agent job is created to run it.
The agent job has a parent entry with step ID 0 which reveals the job outcome.
The maintenance plan is implemented as a single step (step ID 1).
Even if a small part of the plan fails, the whole agent step (and parent job) are seen to fail.
The message output here is displayed over several lines and is more readable. It still lacks the detail we want however.


SQL Server Logs

These can be accessed via management studio or the event viewer.
Focusing here on one database, we can locate the reported cause much easier.

SQL Server Logs show 3 separate entries for each backup failure.

Date                       20/03/2013 9:17:03 AM
Log                          SQL Server (Current - 20/03/2013 9:42:00 AM)

Source                    Backup

Message
BACKUP failed to complete the command BACKUP DATABASE Obfuscated. Check the backup application log for detailed messages.


Generic Backup Failure Message, followed by the detail below

Date                      20/03/2013 9:17:03 AM
Log                         SQL Server (Current - 20/03/2013 9:42:00 AM)

Source                  Backup

Message
Error: 3041, Severity: 16, State: 1.

Yes, the backup has failed, we could search for Severity 16, State 1 if we wished.
The next step tells us more however.

Date                      20/03/2013 9:17:03 AM
Log                       SQL Server (Current - 20/03/2013 9:42:00 AM)

Source                  spid76

Message
BackupDiskFile::CreateMedia: Backup device 'R:\BACKUP\FULL\Obfuscated\Obfuscated_backup_2013_03_18_225452_3071857.bak' failed to create. Operating system error 5(failed to retrieve text for this error. Reason: 15105).

So we need to look up OS Error 5

From a command prompt, lookup the OS error message like this,

NET HELPMSG 5





" Access is denied " is fairly blatantly a security thing that I resolved by giving a more generous range set of permissions to the service account responsible for performing the backups.

I never did get to the bottom of how the permissions changed in the first place though. I suspect a change in group membership for the service account but I'll never be able to prove it.

No comments: