Backup Retention Policies

Today’s post is just a series of questions, but they are very important questions, all centred on one topic: the backup retention policy.

  • How long do you keep backups for every database in your environment (including development, testing, quality assurance and production)?
  • Do you keep backups of system databases?
  • Do you keep backups of system databases in development, testing, quality assurance and production?
  • Out of those database backups you keep, how many of them are just-in-case full backups, or even just copy-only backups (out-of-band)?
  • Of the important ones with transaction log backups, do you also keep differential backups?
  • How often are these differential backups taken?
  • Do you check whether the differential backup file sizes approach or exceed the size of a full backup? (See the sketch after this list.)
  • Of the transaction log backups, how often are these backups taken? Every fifteen minutes?
  • How long do you keep transaction log backup files?
  • How soon will you run out of space for backups?
  • What is your contingency plan for when you do run out of space?
  • How do you test your backups?
  • How frequently do you test your backups?
  • Do you have an offsite backup solution?
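
For the differential size question above, the backup history that SQL Server records in msdb is a good place to look. The sketch below is only a starting point, and assumes the history has not been purged: it reports the largest full and differential backup per database over the last week, using the standard msdb.dbo.backupset table.

-- Largest full (type D) and differential (type I) backup per database, last 7 days
SELECT
    database_name,
    MAX(CASE WHEN [type] = 'D' THEN backup_size END) / 1048576.0 AS largest_full_mb,
    MAX(CASE WHEN [type] = 'I' THEN backup_size END) / 1048576.0 AS largest_diff_mb
FROM msdb.dbo.backupset
WHERE backup_start_date >= DATEADD(DAY, -7, GETDATE())
GROUP BY database_name
ORDER BY database_name;

If the differential column starts creeping up towards the full column, it is probably time for a new full backup.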

Since each environment is unique, there is no perfect answer for these questions, but I urge you to consider them, especially in the context of the recovery time objective (RTO) and recovery point objective (RPO).

We may not want to retain three months of transaction log backups for a test environment that is purged every week, but we may want to ensure that the model or msdb databases are regularly backed up.
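
Similarly, to see how far back your retained history actually goes, msdb can tell you the oldest and newest backup it knows about for each database. Bear in mind this is a minimal sketch: it reflects what is recorded in msdb, not whether the backup files themselves still exist on disk, tape, or Blob Storage.

-- Oldest and newest backup on record per database and backup type
SELECT
    database_name,
    [type], -- D = full, I = differential, L = log
    MIN(backup_start_date) AS oldest_backup,
    MAX(backup_start_date) AS newest_backup,
    COUNT(*) AS backup_count
FROM msdb.dbo.backupset
GROUP BY database_name, [type]
ORDER BY database_name, [type];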

Want to talk about this with your employer or client? Check out Brent Ozar’s updated High Availability and Disaster Recovery Planning Worksheet to see how you fare in the case of a disaster.

Let me know in the comments, or on Twitter at @bornsql.

What is physical storage anyway?

My friend Jason asked me about his MySQL database export file last week, and whether it was sufficient to create a new database. I replied that he would have to set up the new database separately, because the physical files are allocated differently depending on the web host.

What is a physical file, though? Kenneth Fisher, that crazy cat from SQL Studies, reminded me on Twitter that a physical file, or table, or database, isn’t actually physical. Hard drives are mostly virtualised, especially with web servers, so nothing is truly physical anymore. Even more confusingly, hard drives are gradually being replaced by solid-state memory technology that does something called wear-levelling.

When we speak of physical files, or databases, or tables, we’re referring to the way these files are allocated on a storage device. In the old days, pre-2000, it was possible on some operating systems and file systems to allocate the fastest part of the spinning platter of a hard drive (the inside tracks) to processes that needed low seek times, like Relational Database Management Systems (RDBMS).

Nowadays, with virtualised storage, including network and cloud storage, the physical aspect is meaningless. Spindles are what matter. But what’s a spindle? Until recently, storage was provided by spinning magnetic platters, better known as hard drives. The spindle is the axle at the center on which those platters rotate, and the more drives (spindles) allocated to a storage device, the better.

The problem with spinning drives is that they have a built-in delay for the read-head to locate the correct area on the platter surface, which is referred to as latency. Adding more spindles (more drives) to an array lowers the latency, because files are striped across multiple disks (RAID).

For example, in a three-disk configuration, one file could effectively be split into three parts. That improves read times, and therefore latency, because the storage layer can read different parts of the file from different underlying disks in parallel, completing the read in roughly a third of the time.

To complicate things even more, solid-state storage, which comes in many forms, most commonly USB thumb drives, camera cards (often called memory cards, which is a misnomer), and solid-state drives (SSDs), is gradually replacing hard drives in our devices and in network and cloud storage. SSDs have practically no latency, because there’s no physical platter to spin up and no read head to position over a particular spot on the surface. The cells that store the bits of data are accessible almost instantly.

However, this speed improvement has a trade-off: solid-state storage has a limited lifespan. The cells can only be written to a certain number of times before failing, so the drive controller has to move data around constantly, a process called wear-levelling, to ensure that the blocks of storage are used evenly and thereby extend the life of the drive. Where a file is physically located on the drive can change at any given moment.

Virtual storage in cloud and network storage devices had already muddied the waters by sharing data between virtual machines. Solid-state drives make things even more abstract.

A physical database might refer to the server it’s running on or a virtual machine that is running on a physical server. A physical file might refer to a collection of bits on a single spinning disk drive or SSD, or a collection of bits that are located on several hundred spinning disks.

We have reached a point where a physical storage paradigm is meaningless. When we talk about the physical file on a disk, we mean the abstract representation of the file system’s knowledge of where that file is stored at that point in time. Ultimately, though, it doesn’t mean what it used to.

If you would like to commiserate over the loss of high-failure spinning rust, find me on Twitter at @bornsql.

PSA: Check Your Backups

This post comes to you from the Spanish countryside. I travel a lot these days, sometimes even for work. The best part about travelling, apart from food, is knowing that my backups are in good shape.

So my Public Service Announcement for you today is to make sure your SQL Server backups are valid.

Step 1: Back up your SQL Server databases. All of them. That means your production, test, QA, and development environments, and both user and system databases. The only database you cannot (and should not) back up is tempdb.
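
As a minimal sketch (with a placeholder backup path), something like this will take a full backup of every online database except tempdb, with checksums:

-- Full backup of every online database except tempdb
DECLARE @name sysname, @sql nvarchar(max);

DECLARE db_cursor CURSOR FAST_FORWARD FOR
    SELECT [name] FROM sys.databases
    WHERE [name] <> N'tempdb' AND state_desc = N'ONLINE';

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@name)
             + N' TO DISK = N''D:\SQLData\Backup\' + @name + N'_'
             + CONVERT(nchar(8), GETDATE(), 112)
             + N'.bak'' WITH CHECKSUM, INIT;';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM db_cursor INTO @name;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;

In a real environment you would schedule this (or a proper maintenance solution) through SQL Server Agent, and add a time component to the file name if you back up more than once a day.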

Step 2: Test your backups by restoring them. Yes, use the CHECKSUM option when backing up, but as we all know, a good backup is one that you can restore.
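
Here is a minimal sketch of what that might look like, with placeholder file paths and logical file names (RESTORE FILELISTONLY will tell you the real logical names):

-- Quick sanity check of the backup file and its page checksums
RESTORE VERIFYONLY
FROM DISK = N'D:\SQLData\Backup\MyDatabase_20160601.bak'
WITH CHECKSUM;

-- The real test: restore it under a different name
RESTORE DATABASE MyDatabase_RestoreTest
FROM DISK = N'D:\SQLData\Backup\MyDatabase_20160601.bak'
WITH MOVE N'MyDatabase' TO N'D:\SQLData\Restore\MyDatabase_RestoreTest.mdf',
     MOVE N'MyDatabase_log' TO N'D:\SQLData\Restore\MyDatabase_RestoreTest_log.ldf',
     CHECKSUM, RECOVERY;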

Step 3: Check the consistency. Run DBCC CHECKDB WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY, EXTENDED_LOGICAL_CHECKS. If the database is really massive, you can break this up into DBCC CHECKALLOC, DBCC CHECKCATALOG, and then run DBCC CHECKTABLE per table.
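
For the broken-up approach, the pattern looks something like this (the database and table names are placeholders):

-- Allocation and catalog checks for the whole database
DBCC CHECKALLOC (N'MyDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
DBCC CHECKCATALOG (N'MyDatabase') WITH NO_INFOMSGS;

-- Then check each table individually, spread over a maintenance window
USE MyDatabase;
DBCC CHECKTABLE (N'dbo.MyLargeTable')
    WITH NO_INFOMSGS, ALL_ERRORMSGS, EXTENDED_LOGICAL_CHECKS, DATA_PURITY;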

Step 4: Take lots of photos of the Spanish countryside.

If you’d like to share your photos, or discuss how you test your backups, leave a comment, or find me on Twitter at @bornsql.

AzCopy 101 – Copying SQL Server backup files to Azure Blob Storage

SQL Server 2016, released last month, now lets us back up our database files to two locations simultaneously. This is very good news if we’re running the latest and greatest (and it really is a good upgrade).

For everyone else, the bad news is that we are targeting an on-site destination for our backups (local network, SAN, NAS, tape) and using some form of automation to push these backups offsite. One of my budget-conscious customers uses Dropbox.

Since the release of Azure Cool Blob Storage, at a third of the price of typical Blob Storage, we now have a cost-effective off-site archive that is competitive with Amazon Glacier, and remains in the Microsoft family.

We can either use my Azure Blob Storage Sync tool to push our database backups to Azure Blob Storage, or if there is a policy to use Microsoft-only tools, we can use AzCopy.

Many tutorials already exist for how to use AzCopy, including the official documentation. This post is more about gotchas that I’ve discovered while working extensively with the tool.

Keys and Containers

Azure Blob Storage stores its blobs (we can think of them as files) in a container, which is addressable by a URL. When we first set up our storage account, we need to create a container to store the files. The path to that container would look something like this:

https://myaccount.blob.core.windows.net/mycontainer

A key is then needed to access that container in that storage account. It is a long string of characters, with == at the end, which looks something like this:

Zx7uEPwA6MfJ5MXML0MYUqc8k78lYYCvq7h+lG0grumpyMG1TvEpp931SQLXWpoZWfgItEzhvWnKzy9RKGTYfA==

In the examples below, I use the word key to represent this key.

Command Line

We can copy files to and from Azure Blob Storage using the following command.

AzCopy /Source:<source> /Dest:<destination> [Options]

If we’re copying files to Azure, the /Source switch would have an on-premises path, and the /Dest switch would have a container path. The /DestKey must then be specified.

(If we were copying from Azure, the Source and Destination would be reversed, and the SourceKey would be specified.)

For example, let’s copy files from my local drive to an Azure storage container:

AzCopy /Source:D:\SQLData\Backup /Dest:https://myaccount.blob.core.windows.net/mycontainer /DestKey:key /S /XO /Z:D:\_Ops\AzCopy

Notes:
/Source – my local hard drive
/Dest – the Azure storage container URL
/DestKey – the secure key to access the container
/S – recurses through all the subfolders in the source location
/XO – ignores all older files that already exist
/Z – specifies a fixed path for the journal

The Nitty-Gritty

Don’t Run Multiple Copies

If AzCopy is already running, don’t run it again. This is strongly advised by the official AzCopy documentation.

The reason is that a single instance of AzCopy already makes the best use of available resources, or in more practical terms, it’s already thrashing the disk and CPU.

Running multiple copies of AzCopy simultaneously will render your machine almost unusable, and it will probably break the journal anyway.

Journal

The biggest headache working with AzCopy is that it uses a journal to keep track of your progress during a copy operation.

The journal normally lives in %LocalAppData%\Microsoft\Azure\AzCopy. This becomes a problem when automating the AzCopy process. Usually we run automated tasks under a service account, which means that we don’t monitor the task until it fails.

Unpredictable journal behaviour

Unfortunately, AzCopy does not fail if a previous copy operation was not completed. Instead, it waits at a Y or N prompt for an answer that never comes, because the task is running in a service account context.

AzCopy provides the option to use /Y to suppress confirmation prompts.

There are two main reasons why the journal will await a confirmation:

  • The previous operation failed, and pressing Y will allow it to resume where it stopped;
  • The current operation is similar to a previous operation, and pressing Y will overwrite the journal and start over.

As we can see, these are vastly different results. Therefore I do not recommend using the /Y switch.

If for whatever reason AzCopy is interrupted (usually network-related issues), the journal will have to be cleared out manually.

Note that the %LocalAppData% path is user-specific. I would recommend specifying a journal path with the /Z switch, which allows us to set a default location.

Ignore Older Files

As with Xcopy and Robocopy, we can skip older files, which is useful when archiving transaction log backups every 15 minutes.

Use the /XO switch to only copy new files, and ignore the ones that already exist.

Summary

AzCopy is a great tool, but it has a lot of moving parts and can cause headaches if we don’t manage the journal correctly.

If you would like to continue the discussion, or ask me questions, look me up on Twitter at @bornsql.