On Best Practices

We hear a lot about “best practice”, but what does it mean in the context of SQL Server? Wikipedia describes best practice as follows:

A best practice is a method or technique that has been generally accepted as superior to any alternatives because it produces results that are superior to those achieved by other means or because it has become a standard way of doing things, e.g., a standard way of complying with legal or ethical requirements.

We will take a high-level look at high availability and disaster recovery, an important part of database management and administration and a good example of how a certain way of doing things has become a standard. Then we will discuss what it means to implement a best practice for disaster recovery.

High Availability

There are many ways to achieve high availability (HA) — the idea that our database has zero (or close to zero) downtime — and usually this is dictated by our budget. We want business continuity in the face of planned and unplanned downtime. This may take the form of database mirroring, Availability Groups, clustering, or log shipping.

Disaster Recovery

Disaster recovery (DR) is slightly different. It assumes that a disaster has occurred, our highly-available environment has gone down, and now the clock is ticking. Company losses are mounting by the minute.

How this affects me, the DBA

HA/DR is usually expressed in a Service Level Agreement using two terms:

  • RPO, or recovery point objective: the maximum amount of data loss allowed by the business.
  • RTO, or recovery time objective: the maximum amount of time it should take to recover from a disaster.

What we want is a reliable, predictable and repeatable disaster recovery plan, even if we have a high availability solution that has never failed before. If our Availability Group doesn’t fail over cleanly, or the drive array is on fire, or that Windows Update managed to kill the network interface, we need to roll up our sleeves and activate the DR plan.

So, according to guidance, that means having regular full backups, differential backups, and transaction log backups ready to restore at a moment’s notice, right? Yes, but that’s not where our job stops.
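
For reference, a minimal sketch of those three backup types might look like this (the database name and paths are placeholders, not a recommendation for any particular environment):

-- Full backup: the baseline for any restore
BACKUP DATABASE [OurDatabase]
TO DISK = N'D:\Backup\OurDatabase_FULL.bak'
WITH INIT, CHECKSUM;

-- Differential backup: everything changed since the last full backup
BACKUP DATABASE [OurDatabase]
TO DISK = N'D:\Backup\OurDatabase_DIFF.bak'
WITH DIFFERENTIAL, INIT, CHECKSUM;

-- Transaction log backup: how often we take these determines how much data we can lose (RPO)
BACKUP LOG [OurDatabase]
TO DISK = N'D:\Backup\OurDatabase_LOG.trn'
WITH INIT, CHECKSUM;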

Putting our best foot forward

This is where best practice comes in. It’s all very well having a good disaster recovery plan, but it’s useless if we don’t test it. Best practice in this case would be having both a good backup strategy and a methodology to make sure our DR plan is tested often, so that when something goes wrong, we have a script to follow. That way we will be able to recover with a small RPO and a short RTO, so that business is impacted as little as possible.

We call this a Run Book. Anyone in our organisation should be able to follow the script and perform a successful recovery.

What if we are incapacitated in some way or, as happened to me recently, stuck on an aircraft with no access to the Internet? What if the only person able to restore the environment is the CEO, who has just heard that her building is on fire?

Not only must our backups be stored securely somewhere (in my post last week, I provided a budget-friendly solution for off-site backups), but we must also be able to restore them to a known point in time, within the recovery time objective, losing as little data as possible.
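
As a sketch of what that means in practice (all names, paths and times below are placeholders), a point-in-time restore applies the full backup and then rolls the log forward to the chosen moment:

-- Restore the most recent full backup, leaving the database ready for log restores
RESTORE DATABASE [OurDatabase]
FROM DISK = N'D:\Backup\OurDatabase_FULL.bak'
WITH NORECOVERY, REPLACE;

-- Roll the transaction log forward to a known point in time
RESTORE LOG [OurDatabase]
FROM DISK = N'D:\Backup\OurDatabase_LOG.trn'
WITH STOPAT = N'2016-04-04 08:00:00', RECOVERY;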

Our run book should be clear enough that the person testing the disaster recovery plan can follow it without having to ask us any questions.

Earlier we saw the word “superior” repeated in the definition of best practice. A superior solution doesn’t need to be the most expensive one, using the latest and greatest tools. A run book could be an Excel file containing enough information to rebuild a server and locate magical scripts in the cloud that are tested weekly.

Know the problem we’re trying to solve. Build a solution that is practical and cost-effective for our environment. Test and iterate until it is flawless. That’s best practice.

Compressed and Encrypted Backups on the Cheap

We back up our databases regularly. We store them in a secure network location so that, if the database drive fails, we can still meet our recovery time objective (RTO).

Some of us store these backups off-site as well, as an added security measure, either on tape or using cloud storage solutions (Azure Blob Storage, Amazon S3).

And we encrypt these backups, right?

Right?

SQL Server 2016 has some very neat features for encrypting databases, tables, rows, backups, and even in-memory stuff.
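
For example, on those newer versions, natively encrypting a backup is a single statement (a sketch only, assuming a certificate called BackupCert has already been set up in master):

-- Native backup encryption; BackupCert is an assumed, pre-existing certificate
BACKUP DATABASE [OurDatabase]
TO DISK = N'D:\Backup\OurDatabase.bak'
WITH ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);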

But what if we are still slogging away on the ancient SQL Server 2008 or 2008 R2 Standard Edition, and we need to compress and encrypt our backups, using industry standard AES-256 encryption?

Enter 7-Zip. It is free, open-source, and governed by the LGPL licence.

The GUI (referred to as the 7-Zip File Manager) is installed by default. It looks something like this:

[Screenshot: the 7-Zip File Manager GUI]

Additionally, there is a command-line version (7za.exe), which has the same features as the 7-Zip GUI. You can get the command-line version by downloading the 7-Zip Extra package.

This command-line tool can be used for scripting in batch files or PowerShell scripts, which means we can automate a whole lot of things with our backups. There is both a 32-bit and a 64-bit version available.

7-Zip (a) [64] 15.14 : Copyright (c) 1999-2015 Igor Pavlov : 2015-12-31

Usage: 7za <command> [<switches>...] <archive_name>
       [<file_names>...] [<@listfiles...>]

[Tip: I highly recommend using the default 7z archive format instead of Zip. It provides significant disk savings over the Zip format and takes about the same time to compress at the Normal compression level.]

The scenario

Let’s assume that we have a limited budget. We can afford SQL Server Standard Edition and 1TB of off-site storage, in the form of Azure Blob Storage, Dropbox, or Google Drive, to name just a few.

We can use 7-Zip on the command line to produce an encrypted file, even if we don’t want to compress it (7-Zip can create what I call “container” files, which are simply an encryption wrapper around the file) and push that to our Dropbox sync folder.

Remember that SQL Server 2008 R2 and higher allow us to compress backups even in Standard Edition. It makes very little sense to compress an already-compressed backup, so in that case we only need 7-Zip for the encryption step.
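
For example (the database name and path are placeholders), native backup compression is a single option on the backup itself:

-- Compressed backup, available in Standard Edition from SQL Server 2008 R2 onwards
BACKUP DATABASE [OurDatabase]
TO DISK = N'D:\Backup\OurDatabase.bak'
WITH COMPRESSION, INIT, CHECKSUM;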

Command-line switches

I use this page as a reference for 7-Zip command-line examples. Of particular interest are -mx0 (which stores, but does not compress, the source file) and -pSECRET (which protects the archive with a password, where SECRET is the password).
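
Combining those two switches, the following creates an encrypted container without compressing the backup (the paths and the password SECRET are placeholders, and 7za.exe is assumed to be on the PATH):

7za a -t7z -mx0 -pSECRET D:\Staging\OurDatabase.7z D:\Backup\OurDatabase.bak

If the backup is not already compressed, dropping -mx0 (or using -mx5, the Normal level) compresses and encrypts in one pass.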

Putting the pieces together

Step 1: Back up our databases (with compression if available)

Step 2: Copy the files to a secure network location

Step 3: Use 7-Zip to encrypt (and maybe compress) these files

Step 4: Copy the encrypted files to off-site storage

There are myriad tools to perform these steps, and there’s no single correct way to do it, though I prefer PowerShell for a number of reasons: it’s modern, it is actively maintained, and it can interact with a number of Windows features in a consistent way, which means we can automate a lot of our everyday tasks.

For example, if we stay on the Microsoft stack with Azure, we can use PowerShell to iterate through the backup directory, encrypting each backup file, then copy it to Azure using a tool like AzCopy, or even my Azure Blob Storage Sync tool.
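
Here is a minimal PowerShell sketch of that workflow. All of the paths, the 7za.exe location, the password handling and the off-site hand-off are assumptions to be adapted, not a finished script:

# Sketch: wrap each backup file in an encrypted 7-Zip container, staged for off-site copy
$backupDir = 'D:\Backup'          # where the .bak files land
$stageDir  = 'D:\Staging'         # what gets copied or synced off-site
$sevenZip  = 'C:\Tools\7za.exe'   # path to the 7-Zip command-line tool
$password  = 'SECRET'             # retrieve this from a secure store in real life

Get-ChildItem -Path $backupDir -Filter '*.bak' | ForEach-Object {
    $archive = Join-Path $stageDir ($_.BaseName + '.7z')
    & $sevenZip a -t7z -mx0 "-p$password" $archive $_.FullName
}

# Hand the staged files to AzCopy (switch syntax differs between AzCopy versions),
# or simply point $stageDir at a Dropbox or Google Drive sync folder.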

Summary

We don’t need the latest and greatest (and expensive) tools to implement best practices. Using 7-Zip, Dropbox and PowerShell, we can easily ensure that our databases are secure in more ways than one.

Is DBCC SHRINKFILE (filename, EMPTYFILE) fully logged?

Update: Paul Randal told me I had made a mistake in my post. The correction is below.


On Wednesday last week, Kenneth Fisher asked: “Is DBCC SHRINKFILE (filename, EMPTYFILE) fully logged?”

Let’s buckle down and see for ourselves, shall we?

Firstly, what does DBCC SHRINKFILE do?

As we know, SQL Server allows us to reclaim space in data and log files by issuing a DBCC SHRINKFILE command. Books Online goes into some detail about each of the parameters we can use but does not seem to answer Kenneth’s question.
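
For reference, the basic form takes a logical file name and a target size in megabytes (the file name here is a placeholder):

-- Shrink a data file down to a target size of 512 MB
DBCC SHRINKFILE (N'MyDataFile', 512);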

What does the EMPTYFILE parameter do, anyway?

Per Books Online:

EMPTYFILE

Migrates all data from the specified file to other files in the same filegroup. Because the Database Engine no longer allows data to be placed in the empty file, the file can be removed by using the ALTER DATABASE statement.

Let’s say we want to move data from one file in a filegroup to an empty file. We might do this if we want to free up a significant amount of space after a major data cleanup, but the default SHRINKFILE options would take a long time.

Fair enough, but Kenneth wants to know if it is fully logged.

How do we test this?

Let’s create a database called ShrinkFileTest, and then add a new filegroup called FGData, which will in turn contain a single data file called FGFile1.

[All of the following code was executed on a SQL Server 2014 instance.]

CREATE DATABASE [ShrinkFileTest] ON
PRIMARY (NAME = N'ShrinkFileTest',
FILENAME = N'D:\SQL2014\DI\ShrinkFileTest.mdf',
SIZE = 5120KB, FILEGROWTH = 1024KB),
FILEGROUP [FGData] (NAME = N'FGFile1',
FILENAME = N'D:\SQL2014\DI\FGFile1.ndf',
SIZE = 2048MB, FILEGROWTH = 1024MB)
LOG ON (NAME = N'ShrinkFileTest_log',
FILENAME = N'D:\SQL2014\DI\ShrinkFileTest_log.ldf',
SIZE = 1024MB, FILEGROWTH = 1024MB);
GO

ALTER DATABASE [ShrinkFileTest]
SET RECOVERY FULL;
GO

Later, we will add a new file to FGData and use the shrink command to move the data to the new file.

So now let’s populate the FGData filegroup with some test data. Create a table called [dbo].[Test] on the filegroup, and insert ten million rows.

use [ShrinkFileTest];
GO
CREATE TABLE [dbo].[Test] (
[Col1] INT IDENTITY(1,1) PRIMARY KEY,
[Col2] NCHAR(5) DEFAULT N'aaaaa'
) ON [FGData];
GO

-- Load in ten million rows
INSERT INTO [dbo].[Test]
SELECT TOP (10000000) 'aaaaa'
FROM sys.all_objects AS o1
CROSS JOIN sys.all_objects AS o2
CROSS JOIN sys.all_objects AS o3;
GO

We will now create a scenario where most of the rows are deleted but the data file contains a lot of free space.

DELETE FROM [dbo].[Test] WHERE [Col1] % 1000 <> 0;

We should be left with ten thousand rows in the table, at intervals of 1,000 (this scenario is designed to create massive free space gaps in the data and to ensure that each remaining row sits on its own data page).
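
If you want to confirm that before shrinking (an optional check, not part of the original question), the row and page counts are easy to inspect:

-- Optional sanity check: roughly one in-row data page per remaining row
SELECT [row_count], [in_row_data_page_count]
FROM sys.dm_db_partition_stats
WHERE [object_id] = OBJECT_ID(N'dbo.Test')
AND [index_id] = 1;
GO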

We want to move all the existing data into an empty file, so we can delete the file FGFile1.

Add the second file to the FGData filegroup:

USE [master]
GO
ALTER DATABASE [ShrinkFileTest] ADD FILE (
NAME = N'FGFile2',
FILENAME = N'D:\SQL2014\DI\FGFile2.ndf',
SIZE = 512MB, FILEGROWTH = 1024MB
) TO FILEGROUP [FGData];
GO

Before we run the DBCC SHRINKFILE command, though, we should clear the millions of fully-logged DELETE records out of the transaction log, so that the output of the next step is easier to read:

use [ShrinkFileTest];
GO
CHECKPOINT;
GO

Run the shrink command with the EMPTYFILE parameter:

DBCC SHRINKFILE (FGFile1, EMPTYFILE);

The shrink command completes and returns its usual result set (not reproduced here).

Let’s see what is in the log:

SELECT [Current LSN], [Operation], [Context], [Page ID], [Slot ID]
FROM fn_dblog (NULL, NULL);
GO

Below the CHECKPOINT output, you will see two rows in the log file, both of which are modifying the Page Free Space (PFS) page.
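
To finish the scenario off, the now-empty file can be removed, as the Books Online excerpt above suggested:

-- FGFile1 no longer contains any data, so it can be dropped from the filegroup
ALTER DATABASE [ShrinkFileTest] REMOVE FILE [FGFile1];
GO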

Summary

Kenneth, it is fully logged. In Paul’s words:

Shrink is always fully logged – it’s impossible not to log physical changes to data files.

I hope this satisfies your curiosity.

Speaking at SQLSaturday 507 this weekend in Edmonton

On Saturday 9 April 2016, four days from today, I will be speaking at the first ever SQLSaturday in the province of Alberta.

Edmonton is the provincial capital, so it makes sense to host the event there, though it’s a bit of a drive from Calgary, where I live.

Fortunately, Red Deer is about halfway between the two cities, and I have a place where I can eat and buy some Rooibos tea in Gasoline Alley.

My talk will be similar to the one I gave last year at the Calgary SQL PASS User Group, The SQL Server Performance Starter Kit, which was well received by the attendees.

If you’re in the area, please come and say hello. It’ll be great fun, and you might even learn something.

You can find out more on the event’s home page: SQLSaturday #507.