Remote Storage Sync and Restore updated

Recently, I presented my “Back up and Restore your Database with Azure Blob Storage” session to the Bellingham PASS Chapter.

There are two components to the Azure Blob Storage Sync and Restore utility: Sync and Restore.

I wrote the Sync tool to work around difficulties I’d experienced with Azure Blob Storage’s Infinite Lease, and limitations in AzCopy. Those problems have since largely gone away, but the Restore tool remains extremely valuable, and I use it regularly for testing backups.

I’m happy to announce that new versions of both the sync and restore components have been released today, and they have a new name.

Both include some new features, mostly in the Restore tool, which I’d like to highlight below.

New Features

Point-In-Time Restore

During the Bellingham session, one of the questions asked was whether the Restore tool could perform a point-in-time restore using the RESTORE ... STOPAT keyword.

You can now perform a point-in-time restore using the Restore tool, and it will download only the files it needs from the latest backup set, up to and including that point in time.

Note: This feature only works on the latest available backup. A feature enhancement will be included in a later release to perform a point-in-time restore for any backup, if the requisite files exist.
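
For context, once the necessary files have been downloaded, the restore itself is a standard T-SQL restore sequence with STOPAT on the final log restore. Here is a minimal hand-written sketch (the database name, file names and timestamp are made up for illustration, and this is not the tool’s actual output):

-- Restore the most recent full backup, without recovering yet.
RESTORE DATABASE [AdventureWorks]
FROM DISK = N'D:\Restore\SERVER_AdventureWorks_FULL_20160821_000000.bak'
WITH NORECOVERY, REPLACE;

-- Restore the most recent differential backup, if one exists.
RESTORE DATABASE [AdventureWorks]
FROM DISK = N'D:\Restore\SERVER_AdventureWorks_DIFF_20160821_120000.bak'
WITH NORECOVERY;

-- Restore the transaction log backups in sequence.
RESTORE LOG [AdventureWorks]
FROM DISK = N'D:\Restore\SERVER_AdventureWorks_LOG_20160821_124500.trn'
WITH NORECOVERY;

-- Stop at the requested point in time in the final log restore, then recover.
RESTORE LOG [AdventureWorks]
FROM DISK = N'D:\Restore\SERVER_AdventureWorks_LOG_20160821_130000.trn'
WITH STOPAT = N'2016-08-21 12:53:00', RECOVERY;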

Download Indicator

It’s handy to know how fast a file is downloading, but there’s no easy way to show this in a command-line tool, and in a disaster recovery scenario it can get confusing if you don’t know whether the download is stuck.

AzureBlobStorageRestore will now show your download progress, using a rotating cursor similar to one I first saw on the MS-DOS version of Doctor Solomon’s Antivirus.

Big shout-out to Daniel Wolf for his ProgressBar gist (available free under the MIT License).

Sort Order

Previously, the Restore tool would download files from smallest to largest, but this could be disconcerting in a disaster recovery scenario as it might look like files are missing.

Now files can be downloaded in alphabetical order, or the default of smallest to largest.

One More Thing

There’s a lot of emphasis on off-site storage these days, and not everyone uses Azure Blob Storage.

For example, one of my customers has a 5TB file share, mapped using UNC (in the format \\server\share) and wants to copy files there.

What if you already keep tape backups, managed by a company that takes your tapes off-site in a fireproof lockbox, and all you need to do is get the files to a network share to be backed up to tape?

What if you just want to copy your database backup files to a local network share as well as Azure Blob Storage?

What if you know you should be moving your backup files to a network share but keep forgetting to do so?

More importantly, what if your backup files are already on a local network share and you are facing a disaster recovery situation? Downloading from Azure Blob Storage would take too long, and you just want to restore the latest backup from that local network share.

Introducing File Storage Sync and Restore.

FileStorageSync

Like its older sibling, FileStorageSync synchronises the contents of a local directory with a network share using a UNC path (in the format \\server\share). You can provide a username and password as well, which is convenient for creating scheduled tasks. All other features from AzureBlobStorageSync are included.

FileStorageRestore

Like its older sibling, FileStorageRestore parses all the files in a UNC share, and downloads only the latest full, differential and transaction log backup files required to perform a restore of your database.

It also includes Point-In-Time Restore (for the latest backup) and Sort Order customisation.

A download indicator is not provided in this version because local network speeds are usually so quick that it would add unnecessary overhead.

All other features from AzureBlobStorageRestore are included.

Note: A future enhancement, coming soon, will restore the latest backup directly from the UNC network share without copying the files first, so you will be able to restore your database as quickly as possible.


If you want to play with the new features, you can find them in the GitHub repository.

This project is released free under the MIT License (free for commercial and non-commercial use), and all you have to do is give me credit.

Find me on Twitter at @bornsql if you have any thoughts. Feature requests can be submitted on the GitHub repository itself.

Azure Storage Throughput

Hot on the heels of DTUs from earlier this week, today we’re going to switch focus a little, while staying with Azure, and have a quick look at storage speeds on SQL Server running on Azure Virtual Machines (Infrastructure-as-a-Service).

Microsoft recently lowered their prices for more popular VMs, which is making this an attractive solution for a lot of folks who don’t want to run their servers on-site.

However, there’s a limit to how fast our storage can get, despite the attraction of solid-state drives on the Premium Storage offering.

We in the industry refer to “latency” when discussing storage. This is measured in seconds per MB (s/MB), or how long it takes to access 1 MB of data. Target latency for SQL Server is 5 ms to 10 ms. In other words, the best case scenario for SQL Server is that we should be able to access 1 MB of data in between 5- and 10-thousandths of a second.

Everything on Azure is virtualised, from the CPU and RAM, down to the storage layer. Behind the scenes, all of these components are connected by what Microsoft calls the “fabric layer”. In simpler terms, it’s the network, stupid.

Yes, it has a low latency. Yes, a customer of mine did save almost 50% of their budget by switching from a data centre to Azure, and yes, they did get a nice performance boost in the process.

However, there’s a limit, and that is the theoretical limit on network throughput. The absolute best case for any local network these days is 10 Gb/s, or in other words, 1.25 GB per second (there are 8 bits in a byte, so we have to divide the bits by eight to get the bytes per second).

In realistic terms though, the connection between our virtual machine and the SSD premium storage, depending on our region and storage tier, is more likely to be 1 Gbps, or 125 MB per second. It’s not really clear from the Azure Documentation what the actual limits are, but I’ve personally seen it max out at 125 MB/s. The region we host in matters a great deal.

Inverting that 125 MB/s figure to get the latency (1 MB ÷ 125 MB/s = 0.008 seconds), we’re seeing 8 ms at best. That’s still within the window of 5–10 ms, so it’s not terrible. Now add in the fact that we’re not the only person using that underlying hardware, and the latency starts creeping up. There will be competition on the VM host, the network card, the storage, and it all adds up. Pretty soon we’re looking at 20–30 ms of latency, which is two to three times the upper end of the preferred range.

No matter what the CPU speed is, or how much RAM we have, our bottleneck is always going to be the network. Those of us who have experience with older SANs (storage area networks) will recognise this. Even having the fastest spindles (hard drives) in the SAN is meaningless when we’re connected to the server with a 1 Gbps half-duplex network card (no sending traffic is allowed while receiving traffic).

As I say on this blog and in person, Azure is not the answer for everyone (whether we’re looking at SQL Server on an Azure VM or an Azure SQL Database). We need to make sure that we understand the inherent limits of the infrastructure before committing to a long-term project to migrate everything.

I’m not saying Azure isn’t a good idea. I just pointed out above that a client I migrated recently was very happy with their outcome. Their data centre provider offering was terrible, and Azure was a great improvement. I personally run a production system on SQL Server, running on a small Azure VM.

20–30 ms might be twice as fast as your current solution. If your database is running on 7200 rpm SATA drives in a RAID 5 configuration, you’ll see a massive increase in performance by switching to an Azure VM with P30 Premium Storage.


It all comes back to the question: Are you sure?

Let me know your Azure migration plans on Twitter, at @bornsql. I look forward to hearing from you.

Importing a BACPAC into Azure SQL Database

In our final look at migrating a database to Azure SQL Database, we will import a BACPAC file.

From previous weeks we know that a BACPAC file contains the full schema, as well as the data (in BCP format), of the database we are migrating. It is based on the structure of a DACPAC file, which is created when we extract a Data-tier Application.

This step is also surprisingly easy, because it uses knowledge we have gained in other blog posts.

To import the BACPAC, the file must be uploaded to Azure Blob Storage. We can either use AzCopy or the AzureBlobStorageSync tool to upload the file to a standard container.

Once the file is uploaded, we can use PowerShell or the Azure Portal. For demonstration purposes, I will use the Portal website.

Upload the BACPAC

Using AzCopy, copy the file to a container. Remember that we cannot use Premium Storage for this file, but that works out fine because it’s much cheaper to use Standard Blob Storage anyway.

For example:

AzCopy /Source:D:\Temp\bacpac /Dest:https://myaccount.blob.core.windows.net/mycontainer /DestKey:key /Z:D:\_Ops\AzCopy /Pattern:"sourcefile.bacpac"

(Refer to the previous blog post about AzCopy to understand what these switches mean.)

Importing the BACPAC

With the Azure Portal open, navigate to our SQL server. Notice the Import database option on the top menu.


As with the Management Studio wizard, we need to input some information to get the database created and populated.


In the Storage configuration, navigate to the container where we just uploaded that BACPAC file, and select it. The rest is simple. Once you click on “OK”, the database will be created.

Conclusion

And there you have it. Two ways to migrate a SQL Server database from on-premises to Azure SQL Database.

I presented a session to the Calgary PASS Chapter last week on this topic and discovered that network speed matters a great deal. I recommend using the two-part method, via a BACPAC file, so that you can control the process and not have to rely on slow network connections. AzCopy is at least predictable.

If you have any questions or comments, you can find me on Twitter as usual, at @bornsql.

Moving a database to Azure SQL Database

This week we will actually move a database into Azure SQL Database, using the first of two of Microsoft’s recommended methods.

The main thing to keep in mind is that SQL Server (on-premises, or “earthed”) and Azure SQL Database (“cloud”) are not the same product. They support the same schemas and data, and allow the same scripts and applications to run, but they are not the same thing.

This is important, because when we migrate to Azure SQL Database from an on-premises SQL Server environment, we can’t simply do a backup and restore of our database.

The simple reason is that there is no file system access in Azure SQL Database. There is nowhere to put a backup file where the service could see it, let alone restore from it.

The more complicated reason is that we don’t have access to any instance-level features, including server configuration, tempdb access, and so on, and therefore our databases must be self-contained.

The recommended method to migrate a database to Azure SQL Database is the Migration Wizard in SQL Server Management Studio.

For larger databases, we can create an extract first, which can be imported later using the necessary tools.

Under the covers, the migration is using SQL Server Data Tools to extract a Data-tier Application (DAC). This acronym is not to be confused with Dedicated Administrator Connection.

According to MSDN (emphasis added):

A data-tier application (DAC) is a logical database management entity that defines all of the SQL Server objects – like tables, views, and instance objects including logins – associated with a user’s database. A DAC is a self-contained unit of SQL Server database deployment that enables data-tier developers and database administrators to package SQL Server objects into a portable artifact called a DAC package, also known as a DACPAC.

A BACPAC is a related artifact that encapsulates the database schema as well as the data stored in the database.

From this, we can establish that we are creating a BACPAC file, containing the schema in XML format, a few metadata files with version information, and of course the data itself, in BCP (Bulk Copy) format. This is then compressed (using standard PK-ZIP format) in order to save space.

Anyone who has used other products that create database dump files will be familiar with this process.

The BCP file format is not human-readable, unlike, say, MySQL’s dump files, and if we look at the MSDN article on format files, it will make our heads spin.

That said, I was able to pick out string values from the BCP file in my test that corresponded with row values in the source database table.

To be on the safe side, I like to break up the process into two stages, so that if the second part fails, we can retry it at our convenience:

  • Create the BACPAC file locally and verify it;
  • Import the BACPAC file into Azure SQL Database.

If the database is small enough, we use the Migration Wizard inside the latest version of SQL Server Management Studio and do both steps in one go. Let’s do that now.


Inside SQL Server Management Studio, right-click on the database we want to migrate. In this case, it’s [AdventureWorks2012]. Click Tasks > Deploy Database to Microsoft Azure SQL Database….


Here we specify the server connection for our Azure SQL account, by clicking on the Connect… button.


This is where we connect to the Azure SQL account. Since this migration wizard is doing all of the heavy lifting, we do not need to create a database beforehand.


Now that we have connected to our Azure account, we can create the database on this screen. Remember the different editions and their associated costs, so we won’t pick the $7000 per month option unless we can afford it.


Once we start the migration, it can take a while, and it may even fail due to compatibility issues, which have to be resolved.

We briefly touched on these last week, and they must be resolved before migrating. We would use the latest SQL Server Data Tools to identify errors or warnings.

But because we’ve done our compatibility homework, everything works the first time.

It’s really that simple.

If the database is large, or the network connection is unstable or slow, we will have to do the import from our BACPAC file or using BCP directly. That will be covered in next week’s post.

If you have any questions or comments about this process, feel free to ask me on Twitter at @bornsql.

What is physical storage anyway?

My friend Jason asked me about his MySQL database export file last week, whether it was sufficient to create a new database. I replied saying that he would have to set up the database separately, because the physical files are allocated in a particular way depending on the web host.

What is a physical file, though? Kenneth Fisher, that crazy cat from SQL Studies, reminded me on Twitter that a physical file, or table, or database, isn’t actually physical. Hard drives are mostly virtualised, especially with web servers, and nothing is physical anymore. Even more confusingly, hard drives are gradually being replaced by electric circuits and solid-state memory technology that does something called wear-levelling.

When we speak of physical files, or databases, or tables, we’re referring to the way these files are allocated on a storage device. In the old days, pre-2000, it was possible on some operating systems and file systems to allocate the fastest part of the spinning platter of a hard drive (the inside tracks) to processes that needed low seek times, like Relational Database Management Systems (RDBMS).

Nowadays, with virtualised storage, including network storage and cloud storage, the physical aspect is meaningless. Spindles are what matter. But what’s a spindle? Until recently, storage was provided by spinning electromagnetic platters called hard drives. The spindle is the axle at the center of the drive that the platters rotate around, so when we talk about spindles we really mean individual drives, and the more drives allocated to a storage device, the better.

The problem with spinning drives is that they have a built-in delay for the read-head to locate the correct area on the platter surface, which is referred to as latency. Adding more spindles (more drives) to an array lowers the latency, because files are striped across multiple disks (RAID).

For example, in a three-disk configuration, one file could effectively be split into three parts. That improved read times, and therefore latency, because the file system could look for different parts of the file on different underlying platters, effectively cutting the overall read time to roughly a third.

To complicate things even more, solid-state storage, which comes in many forms, most commonly USB thumb drives, camera cards (often called memory cards, which is a misnomer), and solid-state drives (SSDs), is gradually replacing hard drives in network and cloud storage devices. SSDs have far lower latency because there’s no physical platter to spin up and no read-head to position over a particular spot on the surface. The cells that contain the bits of data are accessible almost instantly.

However, this speed improvement has a trade-off: solid-state storage has a limited lifespan. The cells can only be written to a certain number of times before failing. The drive controller has to move data around constantly, a process called wear-levelling, to ensure that the blocks of storage are used evenly, thereby extending the life of the drive. Where a file is physically located on the drive can change at any given moment.

Virtual storage in cloud and network storage devices already muddied the waters with sharing data between virtual machines. Solid-state drives make it even more abstract.

A physical database might refer to the server it’s running on or a virtual machine that is running on a physical server. A physical file might refer to a collection of bits on a single spinning disk drive or SSD, or a collection of bits that are located on several hundred spinning disks.

We have reached a point where a physical storage paradigm is meaningless. When we talk about the physical file on a disk, we mean the abstract representation of the file system’s knowledge of where that file is stored at that point in time. Ultimately, though, it doesn’t mean what it used to.

If you would like to commiserate over the loss of high-failure spinning rust, find me on Twitter at @bornsql.

AzCopy 101 – Copying SQL Server backup files to Azure Blob Storage

SQL Server 2016, released last month, now lets us back up our database files to two locations simultaneously. This is very good news if we’re running the latest and greatest (and it really is a good upgrade).

For everyone else, the bad news is that we are targeting an on-site destination for our backups (local network, SAN, NAS, tape) and using some form of automation to push these backups offsite. One of my budget-conscious customers uses Dropbox.

Since the release of Azure Cool Blob Storage, at a third of the price of typical Blob Storage, we now have a cost-effective off-site archive that is competitive with Amazon Glacier, and remains in the Microsoft family.

We can either use my Azure Blob Storage Sync tool to push our database backups to Azure Blob Storage, or if there is a policy to use Microsoft-only tools, we can use AzCopy.

Many tutorials already exist for how to use AzCopy, including the official documentation. This post is more about gotchas that I’ve discovered while working extensively with the tool.

Keys and Containers

Azure Blob Storage stores its blobs (we can think of them as files) in a container, which is addressable by a URL. When we first set up our storage account, we need to create a container to store the files. The path to that container would look something like this:

https://myaccount.blob.core.windows.net/mycontainer

A key is then needed to access that container in that storage account. It is a long string of characters, with == at the end, which looks something like this:

Zx7uEPwA6MfJ5MXML0MYUqc8k78lYYCvq7h+lG0grumpyMG1TvEpp931SQLXWpoZWfgItEzhvWnKzy9RKGTYfA==

In the examples below, I use the word key to represent this key.

Command Line

We can copy files to and from Azure Blob Storage using the following command.

AzCopy /Source:<source> /Dest:<destination> [Options]

If we’re copying files to Azure, the /Source switch would have an on-premises path, and the /Dest switch would have a container path. The /DestKey must then be specified.

(If we were copying from Azure, the Source and Destination would be reversed, and the SourceKey would be specified.)

For example, let’s copy files from my local drive to an Azure storage container:

AzCopy /Source:D:\SQLData\Backup /Dest:https://myaccount.blob.core.windows.net/mycontainer /DestKey:key /S /XO /Z:D:\_Ops\AzCopy

Notes:

  • /Source – my local hard drive
  • /Dest – the Azure storage container URL
  • /DestKey – the secure key to access the container
  • /S – recurses through all the subfolders in the source location
  • /XO – skips source files that are older than the copies already at the destination
  • /Z – specifies a fixed path for the journal

The Nitty-Gritty

Don’t Run Multiple Copies

If AzCopy is already running, don’t run it again. This is strongly advised by the official AzCopy documentation.

The reason is that a single instance of AzCopy already makes the best use of available resources; in more practical terms, it’s already thrashing the disk and CPU.

Running multiple copies of AzCopy simultaneously will render your machine almost unusable, and it will probably break the journal anyway.

Journal

The biggest headache working with AzCopy is that it uses a journal to keep track of your progress during a copy operation.

The journal normally lives in %LocalAppData%\Microsoft\Azure\AzCopy. This becomes a problem when automating the AzCopy process. Usually we run automated tasks under a service account, which means that we don’t monitor the task until it fails.

Unpredictable journal behaviour

Unfortunately, AzCopy does not fail if a previous copy operation was not completed. Instead, it will wait at a Y or N prompt for an answer that never comes, because the task is running in a service account context.

AzCopy provides the option to use /Y to suppress confirmation prompts.

There are two main reasons why the journal will await a confirmation:

  • The previous operation failed, and pressing Y will allow it to resume where it stopped;
  • The current operation is similar to a previous operation, and pressing Y will overwrite the journal and start over.

As we can see, these are vastly different results. Therefore I do not recommend using the /Y switch.

If for whatever reason AzCopy is interrupted (usually network-related issues), the journal will have to be cleared out manually.

Note that the %LocalAppData% path is user-specific. I would recommend specifying a journal path with the /Z switch, which allows us to set a default location.

Ignore Older Files

Like Xcopy and Robocopy, we can skip older files, which is useful when archiving transaction log backups every 15 minutes.

Use the /XO switch to only copy new files, and ignore the ones that already exist.

Summary

AzCopy is a great tool, but it has a lot of moving parts and can cause headaches if we don’t manage the journal correctly.

If you would like to continue the discussion, or ask me questions, look me up on Twitter at @bornsql.

Forecast: Azure Skies Over the Amazon

Microsoft Azure and Amazon Web Services are taking over.

Last week, a client approached me to migrate their complex environment in two phases from two physical data centers, first to Azure VMs, and ultimately to an Azure SQL Database, doing away with logical servers completely.

In my mind, this trend is only going to accelerate. There are obvious exclusions that I won’t go into around privacy, large data and so on, but in general, I see a mass migration to “the Cloud”.

The primary motivator for virtualisation has always been cost. For example, Azure Blob Storage offers fully redundant global replication, where your data is copied to two other centers in other parts of the world, for a very small price bump and zero additional administration. In a traditional hosted environment, you would be looking at a significant outlay for ensuring global redundancy, not to mention maintaining and administering it.

A trade-off of virtualisation is performance. Virtual machines, by their nature, incur at least a 5% resource overhead, and if set up incorrectly or over-provisioned, can cut performance by as much as 50%. That is slowly changing as we become wise to the needs of SQL Server in a virtual environment, and Microsoft is driving the recommendations.

Licensing is also changing as a result of the Cloud. When setting up a traditional environment, software licenses can easily account for 80% or more of your initial outlay, depending on number of CPU cores, SQL Server edition, and so on.

With Azure and Amazon, just as you would pay a monthly fee for a Windows VM, you can now pay an additional monthly fee for a SQL Server license that (depending on configuration) could take up to six years to become more expensive than buying a SQL Server Enterprise license outright. Plus, in those six years, you can get the latest version of the product as quickly as it takes to spin up a new virtual machine.

An operational expense like that makes developing on the SQL Server platform very attractive, especially for companies that don’t necessarily have the capital to spend hundreds of thousands of dollars on licensing.

It behooves us, as data professionals and consultants, to understand these risk factors, to ensure that our customers can get the most bang for their buck. If they want performance, and their budget is limited, Azure virtual machines, or even database-as-a-service, might be the best thing for them.

Yes, the Cloud is just someone else’s data center, but Microsoft and Amazon are lowering the barrier to entry for a number of businesses, large and small, to scale up their environment and help grow their businesses. It would be foolish to dismiss it out of hand.

If I ran a large data center, I’d be concerned. Take me to the clouds above.

Update to Azure Blob Storage Sync and Restore

Blob Storage Sync tool updated

During a SQL Server migration this month, I found some inconsistencies in my Azure Blob Storage Sync tool, so I made several improvements, and fixed an outstanding bug.

As you know, it relies on the naming convention provided in Ola Hallengren’s Maintenance Solution and comes in two parts: the AzureBlobStorageSync command-line application, and the AzureBlobStorageRestore command-line application.
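
If you haven’t seen that naming convention before, it comes from backups taken with Ola’s DatabaseBackup stored procedure. As a rough illustration (the directory, parameter values and resulting file name below are just examples, not a recommendation), a transaction log backup taken like this produces files the tools can parse by server, database, backup type and timestamp:

-- Take transaction log backups of all user databases with Ola's procedure.
EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES',
    @Directory = N'D:\SQLData\Backup',
    @BackupType = 'LOG',
    @Verify = 'Y',
    @CleanupTime = 24;

-- Produces file names along the lines of:
-- D:\SQLData\Backup\SERVER\AdventureWorks\LOG\SERVER_AdventureWorks_LOG_20160821_124500.trn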

New Features

I realised that it was not possible, using this tool, to download every file from the Blob Storage Container in one go. The code only downloaded the files necessary to perform the latest restore for a single database.

To resolve this, and allow all files to be downloaded from Blob Storage, I have added a new configuration key called DownloadFilesFromAzure, which takes True or False values.

Another new feature is an explicit option to upload all local files to Blob Storage during a sync. Previously, it was implied that all local files in the LocalPath should be uploaded, but you may not want to do that. This is implemented as configuration key CopyFilesToAzure, which takes True or False values.

Deleting Files

There are now two ways to delete files from a Blob Storage Container:

  • Files on the target that do not match the source, which must be deleted from the target;
  • Files that are no longer needed on the target.

The first option is a typical synchronisation feature and was implicit in the previous version. It is now implemented using the configuration key DeleteMissingFilesFromAzure, which takes a True or False value. If it is set to True, files that do not exist on the local drive will be deleted from Blob Storage.

The second option is for deleting files that match a certain string in the file name. This is handy for server migrations where the file names generated by Ola’s backup script contain the old server name. While the backup script can perform cleanup tasks based on timestamps, it will ignore files that have a different server name, and you might be left with orphaned files long after the backup retention window has passed.

The configuration key, called DeleteExplicitFilesFromAzure, takes True or False values, and works with an additional configuration key called ExplicitFilesToDeleteMatchingString. Here you can put a string containing the old server name, and any file with a matching name will be deleted. This particular feature only works with one string at a time. You will have to run it more than once if you need to delete file names that match other strings.

Fixes

Yes, the only logged issue in the repository has been fixed! I now use sp_executesql instead of EXEC for the T-SQL portion of the restore tool. This was probably the easiest thing to fix.
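
For the curious, the change is essentially the difference between concatenating a command into a string for EXEC and passing values through sp_executesql. A simplified sketch (this is not the tool’s actual code, and the database and file names are made up):

DECLARE @DatabaseName sysname = N'AdventureWorks';
DECLARE @BackupFile nvarchar(260) = N'D:\Restore\SERVER_AdventureWorks_FULL_20160821_000000.bak';

-- Before (simplified): build a string and run it with EXEC.
-- DECLARE @sql nvarchar(max) = N'RESTORE DATABASE [' + @DatabaseName
--     + N'] FROM DISK = N''' + @BackupFile + N''' WITH NORECOVERY;';
-- EXEC (@sql);

-- After (simplified): pass the values through sp_executesql instead.
EXEC sys.sp_executesql
    N'RESTORE DATABASE @DatabaseName FROM DISK = @BackupFile WITH NORECOVERY;',
    N'@DatabaseName sysname, @BackupFile nvarchar(260)',
    @DatabaseName = @DatabaseName,
    @BackupFile = @BackupFile;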

A more critical fix, and the main reason for this blog post and tool update, is to do with downloading files.

Imagine an all-too-common scenario where a download fails before it is complete. In an emergency, the last thing you need is your Restore tool failing. Whereas before I was checking only the name of the file, I now check the file size as well. If the file sizes do not match, the file will be downloaded from Blob Storage again.

Files now download from smallest to largest in size. This is a major benefit if you have a lot of small transaction log files.

Notes

Reading the list of files from Blob Storage takes about ten seconds for 2,500 files, before parsing can begin. This is not a lot of time, but it’s something to keep in mind.

Feature Requests and Bug Reports

If you have any feature requests, or have found a bug, please log that on the GitHub repository. Even better, if you know C#, you can add your own features and fix your own bugs, submit a pull request, and I’ll merge your code into the tool.

Questions, comments or complaints? Twitter is the place: @bornsql is where you’ll find me.

Pre-Announcement Announcement

On Saturday, 27 June 2015, I will be presenting at SQLSaturday #407 in Vancouver BC. The session is titled “Restoring a SQL Server Database from Azure Blob Storage”.

I will be releasing some new content and code at the same time as my talk, which will be available from this site, assuming WordPress plays along.

Once the content is available, you’ll understand why I’ve been so quiet on the blog.

Stay tuned.