I’m here for the small organizations, the shops that can’t afford expensive solutions to maintain their environments. I’m here for them because that’s me: the one-person consultancy. I’ve built stuff that’s useful to me, and then made it available for free on GitHub.
My first SQL Saturday session, way back in 2015, was the public launch of a free tool I wrote to keep copies of SQL Server backups offsite in Azure Blob Storage. It leverages the strict, predictable naming convention from Ola Hallengren’s Maintenance Solution, so if you use that, this is a very handy addition to your toolkit.
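For those unfamiliar with it, the default file names produced by Ola’s DatabaseBackup procedure look something like this (these names are illustrative; the exact pattern depends on your server, instance name, and configuration):

```
SQL01_AdventureWorks_FULL_20190330_010000.bak
SQL01_AdventureWorks_DIFF_20190331_010000.bak
SQL01_AdventureWorks_LOG_20190331_011500.trn
```

Because the server, database, backup type, and timestamp are all embedded in the name, a tool can work out which files belong together without ever opening them.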
I wrote it because AzCopy was weak and inconsistent. It was fragile, needing constant attention and monitoring in case a journalling file got stuck. AzCopy also didn’t keep files in sync: if a file was deleted locally (as part of a cleanup of old backups), AzCopy couldn’t delete it remotely, which made Blob Storage containers messy to maintain. My uploader was written to keep both sides in sync, without any fussing with AzCopy.
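To show what I mean by keeping files in sync, here is a minimal sketch of the delete-orphans half of that idea, written against the classic WindowsAzure.Storage SDK. The method and names are mine for illustration, not the toolset’s actual code, and it assumes a flat container where blob names match local file names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

static class MirrorSync
{
    // Delete any blob that no longer has a matching local file,
    // so the container mirrors the local backup folder.
    public static void DeleteOrphanedBlobs(string connectionString,
        string containerName, string localFolder)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var container = account.CreateCloudBlobClient()
            .GetContainerReference(containerName);

        var localNames = new HashSet<string>(
            Directory.GetFiles(localFolder).Select(Path.GetFileName),
            StringComparer.OrdinalIgnoreCase);

        foreach (var blob in container.ListBlobs(useFlatBlobListing: true)
                                      .OfType<CloudBlockBlob>())
        {
            if (!localNames.Contains(blob.Name))
            {
                Console.WriteLine("Deleting orphaned blob: " + blob.Name);
                blob.DeleteIfExists();
            }
        }
    }
}
```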
The real value of this tool, though, is being able to retrieve the latest backup files (full, differential, and transaction logs where available), which are needed to recover from a catastrophic failure. With no knowledge of the backups beyond the database name, it can parse the list of files in Azure, download only the ones necessary for recovery, and build a T-SQL script to restore them. Literally all you need to do is run the downloader, then run the restore script.
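As a rough illustration of how a restore script can be derived from the file names alone, here is a hedged C# sketch. It assumes the Ola Hallengren naming pattern shown earlier (a FULL/DIFF/LOG token followed by a sortable yyyyMMdd_HHmmss timestamp) and a single folder of downloaded files; the real tool’s parsing is more robust than this:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class RestoreScriptBuilder
{
    // Assumption: the last two underscore-separated tokens of a file name
    // are the date (yyyyMMdd) and time (HHmmss), giving a sortable stamp.
    static string Stamp(string path)
    {
        var parts = Path.GetFileNameWithoutExtension(path).Split('_');
        return parts[parts.Length - 2] + parts[parts.Length - 1];
    }

    public static string Build(string databaseName, string folder)
    {
        // Simplified matching: assumes the database name appears between
        // underscores and doesn't collide with other database names.
        var files = Directory.GetFiles(folder)
            .Where(f => Path.GetFileName(f).Contains("_" + databaseName + "_"))
            .OrderBy(Stamp)
            .ToList();

        string full = files.LastOrDefault(f => f.Contains("_FULL_"));
        if (full == null)
            throw new InvalidOperationException("No full backup found for " + databaseName);

        string diff = files.LastOrDefault(f => f.Contains("_DIFF_")
            && string.CompareOrdinal(Stamp(f), Stamp(full)) > 0);

        string anchor = Stamp(diff ?? full);
        var logs = files.Where(f => f.Contains("_LOG_")
            && string.CompareOrdinal(Stamp(f), anchor) > 0);

        var script = new StringBuilder();
        script.AppendLine($"RESTORE DATABASE [{databaseName}] FROM DISK = N'{full}' WITH NORECOVERY, REPLACE;");
        if (diff != null)
            script.AppendLine($"RESTORE DATABASE [{databaseName}] FROM DISK = N'{diff}' WITH NORECOVERY;");
        foreach (var log in logs)
            script.AppendLine($"RESTORE LOG [{databaseName}] FROM DISK = N'{log}' WITH NORECOVERY;");
        script.AppendLine($"RESTORE DATABASE [{databaseName}] WITH RECOVERY;");
        return script.ToString();
    }
}
```

The generated script restores the full backup WITH NORECOVERY, applies the differential (if any) and every subsequent log backup, then brings the database online WITH RECOVERY.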
Over time I’ve made some changes to the toolset. It now also supports remote file storage using UNC paths, so that you can copy the files to a network share as well as Blob Storage. You know, for local backups, or if your Internet connection goes down. You can never have enough backups.
However, one thing always bothered me: downloading the files again was slow. The most dramatic impact was with full backup files. I used 1 MB blocks and downloaded them the way the sample code suggested, but I knew this was inefficient because it wasn’t taking advantage of parallel processing.
And then Microsoft released AzCopy v10, which is still in preview as I write this, and wow is it ever better. For starters, it can keep files in sync between local and remote storage, so when you delete locally, that file will be removed remotely. This is a key improvement that has rendered one half of my toolset redundant. And that’s a good thing.
It’s a good thing because I can comfortably recommend AzCopy v10 — maintained and supported by Microsoft — with its own source code repository on GitHub, to get your SQL Server files into Blob Storage as fast as possible. It’s fast because AzCopy uses parallel processing. If you’re not careful, it will saturate your network connection and consume lots of threads (as documented).
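If you want to try it, the sync behaviour looks something like this (assuming AzCopy v10 and a SAS token; substitute your own paths, account, and container):

```
azcopy sync "D:\Backups" "https://<account>.blob.core.windows.net/<container>?<sas-token>" --delete-destination=true
```

With --delete-destination=true, files removed locally are removed from the container on the next run, which is exactly the behaviour my uploader existed to provide.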
Since the real value in my Azure Blob Storage Sync tool is the downloader, which generates a T-SQL restore script, I recently wondered whether I could use that same parallel-copy technique to download files faster, just like AzCopy does. The tool would still download only the files you need to recover the latest database backup, and still generate the T-SQL script to restore that database, but it would download those files at the same speed as AzCopy. That’s a win-win in my book.
At the end of March 2019, I released version 1.3.0 of the AzureBlobStorageSync and AzureBlobStorageRestore toolset (download link), which includes this change. For full transparency, I based it on code posted in a Stack Overflow answer written by Bruce Chen. Open source helping open source.
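The gist of the technique is to split a blob into fixed-size ranges and keep a limited number of range downloads in flight at once. Here is a self-contained sketch of that approach against the classic SDK’s CloudBlockBlob; it’s my illustration of the pattern, not a copy of the shipped code or of Bruce’s answer:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

static class ParallelDownloader
{
    // Download a block blob in 1 MB ranges, at most 10 ranges in flight,
    // writing each range to its offset in a pre-sized local file.
    public static async Task DownloadAsync(CloudBlockBlob blob, string localPath)
    {
        const int blockSize = 1 * 1024 * 1024;  // 1 MB ranges
        const int maxParallelism = 10;          // matches the v1.3.0 hard-coded cap

        await blob.FetchAttributesAsync();      // populates blob.Properties.Length
        long blobLength = blob.Properties.Length;

        // Pre-size the target file so each range can be written at its offset.
        using (var fs = new FileStream(localPath, FileMode.Create, FileAccess.Write))
        {
            fs.SetLength(blobLength);
        }

        var throttle = new SemaphoreSlim(maxParallelism);
        var tasks = new List<Task>();

        for (long offset = 0; offset < blobLength; offset += blockSize)
        {
            long rangeStart = offset;
            long rangeLength = Math.Min(blockSize, blobLength - rangeStart);

            await throttle.WaitAsync();
            tasks.Add(Task.Run(async () =>
            {
                try
                {
                    // Each task gets its own file handle so writes don't collide.
                    using (var fs = new FileStream(localPath, FileMode.Open,
                        FileAccess.Write, FileShare.Write))
                    {
                        fs.Seek(rangeStart, SeekOrigin.Begin);
                        await blob.DownloadRangeToStreamAsync(fs, rangeStart, rangeLength);
                    }
                }
                finally
                {
                    throttle.Release();
                }
            }));
        }

        await Task.WhenAll(tasks);
    }
}
```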
Keeping in mind that the value is in the download tool, I have not made any code changes to the uploader, but it is there if you want it. The toolset now requires .NET Framework 4.7.2 (previously 4.5), and all third-party dependencies have been updated (using NuGet, of course).
One important note: version 1.3.0 has a hard-coded limit of 10 parallel threads for downloading files. I picked ten because it’s ten times faster than it used to be. I don’t have any plans to make this value configurable, because ten is a good balance between saturating network connections and consuming CPU threads. However, the code is available under the MIT licence, so you can fork it and make that change yourself if you like. You can even submit a pull request with your changes, and I’ll strongly consider merging them back into my code. That’s the beauty of open source software.
Along with the release of my sql2xls tool earlier in March 2019, I’m happy to be able to contribute something to the data platform community, leveraging open source solutions that folks are already using. If it makes just one person’s life easier, then I’m happy.
Share your community contributions in the comments below.