- The perils of RAID
Recently, I was asked to assist an organization with getting their data back for a SQL Server that had experienced physical hard drive failure.
I asked a lot of questions about the configuration of the data layer, and it came to light that it was a fairly regular setup: SQL Server virtualised on VMware, with a VMDK file that contains the data drive.
This file was stored on a RAID 50 array (also known as RAID 5+0), where two drives had failed at roughly the same time. Depending on which drives in the respective logical groups had failed, there was a small chance the data was recoverable, but I was skeptical.
The representative had somehow managed to mount the drives so that he could recover the VMDK file, attach it as a drive, and pull files from it to try to recover the SQL Server database.
Coincidentally, I had recently written about recovering from catastrophic data loss, so it was still fresh in my mind how files copied from a failed RAID array can be corrupt.
One of the most talented data recovery people I know was already working on trying to recover the database, and his description of what he’d attempted already was pointing to an unrecoverable RAID failure.
We took a look through some random files on the disk. I was looking for evidence that the VMDK file that had been copied was taken from a RAID array in a corrupt state, namely that one of the disks had failed, and that the second disk had failed during the rebuild of the array.
The easiest way to see this is to look for a JPEG or PNG file over 256 KB in size. Most RAID block sizes are multiples of 64 KB, usually 128 KB or 256 KB. Each block is split over the individual physical disks, along with distributed parity (a parity block, not a single bit), so for a particular block of data, if the RAID array has failed, you will see a corrupt image, or the image won’t be viewable at all.
There were no images that were large enough, so we looked for the next best thing: SQL Server’s ERRORLOG files. By default, there are up to six files, and each of them will usually contain more than 256 KB of data. A disaster at the RAID controller level would present as garbage data at a 64 KB offset somewhere in the file.
Sure enough, after scrolling through the first file we opened, there was a large chunk of random data in the file, running for 32 KB. I didn’t need to see any more to know that this array was not recoverable.
In this case, two disks went away in one of the RAID 5 sets at the same time. Two simultaneous failures are catastrophic because there isn’t enough information, even with the parity distributed across the remaining drives, to rebuild that logical side of the RAID 50 stripe.
The representative was advised to send the drives away for data recovery. While I don’t expect them to have any luck in recovering the data, at least we were able to demonstrate the problem visually. If a text file looked like that, there was no chance we could be expected to safely recover a SQL Server database.
The moral of the story is to always back up your database, and make sure that it is securely copied off-site as soon as it is completed. Make sure you test that backup, and make sure you run a DBCC CHECKDB on the restored database. Use a backup checksum as well, since not even CHECKDB can catch all corruption.
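That routine can be sketched in a few statements. The database name, backup path, and logical file names below are hypothetical; substitute your own:

```sql
-- Back up with a checksum, so page-level corruption is caught at backup time
BACKUP DATABASE SalesDB
    TO DISK = N'\\backup-server\sql\SalesDB.bak'
    WITH CHECKSUM, COMPRESSION, INIT;

-- Confirm the backup file is readable and its checksums are valid
RESTORE VERIFYONLY
    FROM DISK = N'\\backup-server\sql\SalesDB.bak'
    WITH CHECKSUM;

-- Restore to a test server, then run a full consistency check
RESTORE DATABASE SalesDB_Test
    FROM DISK = N'\\backup-server\sql\SalesDB.bak'
    WITH MOVE N'SalesDB' TO N'D:\Data\SalesDB_Test.mdf',
         MOVE N'SalesDB_log' TO N'L:\Log\SalesDB_Test.ldf';

DBCC CHECKDB (SalesDB_Test) WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

Only the last step, a successful DBCC CHECKDB against a restored copy, proves the backup is actually usable.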
Share your corruption story with me on Twitter, at @bornsql.
- Why is a value in DATETIME2 8 bytes, but in BINARY it is 9 bytes?
When trying to cast a DATETIME2 value to BINARY(8), I’m getting a “binary or text data would be truncated” error. This seems really weird in context with the fact that SELECT DATALENGTH() on a DATETIME2 value returns 8 (i.e., 8 bytes) as the result. This seems to be consistent across multiple versions of SQL Server. Has anyone come across this before?
SELECT CAST(SYSDATETIME() AS BINARY(8));
-- Msg 8152, Level 16, State 17, Line 1
-- String or binary data would be truncated.
SELECT CAST(SYSDATETIME() AS BINARY(9)); -- returns a valid binary value
SELECT DATALENGTH(SYSDATETIME()); -- returns 8
Quite a lot to take in. Let’s break this down.
DATETIME2 is a data type that was introduced in SQL Server 2008. It uses up to 8 bytes to store a date and time: 3 bytes for the date component, and up to 5 bytes for the time component.
The point here is that it uses 8 bytes in total. That’s it. No more.
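You can watch the storage size shrink and grow with the precision, using DATALENGTH. The documented sizes are 6 bytes for precisions 0 to 2, 7 bytes for 3 and 4, and 8 bytes for 5 to 7:

```sql
SELECT DATALENGTH(CAST(SYSUTCDATETIME() AS DATETIME2(0))) AS p0, -- 6 bytes
       DATALENGTH(CAST(SYSUTCDATETIME() AS DATETIME2(4))) AS p4, -- 7 bytes
       DATALENGTH(CAST(SYSUTCDATETIME() AS DATETIME2(7))) AS p7; -- 8 bytes
```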
Jemma noted that when converting the DATETIME2 data type to BINARY, it suddenly became longer by exactly one byte, which seems strange.
My fine readers will remember that binary is rendered as hexadecimal code when we look at it in SQL Server Management Studio, which means that each byte is represented by two hexadecimal characters stuck together.
Let’s use an example. I’m going to create a
DATETIME2(7) variable with today’s date and time, using the built-in SYSUTCDATETIME() function:
DECLARE @dt DATETIME2(7) = SYSUTCDATETIME();
Our result looks like this:
2017-10-04 08:59:21.8910199
To show how SQL Server stores it, let’s convert it to binary and display the output. Taking Jemma’s findings into account, we’ll skip the BINARY(8) error and convert to a VARBINARY value large enough to hold the result:
SELECT CAST(@dt AS VARBINARY(25));
As expected, this is the result, in binary. Notice that it is nine bytes long, as Jemma pointed out (I’ve expanded it for legibility):
07 F7 AF 30 59 4B 5D 3D 0B
I suspected that it had something to do with the variable-length TIME portion of the data type, so I split the value into its respective DATE and TIME components and converted each to binary, to confirm that the extra byte was in the TIME component. This required some creativity: cast the full value to a component type first, and then to binary.
SELECT CAST(CAST(@dt AS DATE) AS VARBINARY(25));    -- 0x5D3D0B
SELECT CAST(CAST(@dt AS TIME(7)) AS VARBINARY(25)); -- 0x07F7AF30594B
Sure enough, the extra byte is in the TIME component.
Notice, when comparing these results to the full
DATETIME2(7) value above, that the date is stored to the right of the time value when it’s in binary format. This is likely something to do with the way SQL Server persists data to disk in Little Endian (byte-reversed) format. To SQL Server, the date is first (reading right to left), then the time, then the mystery
0x07 at the end.
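We can convince ourselves of that byte order by pulling the date out of those nine bytes by hand. This is just a sketch: the last three bytes, read in reverse (little-endian), are the number of days since 0001-01-01.

```sql
DECLARE @bin VARBINARY(9) = 0x07F7AF30594B5D3D0B;

-- Days since 0001-01-01: the last three bytes, reversed
DECLARE @days INT =
      CAST(SUBSTRING(@bin, 7, 1) AS INT)
    + CAST(SUBSTRING(@bin, 8, 1) AS INT) * 256
    + CAST(SUBSTRING(@bin, 9, 1) AS INT) * 65536;

SELECT DATEADD(DAY, @days, CAST('0001-01-01' AS DATE)) AS decoded_date; -- 2017-10-04

-- The five time bytes decode the same way: a little-endian count of
-- 100-nanosecond intervals since midnight, for precision 7
```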
To be precise
While we were both trying to figure out this extra byte, I noticed that the binary value always seemed to start with a
0x07. But when I converted the value to a lower precision, that leading byte changed to match.
Jemma figured it out a few seconds before I did: the leading byte (the
0x07) is the precision of the DATETIME2 value.
The reason that a DATETIME2 (or
TIME) data type is one byte longer when converted to a binary value is that the precision is encoded directly into the value. This ensures no information is lost when converting between data formats.
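A quick way to confirm this for yourself is to cast the same value at different precisions and watch the leading byte (and the overall length) change:

```sql
DECLARE @dt DATETIME2(7) = SYSUTCDATETIME();

SELECT CAST(CAST(@dt AS DATETIME2(7)) AS VARBINARY(25)) AS p7, -- 0x07..., 9 bytes
       CAST(CAST(@dt AS DATETIME2(3)) AS VARBINARY(25)) AS p3, -- 0x03..., 8 bytes
       CAST(CAST(@dt AS DATETIME2(0)) AS VARBINARY(25)) AS p0; -- 0x00..., 7 bytes
```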
Feel free to share your cool findings about data type conversions on Twitter, at @bornsql.
- Changes to Service Packs and Cumulative Updates for SQL Server 2017
For a few years now, Microsoft has augmented its irregular release of Service Packs with a more frequent Cumulative Update model, in order to get bug fixes and other improvements to customers faster.
With SQL Server 2017, which runs on both Linux and Windows (as well as Docker containers for Linux, Windows and macOS), the service pack model is outmoded.
Just as you now expect to see regular app updates on your mobile devices, SQL Server 2017 introduces the following rapid servicing model:
- Service Packs are gone. You will never see this nomenclature again for SQL Server. There are only Cumulative Updates (CUs). Just as before, every new CU will contain all the fixes from previous CUs, so you only need to download the latest one to be up to date. This is similar to the current model, except there won’t also be a latest Service Pack to worry about.
- For the first twelve months after the product is GA (generally available), SQL Server will have a Cumulative Update every month, containing the latest fixes and improvements.
- After the first twelve months, the release cadence will drop to quarterly for the next four years of mainstream support, unless there is an important security fix that needs to be deployed. For previous versions of SQL Server, up to and including SQL Server 2016, CUs were released every two months, so this new schedule gives more time for testing a CU once it’s released.
- Every twelve months after GA, the installation files will be updated to contain all the Cumulative Updates in what is effectively now a service pack, but won’t be called that. This will also become the slipstream update. In other words, you’re more likely to be up to date when installing from scratch, later in the release cycle.
- Customers on the GDR (General Distribution Release) release cycle will only get important security and corruption fixes, as before. You can switch to the standard CU release cadence any time, but once you do, you can’t switch back to GDR.
- You will not be required to install Cumulative Updates immediately, or at all if you don’t want to install them. This is different from previous versions, where the release of a Service Pack eventually made prior builds unsupported. However, it is highly recommended to update once you’ve tested the latest CU.
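If you’re unsure which servicing level an instance is running, SERVERPROPERTY will tell you; ProductUpdateLevel is the property that reports the CU level:

```sql
SELECT SERVERPROPERTY('ProductVersion')     AS build_number,   -- e.g. 14.0.x
       SERVERPROPERTY('ProductLevel')       AS product_level,  -- RTM, or SPn on older versions
       SERVERPROPERTY('ProductUpdateLevel') AS update_level;   -- e.g. CU2, or NULL if none applied
```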
If you have any questions or comments about this new servicing model, look me up on Twitter at @bornsql.
- Compañero Conference and SQL Modernization Roadshow
October is a busy month for me. I am flying all over the US and Canada for speaking engagements to share some thoughts about migrating your SQL Server environment to the cloud (specifically Azure).
I will be presenting at the Compañero Conference, which takes place over two days, October 4 – 5 (that’s next week), in Norfolk, VA.
It’s my first time in the mid-Atlantic region of the US, so I’m looking forward to it.
If you’re in or around the city next week, I can provide a special discount for you to attend. Let me know by direct message on Twitter at @bornsql.
SQL Modernization and the Cloud
Then, following on from a similar roadshow in January this year, I am presenting in four cities around Canada—Montreal on the 11th, Toronto on the 12th, Calgary on the 25th, and Vancouver on the 27th of October.
In this roadshow, I will be discussing moving from legacy systems to SQL Server 2016 and 2017, as well as SQL Server on Linux, migrating from Oracle to SQL Server, and of course you will get to see my famous live cloud migration demo.
Attendance to the roadshow is free. The event is sponsored by AMTRA Solutions.
I look forward to seeing you!
- Does rebuilding my clustered index also rebuild my non-clustered indexes?
I’ve been working with SQL Server for many years now, and up until recently, I assumed that rebuilding any clustered index would cause non-clustered indexes to be rebuilt as well, because the non-clustered index includes the clustered index key in it.
This assumption is wrong.
On SQL Server 2000, rebuilding a clustered index did cause the non-clustered indexes to be rebuilt, but only for non-unique clustered indexes, because the uniquifier might change. I was minding SQL Server 2000 instances for a long while after it was no longer supported by Microsoft, which is why this myth stuck around in my head for so long.
On SQL Server 2005 and higher, non-clustered indexes are not rebuilt when a clustered index is rebuilt.
As my friend Gail Shaw says in this forum post (from 2011, no less!):
It’ll have absolutely no effect on the nonclustered indexes. Nonclustered indexes use the clustered index key as a ‘pointer’ and that doesn’t change in a rebuild.
- A non-clustered index is rebuilt if the clustered index is dropped and recreated. Without a clustered index, the non-clustered indexes will have to refer to the row identifier (RID) in the underlying heap instead.
- A non-clustered index is not rebuilt if a clustered index is rebuilt, on SQL Server 2005 and higher.
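In practical terms, then, if you do want the non-clustered indexes rebuilt as well, you have to say so explicitly. Assuming a hypothetical dbo.Sales table with a clustered primary key named PK_Sales:

```sql
-- Rebuilds only the clustered index; non-clustered indexes are untouched
ALTER INDEX PK_Sales ON dbo.Sales REBUILD;

-- Rebuilds the clustered index and every non-clustered index on the table
ALTER INDEX ALL ON dbo.Sales REBUILD;
```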