Blog

  • The perils of RAID

    Recently, I was asked to assist an organization with getting their data back for a SQL Server that had experienced physical hard drive failure.

    I asked a lot of questions about the configuration of the data layer, and it came to light that it was a fairly regular setup: SQL Server virtualised on VMware, with a VMDK file containing the data drive.

    This file was stored on a RAID 50 array (also known as RAID 5+0), where two drives had failed at roughly the same time. Depending on which drives in the respective logical groups had failed, there was a small chance the data was recoverable, but I was skeptical.

    The representative had somehow managed to mount the drives so that he could recover the VMDK file, mount it as a drive, and pull files from it to try to recover the SQL Server database.

    Coincidentally, I had recently written about recovering from catastrophic data loss, so it was still fresh in my mind what files copied from a corrupt RAID array look like.

    One of the most talented data recovery people I know was already trying to recover the database, and his description of what he’d already attempted pointed to an unrecoverable RAID failure.

    We took a look through some random files on the disk. I was looking for evidence that the VMDK file that had been copied was taken from a RAID array in a corrupt state, namely that one of the disks had failed, and that the second disk had failed during the rebuild of the array.

    The easiest way to see this is to look for a JPEG or PNG file over 256 KB in size. Most RAID stripe sizes are multiples of 64 KB, usually 128 KB or 256 KB. Each stripe is split across the individual physical disks, along with parity data, so if the array has failed, any file spanning a damaged stripe will contain garbage: the image will render corrupt, or it won’t be viewable at all.

    There were no images that were large enough, so we looked for the next best thing: SQL Server’s ERRORLOG files. By default, SQL Server keeps the current ERRORLOG plus six archived copies, and each will usually contain more than 256 KB of data. A disaster at the RAID controller level would present as garbage data at a 64 KB offset somewhere in the file.
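
    As an aside, on a live instance you can check where these files live before disaster strikes. A quick way is the ErrorLogFileName server property (available on modern versions; it may return NULL on very old builds):

    SELECT SERVERPROPERTY('ErrorLogFileName') AS error_log_path;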

    Sure enough, while scrolling through the first file we opened, we found a large chunk of random data running for 32 KB. I didn’t need to see any more to know that this array was not recoverable.

    In this case, two disks in one of the RAID 5 sets went away at the same time. Losing two disks in the same set is a catastrophic failure, because there isn’t enough information, even with the parity data distributed across the drives, to rebuild one logical side of the RAID 50 stripe.

    The representative was advised to send the drives away for data recovery. While I don’t expect them to have any luck in recovering the data, at least we were able to demonstrate the problem visually. If a text file looked like that, there was no chance we could be expected to safely recover a SQL Server database.

    The moral of the story is to always back up your database, and make sure that it is securely copied off-site as soon as it is completed. Make sure you test that backup, and make sure you run a DBCC CHECKDB on the restored database. Use a backup checksum as well, since not even CHECKDB can catch all corruption.
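
    As a rough sketch of that workflow (the database name, logical file names, and paths below are all placeholders):

    -- Back up with a checksum, so page-level corruption is caught at backup time
    BACKUP DATABASE SalesDB
    TO DISK = N'\\offsite\backups\SalesDB.bak'
    WITH CHECKSUM, INIT;

    -- Verify that the backup file is readable and its checksums are valid
    RESTORE VERIFYONLY
    FROM DISK = N'\\offsite\backups\SalesDB.bak'
    WITH CHECKSUM;

    -- Restore a copy and run a full consistency check against it
    RESTORE DATABASE SalesDB_Verify
    FROM DISK = N'\\offsite\backups\SalesDB.bak'
    WITH MOVE N'SalesDB' TO N'D:\Data\SalesDB_Verify.mdf',
         MOVE N'SalesDB_log' TO N'D:\Log\SalesDB_Verify.ldf',
         CHECKSUM;

    DBCC CHECKDB (N'SalesDB_Verify') WITH NO_INFOMSGS, ALL_ERRORMSGS;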

    Share your corruption story with me on Twitter, at @bornsql.

  • Why is a value in DATETIME2 8 bytes, but in BINARY it is 9 bytes?

    In the #sqlhelp Slack channel on the SQL Server Community Slack workspace last month, Jemma Hooper asked:

    When trying to CAST or CONVERT a datetime2 value to binary(8), I’m getting a “binary or text data would be truncated” error. This seems really weird in context with the fact that SELECT DATALENGTH() on a DATETIME2 value returns 8 (i.e., 8 bytes) as the result. This seems to be consistent across multiple versions of SQL Server. Has anyone come across this before? Cast to BINARY(9) works fine.

    Sample scripts:
    SELECT CAST(SYSDATETIME() AS BINARY(8)) -- returns "Msg 8152, Level 16, State 17, Line 1
    -- String or binary data would be truncated."

    SELECT CAST(SYSDATETIME() AS BINARY(9)) -- returns valid binary value
    SELECT DATALENGTH(SYSDATETIME()) -- returns 8

    Quite a lot to take in. Let’s break this down.

    DATETIME2 is a data type that was introduced in SQL Server 2008. It uses up to 8 bytes to store a date and time: 3 bytes for the date component, and up to 5 bytes for the time component.

    The point here is that it uses 8 bytes in total. That’s it. No more.
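
    At lower precisions the time component shrinks, which we can watch with DATALENGTH:

    SELECT DATALENGTH(CAST(SYSUTCDATETIME() AS DATETIME2(2))) AS p2, -- 6 bytes
           DATALENGTH(CAST(SYSUTCDATETIME() AS DATETIME2(4))) AS p4, -- 7 bytes
           DATALENGTH(CAST(SYSUTCDATETIME() AS DATETIME2(7))) AS p7; -- 8 bytes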

    Jemma noted that when converting the DATETIME2 data type to BINARY, it suddenly became longer by exactly one byte, which seems strange.

    My fine readers will remember that binary is rendered as hexadecimal code when we look at it in SQL Server Management Studio, which means that a byte is represented by two hexadecimal characters stuck together.

    Testing, testing

    Let’s use an example. I’m going to create a DATETIME2(7) variable with today’s date and time, using the built-in SYSUTCDATETIME() function.

    DECLARE @dt DATETIME2(7) = SYSUTCDATETIME();
    SELECT @dt;

    Our result looks like this:

    2017-10-04 08:59:21.8910199

    To show how SQL Server stores it, let’s convert it to binary and display the output. Taking Jemma’s findings into account, we’ll skip the error and jump to VARBINARY(25) for safety:

    SELECT CAST(@dt AS VARBINARY(25));

    As expected, this is the result, in binary. Notice that it is nine bytes long, as Jemma pointed out (I’ve expanded it for legibility):

    Actual result: 0x07F7AF30594B5D3D0B
    Readable result: 07 F7 AF 30 59 4B 5D 3D 0B

    I suspected that it had something to do with the variable TIME portion of the data type, so I split the value into its respective DATE and TIME components in binary, to confirm that the extra byte was in the TIME component. This required some creativity: casting the full value to each component type, and then to binary.

    SELECT CAST(CAST(@dt AS DATE) AS VARBINARY(25));
    SELECT CAST(CAST(@dt AS TIME(7)) AS VARBINARY(25));

    Sure enough, the extra byte is in the TIME component:

    Date result: 0x5D3D0B (3 bytes)
    Time result: 0x07F7AF30594B (6 bytes)

    Notice, when comparing these results to the full DATETIME2(7) value above, that the date is stored to the right of the time value in the binary format. This likely has to do with the way SQL Server persists data to disk in little-endian (byte-reversed) order. To SQL Server, the date comes first (reading right to left), then the time, then the mystery 0x07 at the end.
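
    We can verify that byte ordering by decoding the value by hand. This is a sketch that assumes the layout above (one leading byte, five time bytes, three date bytes, least significant byte first within each group), where the date is a count of days since 0001-01-01 and the time, at precision 7, is a count of 100-nanosecond ticks since midnight:

    DECLARE @bin VARBINARY(9) = 0x07F7AF30594B5D3D0B;

    -- Bytes 7 to 9 hold the date: days since 0001-01-01
    DECLARE @days INT =
          CAST(SUBSTRING(@bin, 9, 1) AS INT) * 65536
        + CAST(SUBSTRING(@bin, 8, 1) AS INT) * 256
        + CAST(SUBSTRING(@bin, 7, 1) AS INT);

    -- Bytes 2 to 6 hold the time: 100 ns ticks since midnight
    DECLARE @ticks BIGINT =
          CAST(SUBSTRING(@bin, 6, 1) AS BIGINT) * 4294967296
        + CAST(SUBSTRING(@bin, 5, 1) AS BIGINT) * 16777216
        + CAST(SUBSTRING(@bin, 4, 1) AS BIGINT) * 65536
        + CAST(SUBSTRING(@bin, 3, 1) AS BIGINT) * 256
        + CAST(SUBSTRING(@bin, 2, 1) AS BIGINT);

    SELECT DATEADD(DAY, @days, CAST('0001-01-01' AS DATE)) AS date_part, -- 2017-10-04
           @ticks / 10000000.0 AS seconds_since_midnight;                -- 32361.8910199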

    To be precise

    While we were both trying to figure out this extra byte, I noticed that the binary value always seemed to start with a 0x07. But when I converted to TIME(6) and TIME(5), the 0x07 became a 0x06 and 0x05 respectively.
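
    For reference, those values came from the same double-cast pattern as before:

    SELECT CAST(CAST(@dt AS TIME(7)) AS VARBINARY(25));
    SELECT CAST(CAST(@dt AS TIME(6)) AS VARBINARY(25));
    SELECT CAST(CAST(@dt AS TIME(5)) AS VARBINARY(25));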

    TIME(7) result: 0x07F7AF30594B
    TIME(6) result: 0x06CC44EB8807
    TIME(5) result: 0x05AE53E4C000

    Jemma figured it out a few seconds before I did: the leading byte (the 0x07) is the precision of the DATETIME2 or TIME data type.

    Summary

    The reason that a DATETIME2 (or TIME) data type is one byte longer when converted to a binary value is because the precision is encoded directly into the value. This is to ensure no information is lost when converting between data formats.
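
    Because the precision travels with the value, the nine bytes can be converted straight back. As a quick check, reinterpreting the raw bytes from the example above:

    SELECT CAST(0x07F7AF30594B5D3D0B AS DATETIME2(7));
    -- returns 2017-10-04 08:59:21.8910199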

    Feel free to share your cool findings about data type conversions on Twitter, at @bornsql.

  • Changes to Service Packs and Cumulative Updates for SQL Server 2017

    For a few years now, Microsoft has augmented its irregular release of Service Packs with a more frequent Cumulative Update model, in order to get bug fixes and other improvements to customers faster.

    With SQL Server 2017, which runs on both Linux and Windows (as well as Docker containers for Linux, Windows and macOS), the service pack model is outmoded.

    Just as you now expect to see regular app updates on your mobile devices, SQL Server 2017 introduces the following rapid servicing model:

    • Service Packs are gone. You will never see this nomenclature again for SQL Server. There are only Cumulative Updates (CUs). Just as before, every new CU will contain all the fixes from previous CUs, so you only need to download the latest one to be up to date. This is similar to the current model, except there won’t also be a latest Service Pack to worry about.
    • For the first twelve months after the product is GA (generally available), SQL Server will have a Cumulative Update every month, containing the latest fixes and improvements.
    • After the first twelve months, the release cadence will drop to quarterly for the next four years of mainstream support, unless there is an important security fix that needs to be deployed. For previous versions of SQL Server, up to and including SQL Server 2016, CUs were released every two months, so this new schedule gives more time for testing a CU once it’s released.
    • Every twelve months after GA, the installation files will be updated to contain all the Cumulative Updates in what is effectively now a service pack, but won’t be called that. This will also become the slipstream update. In other words, you’re more likely to be up to date when installing from scratch, later in the release cycle.
    • Customers on the GDR (General Distribution Release) release cycle will only get important security and corruption fixes, as before. You can switch to the standard CU release cadence any time, but once you do, you can’t switch back to GDR.
    • You will not be required to install Cumulative Updates immediately, or at all if you don’t want to install them. This is different to previous versions where once a Service Pack was released, it made prior builds unsupported. However, it is highly recommended to update once you’ve tested the latest CU.
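
    To see which build and servicing level an instance is currently running, a quick check (ProductUpdateLevel returns NULL on builds that don’t expose it):

    SELECT SERVERPROPERTY('ProductVersion')     AS build_number,  -- e.g. 14.0.x for SQL Server 2017
           SERVERPROPERTY('ProductLevel')       AS product_level, -- RTM, or SPn on older versions
           SERVERPROPERTY('ProductUpdateLevel') AS update_level;  -- e.g. CU3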

    If you have any questions or comments about this new servicing model, look me up on Twitter at @bornsql.

  • Compañero Conference and SQL Modernization Roadshow

    October is a busy month for me. I am flying all over the US and Canada for speaking engagements to share some thoughts about migrating your SQL Server environment to the cloud (specifically Azure).

    Compañero Conference

    I will be presenting at the Compañero Conference, which takes place over two days, October 4 – 5 (that’s next week), in Norfolk, VA.

    It’s my first time in the mid-Atlantic region of the US, so I’m looking forward to it.

    If you’re in or around the city next week, I can provide a special discount for you to attend. Let me know by direct message on Twitter at @bornsql.

    SQL Modernization and the Cloud

    Then, following on from a similar roadshow in January this year, I am presenting in four cities across Canada: Montreal on the 11th, Toronto on the 12th, Calgary on the 25th, and Vancouver on the 27th of October.

    In this roadshow, I will be discussing moving from legacy systems to SQL Server 2016 and 2017, SQL Server on Linux, and migrating from Oracle to SQL Server. And of course, you will get to see my famous live cloud migration demo.

    Attendance to the roadshow is free. The event is sponsored by AMTRA Solutions.

    I look forward to seeing you!

    Photo by Dan Dimmock on Unsplash.

  • Does rebuilding my clustered index also rebuild my non-clustered indexes?

    I’ve been working with SQL Server for many years now, and up until recently, I assumed that rebuilding any clustered index would cause non-clustered indexes to be rebuilt as well, because every non-clustered index includes the clustered index key in it.

    This assumption is wrong.

    On SQL Server 2000, this only used to affect non-unique clustered indexes because the uniquifier might change. I was minding SQL Server 2000 instances for a long while after it was no longer supported by Microsoft, which is why this myth stuck around in my head for so long.

    On SQL Server 2005 and higher, non-clustered indexes are not rebuilt when a clustered index is rebuilt.

    As my friend Gail Shaw says in this forum post (from 2011, no less!):

    It’ll have absolutely no effect on the nonclustered indexes. Nonclustered indexes use the clustered index key as a ‘pointer’ and that doesn’t change in a rebuild.

    To summarize:

    1. A non-clustered index is rebuilt if the clustered index is dropped and recreated. Without a clustered index, the non-clustered indexes will have to refer to the row identifier (RID) in the underlying heap instead.
    2. A non-clustered index is not rebuilt if a clustered index is rebuilt, on SQL Server 2005 and higher.
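
    One way to see this for yourself is to rebuild the clustered index and then check the statistics dates, since a rebuild also updates the rebuilt index’s statistics with a full scan. A sketch, against a hypothetical table dbo.Demo with a clustered primary key PK_Demo:

    -- Rebuild only the clustered index
    ALTER INDEX PK_Demo ON dbo.Demo REBUILD;

    -- Only the clustered index shows a fresh statistics date;
    -- the non-clustered indexes are left untouched
    SELECT i.name,
           i.type_desc,
           STATS_DATE(i.object_id, i.index_id) AS stats_updated
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID(N'dbo.Demo');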

    Photo by 贝莉儿 NG on Unsplash