The final word on storage for DATETIME2

A wall of scattered pages from a book

Two years ago I wrote a post that got a lot of traction in the comments at the time. Last month there was renewed interest because one of the commenters noted that the official SQL Server documentation for DATETIME2 disagreed with my assertions, and that I was under-representing the storage requirements.

To remind you, I have been saying for years that you can use DATETIME2(3) as a drop-in replacement for DATETIME, and have better granularity (1ms versus 3ms) for 12.5% less storage (1 byte per column per row). The commenter intimated that because my statement conflicted with the documentation, that I must be wrong. As it turns out the documentation was wrong, but I also learned something new in the process!

The incorrect documentation used to say the following:

1 The first byte of a datetime2 value stores the precision of the value, which means the actual storage required for a datetime2 value is the storage size indicated in the table above plus 1 additional byte to store the precision. This makes the maximum size of a datetime2 value 9 bytes – 1 byte stores precision plus 8 bytes for data storage at maximum precision.

This is untrue — and there’s a code proof later in the post to see why — however I suspect that whoever wrote this comment was conflating the size of a DATETIME2 when cast to VARBINARY (which I wrote about here) as being the same as persisting to storage. I can’t really blame the technical writer as I’ve made this assumption myself. When I discovered that I was mistaken, it resulted in a recent series about how data is stored in SQL Server.

What this post is not about

There is a great argument to be made about sticking with DATETIME because there are functions that rely on ANSI standards that just don’t work with DATETIME2. As I noted in a recent post, the way dates are persisted by the storage engine differs radically between DATETIME and DATETIME2, so this makes sense. This post isn’t arguing against using DATETIME at all.

More importantly, I have chosen not to name the commenter for believing official documentation over my word. This isn’t personal. After all, I can’t count the number of times I have heard a variation on this theme: “Is there an official Microsoft source for this information? My IT department doesn’t want to take the word of a random blogger.”

Microsoft Docs can be edited by anyone

SQL Server Books Online were migrated to Microsoft Docs — hosted on GitHub — a few years ago, roughly around the time SQL Server 2016 was released. Over that time I have submitted numerous small corrections, especially during the writing of three technical books about SQL Server, though mostly in the realm of typos. With factually incorrect documentation I needed to be sure that I had thought about every angle. Also, my writing doesn’t match the tone of the official documentation. In other words, I wasn’t sure that my modification was 100% accurate.

So I followed Jon Skeet’s timeless post on how to answer a question on Stack Overflow (modified instead to ask one), and wrote to some folks who know a thing or two about the Microsoft Data Platform. Paul White was one of the people who responded (in fairness, I sent my email at midnight my time, so it makes sense that Paul was awake in New Zealand). And as it turns out, Paul taught me something I didn’t know about how DATETIME2 works, specifically in the realm of batch mode processing in SQL Server:

All data in a batch is represented by an eight-byte value in this particular normalized format, regardless of the underlying data type.

Granted this applies specifically to the data type in memory, not on disk, so my assertion that DATETIME2(3) takes up 7 bytes of storage was safe. However it goes to show that checking my work before submitting a correction to the documentation was the right call. Paul also reminded me that row compression changes how DATETIME2 is persisted according to a specific formula.

With that in mind, Microsoft responded the following day to say the documentation was updated with a correction, taking these facts into account.

The corrected documentation reads:

1 Provided values are for uncompressed rowstore. Use of data compression or columnstore may alter storage size for each precision. Additionally, storage size on disk and in memory may differ. For example, datetime2 values always require 8 bytes in memory when batch mode is used.

2 When a datetime2 value is cast to a varbinary value, an additional byte is added to the varbinary value to store precision.

Show your code

This was the code sample I used to explain why I thought the documentation was wrong.

In my sample code I surround my DATETIME2(3) column with two CHAR(3) columns so it’s easier to pick them out when looking at the hex.

This is what I see with the page header and the first two rows (each row starts with 0x50001100):

When I do DBCC PAGE with a detail level of 3 instead of 2, it breaks down each row nicely like so (for conciseness I’m only including the first row):

Notice the Length of 7 for d1 in the last example.

Summary

Even though I knew the documentation was wrong, I made the choice to ask other people to check my work. This resulted in me learning something new and getting an even better result for the documentation, as opposed to my narrow focus on “uncompressed rowstore” data. While that might be the default state for data in SQL Server, it isn’t inclusive, and that’s equally important.

Photo by Patrick Tomasso on Unsplash.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: