The final word on storage for DATETIME2

Two years ago I wrote a post that got a lot of traction in the comments at the time. Last month there was renewed interest because one of the commenters noted that the official SQL Server documentation for DATETIME2 disagreed with my assertions, and that I was under-representing the storage requirements.

To remind you, I have been saying for years that you can use DATETIME2(3) as a drop-in replacement for DATETIME, and have better granularity (1ms versus 3ms) for 12.5% less storage (1 byte per column per row). The commenter intimated that because my statement conflicted with the documentation, that I must be wrong. As it turns out the documentation was wrong, but I also learned something new in the process!

The incorrect documentation used to say the following:

¹ The first byte of a datetime2 value stores the precision of the value, which means the actual storage required for a datetime2 value is the storage size indicated in the table above plus 1 additional byte to store the precision. This makes the maximum size of a datetime2 value 9 bytes – 1 byte stores precision plus 8 bytes for data storage at maximum precision.

This is untrue — and there’s a code proof later in the post to see why — however I suspect that whoever wrote this comment was conflating the size of a DATETIME2 when cast to VARBINARY (which I wrote about here) as being the same as persisting to storage. I can’t really blame the technical writer as I’ve made this assumption myself. When I discovered that I was mistaken, it resulted in a recent series about how data is stored in SQL Server.

What this post is not about

There is a great argument to be made about sticking with DATETIME because there are functions that rely on ANSI standards that just don’t work with DATETIME2. As I noted in a recent post, the way dates are persisted by the storage engine differs radically between DATETIME and DATETIME2, so this makes sense. This post isn’t arguing against using DATETIME at all.

More importantly, I have chosen not to name the commenter for believing official documentation over my word. This isn’t personal. After all, I can’t count the number of times I have heard a variation on this theme: “Is there an official Microsoft source for this information? My IT department doesn’t want to take the word of a random blogger.”

Microsoft Docs can be edited by anyone

SQL Server Books Online were migrated to Microsoft Docs — hosted on GitHub — a few years ago, roughly around the time SQL Server 2016 was released. Over that time I have submitted numerous small corrections, especially during the writing of three technical books about SQL Server, though mostly in the realm of typos. With factually incorrect documentation I needed to be sure that I had thought about every angle. Also, my writing doesn’t match the tone of the official documentation. In other words, I wasn’t sure that my modification was 100% accurate.

So I followed Jon Skeet’s timeless post on how to answer a question on Stack Overflow (modified instead to ask one), and wrote to some folks who know a thing or two about the Microsoft Data Platform. Paul White was one of the people who responded (in fairness, I sent my email at midnight my time, so it makes sense that Paul was awake in New Zealand). And as it turns out, Paul taught me something I didn’t know about how DATETIME2 works, specifically in the realm of batch mode processing in SQL Server:

All data in a batch is represented by an eight-byte value in this particular normalized format, regardless of the underlying data type.

Granted this applies specifically to the data type in memory, not on disk, so my assertion that DATETIME2(3) takes up 7 bytes of storage was safe. However it goes to show that checking my work before submitting a correction to the documentation was the right call. Paul also reminded me that row compression changes how DATETIME2 is persisted according to a specific formula.

With that in mind, Microsoft responded the following day to say the documentation was updated with a correction, taking these facts into account.

The corrected documentation reads:

¹ Provided values are for uncompressed rowstore. Use of data compression or columnstore may alter storage size for each precision. Additionally, storage size on disk and in memory may differ. For example, datetime2 values always require 8 bytes in memory when batch mode is used.

² When a datetime2 value is cast to a varbinary value, an additional byte is added to the varbinary value to store precision.

Show your code

This was the code sample I used to explain why I thought the documentation was wrong.

In my sample code I surround my DATETIME2(3) column with two CHAR(3) columns so it’s easier to pick them out when looking at the hex.

USE tempdb;
GO
 
-- Create table
CREATE TABLE [dbo].[test] (
     [c1] [char](3) NOT NULL,
     [d1] [DATETIME2](3) NOT NULL,
     [c2] [char](3) NOT NULL,
)
GO
 
-- Set default values
ALTER TABLE [dbo].[test] ADD DEFAULT ('aaa') FOR [c1]
ALTER TABLE [dbo].[test] ADD DEFAULT SYSUTCDATETIME() FOR [d1]
ALTER TABLE [dbo].[test] ADD DEFAULT ('bbb') FOR [c2]
GO
 
-- Insert 100 rows
INSERT INTO dbo.test DEFAULT VALUES
GO 100
 
-- Get the pageID for our table
DBCC IND(2, 'test', 1);
 
-- Observe the hex (substitute the correct pageID here)
DBCC TRACEON(3604);
DBCC PAGE(2, 1, 2512, 2);

This is what I see with the page header and the first two rows (each row starts with 0x50001100):

Memory Dump @0x0000003F0FAF8000
 
0000003F0FAF8000:   01010000 00a00001 00000000 00001100 00000000  ..... ..............
0000003F0FAF8014:   00006400 b6000000 9011a80d d0090000 01000000  ..d.¶.....¨.Ð     ......
0000003F0FAF8028:   4d000000 800a0000 02000000 00000000 00000000  M...................
0000003F0FAF803C:   00000000 01000000 00000000 00000000 00000000  ....................
0000003F0FAF8050:   00000000 00000000 00000000 00000000 50001100  ................P...
0000003F0FAF8064:   616161dd 5d1e0179 410b6262 62030000 01000000  aaaÝ]..yA.bbb.......
0000003F0FAF8078:   0300fcff 24450000 00005000 11006161 61eb5d1e  ..üÿ$E....P...aaaë].
0000003F06FF808C:   0179410b 62626203 00000100 00000300 fcff2745  .yA.bbb.........üÿ'E
0000003F06FF80A0:   00000000 50001100 616161eb 5d1e0179 410b6262  ....P...aaaë]..yA.bb

When I do DBCC PAGE with a detail level of 3 instead of 2, it breaks down each row nicely like so (for conciseness I’m only including the first row):

Slot 0 Offset 0x60 Length 34
 
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VERSIONING_INFO
Record Size = 34                    
Memory Dump @0x0000003F0F2F8060
 
0000000000000000:   50001100 616161dd 5d1e0179 410b6262 62030000  P...aaaÝ]..yA.bbb...
0000000000000014:   01000000 0300fcff 24450000 0000               ......üÿ$E....
 
Version Information = 
     Transaction Timestamp: 17700
     Version Pointer: Null
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
 
c1 = aaa                            
 
Slot 0 Column 2 Offset 0x7 Length 7 Length (physical) 7
 
d1 = 2020-08-21 05:12:47.325        
 
Slot 0 Column 3 Offset 0xe Length 3 Length (physical) 3
 
c2 = bbb

Notice the Length of 7 for d1 in the last example.

Summary

Even though I knew the documentation was wrong, I made the choice to ask other people to check my work. This resulted in me learning something new and getting an even better result for the documentation, as opposed to my narrow focus on “uncompressed rowstore” data. While that might be the default state for data in SQL Server, it isn’t inclusive, and that’s equally important.

Photo by Patrick Tomasso on Unsplash.