This post dives into how SQL Server stores date and time data types in memory and on disk.
But first, a note about endianness: CPUs manufactured by Intel and other vendors process binary values in reverse order (known as little-endian), with the least significant byte in a binary value first, and the most significant (i.e. the first byte) last. There are technical and historical reasons for this which are not important for this post, but it does mean that when you want to read values back you will have to flip the order of bytes to use a hex calculator.
Dates
DATE
is the byte-reversed number of days since the year 0001-01-01, stored as three bytes. It goes up to 9999-12-31, which is stored as 0xDAB937
. You can check this value by reversing the bytes and sticking them into a hex calculator. 37 B9 DA
equals 3,652,058, which is the number of days since 0001-01-01.
If you try to cast 0xDBB937
as a DATE
value (by incrementing the least significant bit DA
by 1), it will throw a conversion error. There is obviously some overflow detection that protects against corruption in a date type.
Times
TIME
works slightly differently, as noted in a previous blog post. It can store the number of fractions of a second (known as the precision) — with a length ranging from 0 to 7 — from midnight up to and including the final fraction of a second before midnight. In normal operations, the precision is stored in the column metadata, and you will not see it if you look at the table directly. However, if you cast the value to VARBINARY
for example, the precision is prefixed as a single byte before the value, ranging from 0x00
to 0x07
.
There are 1 billion (1,000,000,000) nanoseconds in a single second, and TIME(7)
has a granularity of 100 nanoseconds, or ten million possible values per second. In a 24-hour period there are 864,000,000,000 possible values, so if you don’t need that kind of granularity you might be better off choosing a smaller precision.
Keep in mind that more than one length can be stored in the same number of bytes, as shown below:
TIME(7)
– 5 bytes – is the byte-reversed number of tenths of a microsecond (0.0000001
)TIME(6)
– 5 bytes – is the byte-reversed number of microseconds (0.000001
)TIME(5)
– 5 bytes – is the byte-reversed number of tens of microseconds (0.00001
)TIME(4)
– 4 bytes – is the byte-reversed number of tenths of a millisecond (0.0001
)TIME(3)
– 4 bytes – is the byte-reversed number of milliseconds (0.001
)TIME(2)
– 3 bytes – is the byte-reversed number of tens of milliseconds (0.01
)TIME(1)
– 3 bytes – is the byte-reversed number of tenths of a second (0.1
)TIME(0)
– 3 bytes – is the byte-reversed number of seconds (1
)
Dates and times
We can think of DATETIME2(n)
as DATE
and TIME
jammed together in the same field (though the time value is first because of endianness). The precision is again encoded in the value as a prefix if you cast it as binary. In the table itself, the prefix is excluded because the column metadata already contains the precision.
Let’s look at today’s date of April 22, 2020, at 5 minutes after 10 in the morning. We can calculate that it is the 737,536th day since January 1, 0001, and approximately 36,310 seconds have passed. Depending on the precision, the value would look something like this (remember the time is on the left, and the date is on the right):
Date | Stored binary | Date binary value | Time binary value |
---|---|---|---|
2020-04-22 10:05:09.3427651 |
0xC3050E8A5400410B |
0B 41 00 737,536 |
54 8A 0E 05 C3 363,093,427,651 |
2020-04-22 10:05:09.342765 |
0x2D9A34740800410B |
0B 41 00 737,536 |
08 74 34 9A 2D 36,309,342,765 |
2020-04-22 10:05:09.34277 |
0x05A96BD80000410B |
0B 41 00 737,536 |
00 D8 6B A9 05 3,630,934,277 |
2020-04-22 10:05:09.3428 |
0xB45DA41500410B |
0B 41 00 737,536 |
15 A4 5D B4 363,093,428 |
2020-04-22 10:05:09.343 |
0x5F092A0200410B |
0B 41 00 737,536 |
02 2A 09 5F 36,309,343 |
2020-04-22 10:05:09.34 |
0x56673700410B |
0B 41 00 737,536 |
37 67 56 3,630,934 |
2020-04-22 10:05:09.3 |
0x558A0500410B |
0B 41 00 737,536 |
05 8A 55 363,093 |
2020-04-22 10:05:09 |
0xD58D0000410B |
0B 41 00 737,536 |
00 8D D5 36,309 |
What about DATETIME?
This is a curious data type, because despite taking up eight bytes like DATETIME2(7)
, it has a granularity of just 3 to 4 milliseconds, which means it will round up (or down) any input value to the nearest 0, 3, or 7 milliseconds. How is this data type stored, and why is it so inefficient when compared to DATETIME2(n)
? And, why does the range for a DATETIME
data type start in the year 1753?
The answers to these questions may surprise you. For starters, the value is not byte-swapped. The first four bytes of DATETIME
represent the date, and the second four bytes represent the time. The “starting point,” or the place where the binary is 0x0000000000000000
, is midnight on January 1st, 1900. For any date after 1900-01-01, the date portion of the binary is incremented by one. Using our date of 2020-04-22 as an example, we can calculate that 43,941 days have passed since 1900-01-01, and converting that to hex gives us 0xABA5
.
Indeed, the statement SELECT CAST(0x0000ABA500000000 AS DATETIME)
returns 2020-04-22 00:00:00.000
as expected.
So what about dates prior to 1900? This is where it gets interesting. One day prior, namely midnight on December 31st, 1899, has a binary value of 0xFFFFFFFF00000000
. That’s the maximum possible value for a four-byte value, and equates to around 2 billion possible values. From what I remember in my 1992 high school computer science class, this is a binary arithmetic operation called two’s complement. This technique is used for calculating positive and negative values, so if 0 is our starting point (0x00000000
), -1 would be the maximum possible value for the number of bytes, which in this case is 0xFFFFFFFF
.
Two days prior to 1900-01-01 would then be 0xFFFFFFFE
, and sure enough, the statement SELECT CAST(0xFFFFFFFE00000000 AS DATETIME)
returns 1899-12-30 00:00:00.000
as expected. And now, because the DATETIME
data type only goes back as far as the year 1753, we don’t have to worry about overflowing this value.
(The reason for SQL Server using January 1st, 1753 as the earliest possible date, is best explained in this Stack Overflow answer. When the British switched to the Gregorian calendar, they lost 12 days in 1752, so as Kalen Delaney explains, “[…] the original Sybase SQL Server developers decided not to allow dates before 1753”. As the answer also states, DATE
and DATETIME2
are based on the Gregorian calendar, so you must keep this in mind for dates prior to that calendar’s adoption.)
The time portion of DATETIME
— namely the second four bytes — increments by one at a time as well, from 0x00000000
to 0x018B81FF
. However because DATETIME
rounds to the nearest 0, 3 or 7 milliseconds, the incrementing value might be 3 or 4 milliseconds. The maximum value for a time is 23:59:59.997
, which matches to that binary value of 0x018B81FF
. Working our way backwards, 0x018B81FE
(one less than the previous value) is equivalent to 23:59:59.993
, and 0x018B81FD
works out to 23:59:59.990
. Using millisecond values ending in 0, 3 and 7, we get back all the way to 0x00000000
which represents 00:00:00.000
.
It’s strange to me that with two billion possible values per four-byte segment (ignoring negative values), Sybase — who originally wrote SQL Server before the product was purchased and rewritten by Microsoft — opted to store the date and time in this fashion. There are only 86.4 million milliseconds in a 24-hour period, and 3.6 million days in 10,000 years, so those could easily have fitted into the two four-byte segments that make up this data type (you can even save a byte and have a more accurate value using the 7-byte wide DATETIME2(3)
data type).
This 0, 3 and 7 increment is very odd. Perhaps Jeff Moden has some thoughts. And speaking of thoughts, please leave yours in the comments below.
The in the hardware. DATETIME data type was based on the UNIX operating system which use clock ticks but the clock ticks do not match nicely into hours, minutes and seconds, so when you round them to the nearest value you get that funny zero, three, seven truncation. For you young people, we used to have these 16-bit minicomputers when we lived in caves. If you don’t believe me Google it.
The ISO 8601 standards also did not exist in the early days, nor was the Common Era calendar that firmly anchored yet. We used to refer to this as the Gregorian calendar or the Christian calendar. But that’s not quite right. The Gregorian calendar technically began in 1753, as per this article with the corrections made to the Julian calendar, which it replaced. The Common Era calendar that we use now begins with the date 0001-01-01 and ends with 9999-12-31. Yes, there are conventions for showing dates before and after this interval, but you’re probably not ever really going to use them (well, I had to when I was working the database for a museum)
Microsoft is trying very hard to come up to the ISO standards. Hence, the default display format for dates is “yyyy-mm-dd” , they still have to do the optional display formatting they inherited from Sybase. I’ve often wondered why we call it legacy code; a better name would have been a family curse code. The reason for allowing assorted national display formats in SQL Server was originally to mimic the COBOL PICTURE clause and the fact that you had to store dates and times as strings in that language.
When we started adding temporal types to the ANSI/ISO standards, there was a debate about punctuation. We decided to use a date format that includes the dashes, so that we can parse the date and not confuse it with a numeric value. We also decided to make it the only allowed display format in the standards. Likewise, the Americans won out with the use of a colon (:) as a separator between hours and minutes and seconds. Europeans would’ve preferred a full stop.
Officially what Microsoft calls DATETIME2(n) is called a TIMESTAMP(n) in the standards and we additionally have the datatypes DATE, and TIME(n), which match the standards.
Probably the best book ever written on temporal data in SQL is available as a free PDF download from the University of Arizona. It was written by Rick Snodgrass, who served on the ANSI X3 H2 committee and tried to introduce temporal data into the language.
Thanks as always for your insight, Joe. It makes sense, of course, when you look at minicomputer clock speed. I appreciate you stopping by. I’ll definitely check out Rick’s book.
Nice article, Randolph. Thanks for taking the time to research the subject and hammer out the article.
“Perhaps Jeff Moden has some thoughts”
Sure…
The time portion of DATETIME is actually an integer count of the number of 1/300’s of a second that have passed since midnight. That translates to a 3.3 millisecond resolution but that level of resolution isn’t possible for the way the data is stored so 0.0 and 3.3 ms are rounded down to 0ms and 3ms and 6.6 milliseconds is rounded up to 7 milliseconds.
If you’re measuring time cards, it doesn’t much matter. If you’re measuring scientific output of events, the DATETIME(2) is, of course, much more appropriate.
It’s also significant to note that, per ANSI standards, Addition (StartDate+Duration = EndDate, for example) and Subtraction (EndDate-StartDate) = Duration, for example) are easily possible in the DATETIME datatype (and SMALLDATETIME but let’s not go there) but not in the “newer” temporal datatypes. MS realized that mistake but, instead of fixing it (IMHO), they created the DATEDIFF_BIG so people could at least calculate durations using milliseconds without having to write a tome of code to do it.
The trouble is, they didn’t make a format for the result of any of the duration stuff and so people don’t know what the heck to do with the results of the subtraction (for example) because it usually shows up as a DATETIME in sometime in the year 1900.
If you’re interested in such a thing, here’s an article on it. Heh… I know the author,. 😀
https://www.sqlservercentral.com/articles/calculating-duration-using-datetime-start-and-end-dates-sql-spackle-2
Jeff, thanks for responding. I appreciate your insight as well. That post certainly rings a bell 😉
Comments are closed.