How SQL Server stores data types: dates and times

brown and white clocks

This post dives into how SQL Server stores date and time data types in memory and on disk.

But first, a note about endianness: CPUs manufactured by Intel and other vendors process binary values in reverse order (known as little-endian), with the least significant byte in a binary value first, and the most significant (i.e. the first byte) last. There are technical and historical reasons for this which are not important for this post, but it does mean that when you want to read values back you will have to flip the order of bytes to use a hex calculator.

Dates

DATE is the byte-reversed number of days since the year 0001-01-01, stored as three bytes. It goes up to 9999-12-31, which is stored as 0xDAB937. You can check this value by reversing the bytes and sticking them into a hex calculator. 37 B9 DA equals 3,652,058, which is the number of days since 0001-01-01.

If you try to cast 0xDBB937 as a DATE value (by incrementing the least significant bit DA by 1), it will throw a conversion error. There is obviously some overflow detection that protects against corruption in a date type.

Times

TIME works slightly differently, as noted in a previous blog post. It can store the number of fractions of a second (known as the precision) — with a length ranging from 0 to 7 — from midnight up to and including the final fraction of a second before midnight. In normal operations, the precision is stored in the column metadata, and you will not see it if you look at the table directly. However, if you cast the value to VARBINARY for example, the precision is prefixed as a single byte before the value, ranging from 0x00 to 0x07.

There are 1 billion (1,000,000,000) nanoseconds in a single second, and TIME(7) has a granularity of 100 nanoseconds, or ten million possible values per second. In a 24-hour period there are 864,000,000,000 possible values, so if you don’t need that kind of granularity you might be better off choosing a smaller precision.

Keep in mind that more than one length can be stored in the same number of bytes, as shown below:

  • TIME(7) – 5 bytes – is the byte-reversed number of tenths of a microsecond (0.0000001)
  • TIME(6) – 5 bytes – is the byte-reversed number of microseconds (0.000001)
  • TIME(5) – 5 bytes – is the byte-reversed number of tens of microseconds (0.00001)
  • TIME(4) – 4 bytes – is the byte-reversed number of tenths of a millisecond (0.0001)
  • TIME(3) – 4 bytes – is the byte-reversed number of milliseconds (0.001)
  • TIME(2) – 3 bytes – is the byte-reversed number of tens of milliseconds (0.01)
  • TIME(1) – 3 bytes – is the byte-reversed number of tenths of a second (0.1)
  • TIME(0) – 3 bytes – is the byte-reversed number of seconds (1)

Dates and times

We can think of DATETIME2(n) as DATE and TIME jammed together in the same field (though the time value is first because of endianness). The precision is again encoded in the value as a prefix if you cast it as binary. In the table itself, the prefix is excluded because the column metadata already contains the precision.

Let’s look at today’s date of April 22, 2020, at 5 minutes after 10 in the morning. We can calculate that it is the 737,536th day since January 1, 0001, and approximately 36,310 seconds have passed. Depending on the precision, the value would look something like this (remember the time is on the left, and the date is on the right):

Date Stored binary Date binary value Time binary value
2020-04-22
10:05:09.3427651
0xC3050E8A5400410B 0B 41 00
737,536
54 8A 0E 05 C3
363,093,427,651
2020-04-22
10:05:09.342765
0x2D9A34740800410B 0B 41 00
737,536
08 74 34 9A 2D
36,309,342,765
2020-04-22
10:05:09.34277
0x05A96BD80000410B 0B 41 00
737,536
00 D8 6B A9 05
3,630,934,277
2020-04-22
10:05:09.3428
0xB45DA41500410B 0B 41 00
737,536
15 A4 5D B4
363,093,428
2020-04-22
10:05:09.343
0x5F092A0200410B 0B 41 00
737,536
02 2A 09 5F
36,309,343
2020-04-22
10:05:09.34
0x56673700410B 0B 41 00
737,536
37 67 56
3,630,934
2020-04-22
10:05:09.3
0x558A0500410B 0B 41 00
737,536
05 8A 55
363,093
2020-04-22
10:05:09
0xD58D0000410B 0B 41 00
737,536
00 8D D5
36,309

What about DATETIME?

This is a curious data type, because despite taking up eight bytes like DATETIME2(7), it has a granularity of just 3 to 4 milliseconds, which means it will round up (or down) any input value to the nearest 0, 3, or 7 milliseconds. How is this data type stored, and why is it so inefficient when compared to DATETIME2(n)? And, why does the range for a DATETIME data type start in the year 1753?

The answers to these questions may surprise you. For starters, the value is not byte-swapped. The first four bytes of DATETIME represent the date, and the second four bytes represent the time. The “starting point,” or the place where the binary is 0x0000000000000000, is midnight on January 1st, 1900. For any date after 1900-01-01, the date portion of the binary is incremented by one. Using our date of 2020-04-22 as an example, we can calculate that 43,941 days have passed since 1900-01-01, and converting that to hex gives us 0xABA5.

Indeed, the statement SELECT CAST(0x0000ABA500000000 AS DATETIME) returns 2020-04-22 00:00:00.000 as expected.

So what about dates prior to 1900? This is where it gets interesting. One day prior, namely midnight on December 31st, 1899, has a binary value of 0xFFFFFFFF00000000. That’s the maximum possible value for a four-byte value, and equates to around 2 billion possible values. From what I remember in my 1992 high school computer science class, this is a binary arithmetic operation called two’s complement. This technique is used for calculating positive and negative values, so if 0 is our starting point (0x00000000), -1 would be the maximum possible value for the number of bytes, which in this case is 0xFFFFFFFF.

Two days prior to 1900-01-01 would then be 0xFFFFFFFE, and sure enough, the statement SELECT CAST(0xFFFFFFFE00000000 AS DATETIME) returns 1899-12-30 00:00:00.000 as expected. And now, because the DATETIME data type only goes back as far as the year 1753, we don’t have to worry about overflowing this value.

(The reason for SQL Server using January 1st, 1753 as the earliest possible date, is best explained in this Stack Overflow answer. When the British switched to the Gregorian calendar, they lost 12 days in 1752, so as Kalen Delaney explains, “[…] the original Sybase SQL Server developers decided not to allow dates before 1753”. As the answer also states, DATE and DATETIME2 are based on the Gregorian calendar, so you must keep this in mind for dates prior to that calendar’s adoption.)

The time portion of DATETIME — namely the second four bytes — increments by one at a time as well, from 0x00000000 to 0x018B81FF. However because DATETIME rounds to the nearest 0, 3 or 7 milliseconds, the incrementing value might be 3 or 4 milliseconds. The maximum value for a time is 23:59:59.997, which matches to that binary value of 0x018B81FF. Working our way backwards, 0x018B81FE (one less than the previous value) is equivalent to 23:59:59.993, and 0x018B81FD works out to 23:59:59.990. Using millisecond values ending in 0, 3 and 7, we get back all the way to 0x00000000 which represents 00:00:00.000.

It’s strange to me that with two billion possible values per four-byte segment (ignoring negative values), Sybase — who originally wrote SQL Server before the product was purchased and rewritten by Microsoft — opted to store the date and time in this fashion. There are only 86.4 million milliseconds in a 24-hour period, and 3.6 million days in 10,000 years, so those could easily have fitted into the two four-byte segments that make up this data type (you can even save a byte and have a more accurate value using the 7-byte wide DATETIME2(3) data type).

This 0, 3 and 7 increment is very odd. Perhaps Jeff Moden has some thoughts. And speaking of thoughts, please leave yours in the comments below.

Photo by Jon Tyson on Unsplash.

4 thoughts on “How SQL Server stores data types: dates and times

  • The in the hardware. DATETIME data type was based on the UNIX operating system which use clock ticks but the clock ticks do not match nicely into hours, minutes and seconds, so when you round them to the nearest value you get that funny zero, three, seven truncation. For you young people, we used to have these 16-bit minicomputers when we lived in caves. If you don’t believe me Google it.

    The ISO 8601 standards also did not exist in the early days, nor was the Common Era calendar that firmly anchored yet. We used to refer to this as the Gregorian calendar or the Christian calendar. But that’s not quite right. The Gregorian calendar technically began in 1753, as per this article with the corrections made to the Julian calendar, which it replaced. The Common Era calendar that we use now begins with the date 0001-01-01 and ends with 9999-12-31. Yes, there are conventions for showing dates before and after this interval, but you’re probably not ever really going to use them (well, I had to when I was working the database for a museum)

    Microsoft is trying very hard to come up to the ISO standards. Hence, the default display format for dates is “yyyy-mm-dd” , they still have to do the optional display formatting they inherited from Sybase. I’ve often wondered why we call it legacy code; a better name would have been a family curse code. The reason for allowing assorted national display formats in SQL Server was originally to mimic the COBOL PICTURE clause and the fact that you had to store dates and times as strings in that language.

    When we started adding temporal types to the ANSI/ISO standards, there was a debate about punctuation. We decided to use a date format that includes the dashes, so that we can parse the date and not confuse it with a numeric value. We also decided to make it the only allowed display format in the standards. Likewise, the Americans won out with the use of a colon (:) as a separator between hours and minutes and seconds. Europeans would’ve preferred a full stop.

    Officially what Microsoft calls DATETIME2(n) is called a TIMESTAMP(n) in the standards and we additionally have the datatypes DATE, and TIME(n), which match the standards.

    Probably the best book ever written on temporal data in SQL is available as a free PDF download from the University of Arizona. It was written by Rick Snodgrass, who served on the ANSI X3 H2 committee and tried to introduce temporal data into the language.

    • Thanks as always for your insight, Joe. It makes sense, of course, when you look at minicomputer clock speed. I appreciate you stopping by. I’ll definitely check out Rick’s book.

  • Nice article, Randolph. Thanks for taking the time to research the subject and hammer out the article.

    “Perhaps Jeff Moden has some thoughts”

    Sure…

    The time portion of DATETIME is actually an integer count of the number of 1/300’s of a second that have passed since midnight. That translates to a 3.3 millisecond resolution but that level of resolution isn’t possible for the way the data is stored so 0.0 and 3.3 ms are rounded down to 0ms and 3ms and 6.6 milliseconds is rounded up to 7 milliseconds.

    If you’re measuring time cards, it doesn’t much matter. If you’re measuring scientific output of events, the DATETIME(2) is, of course, much more appropriate.

    It’s also significant to note that, per ANSI standards, Addition (StartDate+Duration = EndDate, for example) and Subtraction (EndDate-StartDate) = Duration, for example) are easily possible in the DATETIME datatype (and SMALLDATETIME but let’s not go there) but not in the “newer” temporal datatypes. MS realized that mistake but, instead of fixing it (IMHO), they created the DATEDIFF_BIG so people could at least calculate durations using milliseconds without having to write a tome of code to do it.
    The trouble is, they didn’t make a format for the result of any of the duration stuff and so people don’t know what the heck to do with the results of the subtraction (for example) because it usually shows up as a DATETIME in sometime in the year 1900.

    If you’re interested in such a thing, here’s an article on it. Heh… I know the author,. 😀

    https://www.sqlservercentral.com/articles/calculating-duration-using-datetime-start-and-end-dates-sql-spackle-2

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: