How SQL Server stores data types: sql_variant

messy paint cans and colours

This post looks at a curious data type that isn’t really a data type. Instead, sql_variant tries to be all things to all people. As with most things in life, it has a few shortcomings as a result.

If you would like to read about storage of other data types, here are the previous posts in the series:

By a show of hands, who uses the sql_variant data type? In my experience the answer is “no one,” and that’s not necessarily a bad thing. There’s a general philosophy in our industry where we assume that columns are a specific data type. This predictability makes writing queries and applications against those columns easier. It’s difficult enough dealing with NULL values, so adding confusion with handling data conversions is an invitation to introduce bugs.

I wonder, however, how many of my readers know that some of our favourite internal tables in SQL Server use sql_variant, including the venerable sys.configurations?

The official documentation for this data type — available from Microsoft Docs — tells us that:

[A] column defined as sql_variant can store int, binary, and char values [up to] a maximum length of 8016 bytes. […] The maximum length of the actual base type value is 8,000 bytes.

Another aspect of this data type is that it is not fully supported by ODBC, so depending on the driver you use to connect to the database the underlying column will be returned in binary.

This last sentence gives us a clue as to how the SQL Server and Azure SQL Database storage engine might persist a sql_variant column.

How are these values stored?

Let’s look at a simple example using three “base” types to see what the storage engine is doing. Our table will have two columns, with the second column being our sql_variant column.

Using DBCC IND and DBCC PAGE — as demonstrated in previous posts in this series — we find some interesting results. The following information is combined from these results for clarity.

Value Length Binary
Randolph 24 0xE701401F08C00000520061006E0064006F006C0070006800
West 12 0xA701401F08C0000057657374
25 6 0x380119000000

Readers familiar with hexadecimal values will recognize the hex equivalent of the three different values in the above table:

  • 0x0052 is upper-case R in Unicode format, followed by lower-case “andolph” (see storing strings)
  • 0x57 is upper-case W, followed by lower-case “est” in regular ANSI
  • 0x00000019 is a four-byte integer, stored in reverse (see storing integers)

What do the prefixes mean?

We can immediately see prefixes: 0xE701401F08C00000 and 0xA701401F08C00000 for the string values which look similar, and 0x3801 for the integer value. What gives?

Firstly, you can run SELECT * FROM sys.types to see all of the possible data types and their matching system_type_id. This decimal value is converted to hex, and represents the first byte in the prefix.

Starting with the integer, Martin Smith did a lot of the heavy lifting for us already in a StackOverflow answer from seven years ago:

  • 0x38 (system_type_id = 56) represents the internal value of an integer
  • 0x01 represents the version of sql_variant format, which is 1 since at least SQL Server 2008

With the string values, there’s a little more to it. Let’s look at just the first byte:

  • 0xE7 (system_type_id = 231) represents NVARCHAR
  • 0xA7 (system_type_id = 167) represents VARCHAR

As for the rest, fortunately we don’t need to know what these binary values mean, because there are extended properties for sql_variant that break it down for us using the following query, substituting our previously created table:

The following results are what those binary values are encoding.

BaseType Precision Scale TotalBytes Collation MaxLength
nvarchar 0 0 24 Latin1_General_CS_AS 8000
varchar 0 0 12 Latin1_General_CS_AS 8000
int 10 0 6 NULL 4

Summary

I don’t ever use sql_variant, but it’s useful to know what to look for when upgrading existing systems or examining data pages during a disaster recovery. Also keep in mind that some internal SQL Server system tables and system stored procedures use this data type to mix strings and numbers. The data type has its place, but doesn’t make for good indexing or predictable code. There is also a storage overhead per column, especially for string values.

Leave your thoughts in the comments below.

Photo by Taelynn Christopher on Unsplash.

One thought on “How SQL Server stores data types: sql_variant”

  • Thank you very much for showing us that the column itself knows it’s own type.
    I’ve always assumed that I needed to store the data type elsewhere.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: