How SQL Server stores data types: sql_variant

This post looks at a curious data type that isn’t really a data type. Instead, sql_variant tries to be all things to all people. As with most things in life, it has a few shortcomings as a result.

If you would like to read about storage of other data types, here are the previous posts in the series:

By a show of hands, who uses the sql_variant data type? In my experience the answer is “no one,” and that’s not necessarily a bad thing. There’s a general philosophy in our industry where we assume that columns are a specific data type. This predictability makes writing queries and applications against those columns easier. It’s difficult enough dealing with NULL values, so adding confusion with handling data conversions is an invitation to introduce bugs.

I wonder, however, how many of my readers know that some of our favourite internal tables in SQL Server use sql_variant, including the venerable sys.configurations?

The official documentation for this data type — available from Microsoft Docs — tells us that:

[A] column defined as sql_variant can store int, binary, and char values [up to] a maximum length of 8016 bytes. […] The maximum length of the actual base type value is 8,000 bytes.

Another aspect of this data type is that it is not fully supported by ODBC, so depending on the driver you use to connect to the database the underlying column will be returned in binary.

This last sentence gives us a clue as to how the SQL Server and Azure SQL Database storage engine might persist a sql_variant column.

How are these values stored?

Let’s look at a simple example using three “base” types to see what the storage engine is doing. Our table will have two columns, with the second column being our sql_variant column.

CREATE TABLE test (     col1 VARCHAR(255) NOT NULL,     col2 SQL_VARIANT NULL );  INSERT INTO test SELECT 'FirstName', N'Randolph'; -- NVARCHAR INSERT INTO test SELECT 'LastName', 'West'; -- VARCHAR INSERT INTO test SELECT 'Age', 25; -- INT

Using DBCC IND and DBCC PAGE — as demonstrated in previous posts in this series — we find some interesting results. The following information is combined from these results for clarity.

Value	Length	Binary
Randolph	24	`0xE701401F08C00000520061006E0064006F006C0070006800`
West	12	`0xA701401F08C0000057657374`
25	6	`0x380119000000`

Readers familiar with hexadecimal values will recognize the hex equivalent of the three different values in the above table:

0x0052 is upper-case R in Unicode format, followed by lower-case “andolph” (see storing strings)
0x57 is upper-case W, followed by lower-case “est” in regular ANSI
0x00000019 is a four-byte integer, stored in reverse (see storing integers)

What do the prefixes mean?

We can immediately see prefixes: 0xE701401F08C00000 and 0xA701401F08C00000 for the string values which look similar, and 0x3801 for the integer value. What gives?

Firstly, you can run SELECT * FROM sys.types to see all of the possible data types and their matching system_type_id. This decimal value is converted to hex, and represents the first byte in the prefix.

Starting with the integer, Martin Smith did a lot of the heavy lifting for us already in a StackOverflow answer from seven years ago:

0x38 (system_type_id = 56) represents the internal value of an integer
0x01 represents the version of sql_variant format, which is 1 since at least SQL Server 2008

With the string values, there’s a little more to it. Let’s look at just the first byte:

0xE7 (system_type_id = 231) represents NVARCHAR
0xA7 (system_type_id = 167) represents VARCHAR

As for the rest, fortunately we don’t need to know what these binary values mean, because there are extended properties for sql_variant that break it down for us using the following query, substituting our previously created table:

SELECT SQL_VARIANT_PROPERTY(col2, 'BaseType') AS [BaseType],        SQL_VARIANT_PROPERTY(col2, 'Precision') AS [Precision],        SQL_VARIANT_PROPERTY(col2, 'Scale') AS [Scale],        SQL_VARIANT_PROPERTY(col2, 'TotalBytes') AS [TotalBytes],        SQL_VARIANT_PROPERTY(col2, 'Collation') AS [Collation],        SQL_VARIANT_PROPERTY(col2, 'MaxLength') AS [MaxLength] FROM dbo.test;

The following results are what those binary values are encoding.

BaseType	Precision	Scale	TotalBytes	Collation	MaxLength
nvarchar	0	0	24	Latin1_General_CS_AS	8000
varchar	0	0	12	Latin1_General_CS_AS	8000
int	10	0	6	NULL	4

Summary

I don’t ever use sql_variant, but it’s useful to know what to look for when upgrading existing systems or examining data pages during a disaster recovery. Also keep in mind that some internal SQL Server system tables and system stored procedures use this data type to mix strings and numbers. The data type has its place, but doesn’t make for good indexing or predictable code. There is also a storage overhead per column, especially for string values.

Leave your thoughts in the comments below.

Photo by Taelynn Christopher on Unsplash.

How SQL Server stores data types: sql_variant

How are these values stored?

What do the prefixes mean?

Summary

1 thought on “How SQL Server stores data types: sql_variant”