This post looks at a curious data type that isn’t really a data type. Instead, sql_variant
tries to be all things to all people. As with most things in life, it has a few shortcomings as a result.
If you would like to read about storage of other data types, here are the previous posts in the series:
By a show of hands, who uses the sql_variant
data type? In my experience the answer is “no one,” and that’s not necessarily a bad thing. There’s a general philosophy in our industry where we assume that columns are a specific data type. This predictability makes writing queries and applications against those columns easier. It’s difficult enough dealing with NULL
values, so adding confusion with handling data conversions is an invitation to introduce bugs.
I wonder, however, how many of my readers know that some of our favourite internal tables in SQL Server use sql_variant
, including the venerable sys.configurations
?
The official documentation for this data type — available from Microsoft Docs — tells us that:
[A] column defined as sql_variant can store int, binary, and char values [up to] a maximum length of 8016 bytes. […] The maximum length of the actual base type value is 8,000 bytes.
Another aspect of this data type is that it is not fully supported by ODBC, so depending on the driver you use to connect to the database the underlying column will be returned in binary.
This last sentence gives us a clue as to how the SQL Server and Azure SQL Database storage engine might persist a sql_variant
column.
How are these values stored?
Let’s look at a simple example using three “base” types to see what the storage engine is doing. Our table will have two columns, with the second column being our sql_variant
column.
CREATE TABLE test ( col1 VARCHAR(255) NOT NULL, col2 SQL_VARIANT NULL ); INSERT INTO test SELECT 'FirstName', N'Randolph'; -- NVARCHAR INSERT INTO test SELECT 'LastName', 'West'; -- VARCHAR INSERT INTO test SELECT 'Age', 25; -- INT
Using DBCC IND
and DBCC PAGE
— as demonstrated in previous posts in this series — we find some interesting results. The following information is combined from these results for clarity.
Value | Length | Binary |
Randolph | 24 | 0xE701401F08C00000520061006E0064006F006C0070006800 |
West | 12 | 0xA701401F08C0000057657374 |
25 | 6 | 0x380119000000 |
Readers familiar with hexadecimal values will recognize the hex equivalent of the three different values in the above table:
0x0052
is upper-case R in Unicode format, followed by lower-case “andolph” (see storing strings)0x57
is upper-case W, followed by lower-case “est” in regular ANSI0x00000019
is a four-byte integer, stored in reverse (see storing integers)
What do the prefixes mean?
We can immediately see prefixes: 0xE701401F08C00000
and 0xA701401F08C00000
for the string values which look similar, and 0x3801
for the integer value. What gives?
Firstly, you can run SELECT * FROM sys.types
to see all of the possible data types and their matching system_type_id
. This decimal value is converted to hex, and represents the first byte in the prefix.
Starting with the integer, Martin Smith did a lot of the heavy lifting for us already in a StackOverflow answer from seven years ago:
0x38
(system_type_id
= 56) represents the internal value of an integer0x01
represents the version ofsql_variant
format, which is1
since at least SQL Server 2008
With the string values, there’s a little more to it. Let’s look at just the first byte:
0xE7
(system_type_id
= 231) representsNVARCHAR
0xA7
(system_type_id
= 167) representsVARCHAR
As for the rest, fortunately we don’t need to know what these binary values mean, because there are extended properties for sql_variant
that break it down for us using the following query, substituting our previously created table:
SELECT SQL_VARIANT_PROPERTY(col2, 'BaseType') AS [BaseType], SQL_VARIANT_PROPERTY(col2, 'Precision') AS [Precision], SQL_VARIANT_PROPERTY(col2, 'Scale') AS [Scale], SQL_VARIANT_PROPERTY(col2, 'TotalBytes') AS [TotalBytes], SQL_VARIANT_PROPERTY(col2, 'Collation') AS [Collation], SQL_VARIANT_PROPERTY(col2, 'MaxLength') AS [MaxLength] FROM dbo.test;
The following results are what those binary values are encoding.
BaseType | Precision | Scale | TotalBytes | Collation | MaxLength |
nvarchar | 0 | 0 | 24 | Latin1_General_CS_AS | 8000 |
varchar | 0 | 0 | 12 | Latin1_General_CS_AS | 8000 |
int | 10 | 0 | 6 | NULL | 4 |
Summary
I don’t ever use sql_variant
, but it’s useful to know what to look for when upgrading existing systems or examining data pages during a disaster recovery. Also keep in mind that some internal SQL Server system tables and system stored procedures use this data type to mix strings and numbers. The data type has its place, but doesn’t make for good indexing or predictable code. There is also a storage overhead per column, especially for string values.
Leave your thoughts in the comments below.
Photo by Taelynn Christopher on Unsplash.
Thank you very much for showing us that the column itself knows it’s own type.
I’ve always assumed that I needed to store the data type elsewhere.
Comments are closed.