This post looks at a curious data type that isn’t really a data type. Instead,
sql_variant tries to be all things to all people. As with most things in life, it has a few shortcomings as a result.
If you would like to read about storage of other data types, here are the previous posts in the series:
By a show of hands, who uses the
sql_variant data type? In my experience the answer is “no one,” and that’s not necessarily a bad thing. There’s a general philosophy in our industry where we assume that columns are a specific data type. This predictability makes writing queries and applications against those columns easier. It’s difficult enough dealing with
NULL values, so adding confusion with handling data conversions is an invitation to introduce bugs.
I wonder, however, how many of my readers know that some of our favourite internal tables in SQL Server use
sql_variant, including the venerable
The official documentation for this data type — available from Microsoft Docs — tells us that:
[A] column defined as sql_variant can store int, binary, and char values [up to] a maximum length of 8016 bytes. […] The maximum length of the actual base type value is 8,000 bytes.
Another aspect of this data type is that it is not fully supported by ODBC, so depending on the driver you use to connect to the database the underlying column will be returned in binary.
This last sentence gives us a clue as to how the SQL Server and Azure SQL Database storage engine might persist a
How are these values stored?
Let’s look at a simple example using three “base” types to see what the storage engine is doing. Our table will have two columns, with the second column being our
CREATE TABLE test ( col1 VARCHAR(255) NOT NULL, col2 SQL_VARIANT NULL ); INSERT INTO test SELECT 'FirstName', N'Randolph'; -- NVARCHAR INSERT INTO test SELECT 'LastName', 'West'; -- VARCHAR INSERT INTO test SELECT 'Age', 25; -- INT
DBCC IND and
DBCC PAGE — as demonstrated in previous posts in this series — we find some interesting results. The following information is combined from these results for clarity.
Readers familiar with hexadecimal values will recognize the hex equivalent of the three different values in the above table:
0x0052is upper-case R in Unicode format, followed by lower-case “andolph” (see storing strings)
0x57is upper-case W, followed by lower-case “est” in regular ANSI
0x00000019is a four-byte integer, stored in reverse (see storing integers)
What do the prefixes mean?
We can immediately see prefixes:
0xA701401F08C00000 for the string values which look similar, and
0x3801 for the integer value. What gives?
Firstly, you can run
SELECT * FROM sys.types to see all of the possible data types and their matching
system_type_id. This decimal value is converted to hex, and represents the first byte in the prefix.
Starting with the integer, Martin Smith did a lot of the heavy lifting for us already in a StackOverflow answer from seven years ago:
system_type_id= 56) represents the internal value of an integer
0x01represents the version of
sql_variantformat, which is
1since at least SQL Server 2008
With the string values, there’s a little more to it. Let’s look at just the first byte:
system_type_id= 231) represents
system_type_id= 167) represents
As for the rest, fortunately we don’t need to know what these binary values mean, because there are extended properties for
sql_variant that break it down for us using the following query, substituting our previously created table:
SELECT SQL_VARIANT_PROPERTY(col2, 'BaseType') AS [BaseType], SQL_VARIANT_PROPERTY(col2, 'Precision') AS [Precision], SQL_VARIANT_PROPERTY(col2, 'Scale') AS [Scale], SQL_VARIANT_PROPERTY(col2, 'TotalBytes') AS [TotalBytes], SQL_VARIANT_PROPERTY(col2, 'Collation') AS [Collation], SQL_VARIANT_PROPERTY(col2, 'MaxLength') AS [MaxLength] FROM dbo.test;
The following results are what those binary values are encoding.
I don’t ever use
sql_variant, but it’s useful to know what to look for when upgrading existing systems or examining data pages during a disaster recovery. Also keep in mind that some internal SQL Server system tables and system stored procedures use this data type to mix strings and numbers. The data type has its place, but doesn’t make for good indexing or predictable code. There is also a storage overhead per column, especially for string values.
Leave your thoughts in the comments below.
Photo by Taelynn Christopher on Unsplash.
Thank you very much for showing us that the column itself knows it’s own type.
I’ve always assumed that I needed to store the data type elsewhere.