Skip to content

Compression in SQL Server: COMPRESS and DECOMPRESS

  • by
Stonehenge with miniature effect applied

On Twitter recently, I asked:

Does anyone I know use the COMPRESS and DECOMPRESS features in T-SQL?

To those who replied in the affirmative, I asked:

What made you decide on this as opposed to ROW or PAGE compression?

Joey D’Antoni (blogTwitter), one of my co-authors, reminded me that row and page compression do not work on LOB (“large object”) data. Dave Dustin (Twitter) and Niko Neugebauer (blog | Twitter) said that they specifically use COMPRESS for LOB data for this reason.

Code changes

Using these functions requires changes to existing code because you need to use the COMPRESS function when inserting data. Dave and Niko did it by concealing the functionality inside views.

If you cannot make code changes to your environment, you may have to resort to row and page compression, or possibly columnstore indexes.

How does it work?

Released with SQL Server 2016, COMPRESS leverages the open source gzip compression algorithm, and can reduce the space used by as much as 90% depending on the kind of data in question. Data is stored as the VARBINARY(MAX) data type:

INSERT INTO dbo.Table1 (Col1, Col2) VALUES ('C0001', COMPRESS('<long string goes here>'));

DECOMPRESS is used to read the data out again:

SELECT Col1, DECOMPRESS(Col2) AS Col2 FROM dbo.Table1 WHERE Col1 = 'C0001';

Where would I use it?

This is great for keeping track of LOB data that doesn’t need to be queried often, for example large strings containing JSON and XML text. As noted in a previous post, the XML data type may not be appropriate for auditing purposes because it permanently modifies the data, whereas COMPRESS would be an appropriate storage mechanism because the data is exactly the same when read out again.

Are there any limitations?

You can’t index compressed data, because it’s stored as a compressed byte array. If you want to create an index, you can either use row or page compression and hope for the best, or use a computed column containing the data you need from the column, and index that instead.

Share your experiences with data compression in the comments below.

Photo by Marinus van der Westhuizen. All rights reserved. Used with permission.