With the release of SQL Server 2019, I wanted to highlight in a single place some things that I’m excited about.
Drawing on sessions I presented this year at SQLBits and SQL Saturday Edmonton respectively, these are features in SQL Server 2019 for the busy DBA:
- UTF-8 in-engine support
- Intelligent query processing features
- Replacing sqlcmd with mssql-cli (almost)
- Always Encrypted with secure enclaves
- Accelerated database recovery
- External language support with language extensions
- Bonus: extra Linux stuff
Say we work at an English-speaking organization, and we have a legacy system with a lot of history, code cruft, and limited documentation. Now say we want to support foreign languages with that system, which may use alphabets containing characters beyond the ASCII range. The spectre of Unicode hangs over us as we weigh the risk (and cost) of converting all string columns in our database(s) from
NVARCHAR. Consider too the additional 100% storage needed for those double-byte characters.
With UTF-8 support in SQL Server 2019, we can change the collation of our database (and columns) to UTF-8. For characters in the Latin alphabet, it’ll use one byte per character. If we need to store a Unicode character, the collation handles that for us by using the bytes it needs (from 2 to 4).
Important caveat: We can’t just set it and forget it. Our column definitions might need to be expanded to cater for those extra bytes. In SQL Server (and Azure SQL Database), a column definition of
VARCHAR(200) means 200 bytes, not 200 characters. This is documented in Microsoft Docs.
Intelligent query processing
When SQL Server 2016 appeared on the scene, Microsoft claimed a 20% performance improvement out of the box compared to previous versions. This was achieved by making some improvements to internal structures, and was especially noticeable in systems with many CPU cores and large amounts of memory.
With SQL Server 2019, batch mode operators that were previously created for columnstore data structures are now available for rowstore structures as well. Other key features include:
- memory grant feedback: repeated executions of a query will help the optimizer make better use of memory
- table variable deferred compilation: statistics are updated the first time a table variable is populated, allowing better query decisions
- scalar UDF inlining: user-defined functions are now inlined into the query during compilation, which can dramatically improve performance
- approximate count distinct: lower resource usage with faster results, if we don’t mind some loss in accuracy, for large distinct counts
mssql-cli as a replacement for
Almost, but not quite. Also, not just for SQL Server 2019. With
mssql-cli Microsoft has created a neat cross-platform (Windows, Linux, macOS) command-line utility that includes IntelliSense support to help us type long object names. It can do multi-line queries, and keep a record of our favourite queries. We can also edit scripts in an external editor. Command-line junkies should give this a spin.
Always Encrypted with secure enclaves
Apple made secure enclaves popular with storing fingerprint information on-device using Touch ID. Take a piece of memory that is completely blocked off from regular system memory, and combine it with the existing Always Encrypted feature introduced in SQL Server 2016. Not only can SQL Server protect sensitive data from system administrators who would normally have full access to the database, but now it can put indexes on encrypted data without anyone but the authorized users having access through an approved client connection. This is next-level security.
Important caveat: We will be trading performance for security. The amount of memory for the secure enclave is limited in this version, and it has important considerations when setting up, however we can create indexes on this data even if it is encrypted.
Accelerated database recovery
A lot has been written about this feature, with one of my favourite write-ups written by my friend Andy Mallon. I will be turning this feature on across the board for all databases I manage. For the price of an additional 10% of database file usage (in my own testing), we get significant performance gains whenever database recovery runs. This is useful at SQL Server startup, during database restores, and for transaction rollbacks. Even better: the Persistent Version Store (PVS) is a per-database feature, meaning far less impact on TempDB. Also, long-running transactions are recorded in the PVS, not the transaction log. This is what drives the HyperScale feature in Azure SQL Database, and I’m happy to see it make its way into the on-premises product.
Important caveat: If we use Read-Committed Snapshot Isolation (RCSI), there is a small performance overhead.
First there was R, the language used by data scientists for statistical analysis. Then there was Python. Now with SQL Server 2019, there is a third, unexpected (in my opinion) external language: Java. Along with SQLCLR which has provided .NET support in the database engine for many years, we can process data using our choice of four different languages, right inside the SQL Server database engine.
Bonus: closer parity on Linux
SQL Server 2017 was the “Linux” version. All database engine features are available on Windows, Linux, and inside Docker containers, because it’s the same code on all platforms. SQL Server 2019 is the “parity” version, as the list of components that don’t run on Linux is shrinking. We can run SSIS packages. We can run PolyBase. We can use Distributed Transaction Coordinator. We can run Machine Learning Services. Throw in Change Data Capture (CDC). Finish up with Active Directory authentication.
Leave your thoughts in the comments below.