Cumulative Updates Are First Class

Since 27 January 2016, Microsoft recommends that Cumulative Updates be installed with the same confidence as Service Packs, and installed as they become available.

The official announcement was posted on 24 March 2016 (their emphasis):

As of January CU releases, these caution messages have been updated, we now recommend ongoing, proactive installation of CUs as they become available. You should plan to install a CU with the same level of confidence you plan to install SPs (Service Packs) as they are released.

[Note: This guidance is also posted in the latest Cumulative Updates themselves, if you need additional proof to show your boss.]

This is new guidance and shows more rapid and proactive support for SQL Server from Microsoft.

With that in mind, I would strongly recommend that you make sure your production instances of SQL Server are up to date by installing the latest Service Pack and Cumulative Update for your instance.

If you would like to know what version you’re currently running, issue the command SELECT @@VERSION against your instance.
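
For a little more detail than the raw version string, SERVERPROPERTY can break the same information into separate columns; a quick sketch:

-- Returns the full version string, including edition and build number
SELECT @@VERSION;

-- The same information broken into separate columns
SELECT SERVERPROPERTY('ProductVersion') AS BuildNumber,
       SERVERPROPERTY('ProductLevel')   AS ServicePackLevel,
       SERVERPROPERTY('Edition')        AS Edition;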

For example, the latest instance on one of my virtual machines reports version 12.00.4436.

Using the version matrix on the Microsoft SQL Server Version List, I know that this instance is running SQL Server 2014 Service Pack 1, with Cumulative Update 4.

It also shows that there is a newer Cumulative Update (CU 5) available, which will bring my instance up to the very latest version (as of this writing).

Brent Ozar Unlimited has a simpler site, SQL Server Updates, if all you want is the very latest Service Pack or Cumulative Update for supported versions of SQL Server.

If you have any comments about Cumulative Updates, Service Packs, or your favourite kind of chocolate, feel free to reach out to me on Twitter at @bornsql .

Data Efficiency in SQL Server: DECIMAL

This is part two of a short series of posts about how I assign efficient data types when designing a new table or database.

Use less space with BIGINT

Last week, I spoke about a more efficient DATETIME than DATETIME. This week I have a more efficient DECIMAL than DECIMAL.

If you’re planning to store a long number that will never have a fraction, and you were considering using DECIMAL, I’m going to ask you to think about BIGINT first.

Granted, this is a very specific use-case, and requires some forethought, but that’s our job: to think ahead.

Here’s the scenario where I opted in for a BIGINT over a DECIMAL: IMEI numbers.

According to Wikipedia, the International Mobile Station Equipment Identity (that long number on your mobile device that identifies it uniquely—well, mostly) is usually 14 to 16 digits long, and may be as short as 8 digits.

A DECIMAL data definition requires two values: precision and scale.

Let’s say you want to store the value 123,456.789. The number of digits in total (both to the left and right of the decimal point) is nine. Thus, the precision is 9.

The scale is everything to the right of the decimal point, which in this case is 3.

In other words, we would define this value as DECIMAL(9,3).
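
As a quick sanity check, here is a minimal sketch that declares the value and confirms its storage size:

-- 9 digits of precision in total, 3 of them to the right of the decimal point
DECLARE @Amount DECIMAL(9,3) = 123456.789;

-- DATALENGTH returns the number of bytes used to store the value
SELECT @Amount AS StoredValue,
       DATALENGTH(@Amount) AS StorageBytes;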

A precision of 1–9 digits requires only 5 bytes of storage, which makes it far more efficient than a BIGINT: the BIGINT needs the full 8 bytes and cannot store a fraction at all.

However, as the precision increases, so do the storage requirements. A precision of 10–19 digits requires 9 bytes (an 80% jump in storage), and this is the sweet spot where a BIGINT becomes an effective alternative.

If you do not need to store a fraction, as in the case of an IMEI number, consider BIGINT instead: its 8 bytes will safely store a value of up to 19 digits (a maximum of 2⁶³-1).
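
Here is a minimal sketch of what that might look like; the table and column names are hypothetical:

-- Hypothetical table storing IMEI numbers as whole numbers
CREATE TABLE dbo.Device (
    DeviceID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    IMEI     BIGINT NOT NULL  -- 8 bytes, no fraction, up to 19 digits
);

-- A 15-digit value fits comfortably in a BIGINT
INSERT INTO dbo.Device (IMEI) VALUES (123456789012345);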

So if you’re storing IMEI numbers for a country’s worth of mobile devices, say 250 million rows, that one byte per row (9 bytes for the DECIMAL versus 8 for the BIGINT) will save you roughly 238MB in memory, queries, storage, indexing, backups, and so on.

I think that’s worth thinking about.

Hit me up on Twitter, at @bornsql, to tell me how you’ve made efficient data design choices.

Data Efficiency in SQL Server: DATETIME

This is the first in a short series of posts about how I assign efficient data types when designing a new table or database.

Use less space with DATETIME2

We all know that the DATETIME data type uses 8 bytes of space to store the date and time, to an accuracy of a paltry 3 milliseconds.

This used to cause no end of drama in the days before SQL Server 2008 because milliseconds were rounded to end in 0, 3 or 7. When trying to calculate the last possible time in a day, you had to work with some crazy values like 2016-03-15 23:59:59.997.
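
A small sketch to see that rounding in action:

-- DATETIME rounds milliseconds to increments of .000, .003 and .007
SELECT CAST('2016-03-15 23:59:59.997' AS DATETIME) AS LastMomentOfTheDay,  -- stays 23:59:59.997
       CAST('2016-03-15 23:59:59.999' AS DATETIME) AS RoundedUp;           -- rounds up to 2016-03-16 00:00:00.000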

Fortunately, SQL Server 2008 introduced DATETIME2, which put paid to that drama. We now have a precision of up to 7 places after the decimal, and it still only uses a maximum of 8 bytes!

For example, we can now store a value of 2016-03-15 23:59:59.9999999, which is mind-bogglingly close to midnight, and not worry about insane values ending in a 7.

Do you really need that precision, though? How about a way to actually use less storage and store more accurate data?

Here’s the thing: 0–2 places of precision (after the decimal) require only 6 bytes, 3–4 places require 7 bytes, and 5–7 places require the full 8 bytes. That means we can save ourselves a whole byte, without losing any precision, by using DATETIME2(3) as a drop-in replacement for DATETIME.
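
Here is a minimal sketch comparing the two side by side:

DECLARE @Old DATETIME     = '2016-03-15 23:59:59.997';
DECLARE @New DATETIME2(3) = '2016-03-15 23:59:59.999';

-- DATALENGTH shows the number of bytes used to store each value
SELECT DATALENGTH(@Old) AS DateTimeBytes,
       DATALENGTH(@New) AS DateTime2Bytes;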

We get precision down to the millisecond (even more accurate than before) and shave off one byte for every row in that column. In a table with ten million rows, that’s 9.5MB of space we no longer have to store, or query, or index, or back up.

What data efficiency method have you used in production? Let me know on Twitter at @bornsql.

What is your SQL Server 2005 Upgrade Plan?

SQL Server 2005 will no longer be supported by Microsoft as of 12 April 2016.

One of the services I offer my customers is an upgrade from any* version of SQL Server to the latest and greatest version.

It is helpful to remember that version upgrades are quite tricky. The database engine does not support upgrades of more than two major version numbers at a time.

That means that, if you were to upgrade to SQL Server 2014 or 2016, you would need to have an intermediate step if you’re coming from anything before 2008.

Note: this rule applies to in-place upgrades only. If you are migrating to a new server and perform a backup/restore or detach/reattach, any database from 2005 and upwards does not require this intermediate step.
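
A minimal sketch of that migration path, with hypothetical database, file, and path names:

-- On the old SQL Server 2005 instance: take a full backup
BACKUP DATABASE [LegacyDB]
    TO DISK = N'D:\Backups\LegacyDB.bak'
    WITH INIT, CHECKSUM;

-- On the new instance: restore the backup; the database is upgraded during recovery
RESTORE DATABASE [LegacyDB]
    FROM DISK = N'D:\Backups\LegacyDB.bak'
    WITH MOVE 'LegacyDB'     TO N'E:\SQLData\LegacyDB.mdf',
         MOVE 'LegacyDB_log' TO N'F:\SQLLogs\LegacyDB_log.ldf';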

Even after upgrading to a higher version, you can still run your database at the same compatibility level as the version you upgraded from, provided it falls within that two-version window. This gives you compatibility with legacy code and applications that assume they are working with an older version.

For example, if you upgrade a SQL Server 2005 instance to SQL Server 2012, you can still run the database in compatibility level 90, which is equivalent to SQL Server 2005.
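
On the upgraded instance, that looks something like this (the database name is hypothetical):

-- Keep the upgraded database behaving like SQL Server 2005
ALTER DATABASE [LegacyDB] SET COMPATIBILITY_LEVEL = 90;

-- Confirm the current compatibility level
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'LegacyDB';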

In fact, SQL Server 2014 will allow you to attach a SQL Server 2005 database, but the minimum compatibility level will be automatically updated to 100 (Source).

Installed Version        | Can Upgrade To     | Compatibility Level
SQL Server 6.5 (65)      | SQL Server 2000    | 2000: 80, 70, 65
SQL Server 7.0 (70)      | SQL Server 2005    | 2005: 90, 80, 70
SQL Server 2000 (80)     | SQL Server 2008/R2 | 2008/R2: 100, 90, 80
SQL Server 2005 (90)     | SQL Server 2012    | 2012: 110, 100, 90
SQL Server 2008/R2 (100) | SQL Server 2014    | 2014: 120, 110, 100
SQL Server 2012 (110)    | SQL Server 2016    | 2016: 130, 120, 110
SQL Server 2014 (120)    | SQL Server 2018**  | 2018**: 140, 130, 120

Thomas LaRock covers some of this in a good checklist to look through when you’re considering an upgrade to SQL Server 2014.

* Any version of Microsoft SQL Server from 6.5 and higher.

** Based on current trends. Microsoft has not announced any product that will be available after SQL Server 2016.

Lazy Loading and Tries

This post has nothing to do with SQL Server, but if you like performance tuning, stick around.

I learn technology by using it, pushing it to its limits, finding out how it breaks. I jokingly refer to myself as a living edge case because I will do something with your product that you might not have expected.

(If you’re looking for a beta tester, I’m definitely your man.)

By the same token, I write a lot of applications (in C#, my language of choice) to test theories and learn new things.

I have created a word puzzle game (no, you’ll never see it in the wild) that has as its list of permitted words a dictionary of approximately 300,000 English words. The list is a text file in ANSI format and smaller than 3MB.

The puzzle game works along the lines of the Scrabble® board game, with a set of random letters assigned to each player.

The trick for me was to narrow the main dictionary down to the possible words for the current player’s letters, so there was a smaller list to validate against. Unfortunately, even holding the full dictionary in memory in a List<string>() object was very (very) slow to filter. It was taking up to 7 seconds each time the letters changed.

I wrote to my friend André van der Merwe to ask him for help. My goal was to find all the possible words for each combination of letters, in the fastest possible way, ordered from longest to shortest. Performance is a feature.

André suggested I use a trie to hold my word list. This is essentially how autocomplete algorithms work: each letter is a node on a path shared by every word that starts with the same sequence of letters, and the computer narrows the list of possible words by following a path down the tree.

[Image: Trie example]

(I also call this a radix tree, but André correctly informed me that it’s not quite the same thing. Radix trees are more compact.)

A trie would make my search for words extremely fast, because each node in a trie carries a Boolean value (true or false) indicating whether it is the final letter of a word.
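
A minimal, simplified sketch of such a trie node (illustrative only, and not the implementation I eventually settled on) might look like this in C#:

using System.Collections.Generic;

// A simplified trie node: one node per letter, with a flag that marks the end of a word.
public class TrieNode
{
    private readonly Dictionary<char, TrieNode> _children = new Dictionary<char, TrieNode>();

    // True if the path from the root to this node spells a complete word.
    public bool IsEndOfWord { get; private set; }

    public void Insert(string word)
    {
        var node = this;
        foreach (var letter in word)
        {
            TrieNode child;
            if (!node._children.TryGetValue(letter, out child))
            {
                child = new TrieNode();
                node._children.Add(letter, child);
            }
            node = child;
        }
        node.IsEndOfWord = true;
    }

    public bool Contains(string word)
    {
        var node = this;
        foreach (var letter in word)
        {
            if (!node._children.TryGetValue(letter, out node))
            {
                return false;
            }
        }
        return node.IsEndOfWord;
    }
}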

Unfortunately, every time I switched to a new player in the game, the entire word list had to be loaded into memory to be filtered against. For four players, this needed 250MB of RAM, because a trie allocates a class instance for every letter, and each player’s copy of the word list was consuming over 60MB.

I work with data a lot. This is after all a SQL Server blog. I realised I didn’t need the entire dictionary in memory, ever. My players get up to 12 letters to work with, so I could eliminate words longer than that, which meant I could filter the word list before even knowing the player’s letters. What if I filtered the length and the player’s letters at the same time as building the possible word list?

Clearly this was a job for lazy loading, a design pattern that says you should only load data into memory when you need it.

My solution was to read the word list off persistent storage (the hard drive) one word at a time; only if a word had 12 letters or fewer, and could be made up from the letters in the player’s set, would it be loaded into that player’s trie.
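
Using the simplified trie node sketched earlier, the loading step looked something like this (the file path and helper names are illustrative):

using System.IO;
using System.Linq;

public static class WordListLoader
{
    // Build a player's trie by streaming the word list from disk and keeping
    // only the words that the player's letters can actually form.
    public static TrieNode LoadForPlayer(string wordListPath, string playerLetters)
    {
        var trie = new TrieNode();

        // File.ReadLines streams the file one line at a time (lazily),
        // unlike File.ReadAllLines, which loads the entire file into memory.
        foreach (var word in File.ReadLines(wordListPath))
        {
            if (word.Length <= 12 && CanBeMadeFrom(word, playerLetters))
            {
                trie.Insert(word);
            }
        }

        return trie;
    }

    // True if every letter in the word is available in the player's set,
    // respecting how many times each letter appears.
    private static bool CanBeMadeFrom(string word, string letters)
    {
        var available = letters.GroupBy(c => c)
                               .ToDictionary(g => g.Key, g => g.Count());

        foreach (var c in word)
        {
            int count;
            if (!available.TryGetValue(c, out count) || count == 0)
            {
                return false;
            }
            available[c] = count - 1;
        }

        return true;
    }
}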

Problem solved! Each player’s trie loads in under 30ms off SSD storage, and in around 70ms if the word list is read off a spinning hard drive. Even better, the memory footprint for each trie is only 12MB. This is still much larger than the 2.7MB of the List<string>() object, but it’s a good trade-off for the performance gain.

For reference, I eventually used this C# implementation, which André and I adapted.

What coding performance tricks have you used lately? Let me know on Twitter, at @bornsql.