Look, Ma, No Surprises

Last week I demonstrated at least a 30% performance improvement by switching to memory-optimized table-valued parameters on SQL Server 2016.

This week I will demonstrate the same test using Azure SQL Database, on the Premium tier, where In-Memory OLTP is supported.

My test harness is the same. I will create a temp table and a table variable:

-- Temp table creation
CREATE TABLE #temptable
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

-- Table variable creation
DECLARE @tablevariable AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

Now I will create a standard table-valued parameter:

CREATE TYPE dbo.TVPStandard AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

DECLARE @TempTVPStandard AS dbo.TVPStandard;

Finally, I will create a memory-optimized table-valued parameter (there is no requirement to separately enable In-Memory OLTP in Azure SQL Database):

CREATE TYPE dbo.TVPInMemory AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY NONCLUSTERED)
WITH (MEMORY_OPTIMIZED = ON);

DECLARE @TempTVPMemory AS dbo.TVPInMemory;

So far, nothing is different from last week’s scenario. We now have the following structures at our disposal:

  • #temptable (a temp table)
  • @tablevariable (a table variable)
  • @TempTVPStandard (a standard TVP)
  • @TempTVPMemory (a memory-optimized TVP)

I’m going to use the same WHILE loop again, but instead of a million runs, I’ll do 1,000, 10,000 and 100,000, because I’m paying for this instance of Azure SQL Database (I picked a Premium P1, with 125 DTUs) and I’m a cheapskate. I doubt 125 DTUs are even enough to run a million times for the fourth option.

PRINT SYSUTCDATETIME();

SET NOCOUNT ON;

DECLARE @i INT = 0;
WHILE @i < <number of executions>
BEGIN
    INSERT INTO <object name>
    SELECT NEWID();
    SELECT @i = @i + 1;
END;

SET NOCOUNT OFF;

PRINT SYSUTCDATETIME();

Results!

Like last week, at low row counts all four data structures performed at around the same speed. Because tempdb is shared with other Azure SQL Database customers, I expected the tempdb-backed structures to be slower.

I ran each series three times and took the lowest value from each run.

Data Structure                          1,000    10,000    100,000     1,000,000
Temp Table (Clustered)                  94 ms    453 ms    4,266 ms    44,955 ms
Table Variable (Clustered)              93 ms    344 ms    3,484 ms    34,673 ms
Standard TVP (Clustered)                94 ms    343 ms    3,500 ms    34,610 ms
Memory-Optimized TVP (Non-Clustered)    78 ms    203 ms    1,797 ms    No Time

Unsurprisingly, because Azure SQL Databases share tempdb with other customers, the I/O-bound figures are higher than those from my dedicated laptop, however old it is.

The big winner here, again, is the memory-optimized table-valued parameter. More than twice as fast as a temp table, and almost twice as fast as a table variable or standard TVP.

Note, however, that because my Azure SQL Database was resource-constrained, I was unable to complete the one-million-iteration WHILE loop against the memory-optimized TVP.

This is an excellent example of how In-Memory OLTP is a trade-off, no matter whether you’re running on premises or in the cloud. While you do consistently get much better performance, everything must fit in memory, so it is not a reliable choice for large data structures.

If you have any In-Memory OLTP tweaks you’d like to share, find me on Twitter at @bornsql.

How to make your T-SQL thirty percent faster

Last week, I mentioned a feature in SQL Server, and now in Azure SQL Database, called Memory-Optimized Table-Valued Parameters, which makes use of In-Memory OLTP structures to work around a dependency on tempdb, and thereby drastically improve performance.

Short version: we can see a better than 2x speed improvement by switching from temp tables to Memory-Optimized TVPs, and a 30% speed improvement by switching from standard TVPs.

This week, I will compare the performance characteristics of the following three data structures:

  • temp tables (and table variables),
  • standard table-valued parameters, and
  • memory-optimized table-valued parameters.

These tests will be performed on my venerable Asus laptop, which is around four years old now, running SQL Server 2016 Developer Edition. It has 16GB of RAM, a 2.4GHz quad-core CPU, and an Intel 530-series SSD.

(Edit: With SQL Server 2016 Service Pack 1, you can now create Memory-Optimized objects in all editions of SQL Server.)

The test is very simple and makes use of a technique that one of my customers uses extensively: writing some rows to a temp table so that another process can reuse those values as parameters and perform an action, after which it destroys the temp table.

Unfortunately, the method my customer uses to populate a temp table does not work on Azure SQL Database, because they explicitly refer to temp tables with three-part naming (tempdb..#temptable), which is not permitted.

For the sake of the exercise, I will have a process that writes to each data structure, and compare the times.

What is a Table-Valued Parameter anyway?

At its heart, a Table-Valued Parameter is a way to pass more than one row of data around in T-SQL. Instead of having many individual parameters in a stored procedure, or passing around values in temp tables, you could simply pass in the TVP, which is defined as a table type.
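As a minimal sketch of how that looks in practice (the procedure and its contents are hypothetical, and the dbo.TVPStandard type is the one created a little further down), a table type is passed to a stored procedure as a READONLY parameter:

-- Hypothetical procedure to illustrate passing a TVP; TVP parameters must be READONLY
CREATE PROCEDURE dbo.ProcessIDs
    @IDs dbo.TVPStandard READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- Do something with the rows that were passed in
    SELECT COUNT(*) AS IDCount FROM @IDs;
END;
GO

-- Populate the TVP and pass it in
DECLARE @TempTVPStandard AS dbo.TVPStandard;
INSERT INTO @TempTVPStandard (ID)
SELECT NEWID();
EXEC dbo.ProcessIDs @IDs = @TempTVPStandard;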

Alarm bells should already be going off in your head, because, yes, TVPs are just table variables. This means that they don’t have associated statistics and cannot be indexed (apart from a primary key). In other words, you do not want to use TVPs to pass in large amounts of data. Performance is going to suck.

So what do we know so far? Table variables, temp tables, and standard table-valued parameters are pretty much equivalent, in that they are all backed by tempdb, which is persisted on the hard drive, which means we will be bound by I/O.

Memory-Optimized TVPs, by definition, are in-memory structures. By making use of the In-Memory OLTP engine, we know that they use optimistic concurrency, with no locks or latches. This makes them fast. Their underlying structures are also compiled into native code, which makes them even faster.

Create the test objects!

Let’s start with a temp table. For your amusement, I will be creating both a temp table and a table variable.

I’m lumping these together, because they’re almost the same thing under the covers for this scenario. The table is allocated in tempdb, data is inserted into it—which has to be fully logged (I/O-bound)—and if for any reason the query spills to disk (and there are a lot of reasons, including sorts), it will incur an I/O penalty in the data file too.

I was not going to use any indexes or primary keys here, to try to keep the performance comparable, but memory-optimized data structures require an index, so I’ve added a primary key to each structure. As we know, adding a clustered index, common on temp tables, will slow inserts down, because the rows must be sorted on the clustering key.

To make this even simpler, my sample data will only comprise one column. I’ve picked a bad example, because developers love GUIDs, but SQL Server does not.

-- Temp table creation
CREATE TABLE #temptable
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

-- Table variable creation
DECLARE @tablevariable AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

Now we have two tables backed by tempdb. One of them has statistics, and the other (depending on the version) is estimated by SQL Server to contain 1 or 100 rows. Fantastic.

Let’s create a standard table-valued parameter with the same structure:

CREATE TYPE dbo.TVPStandard AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

DECLARE @TempTVPStandard AS dbo.TVPStandard;

Now let’s create a memory-optimized table-valued parameter. Note that the major difference here is that we specify the MEMORY_OPTIMIZED option (there are some additional requirements for setting up In-Memory OLTP, which are outside the scope of this post).

CREATE TYPE dbo.TVPInMemory AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY NONCLUSTERED)
WITH (MEMORY_OPTIMIZED = ON);

Note: I had to cheat here. You’ll notice that my previous data structures default to a clustered index on a primary key. Memory-Optimized Table-Valued Parameters cannot use a clustered index, but instead require a non-clustered index.

DECLARE @TempTVPMemory AS dbo.TVPInMemory;

We now have the following structures at our disposal:

  • #temptable (a temp table)
  • @tablevariable (a table variable)
  • @TempTVPStandard (a standard TVP)
  • @TempTVPMemory (a memory-optimized TVP)

Test the Performance!

For repeatable tests, I was going to use STATISTICS IO and STATISTICS TIME, and insert 100, then 1,000, then 10,000 rows into each data structure.

I ran into a minor problem at first: I can’t use the GO <n> batch command to make things execute a lot of times, because table variables and TVPs are local only to the current batch. I had to rely on a WHILE loop.

After going through this a few times, I also couldn’t measure performance meaningfully with STATISTICS TIME, because even my old laptop was too quick. It’s a common problem we see in performance tuning: small data structures are fast.

Instead, I used a start and end time, using SYSUTCDATETIME, and a WHILE loop, inserting ten thousand, then a hundred thousand, and then a million rows into each structure. A contrived example, sure, but one necessary to demonstrate how performance differs with larger rowsets.

PRINT SYSUTCDATETIME();

SET NOCOUNT ON;

DECLARE @i INT = 0;
WHILE @i < <number of executions>
BEGIN
    INSERT INTO <object name>
    SELECT NEWID();
    SELECT @i = @i + 1;
END;

SET NOCOUNT OFF;

PRINT SYSUTCDATETIME();
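For example, here is a sketch of how the placeholders might be filled in for a 100,000-row run against the memory-optimized TVP. Note that the DECLARE must be in the same batch as the loop, because table variables and TVPs are scoped to the current batch:

PRINT SYSUTCDATETIME();

SET NOCOUNT ON;

DECLARE @TempTVPMemory AS dbo.TVPInMemory;

DECLARE @i INT = 0;
WHILE @i < 100000
BEGIN
    INSERT INTO @TempTVPMemory
    SELECT NEWID();
    SELECT @i = @i + 1;
END;

SET NOCOUNT OFF;

PRINT SYSUTCDATETIME();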

Results

At low row counts, all four data structures performed around the same speed. Provided tempdb’s data and log files are correctly sized and don’t have to grow, it averages out.

Things got more interesting at 10,000 rows. I ran each series three times, and took the lowest value from each run.

Data Structure                          10,000    100,000     1,000,000
Temp Table (Clustered)                  291 ms    2,872 ms    29,110 ms
Table Variable (Clustered)              173 ms    1,708 ms    17,665 ms
Standard TVP (Clustered)                184 ms    1,694 ms    17,764 ms
Memory-Optimized TVP (Non-Clustered)    120 ms    1,174 ms    12,541 ms

Because of the type of query I’m running, the performance profile is linear. A million rows took 100x more time than ten thousand.

Unsurprisingly, the standard TVP and table variable performed the same, since they’re effectively the same data structure. Why the temp table is slower is an exercise for a later blog post.

(When all three disk-bound structures used non-clustered primary keys, they were significantly slower than with clustered indexes. Performance with clustered indexes versus no primary key at all was the same.)

Keep in mind that these are row-by-row operations. If we were populating these structures from a set-based operation, it would be a lot faster. However, this demonstrates clearly how much more efficient a memory-optimized table-valued parameter is compared to the other data structures.

Caveat: memory-optimized TVPs reside exclusively in memory. We have to remember that it’s possible to run out of memory when using these structures. The last thing we want to see is the following error message:

The statement has been terminated.
Msg 701, Level 17, State 109, Line 92
There is insufficient system memory in resource pool 'default' to run this query.

Memory-Optimized Table-Valued Parameters are at least 30% faster than any of the other disk-bound data structures and more than twice as fast as temp tables specifically, but we need to ensure they are used correctly.

For more reading, check out the MSDN article “Faster temp table and table variable by using memory optimization”.

Next week, I will present the findings for this same test, but performed on an Azure SQL Database on the Premium Tier (because that’s where In-Memory OLTP lives).

If you have any performance tricks you would like to share, please find me on Twitter at @bornsql.

Memory-Optimized Table-Valued Parameters

November brings me back to my regular weekly posting schedule.

I attended PASS Summit 2016 last week, and saw a lot of great sessions. Aside from the new stuff, what I also noticed is how people are making use of existing technology to solve performance problems.

I have a customer that is targeting their vendor app to run in Azure SQL Database. They’ve run the Advisor and have discovered a number of places where they make use of temp tables. While this is not in itself a problem, how they check for their existence is a problem.

Azure SQL Database does not allow you to refer to tempdb by name. Any T-SQL construct referring to a temp table using the form tempdb..#table is explicitly forbidden.
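To illustrate, here is a sketch of the kind of existence check that falls foul of this rule (the temp table name is hypothetical), along with one possible rewrite that avoids naming tempdb, using DROP TABLE IF EXISTS (available in SQL Server 2016 and Azure SQL Database):

-- Not permitted in Azure SQL Database: refers to tempdb by name
IF OBJECT_ID(N'tempdb..#stock') IS NOT NULL
    DROP TABLE #stock;

-- One possible rewrite that avoids naming tempdb
DROP TABLE IF EXISTS #stock;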

One of the workarounds I recommended to them was to replace their use of temp tables with TVPs (table-valued parameters). There is no performance benefit to doing so, but it works.

Then I remembered that Azure SQL Database Premium Tier now supports In-Memory OLTP.

If you recall, in a previous post I suggested that the Standard and Premium Tiers are functionally equivalent to Standard Edition and Enterprise Edition respectively. This is no exception. You can create Memory-Optimized TVPs in the Premium Tier of Azure SQL Database.

(Edit: With SQL Server 2016 Service Pack 1, you can now create Memory-Optimized objects in all editions of SQL Server.)

In other words, for these short-lived temp tables, not only do you get the benefit of no longer worrying about referring to tempdb..#table, you also get a massive performance improvement.

There are some legitimate concerns, though. Any memory-optimized structures naturally require dedicated memory, so if you are already memory-bound on your Azure SQL Database (see the DTU post I wrote last month), this is going to hurt you. On the other hand, by not using tempdb at all, the performance improvement (as much as 100% in what I saw in the sessions last week) may offset that enough to warrant its use.

Next week, I will compare the performance characteristics of temp tables, standard TVPs, and Memory-Optimized TVPs, both on premises (using SQL Server 2016 Developer Edition, which is the same as Enterprise Edition) and on Azure SQL Database (using a Premium Tier database).

If you have any other quick performance wins using new features of SQL Server 2016 or Azure SQL Database, find me on Twitter at @bornsql.

Azure SQL Database Limits

Let’s talk briefly about resource limits with Azure SQL Database. Because we share resources with other users, and because Microsoft doesn’t want us to affect other customers dramatically, they have implemented some limits.

If for some reason our database does get overeager, the operations relating to CPU, RAM and I/O will be queued up by the underlying resource manager, which will allow those operations to continue once the resources are free. Yes, that will slow down our system.

Requests for resources other than CPU, memory, log I/O, and data I/O will, as noted in the documentation, be denied, and clients will see an error message.

Microsoft uses something called DTUs (Database Throughput Units, a combination of CPU, RAM and log writes) to figure out our Azure SQL Database’s resource usage.
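If your database is already running in Azure, one way to see how close it is getting to its limits is the sys.dm_db_resource_stats view, which, as I understand it, reports average resource usage in 15-second intervals for roughly the last hour:

-- Recent resource usage for the current Azure SQL Database, newest first
SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       avg_memory_usage_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;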

Check out the DTU Calculator to find out more, and see how your on-premises database compares. Next week, we will go into more detail about DTUs, and how to calculate the DTU usage of an on-premises database.

It gets really complicated, really quickly, but using the DTU Calculator before we decide to move to Azure SQL Database, and being conservative with our estimates, will make our lives easier.

The biggest an Azure SQL Database can be on the Standard Tier is 250 GB. The biggest it can be on the Premium Tier is 1,000 GB.

Coincidentally, the SQL Server Customer Advisory Team (SQLCAT) wrote a blog post titled “Real-World Azure SQL DB: Unexpected Database Maximum Size Limit” last month, about a gotcha when upgrading between service tiers. It is recommended reading.

It helps to draw parallels between Standard Tier and Standard Edition of SQL Server, and Premium Tier and Enterprise Edition of SQL Server.

(Edit: With SQL Server 2016 Service Pack 1, you can now create Memory-Optimized objects in all editions of SQL Server.)

There is a Basic Tier for Azure SQL Database, but the resource limits are so strict that I barely consider it. Perhaps for something as small as the MySQL database behind a WordPress website (with a caching plugin in front of it), it may be appropriate.

Using elastic pools (sharing a pool of DTUs between several databases, which shrinks and grows as your requirements change) can help with balancing costs across several databases, but it is going to seem more expensive at the outset.

The bottom line with Azure SQL Database is that the resource limits may take you by surprise if you don’t plan correctly. Spend the time to evaluate the requirements of your database, and don’t just use Azure SQL Database because it’s new and shiny. If you currently have an Enterprise licence for SQL Server, chances are you’ll spend the equivalent (or more) on a Premium Tier database.

Azure SQL Database is not a way to save money on your SQL Server licences. It is a different technology, with different resource requirements, that caters to a specific workload. Choose wisely.

If you have some thoughts about DTUs, come and talk to me on Twitter, at @bornsql.

Speaking at SQLSaturday 507 this weekend in Edmonton

On Saturday 9 April 2016, four days from today, I will be speaking at the first ever SQLSaturday in the province of Alberta.

Edmonton is the provincial capital, so it makes sense to have it there, though it’s a bit of a drive to get there from Calgary, where I live.

Fortunately, Red Deer is about halfway between the two cities, and I have a place where I can eat and buy some Rooibos tea in Gasoline Alley.

My talk will be similar to the one I gave last year at the Calgary SQL PASS User Group, The SQL Server Performance Starter Kit, which was well received by the attendees.

If you’re in the area, please come and say hello. It’ll be great fun, and you might even learn something.

You can find out more on the event’s home page: SQLSaturday #507.

Lazy Loading and Tries

This post has nothing to do with SQL Server, but if you like performance tuning, stick around.

I learn technology by using it, pushing it to its limits, finding out how it breaks. I jokingly refer to myself as a living edge case because I will do something with your product that you might not have expected.

(If you’re looking for a beta tester, I’m definitely your man.)

By the same token, I write a lot of applications (in C#, my language of choice) to test theories and learn new things.

I have created a word puzzle game (no, you’ll never see it in the wild) that has as its list of permitted words a dictionary of approximately 300,000 English words. The list is a text file in ANSI format and smaller than 3MB.

The puzzle game works along the lines of the Scrabble® board game, with a set of random letters assigned to each player.

The trick for me was to limit the possible words from the main dictionary, based on the current player’s letters, to validate against. Unfortunately, even holding the full dictionary in memory in a List<string>() object was very (very) slow to filter. It was taking up to 7 seconds each time the letters changed.

I wrote to my friend, André van der Merwe, to ask him for help. My goal was to find all possible words with each letter combination, in the fastest possible way, ordered by longest to shortest. Performance is a feature.

André suggested I use a trie to hold my word list. This is principally how autocomplete algorithms work: each node in the tree represents a prefix shared by one or more words starting with that same sequence of letters, and the computer reduces the list of possible words by following a path down the tree.

(Figure: a trie example.)

(I also call this a radix tree, but André correctly informed me that it’s not quite the same thing. Radix trees are more compact.)

A trie would make my search for words extremely fast, because one of the properties of a trie is that each node carries a Boolean value (true or false) indicating whether it is the final letter of a word.

Unfortunately, every time I switched to a new player in the game, the entire word list had to be loaded into memory to be filtered against. For four players, this needed 250MB of RAM, because a trie uses a class instance for every letter, and each copy of the word list was consuming over 60MB.

I work with data a lot. This is after all a SQL Server blog. I realised I didn’t need the entire dictionary in memory, ever. My players get up to 12 letters to work with, so I could eliminate words longer than that, which meant I could filter the word list before even knowing the player’s letters. What if I filtered the length and the player’s letters at the same time as building the possible word list?

Clearly this was a problem for lazy loading, a design pattern that says you should only load data into memory when you need it.

My solution was to read the word list off persisted storage (the hard drive) one word at a time, and only if it had 12 letters or fewer, and could be made up using the letters in the player’s set, would it be loaded into that player’s trie.

Problem solved! Each player’s trie loads in under 30ms off SSD storage, and around 70ms if the word list is read from a spinning hard drive. Even better, the memory footprint for each trie is only 12MB. This is still much larger than the 2.7MB of the List<string>() object, but a good trade-off for the performance.

For reference, I eventually used this C# implementation, which André and I adapted.

What coding performance tricks have you used lately? Let me know on Twitter, at @bornsql.

The ongoing saga of a slow tempdb on a fast server

Last week, I wrote about what should have been a fast system but was performing very poorly, weighing heavily on tempdb I/O performance.

Our original changes did not make much of a difference, but now we could rule out errant file growth. Our next step was to investigate how tempdb was being used by the application. There was one obvious culprit.

The warehouse frequently* runs a query to find out how much stock* they have on hand. This is a query that has to be almost instantaneous. In our best case scenario, it runs in seven to ten seconds, but it should be sub-second.

Did you notice the asterisks? Those are red flags: a big query, running frequently, and spilling to disk (which means spilling to tempdb).

Our first rule of performance tuning is to keep everything in memory. Disk is two to three orders of magnitude slower than memory. We need to make this query a thousand times faster.

The query itself is based on a view. Nothing wrong with that in principle, assuming you keep your table joins to a minimum (a rule of thumb is fewer than ten). I found 25 tables, two of which were derived tables.

After rewriting the two derived tables as common table expressions (CTEs) at the top of the query, it was much easier to see what was happening.

Asking For Too Much

The first derived table was a partition query using ROW_NUMBER to find records with the latest date by product type, and then fetching the first row. It makes use of a sort operator, which is a blocking operation. In many cases, sorts will spill to tempdb.
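The pattern looks something like this sketch (the table and column names are hypothetical, not the customer’s actual schema):

WITH LatestStock AS
(
    SELECT ProductID,
           ProductType,
           StockDate,
           QuantityOnHand,
           ROW_NUMBER() OVER (PARTITION BY ProductType ORDER BY StockDate DESC) AS rn
    FROM dbo.StockHistory
)
SELECT ProductID, ProductType, StockDate, QuantityOnHand
FROM LatestStock
WHERE rn = 1;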

The second derived table was the one doing the actual stock counts. It used a GROUP BY, which also requires a sort operation, and therefore potentially spills to tempdb.

Twenty-three other tables were joined in some way or another (mostly LEFT JOINs, some INNER JOINs) to the derived tables. Each of these joins was represented in the query plan as a hash join, which can also spill to tempdb.

(Hash joins aren’t bad in and of themselves. They are an efficient way to join large rowsets, but they often force the data to spill to tempdb.)

Every time this query runs, the second derived table reads 8 million rows, or 360MB worth of data, before joining to the first derived table.

Predicate Pushdown

Because both derived tables are only looking for a single row of data, based on the product being counted (the predicate), what we want is for the predicate to be pushed down into both derived tables, right at the start of the query, so that any joins done on the tables are performed on a single row of data.

This is called predicate pushdown. In our case, the query optimizer could not find the most efficient query plan, and was timing out, so the second derived table never saw the predicate until the end.

Limiting the derived tables to one row also changes the joins in the plan from hash joins to nested loop joins, avoiding tempdb altogether, because nested loops are far more efficient for small rowsets.

The Fix

There is an extremely simple way to get the predicate pushed down to the derived tables, and that is to physically put it there. Unfortunately we cannot specify a value for the product in the view definition.

A stored procedure with a single @ProductID parameter would allow us to add WHERE ProductID = @ProductID to both derived tables, which would make the query really fast. In my testing, it ran in under 100ms.

The problem is that this would require numerous code changes to the existing system. Many of the queries that rely on the view also use additional JOINs and WHERE conditions, meaning that a stored procedure is impractical.

Enter the table-valued user-defined function, or TVF.

I copied the exact SELECT statement from the view definition, making one small change in two places: I added WHERE ProductID = @ProductID to both derived tables.

Then I used the output columns of the view as the table definition at the top of the multi-statement table-valued function.

So now, whenever we query the TVF, it builds a table definition in memory, runs the original query with its 20-odd joins, but limits the data to a single row, which is then returned in the same way as if we were querying the view.
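As a rough sketch of the shape of the function (the object and column names are hypothetical, and the 20-odd joins from the real view are elided):

CREATE FUNCTION dbo.StockOnHand (@ProductID INT)
RETURNS @Result TABLE
(
    -- Column list copied from the output of the original view
    ProductID      INT NOT NULL,
    QuantityOnHand INT NOT NULL
)
AS
BEGIN
    INSERT INTO @Result (ProductID, QuantityOnHand)
    SELECT sm.ProductID, SUM(sm.Quantity)
    FROM dbo.StockMovement AS sm
    -- ... the remaining joins from the view definition go here ...
    WHERE sm.ProductID = @ProductID -- the predicate, physically pushed down
    GROUP BY sm.ProductID;

    RETURN;
END;
GO

-- Queried just like the view, but limited to one product up front
SELECT ProductID, QuantityOnHand
FROM dbo.StockOnHand(123);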

Outcome

  • Original query with a single ProductID: 46,636 logical reads, 10 seconds.
  • New query with a single ProductID: 35 logical reads, 109 milliseconds.
  • One thousand times faster: check.

98% of I/O is taken up by tempdb

During a recent engagement, the customer explained that the performance on their relatively small system was especially slow on Monday morning, but generally bad that entire day.

The SQL Server 2008 R2 instance, which runs in a virtual machine, has 30GB of RAM allocated, but the SQL Server memory usage never exceeds 16GB.

Digging deeper, we discovered that 98% of all I/O was consumed by tempdb, described by Brent Ozar as the public toilet of SQL Server.

The system makes extensive use of temporary tables and table variables in its queries.

The tempdb data file hovers around 2.5GB, and the log file is around 1.5GB. Certainly not a huge database, so it was puzzling why this small company was experiencing performance issues.

It all fits in memory! Which is fast!

We noticed that the SQL Server instance had been restarted the previous Sunday, as part of a maintenance cycle, and that the file allocation for tempdb was set to the defaults:

  • 8MB for data file, 10% growth
  • 1MB for log file, 10% growth

In other words, no matter how big the tempdb database was before the restart, it would always shrink down to that tiny size, needing to zero out the log file when it grew again.

To get to the size it was today, tempdb had to grow the data file 60 times and the log file 77 times. Because each growth increment is 10% of the current size, every growth allocated more space, and took longer, than the one before.

Also of note: the SQL Server instance did not have any global trace flags enabled, in particular TF 1118.

Every half hour, a virtual machine snapshot was taken, freezing the I/O on the server each time.

Our resolution was as follows (a T-SQL sketch of the tempdb changes appears after the list):

  • Set Windows Power Saving Mode to High Performance
  • Increase tempdb data files to match number of CPU cores
  • Change tempdb default file sizes to the current usage
  • Change tempdb growth to fixed value, appropriate for I/O subsystem
  • Shrink and resize tempdb log to current size
  • Enable Trace Flags 1118 and 1117 to reduce data file contention
  • Switch from VM snapshots to Full Recovery Mode and enable transaction log backups
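
Here is a hedged T-SQL sketch of the tempdb portion of those changes. The sizes, growth increments, and file path are placeholders that would need to be adjusted for the actual I/O subsystem:

-- Resize the existing tempdb files to their current usage, with fixed growth increments
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 2560MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 1536MB, FILEGROWTH = 256MB);

-- Add data files to match the number of CPU cores (repeat per core; the path is a placeholder)
ALTER DATABASE tempdb ADD FILE
(NAME = tempdev2, FILENAME = N'T:\TempDB\tempdev2.ndf', SIZE = 2560MB, FILEGROWTH = 256MB);

-- Enable the trace flags globally until the next restart; add -T1117 and -T1118
-- as startup parameters to make the change permanent
DBCC TRACEON (1117, 1118, -1);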

We will have to wait to find out what impact these changes had on the environment over the course of a week. I’ll report back next time.

Max Server Memory Matrix

When we perform diagnostics on an existing SQL Server instance, or if we are setting up a new machine, it’s very important to make sure that we change the Max Server Memory setting to an appropriate level.

If we don’t, then there’s a good chance (especially on Enterprise Edition) that SQL Server will use all the available memory on the server. Unfortunately, this means the operating system can be starved of RAM, which creates a performance bottleneck.

Additionally, other components (including other applications, services, and hardware on the machine) may have specific memory requirements, which need to be taken into account.

To get a baseline, I use a formula developed by Jonathan Kehayias (blog | twitter) which he wrote in his book, Troubleshooting SQL Server: A Guide for the Accidental DBA, and repeated in his blog post, “How much memory does my SQL Server actually need?”.
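As a worked example, paraphrasing Jonathan’s rule of thumb as I understand it (reserve 1 GB of RAM for the operating system, 1 GB for every 4 GB of RAM between 4 GB and 16 GB, and 1 GB for every 8 GB of RAM above 16 GB), a server with 64 GB of RAM would reserve 1 + 3 + 6 = 10 GB, leaving 54 GB for SQL Server:

-- Set Max Server Memory to 54 GB (55,296 MB) on a hypothetical 64 GB server
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 55296;
RECONFIGURE;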

After many months of confusing myself with mental arithmetic, I decided to create a straightforward list based on Jonathan’s algorithm, in 4 GB increments, for my own reference. I would like to share this with you today.

You can bask in the glory that is my Max Server Memory Matrix at https://bornsql.ca/memory/.

Next time you (or I) need to check if the Max Server Memory setting is too high, simply open that link and check the matrix. Over time I will make it look prettier, but for now it is already helping one of my clients.

Let me know what you think on Twitter, with @bornsql.

Forecast: Azure Skies Over the Amazon

Microsoft Azure and Amazon Web Services are taking over.

Last week, a client approached me to migrate their complex environment in two phases from two physical data centers, first to Azure VMs, and ultimately to an Azure SQL Database, doing away with logical servers completely.

In my mind, this trend is only going to accelerate. There are obvious exclusions that I won’t go into around privacy, large data and so on, but in general, I see a mass migration to “the Cloud”.

The primary motivator for virtualisation has always been cost. For example, Azure Blob Storage offers fully redundant global replication, where your data is copied to two other centers in other parts of the world, for a very small price bump and zero additional administration. In a traditional hosted environment, you would be looking at a significant outlay for ensuring global redundancy, not to mention maintaining and administering it.

A trade-off of virtualisation is performance. Virtual machines, by their nature, incur at least a 5% resource overhead, and if set up incorrectly or over-provisioned, can cut performance by as much as 50%. That is slowly changing as we become wise to the needs of SQL Server in a virtual environment, and Microsoft is driving the recommendations.

Licensing is also changing as a result of the Cloud. When setting up a traditional environment, software licenses can easily account for 80% or more of your initial outlay, depending on number of CPU cores, SQL Server edition, and so on.

With Azure and Amazon, just as you would pay a monthly fee for a Windows VM, you can now pay an additional monthly fee for a SQL Server license, which (depending on configuration) could take up to six years to become more expensive than buying a SQL Server Enterprise license outright. Plus, in those six years, you can get the latest version of the product as quickly as it takes to spin up a new virtual machine.

An operational expense like that makes developing for a SQL Server platform very attractive, especially for companies that don’t necessarily have the capital to spend hundreds of thousands of dollars on licensing.

It behooves us, as data professionals and consultants, to understand these risk factors, to ensure that our customers can get the most bang for their buck. If they want performance, and their budget is limited, Azure virtual machines, or even database-as-a-service, might be the best thing for them.

Yes, the Cloud is just someone else’s data center, but Microsoft and Amazon are lowering the barrier to entry for a number of businesses, large and small, to scale up their environment and help grow their businesses. It would be foolish to dismiss it out of hand.

If I ran a large data center, I’d be concerned. Take me to the clouds above.