Testing for Object Existence: CREATE OR ALTER

For the longest time, T-SQL writers have had to wrestle with ways of testing for an object’s existence so that it can either be dropped and recreated, or modified as needed.

Last week we covered the new DROP ... IF EXISTS syntax. This week goes into how we handle changes to objects.

We’ve spent many hours of our lives fighting with an object existence check so that ALTER commands don’t fail.

The common implementation pattern now is to CREATE a dummy object with the appropriate name and then use ALTER to write the actual code. That way, future changes can just be done by using the ALTER keyword.

We can see this in a famous example: sp_WhoIsActive, an extremely popular stored procedure written by Adam Machanic, which I highly recommend installing when setting up a new SQL Server instance.

USE master;
GO

IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_NAME = 'sp_WhoIsActive')
EXEC ('CREATE PROC dbo.sp_WhoIsActive AS
SELECT ''stub version, to be replaced''');
GO
...
ALTER PROC dbo.sp_WhoIsActive (...);

Look at all that unnecessary code. It’s messy; it’s prone to errors. There are so many ways to do it, which makes it inconsistent, meaning that automatically searching through a code base isn’t reliable. Additionally, if we miss a single quotation mark, the entire script fails.

In a word: Ugh!

What happens if we forget to run the stub first? What happens if we run the CREATE accidentally against an existing object and then have to change it manually to an ALTER?

(Side note: I spent a good few minutes one day a few years ago not remembering the ALTER keyword.)

SQL Server 2016 Service Pack 1 has finally added in a feature that many DBAs and database developers have been clamouring for: CREATE OR ALTER.

In the above example, the entire IF NOT EXISTS section can be replaced with:

USE master;
GO

CREATE OR ALTER PROC dbo.sp_WhoIsActive (...);
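To see why this matters for deployment scripts, here is a minimal sketch using a hypothetical procedure name; the same batch can be run any number of times, whether or not the object already exists:

CREATE OR ALTER PROCEDURE dbo.GetServerTimeUTC
AS
BEGIN
SET NOCOUNT ON;
SELECT SYSUTCDATETIME() AS CurrentTimeUTC;
END;
GO

-- Running the batch again simply alters the existing procedure in place:
-- no stub, no dynamic SQL, no existence check.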

While this has been a long time coming, after many anguished cries from people writing T-SQL scripts over the years, we are going to love using this small but significant addition to the syntax.

Share your best ALTER story with me on Twitter, at @bornsql.

Testing for Object Existence: DROP … IF EXISTS

For the longest time, T-SQL writers have had to wrestle with ways of testing for an object’s existence so that it can either be dropped and recreated, or modified as needed.

This is especially common in the case of temp tables or table variables. If the object already exists when our script runs, the script will fail and leave our workflow in an inconsistent state.

Consider this script:

IF OBJECT_ID('tempdb..#doTheThing') IS NOT NULL
DROP TABLE #doTheThing;

CREATE TABLE #doTheThing (
ImportantColumn1 BIGINT,
ImportantColumn2 TINYINT,
ImportantColumn3 NVARCHAR(255)
);

With SQL Server 2016, we can now do this:

DROP TABLE IF EXISTS #doTheThing;

CREATE TABLE #doTheThing (
ImportantColumn1 BIGINT,
ImportantColumn2 TINYINT,
ImportantColumn3 NVARCHAR(255)
);

DROP ... IF EXISTS can be used on many objects, including DATABASE, FUNCTION, INDEX, PROCEDURE, ROLE, SCHEMA, SEQUENCE, SYNONYM, TABLE, TRIGGER, TYPE, USER and VIEW.
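Here is a quick sketch with hypothetical object names, showing the same pattern for a few of the other supported object types:

DROP PROCEDURE IF EXISTS dbo.DoTheThing;
DROP VIEW IF EXISTS dbo.vwDoTheThing;
DROP INDEX IF EXISTS IX_DoTheThing ON dbo.TheThing;
DROP SEQUENCE IF EXISTS dbo.TheThingSequence;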

Share your object existence check nightmares with me on Twitter at @bornsql.

Temporal Tables and Hidden Period Columns

In my November 2015 post, An Introduction to Temporal Tables in SQL Server 2016 using a DeLorean, I wrote:

The HIDDEN property is optional and will hide these columns from a standard SELECT statement for backward compatibility with our application and queries. You cannot apply the HIDDEN property to an existing column.

It turns out that this is no longer true. You can apply the HIDDEN property to an existing period column.

Let’s assume you have a temporal table containing two visible period columns, StartDate and EndDate, which you’d like to hide from a typical SELECT statement.

Using an ALTER TABLE ... ALTER COLUMN statement, simply place the ADD HIDDEN syntax after the period column name(s).

ALTER TABLE [dbo].[Account] ALTER COLUMN [StartDate] ADD HIDDEN;
ALTER TABLE [dbo].[Account] ALTER COLUMN [EndDate] ADD HIDDEN;
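To illustrate the effect (a sketch assuming a hypothetical AccountID column on dbo.Account): once the period columns are hidden, SELECT * no longer returns them, but they are still available when referenced by name.

-- StartDate and EndDate are omitted from the results
SELECT * FROM [dbo].[Account];

-- The period columns still appear when named explicitly
SELECT [AccountID], [StartDate], [EndDate] FROM [dbo].[Account];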

You can also remove this flag if you wish, using DROP HIDDEN:

ALTER TABLE [dbo].[Account] ALTER COLUMN [StartDate] DROP HIDDEN;
ALTER TABLE [dbo].[Account] ALTER COLUMN [EndDate] DROP HIDDEN;

This is a great improvement to an already fantastic feature of SQL Server 2016. Thanks to Borko Novakovic for this tip.

If you have any more temporal table tricks you want to share, find me on Twitter at @bornsql.

Temporal Tables and History Retention

I’m a huge fan of Temporal Tables in SQL Server 2016. I first wrote about them in a four-part series in November 2015, before SQL Server 2016 was even released. I don’t always get this excited about new features.

However, it has some limitations. As part of this week’s T-SQL Tuesday, hosted by the attractive and humble Brent Ozar, I have discovered a Microsoft Connect item I feel very strongly about.

Adam Machanic, the creator of an indispensable tool, sp_WhoIsActive, has created a Connect item entitled Temporal Tables: Improve History Retention of Dropped Columns.

As my readers know, a temporal history table has to have the same schema as its base table (the number and order of columns, and their respective data types, have to match).

Where this breaks down is when the structure of the base table changes. The history table must change with it, which can mean losing historical data, or keeping redundant columns in the base table just to preserve their history.
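To make that concrete, here is a sketch reusing the dbo.Account table from the previous post, with a hypothetical AccountStatus column: dropping a column from a system-versioned table propagates the change to the history table, taking all of that column’s history with it.

-- The column, and its entire history, are gone
ALTER TABLE [dbo].[Account] DROP COLUMN [AccountStatus];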

Adam suggests allowing columns which no longer appear in the base table to be retained in the history table, marked as nullable (or hidden), and surfaced only when a point-in-time query refers to the column(s) explicitly.

I have voted for this suggestion, and at the time of writing, it has 16 upvotes. I encourage you to add your voice to this suggestion.

If you have any other suggestions, or wish to discuss temporal tables, please contact me on Twitter at @bornsql.

Updated Max Server Memory Script

Earlier this year I released a free T-SQL script that will calculate the correct amount of RAM you should allocate to SQL Server, assuming it is a standalone instance.

After attending the PASS Summit in Seattle in October, I visited the SQL Server Tiger team’s GitHub repository and discovered something similar, but not quite the same, in the Maintenance Solution folder.

I have taken the best ideas from their Database Server Options script and merged them into my Max Server Memory Calculator script.

New Features

The SQL Server thread stack is now taken into account. This value depends on the CPU architecture (32-bit, or x64 / IA64) and the maximum worker threads configured for the SQL Server instance.

On my 64-bit laptop with 16GB RAM, the new recommended amount for Max Server Memory has dropped from 11,264 MB to 10,112 MB (1,152 MB of RAM is now reserved for the thread stack).
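For context, here is a minimal sketch of the thread stack arithmetic. The stack size per worker thread is 512 KB on x86, 2 MB on x64 and 4 MB on IA64; the worker thread count comes from the instance. The commented values are assumptions matching my laptop, not something the script hard-codes.

DECLARE @MaxWorkerThreads INT =
(SELECT max_workers_count FROM sys.dm_os_sys_info); -- 576 on an 8-core x64 machine
DECLARE @StackSizeMB INT = 2; -- 2 MB per thread on x64

SELECT @MaxWorkerThreads * @StackSizeMB AS ThreadStackReservationMB; -- 1,152 MB here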

Improvements

By default, the generated script will enable show advanced options before trying to set the max server memory (MB) value.
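Roughly speaking, the generated script does something like the following (a sketch, not the script’s exact output; the memory value is the figure from my laptop above):

EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'max server memory (MB)', 10112;
RECONFIGURE;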

The @ProductVersion parameter uses a new method to calculate the major SQL Server version. Previously it was a hack based on the string returned by the @@VERSION function, but now it uses the @@MICROSOFTVERSION function.

This code is also from the Tiger team’s repository, and I’m sharing it here because I think it’s pretty clever how it works.

-- Get SQL Server Major Version
SELECT CONVERT(INT, (@@MICROSOFTVERSION / 0x1000000) & 0xFF);
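For the curious, here is a short sketch of how the packed value decomposes; the BuildNumber column is my own addition, not part of the Tiger team’s script. The high-order byte is the major version (13 for SQL Server 2016) and the two low-order bytes are the build number.

SELECT CONVERT(INT, (@@MICROSOFTVERSION / 0x1000000) & 0xFF) AS MajorVersion,
CONVERT(INT, @@MICROSOFTVERSION & 0xFFFF) AS BuildNumber;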

I have also added a note on the Max Server Memory Matrix page stating that the script now accounts for the thread stack.

I hope you enjoy this new version of the script. If you have any comments or suggestions, please contact me on Twitter at @bornsql.

Look, Ma, No Surprises

Last week I demonstrated at least a 30% performance improvement by switching to memory-optimised table-valued parameters on SQL Server 2016.

This week I will demonstrate the same test using Azure SQL Database, on the Premium tier, where In-Memory OLTP is supported.

My test harness is the same. I will create a temp table and a table variable:

-- Temp table creation
CREATE TABLE #temptable
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

-- Table variable creation
DECLARE @tablevariable AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

Now I will create a standard table-valued parameter:

CREATE TYPE dbo.TVPStandard AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

DECLARE @TempTVPStandard AS dbo.TVPStandard;

Finally, I will create a memory-optimized table-valued parameter (there is no requirement to separately enable In-Memory OLTP in Azure SQL Database):

CREATE TYPE dbo.TVPInMemory AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY NONCLUSTERED)
WITH (MEMORY_OPTIMIZED = ON);

DECLARE @TempTVPMemory AS dbo.TVPInMemory;

So far, nothing is different from last week’s scenario. We now have the following structures at our disposal:

  • #temptable (a temp table)
  • @tablevariable (a table variable)
  • @TempTVPStandard (a standard TVP)
  • @TempTVPMemory (a memory-optimized TVP)

I’m going to use the same WHILE loop again, but instead of a million runs, I’ll do 1,000, 10,000 and 100,000, because I’m paying for this instance of Azure SQL Database (I picked a Premium P1, with 125 DTUs) and I’m a cheapskate. I doubt 125 DTUs is even enough to run the loop a million times for the fourth option.

PRINT SYSUTCDATETIME();

SET NOCOUNT ON;

DECLARE @i INT = 0;
WHILE @i < <number of executions>
BEGIN
INSERT INTO <object name>
SELECT NEWID();
SELECT @i = @i + 1;
END;

SET NOCOUNT OFF;

PRINT SYSUTCDATETIME();

Results!

Like last week, at low row counts all four data structures performed at around the same speed. Tempdb is shared with other Azure SQL Database customers, so I expected the tempdb-backed structures to be slower.

I ran each series three times and took the lowest value from each run.

Data Structure                        1,000 rows   10,000 rows   100,000 rows   1,000,000 rows
Temp Table (Clustered)                94 ms        453 ms        4,266 ms       44,955 ms
Table Variable (Clustered)            93 ms        344 ms        3,484 ms       34,673 ms
Standard TVP (Clustered)              94 ms        343 ms        3,500 ms       34,610 ms
Memory-Optimized TVP (Non-Clustered)  78 ms        203 ms        1,797 ms       No Time

Unsurprisingly, because Azure SQL Databases share tempdb with other customers, the IO-bound figures are higher than those on my dedicated laptop, no matter how old it is.

The big winner here, again, is the memory-optimized table-valued parameter. More than twice as fast as a temp table, and almost twice as fast as a table variable or standard TVP.

Note, however, that because my Azure SQL Database was resource-constrained, I was unable to complete the one-million iteration WHILE loop for the memory-optimized TVP.

This is an excellent example of how In-Memory OLTP is a trade-off, no matter whether you’re running on premises or in the cloud. While you consistently get much better performance, memory-optimized structures are limited by available memory, so they are not suited to very large data sets.

If you have any In-Memory OLTP tweaks you’d like to share, find me on Twitter at @bornsql.

How to make your T-SQL thirty percent faster

Last week, I mentioned a feature in SQL Server, and now in Azure SQL Database, called Memory-Optimized Table-Valued Parameters, which makes use of In-Memory OLTP structures to work around a dependency on tempdb, and thereby drastically improve performance.

Short version: we see more than double the speed by switching from temp tables to Memory-Optimized TVPs, and a 30% speed improvement by switching from standard TVPs.

This week, I will compare the performance characteristics of the following three data structures:

  • temp tables (and table variables),
  • standard table-valued parameters, and
  • memory-optimised table-valued parameters.

These tests will be performed on my venerable Asus laptop, which is around four years old now, running SQL Server 2016 Developer Edition. It has 16GB of RAM, a 2.4GHz quad-core CPU, and an Intel 530-series SSD.

(Edit: With SQL Server 2016 Service Pack 1, you can now create Memory-Optimized objects in all editions of SQL Server.)

The test is very simple and makes use of a technique that one of my customers uses extensively: writing some rows to a temp table so that another process can reuse those values as parameters and perform an action, after which it destroys the temp table.

Unfortunately, the method my customer uses to populate a temp table does not work on Azure SQL Database, because they explicitly refer to temp tables with three-part naming (tempdb..#temptable), which is not permitted.

For the sake of the exercise, I will have a process that writes to each data structure, and compare the times.

What is a Table-Valued Parameter anyway?

At its heart, a Table-Valued Parameter is a way to pass more than one row of data around in T-SQL. Instead of having many individual parameters in a stored procedure, or passing around values in temp tables, you could simply pass in the TVP, which is defined as a table type.
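As a minimal sketch (hypothetical type and procedure names), this is what passing a TVP looks like end to end; note that TVPs must be declared READONLY in the procedure signature:

CREATE TYPE dbo.IDList AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);
GO

CREATE PROCEDURE dbo.ProcessIDs
@IDs dbo.IDList READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT COUNT(*) AS RowsReceived FROM @IDs;
END;
GO

-- Populate the TVP and pass it in
DECLARE @IDs dbo.IDList;
INSERT INTO @IDs SELECT NEWID();
INSERT INTO @IDs SELECT NEWID();
EXEC dbo.ProcessIDs @IDs = @IDs;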

Warning sounds should already be going off in your head, because, yes, TVPs are just table variables. This means that they don’t have associated statistics and cannot be indexed (apart from a primary key). In other words, you do not want to use TVPs to pass in large amounts of data. Performance is going to suck.

So what do we know so far? Table variables, temp tables, and standard table-valued parameters are pretty much equivalent, in that they are all backed by tempdb, which is persisted on disk, meaning we will be I/O-bound.

Memory-Optimized TVPs, by definition, are in-memory structures. Because they use the In-Memory OLTP engine, they rely on optimistic, multi-version concurrency control, with no locks or latches, which makes them fast. Access to memory-optimized structures is also compiled into native code, which makes them faster still.

Create the test objects!

Let’s start with a temp table. For your amusement, I will be creating both a temp table and a table variable.

I’m lumping these together because they’re almost the same thing under the covers for this scenario. The table is allocated in tempdb, the inserts are logged (I/O-bound), and if the query spills to disk for any reason (and there are a lot of reasons, including sorts), it incurs an I/O penalty in the data file too.

I was not going to use any indexes or primary keys here, to try and keep the performance comparable, but memory-optimized data structures require indexing, so I’ve added a primary key. As we know, adding a clustered index, common on temp tables, will make it slower because it has to sort the table on the clustering key.

To make this even simpler, my sample data will only comprise one column. I’ve picked a bad example, because developers love GUIDs, but SQL Server does not.

-- Temp table creation
CREATE TABLE #temptable
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

-- Table variable creation
DECLARE @tablevariable AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

Now we have two tables backed by tempdb. One of them has statistics, and the other is estimated by SQL Server to contain 1 row (or 100, depending on the version). Fantastic.

Let’s create a standard table-valued parameter with the same structure:

CREATE TYPE dbo.TVPStandard AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);

DECLARE @TempTVPStandard AS dbo.TVPStandard;

Now let’s create a memory-optimized table-valued parameter. Note that the major difference here is that we specify the MEMORY_OPTIMIZED state (there are some additional requirements for setting up In-Memory OLTP, which are outside of the scope of this post).

CREATE TYPE dbo.TVPInMemory AS TABLE
(ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY NONCLUSTERED)
WITH (MEMORY_OPTIMIZED = ON);

Note: I had to cheat here. You’ll notice that my previous data structures default to a clustered index on a primary key. Memory-Optimized Table-Valued Parameters cannot use a clustered index, but instead require a non-clustered index.

DECLARE @TempTVPMemory AS dbo.TVPInMemory;

We now have the following structures at our disposal:

  • #temptable (a temp table)
  • @tablevariable (a table variable)
  • @TempTVPStandard (a standard TVP)
  • @TempTVPMemory (a memory-optimized TVP)

Test the Performance!

For repeatable tests, I was going to use STATISTICS IO and STATISTICS TIME, and insert 100, then 1000, then 10,000 rows into each data structure.

I ran into a minor problem at first: I can’t use the GO <n> batch command to make things execute a lot of times, because table variables and TVPs are local only to the current batch. I had to rely on a WHILE loop.

After going through this a few times, I also couldn’t actually measure performance with STATISTICS TIME, because even my old laptop was too quick. It’s a common problem we see in performance tuning: small data structures are fast.

Instead, I captured a start and end time using SYSUTCDATETIME, and used a WHILE loop, inserting ten thousand, then a hundred thousand, and then a million rows into each structure. A contrived example, sure, but one necessary to demonstrate how performance differs with larger rowsets.

PRINT SYSUTCDATETIME();

SET NOCOUNT ON;

DECLARE @i INT = 0;
WHILE @i < <number of executions>
BEGIN
INSERT INTO <object name>
SELECT NEWID();
SELECT @i = @i + 1;
END;

SET NOCOUNT OFF;

PRINT SYSUTCDATETIME();
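To make the placeholders concrete, here is one instantiation of the template (a sketch: the temp table with 10,000 iterations; for the table variable and TVP runs, the DECLARE has to sit in the same batch as the loop):

PRINT SYSUTCDATETIME();

SET NOCOUNT ON;

DECLARE @i INT = 0;
WHILE @i < 10000
BEGIN
INSERT INTO #temptable
SELECT NEWID();
SELECT @i = @i + 1;
END;

SET NOCOUNT OFF;

PRINT SYSUTCDATETIME();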

Results

At low row counts, all four data structures performed around the same speed. Provided tempdb’s data and log files are correctly sized and don’t have to grow, it averages out.

Things got more interesting at 10,000 rows. I ran each series three times, and took the lowest value from each run.

Data Structure                        10,000 rows   100,000 rows   1,000,000 rows
Temp Table (Clustered)                291 ms        2,872 ms       29,110 ms
Table Variable (Clustered)            173 ms        1,708 ms       17,665 ms
Standard TVP (Clustered)              184 ms        1,694 ms       17,764 ms
Memory-Optimized TVP (Non-Clustered)  120 ms        1,174 ms       12,541 ms

Because of the type of query I’m running, the performance profile is linear. A million rows took 100x more time than ten thousand.

Unsurprisingly, the standard TVP and table variable performed the same, since they’re effectively the same data structure. Why the temp table is slower is an exercise for a later blog post.

(When all three disk-bound structures used Non-Clustered Primary Keys, they were significantly slower than with clustered indexes. Performance with clustered indexes, versus no primary key at all, was the same.)

Keep in mind that these are row-by-row operations. If we were populating these structures from a set-based operation, it would be a lot faster. However, this demonstrates clearly how much more efficient a memory-optimized table-valued parameter is compared to the other data structures.

Caveat: memory-optimized TVPs reside exclusively in memory. We have to remember that it’s possible to run out of memory when using these structures. The last thing we want to see is the following error message:

The statement has been terminated.
Msg 701, Level 17, State 109, Line 92
There is insufficient system memory in resource pool 'default' to run this query.

Memory-Optimized Table-Valued Parameters are at least 30% faster than any of the disk-bound data structures, and more than twice as fast as temp tables specifically, but we need to ensure they are used correctly.

For more reading, check out the MSDN article “Faster temp table and table variable by using memory optimization”.

Next week, I will present the findings for this same test, but performed on an Azure SQL Database on the Premium Tier (because that’s where In-Memory OLTP lives).

If you have any performance tricks you would like to share, please find me on Twitter at @bornsql.

Memory-Optimized Table-Valued Parameters

November brings me back to my regular weekly posting schedule.

I attended PASS Summit 2016 last week, and saw a lot of great sessions. Aside from the new stuff, what I also noticed is how people are making use of existing technology to solve performance problems.

I have a customer that is targeting their vendor app to run in Azure SQL Database. They’ve run the Advisor and have discovered a number of places where they make use of temp tables. While this is not in itself a problem, how they check for their existence is a problem.

Azure SQL Database does not allow you to refer to tempdb by name. Any T-SQL construct referring to a temp table using the form tempdb..#table is explicitly forbidden.
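In other words, the familiar existence check from earlier in this series is exactly the construct that trips them up, while the SQL Server 2016 syntax sidesteps it entirely (a sketch reusing the #doTheThing example):

-- Fails on Azure SQL Database: three-part reference into tempdb
IF OBJECT_ID('tempdb..#doTheThing') IS NOT NULL
DROP TABLE #doTheThing;

-- Works on both SQL Server 2016 and Azure SQL Database
DROP TABLE IF EXISTS #doTheThing;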

One of the workarounds I recommended to them was to replace their use of temp tables with TVPs (table-valued parameters). There is no performance benefit to doing so, but it works.

Then I remembered that Azure SQL Database Premium Tier now supports In-Memory OLTP.

If you recall, in a previous post I suggested that the Standard and Premium tiers are functionally equivalent to Standard Edition and Enterprise Edition respectively. This is no exception. You can create Memory-Optimized TVPs in the Premium tier of Azure SQL Database.

(Edit: With SQL Server 2016 Service Pack 1, you can now create Memory-Optimized objects in all editions of SQL Server.)

In other words, for these short-lived temp tables, there is not only the benefit of no longer having to worry about references to tempdb..#table, but also a massive performance improvement.

There are some legitimate concerns, though. Any Memory Optimized structures naturally require dedicated memory, so if you are already memory-bound on your Azure SQL Database (see the DTU post I wrote last month), this is going to hurt you. On the other hand, by not using tempdb at all, the performance improvement (as much as 100% in what I saw in the sessions last week) may offset that enough to warrant its use.

Next week, I will compare the performance characteristics of temp tables, standard TVPs, and Memory Optimised TVPs, both on-premises (using SQL Server 2016 Developer Edition, which is the same as Enterprise Edition) and Azure SQL Database (using a Premium Tier database).

If you have any other quick performance wins using new features of SQL Server 2016 or Azure SQL Database, find me on Twitter at @bornsql.

Automation is the new Black

I’ve been toying with the idea of automating a set of diagnostic scripts that I run on customer sites, when doing my checkup. It mirrors the automation that a lot of consulting shops do, but when I run them, I like to spend more time with the customer explaining what each script does and why it only forms a small part of a big picture.

That way, when I present my findings, a lot of the key points have already been covered, and it’s extremely effective (and comforting) to see a customer understand instantly what I’m referring to and ask questions relating directly to that issue.

Although I call that “face time”, it’s more accurately described as “side time” because the customer is sitting beside me, watching me run scripts and dump the results into Excel, talking a lot, speaking with my hands, and so on. Numbers start to blur and they stop caring about what is obviously a very important problem if I’m being paid to figure it out.

It does get a bit overwhelming for the customer, though, especially if they aren’t technically inclined. This is obviously not an efficient use of our time.

So I’m changing things up a little from my side. I’ve figured out how to automate the diagnostics in such a way that I can let them run for about 15 minutes in total (automation means I don’t have to run them in sequence; I can run them all at the same time with a random delay), and once they’re done, produce what is effectively a highlight reel.

Then I can use the rest of the hour to go through these results and explain, using graphs if necessary, the most interesting things I can see. Normally the customer has to wait as long as a day until I produce a 16- to 20-page document, but this way they can see things almost instantly, and if there’s something important to discuss, we can do it immediately.

My consulting style is very conversational (much like this blog), because I want my customer to understand, if not in the same detail as I do (though a significant percentage do), at least what to look for when troubleshooting performance problems and identifying issues with maintenance and disaster recovery planning.

For competitive reasons I can’t disclose my full methodology (I still have to eat!), but I thought I would share some consulting mindshare for a change.

A dalliance with distance

During a recent engagement, I was tasked with making a query faster. Since I signed an NDA (Non-Disclosure Agreement), I can’t go into detail, but it made extensive use of the STDistance() function of the GEOGRAPHY data type.

The query is being called many times a second and, owing to the way the application was designed, is effectively running in a cursor. For each row in a table that contains GPS coordinates, it has to calculate how far the principal GPS coordinates are from that row’s latitude and longitude.

As you can imagine, even with only a few hundred rows, this does not scale well. The database had hundreds of thousands of rows that needed to be calculated each time the query ran because these values are constantly changing.

The data was then being ordered on the STDistance result so that the nearest neighbour showed up first.
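To give a sense of the shape of that query, here is a hypothetical sketch (not the customer’s actual code) using the same table and columns as the rewritten query further down. The real table stored a persisted GEOGRAPHY column with a spatial index, whereas here the point is constructed inline for illustration.

DECLARE @Source GEOGRAPHY = GEOGRAPHY::Point(34.09833, -118.32583, 4326);

SELECT [ColPK], [Latitude], [Longitude],
@Source.STDistance(GEOGRAPHY::Point([Latitude], [Longitude], 4326)) AS [DistanceMetres]
FROM [InterestingTable]
ORDER BY [DistanceMetres];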

After some investigation, I discovered that the spatial index on the coordinates column was not set up correctly. However, we were unable to schedule a maintenance window to rebuild that index. Therefore, I was tasked to make the query faster without modifying any indexes.

To begin with, removing the ORDER BY clause made the query run much more quickly, even though it was running in a loop. After removing the looping structure and refactoring the query to be set-based, it ran more quickly still. No surprises there, but the result set was now much larger.

However, it was still causing blocking on the table where the coordinates were stored, and the returned data was unsorted.

Enter the great-circle distance. In principle, the shortest distance between any two points on the surface of a sphere can be calculated with a simple mathematical formula, known as the haversine formula. If there’s one thing I know about computers, it’s that they perform mathematical formulas really well.

The haversine formula is not simple to express in a T-SQL query. Instead, we can use the spherical law of cosines, which is not as accurate (there is a maximum drift of around 2 metres, or a 0.3% margin of error).

(cos c = cos a cos b + sin a sin b cos C)

The advantage of this cosine law is that it’s one line of code, which means it can be used as a drop-in replacement for STDistance.

The customer agreed that, for the purposes of this query, 0.3% was well within an acceptable error range. I helped the lead developer rewrite the query to replace the STDistance calculation with the cosine formula, which improved performance by a factor of 1,000 and did not need the spatial index at all.

In purely T-SQL terms, this looks as follows (where [Latitude] and [Longitude] are columns in the table I’m querying):

-- Latitude of source
DECLARE @Latitude FLOAT(53) = 34.09833;
-- Longitude of source
DECLARE @Longitude FLOAT(53) = -118.32583;
-- Mean radius of the earth, in miles
DECLARE @Radius FLOAT(53) = 3959;

SELECT [ColPK], [Latitude], [Longitude],
ACOS(COS(RADIANS(90 - @Latitude)) * COS(RADIANS(90 - [Latitude]))
+ SIN(RADIANS(90 - @Latitude)) * SIN(RADIANS(90 - [Latitude]))
* COS(RADIANS(@Longitude - [Longitude]))) * @Radius AS [Distance]
FROM [InterestingTable];

(Source: http://gis.stackexchange.com/a/40929)

Now of course this can be sorted as required. Because the distance is expressed in miles, we can also filter the results to only those rows that fall within a specific radius of the source, making the query even faster, as in the sketch below.
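For example, here is a sketch that keeps only the rows within a 25-mile radius (an arbitrary figure) and returns the nearest first, computing the expression once through CROSS APPLY:

DECLARE @Latitude FLOAT(53) = 34.09833;
DECLARE @Longitude FLOAT(53) = -118.32583;
DECLARE @Radius FLOAT(53) = 3959;
DECLARE @MaxDistanceMiles FLOAT(53) = 25;

SELECT t.[ColPK], t.[Latitude], t.[Longitude], d.[Distance]
FROM [InterestingTable] AS t
CROSS APPLY (SELECT ACOS(COS(RADIANS(90 - @Latitude)) * COS(RADIANS(90 - t.[Latitude]))
+ SIN(RADIANS(90 - @Latitude)) * SIN(RADIANS(90 - t.[Latitude]))
* COS(RADIANS(@Longitude - t.[Longitude]))) * @Radius AS [Distance]) AS d
WHERE d.[Distance] <= @MaxDistanceMiles
ORDER BY d.[Distance];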

If you have any neat geography data things to share, find me on Twitter at @bornsql.