Don’t do these things in SQL Server

teacher at whiteboard

Recently Brent Ozar posted a link to the PostgreSQL “Don’t do this” page, which I am shamelessly reproducing below, re-tailored for a SQL Server audience.


Don’t use -P with sqlcmd

sqlcmd is a cross-platform interactive command-line utility that allows you to connect to a SQL Server (or Azure SQL Database) instance and perform database operations. One of the parameters is -P which allows you to supply a password.

Why not?

If sqlcmd requires a password at the command-line, it will ask for one. Generally speaking we should be using Active Directory authentication (which is supported on Windows and Linux), so specifying a command-line password is redundant in that case. As well, if we have a password hard-coded into a script, we’re doing it wrong.

When should we?

Never. It’s not worth it to save a network round-trip to the server.

 

Don’t use NOT IN

It’s a common practice to use the IN condition in a WHERE clause, so that we can limit our search arguments to specific values. However using NOT IN can have some unexpected results for the same reasons that we see in PostgreSQL.

Why not?

The first reason has to do with NULL values. Let’s use the same example as the PostgreSQL page:

This type of query will always return zero rows. It is impossible to know if something is NOT NULL because NULL is always unknown, thus NOT (NULL) returns NULL. The WHERE condition will never resolve to a valid value in the above example. Since col cannot be evaluated, the query optimizer simplifies that condition to a Constant Scan against an empty internal table.

The second reason why this is a bad idea is because it makes the query optimizer pick a really bad plan. Consider this example:

If one of the sub-query values is a NULL, this query will return zero rows as noted above. More importantly and perhaps surprisingly, the execution plan will use a Hash Match (Right Anti Semi Join). On smaller sets this won’t be noticeable, but it will matter a lot when you have more than a few thousand rows and your performance tanks, as a Hash Match is a very expensive join operator.

When should we?

If we use NOT IN, we should ensure our list of elements doesn’t contain a NULL value, and we should also avoid using a sub-query in the WHERE clause to avoid that blocking Hash Match.

 

Don’t use TIMESTAMP

We covered this in detail in a previous post, What about TIMESTAMP? It’s better to pretend that this data type doesn’t exist.

Why not?

It is not what you think it is. TIMESTAMP is actually a row version value based on the amount of time since SQL Server was started. If you need to record an actual date and time, use DATETIME2 instead.

When should we?

Never.

 

Don’t use SMALLMONEY, FLOAT or REAL for money

If we need to store values that represent amounts of money (or any values that require accuracy), avoid SMALLMONEY, FLOAT and REAL data types.

Why not?

In SQL Server and Azure SQL Database, the SMALLMONEY data type has a maximum range of -214,748.3648 to 214,748.3647. It’s clearly inappropriate for larger amounts. It may be tempting to use the MONEY data type instead, but be wary of rounding when performing calculations, since MONEY only stores four decimal places.

Speaking of rounding during calculations, FLOAT and REAL are terrible choices for calculations on any value that require accurate results. Floating point arithmetic will result in lower accuracy (sometimes significant), and on financial values this can cause incorrect results.

When should we?

Any time we have to deal with money, use the DECIMAL data type with an appropriate scale. In a pinch we could use the MONEY data type, but certainly never SMALLMONEY, FLOAT or REAL for money.

 

Don’t use BETWEEN

This is an easy one to get wrong, especially with dates.

Why not?

BETWEEN uses a closed-interval comparison: the values of both ends of the specified range are included in the result. That might sound like what we’d want, but consider the following example where we want to get all the values in the next week:

This will include results up to where the date and time is exactly 2018-08-08 00:00:00.0000000, but not dates and times later in that same day. We may not actually want any events that might have occurred at exactly midnight on the second value.

Instead, do:
SELECT * FROM table WHERE datecol >= '2018-08-01' AND datecol < '2018-08-08';
With this construction, we are getting all the values up to but not including midnight on the morning of August 8th.

When should we?

BETWEEN is safe for discrete quantities like integers or dates, as long as you remember that both ends of the range are included in the result. It is better to use combinations of >, = and < operators instead.

 

Summary

SQL Server and PostgreSQL have a lot in common, especially when it comes to what not to do. I’d love to hear your own rules for either platform, with a focus on SQL Server of course. Sound out in the comments, or find me on Twitter at @bornsql.

Photo by rawpixel on Unsplash.

7 thoughts on “Don’t do these things in SQL Server

  • I don’t think TIMESTAMP was intended for that use. We use it as a simple way to filter ETL processes. If the source and destination timestamps do not match then the source row changed. (Destination timestamp in this case is the one copied from, the source when the row was added to/modified on the destination.) Even if the destination row was updated by hand this test forces the row to come back to matching the true source. No date math or having to consider time (like your BETWEEN comments).

    • TIMESTAMP is the wrong name for the data type, per Microsoft’s own guidance. Call it ROWVERSION. Otherwise your comment is perfectly valid.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: