How to write an UPDATE query

My First UPDATE Statement

Last week we covered how to put information into a table using an INSERT statement.

This week we will learn how to make changes to data that is already in a table using an UPDATE statement. We are also going to learn all about why the WHERE clause is so important.

UPDATE

Updating information stored in a database can be as simple as identifying which column to update and specifying the new data to write in that column:

UPDATE [dbo].[Stores]
SET [StoreName] = N'West Edmonton Mall';
GO

The problem with this particular UPDATE statement, though, is that it will update every row, when we may only want to change one record. That’s why we need to add WHERE. The purpose of a WHERE clause is to restrict the data modification using a filter or limit.

The WHERE clause

In ANSI SQL (remember, ANSI is the standards organisation that defines how SQL works with different database platforms), a WHERE clause goes somewhere near the end of a query, to filter or limit the data being affected.

The WHERE keyword can be used on all four basic data manipulation queries: SELECT, INSERT, UPDATE and DELETE. The reason we’re only learning about it today, is that it is a critical part of the UPDATE and DELETE statements.

Limiting the damage

An UPDATE statement without a WHERE condition, or clause, will update the entire table.

Consider these two examples, and imagine what is happening in the table:

-- Without a WHERE clause
UPDATE [dbo].[Stores]
SET [StoreName] = N'West Edmonton Mall';
GO

-- With a WHERE clause
UPDATE [dbo].[Stores]
SET [StoreName] = N'West Edmonton Mall'
WHERE [StoreID] = 2;
GO;

In the first example, every single row in our table will now have ‘West Edmonton Mall’ for the store name. In the second example, only the row (or rows) that match the condition in the WHERE clause will be updated.

Notice how the WHERE condition relates to a value in the [StoreID] column, which is this table’s Primary Key. When updating individual rows in a table, it’s always better to use a unique value to guarantee that we only update a single row.

We could use a WHERE condition on a different column, or combinations of columns, using AND and OR logic statements. WHERE clauses can be extremely complex.

Note: the UPDATE statement will update whatever we tell it to. Even though the original value of the column was ‘West Edmonton Mall’, the database engine will happily update that value again and again, to the same string, if we ask it to.

SET

Astute readers will recognise a new keyword in the UPDATE statement: SET.

The first part of an UPDATE specifies the table we wish to modify.

Then, the SET keyword specifies the column we want to change. We can add more than one column to our SET clause, separated by commas, if we have more than one column in the same row (or rows) that we wish to update.

For instance, let’s assume the West Edmonton Mall store has a new manager. We can modify the [ManagerName] and [ManagerEmail] columns at the same time.

UPDATE [dbo].[Stores]
SET [ManagerName] = N'Wesley Wensleydale',
[ManagerEmail] = N'wesley@example.com'
WHERE [StoreID] = 2;
GO;

This operation, where more than one thing is being changed at the same time, is called a set-based operation. In other words, a set of data is being updated at once, instead of writing an UPDATE statement for every single column.

Set-based operations can run on the entire table, or on a filtered set of data specified by the WHERE clause.

Ambiguity Verboten

Unlike the INSERT statement, where column names were implied, an UPDATE statement has to explicitly list the column (or columns) that we are updating. In our examples above, we had [StoreName], [ManagerName] and [ManagerEmail]. This is because there can be absolutely no ambiguity when modifying data.

This is also why a WHERE clause is so important. I have personally run an UPDATE or DELETE without adding a WHERE clause, and it happens a lot in this field. Make sure to add a WHERE keyword before writing the rest of the statement.

Some tools that plug into SQL Server Management Studio will detect if we have missed a WHERE clause on an UPDATE or DELETE, but we can’t always rely on plugins. For example, we might have to use a different computer one day, or write our SQL code in a text editor like Notepad, and only good habits can avoid disaster.

The Final Results

If you’re following along in SQL Server Management Studio, we can run the three-column UPDATE statement like so:

UPDATE [dbo].[Stores]
SET [StoreName] = N'West Edmonton Mall',
[ManagerName] = N'Wesley Wensleydale',
[ManagerEmail] = N'wesley@example.com'
WHERE [StoreID] = 2;
GO

Once we have executed the statement (using F5 on the keyboard, or clicking the Execute button in the toolbar), we see a message in the Messages pane:

(1 row(s) affected)

Using our recently-learned skill to SELECT the data from this table, we will see the new result set, containing the new row:

Congratulations! We have modified a row of data in our table, and a SELECT query proved that it was inserted.

Next time, we will be removing data from a table using the DELETE command. Stay tuned.

Look me up on Twitter if you want to ask any questions about this series, at @bornsql.

How to write an INSERT query

My First INSERT Statement

Last week we covered how to get information out of a table, using a SELECT query.

This week, we will discover some of the myriad ways to put data into a table.

The good news is the concept is straightforward: we have a list of columns in a table, and each column has a datatype. We will insert a row into the table, according to the column order and datatype.

In reality, inserting data into a table is fraught with complexity.

Using our Stores table from before, the simplest way to write an INSERT statement is as follows:

INSERT INTO [dbo].[Stores]
VALUES (NULL,
N'West Edmonton Mall',
N'8882-170 Street, Edmonton, AB, T5T 4M2',
N'Stephanie West', N'stephanie@example.com');

Note: Remember that our store’s name and address, and manager’s name and email address, are all stored as Unicode (NVARCHAR), so the string has to be prefixed with an N. This guarantees that whatever is between the quotation marks is Unicode already and won’t be converted behind the scenes and potentially cause problems.

Ambiguity with column order

Our first problem is that it’s not clear which columns are being inserted into, nor their order.

Without a list of columns in the INSERT statement, the database engine will insert in the order provided, into whatever columns are on the table, and it might even do implicit conversions on data types behind the scenes.

It is therefore good practice to include the column names when writing an INSERT statement, like so:

INSERT INTO [dbo].[Stores] (
[StoreCode],
[StoreName],
[Address],
[ManagerName],
[ManagerEmail]
)
VALUES (
NULL,
N'West Edmonton Mall',
N'8882-170 Street, Edmonton, AB, T5T 4M2',
N'Stephanie West', N'stephanie@example.com'
);

Now we are sure that the column order is correct, and even if the table somehow has a different column order (for instance, the ManagerEmail and ManagerName are swapped around), this statement will succeed.

Where’s the StoreID column?

The astute reader has noticed there is no reference to the StoreID column, which happens to be the Primary Key for this table.

This is one of the ways a lot of accidental DBAs (and even experienced DBAs) get tripped up. If we think back to the table definition, we used an IDENTITY value.

To refresh our memories, an IDENTITY value is an auto-incrementing integer value, generated by the database engine, in order to ensure that the Primary Key is unique.

Note: It is because the IDENTITY is set that this column is being excluded from the INSERT statement above. Primary Keys which don’t have an IDENTITY set must be included in INSERT statements, provided that the value(s) being inserted will be unique.

NULL Values

We spoke about null values in the beginning of the series, and in this example we can see how to insert a NULL value into a table, provided the column definition allows nulls in the first place. We happen to know that the StoreCode is a nullable column.

Default Values

It is possible to exclude columns that have default values on them. If we think about it, an IDENTITY column is a column that has a default value assigned to it, which just happens to be an auto-incrementing integer.

We might want to have columns that have a default value of the current date and time. This convention is used when auditing database events by adding a CreatedDate column (for example) that defaults to the current date and time, using DATETIME2(7), when a row is inserted.

Another example might be to use a default value of 0 in a bit column and update that value later on.

In these cases, columns with default values can be excluded from the INSERT statement, because the database engine will automatically put the default value into that column.

That being said, there’s nothing stopping us from using a different value for a column that has a default value. If we have a default value on a column, like our DATETIME2(7) example, we could override that default value as long as we include the column and a new value in the INSERT statement.

Adding a column to the table

What happens if, during the course of regular business, a column is added to the Stores table? Unless that column has a default value, both examples of the INSERT statement above will fail.

The final results

If you’re following along in SQL Server Management Studio, we can run the INSERT statement like so:

INSERT INTO [dbo].[Stores] (
[StoreCode],
[StoreName],
[Address],
[ManagerName],
[ManagerEmail]
)
VALUES (
NULL,
N'West Edmonton Mall',
N'8882-170 Street, Edmonton, AB, T5T 4M2',
N'Stephanie West', N'stephanie@example.com'
);

Once we have executed the statement (using F5 on the keyboard, or clicking the Execute button in the toolbar), we see a message in the Messages pane:

(1 row(s) affected)

Using our recently-learned skill to SELECT the data from this table, we will see the new result set, containing the new row:

Congratulations! We have put a new row of data into a table, and a SELECT query proved that it was inserted.

Notice that the NULL value has a different background colour to the rest of the data. This is a way to distinguish actual null values from string columns that might just have the word “NULL” stored there.

Next time, we will be updating data in a table using the UPDATE command. Stay tuned.

Look me up on Twitter if you want to ask any questions about this series, on @bornsql.

How to write a SELECT query

My First SELECT Statement

Microsoft SQL Server makes it really easy for us to query tables. In SQL Server Management Studio (SSMS) for instance, we can right-click on any table we have access to and select the top 1000 rows from that table.

Don’t do this

Please don’t query tables this way in a production environment. It’s a bad way to do it, and you should feel bad.

Writing a SELECT statement should be done manually, the way Ada Lovelace and Grace Hopper intended.

Let’s assume we want to get a list of the stores in our database that we created in the First Look at Normalization post.

The table had the following columns: StoreID, StoreCode, StoreName, Address, ManagerName, and ManagerEmail.

To get a list of all rows and all columns in a table, we write the following statement:

SELECT
[StoreID],
[StoreCode],
[StoreName],
[Address],
[ManagerName],
[ManagerEmail]
FROM
[dbo].[Stores];

Remember from previous posts that the square brackets are a matter of style, and we could just as easily exclude them.

I leave them in because humans are terrible at naming things, and the square brackets make the code less likely to fail.

You’ll notice that there is a semi-colon at the end of the statement. We could have placed the entire statement on one line, like so:

SELECT [StoreID], [StoreCode], [StoreName], [Address], [ManagerName], [ManagerEmail] FROM [dbo].[Stores];

This is more difficult to read. SQL Server doesn’t really care about white space, including carriage returns, so feel free to format your code nicely so that it’s easy for you to read.

If you’re typing this in Management Studio, you can now press the Execute button in the menu, or the F5 key on the keyboard, to run the command.

Tip: in Management Studio, we can select just the text (T-SQL code) we want to run, using the mouse, and then press F5. This will guarantee that no other part of the script will run as well. It’s a useful way to run portions of code in a longer script.

Once we run the SELECT statement, we see a result set.

Click to enlarge

Congratulations! We have asked SQL Server for data, and it has responded with the data we asked for.

Next time, we will be adding data to a table using the INSERT command. Stay tuned.

Look me up on Twitter if you want to ask any questions about this series, on @bornsql.

Querying a Database

When we want to retrieve information from a database, we query the structure with language appropriate to the database.

Remember right at the start of this series we saw that a database could be a phone book or a recipe book. So how do we find the phone number of Randolph West? By looking up the surnames starting with W, and going through all the Wests in alphabetical order until we get to that entry.

The same goes for finding the recipe for lemon meringue in a recipe book. Start at the index at the back, look through the desserts, and then search for meringue.

In a RDBMS (relational database management system), the language for querying data is called Structured Query Language, or SQL. We can pronounce it like “sequel”, or sound out each letter.

SQL Server is commonly pronounced “Sequel Server”. MySQL is pronounced “My-S-Q-L”, sounding out the letters (some people pronounce it “My-Sequel”). It all depends on who’s saying it. PostgreSQL is just pronounced “Postgres”, because seriously.

These, along with Oracle, are the major players in the RDBMS industry.

ANSI SQL

When it comes to putting information into a database server, and getting information out, we can write queries that look very similar across platforms. This is thanks to a standards body called ANSI (American National Standards Institute), which (with proposals from each vendor) has suggested ANSI SQL syntax that all vendors should use.

For the sake of this series, we will assume that the vendors do follow the standard exactly, but in reality it isn’t that simple.

Putting data in: INSERT

Adding new data to a database is performed using an INSERT operation.

Changing data: UPDATE

Modifying existing data is done with an UPDATE operation.

Getting rid of data: DELETE

Removing rows from a table is performed with a DELETE operation.

Getting data out: SELECT

The vast majority of operations in a database has to do with data retrieval. To get data out, we SELECT it.

CRUD

In technical circles, between software developers and database developers, we might refer to these four operations using the mnemonic CRUD, which stands for Create, Read, Update, Delete.

When referring to a specific database or application, it could mean that the database is just being used as a data store (a virtual box of random stuff) and may not have proper relationships between the tables, nor be normalized.

There’s nothing necessarily evil about denormalized data, because the application code may handle that intelligence. Be wary though. Using an RDBMS to store non-relational data might cause headaches.

Next time we will write our first SELECT statement to query a database. Stay tuned!

Look me up on Twitter if you want to ask any questions about this series, on @bornsql.

So, like, what is a byte?

A friend of mine in the filmmaking business, who is exceedingly bright but has never worked with SQL Server before, was reading through the first five posts of this Database Fundamentals series, and asked a great question:

“I guess I’m not understanding what a byte is. I think I’m circling the drain in understanding it, but not floating down.”

She has a way with words.

I answered her immediately, but it reminded me that I did get a little carried away with data types, assuming that everyone reading that post would understand what a byte is.

In the innards of the computer is the CPU, or Central Processing Unit (there might be more than one in a server). The CPU is best described as a hot mess of on-off switches. Just as it is in your house, a switch only has two states.

David Hasselhoff, SQL Server DBA

This is what “binary” means. When the CPU clock ticks over, billions of times per second, if a switch is closed, it’s a 1 (electricity can flow to complete the circuit). If the switch is open, it’s a 0 (electricity cannot pass through it).

(Source: https://diytechpro.com/electric-circuit-simple-concept/)

The CPU (and memory, and storage system, and network) understand binary, and the software that sits on top of it uses binary as well.

We end up with a series of 1s and 0s that, when arranged in different combinations, represent information in some form or another. Each of these is a binary digit, or bit.

Through a series of decisions in the old days of computing, when we stick eight of these bits of data together, they form a byte.

Now comes the mathematical part of today’s post.

If we have 8 digits that can store two values each, we get a total of 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 combinations. This is more easily typed as 2^8, or 256. In other words, a byte can store a maximum of 256 values.

Here’s a short list of bytes to give you an example (I have not listed every one of the 256 possibilities). We write the bits in groups of four to make them easier to read.

Binary Decimal ASCII Binary Decimal ASCII
0010 0000 32 <space> 1000 0001 129 Å
0010 0001 33 ! 1000 0010 130 Ç
0010 0010 34 1000 0011 131 É
0010 0011 35 # 1000 0100 132 Ñ
0010 0100 36 $ 1000 0101 133 Ö
0010 0101 37 % 1000 0110 134 Ü
0010 0110 38 & 1000 0111 135 á
0010 0111 39 1000 1000 136 à
0010 1000 40 ( 1000 1001 137 â
0010 1001 41 ) 1000 1010 138 ä
0010 1010 42 * 1000 1011 139 ã
0010 1011 43 + 1000 1100 140 å
0010 1100 44 , 1000 1101 141 ç
0010 1101 45 1000 1110 142 é
0010 1110 46 . 1000 1111 143 è
0010 1111 47 / 1001 0000 144 ê
0011 0000 48 0 1001 0001 145 ë
0011 0001 49 1 1001 0010 146 í
0011 0010 50 2 1001 0011 147 ì
0011 0011 51 3 1001 0100 148 î
0011 0100 52 4 1001 0101 149 ï
0011 0101 53 5 1001 0110 150 ñ
0011 0110 54 6 1001 0111 151 ó
0011 0111 55 7 1001 1000 152 ò
0011 1000 56 8 1001 1001 153 ô
0011 1001 57 9 1001 1010 154 ö
0011 1010 58 : 1001 1011 155 õ
0011 1011 59 ; 1001 1100 156 ú
0011 1100 60 < 1001 1101 157 ù
0011 1101 61 = 1001 1110 158 û
0011 1110 62 > 1001 1111 159 ü
0011 1111 63 ? 1010 0000 160
0100 0000 64 @ 1010 0001 161 °
0100 0001 65 A 1010 0010 162 ¢
0100 0010 66 B 1010 0011 163 £
0100 0011 67 C 1010 0100 164 §
0100 0100 68 D 1010 0101 165
0100 0101 69 E 1010 0110 166
0100 0110 70 F 1010 0111 167 ß
0100 0111 71 G 1010 1000 168 ®
0100 1000 72 H 1010 1001 169 ©
0100 1001 73 I 1010 1010 170
0100 1010 74 J 1010 1011 171 ´
0100 1011 75 K 1010 1100 172 ¨
0100 1100 76 L 1010 1101 173
0100 1101 77 M 1010 1110 174 Æ
0100 1110 78 N 1010 1111 175 Ø
0100 1111 79 O 1011 0000 176
0101 0000 80 P 1011 0001 177 ±
0101 0001 81 Q 1011 0010 178
0101 0010 82 R 1011 0011 179
0101 0011 83 S 1011 0100 180 ¥
0101 0100 84 T 1011 0101 181 µ
0101 0101 85 U 1011 0110 182
0101 0110 86 V 1011 0111 183
0101 0111 87 W 1011 1000 184
0101 1000 88 X 1011 1001 185 π
0101 1001 89 Y 1011 1010 186
0101 1010 90 Z 1011 1011 187 ª
0101 1011 91 [ 1011 1100 188 º
0101 1100 92 \ 1011 1101 189 Ω
0101 1101 93 ] 1011 1110 190 æ
0101 1110 94 ^ 1011 1111 191 ø
0101 1111 95 _ 1100 0000 192 ¿
0110 0000 96 ` 1100 0001 193 ¡
0110 0001 97 a 1100 0010 194 ¬
0110 0010 98 b 1100 0011 195
0110 0011 99 c 1100 0100 196 ƒ
0110 0100 100 d 1100 0101 197
0110 0101 101 e 1100 0110 198
0110 0110 102 f 1100 0111 199 «
0110 0111 103 g 1100 1000 200 »
0110 1000 104 h 1100 1001 201
0110 1001 105 i 1100 1010 202
0110 1010 106 j 1100 1011 203 À
0110 1011 107 k 1100 1100 204 Ã
0110 1100 108 l 1100 1101 205 Õ
0110 1101 109 m 1100 1110 206 Œ
0110 1110 110 n 1100 1111 207 œ
0110 1111 111 o 1101 0000 208
0111 0000 112 p 1101 0001 209
0111 0001 113 q 1101 0010 210
0111 0010 114 r 1101 0011 211
0111 0011 115 s 1101 0100 212
0111 0100 116 t 1101 0101 213
0111 0101 117 u 1101 0110 214 ÷
0111 0110 118 v 1101 0111 215
0111 0111 119 w 1101 1000 216 ÿ
0111 1000 120 x 1101 1001 217 Ÿ
0111 1001 121 y 1101 1010 218
0111 1010 122 z 1101 1011 219
0111 1011 123 { 1101 1100 220
0111 1100 124 | 1101 1101 221
0111 1101 125 } 1101 1110 222
0111 1110 126 ~ 1101 1111 223
0111 1111 127 1110 0000 224
1000 0000 128 Ä 1110 0001 225 ·

There are values missing from the above table, for characters that cannot be displayed correctly in a web browser. For a complete table showing all 256 characters, visit PC Guide.com.

How does this affect Unicode values? If you remember in our post about CHAR, NCHAR, VARCHAR and NVARCHAR data types, we discovered that the Unicode versions (those types starting with N) will use two bytes in memory and on disk to store a single character, compared to the non-Unicode (sometimes called ASCII or plain text) data types, which use only one byte per character.

The high-level reason for this is that some alphabets have more than 256 characters, so the code page (the full set of characters in upper- and lower-case where applicable, plus all the numbers, punctuation marks, and so forth) won’t fit in the 256 possibilities available in a single byte.

When we stick two bytes together however, we suddenly have as many as 2^16 values that we can store, for a total of 65,536 possibilities. This is mostly good enough if you’re not storing Japanese in SQL Server.

There are exceptions to this, where some kanji takes up four bytes per character. This is known as UTF-32 (Unicode Transformation Format, 32 bits per character). The good news is, SQL Server does support multi-byte characters wider than standard (UTF-16) Unicode, as long as we pick the correct collation.

I hope this answers any burning questions you may have had about bits and bytes.

Feel free to reach out to me on Twitter at @bornsql.

Max Server Memory and SQL Server 2016 Service Pack 1

Everything changed for SQL Server Standard Edition on 16 November 2016, and how memory limits work.

On that day, a slew of Enterprise Edition features made their way into editions across the board, including Express Edition and LocalDB.

The memory limit of 128GB RAM applies only to the buffer pool (the 8KB data pages that are read from disk into memory — in other words, the database itself).

For servers containing more than 128GB of physical RAM, and running SQL Server 2016 with Service Pack 1 or higher, we now have options.

The max server memory setting always did only refer to the buffer pool, but for many reasons there was misunderstanding from a lot of people that it included other caches as well.

Because ColumnStore and In-Memory OLTP have their own cache limits over and above the 128GB buffer pool limit, the guidance around assigning max server memory is no longer simple.

ColumnStore now gets an extra 32GB of RAM per instance, while In-Memory OLTP gets an extra 32GB of RAM per database.

With that in mind, you are still welcome to use the Max Server Memory Matrix and associated calculator script for lower versions of SQL Server (up to and including 2014), but I will not be maintaining it further, unless someone finds a bug.

How much should I assign to max server memory? It depends.

It would be very easy to spec a server with 256GB RAM, install a single instance of SQL Server 2016 Standard Edition (with Service Pack 1, of course), have 128GB for the buffer pool, 32GB for the ColumnStore cache, three databases with 32GB of RAM each for In-Memory OLTP, and still run out of memory.

This is a brave new world of memory management. Tread carefully.

If you’d like to share your thoughts, find me on Twitter at @bornsql.

Temporal Tables and Hidden Period Columns

In my November 2015 post, An Introduction to Temporal Tables in SQL Server 2016 using a DeLorean, I wrote:

The HIDDEN property is optional and will hide these columns from a standard SELECT statement for backward compatibility with our application and queries. You cannot apply the HIDDEN property to an existing column.

It turns out that this is no longer true. You can apply the HIDDEN property to an existing period column.

Let’s assume you have a temporal table containing two visible period columns, StartDate and EndDate, which you’d like to hide from a typical SELECT statement.

Using an ALTER TABLE ... ALTER COLUMN statement, simply place the ADD HIDDEN syntax after the period column name(s).

ALTER TABLE [dbo].[Account] ALTER COLUMN [StartDate] ADD HIDDEN;
ALTER TABLE [dbo].[Account] ALTER COLUMN [EndDate] ADD HIDDEN;

You can also remove this flag if you wish, using DROP HIDDEN:

ALTER TABLE [dbo].[Account] ALTER COLUMN [StartDate] DROP HIDDEN;
ALTER TABLE [dbo].[Account] ALTER COLUMN [EndDate] DROP HIDDEN;

This is a great improvement to an already fantastic feature of SQL Server 2016. Thanks to Borko Novakovic for this tip.

If you have any more temporal table tricks you want to share, find me on Twitter at @bornsql.

My surname is NULL

Last Wednesday on Twitter, Abayomi Obawomiye (@SQLAmerica) wrote:

I just met someone with the last name NULL today. I really want to ask if they had issues with the last name but worried might be offensive

Abayomi goes on further to say that the word “NULL” gets replaced with a space in any CSV files generated for reports, which got me thinking about this problem.

Then I realised I had already written about it. We make assumptions on a daily basis when it comes to working with data. In this case, Abayomi assumed that no one’s name could ever be “Null”.

Yes, I’ve made that assumption too.

This sparked a memory in the deep and rusty recesses of my brain. I remember reading an article about someone whose last name is Null. It took me just a few seconds to find the article on Wired Magazine’s website, and the pullout phrase is right in the headline:

Hello, I’m Mr. Null. My Name Makes Me Invisible to Computers

It turns out that Abayomi is not alone with this chance meeting. How many of us consider the possibility that legitimate data in our databases could match reserved keywords?

The author of that article, Christopher Null, has an amusing workaround:

My usual trick is to simply add a period to my name: “Null.” This not only gets around many “null” error blocks, it also adds a sense of finality to my birthright.

In my goal to build systems that are easier to use, that should adapt to the consumer, this is clearly problematic. We shouldn’t be forcing Christopher to change his own name to fit our database model.

How would I go about dealing with this problem? Clearly when exporting data to CSV or generating extracts for other other third-party use, this problem has to be solved.

Delimiters are Good

The easiest way I can think of is for data types to be delimited correctly. When exporting to CSV format, we would ensure that all strings be surrounded by quotation marks. This way, Christopher Null’s record would show up as follows (assuming last name, first name ordering):

"Null","Christopher"

Then the parsing process on the reporting side would import that into whichever system without stumbling over the value.

Another issue raised by Christopher in his Wired article is that his email address is rejected outright by Bank of America because it contains the word null before the @ sign.

First and Last Name columns are bad

I’m going to call myself out on the above example and say that assuming names can fit neatly into two columns is bad. Firstly, Cher and Madonna would complain.

Secondly, Vice Admiral Horatio Nelson, 1st Viscount Nelson, 1st Duke of Bronté, would run out of space before his name could be entered correctly.

This doesn’t even begin to address names that are outside of our mainly Western perspective. In South Africa, I saw names that few developers in North America would consider, making use of all manner of punctuation (including spaces).

My recommendation here is to have a 250-character NVARCHAR column called FullName and leave it at that. If you really want your mail merge software to send “personalised” emails or letters, add an additional PreferredName column, and make sure it’s properly delimited when exporting.

Christopher Null wouldn’t have to bat an eyelid here. It would solve for his problem.

(While I’m on this matter, don’t ask for Gender or Sex. You don’t need it. If you do, you’ll know how to ask properly.)

Email Validation is Stupid

My third rule for ensuring that Nulls can exist in this world with Wests and Obawomiyes is to stop validating email addresses!

Firstly, most developers use a regular expression to do it, which is wrong and terrible and slow, and (clearly) imperfect.

This is a particular bugbear of mine, since I have one character before the @ in my personal email address. Many systems fail when trying to validate it, including WestJet. My own workaround is to create a westjet@ alias so that I can receive email from WestJet.

The best way to validate an email address is to send an email to that address with a link for someone to click on. If they don’t click on the link to close the loop, the email address isn’t valid, and your well-designed system won’t allow them to continue to the next step unless that link is clicked.

Summary

Three simple recommendations could help the Nulls find visibility again and ensure data inserted into a system is not modified unnecessarily.

For more reading on this and similar problems, I recommend visiting Bobby Tables.

If you have any experience in this area, please feel free to contact me on Twitter, at @bornsql .

Temporal Tables and History Retention

I’m a huge fan of Temporal Tables in SQL Server 2016. I first wrote about them, in a four-part series in November 2015, before SQL Server was even released. I don’t always get this excited about new features.

However, it has some limitations. As part of this week’s T-SQL Tuesday, hosted by the attractive and humble Brent Ozar, I have discovered a Microsoft Connect item I feel very strongly about.

Adam Machanic, the creator of an indispensable tool, sp_WhoIsActive, has created a Connect item entitled Temporal Tables: Improve History Retention of Dropped Columns.

As my readers know, temporal tables have to have the same schema as their base tables (the number and order of columns, and their respective data types, have to match).

Where this breaks down is when a table structure has changed on the base table. The history table also needs to take those changes into account, which could potentially result in data loss or redundant columns in the base table.

Adam suggests allowing columns which no longer appear in the base table to be retained in the history table and marked as nullable (or hidden), and should only appear when performing a point-in-time query by referring to the column(s) explicitly.

I have voted for this suggestion, and at the time of writing, it has 16 upvotes. I encourage you to add your voice to this suggestion.

If you have any other suggestions, or wish to discuss temporal tables, please contact me on Twitter at @bornsql .

Wait For Service Pack 1

Conventional wisdom tells us that when Microsoft releases a new version of any server product, we should wait until Service Pack 1 before deploying it to production.

This hasn’t been true for a while now, since Microsoft recommended that Cumulative Updates for SQL Server carry the same confidence:

SQL Server CUs are certified to the same levels as Service Packs, and should be installed at the same level of confidence.

However, Service Pack 1 itself has been mired in some controversy. Microsoft didn’t make things any easier for us with SQL Server 2012, or 2014. Both versions had issues with their respective Service Pack 1.

Fortunately, SQL Server 2016 has broken the cycle, and along with all of the fixes in Cumulative Updates 1–3, and a security fix, we get better feature parity between Enterprise Edition and lower editions, including Standard, Web, Express and LocalDB.

There are some restrictions, of course, but the idea is that developers and vendors can write T-SQL for features that now appear across the board.

SQL Server 2016 Service Pack 1 now includes the following features for Enterprise, Standard, and Web Edition:

  • In-Memory OLTP
  • In-Memory Columnstore
  • Always Encrypted
  • Partitioning
  • Data Compression
  • Change data capture
  • And more!

If you want to take advantage of these features, but you use an older version of SQL Server, you will need to upgrade to SQL Server 2016 with Service Pack 1, but I think this is a no-brainer.

The good news is licences have cost the same since SQL Server 2012, and Standard Edition is almost a quarter of the price of Enterprise Edition.

I maintain that this is the most exciting release of SQL Server since 2005. If you want to upgrade, contact us and we’ll help. We will even upgrade you from SQL Server 6.5.