My surname is NULL

Last Wednesday on Twitter, Abayomi Obawomiye (@SQLAmerica) wrote:

I just met someone with the last name NULL today. I really want to ask if they had issues with the last name but worried might be offensive

Abayomi goes on further to say that the word “NULL” gets replaced with a space in any CSV files generated for reports, which got me thinking about this problem.

Then I realised I had already written about it. We make assumptions on a daily basis when it comes to working with data. In this case, Abayomi assumed that no one’s name could ever be “Null”.

Yes, I’ve made that assumption too.

This sparked a memory in the deep and rusty recesses of my brain. I remember reading an article about someone whose last name is Null. It took me just a few seconds to find the article on Wired Magazine’s website, and the pullout phrase is right in the headline:

Hello, I’m Mr. Null. My Name Makes Me Invisible to Computers

It turns out that Abayomi is not alone with this chance meeting. How many of us consider the possibility that legitimate data in our databases could match reserved keywords?

The author of that article, Christopher Null, has an amusing workaround:

My usual trick is to simply add a period to my name: “Null.” This not only gets around many “null” error blocks, it also adds a sense of finality to my birthright.

In my goal to build systems that are easier to use, that should adapt to the consumer, this is clearly problematic. We shouldn’t be forcing Christopher to change his own name to fit our database model.

How would I go about dealing with this problem? Clearly when exporting data to CSV or generating extracts for other other third-party use, this problem has to be solved.

Delimiters are Good

The easiest way I can think of is for data types to be delimited correctly. When exporting to CSV format, we would ensure that all strings be surrounded by quotation marks. This way, Christopher Null’s record would show up as follows (assuming last name, first name ordering):

"Null","Christopher"

Then the parsing process on the reporting side would import that into whichever system without stumbling over the value.

Another issue raised by Christopher in his Wired article is that his email address is rejected outright by Bank of America because it contains the word null before the @ sign.

First and Last Name columns are bad

I’m going to call myself out on the above example and say that assuming names can fit neatly into two columns is bad. Firstly, Cher and Madonna would complain.

Secondly, Vice Admiral Horatio Nelson, 1st Viscount Nelson, 1st Duke of Bronté, would run out of space before his name could be entered correctly.

This doesn’t even begin to address names that are outside of our mainly Western perspective. In South Africa, I saw names that few developers in North America would consider, making use of all manner of punctuation (including spaces).

My recommendation here is to have a 250-character NVARCHAR column called FullName¬†and leave it at that. If you really want your mail merge software to send “personalised” emails or letters, add an additional PreferredName column, and make sure it’s properly delimited when exporting.

Christopher Null wouldn’t have to bat an eyelid here. It would solve for his problem.

(While I’m on this matter, don’t ask for Gender or Sex. You don’t need it. If you do, you’ll know how to ask properly.)

Email Validation is Stupid

My third rule for ensuring that Nulls can exist in this world with Wests and Obawomiyes is to stop validating email addresses!

Firstly, most developers use a regular expression to do it, which is wrong and terrible and slow, and (clearly) imperfect.

This is a particular bugbear of mine, since I have one character before the @ in my personal email address. Many systems fail when trying to validate it, including WestJet. My own workaround is to create a westjet@ alias so that I can receive email from WestJet.

The best way to validate an email address is to send an email to that address with a link for someone to click on. If they don’t click on the link to close the loop, the email address isn’t valid, and your well-designed system won’t allow them to continue to the next step unless that link is clicked.

Summary

Three simple recommendations could help the Nulls find visibility again and ensure data inserted into a system is not modified unnecessarily.

For more reading on this and similar problems, I recommend visiting Bobby Tables.

If you have any experience in this area, please feel free to contact me on Twitter, at @bornsql .

Author: randolph

Randolph West is a Microsoft Data Platform MVP, and has worked with SQL Server since the late 1990s. When not consulting, he can be seen acting on the stage and screen, or doing voices for independent video games. Randolph is available for talks on SQL Server, and technology in general. He also offers training for junior DBAs. Connect with Randolph on Google+ or Twitter.