The easy way to handle UTF-8 in a .NET application when dealing with SQL Server 2019

A short post this week.

On a mailing list recently, someone noticed that a .NET application writing to SQL Server did not have the expected behaviour with UTF-8 collation and data types.

To refresh our memories, UTF-8 is newly supported in SQL Server 2019, and provides potential savings of up to 50% when storing strings, as opposed to using traditional Unicode data types (which generally means two bytes per character).

In the example on the mailing list, the person created a table on a database with UTF-8 collation, and then using the SqlDbTypes of Char and VarChar, wrote three sample strings containing extended characters to the table. When querying the table again, the three values were returned in ASCII format, so it was clear that Unicode support was ignored.

After confirming this scenario on my own machine (with a different .NET framework and different operating system), I narrowed down the issue to SqlDbType. When I converted these to NChar and NVarChar respectively, the values written to the database were correctly rendered as UTF-8.

The moral of the story is that — as with all strings in .NET applications — they should be treated as Unicode in all scenarios, which means using NChar and NVarChar data types when reading from and writing to SQL Server. By leaving UTF-8 handling to the SQL Server engine directly, it’s one less thing to worry about. After all, UTF-8 is ultimately a storage feature.

Share your string conversion stories in the comments below.

Photo by Jason Leung on Unsplash.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: