The easy way to handle UTF-8 in a .NET application when dealing with SQL Server 2019

A short post this week.

On a mailing list recently, someone noticed that a .NET application writing to SQL Server did not have the expected behaviour with UTF-8 collation and data types.

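To refresh our memories, UTF-8 support is new in SQL Server 2019. It provides potential storage savings of up to 50% for strings compared with the traditional Unicode data types (NCHAR and NVARCHAR, which generally use two bytes per character). The full saving applies to text that is mostly ASCII, which UTF-8 stores in one byte per character. Other characters take two to four bytes in UTF-8, so the benefit shrinks or disappears for other scripts.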
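As a rough illustration of where the saving comes from, the following Python sketch compares the encoded size of a few strings in UTF-8 and in UTF-16, the encoding behind the traditional Unicode data types. The sample strings and the `byte_sizes` helper are made up for this example, and it measures raw encoded bytes only, not SQL Server's actual on-disk row overhead.

```python
# Compare the encoded size of text in UTF-8 against UTF-16.
# Sample strings are illustrative only.
samples = {
    "ascii": "Hello, world",
    "latin": "Cr\u00e8me br\u00fbl\u00e9e",          # Crème brûlée
    "cjk": "\u65e5\u672c\u8a9e",                     # 日本語
}


def byte_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    utf8, utf16 = byte_sizes(text)
    print(f"{label}: {len(text)} chars, UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

Pure ASCII text halves in size, accented Latin text saves somewhat less, and the CJK sample is actually larger in UTF-8 (three bytes per character instead of two).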

In the example on the mailing list, the person created a table in a database with a UTF-8 collation and then, using the SqlDbType values Char and VarChar, wrote three sample strings containing extended characters to the table. When the table was queried again, the three values came back in ASCII format, so it was clear that Unicode support had been ignored.

After confirming this scenario on my own machine (with a different .NET framework and a different operating system), I narrowed the issue down to the SqlDbType. When I changed these from Char and VarChar to NChar and NVarChar respectively, the values were written to the database correctly as UTF-8.
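To get a feel for why the parameter type matters, here is a small Python sketch. It is not the SqlClient code path, only a simulation under stated assumptions: a Char/VarChar parameter is assumed to be squeezed into a non-Unicode encoding before it reaches the server (plain ASCII is used as a stand-in to mirror the behaviour described above, since the actual code page involved is not stated), while an NChar/NVarChar parameter is assumed to travel as UTF-16. The function names and sample strings are hypothetical.

```python
def send_as_varchar(text, code_page="ascii"):
    """Simulate a non-Unicode parameter: encode to a limited code page,
    substituting '?' for every character it cannot represent."""
    return text.encode(code_page, errors="replace").decode(code_page)


def send_as_nvarchar(text):
    """Simulate a Unicode parameter: a UTF-16 round trip, which is lossless."""
    return text.encode("utf-16-le").decode("utf-16-le")


# Hypothetical sample values containing extended characters.
for sample in ["Z\u00fcrich", "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1", "\u65e5\u672c\u8a9e"]:
    print(repr(sample), "->", repr(send_as_varchar(sample)), "|", repr(send_as_nvarchar(sample)))
```

In this simulation the damage happens on the client side, before any collation on the server comes into play, which would be consistent with the fix being a change of SqlDbType rather than a change to the table.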

The moral of the story is that strings in .NET applications should be treated as Unicode in all scenarios, which means using the NChar and NVarChar SqlDbType values when reading from and writing to SQL Server. Leaving UTF-8 handling to the SQL Server engine means one less thing to worry about. After all, UTF-8 is ultimately a storage feature.

Share your string conversion stories in the comments below.


One thought on “The easy way to handle UTF-8 in a .NET application when dealing with SQL Server 2019”

  • Back in May 2019 (the date this blog post is timestamped with), System.Data.SqlClient was the only option for .NET, and it indeed does not support UTF-8 collations natively, so one has to resort to NChar/NVarChar. Since then, Microsoft has released a .NET provider that supports UTF-8 collations natively, Microsoft.Data.SqlClient (https://devblogs.microsoft.com/dotnet/introducing-the-new-microsoftdatasqlclient/), as well as updated ODBC and OLE DB providers that support UTF-8.

    Another novelty to be aware of, not so much for .NET but for ODBC/OLE DB, is that Windows 10 now allows the system locale to be changed to UTF-8 as well, which plays well with the updated providers. At the time of writing, the Windows 10 UTF-8 system locale is marked as Beta, but it functions well.
