A short post this week.
On a mailing list recently, someone noticed that a .NET application writing to SQL Server did not behave as expected with a UTF-8 collation and data types.
To refresh our memories, UTF-8 is newly supported in SQL Server 2019, and offers potential savings of up to 50% when storing strings, compared to traditional Unicode data types (which generally use two bytes per character).
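To put rough numbers on that saving, here is a minimal standalone C# sketch (my own illustration, not from the mailing list) comparing the byte counts of the same string in UTF-8 and in UTF-16, the encoding behind the traditional Unicode data types:

```csharp
using System;
using System.Text;

class EncodingSizes
{
    static void Main()
    {
        // Mostly-Latin text: UTF-8 uses one byte per character, while
        // UTF-16 (Encoding.Unicode) uses two — the up-to-50% saving.
        string latin = "Hello, world";
        Console.WriteLine(Encoding.UTF8.GetByteCount(latin));    // 12
        Console.WriteLine(Encoding.Unicode.GetByteCount(latin)); // 24

        // Extended characters cost more bytes in UTF-8, so the saving
        // shrinks (or reverses) for text that is mostly non-Latin.
        string accented = "héllo";
        Console.WriteLine(Encoding.UTF8.GetByteCount(accented));    // 6
        Console.WriteLine(Encoding.Unicode.GetByteCount(accented)); // 10
    }
}
```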
In the example on the mailing list, the person created a table in a database with a UTF-8 collation, and then, using the VarChar data type, wrote three sample strings containing extended characters to the table. When the table was queried again, the three values came back as plain ASCII, so it was clear that Unicode support was being ignored.
After confirming this scenario on my own machine (with a different .NET framework and a different operating system), I narrowed the issue down to the SqlDbType of the parameters. When I converted these to NVarChar, the values written to the database were stored correctly as UTF-8.
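The fix can be sketched roughly as follows (the table and column names are my own invention, since the original code wasn't shown; the essential part is the SqlDbType assigned to the parameter):

```csharp
using System.Data;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient on older stacks

class WriteUtf8
{
    static void Insert(SqlConnection conn, string value)
    {
        // Hypothetical table: dbo.Samples(Content varchar(100)) in a
        // database with a UTF-8 collation such as
        // Latin1_General_100_CI_AS_SC_UTF8.
        using var cmd = new SqlCommand(
            "INSERT INTO dbo.Samples (Content) VALUES (@content)", conn);

        // Sending the parameter as NVarChar keeps the string Unicode on
        // the wire; SQL Server then converts it to UTF-8 when storing the
        // varchar column. Declaring the parameter as VarChar instead lets
        // the client downconvert first, losing the extended characters.
        cmd.Parameters.Add("@content", SqlDbType.NVarChar, 100).Value = value;
        cmd.ExecuteNonQuery();
    }
}
```

This needs a live SQL Server to run against; the point is simply that the parameter stays NVarChar regardless of the column's type.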
The moral of the story is that strings in .NET applications should, as always, be treated as Unicode in all scenarios, which means using NVarChar data types when reading from and writing to SQL Server. By leaving UTF-8 handling to the SQL Server engine itself, it's one less thing to worry about. After all, UTF-8 is ultimately a storage feature.
Share your string conversion stories in the comments below.