Handling implicit varchar conversions

Got some quirky unexpected behavior when loading data from text files and SQL statements when working with extended ASCII characters and Unicode.

These files in particular are UTF8 encoded as is standard when doing text editing and contain strings with contents like so æ2Ç9£╖♫↕◄‼↨∟.

File editing example Visual Studio Code

Example

  1. Text file data: æ2Ç9£╖♫↕◄‼↨∟
  2. What is stored in SQL as a VARCHAR(100) : æ2Ç9£+??????

æ2Ç9£╖♫↕◄‼↨∟ does not equal æ2Ç9£+??????

What is happening?

The data is being implicitly coerced to the limits of the data type.  VARCHAR(100) can not hold values outside it's bounded collation/charset, so what get stored and converted is dependent on your installation and what is loaded so... chaos.

What should you do?

If you can convert to using a proper unicode supported type well do that which means use NVARCHAR(100), sure this doubles the size of your data types, but it also means data consistency and the ability to handle unicode properly which means your application is set for the foreseeable future and you can treat input data as it is intended and things just work.

Example illustrated

SELECT 'æ2Ç9£╖♫↕◄‼↨∟' = æ2Ç9£+??????

SELECT N'æ2Ç9£╖♫↕◄‼↨∟' = æ2Ç9£╖♫↕◄‼↨∟

What else can you do?

Well you can have the data implicitly convert however it loaded into the target data type by performing a quick SQL statement against the in question target SQL Server like so: SELECT CAST('æ2Ç9£╖♫↕◄‼↨∟' as VARCHAR(100)) against the file input.  Then turn this statement into a Dapper extension method like so

public static class DbExtensions
{
    public static string ToVarChar(this IDbConnection db, string textToConvert)
    {
        return db.QueryFirst<string>("SELECT CAST(@textToConvert as VARCHAR(MAX))", new {textToConvert});
    }
}

And used like so

using var conn = new SqlConnection(new SqlConnectionStringBuilder
{
	DataSource = "localhost",
	InitialCatalog = "tempdb",
	IntegratedSecurity = true
}.ConnectionString);

var textToConvert = "æ2Ç9£╖♫↕◄‼↨∟";

var loadedTextValue = conn.QueryFirst<string>(@"
	CREATE TABLE #test (LoadedText VARCHAR(100));
	INSERT INTO #test (LoadedText) VALUES (@textToLoad);
	SELECT LoadedText FROM #test;", new
{
	textToLoad = textToConvert
});
var implicitConverted = conn.ToVarChar(textToConvert);
Assert.Equal(0, string.CompareOrdinal(loadedTextValue, implicitConverted));

Again the best thing to do would be to use NVARCHAR which would look like this and then doesn't involve any special conversion and lets the data retain it's original properties.

using var conn = new SqlConnection(new SqlConnectionStringBuilder
{
	DataSource = "localhost",
	InitialCatalog = "tempdb",
	IntegratedSecurity = true
}.ConnectionString);

var textToConvert = "æ2Ç9£╖♫↕◄‼↨∟";

var loadedTextValue = conn.QueryFirst<string>(@"
	CREATE TABLE #test (LoadedText NVARCHAR(100));
	INSERT INTO #test (LoadedText) VALUES (@textToLoad);
	SELECT LoadedText FROM #test;", new
{
	textToLoad = textToConvert
});
Assert.Equal(0, string.CompareOrdinal(loadedTextValue, textToConvert));