sqlite csv

Interpret hex as numbers when importing CSV into sqlite3


I have trouble with hex numbers in SQLite.

Given a CSV file with numbers, one of which is written in hex (0x1):

❯ cat my.csv
A,B
1,0x1

Import it and check the schema:

❯ sqlite3 my.db ".import --csv my.csv somet"
❯ sqlite3 my.db ".schema somet"
CREATE TABLE IF NOT EXISTS "somet"(
"A" TEXT, "B" TEXT);

Now we can filter on A with a numeric literal (no single quotes), but not on B.

❯ sqlite3 my.db "select * from somet where a = 0x1;"
1|0x1

❯ sqlite3 my.db "select * from somet where b = 0x1;"

This is unexpected to me. We can see that SQLite understands the hex representation of numbers when filtering on column A, but not on B.

When looking at the output of the query on A, we also see that B hasn't been stored as a number but, it seems, as a literal hex string.

So, given that sqlite3 understands hex representation and will use numbers as "numbers" when importing CSV, why is B not imported as a number?


Solution

  • CSV import via the sqlite3 shell inserts values as strings. Any conversion to other storage types is done by the underlying INSERT based on column types and their affinity.

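    Both columns of your table have a TEXT type, which means they have text affinity; everything is stored as a string (except for nulls and blobs). That also explains your two queries: the literal 0x1 is the integer 1, and when compared against a TEXT column it is converted to the string '1', which matches A ('1') but not B ('0x1'). If you give a column a type that has numeric affinity, like INTEGER, strings that contain integer literals are converted to integers on insert, except that, as the SQLite documentation puts it: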

    ... hexadecimal integer literals are not considered well-formed and are stored as TEXT. (This is done for historical compatibility with versions of SQLite prior to version 3.8.6 2014-08-15 where hexadecimal integer literals were first introduced into SQLite.)

    You'd have to change your ETL process to replace those hex values with base 10 integers before importing (and create the table with the correct column types ahead of time), or otherwise write a more intelligent import script.
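
As a rough illustration of the "more intelligent import script" option, here is a minimal Python sketch using only the standard csv and sqlite3 modules. The function names parse_int and import_csv are made up for this example, and it assumes every column should be declared INTEGER and that the table and column names come from a trusted file (they are interpolated into the SQL). Values that parse as neither decimal nor 0x-prefixed hex are passed through unchanged.

```python
import csv
import sqlite3


def parse_int(text):
    """Return an int for decimal or 0x-prefixed hex text; otherwise return the text as-is."""
    s = text.strip()
    try:
        if s.lower().startswith(("0x", "-0x", "+0x")):
            return int(s, 16)
        return int(s, 10)
    except ValueError:
        return text


def import_csv(csv_path, db_path, table="somet"):
    """Load a CSV with a header row into a table whose columns are all INTEGER."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Convert hex/decimal strings to Python ints before they reach SQLite.
        rows = [[parse_int(value) for value in row] for row in reader]

    # Assumption: names are trusted, since identifiers cannot be bound as parameters.
    columns = ", ".join(f'"{name}" INTEGER' for name in header)
    placeholders = ", ".join("?" for _ in header)

    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    finally:
        con.close()


if __name__ == "__main__":
    import_csv("my.csv", "my.db")
```

Because the values are bound as Python integers rather than strings, B should end up stored as an integer, so a query like select * from somet where b = 0x1 should then return the row.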