I have a CSV that I want to use to create a table, but I want to make some transformations as I load it into ClickHouse. For example, I want to select a subset of the CSV columns and apply some functions to some of the data. I thought I could do this as follows:
CREATE TABLE test_csv
ORDER BY some_csv_column
AS SELECT some_csv_column, some_other_column
FROM file('myuserfiles/mydata.csv.gz', 'CSVWithNames', 'auto', 'gzip')
Unfortunately I get:
Code: 44. DB::Exception: Received from localhost:9000. DB::Exception:
Sorting key contains nullable columns, but merge tree setting `allow_nullable_key` is disabled. (ILLEGAL_COLUMN)
It seems to me that the field I used for ORDER BY is nullable in the auto-generated schema, and that is not allowed by the engine. My ClickHouse knowledge is quite limited (<24 hours), so my questions are:
1. Can I change this statement somehow to still use auto, but tell it that some_csv_column should not be considered nullable?
2. How do I enable the allow_nullable_key setting, and then disable it again? (There will be no nulls in actuality; if there are, it should fail.)
3. Any other way to achieve what I am trying to do?
Thanks!
By default, ClickHouse's schema inference makes all inferred columns Nullable, but using Nullable is generally not a best practice in ClickHouse (it adds storage overhead and can hurt query performance). You can simply disable this behavior:
CREATE TABLE test_csv
ORDER BY some_csv_column
AS SELECT some_csv_column, some_other_column
FROM file('myuserfiles/mydata.csv.gz', 'CSVWithNames', 'auto', 'gzip')
SETTINGS schema_inference_make_columns_nullable = 0
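
With that setting, some_csv_column is inferred as non-Nullable and the ORDER BY is accepted. To your other questions: instead of relying on 'auto', you can also pass an explicit structure as the third argument to file(), which gives you per-column control over nullability. The String types below are just placeholders, swap in whatever matches your data:

CREATE TABLE test_csv
ORDER BY some_csv_column
AS SELECT some_csv_column, some_other_column
FROM file('myuserfiles/mydata.csv.gz', 'CSVWithNames',
          'some_csv_column String, some_other_column String', 'gzip')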
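
As for allow_nullable_key: it is a table-level MergeTree setting rather than a session setting, so there is nothing to re-enable afterwards; you set it once in the table's SETTINGS clause. Shown for completeness only, since keeping a Nullable sorting key is exactly what the advice above recommends against:

CREATE TABLE test_csv
ENGINE = MergeTree
ORDER BY some_csv_column
SETTINGS allow_nullable_key = 1
AS SELECT some_csv_column, some_other_column
FROM file('myuserfiles/mydata.csv.gz', 'CSVWithNames', 'auto', 'gzip')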
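
Finally, if you want the load to fail loudly should a NULL actually appear, a sketch of one more option: keep the auto-inferred (Nullable) schema but CAST the key column to a non-Nullable type in the SELECT. Casting a NULL to a non-Nullable type throws, which matches your "it should fail" requirement. Again, String is an assumed placeholder type:

CREATE TABLE test_csv
ORDER BY some_csv_column
AS SELECT
    CAST(some_csv_column AS String) AS some_csv_column, -- throws if a NULL turns up
    some_other_column
FROM file('myuserfiles/mydata.csv.gz', 'CSVWithNames', 'auto', 'gzip')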