clickhouse

Clickhouse create table from select using CSV auto schema


I have a CSV that I want to use to create a table, but I want to make some transformations as I load it into Clickhouse. For example, I want to select a subset for the CSV columns, and apply some functions to some of the data. I thought I could do this as follows:

CREATE TABLE test_csv
ORDER BY some_csv_column
AS SELECT some_csv_column, some_other_column
FROM file('myuserfiles/mydata.csv.gz', 'CSVWithNames', 'auto', 'gzip')

Unfortunately I get:

Code: 44. DB::Exception: Received from localhost:9000. DB::Exception: 
Sorting key contains nullable columns, but merge tree setting `allow_nullable_key` is disabled. (ILLEGAL_COLUMN)

It seems to me that the field I used for ORDER BY is nullable in the auto-generated schema, and that is not allowed by the engine. My Clickhouse knowledge is quite limited (<24hours) so my questions are:

  1. Can I change this statement somehow to still use auto but tell it that the some_csv_column should not be considered nullable?

  2. How do I disable the allow_nullable_key setting and then re-enable it? (there will be no nulls in actuality, if there are it should fail).

  3. Any other way to achieve what I am trying to do?

Thanks!


Solution

  • The default behavior of ClickHouse's schema inference is to make all the inferred columns NULLABLE, but it is actually not a best practice to use NULLABLE. You can simply disable this behavior:

    CREATE TABLE test_csv
    ORDER BY some_csv_column
    AS SELECT some_csv_column, some_other_column
    FROM file('myuserfiles/mydata.csv.gz', 'CSVWithNames', 'auto', 'gzip')
    SETTINGS schema_inference_make_columns_nullable = 0