
ClickHouse architecture


I just took a course on ClickHouse architecture, and according to it a ClickHouse table is split into "parts" (screenshot example):

[screenshot]

Every part consists of a few main files. One of them is, for example, column1.bin, where the data of a specific column is stored. So according to the course there should be a separate .bin file for every column (screenshot example from the course):

[screenshot]

Here is a screenshot of one of my folders. Although my table has several columns, there is only one .bin file. Why?

[screenshot]


Solution

  • In ClickHouse there are two main types of parts: wide and compact (there are also in-memory parts, but let's keep it simple).

    Here you can find the definition of both types: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-data-storage

    Data parts can be stored in Wide or Compact format. In Wide format each column is stored in a separate file in a filesystem, in Compact format all columns are stored in one file. Compact format can be used to increase performance of small and frequent inserts.

    Data storing format is controlled by the min_bytes_for_wide_part and min_rows_for_wide_part settings of the table engine. If the number of bytes or rows in a data part is less than the corresponding setting's value, the part is stored in Compact format. Otherwise it is stored in Wide format. If none of these settings is set, data parts are stored in Wide format.
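
    The rule quoted above can be sketched in a few lines of Python. This is only an illustration of the documented decision, not ClickHouse's actual code: `PartFormatSettings` and `choose_part_format` are hypothetical names, and the 10 MiB byte threshold is an assumed example value, so check the settings of your own table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartFormatSettings:
    # Hypothetical container mirroring the two table-level settings.
    # 10 MiB is an assumed example threshold, not necessarily your server's value.
    min_bytes_for_wide_part: int = 10 * 1024 * 1024
    min_rows_for_wide_part: int = 0


def choose_part_format(part_bytes: int, part_rows: int,
                       settings: PartFormatSettings = PartFormatSettings()) -> str:
    """Return "Compact" or "Wide" following the documented rule."""
    # A part is Compact if it is below EITHER threshold.
    if (part_bytes < settings.min_bytes_for_wide_part
            or part_rows < settings.min_rows_for_wide_part):
        return "Compact"
    return "Wide"


# A tiny insert stays below the byte threshold.
small_insert = choose_part_format(part_bytes=4_096, part_rows=100)
# A large insert exceeds it.
big_insert = choose_part_format(part_bytes=50 * 1024 * 1024, part_rows=1_000_000)
# With both thresholds at 0, nothing can be "less than" them.
no_thresholds = choose_part_format(4_096, 100, PartFormatSettings(0, 0))

print(small_insert, big_insert, no_thresholds)  # Compact Wide Wide
```

    Note that when both thresholds are 0, no part can fall below them, which matches the last sentence of the quote: every part is then stored in Wide format.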

    Basically, you're seeing a single .bin file because the part is too small for it to be worth storing each column in its own file.
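
    If you want to check a part directory yourself, a rough heuristic is to look at its .bin files. The sketch below assumes that a compact part keeps all columns in a single file named data.bin and that a wide part has one or more .bin files per column. The function names and the example file lists are made up for illustration, and a wide part of a table whose only column is called "data" would be misclassified.

```python
from pathlib import Path


def guess_part_format(file_names):
    """Guess "Compact" or "Wide" from the file names of one part directory.

    Heuristic only: assumes a compact part has exactly one .bin file,
    named data.bin, while a wide part has per-column .bin files.
    """
    bin_files = sorted(name for name in file_names if name.endswith(".bin"))
    if bin_files == ["data.bin"]:
        return "Compact"
    if bin_files:
        return "Wide"
    return "Unknown"  # no .bin files at all, e.g. not a part directory


def guess_part_format_on_disk(part_dir):
    """Apply the same heuristic to a real directory path."""
    return guess_part_format(p.name for p in Path(part_dir).iterdir() if p.is_file())


# Illustrative file lists, not copied from a real server.
compact_like = ["checksums.txt", "columns.txt", "count.txt",
                "data.bin", "data.mrk3", "primary.idx"]
wide_like = ["checksums.txt", "columns.txt", "count.txt",
             "id.bin", "id.mrk2", "name.bin", "name.mrk2", "primary.idx"]

print(guess_part_format(compact_like))  # Compact
print(guess_part_format(wide_like))     # Wide
```

    A more reliable check than inspecting files is the part_type column of the system.parts table, which reports the format of each part directly.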

    If you perform a big insert, the new part will be created as wide. Also, if you keep doing small inserts, the background merge task will eventually merge those small parts into a single part that is big enough to be written as wide.

    If you want more details about the file structure of both formats, check this: