databasecolumnstoreclickhouse

How to create primary keys in ClickHouse


I did found few examples in the documentation where primary keys are created by passing parameters to ENGINE section. But I did not found any description about any argument to ENGINE, what it means and how do I create a primary key. Thanks in advance. It would be great to add this info to the documentation it it's not present.


Solution

  • Primary key is supported for MergeTree storage engines family. https://clickhouse.tech/docs/en/engines/table_engines/mergetree_family/mergetree/

    Note that for most serious tasks, you should use engines from the MergeTree family.

    It is specified as parameters to storage engine.

    The engine accepts parameters: the name of a Date type column containing the date, a sampling expression (optional), a tuple that defines the table's primary key, and the index granularity.

    Example without sampling support:

    MergeTree(EventDate, (CounterID, EventDate), 8192)
    

    Example with sampling support:

    MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)
    

    So, (CounterID, EventDate) or (CounterID, EventDate, intHash32(UserID)) is primary key in these examples.

    When using ReplicatedMergeTree, there are also two additional parameters, identifying shard and replica.

    https://clickhouse.tech/docs/en/engines/table_engines/mergetree_family/replication/#creating-replicated-tables

    Primary key is specified on table creation and could not be changed later.

    Despite the name, primary key is not unique. It just defines sort order of data to process range queries in optimal way. You could insert many rows with same value of primary key to a table.