postgresql, etl, data-warehouse, business-intelligence, clickhouse

How do I ETL my PostgreSQL data into a ClickHouse data warehouse?


I have data stored in PostgreSQL as the data source, and I want to load the dimension and fact tables of a ClickHouse data warehouse. I am new to ClickHouse and am used to traditional integration tools like Talend and Microsoft SSIS for ETL. (PS: I'm using Docker images for both ClickHouse and PostgreSQL.)


Solution

  • Here is an example of how to import your PostgreSQL data into a new ClickHouse table of the same shape. Let's assume you have a single table named foo with two columns, id and foo.

    1. Create a table using the PostgreSQL table engine to get a view into your PostgreSQL table (replace the connection info with your own; since both services run in Docker, the host is typically the PostgreSQL container's name or address on a shared Docker network):
    CREATE TABLE IF NOT EXISTS foo_pg
    (
        id UUID,
        foo String
    ) ENGINE = PostgreSQL('host:port', 'database', 'tablename', 'username', 'password');
    
    2. Create a new ClickHouse table with the same shape (to avoid type casting on INSERT). Note that in ClickHouse the primary key must be a prefix of the sorting key, so ordering by id also makes it the primary key:
    CREATE TABLE IF NOT EXISTS foo
    (
        id UUID,
        foo String
    ) ENGINE = MergeTree()
    ORDER BY id;
    
    3. Use INSERT INTO ... SELECT to import the data (see the verification sketch after these steps):
    INSERT INTO foo (id, foo)
    SELECT id, foo FROM foo_pg;
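
  • Once the INSERT finishes, it is worth sanity-checking the copy. The queries below are a minimal verification sketch, assuming only the foo and foo_pg tables created above:
    -- Row counts should match if the full table was imported.
    SELECT
        (SELECT count() FROM foo_pg) AS rows_in_postgres,
        (SELECT count() FROM foo)    AS rows_in_clickhouse;
    -- Spot-check a few rows of the ClickHouse copy.
    SELECT id, foo FROM foo LIMIT 5;

    Keep in mind that re-running the INSERT appends rows rather than replacing them, so for a full reload you would TRUNCATE TABLE foo before importing again.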