I have a ClickHouse server and Airflow running as Docker containers in WSL. They're both up, and I even managed to connect to ClickHouse, but when I try to query I get the following traceback in the Airflow logs:
[2024-06-04, 12:17:49 UTC] {logging_mixin.py:188} INFO - >>> Running 'etl' function. Logged at 2024-06-04 12:17:49.678034+00:00
.....
[2024-06-04, 12:17:53 UTC] {logging_mixin.py:188} INFO - >>> Running 'clickhouse_connect' function. Logged at 2024-06-04 12:17:53.590231+00:00
[2024-06-04, 12:17:53 UTC] {logging_mixin.py:188} INFO - Connecting to ClickHouse Cluster
[2024-06-04, 12:17:53 UTC] {base.py:84} INFO - Using connection ID 'my_clickhouse' for task execution.
[2024-06-04, 12:17:53 UTC] {logging_mixin.py:188} INFO - 8123
[2024-06-04, 12:17:53 UTC] {logging_mixin.py:188} INFO - Connected successfully
[2024-06-04, 12:17:53 UTC] {logging_mixin.py:188} INFO - Querying CHDB
[2024-06-04, 12:17:53 UTC] {connection.py:408} WARNING - Failed to connect to localhost:9000
Traceback (most recent call last):
.....
clickhouse_driver.errors.NetworkError: Code: 210. Connection refused (localhost:9000)
It seems that Airflow is trying to connect to localhost:9000, which I thought was only for CLI usage, and not to localhost:8123, which is the HTTP web interface.
What should I do to force the connection to go to port 8123, or am I getting this wrong? Either way, the main problem is that I can't get Airflow to work with ClickHouse at all.
PS: I use clickhouse_driver.Client for the connection inside a PythonOperator, with this code to connect:
    from airflow.hooks.base import BaseHook

    class ClickHouseConnection:
        connection = None

        @staticmethod
        def get_connection(connection_name='my_clickhouse'):
            from clickhouse_driver import Client
            db_props = BaseHook.get_connection(connection_name)
            if ClickHouseConnection.connection:
                return ClickHouseConnection.connection, db_props
            ClickHouseConnection.connection = Client(db_props.host)
            return ClickHouseConnection.connection, db_props

    @logger
    def clickhouse_connect():
        print("Connecting to ClickHouse Cluster")
        ch_connection, db_props = ClickHouseConnection.get_connection()
        print(db_props.port)
        print("Connected successfully")
        filename = '/opt/airflow/data/test_csv_file.csv'
        if ch_connection:
            print("Querying CHDB")
            ch_connection.execute(
                f"INSERT INTO maindb.monitor FROM INFILE '{filename}' FORMAT CSV"
            )
Thank you in advance!
I tried changing the port to 9000 in Airflow's Connections and restarting the containers, but nothing changed.
The issue was the host name in the ClickHouse connection. I don't remember exactly where I read it, but since I'm running a multi-container Docker application I had to use the host name host.docker.internal, and with that, port 9000 worked for me.
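A minimal sketch of the hostname choice behind this fix (assumptions: host.docker.internal requires Docker Desktop / WSL2, and the 'clickhouse' service name is a placeholder for whatever your docker-compose service is actually called):

```python
def resolve_ch_host(same_docker_network: bool, service_name: str = 'clickhouse') -> str:
    """Pick the hostname a containerized Airflow task should use.

    'localhost' inside the Airflow container refers to that container itself,
    so the connection is refused. Containers on the same Docker network can
    reach each other by compose service name; host.docker.internal instead
    reaches a port published on the host machine (Docker Desktop / WSL2).
    """
    return service_name if same_docker_network else 'host.docker.internal'
```

With that host in the Airflow connection and port 9000, Client(host, port=9000) connects over the native protocol.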