I want to read Parquet files stored in a GCS bucket via DuckDB as CLI i.e. duckdb
in an environment where I setup a Service Account and I created the HMAC credentials like
gcloud storage hmac create \
my-sa@my-project.iam.gserviceaccount.com \
--project=my-project > `pwd`/sensitive/hmac.yaml
I cannot find a way to persist the GCS credentials across duckdb
sessions. I am using a recent version 1.2.2 of the duckdb
CLI.
I setup the duckdb_wrapper
below, which works, but it's a bit ugly and convoluted. Is there a way to store somewhere some .config
or .duckdb
config file with this sensitive information. Or perhaps some environment variables containing these secrets?
# setup the secret to be able to access the GCS bucket with the HMAC credentials
cat <<EOL > "/sensitive/duckdb_init.sql"
CREATE SECRET (
TYPE gcs,
KEY_ID '${GCS_ACCESS_KEY_ID}',
SECRET '${GCS_SECRET_ACCESS_KEY}'
);
EOL
duckdb -c ".read /sensitive/duckdb_init.sql"
# https://duckdb.org/docs/stable/guides/network_cloud_storage/gcs_import.html
duckdb -s "
INSTALL httpfs;
LOAD httpfs;
" || exit 1
# DuckDB wrapper script in /usr/local/bin
cat <<EOF > /usr/local/bin/duckdb_wrapper
#!/bin/bash
# run the original duckdb command with any arguments provided
exec duckdb -c ".read /sensitive/duckdb_init.sql" "\$@"
EOF
chmod +x /usr/local/bin/duckdb_wrapper
After doing the above, I am able to invoke duckdb
in both ways:
duckdb -c '.read /sensitive/duckdb_init.sql' \
-s "SELECT * FROM read_parquet('gs://my-bucket/date=2018-03-03/*.parquet');"
duckdb_wrapper \
-s "SELECT * FROM read_parquet('gs://my-bucket/date=2018-03-03/*.parquet');"
but as mentioned before it looks ugly and convoluted, is there anything better and cleaner I could do?
The ~/.duckdbrc
file described here https://duckdb.org/docs/stable/operations_manual/footprint_of_duckdb/files_created_by_duckdb#global-files-and-directories worked out pretty well, here's an example configuration setup that is executed at each DuckDB session init when the CLI duckdb
command is invoked:
cat <<EOL > ~/.duckdbrc
-- setup the secret to be able to access the GCS bucket with the HMAC credentials
-- https://duckdb.org/docs/stable/operations_manual/footprint_of_duckdb/files_created_by_duckdb#global-files-and-directories
--
-- these environment variables must be present:
-- GCS_ACCESS_KEY_ID and GCS_SECRET_ACCESS_KEY
CREATE SECRET (
TYPE gcs,
KEY_ID '${GCS_ACCESS_KEY_ID}',
SECRET '${GCS_SECRET_ACCESS_KEY}'
);
-- parquet and GCS buckets setup
-- https://duckdb.org/docs/stable/guides/network_cloud_storage/gcs_import.html
INSTALL httpfs;
LOAD httpfs;
SELECT 'My DuckDB session initialized!' AS my_message;
EOL