We want to keep data for the last three months only and move the older data to S3. I am trying to create a script that will run on a daily or weekly basis. Is the script below correct?
#!/bin/bash
# Define the table name and partition date format
TABLE="x"
DATE_FORMAT="%Y-%m-%d"
# Get the current date and the date 3 months ago
CURRENT_DATE=$(date +"$DATE_FORMAT")
THREE_MONTHS_AGO=$(date -d "3 months ago" +"$DATE_FORMAT")
# QuestDB SQL query to detach partitions older than 3 months
SQL_QUERY="ALTER TABLE $TABLE DETACH PARTITION WHERE timestamp < '$THREE_MONTHS_AGO';"
# Execute the SQL query using questdb's CLI or REST API (adjust accordingly)
# Assuming you can run SQL commands via the QuestDB CLI, for example:
# questdb-cli --execute "$SQL_QUERY"
# Now, move detached partitions to S3 (if needed)
# Assuming partitions are stored in /var/lib/questdb/db/x/ and end with '.detached'
for partition in /var/lib/questdb/db/x/*.detached; do
  PARTITION_NAME=$(basename "$partition")
  # Create a tarball of the partition and stream it to S3
  tar -czf - "$partition" | aws s3 cp - "s3://questdb-internal/blobs/$PARTITION_NAME.tar.gz"
  # Optionally, remove the local detached partition after moving it
  # (detached partitions are directories, so -rf is needed)
  rm -rf "$partition"
done
We can make the script a bit better with some changes.
The script currently computes the cutoff date in bash, which is fine, but it might be easier to delegate this directly to the query, as in:
ALTER TABLE $TABLE DETACH PARTITION WHERE timestamp < dateadd('M', -3, date_trunc('day', now()))
The script mentions executing the SQL via questdb-cli, but that's not really a thing. You can instead issue a call to the /exec endpoint:
curl -G \
  --data-urlencode "query=ALTER TABLE $TABLE DETACH PARTITION WHERE timestamp < dateadd('M', -3, date_trunc('day', now()));" \
  --data-urlencode "count=true" \
  http://localhost:9000/exec
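The endpoint replies with JSON, so rather than assuming success the script can capture the response and check it for an error key. A minimal sketch, assuming jq is installed (on failure /exec returns a JSON object containing an error field):

RESPONSE=$(curl -s -G \
  --data-urlencode "query=ALTER TABLE $TABLE DETACH PARTITION WHERE timestamp < dateadd('M', -3, date_trunc('day', now()));" \
  http://localhost:9000/exec)
# Abort the run if QuestDB reported an error for the statement
if echo "$RESPONSE" | jq -e 'has("error")' > /dev/null; then
  echo "DETACH failed: $RESPONSE" >&2
  exit 1
fi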
And the last thing is that your script uploads to S3 using the original partition name. Imagine you detach a partition and, three months later, for whatever reason, you receive a row belonging to that already-detached partition. QuestDB will create the partition again. Three months down the line, your script would detach the new partition, compress it, and upload it to S3 again, overwriting your existing object, which is not what you want.
If this scenario is not part of your regular operations and can happen only because of an error or faulty data, you might want to use S3 conditional writes, which upload an object only if the key does not already exist.
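For that case, here is a sketch of the loop body using a conditional put. It assumes a recent AWS CLI, since --if-none-match on put-object (S3 conditional writes) is a fairly new feature; put-object also cannot stream from stdin, hence the temporary file:

TMP_FILE=$(mktemp)
tar -czf "$TMP_FILE" "$partition"
# --if-none-match '*' makes S3 reject the upload (HTTP 412) if the key
# already exists, so the local partition is only removed after a
# successful first upload
if aws s3api put-object \
    --bucket questdb-internal \
    --key "blobs/$PARTITION_NAME.tar.gz" \
    --body "$TMP_FILE" \
    --if-none-match '*'; then
  rm -rf "$partition"
fi
rm -f "$TMP_FILE"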
If this scenario can happen in your business (very old data arriving after a partition is detached), you need to make sure the new partition does not overwrite the data on S3 (maybe by adding a date suffix, or by enabling object versioning on the bucket). You also need to be careful when reattaching the partition to QuestDB, as merging partitions is not allowed as of today. You would probably need to attach one of the partitions, move its data to a temporary table, detach it again, attach the other partition from the same date range, insert the data from the temp table into this partition, and remove the temp table, as in the sketch below.
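A minimal sketch of that flow, assuming a daily-partitioned table x with designated timestamp column timestamp and a hypothetical partition 2024-01-01. Recent QuestDB versions expect the partition directory to be placed in the table directory with an .attachable suffix before each ATTACH; those file-system steps are noted as comments:

-- file system: place the first backup as /var/lib/questdb/db/x/2024-01-01.attachable
ALTER TABLE x ATTACH PARTITION LIST '2024-01-01';
-- park its rows in a temporary table
CREATE TABLE x_tmp AS (SELECT * FROM x WHERE timestamp IN '2024-01-01');
-- detach it again, then (file system) swap in the second backup as 2024-01-01.attachable
ALTER TABLE x DETACH PARTITION LIST '2024-01-01';
ALTER TABLE x ATTACH PARTITION LIST '2024-01-01';
-- merge the parked rows into the now-attached partition and clean up
INSERT INTO x SELECT * FROM x_tmp;
DROP TABLE x_tmp;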