I'm running Flink v1.18.1 workloads on an on-premise K8s cluster by deploying FlinkDeployment CRs.
The on-prem environment has no access to the public internet, so I've been advised to configure my deployments to use a PrivateLink S3 endpoint that has been set up, rather than the public s3.eu-central-1.amazonaws.com endpoint.
The AWS PrivateLink endpoint looks like this: vpce-<abc>-<efg>.s3.eu-central-1.vpce.amazonaws.com
I've updated the flinkConfiguration in my K8s deployment like so:
s3.access-key: "$S3_ACCESS_KEY"
s3.secret-key: "$S3_SECRET_KEY"
s3.endpoint: vpce-<abc>-<efg>.s3.eu-central-1.vpce.amazonaws.com
s3.region: eu-central-1
The problem is that checkpoint initialization times out. I've attached the JobManager log and the Flink UI checkpoints tab for this.
Is there anything else I need to configure?
I've got a working solution for this now. These are the config keys and values in my flinkConfiguration:
state.checkpoints.dir: s3a://<bucket>/<folder>/state-checkpoints
state.savepoints.dir: s3a://<bucket>/<folder>/state-savepoints
execution.savepoint.path: s3a://<bucket>/<folder>/execution-savepoints
# secrets
s3.access-key: "$S3_ACCESS_KEY"
s3.secret-key: "$S3_SECRET_KEY"
## hadoop configs
s3.endpoint: https://bucket.vpce-<abc>-<efg>.s3.eu-central-1.vpce.amazonaws.com
s3.endpoint.region: eu-central-1
s3.region: eu-central-1
s3.path.style.access: "true"
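For context, here is a rough sketch of where these keys might sit inside a FlinkDeployment CR. The resource name and image tag are placeholders made up for illustration, and the ENABLE_BUILT_IN_PLUGINS entry assumes the official Flink Docker image, which uses that variable to enable a bundled plugin jar. Other required sections (jobManager, taskManager, job) are omitted, so treat this as a fragment rather than a complete manifest.

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-flink-job                # hypothetical name
spec:
  image: flink:1.18.1               # assumption: official image; yours may differ
  flinkVersion: v1_18
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            # assumption: enables the Hadoop S3 plugin shipped in the official image
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-s3-fs-hadoop-1.18.1.jar
  flinkConfiguration:
    state.checkpoints.dir: s3a://<bucket>/<folder>/state-checkpoints
    state.savepoints.dir: s3a://<bucket>/<folder>/state-savepoints
    # bucket-style PrivateLink endpoint, addressed with path-style access
    s3.endpoint: https://bucket.vpce-<abc>-<efg>.s3.eu-central-1.vpce.amazonaws.com
    s3.endpoint.region: eu-central-1
    s3.path.style.access: "true"
```

The notable differences from the config in the question are the https:// scheme, the bucket. prefix on the endpoint host, and path-style access.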
I had to explicitly use the s3a scheme for the S3 URLs. With a plain s3 URL, the checkpointing operation seemed to use the Presto filesystem implementation to write the checkpoint.
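As far as I understand from the Flink docs, flink-s3-fs-hadoop registers the s3:// and s3a:// schemes while flink-s3-fs-presto registers s3:// and s3p://, so s3:// is ambiguous when both plugins are loaded. A small sketch of pinning the implementation through the scheme (the s3p line is commented out and shown for contrast only):

```yaml
# s3a:// resolves to the Hadoop-based plugin (flink-s3-fs-hadoop)
state.checkpoints.dir: s3a://<bucket>/<folder>/state-checkpoints

# s3p:// would resolve to the Presto-based plugin (flink-s3-fs-presto)
# state.checkpoints.dir: s3p://<bucket>/<folder>/state-checkpoints
```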
The documentation for the Hadoop S3 connector had pretty good details on how to configure this; however, the Presto S3 connector docs didn't seem to cover PrivateLink connections at all.