
Configure AWS S3 as a Flink state backend via AWS PrivateLink from on-prem


I'm running Flink v1.18.1 workloads on an on-premises Kubernetes cluster, deploying jobs as FlinkDeployment custom resources (CRs).

The on-prem environment has no access to the public internet, and I've been advised to configure my deployments to use a PrivateLink S3 endpoint that has been set up, rather than the public s3.eu-central-1.amazonaws.com endpoint.

The AWS PrivateLink endpoint looks like this: vpce-<abc>-<efg>.s3.eu-central-1.vpce.amazonaws.com

I've updated the flinkConfiguration in my k8s deployment like so:

    s3.access-key: "$S3_ACCESS_KEY"
    s3.secret-key: "$S3_SECRET_KEY"
    s3.endpoint: vpce-<abc>-<efg>.s3.eu-central-1.vpce.amazonaws.com
    s3.region: eu-central-1

The issue I have is that checkpoint initialization times out. I've attached the JobManager log and a screenshot of the Flink UI checkpoints tab.

Is there anything else I need to configure?



Solution

  • I now have a working solution. These are the config key/values in my flinkConfiguration:

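    state.checkpoints.dir: s3a://<bucket>/<folder>/state-checkpoints
    state.savepoints.dir: s3a://<bucket>/<folder>/state-savepoints
    # path of the savepoint to restore from when the job starts
    execution.savepoint.path: s3a://<bucket>/<folder>/execution-savepoints
    
    # secrets
    s3.access-key: "$S3_ACCESS_KEY"
    s3.secret-key: "$S3_SECRET_KEY"
    
    ## hadoop configs (Flink forwards s3.* keys to the S3A client as fs.s3a.*)
    # "bucket." is a literal prefix of the PrivateLink DNS name, not your bucket name
    s3.endpoint: https://bucket.vpce-<abc>-<efg>.s3.eu-central-1.vpce.amazonaws.com
    s3.endpoint.region: eu-central-1
    s3.region: eu-central-1
    # path-style requests put the bucket name in the URL path, not the hostname
    s3.path.style.access: "true"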
    

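    I had to explicitly use the s3a:// scheme for the S3 URLs. With plain s3:// URLs, the checkpointing operation appeared to use the Presto filesystem implementation to write the checkpoint. Both the flink-s3-fs-hadoop and flink-s3-fs-presto plugins register the s3:// scheme, but only the Hadoop plugin registers s3a://, so using s3a:// pins the Hadoop implementation.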

    The linked documentation for the Hadoop S3 connector covers this configuration in good detail; however, the Presto S3 connector documentation doesn't seem to cover PrivateLink connections at all.
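For context, here is a minimal sketch of where these keys could sit in a FlinkDeployment CR, with only the Hadoop S3 plugin enabled so that the Presto plugin is not loaded at all. It assumes the official Flink Docker image (which copies jars named in the ENABLE_BUILT_IN_PLUGINS environment variable from /opt/flink/opt into the plugins directory at start-up) and the Flink Kubernetes Operator's flink.apache.org/v1beta1 API. The deployment name, service account, resource sizes and job jar are hypothetical placeholders, and the credential keys from the config above are left out.

```yaml
# Hypothetical FlinkDeployment sketch; names, sizes and the jar path are placeholders.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-job                                  # hypothetical
spec:
  image: flink:1.18.1                           # assumes official image layout
  flinkVersion: v1_18
  serviceAccount: flink                         # hypothetical
  flinkConfiguration:
    state.checkpoints.dir: s3a://<bucket>/<folder>/state-checkpoints
    state.savepoints.dir: s3a://<bucket>/<folder>/state-savepoints
    s3.endpoint: https://bucket.vpce-<abc>-<efg>.s3.eu-central-1.vpce.amazonaws.com
    s3.endpoint.region: eu-central-1
    s3.path.style.access: "true"
  podTemplate:
    spec:
      containers:
        # the operator merges this template into the main Flink container
        - name: flink-main-container
          env:
            # enable only the Hadoop-based S3 plugin (no Presto plugin)
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-s3-fs-hadoop-1.18.1.jar
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/usrlib/my-job.jar  # hypothetical
    parallelism: 2
    upgradeMode: savepoint
```

If the Presto plugin is not installed, s3:// paths would also resolve to the Hadoop implementation; keeping s3a:// in the paths makes the choice explicit either way.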