
Connection from Docker Container to AWS RDS PostgreSQL Failed


I am trying to connect to a PostgreSQL database on AWS RDS from a Spark master container running in Docker.

I installed both the AWS CLI and the AWS Session Manager plugin inside the Spark master container so that I can connect from the container itself to the DB.

Additionally, the Spark master container is responsible for submitting a Spark job that uses the port forwarding (aws ssm start-session) set up in the same container to connect to the DB.
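Concretely, the forwarding can be started from inside the container with something like the following (a minimal sketch, not my exact command: the target instance ID, RDS endpoint, and database port are placeholders, and I am assuming the AWS-StartPortForwardingSessionToRemoteHost SSM document):

    import subprocess

    # Minimal sketch of the port forwarding run inside the container.
    # The target instance ID and RDS endpoint below are placeholders.
    tunnel = subprocess.Popen([
        "aws", "ssm", "start-session",
        "--target", "i-0123456789abcdef0",  # placeholder instance ID
        "--document-name", "AWS-StartPortForwardingSessionToRemoteHost",
        "--parameters",
        '{"host":["mydb.abc123.us-east-1.rds.amazonaws.com"],'
        '"portNumber":["5432"],"localPortNumber":["5555"]}',
    ])
    # While this session is running, the DB is reachable on
    # localhost:5555 inside this container only.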

Now, when I run the aws configure and aws ssm start-session commands inside the container, I get:

Starting session with session id xxx.
Port xxxx opened for session ID XXX
Waiting for connections...

When I run spark-submit right after, in the same container, the aws ssm terminal prints Connection accepted for session xxxx; then, right after, the Spark job stops with an error:

org.postgresql.util.PSQLException: Connection to localhost:5555 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

So it looks like the connection is being accepted for a very short time, then refused? The two processes are reporting two completely different results.
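For context, the relevant part of the job looks roughly like this (a minimal sketch; the database name, table, and credentials are placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rds-test").getOrCreate()

    # Read through the SSM tunnel listening on localhost:5555.
    # Database name, table, user, and password are placeholders.
    sales = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5555/mydb")
        .option("dbtable", "sales")
        .option("user", "postgres")
        .option("password", "...")
        .option("driver", "org.postgresql.Driver")
        .load()
    )

    sales.show(10)  # the action that actually triggers the read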


Solution

  • OK, I resolved the issue, and I am leaving what worked for me here in case someone faces the same problem in the future.

    What I did wrong was running the aws ssm start-session process only in the Spark master container. But Spark transformations are lazy: when the master actually reaches an action to execute (the sales.show(10) in my case), the workers are the ones that do the actual work. So the Spark worker containers needed a piece of data that had not been read yet (because of the lazy transformations), yet they were not connected to the DB. The Connection accepted message was the master's own connection being accepted, and the connection error was the Spark workers reporting back to the master, hence the contradiction.

    The solution is to run aws ssm start-session in both the master and the worker containers. Each worker container then connects to the DB separately, the first time it is assigned a piece of work involving that data (see the sketch below).
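    To make the failure mode concrete, here is a sketch of where each connection is actually opened (assuming the same placeholder connection details as in the question):

        # Runs on the driver (the master container): Spark fetches the table
        # schema here over the driver's own tunnel, hence "Connection accepted".
        sales = spark.read.jdbc(
            url="jdbc:postgresql://localhost:5555/mydb",  # localhost = this container
            table="sales",
            properties={"user": "postgres", "password": "...",
                        "driver": "org.postgresql.Driver"},
        )

        # Runs on the executors (the worker containers): each task opens its
        # own JDBC connection to localhost:5555. But "localhost" now resolves
        # inside the worker's container, so without a tunnel there the
        # connection is refused.
        sales.show(10)

    That is why the tunnel has to exist in every container that runs executors, not only in the master.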