Tags: docker, pyspark, cassandra, airflow, spark-cassandra-connector

Cannot connect to Cassandra in Docker, getting "Unable to connect to any servers" with cqlsh


I am trying to build a containerized mini-batch data processing pipeline using PySpark and Docker, with the processed data stored in Cassandra. I use a docker-compose file to pull the images for Spark and Cassandra. My PySpark file runs without errors, but I get errors on the Cassandra steps, such as creating the keyspace and tables. To investigate, I ran cqlsh in the container and got the following error:

```
Connection error: ('Unable to connect to any servers', {
  '127.0.0.1:9042': ConnectionRefusedError(111, "Tried connecting to
  [('127.0.0.1', 9042)]. Last error: Connection refused")})
```

The Docker commands I ran:

```
docker compose up -d
docker ps
docker exec -it container-id cqlsh   # the error appears after this command
```

I have tried pulling various Cassandra images, all of which produce the same error, and I have checked several sources on how to schedule this with Airflow in the container, to no avail.

I used the following docker-compose file:

```yaml
version: '3'

networks:
  app-tier:
    driver: bridge

services:
  spark:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    ports:
      - '8080:8080'
    volumes:
      - ".:/opt/spark"
  spark-worker:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    networks:
      - app-tier

  cassandra:
    image: 'bitnami/cassandra:latest'
    #image: docker.io/bitnami/cassandra:4.1
    #image: cassandra:latest
    ports:
      - '7000:7000'
      - '127.0.0.1:9042:9042'
    volumes:
      #- 'cassandra_data:/bitnami'
      - ".:/opt/cassandra"
    environment:
      - CASSANDRA_SEEDS=cassandra
      - CASSANDRA_PASSWORD_SEEDER=yes
      - CASSANDRA_PASSWORD=cassandra
    networks:
      - app-tier
```



Solution

  • Docker containers run on their own network, so when you connect to a container from another container, you need to specify which Docker network to connect over.

    In your case, you've named the network app-tier so specify the network in your command with --network app-tier.

    Additionally, you also need to specify the name of the container you're connecting to. You can find out the container's name from the docker ps output.

    If you're interested, the Quickstart Guide on the official Apache Cassandra website has detailed steps for running Cassandra in Docker.

    Finally, I'd highly recommend you spend some time learning Docker first so you understand the basics. Otherwise, you will be wasting a lot of time running into other simple issues unrelated to Cassandra or Spark. Cheers!
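Putting the two points above together, a cqlsh session might look like the sketch below. The network and project names are assumptions: Docker Compose usually prefixes the network name with the project directory (e.g. `myproject_app-tier`), so check `docker network ls` for the exact name on your machine.

```shell
# Find the Cassandra container's name and the exact network name
docker ps
docker network ls

# Run cqlsh from a throwaway container attached to the same network,
# addressing Cassandra by its Compose service name ("cassandra").
# The bitnami image above sets the password to "cassandra" via
# CASSANDRA_PASSWORD; the default superuser is also "cassandra".
docker run --rm -it --network myproject_app-tier \
  cassandra:latest cqlsh -u cassandra -p cassandra cassandra 9042
```

The official `cassandra` image ships cqlsh, which is why it can double as a client container here; once connected, you can create your keyspace and tables interactively before wiring them into the PySpark job.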