apache-spark cassandra spark-cassandra-connector amazon-keyspaces

Problem writing to Amazon Keyspaces with Spark 3.x


I'm trying to write to Amazon Keyspaces, but the following error message appears:

[image: screenshot of the error message]

Spark version: 3.0.1
Connector: 3.0
Java: 1.8
Scala: 2.12

This combination follows the version compatibility table on the connector's GitHub page: [image: screenshot of the compatibility table]

With earlier versions, such as connector 2.5.2 and Spark 2.4.6, it works fine.


Solution

  • You should be able to connect using Spark 3 and connector 3. Here are some steps to validate that the connection is set up correctly and that you have the right permissions.

    You should be able to run the following query against the system.peers table and get back the IP addresses of the public or private endpoint. If it returns one peer or none, you need to grant the permissions described below. Remember that the AWS console is not in your VPC, so it contacts the public endpoint, similar to S3.

    SELECT * FROM system.peers
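
    The same check can be scripted. The following is a minimal sketch, assuming the DataStax Python driver (cassandra-driver) is installed and that service-specific credentials are used. The names peers_look_healthy and count_peers are hypothetical helpers, not part of any library. It also assumes that the system CA bundle can verify the endpoint certificate, and it turns off hostname checking to mirror hostname-validation = false in the sample driver config.

```python
import ssl


def peers_look_healthy(peer_count: int) -> bool:
    """Return True when system.peers lists more than one peer."""
    return peer_count > 1


def count_peers(host: str, username: str, password: str, port: int = 9142) -> int:
    """Connect over TLS and return the number of rows in system.peers."""
    # Imported here so peers_look_healthy() works without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    # Assumption: the system CA bundle can verify the Keyspaces certificate.
    ssl_context = ssl.create_default_context()
    # Certificates are still verified; only the hostname match is skipped.
    ssl_context.check_hostname = False

    cluster = Cluster(
        contact_points=[host],
        port=port,
        ssl_context=ssl_context,
        auth_provider=PlainTextAuthProvider(username=username, password=password),
    )
    try:
        session = cluster.connect()
        return len(list(session.execute("SELECT * FROM system.peers")))
    finally:
        cluster.shutdown()
```

    For example, peers_look_healthy(count_peers("cassandra.us-east-1.amazonaws.com", "user-at-sample", password)) returning False would point to the permission or endpoint problem described here.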
    

    Sample policy. You need to grant access to the resource /keyspace/system* and allow the actions "ec2:DescribeNetworkInterfaces" and "ec2:DescribeVpcEndpoints" on your VPC.

    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Effect":"Allow",
             "Action":[
                "cassandra:Select",
                "cassandra:Modify"
             ],
             "Resource":[
                "arn:aws:cassandra:us-east-1:111122223333:/keyspace/mykeyspace/table/mytable",
                "arn:aws:cassandra:us-east-1:111122223333:/keyspace/system*"
             ]
          },
          {
             "Sid":"ListVPCEndpoints",
             "Effect":"Allow",
             "Action":[
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeVpcEndpoints"
             ],
             "Resource":"*"
          }
       ]
    }
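
    If you need this policy for several tables or accounts, it can be generated rather than edited by hand. This is a minimal sketch using only the Python standard library. The function build_keyspaces_policy and its parameters are hypothetical names introduced for illustration, and the statements simply mirror the sample policy above.

```python
import json


def build_keyspaces_policy(region: str, account_id: str, keyspace: str, table: str) -> dict:
    """Build a policy document shaped like the sample policy."""
    prefix = f"arn:aws:cassandra:{region}:{account_id}:/keyspace"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["cassandra:Select", "cassandra:Modify"],
                "Resource": [
                    # The table the Spark job reads from and writes to.
                    f"{prefix}/{keyspace}/table/{table}",
                    # System tables, needed so the driver can read system.peers.
                    f"{prefix}/system*",
                ],
            },
            {
                "Sid": "ListVPCEndpoints",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DescribeVpcEndpoints",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    policy = build_keyspaces_policy("us-east-1", "111122223333", "mykeyspace", "mytable")
    print(json.dumps(policy, indent=2))
```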
    

    Set up the connection by referencing the external config file.

    --conf spark.cassandra.connection.config.profile.path=application.conf
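
    The same setting can also be applied when the session is built in code. The sketch below uses PySpark and assumes that the Spark Cassandra Connector is already on the classpath (for example through --packages) and that application.conf is readable by the driver and the executors (for example through --files). The helper names connector_conf, build_session and write_to_keyspaces are hypothetical.

```python
def connector_conf(profile_path: str = "application.conf") -> dict:
    """Spark settings that point the connector at the external driver config."""
    return {"spark.cassandra.connection.config.profile.path": profile_path}


def build_session(app_name: str, profile_path: str = "application.conf"):
    """Create a SparkSession with the connector settings applied."""
    # Imported lazily so connector_conf() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in connector_conf(profile_path).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def write_to_keyspaces(df, keyspace: str, table: str) -> None:
    """Append a DataFrame to an existing table through the connector."""
    (
        df.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .mode("append")
        .save()
    )
```

    A job would then call build_session("keyspaces-write") once and pass its DataFrames to write_to_keyspaces(df, "mykeyspace", "mytable").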
    

    Sample driver config.

    datastax-java-driver {
      basic.request.consistency = "LOCAL_QUORUM"
      basic.contact-points = [ "cassandra.us-east-1.amazonaws.com:9142" ]

      advanced.reconnect-on-init = true

      basic.load-balancing-policy {
        local-datacenter = "us-east-1"
      }

      advanced.auth-provider = {
        class = PlainTextAuthProvider
        username = "user-at-sample"
        password = "S@MPLE=PASSWORD="
      }

      advanced.throttler = {
        class = ConcurrencyLimitingRequestThrottler
        max-concurrent-requests = 30
        max-queue-size = 2000
      }

      advanced.ssl-engine-factory {
        class = DefaultSslEngineFactory
        hostname-validation = false
      }

      advanced.connection.pool.local.size = 1
    }