I'm trying to write to AWS Keyspaces, but the following message appears:
Spark version: 3.0.1
Connector: 3.0
Java: 1.8
Scala: 2.12
These follow the version compatibility listed on GitHub:
With earlier versions, such as Connector 2.5.2 and Spark 2.4.6, it works fine.
You should be able to connect using Spark 3 and Connector 3. Here are some steps to validate that your connection is set up correctly and that you have the right permissions.
You should be able to run the following query against the system.peers table and get back the IPs of the endpoint (public or private). If it returns one peer or none, you need to go through the setup steps in this answer. Remember that the AWS console is not in your VPC, so it contacts the public endpoint, similar to S3.
SELECT * FROM system.peers
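If you want to run that check from a machine inside the VPC rather than from the console, something like the following might help. It is only a sketch: it assumes the Python cassandra-driver package is installed, that you have service-specific credentials, and that you pass the path to a CA certificate file yourself. The function names count_peers and peers_look_healthy are made up for illustration.

```python
import ssl


def peers_look_healthy(peer_count):
    """More than one row in system.peers is the expected case; 0 or 1 is not."""
    return peer_count > 1


def count_peers(username, password, ca_cert_path, region="us-east-1"):
    """Connect to the Keyspaces endpoint and count the rows in system.peers."""
    # Imported lazily so this file still loads where cassandra-driver is absent.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    # Assumption: the server certificate is verified against a CA file you supply.
    ssl_context = ssl.create_default_context(cafile=ca_cert_path)
    # Hostname checking is turned off here, mirroring hostname-validation = false
    # in the sample driver config of this answer.
    ssl_context.check_hostname = False

    cluster = Cluster(
        [f"cassandra.{region}.amazonaws.com"],
        port=9142,
        ssl_context=ssl_context,
        auth_provider=PlainTextAuthProvider(username=username, password=password),
    )
    try:
        session = cluster.connect()
        return len(list(session.execute("SELECT * FROM system.peers")))
    finally:
        cluster.shutdown()
```

Calling peers_look_healthy(count_peers("user-at-sample", "...", "/path/to/ca.crt")) from a host in the same VPC as your Spark job should tell you whether the job will see more than one peer. The credentials and certificate path are placeholders.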
Sample policy. You need to grant access to the resource /keyspace/system* and allow "ec2:DescribeNetworkInterfaces" and "ec2:DescribeVpcEndpoints" on your VPC.
{
  "Version":"2012-10-17",
  "Statement":[
    {
      "Effect":"Allow",
      "Action":[
        "cassandra:Select",
        "cassandra:Modify"
      ],
      "Resource":[
        "arn:aws:cassandra:us-east-1:111122223333:/keyspace/mykeyspace/table/mytable",
        "arn:aws:cassandra:us-east-1:111122223333:/keyspace/system*"
      ]
    },
    {
      "Sid":"ListVPCEndpoints",
      "Effect":"Allow",
      "Action":[
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeVpcEndpoints"
      ],
      "Resource":"*"
    }
  ]
}
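To double-check a policy document against the requirements above before attaching it, a small script along these lines could be used. This is a rough sketch using only the Python standard library; missing_permissions is a hypothetical helper, it compares action names literally (so wildcards such as "ec2:*" are not recognized), and it ignores Deny statements and conditions.

```python
import json

# The two EC2 actions named in this answer.
REQUIRED_EC2_ACTIONS = {"ec2:DescribeNetworkInterfaces", "ec2:DescribeVpcEndpoints"}


def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def missing_permissions(policy_json):
    """Return the required permissions that the policy text does not grant."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    allowed_actions = set()
    system_readable = False
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        allowed_actions.update(actions)
        # Literal check for the system keyspace resource pattern.
        if "cassandra:Select" in actions and any(
            resource.endswith("/keyspace/system*") for resource in resources
        ):
            system_readable = True

    missing = sorted(REQUIRED_EC2_ACTIONS - allowed_actions)
    if not system_readable:
        missing.append("cassandra:Select on /keyspace/system*")
    return missing
```

Passing the sample policy as a string should give an empty list. A policy that only covers your own table would report the two EC2 actions and the system keyspace entry as missing.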
Set up the connection by referencing the external config.
"--conf":"spark.cassandra.connection.config.profile.path=application.conf"
Sample driver config.
datastax-java-driver {
  basic.request.consistency = "LOCAL_QUORUM"
  basic.contact-points = [ "cassandra.us-east-1.amazonaws.com:9142" ]
  advanced.reconnect-on-init = true
  basic.load-balancing-policy {
    local-datacenter = "us-east-1"
  }
  advanced.auth-provider = {
    class = PlainTextAuthProvider
    username = "user-at-sample"
    password = "S@MPLE=PASSWORD="
  }
  advanced.throttler = {
    class = ConcurrencyLimitingRequestThrottler
    max-concurrent-requests = 30
    max-queue-size = 2000
  }
  advanced.ssl-engine-factory {
    class = DefaultSslEngineFactory
    hostname-validation = false
  }
  advanced.connection.pool.local.size = 1
}