I have a PySpark file that will be submitted to Dataproc:
try:
    print("Start writing")
    url = "jdbc:postgresql://some-ip:5432/postgres"
    properties = {
        "driver": "org.postgresql.Driver",
        "user": "postgres",
        "password": "root"
    }
    df.write.jdbc(url=url, table="result", mode="overwrite", properties=properties)
except Exception as e:
    print(e)

sc.stop()
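For reference, df and sc in that snippet come from the usual SparkSession setup earlier in the file, roughly like this (the app name and the dummy DataFrame below are just placeholders for my real data):

from pyspark.sql import SparkSession

# Standard entry point; the real DataFrame is built from actual source data,
# a dummy two-row DataFrame stands in for it here.
spark = SparkSession.builder.appName("pyspark_postgresql").getOrCreate()
sc = spark.sparkContext
df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "value"])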
I am using the postgresql-42.6.0.jar JDBC driver and my database is PostgreSQL 14.
Here is the error:
An error occurred while calling o86.jdbc.
: org.postgresql.util.PSQLException: The connection attempt failed.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:331)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:247)
at org.postgresql.Driver.makeConnection(Driver.java:434)
at org.postgresql.Driver.connect(Driver.java:291)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
...
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
...
Here is how I submit my job through Google Cloud Shell:
gcloud beta dataproc jobs submit pyspark gs://taro-de-intern/pyspark_postgresql.py \
    --cluster my-cluster \
    --jars gs://my-bucket/postgresql-42.6.0.jar
I suspected it had something to do with the driver, so I downgraded my jar file to version 42.4.2, but that didn't work and yielded the same error.
I even tried changing the write to this format:
df.write \
    .format("jdbc") \
    .option("driver", "org.postgresql.Driver") \
    .option("url", "jdbc:postgresql://some-ip:5432/postgres") \
    .option("dbtable", "schema.result") \
    .option("user", "postgres") \
    .option("password", "root") \
    .save()
which also yielded the same error.
I already sorted it out, so here is the solution. If you are using any cloud database (a Cloud SQL instance on GCP, AWS, or Azure), don't forget to allow outside connections.
Here is how you enable outside connections on a GCP Cloud SQL instance:
Go to Connections.
Add a network (you won't have an allow-all network when you first open this page) by entering a name for the connection (the name doesn't matter) and your IP address. For more information, see the Microsoft documentation on subnet masks.
Note: This is just for the sake of example, so don't allow all connections (0.0.0.0/0) in real production.
Scroll down to the bottom and click Save.
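If you want to quickly check that the new authorized network actually makes the instance reachable before resubmitting the Spark job, a small sketch like this works (the host and port below are placeholders for your own instance):

import socket

host = "some-ip"  # placeholder: the public IP of your Cloud SQL / database instance
port = 5432       # default PostgreSQL port

try:
    # Open a plain TCP connection; if this times out, the JDBC write will too
    with socket.create_connection((host, port), timeout=5):
        print(f"{host}:{port} is reachable")
except OSError as e:
    print(f"Still cannot reach {host}:{port}: {e}")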