mongodb, aws-glue, aws-glue-data-catalog, aws-glue-connection

Unable to create a dynamic frame after successfully crawling a MongoDB table into the AWS Glue Data Catalog


I created a MongoDB connection, the connection test passes, and I was able to use a Crawler to create metadata in the Glue Data Catalog. However, when I run the code below, passing my MongoDB database name and collection name in the additional_options parameter, I get an error:

data_catalog_database = 'tinkerbell'
data_catalog_table = 'tinkerbell_funds'
glueContext.create_dynamic_frame_from_catalog(
    database=data_catalog_database,
    table_name=data_catalog_table,
    additional_options={"database": "tinkerbell", "collection": "funds"},
)

The error is:

An error was encountered: An error occurred while calling o177.getDynamicFrame.
: java.lang.NoSuchMethodError: com.mongodb.internal.connection.DefaultClusterableServerFactory.<init>(Lcom/mongodb/connection/ClusterId;Lcom/mongodb/connection/ClusterSettings;Lcom/mongodb/connection/ServerSettings;Lcom/mongodb/connection/ConnectionPoolSettings;Lcom/mongodb/connection/StreamFactory;Lcom/mongodb/connection/StreamFactory;Lcom/mongodb/MongoCredential;Lcom/mongodb/event/CommandListener;Ljava/lang/String;Lcom/mongodb/MongoDriverInformation;Ljava/util/List;)V

When I call it without the additional_options parameter:

glueContext.create_dynamic_frame_from_catalog(database = data_catalog_database,table_name = data_catalog_table)

I get the following error:

An error was encountered: Missing collection name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property
Traceback (most recent call last):
  File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/context.py", line 179, in create_dynamic_frame_from_catalog
    return source.getFrame(**kwargs)
  File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
    jframe = self._jsource.getDynamicFrame()
  File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: Missing collection name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property

Can someone please help me pass these parameters correctly?

I have explained above what I tried. I expected the dynamic frame to be created from the catalog table.


Solution

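  • You are getting that error because the MongoDB Spark connector expects the Spark session to be configured with its input and output properties, such as the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property named in the error message.

    Please refer to the link below: https://www.mongodb.com/docs/spark-connector/master/python-api/#std-label-pyspark-shell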
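As a rough illustration of the properties mentioned above, here is a minimal sketch that builds the input and output URIs for the database and collection from the question. The helper name mongo_spark_conf and the host mongo.example.internal are hypothetical, and the property names assume the older connector generation that reports 'spark.mongodb.input.uri' in its error message. Newer connector releases may use different property names, so check the linked documentation for your version.

```python
from urllib.parse import quote_plus


def mongo_spark_conf(host, database, collection,
                     username=None, password=None, port=27017):
    """Build Spark properties for the MongoDB connector (hypothetical helper).

    The URI form mongodb://host:port/database.collection carries both the
    database and the collection, which is what the "Missing collection name"
    error asks for.
    """
    credentials = ""
    if username is not None and password is not None:
        # Percent-encode credentials so characters like '@' or ':' stay valid.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    uri = f"mongodb://{credentials}{host}:{port}/{database}.{collection}"
    return {
        "spark.mongodb.input.uri": uri,
        "spark.mongodb.output.uri": uri,
    }


# Host name is an assumption; database and collection come from the question.
conf = mongo_spark_conf("mongo.example.internal", "tinkerbell", "funds")
for key, value in conf.items():
    print(f"{key}={value}")

# With PySpark and a matching connector jar available, the properties could
# be applied roughly like this (not executed here):
#
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("mongo-read")
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#   df = spark.read.format("mongo").load()
```

Putting the collection in the URI means no separate 'spark.mongodb.input.collection' property is needed. This sketch addresses only the "Missing collection name" error. The NoSuchMethodError in the question is a separate error, and it is not established here that setting these properties resolves it.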