amazon-web-services, aws-glue, aws-glue-spark

What options can be passed to AWS Glue DynamicFrame.toDF()?


The documentation for the toDF() method says that an options parameter can be passed to it, but it does not say what those options can be (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html). Does anyone know of further documentation on this? I am specifically interested in passing in a schema when creating a DataFrame from a DynamicFrame.


Solution

  • Unfortunately, there is not much documentation available, but reading the DynamicFrame source code suggests the following:

    From the spec, the DynamicFrame implementation of toDF, and Spark's own toDF, my understanding is that a schema cannot be passed when creating a DataFrame from a DynamicFrame; only minor column manipulations are possible.

    That said, a possible approach is to obtain a DataFrame from the DynamicFrame first and then manipulate it to change its schema.
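
The convert-then-reshape approach could look roughly like the sketch below. It is only an illustration, not the documented way to use the toDF() options: the helper names build_cast_expressions and to_df_with_schema, and the idea of describing the target schema as a column-to-SQL-type mapping, are assumptions made for this example. The only Glue call used is DynamicFrame.toDF() with no arguments, and the only Spark call is DataFrame.selectExpr().

```python
def build_cast_expressions(target_types):
    """Build one Spark SQL cast expression per column.

    target_types is a hypothetical mapping of column name -> Spark SQL type
    name (for example "bigint" or "double"); it is not a Glue API concept.
    """
    return [
        f"CAST(`{name}` AS {sql_type}) AS `{name}`"
        for name, sql_type in target_types.items()
    ]


def to_df_with_schema(dynamic_frame, target_types):
    """Convert a DynamicFrame to a DataFrame, then cast columns to the wanted types."""
    # Plain conversion: no schema is passed to toDF().
    df = dynamic_frame.toDF()
    # Reshape afterwards. Only the columns listed in target_types are kept.
    return df.selectExpr(*build_cast_expressions(target_types))
```

Inside a Glue job this might be called as to_df_with_schema(dyf, {"id": "bigint", "price": "double"}), where dyf is an existing DynamicFrame. Columns not listed in the mapping are dropped by selectExpr, so list every column that should survive.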