amazon-web-services aws-glue aws-cdk aws-glue-data-catalog

How to add SerDe parameters in CDK?


I am using CDK to create a Glue table like this:

    import * as Glue from "@aws-cdk/aws-glue";

    const someTable = new Glue.Table(
      scope,
      "some-table",
      {
        tableName: "some-table",
        columns: [
          {
            name: "value",
            type: Glue.Schema.DOUBLE,
          },
          {
            name: "user_id",
            type: Glue.Schema.STRING,
          },
        ],
        partitionKeys: [
          {
            name: "region_id",
            type: Glue.Schema.BIG_INT,
          },
        ],
        database: glueDb,
        dataFormat: Glue.DataFormat.PARQUET,
        bucket: props.bucket,
      }
    );

It looks like this is creating my Glue table as expected, but it's also doing some things behind the scenes, such as setting up a SerDe serialization library (org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe). For my use case, I also have to specify some SerDe parameters in the table configuration, but I can't find how to do that in the CDK documentation (https://docs.aws.amazon.com/cdk/api/v1/docs/@aws-cdk_aws-glue.Table.html), even though it looks like something you can configure in the console under "Edit Table".


Has anyone run into this, and does anyone have suggestions for how to set these parameters?

Thanks!


Solution

  • Pass SerDe settings (input format, output format, and serialization library) to a Table (@aws-cdk/aws-glue-alpha) using the dataFormat prop, which takes a DataFormat.

    // TableProps
    {
      dataFormat: glue.DataFormat.PARQUET,
    }
    

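    If you want to stay on the L2 Table, a CDK escape hatch is another option. As far as I can tell, DataFormat only carries the input/output format and serialization library class names, not SerDe parameters, so the sketch below patches the synthesized CloudFormation with addPropertyOverride instead. It assumes aws-cdk-lib v2 with @aws-cdk/aws-glue-alpha, and that the L2 Table registers its underlying CfnTable as node.defaultChild; the stack, database, and table names are hypothetical.

    ```typescript
    import { App, Stack } from "aws-cdk-lib";
    import { CfnTable } from "aws-cdk-lib/aws-glue";
    import * as glue from "@aws-cdk/aws-glue-alpha";

    const app = new App();
    // Hypothetical stack and database names, for illustration only.
    const stack = new Stack(app, "GlueSerdeExampleStack");
    const glueDb = new glue.Database(stack, "SomeDb", { databaseName: "some_db" });

    // The L2 Table picks ParquetHiveSerDe through dataFormat, but DataFormat
    // (as far as we can tell) has no field for SerDe parameters.
    const someTable = new glue.Table(stack, "SomeTable", {
      tableName: "some_table",
      database: glueDb,
      columns: [
        { name: "value", type: glue.Schema.DOUBLE },
        { name: "user_id", type: glue.Schema.STRING },
      ],
      partitionKeys: [{ name: "region_id", type: glue.Schema.BIG_INT }],
      dataFormat: glue.DataFormat.PARQUET,
    });

    // Escape hatch (assumption: the L2 exposes its CfnTable as defaultChild).
    // Override paths use CloudFormation (PascalCase) property names.
    const cfnTable = someTable.node.defaultChild as CfnTable;
    cfnTable.addPropertyOverride(
      "TableInput.StorageDescriptor.SerdeInfo.Parameters",
      // The path is split on dots, so dotted keys such as "serialization.format"
      // go inside the value object rather than into the path.
      { "serialization.format": "1" },
    );
    ```

    Running cdk synth should show SerdeInfo containing both the SerializationLibrary set by the L2 construct and the new Parameters, because property overrides are deep-merged into the properties the construct generates.
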
    For finer-grained control, including SerDe parameters, use the L1 CfnTable construct (aws-cdk-lib/aws-glue), whose props mirror the CloudFormation AWS::Glue::Table resource.

    // CfnTableProps
    tableInput: {
      // ...
      storageDescriptor: {
        inputFormat: 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat',
        outputFormat: 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat',
        serdeInfo: {
          serializationLibrary: 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe',
          parameters: { 'serialization.format': '1' }, // Glue SerDe parameters are string-to-string
        },