I am using CDK to create a Glue table like this:
const someTable = new Glue.Table(
  scope,
  "some-table",
  {
    tableName: "some-table",
    columns: [
      {
        name: "value",
        type: Glue.Schema.DOUBLE,
      },
      {
        name: "user_id",
        type: Glue.Schema.STRING,
      },
    ],
    partitionKeys: [
      {
        name: "region_id",
        type: Glue.Schema.BIG_INT,
      },
    ],
    database: glueDb,
    dataFormat: Glue.DataFormat.PARQUET,
    bucket: props.bucket,
  }
);
This creates my Glue table as expected, but it also does some things behind the scenes, like setting up a SerDe serialization library (org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe). For my use case, I also have to specify some SerDe parameters in the table configuration, but I can't find how to do that in the CDK documentation (https://docs.aws.amazon.com/cdk/api/v1/docs/@aws-cdk_aws-glue.Table.html), even though it looks like something you can configure in the console under "Edit Table".
Has anyone run into this, and do you have any suggestions for how to set these parameters?
Thanks!
Pass serde settings to a Table (@aws-cdk/aws-glue-alpha) using the dataFormat prop (of type DataFormat).
// TableProps
{
  dataFormat: glue.DataFormat.PARQUET
}
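If you want to keep the L2 Table and only add serde parameters, one option is a CDK escape hatch: take the table's underlying CloudFormation resource and override the property path directly. The sketch below is a minimal illustration, not the library's own mechanism for this. `setSerdeParameters` and `PropertyOverridable` are hypothetical names introduced here; the only real CDK call relied on is `addPropertyOverride`, which CfnResource provides. It also assumes that the table's `node.defaultChild` is the generated CfnTable, which is worth confirming in your synthesized template.

```typescript
// Minimal structural type for the single CfnResource method used here, so the
// sketch compiles without the CDK installed. A real CfnTable should satisfy it.
interface PropertyOverridable {
  addPropertyOverride(propertyPath: string, value: unknown): void;
}

// CloudFormation property path of the serde parameters on AWS::Glue::Table.
const SERDE_PARAMETERS_PATH =
  "TableInput.StorageDescriptor.SerdeInfo.Parameters";

// Hypothetical helper: sets the whole serde Parameters map in one override.
function setSerdeParameters(
  cfnTable: PropertyOverridable,
  parameters: Record<string, string>,
): void {
  cfnTable.addPropertyOverride(SERDE_PARAMETERS_PATH, parameters);
}

// Usage with the table from the question (assumption: the default child is
// the CfnTable that the L2 construct created):
//
//   setSerdeParameters(someTable.node.defaultChild as Glue.CfnTable, {
//     "serialization.format": "1",
//   });
```

Overriding the whole Parameters map in one call sidesteps the fact that keys such as serialization.format contain dots, which the override path would otherwise treat as nesting separators.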
For finer-grained control, use the L1 CfnTable (aws-cdk-lib) construct, whose API matches the CloudFormation AWS::Glue::Table resource.
// CfnTableProps
tableInput: {
  // ...
  storageDescriptor: {
    // Parquet input format, matching the output format and serde below
    inputFormat: 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat',
    outputFormat: 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat',
    serdeInfo: {
      serializationLibrary: 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe',
      parameters: { 'serialization.format': 1 },
    },