I am setting up a Glue job in AWS via Terraform (sample below). Based on the docs here - https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html, any additional package needed for the Glue job can be passed via --additional-python-modules, listed in the job parameter reference. How do I pass this information when setting up a job via Terraform like below?
resource "aws_glue_job" "example" {
  name              = "example job"
  role_arn          = aws_iam_role.example.arn
  number_of_workers = 4

  command {
    name            = "gluestreaming"
    script_location = "s3://${aws_s3_bucket.test.data}/temp.script"
  }

  # ... other configuration ...

  default_arguments = {
    # ... potentially other arguments ...
    "--continuous-log-logGroup"          = aws_cloudwatch_log_group.example.name
    "--enable-continuous-cloudwatch-log" = "true"
    "--enable-continuous-log-filter"     = "true"
    "--enable-metrics"                   = ""
  }
}
You need to add it to your default_arguments block. Make sure the packages you specify are separated by commas (,).
Here's how you can do it:
resource "aws_glue_job" "example" {
  # ... other configuration ...

  default_arguments = {
    # ... potentially other arguments ...
    "--continuous-log-logGroup"          = aws_cloudwatch_log_group.example.name
    "--enable-continuous-cloudwatch-log" = "true"
    "--enable-continuous-log-filter"     = "true"
    "--enable-metrics"                   = ""
    "--additional-python-modules"        = "package1==1.0.0,package2"
  }
}
Specifying a version for each package is optional, but pinning one is usually the better practice.
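If the list of packages grows, the comma-separated string can get hard to read. One option, sketched here as an assumption rather than something from the AWS docs, is to keep the packages in a Terraform list and build the string with the built-in join function. The local name glue_python_modules and the package names are placeholders for illustration only.

```hcl
locals {
  # Hypothetical package list; names and versions are placeholders.
  glue_python_modules = [
    "package1==1.0.0",
    "package2",
  ]
}

resource "aws_glue_job" "example" {
  # ... other configuration ...

  default_arguments = {
    # ... potentially other arguments ...

    # join() produces "package1==1.0.0,package2", the same
    # comma-separated format as the hand-written string.
    "--additional-python-modules" = join(",", local.glue_python_modules)
  }
}
```

This keeps one package per line, so adding or removing a module shows up as a single-line change in a diff.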