amazon-web-services terraform aws-lake-formation

How to Terraform Lake Formation Governed Tables


I have a Terraform question. I'm interested in using Governed Tables in Lake Formation and provisioning them with Terraform. Is it currently possible to Terraform these? The documentation is sparse. This is currently the documentation for Glue tables.

It does not mention governed tables anywhere.


Solution

  • OK, I found an answer. A word of caution for anyone going down this route: while this is possible, I don't think Terraform is necessarily the best tool for working with Lake Formation. It is a non-standard API, and because of that the workflow is kind of janky. For example, it doesn't appear that you can create two governed tables in a single terraform apply, as each table needs its own transaction ID.

    resource "aws_glue_catalog_table" "etl_glue_catalog_extract_table" {
      # Only create the table when the extract format is CSV.
      count         = var.extract_format == "csv" ? 1 : 0
      name          = "${var.env}_${var.etl_name}_extract"
      database_name = aws_glue_catalog_database.etl_glue_catalog_database.name

      # This is what makes the table a Lake Formation governed table.
      table_type = "GOVERNED"

      storage_descriptor {
        input_format  = "org.apache.hadoop.mapred.TextInputFormat"
        location      = "s3://<bucket_name>/datasets/extract/${var.etl_name}/csv/"
        output_format = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

        # Plain-text SerDe settings for the CSV extract.
        ser_de_info {
          name = "${var.env}_${var.etl_name}_extract"
          parameters = {
            "serialization.format" = "1"
          }
          serialization_library = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
        }
      }
    }
    

    The key piece is setting table_type to "GOVERNED". You will also need to set up your provider so that the credentials it uses have data lake admin permissions. Refer to the Terraform Guide for help with that.
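
    For the data lake admin part, something along these lines may work. This is only a minimal sketch, not taken from my actual setup: it assumes you want whatever identity Terraform runs as to be registered as a Lake Formation administrator, and the resource label "example" is just a placeholder. Be aware that this resource manages the account's data lake settings as a whole, so include any existing admins you want to keep.

    # Look up the identity Terraform is currently running as.
    data "aws_caller_identity" "current" {}

    # Resolve an assumed-role session ARN back to the underlying IAM role ARN.
    data "aws_iam_session_context" "current" {
      arn = data.aws_caller_identity.current.arn
    }

    # Assumption: registering the Terraform identity as a data lake admin
    # is acceptable in your account. "example" is a hypothetical label.
    resource "aws_lakeformation_data_lake_settings" "example" {
      admins = [data.aws_iam_session_context.current.issuer_arn]
    }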

    Good luck.