I am using Terraform to set up a Trino cluster managed by Amazon EMR.
Here is my Terraform code:
resource "aws_emr_cluster" "hm_amazon_emr_cluster" {
  name          = "hm-trino"
  release_label = "emr-7.1.0"
  applications  = ["HCatalog", "Trino"]
  master_instance_fleet {
    name                      = "Primary"
    target_on_demand_capacity = 3
    launch_specifications {
      on_demand_specification {
        allocation_strategy = "lowest-price"
      }
    }
    instance_type_configs {
      weighted_capacity = 1
      instance_type     = "r7g.xlarge"
    }
  }
  configurations_json = <<EOF
  [
    {
      "Classification": "trino-connector-hive",
      "Properties": {
        "hive.metastore": "glue"
      }
    }
  ]
  EOF
  # ...
}
To enable high availability (HA) for this Trino cluster, besides including HCatalog in applications, setting master_instance_fleet.target_on_demand_capacity = 3, and configuring trino-connector-hive to use glue in configurations_json, I also need to enable "Use for Hive table metadata", the option that appears under "AWS Glue Data Catalog settings" in the UI.
However, I could not find any information about setting this option at https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_cluster
Any ideas?
I found that I can add a hive-site classification with "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory" to configurations_json. This corresponds to the "Use for Hive table metadata" option under "AWS Glue Data Catalog settings" in the UI.
Here is the final code:
resource "aws_emr_cluster" "hm_amazon_emr_cluster" {
  name          = "hm-trino"
  release_label = "emr-7.1.0"
  applications  = ["HCatalog", "Trino"]
  master_instance_fleet {
    name = "Primary"
    # Three primary nodes for high availability
    target_on_demand_capacity = 3
    launch_specifications {
      on_demand_specification {
        allocation_strategy = "lowest-price"
      }
    }
    instance_type_configs {
      weighted_capacity = 1
      instance_type     = "r7g.xlarge"
    }
  }
  # hive-site: equivalent of "Use for Hive table metadata" in the UI
  # trino-connector-hive: point the Trino Hive connector at Glue
  configurations_json = <<EOF
  [
    {
      "Classification": "hive-site",
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      }
    },
    {
      "Classification": "trino-connector-hive",
      "Properties": {
        "hive.metastore": "glue"
      }
    }
  ]
  EOF
  placement_group_config = [
    {
      instance_role      = "MASTER"
      placement_strategy = "SPREAD"
    }
  ]
  # ...
}
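As an optional variation on the code above, the same two classifications can be built as an HCL value and serialized with Terraform's built-in jsonencode function instead of a heredoc. This is only a sketch: the local name emr_configurations and the output name are hypothetical and do not come from the original configuration.

```hcl
# Sketch only: "emr_configurations" is a hypothetical local name.
locals {
  emr_configurations = [
    {
      # Equivalent of "Use for Hive table metadata" in the UI
      Classification = "hive-site"
      Properties = {
        "hive.metastore.client.factory.class" = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      }
    },
    {
      # Point the Trino Hive connector at the Glue Data Catalog
      Classification = "trino-connector-hive"
      Properties = {
        "hive.metastore" = "glue"
      }
    }
  ]
}

# Hypothetical output, included only to inspect the generated JSON.
output "emr_configurations_json" {
  value = jsonencode(local.emr_configurations)
}
```

Inside the aws_emr_cluster resource, the heredoc would then be replaced by configurations_json = jsonencode(local.emr_configurations). This lets Terraform validate the structure at plan time, which a raw JSON string does not.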