amazon-web-servicesamazon-s3amazon-athena

How to allow AWS Athena access to cloudfront standard logs in s3 bucket?


I am delivering Cloudfront standard logs into one s3 bucket.

The bucket is created via terragrunt/form

terraform {
  source = "github.com/terraform-aws-modules/terraform-aws-s3-bucket?ref=v3.15.0"
}

include "root" {
  path   = find_in_parent_folders()
  expose = true
}

locals {
  vars = include.root.locals.vars

  app_name = basename(dirname(abspath(get_terragrunt_dir())))
  bucket   = "${include.root.locals.tags.Environment}-${local.app_name}-logs-${get_aws_account_id()}"
}

inputs = {
  tags = include.root.locals.tags

  bucket              = local.bucket
  acl                 = "log-delivery-write"
  block_public_policy = true

  control_object_ownership = true
  object_ownership         = "BucketOwnerPreferred"

  attach_public_policy = false

  attach_policy = true
  policy = {
    Statement = [
      {
        Sid    = "ListObjectsInBucket"
        Effect = "Allow"
        Principal = {
          Service = "athena.amazonaws.com"
        }
        Action   = "s3:ListBucket"
        Resource = "arn:aws:s3:::${local.bucket}"
      },
      {
        Sid    = "AllObjectsActions"
        Effect = "Allow"
        Principal = {
          Service = "athena.amazonaws.com"
        }
        Action   = "s3:*"
        Resource = "arn:aws:s3:::${local.bucket}/*"
      }
    ]
  }

  lifecycle_rule = [
    {
      id      = "cloudfront-logs-expiration"
      enabled = true

      prefix = "" # Apply to all objects
      tags   = {} # Apply regardless of tags

      expiration = {
        days = 30
      }
    }
  ]

  force_destroy = local.vars.env != "prod" ? true : false
}

The logs are delivered perfectly fine from cloudfront -> s3 above

I create the table in AWS ATHENA with this query: (of course using my s3 bucket)

CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_standard_logs (
  `date` DATE,
  time STRING,
  x_edge_location STRING,
  sc_bytes BIGINT,
  c_ip STRING,
  cs_method STRING,
  cs_host STRING,
  cs_uri_stem STRING,
  sc_status INT,
  cs_referrer STRING,
  cs_user_agent STRING,
  cs_uri_query STRING,
  cs_cookie STRING,
  x_edge_result_type STRING,
  x_edge_request_id STRING,
  x_host_header STRING,
  cs_protocol STRING,
  cs_bytes BIGINT,
  time_taken FLOAT,
  x_forwarded_for STRING,
  ssl_protocol STRING,
  ssl_cipher STRING,
  x_edge_response_result_type STRING,
  cs_protocol_version STRING,
  fle_status STRING,
  fle_encrypted_fields INT,
  c_port INT,
  time_to_first_byte FLOAT,
  x_edge_detailed_result_type STRING,
  sc_content_type STRING,
  sc_content_len BIGINT,
  sc_range_start BIGINT,
  sc_range_end BIGINT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
LOCATION 's3://DOC-EXAMPLE-BUCKET/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )

The above is documented here https://docs.aws.amazon.com/athena/latest/ug/cloudfront-logs.html also https://repost.aws/knowledge-center/analyze-logs-athena

So far, so good!!! the table is created.

The problem happens when I try to query data:

SELECT
 sc_status,
 COUNT(*) AS request_count
FROM
 cloudfront_standard_logs
GROUP BY
 sc_status

enter image description here

How can I allow AWS Athena to access this bucket and fix the error ? What changes should I make in my terragrunt code policy for it. I don't understand why the permissions I gave are not enough

The terragrunt/form policy above is converted and attached as this json to the bucket

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListObjectsInBucket",
            "Effect": "Allow",
            "Principal": {
                "Service": "athena.amazonaws.com"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::secret"
        },
        {
            "Sid": "AllObjectsActions",
            "Effect": "Allow",
            "Principal": {
                "Service": "athena.amazonaws.com"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::secret*"
        }
    ]
}

Solution

  • I fixed the issue but not 100% HOW and WHY

    1. I changed in my terragrunt code and removed acl = "log-delivery-write"
    2. I deleted this bucket
    3. I created it again with only step 1 changing!!!

    MAGIC, now athena can access the files in my new bucket..