Tags: amazon-web-services, etl, aws-glue

AWS Glue: extracting JSON objects keyed by ID


I have a JSON data file on S3 structured like this, with each object's ID used as its key:

{
"id_01": {"name": "Julie", "city": "Paris"},
"id_02": {"name": "Marc", "city": "Lyon"},
etc.
}

Is it possible for a crawler to generate a schema like this one?

id|name|city

If not, is it good practice to read the file directly from S3 without crawling it first?


Solution

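  • No, not in the id|name|city shape you want. A crawler infers column names from the JSON keys, so a top-level object with dynamic keys is typically read as a single record whose columns are the IDs themselves (id_01, id_02, ...), each a struct of name and city. Crawlers work best when every record is a separate object with the same set of fields.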

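    You could preprocess the JSON so that each ID becomes its own record with an explicit id field, ideally as JSON Lines (one object per line), which the crawler's built-in JSON classifier and Athena both read directly:

    {"id": "id_01", "name": "Julie", "city": "Paris"}
    {"id": "id_02", "name": "Marc", "city": "Lyon"}

    If you prefer a single top-level array instead ([{...}, {...}]), add a custom JSON classifier with the JSON path $[*] so the crawler treats each array element as a record.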
    

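    As for reading the file directly from S3 without crawling it first: that is fine, especially for small or infrequently accessed files. A Glue job can read the S3 path directly, and the crawler's main benefit is registering a table in the Data Catalog. If you need to query the data regularly, transform it once into a tabular layout (JSON Lines, or a columnar format such as Parquet for better performance), catalog that output, and query it with Athena.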
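A minimal sketch of the preprocessing step described above, using only the Python standard library. It assumes the whole file fits in memory and that every value is an object of attributes; the function names `keyed_json_to_records` and `to_json_lines` and the output column name `id` are hypothetical choices for illustration, not part of any AWS API. Fetching the source from S3 and writing the result back (for example with boto3's `get_object` and `put_object`) is left out.

```python
import json
from typing import Dict, List


def keyed_json_to_records(text: str, id_field: str = "id") -> List[Dict]:
    """Turn {"id_01": {...}, "id_02": {...}} into [{"id": "id_01", ...}, ...].

    Assumes the top level is an object whose values are objects (hypothetical helper).
    """
    data = json.loads(text)
    records = []
    for key, attributes in data.items():
        # Put the former key into its own column, then copy name, city, etc.
        record = {id_field: key}
        record.update(attributes)
        records.append(record)
    return records


def to_json_lines(records: List[Dict]) -> str:
    """Serialize one JSON object per line (JSON Lines), each line newline-terminated."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    source = (
        '{"id_01": {"name": "Julie", "city": "Paris"}, '
        '"id_02": {"name": "Marc", "city": "Lyon"}}'
    )
    print(to_json_lines(keyed_json_to_records(source)), end="")
```

Writing the output as JSON Lines rather than a single array keeps it readable by the crawler's built-in JSON classifier without a custom JSON path.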