How to integrate GitHub with the AWS Glue Data Catalog


This question is about the Data Catalog of AWS Glue.

I want to build a process like this:

Connect GitHub to the AWS Glue Data Catalog -> open a pull request against the Data Catalog source code -> merge -> the merged changes are reflected in the AWS Glue Data Catalog -> the updated Data Catalog information is published as Markdown, or pushed to Confluence.

The purpose of this work is to make the Data Catalog readable by non-developers.

Is this possible? What literature should I read? Any advice is welcome! Help!!


Solution

  • Option 1: Use the boto3 Glue API to retrieve table information with get_table() or get_tables().

    See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_tables for usage and response examples.

    Once you receive the response, you can render it in a web page.

    Advantage: Non-technical users can view it without any setup.

    Disadvantage: A developer has to write code.

  • Option 2: Use the AWS CLI. See https://docs.aws.amazon.com/cli/latest/reference/glue/get-table.html

    Advantage: No code is needed from a developer.

    Disadvantage: The user must know how to set up the AWS CLI, run its commands, and read their output.