
How to look up GitHub usernames by email in a single GitHub API call?


I'm trying to look up the GitHub usernames of a few hundred users based on their email addresses (which I pulled from the git log). Unfortunately, I can't figure out how to do this without making a separate call for each email.

How do I look up many GitHub usernames by email in as few queries as possible?

Previous answers that didn't work for me:


Solution

  • The GitHub API has no dedicated endpoint for looking up multiple users by email at once. However, you can minimize the number of requests by using GitHub's GraphQL API instead of the REST API, because GraphQL lets you combine many search queries into a single request.

    Here's an example script that uses the GraphQL API to perform multiple email lookups in a single request. It has to be run from inside an existing Git repository. It first reads the unique list of author emails using the git log command, then builds one aliased GraphQL search query per email. The combined query is written to a query.json file and passed to curl, which sends everything in a single HTTP call. Finally, jq parses the response. To run the script, you must have the GITHUB_TOKEN environment variable set, because the GitHub GraphQL API requires authenticated requests.

    #!/usr/bin/env bash
    
    # more reliable error handling: exit on errors, unset variables, and failed pipeline stages
    set -euo pipefail
    
    # read unique emails from git log and store them in an array
    read -ra emails <<< "$(git log --format='%ae' | sort -u | xargs)"
    
    # Build the GraphQL query string with one search query per email address
    # See https://docs.github.com/en/graphql/reference/queries
    query="query {"
    for idx in "${!emails[@]}"; do
      query+=" query${idx}: search(query: \\\"in:email ${emails[$idx]}\\\", type: USER, first: 1) { nodes { ... on User { login email } } }"
    done
    query+=" }"
    
    # Write the GraphQL query to a query.json file
    # See https://docs.github.com/en/graphql/overview/resource-limitations
    echo "{\"query\": \"$query\"}" > query.json
    
    # Execute the GraphQL query
    curl --fail-with-body -sH "Authorization: token $GITHUB_TOKEN" --data @query.json https://api.github.com/graphql |
      # Parse the JSON response and build the email => login mapping
      jq -r '.data | to_entries[] | .value.nodes[] | "\(.email) => \(.login)"'
    

    Keep in mind that there is a limit to how many queries you can combine in a single request. If you need to look up more emails than one request allows, divide them into smaller chunks and make multiple requests. The exact limit depends on the resource and rate limits GitHub applies to your account (see the resource-limitations page linked in the script). You can also check your remaining rate limit in the API response headers.

    Also note that the output will contain no mapping for an email that has no matching login (e.g. the user no longer exists, or does not list that email publicly on their profile).

    You can also use the GitHub GraphQL API Explorer to test your queries.
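
The two caveats above (splitting into chunks, and emails that silently drop out of the output) can be handled together. The following is only a minimal sketch, not a tested production script: the helper names (build_query, map_response, lookup_batch, for_each_batch) and the batch size of 50 are my own assumptions, and map_response relies on jq's --args option, which needs jq 1.6 or later. Instead of reading the email back from the response, it pairs each alias (query0, query1, ...) with the email passed at the same position, so unmatched emails show up explicitly.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the batch size are illustrative assumptions.

# Build the JSON payload for one batch: one aliased search per email.
build_query() {
  local query="query {" idx=0 email
  for email in "$@"; do
    query+=" query${idx}: search(query: \\\"in:email ${email}\\\", type: USER, first: 1) { nodes { ... on User { login } } }"
    idx=$((idx + 1))
  done
  printf '{"query": "%s }"}\n' "$query"
}

# Read a GraphQL response on stdin and pair each alias (query0, query1, ...)
# with the email given at the same argument position.
# Emails without a result are printed as NOT FOUND instead of being dropped.
map_response() {
  jq -r '
    .data | to_entries[]
    | (.key | ltrimstr("query") | tonumber) as $i
    | "\($ARGS.positional[$i]) => \(.value.nodes[0].login // "NOT FOUND")"
  ' --args "$@"
}

# Send one batch of emails in a single HTTP call (requires GITHUB_TOKEN).
lookup_batch() {
  build_query "$@" |
    curl --fail-with-body -sH "Authorization: token $GITHUB_TOKEN" \
      --data @- https://api.github.com/graphql |
    map_response "$@"
}

# Call command $2 once per group of at most $1 of the remaining arguments.
for_each_batch() {
  local size=$1 cmd=$2
  shift 2
  while (($# > 0)); do
    "$cmd" "${@:1:size}"
    shift $(( $# < size ? $# : size ))
  done
}

# Usage (inside a repository, with GITHUB_TOKEN set):
#   read -ra emails <<< "$(git log --format='%ae' | sort -u | xargs)"
#   for_each_batch 50 lookup_batch "${emails[@]}"
```

If GitHub rejects a request as too large, lowering the batch size is the first thing to try; 50 is just a starting guess, not a documented limit.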