
GitHub REST and GraphQL API are returning different data


I am scraping some data from GitHub. The RESTful URL to this particular PR shows that it has a merge_commit_sha value: https://api.github.com/repos/ansible/ansible/pulls/15088

However, when I fetch the same PR through the GitHub GraphQL API, its mergeCommit field comes back empty.

  {
    resource(
      url: "https://github.com/ansible/ansible/pull/15088"
    ) {
      ... on PullRequest {
        id
        number
        title
        merged
        mergeCommit {
          message
        }
      }
    }
  }

For context, the PR in question was actually merged, so it should have a merge commit. I am looking for an explanation of why the two APIs differ.


Solution

  • This link posted in the other answer contains the explanation:

    As in, Git doesn’t have the originalCommit (which makes sense). Presumably the original commit SHA is there, but the graphQL API actually checks to see if git has it, whereas the REST API doesn’t?

    If you search for the commit SHA that the REST API returns, you can't find it in the repo.

    https://github.com/ansible/ansible/commit/d7b54c103050d9fc4965e57b7611a70cb964ab25

    Since this is a very old pull request on an active repo, there's a good chance that some old commits were cleaned up or that other maintenance was done on the repo. It's hard to tell, as that kind of maintenance obviously isn't version controlled.

    Another possibility is that the pull request was merged with a fast-forward, which does not create a merge commit. But that wouldn't explain the SHA in the REST API response.

    So at some point the old merge commits were probably removed, to save space or for a similar reason. Some records still point to the removed SHAs, but the GraphQL API apparently returns only objects that still exist.
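
The comparison described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: both responses have already been fetched and parsed into dictionaries, the GraphQL query also requests `oid` under `mergeCommit` (the query in the question only asks for `message`), and `diagnose_merge_commit` is a hypothetical helper name, not part of any GitHub library.

```python
from typing import Optional


def diagnose_merge_commit(rest_pr: dict, graphql_pr: dict) -> str:
    """Classify how the REST and GraphQL views of one pull request differ.

    rest_pr    -- parsed JSON body of GET /repos/{owner}/{repo}/pulls/{number}
    graphql_pr -- the PullRequest object from the GraphQL response, assumed
                  to have been queried with `mergeCommit { oid }`
    """
    rest_sha: Optional[str] = rest_pr.get("merge_commit_sha")

    # GraphQL returns null (None after parsing) when it has no commit object.
    merge_commit = graphql_pr.get("mergeCommit")
    graphql_sha: Optional[str] = merge_commit.get("oid") if merge_commit else None

    if rest_sha and graphql_sha:
        return "consistent" if rest_sha == graphql_sha else "different commits"
    if rest_sha:
        # REST still stores a SHA, but GraphQL cannot resolve a commit for it.
        return "dangling SHA"
    if graphql_sha:
        return "GraphQL only"
    return "no merge commit"


# Hand-written stand-ins for the two responses; the SHA is the one linked above.
rest_example = {
    "merged": True,
    "merge_commit_sha": "d7b54c103050d9fc4965e57b7611a70cb964ab25",
}
graphql_example = {"merged": True, "mergeCommit": None}

print(diagnose_merge_commit(rest_example, graphql_example))
```

A "dangling SHA" result matches the situation in the question: the pull request is marked as merged and REST reports a SHA, yet no commit object is available for it.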