Tags: r, github, web-scraping, rvest

How can I web scrape GitHub project contributor information in R?


I would like to write a function that extracts some contributor data from a GitHub project's contributor page. For example: https://github.com/easystats/report/graphs/contributors

How can I use R to extract, for example, each contributor's username, number of commits, number of additions, and number of deletions?

Here is my attempt at web scraping using rvest (https://github.com/tidyverse/rvest):

library(rvest)

contribs <- read_html("https://github.com/easystats/report/graphs/contributors")

section <- contribs %>% html_elements("section")
section
#> {xml_nodeset (0)}

contribs$node
#> <pointer: 0x0000027d9b9e9f10>
contribs$doc
#> <pointer: 0x0000027d9e03d140>

Created on 2023-01-29 with reprex v2.0.2

But this does not give the expected result: the node set is empty.

However, I would much prefer a solution where I could use an existing R package for this, or the GitHub API (https://github.com/r-lib/gh).
But is it possible at all?


Solution

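  • The contributor graph is filled in with JavaScript after the page loads, which is why rvest finds no matching elements in the static HTML. I found the JSON endpoint the page requests in the Network tab of the browser's developer tools; calling it directly returns the weekly stats for each contributor: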

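    library(tidyverse)
    library(httr2)
    
    # Endpoint the contributors page requests for its graph data
    "https://github.com/easystats/report/graphs/contributors-data" %>%
      request() %>%
      req_headers("x-requested-with" = "XMLHttpRequest",
                  accept = "application/json") %>%
      req_perform() %>%
      resp_body_json(simplifyVector = TRUE) %>%
      # Flatten the nested columns (weekly stats per contributor)
      unnest(everything()) %>%
      # Drop the leading slash from `path` to get the username
      group_by(username = str_remove(path, "/")) %>%
      # a = additions, d = deletions, c = commits
      summarise(across(a:c, sum))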
    
    # A tibble: 21 x 4
       username                a      d     c
       <chr>               <int>  <int> <int>
     1 DominiqueMakowski  203778 148154   325
     2 IndrajeetPatil      15082  10513   159
     3 LukasWallrich           1      1     1
     4 bwiernik             1371    156    11
     5 cgeger                  1      1     1
     6 drfeinberg            127     23     1
     7 dtoher                 26     26     1
     8 etiennebacher         127    162     7
     9 fkohrt                  1      1     1
    10 grimmjulian             2      2     1
    11 humanfactors           22     23     4
    12 jdtrat                  1      1     1
    13 m-macaskill            33     31     2
    14 mattansb             1009    603    14
    15 mutlusun              265      4     4
    16 pkoaz                   3      2     1
    17 rempsyc              3427   2938    14
    18 strengejacke         5129  38164   223
    19 vincentarelbundock      5      0     1
    20 webbedfeet             85     85     2
    21 wjschne                 2      2     1
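
  • Since the question also asks about the gh package: the documented GitHub REST API has a contributor statistics endpoint (GET /repos/{owner}/{repo}/stats/contributors) that returns the same kind of weekly a/d/c records. Below is a minimal sketch of how its parsed response could be summarised, not a definitive implementation. The helper names `week_sum` and `summarise_contributors` and the mock contributors "alice" and "bob" are hypothetical. I am assuming each element has `author$login` and a `weeks` list, as the REST documentation describes. The live call is left commented out because it needs network access, and the endpoint may return an empty 202 response while GitHub is still computing the statistics, in which case you would need to retry.

```r
# Sum one field ("a", "d" or "c") across a contributor's weekly records.
week_sum <- function(weeks, field) {
  sum(vapply(weeks, function(w) as.numeric(w[[field]]), numeric(1)))
}

# Turn a parsed /stats/contributors response into one row per contributor.
# Assumes each element has `author$login` and `weeks` (list of a/d/c records).
summarise_contributors <- function(stats) {
  rows <- lapply(stats, function(x) {
    login <- x$author$login
    data.frame(
      username  = if (is.null(login)) NA_character_ else login,
      additions = week_sum(x$weeks, "a"),
      deletions = week_sum(x$weeks, "d"),
      commits   = week_sum(x$weeks, "c"),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Live call (requires the gh package and network access; not run here):
# stats <- gh::gh("GET /repos/{owner}/{repo}/stats/contributors",
#                 owner = "easystats", repo = "report")
# summarise_contributors(stats)

# Hypothetical mock data with the same shape, so the sketch runs offline.
mock <- list(
  list(author = list(login = "alice"),
       weeks = list(list(w = 1, a = 10, d = 2, c = 1),
                    list(w = 2, a = 5,  d = 3, c = 2))),
  list(author = list(login = "bob"),
       weeks = list(list(w = 1, a = 0, d = 4, c = 1)))
)
result <- summarise_contributors(mock)
print(result)
```

    With the mock data, "alice" ends up with 15 additions, 5 deletions and 3 commits, and "bob" with 0, 4 and 1. The base R aggregation is only there to avoid extra dependencies; the same summary could be done with the tidyverse pipeline above.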