I would like to write a function that extracts some contributor data from a GitHub project's contributor page. For example: https://github.com/easystats/report/graphs/contributors
Using R, how can I extract, for example, each contributor's username, number of commits, number of additions, and number of deletions?
Here is my attempt at web scraping with rvest (https://github.com/tidyverse/rvest):
library(rvest)
contribs <- read_html("https://github.com/easystats/report/graphs/contributors")
section <- contribs %>% html_elements("section")
section
#> {xml_nodeset (0)}
contribs$node
#> <pointer: 0x0000027d9b9e9f10>
contribs$doc
#> <pointer: 0x0000027d9e03d140>
Created on 2023-01-29 with reprex v2.0.2
But this does not give the expected result: the node set is empty.
In any case, I would much prefer a solution that uses an existing R package or the GitHub API, for example through gh (https://github.com/r-lib/gh).
Is this possible at all?
I found the endpoint the page loads its data from in the Network tab of the browser's developer tools:
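library(tidyverse)
library(httr2)

# Request the JSON behind the contributors graph, with the headers the page's own XHR call sends
"https://github.com/easystats/report/graphs/contributors-data" %>%
  request() %>%
  req_headers("x-requested-with" = "XMLHttpRequest",
              accept = "application/json") %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%
  # Flatten the nested author and weekly columns
  unnest(everything()) %>%
  # a = additions, d = deletions, c = commits; `path` is "/<username>"
  group_by(username = str_remove(path, "/")) %>%
  summarise(across(a:c, sum))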
# A tibble: 21 x 4
   username                a      d     c
   <chr>               <int>  <int> <int>
 1 DominiqueMakowski  203778 148154   325
 2 IndrajeetPatil      15082  10513   159
 3 LukasWallrich           1      1     1
 4 bwiernik             1371    156    11
 5 cgeger                  1      1     1
 6 drfeinberg            127     23     1
 7 dtoher                 26     26     1
 8 etiennebacher         127    162     7
 9 fkohrt                  1      1     1
10 grimmjulian             2      2     1
11 humanfactors           22     23     4
12 jdtrat                  1      1     1
13 m-macaskill            33     31     2
14 mattansb             1009    603    14
15 mutlusun              265      4     4
16 pkoaz                   3      2     1
17 rempsyc              3427   2938    14
18 strengejacke         5129  38164   223
19 vincentarelbundock      5      0     1
20 webbedfeet             85     85     2
21 wjschne                 2      2     1
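Since the question also asks about the GitHub API through gh, here is a minimal sketch of that route. It assumes the documented REST endpoint GET /repos/{owner}/{repo}/stats/contributors, which returns one entry per contributor with an `author`, a `total` commit count, and weekly `a`/`d`/`c` counts. `fetch_contributors` and `summarise_contributors` are hypothetical helper names introduced here for illustration; they are not part of gh. I have not run this against the repository above, so treat it as a starting point rather than a tested solution.

```r
# Hypothetical helper: fetch contributor statistics with the gh package.
# Needs gh installed and network access; a token may be required to avoid rate limits.
fetch_contributors <- function(owner, repo) {
  gh::gh("GET /repos/{owner}/{repo}/stats/contributors",
         owner = owner, repo = repo)
}

# Hypothetical helper: collapse the parsed response (a list with one element
# per contributor) into one row per contributor, using base R only.
summarise_contributors <- function(stats) {
  rows <- lapply(stats, function(x) {
    login <- x$author$login
    data.frame(
      # author can be missing, so fall back to NA
      username  = if (is.null(login)) NA_character_ else login,
      commits   = x$total,
      # weeks is a list of records with w (week start), a, d, c
      additions = sum(vapply(x$weeks, function(w) as.numeric(w$a), numeric(1))),
      deletions = sum(vapply(x$weeks, function(w) as.numeric(w$d), numeric(1))),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage (not run here):
# stats <- fetch_contributors("easystats", "report")
# summarise_contributors(stats)
```

One caveat: GitHub may answer with status 202 and no data while it is still computing the statistics for a repository, so the first call can come back empty and may need to be retried after a short wait.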