r google-scholar

Combining google scholar id and pubid in for loop


I am using the "scholar" package in R. I want to create a social network of coauthors for my research group. I created a data frame, members, as follows:

members <- data.frame(name = c("Linton C Freeman", "Ronald Burt", "Stephen P. Borgatti"),
                      scholar_id = c("quiVMg8AAAAJ", "g-R8XdkAAAAJ", "hlk4a4gAAAAJ"),
                      stringsAsFactors = FALSE)

Then I created a for loop to get publications for each researcher:

pubs <- get_publications(members$scholar_id[1])
for (i in 2:nrow(members)) {
  pubs_ <- get_publications(members$scholar_id[i])
  pubs <- rbind(pubs, pubs_)
}

To get a nice list of coauthors I need to use this syntax:

coauthors <- get_complete_authors(scholar_id, pubid)

For example:

coauthors <- get_complete_authors(members$scholar_id[1], pubs$pubid[1])

I want to iterate through members to get all coauthors in a data frame. I guess I need nested loops, iterating first through members and then through each member's pubs. I also need to add a pause in the loop to avoid HTTP 503 errors. How do I construct a loop that does this? In the end, I want a data frame that has pubid and authors; I know how to create an edge list from that.


Solution

  • Here is how I would approach the problem using a single data.frame to keep everything organised. I would do it that way because it looks like Google Scholar uses the same id to refer to different publications, which makes life interesting.

    library(scholar)
    library(tidyverse)
    
    #tibble() replaces the deprecated data_frame()
    member <- tibble(name = c("Linton C Freeman", "Ronald Burt", "Stephen P. Borgatti"),
                     scholar_id = c("quiVMg8AAAAJ", "g-R8XdkAAAAJ", "hlk4a4gAAAAJ"))
    
    bib_data <- member %>% 
      #this lets mutate work on each row independently
      rowwise() %>% 
      #produce a dataframe for each row
      mutate(pubs = list(get_publications(scholar_id))) %>% 
      #drop the rowwise grouping before expanding
      ungroup() %>% 
      #expand the dataframes, naming the list column explicitly
      unnest(pubs) %>% 
      #I've included this to keep the requests down for a demonstration
      filter(row_number() < 6) %>% 
      #as above
      rowwise %>% 
      #this now uses the scholar_id and pubid from each row to get the coauthor
      #information as a new column
      mutate(coauths = get_complete_authors(scholar_id, pubid))
    

    This way you can avoid for loops altogether, and hopefully keep all the records organised clearly.
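The question also asks for a pause between requests to avoid HTTP 503 errors. One way to get that without going back to explicit loops is to wrap the request function. The following is only a sketch: `with_pause`, `pause` and `retries` are names made up for this example, and it assumes that a failed request surfaces as an ordinary R error, which I have not checked for the scholar package.

```r
# Hypothetical helper (not part of the scholar package): returns a version of
# `f` that sleeps before every call and retries a few times if `f` errors.
with_pause <- function(f, pause = 2, retries = 3) {
  function(...) {
    for (attempt in seq_len(retries)) {
      # wait longer on each successive attempt
      Sys.sleep(pause * attempt)
      result <- tryCatch(f(...), error = function(e) e)
      if (!inherits(result, "error")) {
        return(result)
      }
    }
    # give up and leave a missing value rather than stopping the whole run
    NA_character_
  }
}

# Intended use inside the pipeline (not run here, as it needs network access):
# slow_get_authors <- with_pause(scholar::get_complete_authors)
# ... %>% rowwise() %>% mutate(coauths = slow_get_authors(scholar_id, pubid))
```

The pause lengths are guesses; you may need to increase them if Google Scholar still refuses requests.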

    Dealing with the coauthor information is then a bit of a different challenge, because it looks like the formats (in terms of abbreviations especially) are not consistent...
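As a starting point for the pubid/authors data frame and the edge list, here is a rough base R sketch. It runs on made-up data and assumes that each publication's coauthors arrive as a single comma-separated string; `demo`, `normalise_name` and `pair_up` are hypothetical names for this example. The normalisation is deliberately crude (trim, drop full stops, upper-case) and will not reconcile different abbreviations of the same person.

```r
# Made-up example data standing in for the pubid and coauths columns
demo <- data.frame(
  pubid   = c("p1", "p2"),
  coauths = c("LC Freeman, SP Borgatti, DR White", "RS Burt"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: crude clean-up of a vector of names
normalise_name <- function(x) toupper(gsub("[.]", "", trimws(x)))

# One character vector of cleaned names per publication
author_list <- lapply(strsplit(demo$coauths, ","), normalise_name)

# Long format: one row per (pubid, author) pair
pub_authors <- data.frame(
  pubid  = rep(demo$pubid, lengths(author_list)),
  author = unlist(author_list),
  stringsAsFactors = FALSE
)

# Edge list: every pair of coauthors on the same publication;
# single-author publications contribute no edges
pair_up <- function(a) if (length(a) < 2) NULL else t(combn(a, 2))
edges <- do.call(rbind, lapply(author_list, pair_up))
colnames(edges) <- c("from", "to")
```

`pub_authors` is the pubid/authors table asked for in the question, and `edges` is a two-column character matrix that network packages generally accept as an edge list.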