r, json, loops, opendata

A continuation of... Extracting data from an API using R


I'm super new at this and am working in R for my thesis. The code in this answer (Extracting data from an API using R) finally worked for me, but I can't figure out how to add a loop to it. I keep getting only the first page of the API results when I need all 3360 pages. Here's the code:

    library(httr)
    library(jsonlite)

    # Keep the URL on a single line: line breaks or spaces inside it break the request
    r1 <- GET("http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s#soktraff")

    r2 <- rawToChar(r1$content)  # raw response body as a JSON string
    class(r2)
    r3 <- fromJSON(r2)           # parse the JSON into nested lists and data frames
    r4 <- r3$dokumentlista$dokument

By the time I reach r4, it's already a data frame.

Please and thank you!

Edit: Originally, I couldn't find a URL that included the page number. Now I have one (below, ending in the page parameter p=), but I still haven't been able to loop over it. "http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s&p="


Solution

  • I think you can extract the url of the next page from r3 as follows:

    next_url <- r3$dokumentlista$`@nasta_sida`
    # Double-check this: the URL sometimes comes back with white space inside it.
    # You may not hit this problem, but stripping the spaces fixed it for me.
    next_url <- gsub(' ', '', next_url)

    GET(next_url)
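
If you'd rather follow the next-page link than build page numbers yourself, the idea above could be wrapped in a loop. This is only a rough sketch: `fetch_page`, `collect_pages` and `max_pages` are names I made up for illustration, and it assumes the `@nasta_sida` field is simply missing on the last page, which you should verify against the real response.

```r
# Sketch only. fetch_page(), collect_pages() and max_pages are hypothetical
# helpers introduced for this example, not part of httr or jsonlite.
fetch_page <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # fail early on HTTP errors
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

collect_pages <- function(start_url, fetch = fetch_page, max_pages = 5) {
  pages <- list()
  url <- start_url
  n <- 0
  # Stop at the page cap or when there is no next-page URL left
  while (n < max_pages && length(url) == 1 && nzchar(url)) {
    parsed <- fetch(url)
    n <- n + 1
    pages[n] <- list(parsed$dokumentlista$dokument)
    next_url <- parsed$dokumentlista$`@nasta_sida`
    # ASSUMPTION: a missing @nasta_sida field marks the last page
    url <- if (is.null(next_url)) NULL else gsub(" ", "", next_url, fixed = TRUE)
  }
  pages
}
```

Calling `collect_pages(my_start_url, max_pages = 3)` with the URL from the question would then return a list of up to three data frames. The `max_pages` cap is there so a test run doesn't walk all 3360 pages by accident.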
    

    Update

    I tried the URL with the page number for the first 10 pages and it worked:

    my_dfs <- lapply(1:10, function(i){
      my_url <- paste0("http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s&p=", i)
      r1 <- GET(my_url)
      r2 <- rawToChar(r1$content)
      r3 <- fromJSON(r2)
      r4 <- r3$dokumentlista$dokument
      return(r4)
    })
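
Going from 10 pages to all 3360 means a lot of requests, so it may be worth pausing between them and not letting one failed page abort the whole run. Here is a rough sketch of that. `fetch_dokument`, `fetch_pages` and the `pause` argument are hypothetical names of mine, and the one-second pause is an arbitrary guess, not a documented limit of the API.

```r
# Sketch only. fetch_dokument(), fetch_pages() and pause are hypothetical
# helpers introduced for this example.
base_url <- "http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s&p="

fetch_dokument <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # turn HTTP errors into R errors
  parsed <- jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
  parsed$dokumentlista$dokument
}

fetch_pages <- function(page_numbers, fetch = fetch_dokument, pause = 1) {
  lapply(page_numbers, function(i) {
    Sys.sleep(pause)  # ASSUMPTION: a short pause is enough to be polite
    tryCatch(
      fetch(paste0(base_url, i)),
      error = function(e) {
        # Report the failure and keep going; the page is stored as NULL
        message("Page ", i, " failed: ", conditionMessage(e))
        NULL
      }
    )
  })
}
```

You would call it as `my_dfs <- fetch_pages(1:3360)`. Afterwards, `which(vapply(my_dfs, is.null, logical(1)))` lists the page numbers that failed, so you can retry just those.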
    

    Update 2:

    The extracted data frames are complex (for example, some columns are lists of data frames), which is why a simple rbind will not work here. You'll have to do some pre-processing before you stack the data together. Something like this would work:

    library(magrittr)  # provides the %>% pipe

    my_dfs %>% lapply(function(df_0){
          # Do some stuff here with the data, and choose the variables you need
          # I chose the first 10 columns to check that I got 200 different observations
          df_0[1:10]
        }) %>% do.call(rbind, .)
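
Picking columns by position only works if every page returns the same columns in the same order. An alternative, sketched below in base R, is to drop the nested columns and keep only the plain columns that all pages share. `keep_flat_columns` and `stack_pages` are hypothetical names of mine, and whether throwing away the nested columns is acceptable depends on what you need for the thesis.

```r
# Sketch only. keep_flat_columns() and stack_pages() are hypothetical
# helpers introduced for this example.
keep_flat_columns <- function(df) {
  # A column is "flat" if it is a plain vector, not a list or nested data frame
  is_flat <- vapply(df, function(col) is.atomic(col) && is.null(dim(col)), logical(1))
  df[, is_flat, drop = FALSE]
}

stack_pages <- function(dfs) {
  dfs <- Filter(Negate(is.null), dfs)  # skip pages that failed to download
  flat <- lapply(dfs, keep_flat_columns)
  # Keep only the columns present on every page so rbind lines up
  shared <- Reduce(intersect, lapply(flat, names))
  do.call(rbind, lapply(flat, function(df) df[, shared, drop = FALSE]))
}
```

With the list from the page loop, `all_docs <- stack_pages(my_dfs)` should give one data frame with one row per document.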