Tags: r, dplyr, purrr

Extract multiple elements from a large list and put them in a dataframe in R


This is similar to the question posted here.

I use the nhlapi package to scrape some boxscores, and this results in a large nested list.

What I would like to end up with is a single data frame that contains all of the players for both the home and away teams, together with all of their stats, just as it is displayed on the NHL site.

The code to get the boxscore data is:

install.packages("nhlapi")
library(nhlapi)
library(dplyr)  # provides %>% and bind_rows(), used below

boxscores <- nhl_games_boxscore(gameIds = 2023020001)

I then used the suggested answer in the similar question. I am able to get player details using this:

away_players <- boxscores[[1]][["teams"]][["away"]][["players"]]

df_away_players <- lapply(1:length(away_players), function(i) {
  away_players[[i]][["person"]] %>% 
    data.frame()
}) %>% 
  bind_rows()

head(df_away_players)

       id       fullName                   link firstName lastName primaryNumber  birthDate
1 8482062     Cole Smith /api/v1/people/8482062      Cole    Smith            36 1995-10-28
2 8478508   Yakov Trenin /api/v1/people/8478508     Yakov   Trenin            13 1997-01-13
3 8474679 Gustav Nyquist /api/v1/people/8474679    Gustav  Nyquist            14 1989-09-01
4 8476925 Colton Sissons /api/v1/people/8476925    Colton  Sissons            10 1993-11-05
5 8478438    Tommy Novak /api/v1/people/8478438     Tommy    Novak            82 1997-04-28
6 8476887 Filip Forsberg /api/v1/people/8476887     Filip Forsberg             9 1994-08-13

I can also get the skaterStats using this:

df_away_stats <- lapply(1:length(away_players), function(i) {
  away_players[[i]][["stats"]] %>% 
    data.frame()
}) %>% 
  bind_rows()

head(df_away_stats)
  skaterStats.timeOnIce skaterStats.assists skaterStats.goals skaterStats.shots
1                 12:52                   0                 0                 1
2                 13:54                   0                 0                 1
3                 14:53                   1                 0                 2
4                 14:45                   0                 0                 3
5                 15:43                   0                 1                 3
6                 20:51                   2                 0                 6

I tried to combine the two using this:

df_combined <- c(df_away_players, df_away_stats)

This produces a list rather than a data frame, and I cannot figure out how to get all of this information into a single data frame.

str(df_combined)
List of 63
 $ id                                    : int [1:22] 8482062 8478508 8474679 8476925 8478438 8476887 8474600 8474568 8481239 8478851 ...
 $ fullName                              : chr [1:22] "Cole Smith" "Yakov Trenin" "Gustav Nyquist" "Colton Sissons" ...
 $ link                                  : chr [1:22] "/api/v1/people/8482062" "/api/v1/people/8478508" "/api/v1/people/8474679" "/api/v1/people/8476925" ...
 $ firstName                             : chr [1:22] "Cole" "Yakov" "Gustav" "Colton" ...
 $ lastName                              : chr [1:22] "Smith" "Trenin" "Nyquist" "Sissons" ...
 $ primaryNumber                         : chr [1:22] "36" "13" "14" "10" ...
 $ birthDate                             : chr [1:22] "1995-10-28" "1997-01-13" "1989-09-01" "1993-11-05" ...
 $ currentAge                            : int [1:22] 27 26 34 29 26 29 33 33 23 27 ...
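
For reference, the list result above is expected behaviour: c() treats each data frame as a plain list of columns and concatenates them, so the data.frame class is lost. A column-wise bind keeps it. A minimal sketch, using two small stand-in data frames (the object names df_players and df_stats are illustrative; the values are copied from the first two rows of the output above):

```r
library(dplyr)

# Small stand-ins for df_away_players and df_away_stats (first two rows only)
df_players <- data.frame(
  id = c(8482062L, 8478508L),
  fullName = c("Cole Smith", "Yakov Trenin")
)
df_stats <- data.frame(
  skaterStats.timeOnIce = c("12:52", "13:54"),
  skaterStats.shots = c(1L, 1L)
)

# c() concatenates the columns as list elements and drops the data.frame class
as_list <- c(df_players, df_stats)
class(as_list)

# bind_cols() (or base cbind()) pairs rows by position and keeps a data frame
df_both <- bind_cols(df_players, df_stats)
class(df_both)
dim(df_both)
```

This only lines up correctly because both data frames were built by iterating over the same away_players list in the same order; bind_cols() matches rows purely by position.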

The result should look very similar to what is shown on the NHL site at https://www.nhl.com/gamecenter/nsh-vs-tbl/2023/10/10/2023020001/boxscore

This is how it looks:

[image: boxscore table as displayed on the NHL site]

If possible, I would also like to add the game date and to do this for multiple games (I believe the nhl_games_boxscore function accepts multiple gameIds), though I suspect I will need a loop of some kind for that.

Output of dput() for the first two players from away_players, as requested in the comments:

dput(away_players[c(1, 2)])

away_players<- list(ID8482062 = list(person = list(id = 8482062L, fullName = "Cole Smith", 
                                    link = "/api/v1/people/8482062", firstName = "Cole", lastName = "Smith", 
                                    primaryNumber = "36", birthDate = "1995-10-28", currentAge = 27L, 
                                    birthCity = "Brainerd", birthStateProvince = "MN", birthCountry = "USA", 
                                    nationality = "USA", height = "6' 3\"", weight = 195L, active = TRUE, 
                                    alternateCaptain = FALSE, captain = FALSE, rookie = FALSE, 
                                    shootsCatches = "L", rosterStatus = "Y", currentTeam = list(
                                        id = 18L, name = "Nashville Predators", link = "/api/v1/teams/18"), 
                                    primaryPosition = list(code = "L", name = "Left Wing", type = "Forward", 
                                                           abbreviation = "LW")), jerseyNumber = "36", position = list(
                                                               code = "L", name = "Left Wing", type = "Forward", abbreviation = "LW"), 
                      stats = list(skaterStats = list(timeOnIce = "12:52", assists = 0L, 
                                                      goals = 0L, shots = 1L, hits = 2L, powerPlayGoals = 0L, 
                                                      powerPlayAssists = 0L, penaltyMinutes = 0L, faceOffWins = 0L, 
                                                      faceoffTaken = 0L, takeaways = 2L, giveaways = 1L, shortHandedGoals = 0L, 
                                                      shortHandedAssists = 0L, blocked = 0L, plusMinus = 0L, 
                                                      evenTimeOnIce = "7:52", powerPlayTimeOnIce = "0:00", 
                                                      shortHandedTimeOnIce = "5:00"))), ID8478508 = list(person = list(
                                                          id = 8478508L, fullName = "Yakov Trenin", link = "/api/v1/people/8478508", 
                                                          firstName = "Yakov", lastName = "Trenin", primaryNumber = "13", 
                                                          birthDate = "1997-01-13", currentAge = 26L, birthCity = "Chelyabinsk", 
                                                          birthCountry = "RUS", nationality = "RUS", height = "6' 2\"", 
                                                          weight = 201L, active = TRUE, alternateCaptain = FALSE, captain = FALSE, 
                                                          rookie = FALSE, shootsCatches = "L", rosterStatus = "Y", 
                                                          currentTeam = list(id = 18L, name = "Nashville Predators", 
                                                                             link = "/api/v1/teams/18"), primaryPosition = list(code = "C", 
                                                                                                                                name = "Center", type = "Forward", abbreviation = "C")), 
                                                          jerseyNumber = "13", position = list(code = "C", name = "Center", 
                                                                                               type = "Forward", abbreviation = "C"), stats = list(skaterStats = list(
                                                                                                   timeOnIce = "13:54", assists = 0L, goals = 0L, shots = 1L, 
                                                                                                   hits = 3L, powerPlayGoals = 0L, powerPlayAssists = 0L, 
                                                                                                   penaltyMinutes = 0L, faceOffWins = 0L, faceoffTaken = 0L, 
                                                                                                   takeaways = 3L, giveaways = 0L, shortHandedGoals = 0L, 
                                                                                                   shortHandedAssists = 0L, blocked = 0L, plusMinus = 0L, 
                                                                                                   evenTimeOnIce = "10:51", powerPlayTimeOnIce = "0:00", 
                                                                                                   shortHandedTimeOnIce = "3:03"))))

Solution

  • This is tricky because away_players is a fairly deeply nested list whose sublists differ in length (the second player, for example, has no birthStateProvince). Since every branch eventually ends in a single-element leaf, we can unlist each player, keeping the individual players as the top-level elements; this gives one named character vector per player. enframe then turns each named vector into a long-format data frame with name and value columns. pivot_wider (from tidyr) reshapes each of these into a single row per player, and bind_rows stacks the rows into the final data frame, filling any field a player lacks with NA. Note that unlist coerces every value to character, so numeric columns need converting back afterwards. The column names come out a bit clunky, but this can easily be amended with rename_with or janitor::clean_names.

    library(purrr)
    library(dplyr)
    library(tibble)
    library(tidyr)  # pivot_wider() lives here
    
    away_players |> 
        map(unlist) |> 
        map(enframe) |> 
        map(\(x) pivot_wider(x,
                             names_from = name,
                             values_from = value)) |> 
        bind_rows()
    
    # A tibble: 2 × 51
      person.id person.fullName person.link          person.firstName person.lastName person.primaryNumber person.birthDate
      <chr>     <chr>           <chr>                <chr>            <chr>           <chr>                <chr>           
    1 8482062   Cole Smith      /api/v1/people/8482… Cole             Smith           36                   1995-10-28      
    2 8478508   Yakov Trenin    /api/v1/people/8478… Yakov            Trenin          13                   1997-01-13      
    # ℹ 44 more variables: person.currentAge <chr>, person.birthCity <chr>, person.birthStateProvince <chr>,
    #   person.birthCountry <chr>, person.nationality <chr>, person.height <chr>, person.weight <chr>,
    #   person.active <chr>, person.alternateCaptain <chr>, person.captain <chr>, person.rookie <chr>,
    #   person.shootsCatches <chr>, person.rosterStatus <chr>, person.currentTeam.id <chr>, person.currentTeam.name <chr>,
    #   person.currentTeam.link <chr>, person.primaryPosition.code <chr>, person.primaryPosition.name <chr>,
    #   person.primaryPosition.type <chr>, person.primaryPosition.abbreviation <chr>, jerseyNumber <chr>,
    #   position.code <chr>, position.name <chr>, position.type <chr>, position.abbreviation <chr>, …
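
The pipeline above leaves every column as character and handles a single team. A rough sketch of how it might be extended to restore column types, shorten the names, and cover both teams across several games follows. Several things here are assumptions rather than facts about nhlapi: that each game has a "home" branch shaped like the "away" one, and that nhl_games_boxscore returns one list element per game in the order of the gameIds supplied. flatten_players, mock_player and mock_game are hypothetical helpers, and the mock data (placeholder players, the second game ID) is made up so that the example runs without the API.

```r
library(purrr)
library(dplyr)
library(tibble)
library(tidyr)

# Hypothetical helper: one row per player from a list shaped like away_players
flatten_players <- function(players) {
  players |>
    map(unlist) |>
    map(\(x) pivot_wider(enframe(x), names_from = name, values_from = value)) |>
    bind_rows()
}

# Made-up stand-in for boxscores: two games, one placeholder player per side
mock_player <- function(id, name, goals) {
  list(person = list(id = id, fullName = name),
       stats = list(skaterStats = list(timeOnIce = "10:00", goals = goals)))
}
mock_game <- function() {
  list(teams = list(
    away = list(players = list(ID1 = mock_player(1L, "Away Player", 0L))),
    home = list(players = list(ID2 = mock_player(2L, "Home Player", 1L)))
  ))
}
boxscores <- list(mock_game(), mock_game())
game_ids <- c("2023020001", "2023020002")  # second ID is illustrative

all_players <- boxscores |>
  set_names(game_ids) |>
  map(\(game) {
    # Flatten each side, then record which side a row came from
    game[["teams"]][c("away", "home")] |>
      map(\(team) flatten_players(team[["players"]])) |>
      bind_rows(.id = "side")
  }) |>
  bind_rows(.id = "gameId") |>
  # unlist() turned every value into character; re-guess the column types
  type.convert(as.is = TRUE) |>
  # Drop the two longest prefixes, e.g. "stats.skaterStats.goals" -> "goals"
  rename_with(\(nm) sub("^(person|stats\\.skaterStats)\\.", "", nm))

all_players
```

Type conversion is deliberately done once, after all rows are bound, so that a column cannot end up integer for one team and character for another. Only the person. and stats.skaterStats. prefixes are stripped; removing everything up to the last dot would produce duplicate names on the real data (for example person.id and person.currentTeam.id). The game date is not covered here because its location in the returned object is not shown in the question.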