This is similar to the question posted here.
I use the nhlapi package to scrape some boxscores, and this results in a large nested list.
What I would like to end up with is a single data frame that contains every player from both the home and away teams along with all of their stats, just like it is displayed on the NHL site.
The code to get the boxscore data is:
install.packages("nhlapi")
library(nhlapi)
library(dplyr)  # provides %>% and bind_rows(), used below
boxscores <- nhl_games_boxscore(gameIds = 2023020001)
I then used the suggested answer in the similar question. I am able to get player details using this:
away_players <- boxscores[[1]][["teams"]][["away"]][["players"]]
df_away_players <- lapply(seq_along(away_players), function(i) {
  away_players[[i]][["person"]] %>%
    data.frame()
}) %>%
  bind_rows()
head(df_away_players)
id fullName link firstName lastName primaryNumber birthDate
1 8482062 Cole Smith /api/v1/people/8482062 Cole Smith 36 1995-10-28
2 8478508 Yakov Trenin /api/v1/people/8478508 Yakov Trenin 13 1997-01-13
3 8474679 Gustav Nyquist /api/v1/people/8474679 Gustav Nyquist 14 1989-09-01
4 8476925 Colton Sissons /api/v1/people/8476925 Colton Sissons 10 1993-11-05
5 8478438 Tommy Novak /api/v1/people/8478438 Tommy Novak 82 1997-04-28
6 8476887 Filip Forsberg /api/v1/people/8476887 Filip Forsberg 9 1994-08-13
I can also get the skaterStats using this:
df_away_stats <- lapply(seq_along(away_players), function(i) {
  away_players[[i]][["stats"]] %>%
    data.frame()
}) %>%
  bind_rows()
head(df_away_stats)
skaterStats.timeOnIce skaterStats.assists skaterStats.goals skaterStats.shots
1 12:52 0 0 1
2 13:54 0 0 1
3 14:53 1 0 2
4 14:45 0 0 3
5 15:43 0 1 3
6 20:51 2 0 6
I tried to combine the two using this:
df_combined <- c(df_away_players, df_away_stats)
Which does produce a list, but I cannot figure out how to get all of this info into a data frame.
str(df_combined)
List of 63
$ id : int [1:22] 8482062 8478508 8474679 8476925 8478438 8476887 8474600 8474568 8481239 8478851 ...
$ fullName : chr [1:22] "Cole Smith" "Yakov Trenin" "Gustav Nyquist" "Colton Sissons" ...
$ link : chr [1:22] "/api/v1/people/8482062" "/api/v1/people/8478508" "/api/v1/people/8474679" "/api/v1/people/8476925" ...
$ firstName : chr [1:22] "Cole" "Yakov" "Gustav" "Colton" ...
$ lastName : chr [1:22] "Smith" "Trenin" "Nyquist" "Sissons" ...
$ primaryNumber : chr [1:22] "36" "13" "14" "10" ...
$ birthDate : chr [1:22] "1995-10-28" "1997-01-13" "1989-09-01" "1993-11-05" ...
$ currentAge : int [1:22] 27 26 34 29 26 29 33 33 23 27 ...
It should look very similar to the boxscore shown on the NHL site at this URL: https://www.nhl.com/gamecenter/nsh-vs-tbl/2023/10/10/2023020001/boxscore
If possible, I would also like to add the date and be able to do this for multiple games (I believe the nhl_games_boxscore function accepts multiple gameIds), but I suspect I will need a loop of some kind for that?
Output of the first two players from away_players, as requested in the comments below:
dput(away_players[c(1, 2)])
away_players<- list(ID8482062 = list(person = list(id = 8482062L, fullName = "Cole Smith",
link = "/api/v1/people/8482062", firstName = "Cole", lastName = "Smith",
primaryNumber = "36", birthDate = "1995-10-28", currentAge = 27L,
birthCity = "Brainerd", birthStateProvince = "MN", birthCountry = "USA",
nationality = "USA", height = "6' 3\"", weight = 195L, active = TRUE,
alternateCaptain = FALSE, captain = FALSE, rookie = FALSE,
shootsCatches = "L", rosterStatus = "Y", currentTeam = list(
id = 18L, name = "Nashville Predators", link = "/api/v1/teams/18"),
primaryPosition = list(code = "L", name = "Left Wing", type = "Forward",
abbreviation = "LW")), jerseyNumber = "36", position = list(
code = "L", name = "Left Wing", type = "Forward", abbreviation = "LW"),
stats = list(skaterStats = list(timeOnIce = "12:52", assists = 0L,
goals = 0L, shots = 1L, hits = 2L, powerPlayGoals = 0L,
powerPlayAssists = 0L, penaltyMinutes = 0L, faceOffWins = 0L,
faceoffTaken = 0L, takeaways = 2L, giveaways = 1L, shortHandedGoals = 0L,
shortHandedAssists = 0L, blocked = 0L, plusMinus = 0L,
evenTimeOnIce = "7:52", powerPlayTimeOnIce = "0:00",
shortHandedTimeOnIce = "5:00"))), ID8478508 = list(person = list(
id = 8478508L, fullName = "Yakov Trenin", link = "/api/v1/people/8478508",
firstName = "Yakov", lastName = "Trenin", primaryNumber = "13",
birthDate = "1997-01-13", currentAge = 26L, birthCity = "Chelyabinsk",
birthCountry = "RUS", nationality = "RUS", height = "6' 2\"",
weight = 201L, active = TRUE, alternateCaptain = FALSE, captain = FALSE,
rookie = FALSE, shootsCatches = "L", rosterStatus = "Y",
currentTeam = list(id = 18L, name = "Nashville Predators",
link = "/api/v1/teams/18"), primaryPosition = list(code = "C",
name = "Center", type = "Forward", abbreviation = "C")),
jerseyNumber = "13", position = list(code = "C", name = "Center",
type = "Forward", abbreviation = "C"), stats = list(skaterStats = list(
timeOnIce = "13:54", assists = 0L, goals = 0L, shots = 1L,
hits = 3L, powerPlayGoals = 0L, powerPlayAssists = 0L,
penaltyMinutes = 0L, faceOffWins = 0L, faceoffTaken = 0L,
takeaways = 3L, giveaways = 0L, shortHandedGoals = 0L,
shortHandedAssists = 0L, blocked = 0L, plusMinus = 0L,
evenTimeOnIce = "10:51", powerPlayTimeOnIce = "0:00",
shortHandedTimeOnIce = "3:03"))))
This is tricky because away_players is a fairly deeply nested list whose sublists have unequal numbers of elements.
Since every list ultimately resolves to individual 1-element "nodes", we can unlist each player's sublist, keeping the individual players as the topmost nodes.
Then, enframe will turn each resulting named vector into a data.frame in the long format, which gives a list of data.frames.
We can then pivot_wider each of them into a tidy data.frame with a single row per player.
Finally, bind_rows stacks them into the final data.frame. The column names get a bit clunky, but this can easily be amended with rename_with or janitor::clean_names.
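To see where the clunky column names come from, here is a minimal base R sketch of what unlist does to a single player. The toy `player` list and its values are made up for illustration; it only mimics the nesting of one element of away_players.

```r
# Toy player with the same kind of nesting as one element of away_players.
# All names and values here are invented for illustration.
player <- list(
  person = list(id = 1L, fullName = "A Player",
                currentTeam = list(id = 18L, name = "Some Team")),
  stats = list(skaterStats = list(goals = 2L, timeOnIce = "12:52"))
)

# unlist() collapses every level into one named vector.
flat <- unlist(player)

# Names of nested elements are joined with ".", e.g. "person.currentTeam.id".
names(flat)

# Integers and strings are mixed, so the whole vector is coerced to character.
class(flat)
```

This is also why every column in the result is of type character, so numeric stats need to be converted back afterwards if you want to do arithmetic on them.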
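library(purrr)
library(dplyr)
library(tibble)
library(tidyr)  # provides pivot_wider()
away_players |>
  map(unlist) |>                          # one named character vector per player
  map(enframe) |>                         # long format: columns name and value
  map(\(x) pivot_wider(x,
                       names_from = name,
                       values_from = value)) |>  # one row per player
  bind_rows()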
# A tibble: 2 × 51
person.id person.fullName person.link person.firstName person.lastName person.primaryNumber person.birthDate
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 8482062 Cole Smith /api/v1/people/8482… Cole Smith 36 1995-10-28
2 8478508 Yakov Trenin /api/v1/people/8478… Yakov Trenin 13 1997-01-13
# ℹ 44 more variables: person.currentAge <chr>, person.birthCity <chr>, person.birthStateProvince <chr>,
# person.birthCountry <chr>, person.nationality <chr>, person.height <chr>, person.weight <chr>,
# person.active <chr>, person.alternateCaptain <chr>, person.captain <chr>, person.rookie <chr>,
# person.shootsCatches <chr>, person.rosterStatus <chr>, person.currentTeam.id <chr>, person.currentTeam.name <chr>,
# person.currentTeam.link <chr>, person.primaryPosition.code <chr>, person.primaryPosition.name <chr>,
# person.primaryPosition.type <chr>, person.primaryPosition.abbreviation <chr>, jerseyNumber <chr>,
# position.code <chr>, position.name <chr>, position.type <chr>, position.abbreviation <chr>, …
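For the follow-up about both teams and multiple games, the same pipeline can probably be wrapped in a function and mapped over the games, with no explicit loop. The following is only a sketch under a few assumptions: `flatten_players` and `flatten_game` are hypothetical helper names, the home roster is assumed to sit under `[["teams"]][["home"]][["players"]]` in the same way as the away roster, and the returned list is assumed to be in the same order as the gameIds you passed in. To keep it runnable without calling the API, it uses a toy `boxscores` list with invented values. The game date is not part of the players sublist shown in the question, so it is left out here.

```r
library(purrr)
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical helper: one row per player for one team's `players` list.
flatten_players <- function(players) {
  players |>
    map(unlist) |>
    map(enframe) |>
    map(\(x) pivot_wider(x, names_from = name, values_from = value)) |>
    bind_rows()
}

# Hypothetical helper: stack the away and home players of a single game.
# Assumes both sides are stored the same way under game$teams.
flatten_game <- function(game) {
  c("away", "home") |>
    set_names() |>
    map(\(side) flatten_players(game[["teams"]][[side]][["players"]])) |>
    bind_rows(.id = "side")
}

# Toy stand-in for the result of nhl_games_boxscore() (invented values).
toy_player <- function(id, name, goals) {
  list(person = list(id = id, fullName = name),
       stats = list(skaterStats = list(timeOnIce = "10:00", goals = goals)))
}
game_ids <- c(1L, 2L)  # placeholders, not real gameIds
boxscores <- map(game_ids, \(g) list(teams = list(
  away = list(players = list(ID1 = toy_player(1L, "Away One", 0L))),
  home = list(players = list(ID2 = toy_player(2L, "Home One", 1L),
                             ID3 = toy_player(3L, "Home Two", 2L)))
)))

all_players <- boxscores |>
  set_names(as.character(game_ids)) |>  # assumes same order as the request
  map(flatten_game) |>
  bind_rows(.id = "gameId") |>
  type.convert(as.is = TRUE)  # unlist() made everything character; convert back

all_players
```

Columns that only exist for some players (goalies, for example, may carry different stats than skaters) are filled with NA by bind_rows, and values such as "10:00" stay character because they are not plain numbers.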