r dataframe twitter-api-v2

Unlisting lists inside a data frame and putting them in different columns in R


I used the Twitter API to collect a large number of tweets, then built a df with the fields I want:

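  library(dplyr)

  preprocess <- function(df) {
    # Note: `m` is never used, so every iteration builds the same rows from `df`;
    # the resulting duplicates are removed by distinct() below.
    df_tw <- do.call(rbind, lapply(df, function(m)
      data.frame(text = df$text,
                 lang = df$lang,
                 geo  = df$geo,
                 date = df$created_at)))
    # Select unique rows based on the text column only
    df_u <- df_tw %>% distinct(text, .keep_all = TRUE)
    # Return the de-duplicated data frame rather than the untouched input
    return(df_u)
  }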

However, each pair of coordinates is stored as a single vector, like this: c(14.4865036, 35.85288308). How can I split them into separate columns of the same df?

> dput(head(df_mt))
structure(list(text = c("A tiny little fish dish to round off the day. ", 
"Sharing Music #dj #Malta #house #housemusic #pioneer #xemxija #venezuelanDj ", 
"Dj Abraham Sound en Malta #dj #pioneer #paceville #Malta #VenezuelanDj ", 
"Nature’s very own private pool, the blue hole in Gozo is a place that you can enjoy all year round. 📸: @chrissefarbi and @ch.farbmacher \n\n#Malta #VisitMalta #MoreToExplore ", 
"London’s first EV rapid charging hub opened by TfL and Engenie  #Taxi #Chauffeur #Malta ", 
"Incredible to see this in Malta 🇲🇹🇵🇱@FlightPolish "
), lang = c("en", "en", "en", "en", "en", "en"), geo.place_id = c("0fc3ac0d6915e000", 
"1d834adff5d584df", "07d9d2902f483001", "1d834adff5d584df", "1d834adff5d584df", 
"0fc2ecc63cd4c000"), geo.coordinates = structure(list(type = c(NA, 
NA, NA, NA, "Point", NA), coordinates = list(NULL, NULL, NULL, 
    NULL, c(14.4865036, 35.85288308), NULL)), row.names = c(NA, 
6L), class = "data.frame"), date = c("2022-12-30T20:00:29.000Z", 
"2022-12-30T17:21:44.000Z", "2022-12-30T17:16:15.000Z", "2022-12-30T15:54:39.000Z", 
"2022-12-30T14:57:34.000Z", "2022-12-30T14:32:18.000Z"), row.names = c("attachments.3", 
"attachments.4", "attachments.5", "attachments.6", "attachments.7", 
"attachments.8"), class = "data.frame")

Thank you.


Solution

  • With unnest_wider:

    library(tidyr)
    data.frame(df) |>
      unnest_wider(geo.coordinates.coordinates, names_sep = ".")
    

    output

    ## A tibble: 6 × 9
    #  text                                                            lang  geo.p…¹ geo.c…² geo.c…³ geo.c…⁴ date  row.n…⁵ class
    #  <chr>                                                           <chr> <chr>   <chr>     <dbl>   <dbl> <chr> <chr>   <chr>
    #1 "A tiny little fish dish to round off the day. "                en    0fc3ac… NA         NA      NA   2022… attach… data…
    #2 "Sharing Music #dj #Malta #house #housemusic #pioneer #xemxija… en    1d834a… NA         NA      NA   2022… attach… data…
    #3 "Dj Abraham Sound en Malta #dj #pioneer #paceville #Malta #Ven… en    07d9d2… NA         NA      NA   2022… attach… data…
    #4 "Nature’s very own private pool, the blue hole in Gozo is a pl… en    1d834a… NA         NA      NA   2022… attach… data…
    #5 "London’s first EV rapid charging hub opened by TfL and Engeni… en    1d834a… Point      14.5    35.9 2022… attach… data…
    #6 "Incredible to see this in Malta \U0001f1f2\U0001f1f9\U0001f1f… en    0fc2ec… NA         NA      NA   2022… attach… data…
    ## … with abbreviated variable names ¹​geo.place_id, ²​geo.coordinates.type, ³​geo.coordinates.coordinates.1,
    ##   ⁴​geo.coordinates.coordinates.2, ⁵​row.names
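
  • With base R only (a minimal sketch, not part of the answer above). It rebuilds a small, hypothetical three-row version of the question's data, in which geo.coordinates is a data frame column holding a list column, and pulls each element of the coordinate vector into its own plain column. The helper name pick and the column names lon and lat are assumptions for illustration; the lon/lat order assumes GeoJSON convention (longitude first), which the Malta values in the question are consistent with.

```r
# Hypothetical miniature of the question's data: `geo.coordinates` is a
# data frame column whose `coordinates` column is a list (NULL when absent).
coords <- data.frame(type = c(NA, "Point", NA))
coords$coordinates <- list(NULL, c(14.4865036, 35.85288308), NULL)

df_mt <- data.frame(text = c("tweet a", "tweet b", "tweet c"))
df_mt$geo.coordinates <- coords

# Return the i-th value of a coordinate vector, or NA when the entry is NULL.
pick <- function(x, i) if (is.null(x)) NA_real_ else x[i]

xy <- df_mt$geo.coordinates$coordinates
df_mt$lon <- vapply(xy, pick, numeric(1), i = 1)
df_mt$lat <- vapply(xy, pick, numeric(1), i = 2)

# Keep the geometry type as a plain column, then drop the nested column.
df_mt$geo.type <- df_mt$geo.coordinates$type
df_mt$geo.coordinates <- NULL

print(df_mt)
```

    Rows without coordinates end up with NA in both new columns, which mirrors what the unnest_wider output above shows for the NULL entries.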