Tags: r, tidyverse, subset, tibble, readr

Why does spec() return NULL after subsetting a tibble? (And how do I avoid that?)


After reading in my data with read_csv() from readr, spec() returns the full column specification for the resulting tibble:

> spec(steps)
cols(
  duration = col_double(),
  version_code = col_double(),
  run_step = col_double(),
  walk_step = col_double(),
  start_time = col_datetime(format = ""),
  sample_position_type = col_logical(),
  custom = col_logical(),
  update_time = col_datetime(format = ""),
  create_time = col_datetime(format = ""),
  count = col_double(),
  speed = col_double(),
  distance = col_double(),
  calorie = col_double(),
  time_offset = col_character(),
  deviceuuid = col_character(),
  pkg_name = col_character(),
  end_time = col_datetime(format = ""),
  datauuid = col_character(),
  x = col_logical()
)

But if I subset the tibble, that information is lost:

> spec(subset(steps, select = c(1, 5, 10, 11, 12, 17)))
NULL

Why? And how do I keep it?


Solution

  • I'm not sure of the rationale, but this behavior is intentional and documented in the readr changelog. The entry for version 1.3.0 states:

    "readr 1.3.0 returns results with a spec_tbl_df subclass. This differs from a regular tibble only that the spec attribute (which holds the column specification) is lost as soon as the object is subset (and a normal tbl_df object is returned)."

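    You can add the spec attribute back with attr(subsetted_df, "spec") <- attr(original_df, "spec"). Note that this copies the original specification as a whole, so after a column subset it still lists the columns you dropped. For instance: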

    Example

    x <- readr::read_csv(readr::readr_example("mtcars.csv"))
    readr::spec(x) # prints the full column specification
    
    y <- x[x$mpg < 20,]
    
    readr::spec(y)
    # NULL
    
    # copy the spec attribute back from the original tibble
    attr(y, "spec") <- attr(x, "spec")
    
    readr::spec(y)
    
    cols(
      mpg = col_double(),
      cyl = col_double(),
      disp = col_double(),
      hp = col_double(),
      drat = col_double(),
      wt = col_double(),
      qsec = col_double(),
      vs = col_double(),
      am = col_double(),
      gear = col_double(),
      carb = col_double()
    )
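
If you subset often, the attribute copy can be wrapped in a small helper. The sketch below is only one way to do it: restore_spec() is a hypothetical name, not a readr function, and it assumes the col_spec object keeps its per-column collectors in a named list called cols, which is an internal detail that could change between readr versions. Unlike a plain attribute copy, it trims the spec to the columns that survive the subset.

```r
library(readr)

# Hypothetical helper (not part of readr): copy the spec from `original`
# onto `subsetted`, keeping only the entries for columns still present.
restore_spec <- function(subsetted, original) {
  s <- spec(original)
  # Assumption: the col_spec object stores its collectors in `$cols`.
  s$cols <- s$cols[intersect(names(s$cols), names(subsetted))]
  attr(subsetted, "spec") <- s
  subsetted
}

x <- suppressMessages(read_csv(readr_example("mtcars.csv")))
inherits(x, "spec_tbl_df")  # TRUE: read_csv() output carries the spec

y <- x[x$mpg < 20, c("mpg", "cyl")]
inherits(y, "spec_tbl_df")  # FALSE: subsetting returned a plain tibble
is.null(spec(y))            # TRUE

y <- restore_spec(y, x)
names(spec(y)$cols)         # "mpg" "cyl"
```

The restored object is still a plain tbl_df, so the spec will be dropped again by any further operation that rebuilds the tibble's attributes; calling the helper once at the end of a chain of subsets is probably the simplest approach.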