[SOLVED] Why does spec() return NULL after subsetting a tibble? (And how do I avoid that?)

After reading in my data with read_csv() from readr, spec() returns the full column specification for the resulting tibble:

> spec(steps)
cols(
  duration = col_double(),
  version_code = col_double(),
  run_step = col_double(),
  walk_step = col_double(),
  start_time = col_datetime(format = ""),
  sample_position_type = col_logical(),
  custom = col_logical(),
  update_time = col_datetime(format = ""),
  create_time = col_datetime(format = ""),
  count = col_double(),
  speed = col_double(),
  distance = col_double(),
  calorie = col_double(),
  time_offset = col_character(),
  deviceuuid = col_character(),
  pkg_name = col_character(),
  end_time = col_datetime(format = ""),
  datauuid = col_character(),
  x = col_logical()
)

But if I subset the tibble, that information is lost:

> spec(subset(steps, select = c(1, 5, 10, 11, 12, 17)))
NULL

Why? And how do I keep it?

Solution

I don't know the rationale behind it, but this behavior is intentional and documented in the readr changelog. The entry for version 1.3.0 states:

"readr 1.3.0 returns results with a spec_tbl_df subclass. This differs from a regular tibble only that the spec attribute (which holds the column specification) is lost as soon as the object is subset (and a normal tbl_df object is returned)."
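The class change described in the changelog can be checked directly. This is a minimal sketch, assuming readr 1.3.0 or later is installed; it reuses the mtcars.csv example file that ships with readr.

```r
library(readr)

# Read the example file bundled with readr
x <- read_csv(readr_example("mtcars.csv"))

# Any subsetting operation returns a plain tibble
y <- x[x$mpg < 20, ]

# The freshly read object carries the spec_tbl_df subclass ...
inherits(x, "spec_tbl_df")

# ... while the subset does not, and its spec attribute is gone
inherits(y, "spec_tbl_df")
is.null(attr(y, "spec"))
```

subset() goes through the same `[` method for data frames, which is why the call in the question behaves the same way.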

You can add the spec attribute back with attr(subsetted_df, "spec") <- attr(original_df, "spec"). For instance:

Data

x <- readr::read_csv(readr::readr_example("mtcars.csv"))
readr::spec(x) # works as expected

y <- x[x$mpg < 20,]

readr::spec(y)
# NULL

# copy the spec attribute back from the original tibble
attr(y, "spec") <- attr(x, "spec")

readr::spec(y)

cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)
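
One caveat: copying the attribute as-is works well when only rows were filtered. If the subset dropped columns, as in the question, the copied spec still lists every original column. The following is a rough sketch of a helper that trims the spec to the columns that remain. restore_spec() is a hypothetical name, not part of readr, and the code assumes that a col_spec object keeps its collectors in a named list called cols, which is an internal detail that may change between readr versions.

```r
library(readr)

# Hypothetical helper (not part of readr): copy the spec from `original`
# onto `subsetted`, keeping only entries for columns that still exist.
restore_spec <- function(subsetted, original) {
  s <- attr(original, "spec")
  # Assumption: the col_spec stores its collectors in a named list `cols`
  s$cols <- s$cols[intersect(names(subsetted), names(s$cols))]
  attr(subsetted, "spec") <- s
  subsetted
}

x <- read_csv(readr_example("mtcars.csv"))

# Column subset: the spec is dropped, then restored in trimmed form
y <- restore_spec(x[, c("mpg", "cyl")], x)

spec(y)
```

Note that the restored attribute is lost again on the next subsetting step, so the helper has to be applied after the final subset.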