After reading in my data using read_csv() from readr, the command spec() returns the "full column specification" for the resulting tibble:
> spec(steps)
cols(
duration = col_double(),
version_code = col_double(),
run_step = col_double(),
walk_step = col_double(),
start_time = col_datetime(format = ""),
sample_position_type = col_logical(),
custom = col_logical(),
update_time = col_datetime(format = ""),
create_time = col_datetime(format = ""),
count = col_double(),
speed = col_double(),
distance = col_double(),
calorie = col_double(),
time_offset = col_character(),
deviceuuid = col_character(),
pkg_name = col_character(),
end_time = col_datetime(format = ""),
datauuid = col_character(),
x = col_logical()
)
But if I subset the tibble that information is lost:
> spec(subset(steps, select = c(1, 5, 10, 11, 12, 17)))
NULL
Why? And how do I keep it?
I'm not sure of the rationale, but this behavior is intentional and documented in the readr changelog. The entry for 1.3.0 states (emphasis mine):
"readr 1.3.0 returns results with a spec_tbl_df subclass. This differs from a regular tibble only that the spec attribute (which holds the column specification) is lost as soon as the object is subset (and a normal tbl_df object is returned)."
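A quick way to see the subclass being dropped is to compare the classes before and after subsetting. This is a minimal sketch assuming readr 1.3.0 or later and the mtcars.csv example file that ships with readr; the variable names are only illustrative.

```r
library(readr)

# Read the example file bundled with readr
x <- read_csv(readr_example("mtcars.csv"))

# Subset rows with `[`, which returns a plain tibble
y <- x[x$mpg < 20, ]

# The freshly read tibble carries the spec_tbl_df subclass...
inherits(x, "spec_tbl_df")

# ...while the subset is expected to have lost both the subclass and the attribute
inherits(y, "spec_tbl_df")
is.null(attr(y, "spec"))
```

On readr 1.3.0 or later, the first check should be TRUE, the second FALSE, and the third TRUE.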
You can add the spec attribute back with attr(subsetted_df, "spec") <- attr(original_df, "spec"). For instance:
Data
x <- readr::read_csv(readr::readr_example("mtcars.csv"))
readr::spec(x) # works as expected
y <- x[x$mpg < 20,]
readr::spec(y)
# NULL
# add in specs
attr(y, "spec") <- attr(x, "spec")
readr::spec(y)
cols(
mpg = col_double(),
cyl = col_double(),
disp = col_double(),
hp = col_double(),
drat = col_double(),
wt = col_double(),
qsec = col_double(),
vs = col_double(),
am = col_double(),
gear = col_double(),
carb = col_double()
)
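Note that copying the attribute wholesale restores the specification for every original column, even if the subset dropped some of them (as in the question, where only six columns are selected). A rough sketch of a helper that trims the spec to the surviving columns follows. restore_spec() is a hypothetical name, not part of readr, and the code assumes the col_spec object keeps its collectors in a named list element called cols, which is an internal detail that could change between readr versions.

```r
library(readr)

# Hypothetical helper (not part of readr): reattach the column spec after
# subsetting, keeping only the entries for columns that are still present.
restore_spec <- function(subsetted_df, original_df) {
  s <- attr(original_df, "spec")
  # Assumption: the col_spec object stores its collectors in a named list `cols`
  s$cols <- s$cols[intersect(names(s$cols), names(subsetted_df))]
  attr(subsetted_df, "spec") <- s
  subsetted_df
}

x <- read_csv(readr_example("mtcars.csv"))

# Column subset, then restore a spec limited to the selected columns
y <- restore_spec(x[, c("mpg", "cyl", "wt")], x)
spec(y)
```

The restored attribute is lost again on the next subset, so the helper has to be applied after each subsetting step. Alternatively, save spec(x) in its own variable before subsetting.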