How can I read the column specification (col_types) of the readr::read_delim function from a file?
Instead of
read_csv(file = I('varInt,varChar,varFac\n
1,a,A1\n
2,b,A2\n
3,c,A3'),
col_types = cols(varInt = 'i',
varChar = 'c',
varFac = col_factor(levels = c('A1', 'A2', 'A3'))))
# A tibble: 3 × 3
varInt varChar varFac
<int> <chr> <fct>
1 1 a A1
2 2 b A2
3 3 c A3
I want to do something like
mySpecFile <- read_csv(file = I("Variable,Spec\n
varInt,i\n
varChar,c\n
varFac,col_factor(levels = c('A1'; 'A2'; 'A3'))"))
mySpec <- mySpecFile |> pull(Spec, Variable) |> as.list()
read_csv(file = I('varInt,varChar,varFac\n
1,a,A1\n
2,b,A2\n
3,c,A3'),
col_types = mySpec)
But this throws: Error: Unknown shortcut: col_factor(levels = c('A1'; 'A2'; 'A3'))
So, specifying levels of factors does not work for me.
This seems related: R readr col_types specified in a metadata file, specifically using custom date formats
However, the readr::read_delim documentation says:
One of NULL, a cols() specification, or a string. See vignette("readr") for more details.
If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.
Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().
Alternatively, you can use a compact string representation where each character represents one column:
A few things:
- The varFac spec is a string containing col_factor, not a call or expression (or the result of evaluating one). We can, however, evaluate it.
- Your varFac,col_factor(levels = c('A1'; 'A2'; 'A3')) is not a valid R expression: we need to replace ; with , inside c(). This likely means the spec CSV needs to be ;-delimited (or use some delimiter other than ,).
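To make both points concrete, here is a small sketch (assuming only that readr is installed): the value read from the spec file is plain text, evaluating it yields a readr collector object, and the ;-separated version does not even parse.

```r
library(readr)

# The value read from the spec file is plain text, not a readr collector
spec_string <- "col_factor(levels = c('A1', 'A2', 'A3'))"
class(spec_string)  # "character"

# Parsing and evaluating the text produces the collector object col_types needs
spec_obj <- eval(parse(text = spec_string))
class(spec_obj)     # "collector_factor" "collector"

# With ';' between the levels the text is not valid R, so parse() fails
bad_string <- "col_factor(levels = c('A1'; 'A2'; 'A3'))"
parses <- tryCatch({
  parse(text = bad_string)
  TRUE
}, error = function(e) FALSE)
parses              # FALSE
```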
library(readr)
library(dplyr)  # for pull()
mySpecFile <- read_csv2(file = I("Variable;Spec\n
varInt;i\n
varChar;c\n
varFac;col_factor(levels = c('A1', 'A2', 'A3'))"))
# ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
# Rows: 3 Columns: 2
# ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
# Delimiter: ";"
# chr (2): Variable, Spec
# ℹ Use `spec()` to retrieve the full column specification for this data.
# ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mySpec <- mySpecFile |>
pull(Spec, Variable) |>
as.list() |>
lapply(function(z) if (nchar(z) > 1) tryCatch(eval(parse(text = z)), error = function(e) z) else z)
read_csv(file = I('varInt,varChar,varFac\n
1,a,A1\n
2,b,A2\n
3,c,A3'),
col_types = mySpec)
# # A tibble: 3 × 3
# varInt varChar varFac
# <int> <chr> <fct>
# 1 1 a A1
# 2 2 b A2
# 3 3 c A3
The if (nchar(z) > 1) guard keeps "c" (for character) from being evaluated into R's c() function (and possibly keeps other single letters from turning into other objects). If you want more specificity, change that conditional to something else.
The tryCatch(.., error = function(e) z) ensures that if the string cannot be parsed or evaluated as R code, the original string is returned unchanged.
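Pulling the anonymous function out into a named helper makes both safeguards easy to check in isolation. The name to_col_spec() is hypothetical; the body is the same logic as the lapply() call above.

```r
library(readr)

# Hypothetical name for the anonymous function passed to lapply() above
to_col_spec <- function(z) {
  if (nchar(z) > 1) {
    # Longer strings: try to evaluate them as R code, fall back to the text
    tryCatch(eval(parse(text = z)), error = function(e) z)
  } else {
    # One-letter readr shortcuts ("i", "c", ...) are returned untouched
    z
  }
}

# Without the guard, "c" would evaluate silently to base R's c() function
is.function(eval(parse(text = "c")))          # TRUE

to_col_spec("c")                              # "c"
class(to_col_spec("col_factor(levels = c('A1', 'A2', 'A3'))"))

# "no_such_collector" is a made-up name: evaluating it errors,
# so the helper falls back to returning the string itself
to_col_spec("no_such_collector()")            # "no_such_collector()"
```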
As an alternative to ;-delimited text, we can quote the spec strings (or just the one that contains commas) to protect the embedded commas we need.
mySpecFile <- read_csv(file = I("Variable,Spec\n
varInt,i\n
varChar,c\n
varFac,\"col_factor(levels = c('A1', 'A2', 'A3'))\""))
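If evaluating arbitrary text from a file is a concern, a different approach (not the one used above) is to keep only data in the spec file and build the collectors in code. This sketch assumes a hypothetical spec layout with Variable, Type and Levels columns, where Levels holds the factor levels separated by |. The helper make_collector() is also hypothetical and only handles the few shortcut letters listed in its switch().

```r
library(readr)

# Assumed spec layout: Type is a readr shortcut letter, Levels is optional
# and lists factor levels separated by "|" (empty fields are read as NA)
spec_file <- read_csv(
  I("Variable,Type,Levels\nvarInt,i,\nvarChar,c,\nvarFac,f,A1|A2|A3"),
  col_types = "ccc"
)

# Hypothetical helper: map one spec row to a readr collector, no eval(parse())
make_collector <- function(type, levels) {
  switch(type,
    i = col_integer(),
    c = col_character(),
    d = col_double(),
    f = col_factor(levels = if (is.na(levels)) NULL else strsplit(levels, "|", fixed = TRUE)[[1]]),
    stop("Unsupported type: ", type)
  )
}

collectors <- Map(make_collector, spec_file$Type, spec_file$Levels)
names(collectors) <- spec_file$Variable
my_cols <- do.call(cols, collectors)

dat <- read_csv(I("varInt,varChar,varFac\n1,a,A1\n2,b,A2\n3,c,A3"),
                col_types = my_cols)
dat
```

Because nothing from the file is passed through eval(parse()), an unexpected Type value only reaches the stop() branch; the trade-off is that every supported column type has to be listed explicitly in the switch().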