I have a data.table, DT
containing one string variable, s
, from which I want to create additional variables based on the positions of each character in s
. The varname
, start
and end
positions and vartype
of each variable are given in a dictionary, dic
. Here is an example, including the desired output:
DT <- DT <- data.table(s=c('a191','b292','c393'))
dic <- data.table(varname=c('bla,ble,bli'),start=c(1,2,3),end=c(1,2,4),vartype=c('c','i','i')
DTdesired <- data.table(bla=c('a','b','c'),ble=c(1,2,3),bli=c(91,92,93))
This is the same problem as importing data from a fixed width file (fwf), which has been covered in this SO question. The answer to that post includes a data table solution
DT[ , lapply(seq_len(length(cols$beg)), function(ii) {
stri_sub(V1, cols$beg[ii], cols$end[ii])
})]
But I am looking for a solution that besides start
and end
would also use the varname
and coltype
variables of the dictionary
Edit: In reality, DT
would be a large Census file, around 200 million observations. Each element of s
is a string with 200 characters. And the dictionary, dic
, can contain about 10 to 50 variables to be extracted, depending on the year.
We can use Map
to do the conversion. (added the updated function created by @LucasMation)
library(magrittr)
library(data.table)
subtr_typeconvert <- function(x, y)
{
DT$s %>%
substr(x, y) %>%
type.convert(as.is = TRUE)
}
dic[, setNames(Map(function(x, y)
subtr_typeconvert(x,y), start, end), varname)]