I have a data.table storing heterogeneous file names in a string column. I want to extract the extension from that column, always taking the characters after the last dot. Some file names contain more than one dot.
I tried:
library(data.table)
files0=data.table(filename=c("simple_file.csv","file with.two dots.xls"))
files0[,chunks:=length(tstrsplit(filename,"\\."))]
files0[,extension:=tstrsplit(filename,"\\.")[chunks]]
How do I make sure that tstrsplit is applied to each row separately, so that this approach works?
PS: I also managed to generate a column storing the correct number of text "chunks" with str_count, but the problem remains: when I create the "extension" column, the entire "filename" column seems to be used for each row.
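The behaviour in the PS comes from tstrsplit working on the whole column at once: it splits every file name and then transposes the pieces, returning one list element per chunk position (padded with NA) rather than one element per row. The minimal sketch below assumes the data.table package is installed. It shows why `chunks` ends up constant and answers the literal question by grouping on a row index; the vectorized calls in the answer avoid this per-row grouping and should scale better on large tables.

```r
library(data.table)

files0 <- data.table(filename = c("simple_file.csv", "file with.two dots.xls"))

# tstrsplit() transposes: one list element per chunk position, i.e.
# list(c("simple_file", "file with"), c("csv", "two dots"), c(NA, "xls"))
parts <- tstrsplit(files0$filename, "\\.")
length(parts)  # 3: the maximum number of chunks, not a per-row count

# Inside j, filename is the whole column, so every row gets the same value (3),
# and tstrsplit(...)[chunks] then selects the same full third element per row.
files0[, chunks := length(tstrsplit(filename, "\\."))]

# Literal fix: group by a row index so each call sees a single file name
files0[, extension := {
  pieces <- tstrsplit(filename, "\\.")
  pieces[[length(pieces)]]  # last chunk of this row's file name
}, by = seq_len(nrow(files0))]
```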
No need for sapply or strsplit; those add unnecessary complexity and inefficiency. We can use tools::file_ext (from the tools package that ships with R) or just do a sub ourselves that deletes everything up to and including the last dot (the `.*` is greedy).
files0[, tools::file_ext(filename)]
# [1] "csv" "xls"
files0[, sub(".*\\.", "", filename)]
# [1] "csv" "xls"
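Both calls agree on the two example names, but they can differ on edge cases. The base-R sketch below illustrates this; the extra file names and the helper `file_ext_sketch` are made up for illustration, and the helper mirrors what I understand `tools::file_ext` to do internally. file_ext only keeps a purely alphanumeric run after the last dot and returns "" when there is none, whereas `sub()` returns a name without any dot unchanged.

```r
x <- c("simple_file.csv", "file with.two dots.xls", "archive.tar.gz", "README", "notes.")

# Hypothetical helper, assumed to match tools::file_ext: take the purely
# alphanumeric run after the final dot, otherwise return "".
file_ext_sketch <- function(x) {
  pos <- regexpr("\\.([[:alnum:]]+)$", x)
  ifelse(pos > -1L, substring(x, pos + 1L), "")
}

ext_tools <- tools::file_ext(x)   # "csv" "xls" "gz" "" ""
ext_sub   <- sub(".*\\.", "", x)  # "csv" "xls" "gz" "README" ""

# One option if dot-less names should yield "" with sub(): guard with grepl()
ext_sub_safe <- ifelse(grepl(".", x, fixed = TRUE), sub(".*\\.", "", x), "")
```

Which behaviour is preferable depends on the data; for a column like `filename` above, `files0[, extension := tools::file_ext(filename)]` stays fully vectorized either way.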