I'm reading data from an Excel file into a data.table in R. The file is formatted like so:
COL_1_STUFF COL_2_STUFF COL_3_STUFF
ID EST MOE PCT EST MOE PCT EST MOE PCT
That is, for each variable (the COL
s), there is an estimate, a margin of error, and a percentage given.
The trouble is being created by read.xlsx2
, which I'm using to import the file like so:
data <- as.data.table(read.xlsx2(
"file.xlsx", sheetIndex = 1L, colIndex = c(1L, 4L, 7L), startRow = 2L))
The problem is that read.xlsx2
assigns the same column name to a bunch of stuff--the import looks something like:
ID EST EST EST
even if I set header = FALSE
, I'm apt to get something like
X1 X2 X2 X2
To circumvent this, I've done the following subsequent to import:
data[ , c("col1_est", "EST") := .(EST, NULL)]
data[ , c("col2_est", "EST") := .(EST, NULL)]
data[ , c("col3_est", "EST") := .(EST, NULL)]
This strikes me as an odd way to deal with the problem; can anyone suggest an alternate approach to this?
This can be done with the check.names
argument to fread
, data.table
, and setDT
:
DT <- read.xlsx2(
"file.xlsx", sheetIndex = 1L, colIndex = c(1L, 4L, 7L), startRow = 2L)
)
setDT(DT, check.names = TRUE)
Automatically adds .1
, .2
, etc. to duplicated column names.