I have two files that I am working with. Both have thousands of values, so some of the simpler/more traditional formatting methods with vector lists etc. don't work. The biggest issues I am running into are that the values I am working with are different lengths and that the closures are not subsettable. I am reluctant to pad the smaller file so that they are the same length, so I have been using setnames
rather than match
or another function.
The smaller file is the main one I am manipulating and the one that needs the column names to be replaced. It has over 6000 columns and is set up like so:
x A1 A2 A3 A4 A13 A14
sample1 928 29 0 298 8392 138
sample2 0 239 903 13 424 2
sample3 348 930 1938 23 233 492
sample7 843 349 90 239 0 239
sample8 234 349 30 39 8249 845
sample19 849 0 1235 14 149 982
Etc. Neither the rows nor columns are sequential, as some have been removed. (This is why the files are no longer the same size/length.)
The second file is made up of only two columns, and matches the names with string values. It has over 10,000 rows and is set up like so:
a x
A1 TIAHKGS
A2 ASJKHFSLA
A3 ASKJLHFAS
A4 JSHDKGFK
A5 ASHJFSKLHF
A6 ASLHFDASF
A7 ASHGFKJ
Etc. The string values are not all the same length, but the column is sequential. I am trying to replace the column names of the first file with the matching values from the second column of the second file.
The two methods that were closest to solving the problem were the match
and setnames
function. I was worried that match wouldn't be able to properly handle skipping values, so my latest attempt used setnames.
This seemed? to fix the problem with different length files, but now I am receiving Error in data.frame$FileKey : #object of type 'closure' is not subsettable
Also, I am using vroom
instead of readr
because it is faster with large files.
library(vroom)
library(data.table)
MainFile <- vroom("File1.csv")
FileKey <- vroom("File2.csv")
old <- colnames(MainFile)
new <- FileKey[,c(2)]
setnames(MainFile, data.frame$old, data.frame$new, skip_absent=TRUE)
Error in data.frame$FileKey :
#object of type 'closure' is not subsettable
names(MainFile)
Any advice is appreciated. I'm not sure if there is a way to melt or bind the files together that avoids both problems?
Thank you!
You can use match
provided you handle the NA
values correctly. Here is a base R version using match
.
vals <- FileKey$x[match(names(MainFile), FileKey$a)]
names(MainFile)[!is.na(vals)] <- na.omit(vals)
MainFile
x TIAHKGS ASJKHFSLA ASKJLHFAS JSHDKGFK A13 A14
1 sample1 928 29 0 298 8392 138
2 sample2 0 239 903 13 424 2
3 sample3 348 930 1938 23 233 492
4 sample7 843 349 90 239 0 239
5 sample8 234 349 30 39 8249 845
6 sample19 849 0 1235 14 149 982
data
MainFile <- structure(list(x = c("sample1", "sample2", "sample3", "sample7",
"sample8", "sample19"), A1 = c(928, 0, 348, 843, 234, 849), A2 = c(29,
239, 930, 349, 349, 0), A3 = c(0, 903, 1938, 90, 30, 1235), A4 = c(298,
13, 23, 239, 39, 14), A13 = c(8392, 424, 233, 0, 8249, 149),
A14 = c(138, 2, 492, 239, 845, 982)), class = "data.frame", row.names = c(NA,
-6L))
FileKey <- structure(list(a = c("A1", "A2", "A3", "A4", "A5", "A6", "A7"
), x = c("TIAHKGS", "ASJKHFSLA", "ASKJLHFAS", "JSHDKGFK", "ASHJFSKLHF",
"ASLHFDASF", "ASHGFKJ")), class = "data.frame", row.names = c(NA,-7L))