I have a large data set whose column names, once imported into R, contain commas (,) and various other special characters. When I copy this data set to another variable, take a subset of it, or perform other kinds of data transformation on it, all the special characters in the column names are changed to a period (.).
Is there a way to preserve the special characters in the column names? I don't want R to change anything about the column names.
Please suggest solutions.
Example
> library(MASS)
> data(cats)
> cats <- cats[1:10,]
> cats
   Sex Bwt Hwt
1    F 2.0 7.0
2    F 2.0 7.4
3    F 2.0 9.5
4    F 2.1 7.2
5    F 2.1 7.3
6    F 2.1 7.6
7    F 2.1 8.1
8    F 2.1 8.2
9    F 2.1 8.3
10   F 2.1 8.5
> colnames(cats) <- c("A:17272,,1,MPR.rtn_rslt", "B:17272,,1,MPR.rtn_rslt", "C:17272,,1,MPR.rtn_rslt")
> cats
   A:17272,,1,MPR.rtn_rslt B:17272,,1,MPR.rtn_rslt C:17272,,1,MPR.rtn_rslt
1                        F                     2.0                     7.0
2                        F                     2.0                     7.4
3                        F                     2.0                     9.5
4                        F                     2.1                     7.2
5                        F                     2.1                     7.3
6                        F                     2.1                     7.6
7                        F                     2.1                     8.1
8                        F                     2.1                     8.2
9                        F                     2.1                     8.3
10                       F                     2.1                     8.5
The cats data set now has column names containing the special characters , and :. Below, I perform a data transformation on it.
> # Define the avector-subselection method
> as.data.frame.avector <- as.data.frame.vector
> `[.avector` <- function(x,i,...) {
+ r <- NextMethod("[")
+ mostattributes(r) <- attributes(x)
+ r
+ }
> # Preserve attributes, which would otherwise be lost on subsetting
> test <- data.frame(
+ lapply(cats, function(x) {
+ structure( x, class = c("avector", class(x) ) )
+ } )
+ )
> test
   A.17272..1.MPR.rtn_rslt B.17272..1.MPR.rtn_rslt C.17272..1.MPR.rtn_rslt
1                        F                     2.0                     7.0
2                        F                     2.0                     7.4
3                        F                     2.0                     9.5
4                        F                     2.1                     7.2
5                        F                     2.1                     7.3
6                        F                     2.1                     7.6
7                        F                     2.1                     8.1
8                        F                     2.1                     8.2
9                        F                     2.1                     8.3
10                       F                     2.1                     8.5
The above transformation produces a new data frame, test, from cats, in which all the special characters such as : and , have been changed to a period (.).
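By default, data.frame() passes the column names through make.names() (check.names = TRUE), which replaces every character that is not valid in a syntactic R name with a period. Turn that check off:
test <- data.frame(
  lapply(cats, function(x) {
    structure(x, class = c("avector", class(x)))
  }),
  check.names = FALSE
)
You will then have to refer to the columns by quoted name, as in test[["A:17272,,1,MPR.rtn_rslt"]], or with back-ticks when using $ or a formula, as in test$`A:17272,,1,MPR.rtn_rslt`. That may still be preferable to renaming every column of the data frame.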
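As a minimal sketch of the difference, the snippet below builds a small stand-in data frame (the values and the objects cols, mangled, kept, sub, csv_path and reread are made up for illustration, not taken from the original data) and compares the default behaviour with check.names = FALSE. It also assumes the original import went through read.csv(), which accepts the same argument.

```r
# Hypothetical stand-in for the real data: a named list of columns
cols <- list(c("F", "F", "M"), c(2.0, 2.1, 2.2))
names(cols) <- c("A:17272,,1,MPR.rtn_rslt", "B:17272,,1,MPR.rtn_rslt")

# Default: names are run through make.names(), so ':' and ',' become '.'
mangled <- data.frame(cols)
names(mangled)

# With check.names = FALSE the names are kept exactly as given
kept <- data.frame(cols, check.names = FALSE)
names(kept)

# Non-syntactic names need quotes with [[ ]] or back-ticks with $
kept[["B:17272,,1,MPR.rtn_rslt"]]
kept$`B:17272,,1,MPR.rtn_rslt`

# Plain subsetting with [ does not alter the names
sub <- kept[1:2, , drop = FALSE]
names(sub)

# On import, read.csv() takes the same argument (round trip via a temp file)
csv_path <- tempfile(fileext = ".csv")
write.csv(kept, csv_path, row.names = FALSE)
reread <- read.csv(csv_path, check.names = FALSE)
names(reread)
```

If the names are already mangled at import time, setting check.names = FALSE in the read call is probably the first place to look, since any later data.frame() call only sees the names it is handed.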