I am new to R Studio. I found that when I copy a column from a data frame imported from xlsx file, the new column name includes a suffix of pre-column name. But if I create the data frame in R, the new column name will not include the suffix. This problem really confused me. And I want to know how to cancel this feature. Here is my test code:
library(readxl)
> df <- read_excel("test.xlsx", sheet = "Sheet1")
> cName <- "Column2"
> df$Column3 <- df[,cName]
> df
# A tibble: 3 × 3
Column1 Column2 Column3$Column2
<dbl> <dbl> <dbl>
1 1 4 4
2 2 5 5
3 3 6 6
This code shows the suffix when using read_excel
> df <- data.frame( Column1 = c(1,2,3),
+ Column2 = c(4,5,6))
> cName <- "Column2"
> df$Column3 <- df[,cName]
> df
Column1 Column2 Column3
1 1 4 4
2 2 5 5
3 3 6 6
And this code shows no suffix when creating the df in R.
I want to know how to avoid the suffix when I import a data frame form .xlsx file.
This behavior isn't specific to the data being sourced from an Excel file, it's that that package happens to return a tibble
instead of a data.frame
. The two objects are largely the same, except they have different defaults for the drop
argument when subsetting via [
.
df <- tibble( Column1 = c(1,2,3),
Column2 = c(4,5,6))
cName <- "Column2"
df$Column3 <- df[,cName] # Assigning a one column tibble
df$Column3 <- df[,cName,drop = TRUE] # Assigning a vector
Indeed, if you look at the structure in the first case you'll see that what's happened probably isn't at all what you wanted:
> str(df)
tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
$ Column1: num [1:3] 1 2 3
$ Column2: num [1:3] 4 5 6
$ Column3: tibble [3 × 1] (S3: tbl_df/tbl/data.frame)
..$ Column2: num [1:3] 4 5 6
In general, it is safer (both with tibbles & data frames) to select a single column as a vector via:
df[[cName]]