rdataframereadxl

How to avoid adding the pre-column name after the new column name in data frame imported from .xlsx?


I am new to R Studio. I found that when I copy a column from a data frame imported from xlsx file, the new column name includes a suffix of pre-column name. But if I create the data frame in R, the new column name will not include the suffix. This problem really confused me. And I want to know how to cancel this feature. Here is my test code:

library(readxl)
> df <- read_excel("test.xlsx", sheet = "Sheet1")                                                                                                                                         
> cName <- "Column2"
> df$Column3 <- df[,cName]
> df
# A tibble: 3 × 3
  Column1 Column2 Column3$Column2
    <dbl>   <dbl>           <dbl>
1       1       4               4
2       2       5               5
3       3       6               6

This code shows the suffix when using read_excel

> df <- data.frame( Column1 = c(1,2,3),
+                   Column2 = c(4,5,6))
> cName <- "Column2"
> df$Column3 <- df[,cName]
> df
  Column1 Column2 Column3
1       1       4       4
2       2       5       5
3       3       6       6

And this code shows no suffix when creating the df in R.

I want to know how to avoid the suffix when I import a data frame form .xlsx file.


Solution

  • This behavior isn't specific to the data being sourced from an Excel file, it's that that package happens to return a tibble instead of a data.frame. The two objects are largely the same, except they have different defaults for the drop argument when subsetting via [.

    df <- tibble( Column1 = c(1,2,3),
                      Column2 = c(4,5,6))
    cName <- "Column2"
    df$Column3 <- df[,cName] # Assigning a one column tibble
    df$Column3 <- df[,cName,drop = TRUE] # Assigning a vector
    

    Indeed, if you look at the structure in the first case you'll see that what's happened probably isn't at all what you wanted:

    > str(df)
    tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
     $ Column1: num [1:3] 1 2 3
     $ Column2: num [1:3] 4 5 6
     $ Column3: tibble [3 × 1] (S3: tbl_df/tbl/data.frame)
      ..$ Column2: num [1:3] 4 5 6
    

    In general, it is safer (both with tibbles & data frames) to select a single column as a vector via:

    df[[cName]]