I have 2 data frames. In my dataset, a part of column name with leading zeros or trailing zeros should be ignored when matching columns, such as 009 = 9
or 22.0 == 22.
And I want to subset the intersection of column names of two data frames.
As an example:
df1
No 009 237 038.1 22.0 010
1 2 3 6 3 3
3 1 7 6 5 5
7 5 NA 9 0 6
df2
No 9 237 38.1 010 070 33.5
1 2 3 6 3 2 1
3 1 7 6 5 1 2
7 5 NA 9 0 9 6
The result should be like this:
result.df1
No 009 237 038.1 010
1 2 3 6 3
3 1 7 6 5
7 5 NA 9 6
result.df2
No 9 237 38.1 010
1 2 3 6 3
3 1 7 6 5
7 5 NA 9 0
How can I do it?
We can use trimws()
to remove leading/trailing zero and dot (with the argument whitespace = "[\\.0]"
), then use %in%
to compare the two colnames
and use that to index ([]
) the original dataframe.
Save the trimws()
results to new variables (clean_col_df1
and clean_col_df2
) to improve readability of codes.
clean_col_df1 <- trimws(colnames(df1), whitespace = "[\\.0]")
clean_col_df2 <- trimws(colnames(df2), whitespace = "[\\.0]")
df1[, clean_col_df1 %in% clean_col_df2]
No 009 237 038.1 010
1 1 2 3 6 3
2 3 1 7 6 5
3 7 5 NA 9 6
df2[, clean_col_df2 %in% clean_col_df1]
No 9 237 38.1 010
1 1 2 3 6 3
2 3 1 7 6 5
3 7 5 NA 9 0