I have a data frame with 77 rows and 460 columns. The first column, titled "RS_number", holds the rsID for each row. The remaining columns are each labeled by a SNP rsID (e.g., rs4751).
I need to limit this data frame to the dimensions 76 X 76, so that the column names match the values in the "RS_number" column. My first thought is to set the row.names to the first column, "RS_number", but I am not sure how to move forward from there or how to limit the columns to the same identifiers as the row names.
Below is the code I used to create the data frame and a sample of the data frame:
newdf = concatenated[concatenated$RS_number %in% colnames(hours)[3:76],] %>%
  as.data.frame()
RS_number rs1 rs2 rs3 rs4 rs10
[,1] rs1 1.0 0.2 0.3 0.4 NA
[,2] rs2 0.0 1.0 0.0 NA 0.2
[,3] rs3 0.2 0.1 1.0 NA NA
[,4] rs4 0.0 0.1 0.5 1.0 NA
[,5] rs5 NA 0.1 NA 0.2 NA
[,6] rs9 0.5 0.4 0.1 0.0 0.6
I would like my data frame to keep only the columns and RS_numbers that are common to both, i.e. rs1, rs2, rs3, and rs4, and to exclude rs5 and rs10.
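We can use intersect() on the 'RS_number' column values and the column names of 'df1', then concatenate "RS_number" with the intersected elements to select the columns:
# identifiers that appear both in the RS_number column and among the column names
nm1 <- intersect(df1$RS_number, names(df1))
# keep the identifier column plus the matching columns
df1[c("RS_number", nm1)]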
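The selection above only trims the columns. If the rows should be limited to the same identifiers as well, with the identifiers moved into the row names as the question suggests, something along these lines might work. This is a minimal sketch rather than a tested solution for the real data: the toy `df1` is rebuilt from the sample in the question, `square` is a name made up for illustration, and it assumes the values in `RS_number` are unique.

```r
# Toy data mirroring the sample shown in the question.
df1 <- data.frame(
  RS_number = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs9"),
  rs1  = c(1.0, 0.0, 0.2, 0.0, NA, 0.5),
  rs2  = c(0.2, 1.0, 0.1, 0.1, 0.1, 0.4),
  rs3  = c(0.3, 0.0, 1.0, 0.5, NA, 0.1),
  rs4  = c(0.4, NA, NA, 1.0, 0.2, 0.0),
  rs10 = c(NA, 0.2, NA, NA, NA, 0.6),
  stringsAsFactors = FALSE
)

# Identifiers present both in the RS_number column and among the column names.
nm1 <- intersect(df1$RS_number, names(df1))

# Keep the matching rows (in the same order as nm1) and the matching columns.
# Assumption: RS_number has no duplicates, so match() finds each row exactly once.
square <- df1[match(nm1, df1$RS_number), nm1]

# Store the identifiers as row names instead of a separate column.
rownames(square) <- nm1

square
```

Because the rows are picked with `match(nm1, ...)`, they come out in the same order as the columns, so the result is square with matching row and column labels. On the toy data that leaves rs1 to rs4 on both axes.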