I need help because I don't know how to handle this. I have 2 data frames that look like this:
(df1) DataGenSample: the first column is the gene and each remaining column is a sample
(df2) Subtypes: a data frame with 2 columns; the 1st column is the sample and the 2nd column is a cancer subtype
The first thing I'm looking for is to select only the samples in DataGenSample that also appear in Subtypes, and then split them by subtype.
The data files can be found here
Any help is more than welcome, because I'm lost.
DataGenSample <- read.table("DataGenSample.txt",sep="\t", header=TRUE, check.names = FALSE)
Subtypes <- read.table("SamplesType.txt",sep="\t", header=TRUE, check.names = FALSE)
A little example: df1:
hugo_symbol TCGA-3C-AAAU-01 TCGA-3C-AALI-01 TCGA-3C-AALJ-01 ... TCGA-3C-AALL-99
CDK11A 0 -1 -1 ... -1
HNRNPR 0 -1 -1 ... -1
SRSF10 0 -1 -1 ... -1
df2:
Sample_id Subtype
TCGA-3C-AAAU-01 BRCA_LumA
TCGA-3C-AALI-01 BRCA_Her2
TCGA-3C-AALL-99 BRCA_Normal
Output Expected:
-BRCA_LumA.df:
hugo_symbol TCGA-3C-AAAU-01
CDK11A 0
HNRNPR 0
SRSF10 0
-BRCA_Her2.df:
hugo_symbol TCGA-3C-AALI-01
CDK11A -1
HNRNPR -1
SRSF10 -1
-BRCA_Normal.df:
hugo_symbol TCGA-3C-AALL-99
CDK11A -1
HNRNPR -1
SRSF10 -1
If I understand correctly, you want to select the subset of columns in DataGenSample that corresponds to a given subtype in Subtypes. This can be achieved by pivoting the sample columns to rows using pivot_longer() from the tidyr package (older versions of tidyr used gather() for this). After the pivot you can join the two data frames on SAMPLE_ID.
You can then filter on subtype, and the remaining SAMPLE_IDs (now fewer) can be pivoted back to columns with pivot_wider().
You can do this for each subtype separately in a for loop, using assign() to name each data frame after the subtype used in the filter.
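library(dplyr)
library(tidyr)

# Column names (Hugo_Symbol, SAMPLE_ID, SUBTYPE) must match the headers in your files
# Reshape to long format: one row per gene/sample pair
DataGenSample_long <- DataGenSample %>%
  pivot_longer(cols = -Hugo_Symbol, names_to = 'SAMPLE_ID', values_to = 'value')

# Attach the subtype to each sample; samples absent from Subtypes get NA
DataGenSample_long_join <- DataGenSample_long %>%
  left_join(Subtypes, by = 'SAMPLE_ID')

# Create one wide data frame per subtype, named e.g. BRCA_LumA.df
for (Subtype in unique(Subtypes$SUBTYPE)) {
  assign(paste0(Subtype, '.df'),
         DataGenSample_long_join %>%
           filter(SUBTYPE == Subtype) %>%
           select(-SUBTYPE) %>%
           pivot_wider(names_from = SAMPLE_ID, values_from = value))
}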
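If you would rather avoid the reshaping and assign(), here is a minimal base R sketch of the same idea that selects the columns directly and collects the results in a named list. It is an alternative to the tidyr approach above, not a replacement for it. The toy data and the column names (hugo_symbol, Sample_id, Subtype) are assumptions copied from the small example in the question, so adjust them to the headers in your real files; by_subtype, matched and samples_by_subtype are just illustrative names.

```r
# Toy data mirroring the small example in the question (assumed column names).
DataGenSample <- data.frame(
  hugo_symbol       = c("CDK11A", "HNRNPR", "SRSF10"),
  `TCGA-3C-AAAU-01` = c(0, 0, 0),
  `TCGA-3C-AALI-01` = c(-1, -1, -1),
  `TCGA-3C-AALJ-01` = c(-1, -1, -1),
  `TCGA-3C-AALL-99` = c(-1, -1, -1),
  check.names = FALSE
)
Subtypes <- data.frame(
  Sample_id = c("TCGA-3C-AAAU-01", "TCGA-3C-AALI-01", "TCGA-3C-AALL-99"),
  Subtype   = c("BRCA_LumA", "BRCA_Her2", "BRCA_Normal")
)

# Keep only the samples that occur in both data frames.
matched <- Subtypes[Subtypes$Sample_id %in% colnames(DataGenSample), ]

# Group the sample IDs by subtype (one character vector per subtype).
samples_by_subtype <- split(matched$Sample_id, matched$Subtype)

# For each subtype, keep the gene column plus that subtype's sample columns.
by_subtype <- lapply(samples_by_subtype, function(ids) {
  DataGenSample[, c("hugo_symbol", ids), drop = FALSE]
})

# Access one subtype's data frame by name.
by_subtype$BRCA_Her2
```

Keeping the data frames in a list means you can loop over them with lapply() later, instead of looking up separately named objects in the global environment.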