r dataframe loops

How to iteratively generate new dataframes by column value in R?


I am trying to run a section of code once for each value in a column of a dataframe that holds all my data. An example of the dataframe's layout is below:

HumanName log2FoldChange pvalue gene
Rob -3e00 4e-06 GeneA
Carol -2e04 2e-09 GeneA
Pamela 4e-06 5e-04 GeneA
Rob 4e-02 1e-10 GeneB
Carol -1e-10 4e-06 GeneB
Pamela -8e03 3e-09 GeneB

I now want to run the section of code below so that it produces separate dataframes for each value in the "gene" column. I'd ultimately end up with one MvWup for GeneA and a separate MvWup for GeneB.

GeneSetUp <- subset(FullSet0.05, log2FoldChange > 0)
GeneSetDown <- subset(FullSet0.05, log2FoldChange < 0)
GeneSetUpLF1 <- subset(GeneSetUp, log2FoldChange > 1)
GeneSetDownLF1 <- subset(GeneSetDown, log2FoldChange < -1)
GeneSetUpLF1G <- subset(GeneSetUpLF1, select = -c(pvalue, log2FoldChange))
GeneSetDownLF1G <- subset(GeneSetDownLF1, select = -c(pvalue, log2FoldChange))

MvWup <- as.vector(unlist(GeneSetUpLF1G))
MvWdown <- as.vector(unlist(GeneSetDownLF1G))

I have gone back and forth between a for loop and a map approach but am struggling with both. I've wrapped the above code in a function but can't seem to apply it correctly. Any guidance would be much appreciated.


Solution

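Split the dataframe by the "gene" column with split(), then loop over the resulting list and store each gene's filtered results in a named list:

    # Sample dataframe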
    df <- data.frame(
      HumanName = c("Rob", "Carol", "Pamela", "Rob", "Carol", "Pamela"),
      log2FoldChange = c(-3e00, -2e04, 4e-06, 4e-02, -1e-10, -8e03),
      pvalue = c(4e-06, 2e-09, 5e-04, 1e-10, 4e-06, 3e-09),
      gene = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB", "GeneB")
    )
    
    # Split the dataframe by 'gene'
    gene_list <- split(df, df$gene)
    
    # Initialize empty list to store results
    results <- list()
    
    # Iterate through each gene-specific dataframe
    for (gene_name in names(gene_list)) {
      gene_data <- gene_list[[gene_name]]
      
      # Filtering logic from the question: split by sign, then keep |log2FoldChange| > 1
      GeneSetUp <- subset(gene_data, log2FoldChange > 0)
      GeneSetDown <- subset(gene_data, log2FoldChange < 0)
      
      GeneSetUpLF1 <- subset(GeneSetUp, log2FoldChange > 1)
      GeneSetDownLF1 <- subset(GeneSetDown, log2FoldChange < -1)
      
      GeneSetUpLF1G <- subset(GeneSetUpLF1, select = -c(pvalue, log2FoldChange))
      GeneSetDownLF1G <- subset(GeneSetDownLF1, select = -c(pvalue, log2FoldChange))
      
      # Store filtered dataframes in a list with dynamic names
      results[[paste0("MvWup_", gene_name)]] <- GeneSetUpLF1G
      results[[paste0("MvWdown_", gene_name)]] <- GeneSetDownLF1G
    }
    
    # Example: Access result for GeneA
    results$MvWup_GeneA
    results$MvWdown_GeneA
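
Since the question also mentions a map approach, the same idea can be sketched without an explicit loop by passing a small function to lapply() over the split list. This is only a sketch under assumptions: split_up_down and results_by_gene are hypothetical names, and returning the HumanName column as a plain vector is a guess at what MvWup and MvWdown should contain (the question builds them with as.vector(unlist(...)) from whatever columns remain).

```r
# Sample data copied from the question
df <- data.frame(
  HumanName = c("Rob", "Carol", "Pamela", "Rob", "Carol", "Pamela"),
  log2FoldChange = c(-3e00, -2e04, 4e-06, 4e-02, -1e-10, -8e03),
  pvalue = c(4e-06, 2e-09, 5e-04, 1e-10, 4e-06, 3e-09),
  gene = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB", "GeneB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the up- and down-regulated names for one gene.
# The cutoff of 1 mirrors the question; returning HumanName is an assumption.
split_up_down <- function(gene_data, cutoff = 1) {
  list(
    MvWup   = gene_data$HumanName[gene_data$log2FoldChange >  cutoff],
    MvWdown = gene_data$HumanName[gene_data$log2FoldChange < -cutoff]
  )
}

# One list element per gene, each holding an MvWup and an MvWdown vector
results_by_gene <- lapply(split(df, df$gene), split_up_down)

results_by_gene$GeneA$MvWdown  # "Rob" "Carol"
results_by_gene$GeneB$MvWup    # character(0): no GeneB row exceeds the cutoff
```

Keeping the results nested by gene (results_by_gene$GeneA$MvWup) rather than building names with paste0() makes it easy to iterate over genes again later, for example with lapply(results_by_gene, function(x) length(x$MvWup)).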