r dplyr

Create new column using group_by and adding number to value


I have been searching for a solution to my problem and found this solution, but it appends letters and I need numbers.

I have a dataframe with genes that are duplicated in rows:

gene
PCDH9
PCDH9
PCDH9
PCDH9
CN0T6L
CN0T6L
CN0T6L
MRPL1
FRAS1
FRAS1

I want to create a new column that adds a number to the gene. I have tried a couple of different things but am getting errors each time.

mygenes %>% group_by(gene) %>% 
    mutate(isoform_id = paste(gene, NUMBERS[row_number()], sep = "_"))

mygenes %>% group_by(gene) %>% mutate(isoform_id = consecutive_id(gene))

What I want is a new dataframe with '_N' added to each gene in a group where N is the consecutive number for the gene in the group. My desired dataframe would look like this:

gene    isoform_id
PCDH9       PCDH9_1
PCDH9       PCDH9_2
PCDH9       PCDH9_3
PCDH9       PCDH9_4
CN0T6L      CN0T6L_1
CN0T6L      CN0T6L_2
CN0T6L      CN0T6L_3
MRPL1       MRPL1_1
FRAS1       FRAS1_1
FRAS1       FRAS1_2

How can I get my code working correctly?


Solution

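  • row_number() already gives you a number. Inside group_by(gene) it restarts at 1 for each gene, so you can paste it straight onto the gene name. Base R has no built-in NUMBERS vector (only LETTERS and letters), which is why the first attempt fails.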

    library(dplyr)

    mygenes %>%
        group_by(gene) %>%
        mutate(
            # row_number() counts rows within each gene group: 1, 2, 3, ...
            isoform_id = paste(gene, row_number(), sep = "_")
        )
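
For anyone who wants to try this end to end, here is a minimal reproducible sketch. The data frame is rebuilt by hand from the question, and it assumes the gene symbols are written with consistent case: group_by() is case-sensitive, so symbols that differ only in letter case would land in separate groups. The `result` name and the trailing ungroup() are my additions, not part of the original code.

```r
library(dplyr)

# Example data rebuilt from the question (assumption: consistent letter case,
# so that all three CN0T6L rows fall into the same group).
mygenes <- data.frame(
  gene = c("PCDH9", "PCDH9", "PCDH9", "PCDH9",
           "CN0T6L", "CN0T6L", "CN0T6L",
           "MRPL1", "FRAS1", "FRAS1")
)

result <- mygenes %>%
  group_by(gene) %>%
  # row_number() restarts at 1 within each gene group
  mutate(isoform_id = paste(gene, row_number(), sep = "_")) %>%
  # drop the grouping so later steps operate on the whole table
  ungroup()

print(result)
```

As for the second attempt: consecutive_id() (available in dplyr 1.1.0 and later, as far as I know) numbers runs of identical values, so inside group_by(gene) every row of a group gets the same id rather than a running count. It is meant to be used without grouping by the same column.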