I have been searching for a solution to my problem and found one, but it uses letters and I need numbers.
I have a dataframe with genes that are duplicated in rows:
gene
PCDH9
PCDH9
PCDH9
PCDH9
CN0T6L
CN0T6L
CN0T6L
MRPL1
FRAS1
FRAS1
I want to create a new column that adds a number to the gene. I have tried a couple of different things but am getting errors each time.
mygenes %>%
  group_by(gene) %>%
  mutate(isoform_id = paste(gene, NUMBERS[row_number()], sep = "_"))

mygenes %>%
  group_by(gene) %>%
  mutate(isoform_id = consecutive_id(gene))
What I want is a new dataframe with '_N' added to each gene in a group where N is the consecutive number for the gene in the group. My desired dataframe would look like this:
gene isoform_id
PCDH9 PCDH9_1
PCDH9 PCDH9_2
PCDH9 PCDH9_3
PCDH9 PCDH9_4
CN0T6L CN0T6L_1
CN0T6L CN0T6L_2
CN0T6L CN0T6L_3
MRPL1 MRPL1_1
FRAS1 FRAS1_1
FRAS1 FRAS1_2
How can I get my code working correctly?
row_number() already gives you a number, so you can paste it onto the gene name directly:
mygenes %>%
  group_by(gene) %>%
  mutate(
    isoform_id = paste(gene, row_number(), sep = "_")
  )
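For a version you can run end to end, here is a minimal sketch. The data frame is a reconstruction of the one in the question, and it assumes the three CN0T6L rows are meant to be the same gene (identical spelling and case), since group_by() treats differently cased strings as separate groups. The trailing ungroup() is an optional addition so later operations are not silently grouped. As for the two attempts in the question: R has built-in LETTERS and letters but no NUMBERS, which is the likely cause of the first error, and consecutive_id(gene) inside a group would return 1 for every row because the value never changes within the group.

```r
library(dplyr)

# Reconstructed example data (assumption: all CN0T6L rows share one spelling)
mygenes <- data.frame(
  gene = c(rep("PCDH9", 4), rep("CN0T6L", 3), "MRPL1", rep("FRAS1", 2))
)

result <- mygenes %>%
  group_by(gene) %>%                                   # number rows within each gene
  mutate(isoform_id = paste(gene, row_number(), sep = "_")) %>%
  ungroup()                                            # drop grouping afterwards

print(result)

# Base R alternative without dplyr: ave() applies seq_along() per gene
base_ids <- paste(
  mygenes$gene,
  ave(seq_along(mygenes$gene), mygenes$gene, FUN = seq_along),
  sep = "_"
)
```

Both approaches keep the original row order and restart the counter at 1 for each gene, so base_ids matches result$isoform_id.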