I have a data frame with a column containing chromosome names (chromosomes 1 to 22). I would like to create another column containing only the chromosome numbers.
Using the stringr package and a regular expression, you can get what you are looking for, but you need to know all the possible name formats. If an underscore is the only thing separating the part you want from the extra information, you can solve the problem with str_split() and "_" as the pattern argument.
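library(stringr)
df <- data.frame(chromosome = c("chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
                                "chr6_GL000255v2_alt", "chr6_GL000256v2_alt", "chr4", "chr11",
                                "chr8", "chr12", "chr2", "chr12", "chr4", "chr6", "chr15", "chr4",
                                "chr2"),
                 stringsAsFactors = FALSE)
# Keep the part before the first "_" (e.g. "chr6_GL000253v2_alt" -> "chr6")
df$chromosome_fixed <- str_split(df$chromosome, "_", simplify = TRUE)[, 1]
# Drop the "chr" prefix so only the chromosome number remains
df$chromosome_number <- as.integer(str_remove(df$chromosome_fixed, "^chr"))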
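If the extra information is not always separated by an underscore, a regular expression that captures the digits directly avoids relying on the separator at all. The sketch below is an alternative to the str_split() approach, not the original answer's method, and it assumes every name starts with "chr" followed by the chromosome number. It uses stringr's str_match(), whose second column holds the first capture group.

```r
library(stringr)

# Same example data; stringsAsFactors = FALSE keeps a character column on R < 4.0
df <- data.frame(chromosome = c("chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
                                "chr6_GL000255v2_alt", "chr6_GL000256v2_alt", "chr4", "chr11",
                                "chr8", "chr12", "chr2", "chr12", "chr4", "chr6", "chr15", "chr4",
                                "chr2"),
                 stringsAsFactors = FALSE)

# str_match() returns a character matrix: column 1 is the whole match,
# column 2 is the capture group (the digits right after "chr").
# Names that do not start with "chr" plus digits (e.g. "chrX") give NA.
df$chromosome_number <- as.integer(str_match(df$chromosome, "^chr(\\d+)")[, 2])
```

Because the pattern is anchored at the start of the string, anything after the digits ("_GL000253v2_alt" or any other suffix) is ignored, so you do not need to enumerate the possible suffixes.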