I have a data frame (451 obs. of 8 variables) in which two columns (6 and 7) look like this:
Major Minor
C:726 T:2
A:687 G:41
T:3 C:725
I want to create one column that summarises this. I don't care about the letters in each cell; I just want to keep the larger of the two numbers, whichever column it is in. I.e. I want it to look like this:
Summary_column
726
687
725
Not necessary, but for those who wonder what I'm doing: this is the output from a programme called VCFtools. It has a count function that counts alleles in a VCF, but it sometimes labels an allele as "Minor" when it is clearly the more common one.
Thanks for your help!
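I would do something like this:
extract <- function(v) {
  # drop everything up to the colon, then convert to numeric;
  # otherwise pmax() would compare the values as strings
  as.numeric(gsub("^.*:", "", v))
}
within(d, Summary_column <- pmax(extract(Major), extract(Minor)))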
Which gives:
Major Minor Summary_column
1 C:726 T:2 726
2 A:687 G:41 687
3 T:3 C:725 725
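For anyone who wants to try this out, here is a minimal self-contained sketch. It assumes the data frame is called `d` and that the columns hold character strings. The fourth row (`G:100` / `A:20`) is hypothetical, not from the question, and the helper name `count_of` is mine. The row is there to show why the numeric conversion matters: `gsub()`/`sub()` return character strings, and compared as strings "20" sorts above "100".

```r
# Sample data copied from the question, plus one hypothetical row (G:100 / A:20)
d <- data.frame(
  Major = c("C:726", "A:687", "T:3", "G:100"),
  Minor = c("T:2", "G:41", "C:725", "A:20"),
  stringsAsFactors = FALSE
)

# Strip everything up to the colon and convert the remainder to a number
count_of <- function(v) as.numeric(sub("^.*:", "", v))

# Row-wise maximum of the two counts
d$Summary_column <- pmax(count_of(d$Major), count_of(d$Minor))
print(d)
```

If the columns are factors rather than character vectors, `sub()` coerces them to character first, so the same helper should still work.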