
Choose higher values from two columns after extracting the number, R


I have a data frame (451 obs of 8 variables) that has two columns (6&7) that look like this:

  Major      Minor
  C:726      T:2
  A:687      G:41
  T:3        C:725

I want to create one column that summarises this. I don't care about the letters in each cell; I just want the larger of the two numbers in each row to remain, whichever column it's in, i.e. I want it to look like this:

  Summary_column
  726
  687
  725

Not necessary, but for those who wonder what I'm doing, this is the output from a programme called VCFtools; it has a count function that counts alleles in a VCF, but sometimes it names the allele as "Minor" when it is clearly more common.

Thanks for your help!


Solution

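  • Assuming your data frame is called d, I would do something like this, converting the extracted counts to numbers so that pmax() compares them numerically rather than as strings:

    extract <- function(v) {
      # drop everything up to and including the colon, then convert to a number
      as.numeric(sub("^.*:", "", v))
    }
    within(d, Summary_column <- pmax(extract(Major), extract(Minor)))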
    

    Which gives:

      Major Minor Summary_column
    1 C:726   T:2            726
    2 A:687  G:41            687
    3   T:3 C:725            725
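
To check this end to end, here is a minimal, self-contained sketch that rebuilds the three example rows from the question. The data frame name `d` and the helper `allele_count` are assumptions made for illustration; they are not part of VCFtools or of the original data. The last two lines show why the numeric conversion matters: without it, `pmax()` compares the extracted values as strings.

```r
# Rebuild the three example rows; `d` is an assumed name for the data frame.
d <- data.frame(
  Major = c("C:726", "A:687", "T:3"),
  Minor = c("T:2", "G:41", "C:725"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip everything up to and including the colon,
# then convert the remaining digits to an integer count.
allele_count <- function(v) {
  as.integer(sub("^.*:", "", as.character(v)))
}

# Row-wise (element-wise) maximum of the two counts.
d$Summary_column <- pmax(allele_count(d$Major), allele_count(d$Minor))
print(d$Summary_column)   # 726 687 725

# Pitfall: on character input the comparison is lexicographic.
print(pmax("100", "99"))  # "99"
print(pmax(100L, 99L))    # 100
```

The `as.character()` call is only there so the helper also works if the columns were read in as factors.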