r dataframe tapply

R function which.max with tapply


I am trying to build a data frame that holds, for each level of a grouping factor, the record with the maximum value. Specifically, I would like a data frame with 4 rows (one for each G) containing the max of X in that group and the corresponding Y value. I know I could write a loop, but I would rather not.

Data<-data.frame(X=rnorm(200), Y=rnorm(200), G=rep(c(1,2,3,4), each=50))
XMax<-tapply(Data$X, Data$G, function(x){max(x, na.rm=TRUE)})
WhichXMax<-tapply(Data$X, Data$G, function(x){which.max(x)})

The which.max function returns the position within each subset created by tapply, whereas I really want the row number in the full Data data frame. If I had that, I could do something like:

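# Only correct if WhichXMax holds row numbers of Data, not within-group positions
YMax<-Data$Y[WhichXMax]
# G is numeric here, so levels(Data$G) would be NULL; use the group names from tapply
MaxData<-data.frame(XMax=XMax, YMax=YMax, G=names(XMax))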

Solution

  • You can use by and reference the rownames of the row returned by which.max:

    Data[by(Data, Data$G, function(dat) rownames(dat)[which.max(dat$X)] ),]
    
    #           X          Y G
    #4   1.595281 -0.3309078 1
    #61  2.401618  0.9510128 2
    #147 2.087167  0.9160193 3
    #171 2.307978 -0.3887222 4
    

    (This output assumes set.seed(1) was called before generating Data, for reproducibility's sake.)
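
If you would rather stay close to the tapply/which.max idea from the question, one possible alternative is to split the row numbers of Data by group instead of the X values, so the position returned by which.max can be mapped straight back to the full data frame. This is only a sketch; the names rows_by_group and max_rows are made up for illustration, and which.max keeps the first row if a group has tied maxima.

```r
# Same example data as in the question; the seed is only for reproducibility.
set.seed(1)
Data <- data.frame(X = rnorm(200), Y = rnorm(200), G = rep(c(1, 2, 3, 4), each = 50))

# Split the row numbers of Data (not the X values) by group, so that the
# result of which.max() can be mapped back to a row of the full data frame.
rows_by_group <- split(seq_len(nrow(Data)), Data$G)

# For each group, pick the original row number whose X is largest.
max_rows <- vapply(rows_by_group, function(i) i[which.max(Data$X[i])], integer(1))

# One row per group: the maximum X together with its corresponding Y and G.
MaxData <- Data[unname(max_rows), ]
```

Because the indices refer to Data itself, this should not depend on the groups being stored in contiguous blocks.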