
Finding Area Under the Curve (AUC) in R by Trapezoidal Rule


I have a list of data frames like the sample below. Each data frame has the columns ID, y (the observed values) and x (the independent variable). I want to find the AUC by the trapezoidal rule for each case (ID), so that my output (a master data frame) looks like the one shown at the end.

Can anybody suggest an efficient way of doing this? I have a large number of rows for each ID.

Thank you

# Some made-up code for a single data frame
Y1=c(0,2,5,7,9)
Y2=c(0,1,3,8,11)
Y3=c(0,4,8,9,12,14,18) 
t1=c(0:4)
t2=c(0:4)
t3=c(0:6) 

a1=data.frame(ID=1,y=Y1,x=t1) 
a2=data.frame(ID=2,y=Y2,x=t2) 
a3=data.frame(ID=3,y=Y3,x=t3) 
data=rbind(a1,a2,a3) 

#dataA(Just to show)
   ID  obs time 
1   1  0   0 
2   1  2   1 
3   1  5   2 
4   1  7   3 
5   1  9   4 
6   2  0   0 
7   2  1   1 
8   2  3   2 
9   2  8   3 
10  2 11   4 
11  3  0   0 
12  3  4   1 
13  3  8   2 
14  3  9   3 
15  3 12   4 
16  3 14   5 
17  3 18   6 

#dataB(Just to show)
   ID  obs time 
1   1  0   0 
2   1  2   1 
3   1  5   2 
4   1  7   3 
5   1  9   4 
6   2  0   0 
7   2  1   1 
8   2  3   2 

#dataC(Just to show)
   ID  obs time 
1   1  0   0 
2   1  2   1 
3   1  5   2 
4   1  7   3 
5   1  9   4 
6   2  0   0 
7   2  1   1 
8   2  3   2 

##Desired output

      ID  AUC
dataA  1   XX
dataA  2   XX
dataA  3   XX
dataB  1   XX
dataB  2   XX
dataC  1   XX
dataC  2   XX

Solution

  • I'm guessing something like this would work

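    calcauc <- function(data) {
        # pairwise sums of neighbouring values: x[i] + x[i + 1]
        psum <- function(x) rowSums(embed(x, 2))
        # one trapezoidal area per ID, stacked back into a data frame
        stack(lapply(split(data, data$ID), function(z)
            with(z, sum(psum(y) * diff(x) / 2)))
        )
    }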
    calcauc(data)
    
    #   values ind
    # 1   18.5   1
    # 2   17.5   2
    # 3   56.0   3
    

    Of course, for ROC curves the x and y values normally lie between 0 and 1, which is why the "AUC" values here look so large. What this really computes is the area of the polygon underneath the line defined by the points in the data set.

    The psum function is just a helper that calculates pairwise sums of neighbouring values, which appear in the formula for the area of a trapezoid.
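
    To make the helper more concrete, here is a small sketch that runs psum on the y values for ID 1 from the question and builds the trapezoid areas by hand. The variable names y, x and areas are only for illustration.

    ```r
    # Pairwise sums of neighbouring values, as in the helper above.
    psum <- function(x) rowSums(embed(x, 2))

    y <- c(0, 2, 5, 7, 9)   # y values for ID 1 from the question
    x <- 0:4                # matching x values

    psum(y)                          # 2 7 12 16: y[i] + y[i + 1] per interval
    areas <- psum(y) * diff(x) / 2   # area of each trapezoid: 1.0 3.5 6.0 8.0
    sum(areas)                       # 18.5, the value reported for ID 1
    ```

    Note that embed(x, 2) needs at least two values, so an ID with a single row would have to be handled separately.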

    Basically we use split() to look at one ID at a time, then we calculate the area for each ID, then we use stack() to bring everything back into one data.frame.
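
    The question actually has a list of data frames and wants the data set name in the output. One possible way to extend the same idea is sketched below. The list datasets and the names per_set and master are assumptions for illustration, and the data frames are rebuilt from the question's printed tables using the column names ID, y and x.

    ```r
    # calcauc() repeated here so the sketch is self-contained.
    calcauc <- function(data) {
        psum <- function(x) rowSums(embed(x, 2))
        stack(lapply(split(data, data$ID), function(z)
            with(z, sum(psum(y) * diff(x) / 2))))
    }

    # Hypothetical list mirroring dataA, dataB and dataC from the question.
    dataA <- rbind(
        data.frame(ID = 1, y = c(0, 2, 5, 7, 9), x = 0:4),
        data.frame(ID = 2, y = c(0, 1, 3, 8, 11), x = 0:4),
        data.frame(ID = 3, y = c(0, 4, 8, 9, 12, 14, 18), x = 0:6)
    )
    dataB <- rbind(
        data.frame(ID = 1, y = c(0, 2, 5, 7, 9), x = 0:4),
        data.frame(ID = 2, y = c(0, 1, 3), x = 0:2)
    )
    dataC <- dataB
    datasets <- list(dataA = dataA, dataB = dataB, dataC = dataC)

    # Compute the AUC table for each data frame and tag rows with the list name.
    per_set <- lapply(names(datasets), function(nm) {
        res <- calcauc(datasets[[nm]])
        data.frame(dataset = nm, ID = as.character(res$ind), AUC = res$values)
    })

    # Bind everything into one master data frame (7 rows for this sample).
    master <- do.call(rbind, per_set)
    master
    ```

    This assumes the rows are already sorted by x within each ID, since the trapezoidal rule sums over consecutive points.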