
How to use apply in a filter function


This has probably been asked many times, but I could not get any of the previous solutions to work. My problem is as follows:

I have a big data frame with bone measurements and other information. One column (HREPP) contains the name of the region each bone comes from. I would like to create a separate data frame for each region so that I can calculate means, deviations and other statistics within each smaller table. (I know these can also be calculated from the full table, but that would require even more programming skills.)

I created a list of the unique region values using

unique_hrepp <- unique(ni[3]) 

because HREPP, the column holding the region, is the third column of the data frame ni. Then I sorted it using:

unique_hrepp <- unique_hrepp[order(unique_hrepp$HREPP, decreasing = FALSE), ]
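As a side note, these two steps can be collapsed into one. `unique(ni[3])` keeps a one-column data frame, and the row subset `[order(...), ]` then drops it to a plain character vector. A minimal base-R sketch, assuming `ni` has a character column `HREPP` (the toy values below are made up for illustration):

```r
# Hypothetical toy stand-in for the question's data frame `ni`;
# HREPP is the third column, as in the question
ni <- data.frame(
  ID     = 1:6,
  weight = c(10.2, 11.5, 9.8, 12.1, 10.9, 11.0),
  HREPP  = c("North", "South", "North", "East", "South", "East"),
  stringsAsFactors = FALSE
)

# Extracting the column first gives the sorted unique regions
# as a character vector in a single step
unique_hrepp <- sort(unique(ni$HREPP))
print(unique_hrepp)
```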

All this worked well, and now I want to filter the big table. The simplest way is:

hrepp_1 <- filter(fulltable, HREPP == unique_hrepp[1])
hrepp_2 <- filter(fulltable, HREPP == unique_hrepp[2])
hrepp_3 <- filter(fulltable, HREPP == unique_hrepp[3])

But I have about 50 regions and do not want to repeat this over and over. I would also like to know the proper way to do it.

I got as far as

lapply(unique_hrepp, function(x) filter(fulltable, HREPP == "unique_hrepp"))

This almost works, but all the information seems to be lost: the resulting tables have no content in their cells, and I cannot get data frames as output.
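The result is empty because, inside the function, HREPP is compared with the literal string `"unique_hrepp"` instead of the loop variable `x`. No row has that value, so every filtered table has zero rows. With dplyr the fix is `filter(fulltable, HREPP == x)`. A minimal base-R sketch of the same idea, using a made-up `fulltable` with the question's column names:

```r
# Hypothetical toy stand-in for the question's `fulltable`
fulltable <- data.frame(
  HREPP   = c("North", "South", "North", "East", "South", "East"),
  density = c(1.1, 1.3, 1.2, 1.0, 1.4, 0.9),
  stringsAsFactors = FALSE
)
unique_hrepp <- sort(unique(fulltable$HREPP))

# Compare HREPP with the loop variable x, not with the string "unique_hrepp".
# which() drops NA comparisons, matching how dplyr::filter treats NA.
# setNames() labels each list element with its region name.
per_region <- lapply(setNames(unique_hrepp, unique_hrepp), function(x) {
  fulltable[which(fulltable$HREPP == x), , drop = FALSE]
})

per_region[["North"]]
```

Each element of `per_region` is an ordinary data frame, so there is no need to create 50 separate variables.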


Solution

  • As Sotos suggested:

    You can first split the big data.frame into a list of data.frames using R's split() function. split() converts the grouping column with as.factor(), so HREPP can be a factor or a character column, and you do not have to sort it first.

    ldf = split(x = df, f = df$HREPP)
    

    ldf is a list of data.frames, one per region: each data.frame contains the rows that share one value of the HREPP column.

    Now you can use lapply to calculate the mean for each region separately, working on each data.frame in the list:

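    # lapply's first argument is X (capital X); each call adds one column
    # holding the region's mean, repeated on every row of that region
    ldf = lapply(X = ldf, FUN = function(t) {t$mean_density = mean(t$density, na.rm = TRUE); t})
    ldf = lapply(X = ldf, FUN = function(t) {t$mean_weight = mean(t$weight, na.rm = TRUE); t})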
    

    Then combine the list back into a single data.frame using rbindlist() from the data.table package:

    library(data.table)
    df = rbindlist(l = ldf, use.names = TRUE)  # returns a data.table
    df = as.data.frame(df)                     # convert back to a plain data.frame
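The whole answer can be run end to end on a small made-up data set. This sketch sticks to base R, so it swaps `do.call(rbind, ...)` in for `data.table::rbindlist()`; the column names follow the answer, the values are hypothetical. It also adds a standard deviation column and shows `tapply()` for the case where one value per region is enough:

```r
# Hypothetical toy data; column names follow the answer, values are made up
df <- data.frame(
  HREPP   = c("North", "South", "North", "East", "South", "East", "North"),
  density = c(1.1, 1.3, NA, 1.0, 1.4, 0.9, 1.2),
  weight  = c(10.2, 11.5, 9.8, 12.1, 10.9, 11.0, 10.0),
  stringsAsFactors = FALSE
)

# split() converts the character column to a factor itself; the pieces come
# out in sorted level order. drop = TRUE skips unused factor levels.
ldf <- split(df, df$HREPP, drop = TRUE)

# Add per-region mean and standard deviation columns to each piece
ldf <- lapply(ldf, function(d) {
  d$mean_density <- mean(d$density, na.rm = TRUE)
  d$sd_density   <- sd(d$density, na.rm = TRUE)
  d$mean_weight  <- mean(d$weight, na.rm = TRUE)
  d
})

# Base-R alternative to data.table::rbindlist(): stack the pieces again
out <- do.call(rbind, ldf)
rownames(out) <- NULL
print(out)

# One value per region, computed directly from the full data frame
mean_by_region <- tapply(df$density, df$HREPP, mean, na.rm = TRUE)
print(mean_by_region)
```

The `split()`/`lapply()` route keeps every original row and repeats the group statistic on it; `tapply()` (or `aggregate()`) returns one value per region, which is usually what a summary table needs.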