This seems to have been asked many times, but I could not get any of the previous solutions to work. Here is my problem:
I have a big data frame with bone measurements and other information. One column (HREPP) contains the name of the region each bone comes from. I would like to create a separate data frame for each region so that I can calculate means, deviations and other statistics within each smaller table. (I know these can also be calculated from the full table, but that would require even more programming skill.)
I created a list of the unique region values using
unique_hrepp <- unique(ni[3])
because HREPP, the region column, is the third column of the data frame "ni". Then I sorted it using:
unique_hrepp <- unique_hrepp[order(unique_hrepp$HREPP, decreasing = FALSE), ]
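Because HREPP is a single column, a slightly shorter route is to pull it out as a vector with `$` and sort the unique values in one step. A minimal sketch, assuming a small made-up `ni` data frame (the column values below are hypothetical, with HREPP as the third column as in the question):

```r
# Hypothetical toy data standing in for the real "ni" data frame
ni <- data.frame(
  id     = 1:6,
  weight = c(10, 12, 11, 15, 14, 13),
  HREPP  = c("North", "South", "East", "North", "East", "South"),
  stringsAsFactors = FALSE
)

# ni$HREPP is a plain vector: unique() keeps the distinct regions,
# sort() puts them in ascending order, so no separate order() step is needed
unique_hrepp <- sort(unique(ni$HREPP))
print(unique_hrepp)
```

The result is a character vector, so `unique_hrepp[1]` is the first region name itself.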
All of this worked, and now I want to filter the big table. The simplest approach is:
library(dplyr)  # filter() comes from dplyr
hrepp_1 <- filter(fulltable, HREPP == unique_hrepp[1])
hrepp_2 <- filter(fulltable, HREPP == unique_hrepp[2])
hrepp_3 <- filter(fulltable, HREPP == unique_hrepp[3])
But I have about 50 regions and do not want to repeat this over and over. I would also like to know the proper way to do it.
I came up with
lapply(unique_hrepp, function(x) filter(fulltable, HREPP == "unique_hrepp"))
which almost does the right thing, but all the information seems to be lost: the cells are empty and I cannot get data frames as output.
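The attempt above compares HREPP with the literal string "unique_hrepp" instead of with the function argument `x`, so no row ever matches and every resulting table is empty. A minimal corrected sketch, assuming a toy `fulltable` with made-up values; it uses base R `subset()` in place of dplyr's `filter()` so it runs without extra packages (the dplyr form would be `function(x) filter(fulltable, HREPP == x)`):

```r
# Hypothetical toy data standing in for the real "fulltable"
fulltable <- data.frame(
  HREPP   = c("North", "South", "East", "North", "East", "South"),
  density = c(1.2, 1.5, 1.1, 1.3, 1.0, 1.6),
  stringsAsFactors = FALSE
)
unique_hrepp <- sort(unique(fulltable$HREPP))

# Compare against x (the current region), not the string "unique_hrepp"
hrepp_list <- lapply(unique_hrepp, function(x) subset(fulltable, HREPP == x))

# Name each list element after its region so tables can be looked up by name
names(hrepp_list) <- unique_hrepp

print(hrepp_list[["North"]])
```

Each element of `hrepp_list` is then a data frame, reachable as `hrepp_list[["North"]]` or `hrepp_list[[1]]`.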
As Sotos suggested:
You can first split the big data.frame into a list of data.frames using the split function in R. HREPP can be a factor or a character column (split() converts it with as.factor()), and you do not have to order it.
ldf = split(x = df, f = df$HREPP)
ldf is a list of data.frames; each data.frame contains the observations for one region, i.e. one unique value of the column HREPP.
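A quick way to see what split() returns, assuming a made-up `df` with an HREPP column (all values are hypothetical): the list elements are named after the regions, and each one holds only that region's rows.

```r
# Hypothetical toy data standing in for the real df
df <- data.frame(
  HREPP   = c("North", "South", "East", "North", "East", "South"),
  density = c(1.2, 1.5, 1.1, 1.3, 1.0, 1.6),
  weight  = c(10, 12, 11, 15, 14, NA),
  stringsAsFactors = FALSE
)

# One data.frame per region, named after the region
ldf <- split(x = df, f = df$HREPP)

names(ldf)          # the distinct regions, in sorted (factor level) order
sapply(ldf, nrow)   # number of observations per region
ldf$North           # the data.frame holding only the North rows
```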
Now you can use lapply to calculate the mean for each region separately, in each of the data.frames in the list:
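# lapply's list argument is X (capital); x = ldf would be passed into ... instead
ldf = lapply(X = ldf, FUN = function(t) { t$mean_density = mean(t$density, na.rm = TRUE); t })
ldf = lapply(X = ldf, FUN = function(t) { t$mean_weight = mean(t$weight, na.rm = TRUE); t })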
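The lapply calls above repeat each region's mean on every row. If one row per region is enough, the same list can be reduced to a summary table holding several statistics at once. A minimal sketch on made-up data; the column names `density` and `weight` follow the answer, and the summary column names are hypothetical:

```r
# Hypothetical toy data standing in for the real df
df <- data.frame(
  HREPP   = c("North", "South", "East", "North", "East", "South"),
  density = c(1.2, 1.5, 1.1, 1.3, 1.0, 1.6),
  weight  = c(10, 12, 11, 15, 14, NA),
  stringsAsFactors = FALSE
)
ldf <- split(df, df$HREPP)

# For each region, return a one-row data.frame of statistics
summ_list <- lapply(ldf, function(t) {
  data.frame(
    HREPP        = t$HREPP[1],
    n            = nrow(t),
    mean_density = mean(t$density, na.rm = TRUE),
    sd_density   = sd(t$density, na.rm = TRUE),
    mean_weight  = mean(t$weight, na.rm = TRUE),
    sd_weight    = sd(t$weight, na.rm = TRUE)  # NA when only one non-missing value
  )
})

# Stack the one-row results into a single summary table
region_summary <- do.call(rbind, summ_list)
print(region_summary)
```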
Then you can combine the list back into a single data.frame using rbindlist from the data.table package:
library(data.table)  # provides rbindlist()
df = rbindlist(l = ldf, use.names = TRUE)
df = as.data.frame(df)
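If you would rather not load data.table, base R can stack the list with `do.call(rbind, ...)`, which also matches data.frame columns by name. A minimal sketch on made-up data, not the answer's own method:

```r
# Hypothetical toy data standing in for the real df
df <- data.frame(
  HREPP   = c("North", "South", "East", "North", "East", "South"),
  density = c(1.2, 1.5, 1.1, 1.3, 1.0, 1.6),
  stringsAsFactors = FALSE
)
ldf <- split(df, df$HREPP)
ldf <- lapply(ldf, function(t) { t$mean_density <- mean(t$density, na.rm = TRUE); t })

# Base R alternative to data.table::rbindlist(): stack all data.frames in the list
df_all <- do.call(rbind, ldf)
rownames(df_all) <- NULL  # reset row names to 1..n
print(df_all)
```

The rows come back grouped by region, in the order of `names(ldf)`.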