I have a database containing the number of cells in different growth stages (enlarging, thickening and maturing) for different trees over many years. I collected data on certain days of the year (DOY; January 1 is DOY 1, January 2 is DOY 2, etc.). I simplified it like this to make a reproducible example:
library(dplyr)  # provides %>% and mutate()

df <- data.frame("Year" = c(2012, 2012, 2012, 2012, 2012, 2012, 2012,
                            2012, 2012, 2012, 2013, 2013, 2013,
                            2013, 2013),
                 "Tree" = c(15, 15, 15, 15, 15, 22, 22, 22, 22, 22, 41, 41,
                            41, 41, 41),
                 "DOY" = c(65, 97, 125, 177, 214, 65, 97, 125, 177, 214,
                           61, 99, 118, 166, 221),
                 "Enlarging" = c(0, 2, 4, 5, 0, 0, 3, 6, 3, 0, 0, 5, 4, 4, 0),
                 "Thickening" = c(0, 0, 2, 4, 0, 0, 0, 4, 3, 0, 0, 0, 3, 2, 0),
                 "Maturing" = c(0, 0, 3, 7, 0, 0, 0, 3, 4, 0, 0, 3, 6, 8, 0))
# Year and Tree are grouping factors; DOY and the cell counts are numeric
df <- df %>%
  mutate(Year = as.factor(Year),
         Tree = as.factor(Tree),
         DOY = as.numeric(DOY),
         Enlarging = as.numeric(Enlarging),
         Thickening = as.numeric(Thickening),
         Maturing = as.numeric(Maturing))
print(df)
Year Tree DOY Enlarging Thickening Maturing
1 2012 15 65 0 0 0
2 2012 15 97 2 0 0
3 2012 15 125 4 2 3
4 2012 15 177 5 4 7
5 2012 15 214 0 0 0
6 2012 22 65 0 0 0
7 2012 22 97 3 0 0
8 2012 22 125 6 4 3
9 2012 22 177 3 3 4
10 2012 22 214 0 0 0
11 2013 41 61 0 0 0
12 2013 41 99 5 0 3
13 2013 41 118 4 3 6
14 2013 41 166 4 2 8
15 2013 41 221 0 0 0
I have two questions. The simple one: how can I turn this kind of data into a presence (1)/absence (0) data frame? If the number of cells is 0, keep it 0. If the number of cells is >= 1, turn it into 1. Simple as that.
The second, bonus question: I want to fit a logistic regression using this 0/1 data frame, but as you can see, my samplings took place every 30 days or more. I would like to fit a daily logistic regression, something like creating a sequence seq(1, 365, 1) of the 365 days of the year and predicting daily values from it. That way I could find the exact day on which the growth of each stage started and ended.
The second question could save me A LOT of time. I have tried different scripts and I always end up getting a different error. That's all I need. Thank you so much, I hope someone can help me.
To answer the first part, there are multiple ways to replace the values in several columns with 0/1. Columns 4 to 6 hold the three growth stages, so you could try:
df[,4:6] <- (df[,4:6] > 0)*1
or
df[4:6] <- lapply(df[4:6], function(x) +(x > 0))
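If the column order might change, selecting the stage columns by name is a little safer than using positions. The following is only a sketch: it uses a three-row stand-in for the question's data, and the names `stage_cols` and `pa` are introduced here for illustration.

```r
# Small stand-in for the question's data (same column names, fewer rows)
df <- data.frame(
  Year = c(2012, 2012, 2012),
  Tree = c(15, 15, 15),
  DOY = c(65, 97, 125),
  Enlarging = c(0, 2, 4),
  Thickening = c(0, 0, 2),
  Maturing = c(0, 0, 3)
)

# Hypothetical helper vector: the columns that hold cell counts
stage_cols <- c("Enlarging", "Thickening", "Maturing")

# Work on a copy so the original counts stay available
pa <- df
pa[stage_cols] <- lapply(pa[stage_cols], function(x) as.integer(x > 0))

print(pa)
```

Year, Tree and DOY are left untouched because only the named columns are overwritten.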
You can use these values as outcomes in a logistic regression, but I am unsure what you are looking for from your description (e.g., prediction vs. parameter estimation? Generalized estimating equations? Something else, such as a time-to-event analysis?). If you provide examples of what you want, I can edit the answer to provide more help. Good luck!
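In case prediction is the goal, here is one possible sketch of the "daily" idea from the question, not a definitive method. It rests on several assumptions of mine: all trees are pooled, only the Enlarging stage is modelled, a quadratic term in DOY lets the probability rise and then fall within the year, and a stage counts as present when the predicted probability is at least 0.5. The names `pa`, `daily`, `onset` and `cessation` are illustrative. With this few samples the 0/1 values are perfectly separated, so glm() will probably warn about fitted probabilities of 0 or 1, and the predicted dates cannot be more precise than the gaps between sampling days.

```r
# Enlarging counts from the question, already converted to presence (1) / absence (0)
pa <- data.frame(
  DOY = c(65, 97, 125, 177, 214, 65, 97, 125, 177, 214,
          61, 99, 118, 166, 221),
  Enlarging = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0)
)

# Assumption: quadratic effect of DOY so the curve can go up and come back down
fit <- glm(Enlarging ~ poly(DOY, 2), family = binomial, data = pa)

# Predict the probability of presence for every day of the year
daily <- data.frame(DOY = seq(1, 365, 1))
daily$prob <- predict(fit, newdata = daily, type = "response")

# Assumption: "present" means predicted probability >= 0.5
present_days <- daily$DOY[daily$prob >= 0.5]
onset <- min(present_days)      # first day predicted as present
cessation <- max(present_days)  # last day predicted as present

print(c(onset = onset, cessation = cessation))
```

To get dates per tree or per year, the same steps could be run on each subset (for example via split() and lapply()), although five samples per tree leave the onset and cessation days only loosely determined.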