I have a dataset containing a column of dates (daily values summed by week) and a column of values representing bird mortality. I need to find out the dates the majority of mortality fall into. The values are not normally distributed with a major peak at the end of July due to increased surveying effort. Some weeks had zero observed mortalities.
Date Observed | Value |
---|---|
2022-05-08 | 4 |
2022-05-22 | 1 |
2022-05-29 | 5 |
2022-06-05 | 1 |
2022-06-12 | 10 |
2022-06-19 | 64 |
2022-06-26 | 4 |
2022-07-10 | 800 |
2022-07-17 | 300 |
2022-07-24 | 207 |
2022-07-31 | 3002 |
2022-08-07 | 2 |
I'm not new to R but I am evidently rusty on my statistical analysis. Could anyone point me in the right direction?
I have tried simple things: estimating median of Date Observed (using summary
and summaryStats(DateObserved, quartiles=TRUE)[[7]]
from EnvStats library. But after plotting the values over time, I do not think the resulting quartiles look right. Also, tried fitting a linear model.
Thank you sincerely in advance for any leads!
library(tidyverse)
df_1 <- read_tsv(file="DateObserved Value
2022-05-08 4
2022-05-22 1
2022-05-29 5
2022-06-05 1
2022-06-12 10
2022-06-19 64
2022-06-26 4
2022-07-10 800
2022-07-17 300
2022-07-24 207
2022-07-31 3002
2022-08-07 2") |> mutate(dt=as.numeric(DateObserved))
dt_seq <- seq(from=min(df_1$dt),
to = max(df_1$dt))
dt_seq_pretty_indx <- seq(from=min(df_1$dt),
to = max(df_1$dt),
length.out=6)
dt_seq_pretty <- seq(from=min(df_1$DateObserved),
to = max(df_1$DateObserved),
length.out=6)
plot(df_1$dt,
df_1$Value, xaxt = "n")
axis(side=1,
at = dt_seq_pretty_indx,
labels=dt_seq_pretty)
myloess <- loess(Value ~ dt,df_1,span = 0.4)
pred <- predict(myloess,newdata = data.frame(dt=dt_seq))
lines(x=dt_seq,
y=pred)
library(Hmisc)
wq <- wtd.quantile(x=dt_seq,
weights = pred,
probs = c(.25,.75))
abline(v=wq[1],col="blue")
abline(v=wq[2],col="blue")
print(as.Date(wq,origin="1970-01-01"))