I have a monthly (Jan - Dec) data set for weather and crop yield. This data is collected for multiple years (2002 - 2019). My aim is to obtain bootstrapped slope coefficient of the affect of temperature in each month on yield gap. In bootstrapping, I want to block the year information in a way that the function should randomly sample data from a specific year in each bootstrap rather than choosing rows from mixed years.
I read some blogs and tried different methods but I am not confident about those. I tried to disect the bootstrapped splits to ensure if I am doing it correctly but I was not.
Here is the starting code:
# Load libraries
library(readxl)
library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
library(reprex)
# data
ww_wt <- read_csv("https://raw.githubusercontent.com/MohsinRamay/yieldgap/main/ww_wt.csv")
#> New names:
#> * `` -> ...1
#> Rows: 1924 Columns: 20
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (3): ID, Location, Month
#> dbl (16): ...1, Year, Latitude, Longitude, YieldTrt, YieldUntrt, Mildew, Ye...
#> date (1): Date
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
ww_wt %>%
select(Year, Month, gap, temp) %>%
head()
#> # A tibble: 6 x 4
#> Year Month gap temp
#> <dbl> <chr> <dbl> <dbl>
#> 1 2002 September 0.282 13.6
#> 2 2002 October 0.282 13.3
#> 3 2002 November 0.282 7.07
#> 4 2002 December 0.282 3.44
#> 5 2002 January 0.282 5.61
#> 6 2002 February 0.282 6.93
# Bootstrapping
set.seed(123)
boots <- ww_wt %>%
ungroup() %>%
select(Year, Month, gap, temp) %>%
nest(data = -c(Month)) %>%
mutate(boots = map(data, ~bootstraps(.x, times = 100, apparent = FALSE))) %>%
unnest(boots) %>%
mutate(model = map(splits, ~lm(gap ~ temp, data = analysis(.))),
coefs = map(model, tidy))
Created on 2022-01-04 by the reprex package (v2.0.1)
I am nesting Months
because I want to get slopes for each month separately. Also, the data for each Year has different sample size n
because of different number of locations each year.
Thanks for providing these hints Julia. I think I have figured this out by just adding some extra lines of code.
library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
library(viridis)
#> Loading required package: viridisLite
#>
#> Attaching package: 'viridis'
#> The following object is masked from 'package:scales':
#>
#> viridis_pal
ww_wt <- read_csv("https://raw.githubusercontent.com/MohsinRamay/yieldgap/main/ww_wt.csv")
#> New names:
#> * `` -> ...1
#> Rows: 1924 Columns: 20
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (3): ID, Location, Month
#> dbl (16): ...1, Year, Latitude, Longitude, YieldTrt, YieldUntrt, Mildew, Ye...
#> date (1): Date
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
set.seed(123)
# Block bootstrapping
boots <- ww_wt %>%
ungroup() %>%
select(Year, Month, gap, temp) %>%
pivot_wider(names_from = Month, values_from = temp, values_fn = mean) %>%
bootstraps(times = 10, apparent = FALSE) %>%
mutate(splits = map(splits, analysis)) %>%
unnest(splits) %>%
group_by(id) %>%
mutate(row = row_number()) %>%
pivot_longer(names_to = "Month", values_to = "temp", cols = September:August)
# Bootstraps
boots %>%
group_by(id) %>%
ggplot(aes(x = id, y = row, fill = Year)) +
geom_tile() +
scale_fill_viridis(option = "B", direction = 1) +
labs(x = NULL) +
facet_wrap(~Month) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
The above plots clearly indicates that, within each bootstrap, data from each Year
is being randomly sampled with replacement.
Created on 2022-01-08 by the reprex package (v2.0.1)