I'm trying to download NOAA data using the rnoaa package and I'm running into a bit of trouble.
I took a vector from a dataframe and it looks like this:
df <- dataframe$ghcnd
This gives me an output like:
[1] "GHCND:US1AKAB0058" "GHCND:US1AKAB0015" "GHCND:US1AKAB0021" "GHCND:US1AKAB0061"
[5] "GHCND:US1AKAB0055" "GHCND:US1AKAB0038" "GHCND:US1AKAB0051" "GHCND:US1AKAB0052"
[9] "GHCND:US1AKAB0060" "GHCND:US1AKAB0065" "GHCND:US1AKAB0062" "GHCND:US1AKFN0016"
[13] "GHCND:US1AKFN0018" "GHCND:US1AKFN0015" "GHCND:US1AKFN0011" "GHCND:US1AKFN0013"
[17] "GHCND:US1AKFN0030" "GHCND:US1AKJB0011" "GHCND:US1AKJB0014" "GHCND:US1AKKP0005"
[21] "GHCND:US1AKMS0011" "GHCND:US1AKMS0019" "GHCND:US1AKMS0012" "GHCND:US1AKMS0020"
[25] "GHCND:US1AKMS0018" "GHCND:US1AKMS0014" "GHCND:US1AKPW0001" "GHCND:US1AKSH0002"
[29] "GHCND:US1AKVC0006" "GHCND:US1AKWH0012" "GHCND:US1AKWP0001" "GHCND:US1AKWP0002"
[33] "GHCND:US1ALAT0014" "GHCND:US1ALAT0013" "GHCND:US1ALBW0095" "GHCND:US1ALBW0087"
[37] "GHCND:US1ALBW0020" "GHCND:US1ALBW0066" "GHCND:US1ALBW0031" "GHCND:US1ALBW0082"
[41] "GHCND:US1ALBW0099" "GHCND:US1ALBW0040" "GHCND:US1ALBW0004" "GHCND:US1ALBW0085"
[45] "GHCND:US1ALBW0009" "GHCND:US1ALBW0001" "GHCND:US1ALBW0094" "GHCND:US1ALBW0013"
[49] "GHCND:US1ALBW0079" "GHCND:US1ALBW0060"
In reality, I have about 22,000 weather stations. This is just showing the first 50.
library(rnoaa)
options("noaakey" = Sys.getenv("noaakey"))
Sys.getenv("noaakey")
weather <- ncdc(datasetid = 'GHCND', stationid = df, var = 'PRCP', startdate = "2020-05-30",
enddate = "2020-05-30", add_units = TRUE)
Which produces the following error:
Error: Request-URI Too Long (HTTP 414)
When I subset df to just, say, the first 100 entries, I can't get data for more than the first 25, even though the package details say I should be able to run 10,000 queries a day.
df1 <- df[1:125] ## Splitting dataframe. Too big otherwise
for (i in 1:length(df1)){
weather2<-ncdc(datasetid = 'GHCND', stationid=df1[i],var='PRCP',startdate ='2020-06-30',enddate='2020-06-30',
add_units = TRUE)
}
But this just produces a data frame with a single row, that row being the 125th weather station.
If anyone could give advice on what to try next, that would be great :)
Also, cross linked: https://discuss.ropensci.org/t/rnoaa-getting-county-level-rain-data/2403
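Figured it out, with a lot of help from @Dave2e and a bud on the rOpenSci thread linked above. The loop in my question overwrote weather2 on every iteration, which is why only the last station survived. The fix is to query the stations in chunks of 100, keep each chunk's result in a list, and raise limit (as far as I can tell, ncdc() returns only 25 records unless limit is set):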
library(rnoaa)
library(dplyr) ## Needed for bind_rows()
df <- cleaned_emshr$ghcnd ## Grabbing necessary column
z <- split(df, ceiling(seq_along(df)/100)) ## Chunks of at most 100 station IDs
out <- list()
for (i in seq_along(z)) {
  out[[i]] <- ncdc(datasetid = 'GHCND', stationid = z[[i]], var = 'PRCP',
                   startdate = "2020-05-30", enddate = "2020-05-30",
                   add_units = TRUE, limit = 100)
}
weather <- bind_rows(lapply(out, "[[", "data")) ## Combine the data element of every chunk
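For anyone who wants to see what the chunking and combining steps do without hitting the API, here is a minimal offline sketch. The station IDs are made up, and fetch_chunk is a hypothetical stand-in for the ncdc() call that only mimics the shape of its result (a list with a data element); it is not part of rnoaa.

```r
# Made-up station IDs standing in for the real vector (not real stations)
ids <- sprintf("GHCND:US1XX%04d", 1:250)

# Same chunking idiom as above: element k lands in chunk ceiling(k / 100)
chunks <- split(ids, ceiling(seq_along(ids) / 100))
lengths(chunks)

# Hypothetical stand-in for ncdc(): returns a list with a $data data frame
fetch_chunk <- function(station_ids) {
  list(data = data.frame(station = station_ids, stringsAsFactors = FALSE))
}

# lapply keeps one result per chunk instead of overwriting a single variable
out <- lapply(chunks, fetch_chunk)

# Base R equivalent of bind_rows() when every data frame has the same columns
combined <- do.call(rbind, lapply(out, `[[`, "data"))
nrow(combined)
```

With 250 IDs this gives three chunks (100, 100 and 50 IDs), and the combined data frame has one row per ID. Swapping fetch_chunk for the real ncdc() call gets you back to the loop above.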