
Multi-step loop to acquire weather data over years and stations


I have a process to create a df for a single weather station over a single month. However, I have about 25 stations that I would like to acquire precipitation data for over a 5-year period.

I have the station ids in a df that looks like the table below (but with 23 more stations).

stationid           County
GHCND:USW00093721   ANNEARUNDEL
GHCND:USC00182308   BALTIMORE

The weather dataset is acquired through the following code:

library("rnoaa")
ANNEARUNDEL_2006 <- ncdc(datasetid='GHCND', stationid = "GHCND:USC00182060", datatypeid='PRCP', startdate = '2006-07-01', enddate = '2006-08-01', limit=400, token =  "API KEY") 

ANNEARUNDEL_2006 <- ANNEARUNDEL_2006$data

I am familiar with very basic for loops that work for one process. Is there a way to set this up so that the loop would create a new df, named using the county name and year, over the span of 2006 to 2011 for all 25 stations? Is a loop the best way to accomplish this?


Solution

  • I like loops for things like this because they are easier to read and write. You could do it like this with two loops:

    my_df <- read.table(text = "stationid   County
    GHCND:USW00093721   ANNEARUNDEL
    GHCND:USC00182308   BALTIMORE",
                        header = TRUE)
    
    library(rnoaa)
    
    results <- list() # list as storage variable for the loop results
    i <- 1 # indexing variable
    
    for(sid in unique(my_df$stationid)) { # each station in your stationid dataframe
        for(year in 2006:2011) { # each year you care about
            data <- ncdc(datasetid='GHCND', stationid = sid,
                         datatypeid='PRCP', startdate = paste0(year, '-01-01'),
                         enddate = paste0(year, '-12-31'), limit=400, token = "API KEY")$data # subset the returned list right away here with $data
    
            # add info from each loop iteration
            data$county <- my_df[my_df$stationid == sid,]$County
            data$year <- year
    
            results[[i]] <- data # store it
            i <- i + 1 # rinse and repeat
        }
    }
    one_big_df <- do.call(rbind, results) # stack all of the data frames together rowwise
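
    If separate data frames per county and year are still wanted, as the question asks, one option is to split the stacked result into a named list instead of creating dozens of free-standing variables. This is only a minimal sketch in base R: the small one_big_df built here is invented stand-in data (the value column and its numbers are assumptions, not real NOAA output), so the block runs without an API token.

    ```r
    # Stand-in for the stacked result of the loop above (values are invented).
    one_big_df <- data.frame(
      county = c("ANNEARUNDEL", "ANNEARUNDEL", "BALTIMORE", "BALTIMORE"),
      year   = c(2006, 2007, 2006, 2007),
      value  = c(0, 13, 5, 0),
      stringsAsFactors = FALSE
    )

    # Build a key such as "ANNEARUNDEL_2006" for every row.
    key <- paste(one_big_df$county, one_big_df$year, sep = "_")

    # One data frame per county/year combination, stored in a named list.
    by_county_year <- split(one_big_df, key)

    names(by_county_year)                   # the available county_year keys
    by_county_year[["ANNEARUNDEL_2006"]]    # pull out a single data frame
    ```

    A named list keeps all the pieces together, so they can still be processed in one go with lapply() later on.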
    

    Of course, you could always rewrite the for loop using lapply or its friends. If speed became an issue, you might want to consider it.
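
    As a rough sketch of what an lapply variant might look like: build every station/year combination with expand.grid(), fetch each one, and stack the results. The helper names fetch_prcp and fetch_all are made up for illustration, and the fetcher is passed in as an argument so the pattern can be tried with a stub before spending API requests. The tryCatch() and the empty-result check are additions, on the assumption that some station/year requests may fail or return no rows.

    ```r
    # Hypothetical wrapper around the ncdc() call used in the loop version.
    # rnoaa::ncdc is only looked up when this function is actually called.
    fetch_prcp <- function(sid, year, token = "API KEY") {
      rnoaa::ncdc(datasetid = "GHCND", stationid = sid, datatypeid = "PRCP",
                  startdate = paste0(year, "-01-01"),
                  enddate = paste0(year, "-12-31"),
                  limit = 400, token = token)$data
    }

    # Fetch every station/year combination and stack the results row-wise.
    # `stations` needs the columns stationid and County, as in my_df.
    fetch_all <- function(stations, years, fetch = fetch_prcp) {
      combos <- expand.grid(stationid = unique(as.character(stations$stationid)),
                            year = years, stringsAsFactors = FALSE)
      results <- lapply(seq_len(nrow(combos)), function(i) {
        sid <- combos$stationid[i]
        year <- combos$year[i]
        # A failed request becomes NULL instead of stopping the whole run.
        data <- tryCatch(fetch(sid, year), error = function(e) NULL)
        if (is.null(data) || NROW(data) == 0) return(NULL)
        data <- as.data.frame(data)
        data$county <- as.character(stations$County[stations$stationid == sid][1])
        data$year <- year
        data
      })
      do.call(rbind, results)  # NULL entries are ignored when stacking
    }

    # Dry run with a stub fetcher that returns one invented row per request.
    stub <- function(sid, year) {
      data.frame(station = sid, value = 0, stringsAsFactors = FALSE)
    }
    my_df <- data.frame(stationid = c("GHCND:USW00093721", "GHCND:USC00182308"),
                        County = c("ANNEARUNDEL", "BALTIMORE"),
                        stringsAsFactors = FALSE)
    test_df <- fetch_all(my_df, 2006:2011, fetch = stub)
    nrow(test_df)  # one row per station/year combination in this dry run
    ```

    Swapping the stub for the default fetcher, as in fetch_all(my_df, 2006:2011), would issue the real requests, provided rnoaa is installed and a valid token is supplied.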