Tags: r, for-loop, lapply, rnoaa

How do I retain a site id?


I have a data frame with columns id, latitude, and longitude. I need to find nearby meteorological stations and download their data using rnoaa. The first step is to get the station names with meteo_nearby_stations(), then download the data with meteo_pull_monitors().

My question is: how do I retain the site id from df in the results from meteo_pull_monitors()?

desired result can be seen here

library(rnoaa)
id<-c("07227500", "07308500", "07311700")
latitude<-c(35.47033,34.11009,  33.82064)
longitude<-c(101.87963,98.53172,-99.78648)
df<-data.frame(id,latitude,longitude)

met_test<-meteo_nearby_stations(df, lat_colname = "latitude",
      lon_colname = "longitude", station_data = ghcnd_stations(),
      var = c("TMAX","TMIN"), year_min = NULL, year_max = NULL, 
      radius = 200, limit = 3)
met_test_df<-do.call(rbind, lapply(met_test,as.data.frame))
met_id<-as.vector(met_test_df$id)
met_data<-meteo_pull_monitors(met_id, var = c("date","TMAX","TMIN"), date_min = "2020-01-01", date_max = "2020-06-01")

Solution

  • We can join the site_id data to the results of the meteo_nearby_stations() function by pulling the names of each element in the met_test list.

    library(rnoaa)
    id<-c("07227500", "07308500", "07311700")
    latitude<-c(35.47033,34.11009,  33.82064)
    longitude<-c(101.87963,98.53172,-99.78648)
    df<-data.frame(id,latitude,longitude)
    
    met_test<-meteo_nearby_stations(df, lat_colname = "latitude",
                                    lon_colname = "longitude", station_data = ghcnd_stations(),
                                    var = c("TMAX","TMIN"), year_min = NULL, year_max = NULL, 
                                    radius = 200, limit = 3)
    

    Fortunately, each element of met_test is named with the site id from the row of df that produced it in the meteo_nearby_stations() request. We can access these names with the names() function.

    > names(met_test)
    [1] "07227500" "07308500" "07311700"
    > 
    

    To merge the site identifiers, we modify the do.call() call from the original post so that lapply() applies an anonymous function that copies the correct list name into a new column, site_id. Because a function applied to the list itself only receives each element and not its name, we drive lapply() with the index vector 1:length(met_test) and pass met_test as a second argument, y. The index x can then retrieve both the list element, y[[x]], and its name, names(y)[x].

    met_test_df<-do.call(rbind, lapply(1:length(met_test),function(x,y){
         data <- as.data.frame(y[[x]])
         # note individual data frames already have an ID variable
         data$site_id <- names(y)[x]
         data
    },met_test))
    met_test_df
    

    ...and the output:

    > met_test_df
               id             name latitude longitude   distance  site_id
    1 CHM00052955           GUINAN  35.5830  100.7500 102.990626 07227500
    2 CHM00056080            HEZUO  35.0000  102.9000 106.410602 07227500
    3 CHM00052957           TONGDE  35.2700  100.6500 113.695195 07227500
    4 CHM00056033            MADOI  34.9170   98.2170  94.243943 07308500
    5 CHM00056046           DARLAG  33.7500   99.6500 110.669503 07308500
    6 CHM00056029            YUSHU  33.0000   96.9670 190.415441 07308500
    7 USC00419163     TRUSCOTT 3 W  33.7569  -99.8617   9.927467 07311700
    8 USC00411995 COPPER BREAKS SP  34.1122  -99.7430  32.667020 07311700
    9 USC00417572        RHINELAND  33.5333  -99.6500  34.356103 07311700
    > 
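
    As an alternative to indexing, here is a minimal sketch that uses base R's Map() to pair each data frame with its name directly. The list below is a small hand-built stand-in for met_test (three rows copied from the output above), so it runs without calling the NOAA service. It only illustrates the naming technique and is not output from rnoaa.

    ```r
    # Hand-built stand-in for the list returned by meteo_nearby_stations():
    # one data frame of stations per site, named by site id.
    met_test <- list(
      "07227500" = data.frame(id = c("CHM00052955", "CHM00056080"),
                              distance = c(102.990626, 106.410602),
                              stringsAsFactors = FALSE),
      "07311700" = data.frame(id = "USC00419163",
                              distance = 9.927467,
                              stringsAsFactors = FALSE)
    )

    # Map() walks the list and its names in parallel, so no index is needed.
    tagged <- Map(function(stations, site) {
      stations$site_id <- site   # recycled to every row of this data frame
      stations
    }, met_test, names(met_test))

    met_test_df <- do.call(rbind, tagged)
    rownames(met_test_df) <- NULL  # drop the "07227500.1"-style row names
    met_test_df
    ```

    The result has the same shape as the lapply() version: one row per station, with the site id in its own column.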
    

    At this point we can extract the individual monitor data, and merge the site_id numbers by monitor id. First, we extract the monitor data.

    met_id<-as.vector(met_test_df$id)
    met_data<-meteo_pull_monitors(met_id, var = c("date","TMAX","TMIN"), date_min = "2020-01-01", date_max = "2020-06-01")
    

    Then, we merge the site identifier data.

    sites <- met_test_df[,c("id","site_id")]
    mergedData <- merge(met_data,sites)
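
    To see what merge() does here without downloading anything, the following sketch uses small mock data frames. The two CHM00052955 rows are copied from the output below. The station "XX000000001" and its values are made up for illustration, and it is deliberately listed under two sites to show how a shared station behaves. Naming the key with by = "id" is optional here, since id is the only column the two data frames have in common.

    ```r
    # Mock monitor data: two real rows from the post, one made-up station.
    met_data <- data.frame(
      id   = c("CHM00052955", "CHM00052955", "XX000000001"),
      date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01")),
      tmax = c(81, 81, 100),    # last value is hypothetical
      tmin = c(-193, -163, 0),  # last value is hypothetical
      stringsAsFactors = FALSE
    )

    # Station-to-site lookup; the made-up station is near two sites.
    sites <- data.frame(
      id      = c("CHM00052955", "XX000000001", "XX000000001"),
      site_id = c("07227500", "07308500", "07311700"),
      stringsAsFactors = FALSE
    )

    # Join on the station id; each monitor row is repeated once per matching site.
    mergedData <- merge(met_data, sites, by = "id")
    mergedData
    ```

    A station that is near more than one site therefore contributes its rows once per site, which is usually what is wanted when the data are later summarised by site_id.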
    

    Finally, we print the first few rows of the result data frame.

    head(mergedData)
    
               id       date tmax tmin  site_id
    1 CHM00052955 2020-01-01   81 -193 07227500
    2 CHM00052955 2020-01-02   81 -163 07227500
    3 CHM00052955 2020-01-03   54 -155 07227500
    4 CHM00052955 2020-01-04   62 -127 07227500
    5 CHM00052955 2020-01-05   62 -149 07227500
    6 CHM00052955 2020-01-06    3 -216 07227500
    >
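
    Note that tmax and tmin are in GHCN-Daily's native units, tenths of a degree Celsius, so 81 means 8.1 °C. A minimal sketch of the conversion, using two rows copied from the output above in place of the real mergedData (the column names tmax_c and tmin_c are our own choice):

    ```r
    # Two rows copied from the printed result, standing in for mergedData.
    mergedData <- data.frame(
      id      = "CHM00052955",
      date    = as.Date(c("2020-01-01", "2020-01-06")),
      tmax    = c(81, 3),
      tmin    = c(-193, -216),
      site_id = "07227500",
      stringsAsFactors = FALSE
    )

    # Convert tenths of a degree Celsius to degrees Celsius.
    mergedData$tmax_c <- mergedData$tmax / 10
    mergedData$tmin_c <- mergedData$tmin / 10
    mergedData
    ```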