Substitute Average of Previous and Next Available Values of Field for NA Values in Dataframe


A sample of a much larger data set is in the following format:

Station <-c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A")
Parameter <-c(2,3,NA,4,4,9,NA,NA,10,15,NA,NA,NA,18,20)
Par_Count <-c(1,1,1,2,2,1,2,2,1,1,3,3,3,1,1)

df<-data.frame(Station, Parameter, Par_Count)
df
Station  Parameter  Par_Count
   A        2          1
   A        3          1
   A        NA         1
   A        4          2
   A        4          2
   A        9          1
   A        NA         2
   A        NA         2
   A        10         1
   A        15         1
   A        NA         3
   A        NA         3
   A        NA         3
   A        18         1
   A        20         1

I want to fill runs of at most 2 consecutive NAs with the average of the previous and next available values in that column. In the original data set, some runs of NAs are hundreds long, so I want to leave runs of 3 or more consecutive NAs untouched. Par_Count is the number of consecutive occurrences of that particular value in Parameter.

I tried:

library(zoo)
df1 <- within(df, na.approx(df$Parameter, maxgap = 2))

and, for single occurrences only:

df1 <- within(df, Parameter[Parameter == is.na(df$Parameter) & Par_Count == 1] <- lead(Parameter) - lag(Parameter))

but neither worked: no NA value was changed. The desired output is:

Station  Parameter  Par_Count
       A        2          1
       A        3          1
       A        3.5        1
       A        4          2
       A        4          2
       A        9          1
       A        9.5        2
       A        9.75       2  <--here 9.5 will also work
       A        10         1
       A        15         1
       A        NA         3
       A        NA         3
       A        NA         3
       A        18         1
       A        20         1

Solution

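  • You are nearly there. I think you have misinterpreted the use of within. within evaluates the expression and keeps only the variables that are assigned inside it, so calling na.approx without an assignment discards the result. If you would like to use within, you need to assign the output of na.approx to a column of the data frame. The following will work: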

    library(zoo)
    df1 <- within(df, Parameter <- na.approx(Parameter, maxgap = 2, na.rm = FALSE))
    

    Note that na.rm = FALSE is advisable here: with the default na.rm = TRUE, leading or trailing NAs are removed, so the result can be shorter than the column and the assignment fails with an error.
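
    A small sketch of the effect of na.rm, using a made-up vector x (not from the question) that has an NA at each end. It only compares the lengths of the two results:

    ```r
    library(zoo)

    # Hypothetical vector, not from the question: NA at both ends
    x <- c(NA, 1, NA, 3, NA)

    trimmed <- na.approx(x)                 # default na.rm = TRUE
    kept    <- na.approx(x, na.rm = FALSE)  # keep leading/trailing NAs

    length(trimmed)  # 3: the leading and trailing NA were dropped
    length(kept)     # 5: same length as x, so it fits back into a column
    ```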

    Personally, I think the following is more readable, though it is a matter of style.

    library(zoo)
    df1 <- df
    df1$Parameter <- na.approx(df$Parameter, maxgap = 2, na.rm = FALSE)
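
    To check the result against the desired output, here is a self-contained sketch on the question's data. One thing to be aware of: na.approx interpolates linearly, so the run of two NAs between 9 and 10 becomes 9.33 and 9.67 rather than 9.5. If a flat average of the previous and next available values is really wanted, one possible variant (my assumption, not part of the original answer; df2, p, avg and short are illustrative names) combines na.locf in both directions and reuses maxgap to decide which positions may be filled:

    ```r
    library(zoo)

    # Data from the question
    df <- data.frame(
      Station   = rep("A", 15),
      Parameter = c(2, 3, NA, 4, 4, 9, NA, NA, 10, 15, NA, NA, NA, 18, 20),
      Par_Count = c(1, 1, 1, 2, 2, 1, 2, 2, 1, 1, 3, 3, 3, 1, 1)
    )

    # Linear interpolation, as in the answer
    df1 <- df
    df1$Parameter <- na.approx(df$Parameter, maxgap = 2, na.rm = FALSE)

    df1$Parameter[3]      # 3.5, midpoint of 3 and 4
    df1$Parameter[7:8]    # 9.333333 9.666667: linear, not a flat average
    df1$Parameter[11:13]  # NA NA NA: a run of three exceeds maxgap = 2

    # Variant (assumption): flat average of previous and next available value
    p     <- df$Parameter
    avg   <- (na.locf(p, na.rm = FALSE) +
              na.locf(p, fromLast = TRUE, na.rm = FALSE)) / 2
    # Positions that maxgap = 2 allows to be filled (plus all non-NA values)
    short <- !is.na(na.approx(p, maxgap = 2, na.rm = FALSE))

    df2 <- df
    df2$Parameter <- ifelse(short, avg, NA)

    df2$Parameter[7:8]    # 9.5 9.5
    df2$Parameter[11:13]  # NA NA NA
    ```

    For observed values, both na.locf calls return the value itself, so avg leaves them unchanged and only the short NA runs differ from df1.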