I am new to R and I am having some issues with the padr package described here.
I have an hourly data set that is missing some hours, and I would like to insert a row with a placeholder value for each missing hour. I am trying to use the pad and fill_by_value functions from the padr package, but I get an error when I call pad.
The data called Mendo is presented as:
Date.Local Time.Local Sample.Measurement
2016-01-01 00:00:00 3
2016-01-01 00:01:00 4
2016-01-01 00:02:00 1
2016-01-01 00:04:00 4
2016-01-01 00:05:00 5
I want the final data to look like:
Date.Local Time.Local Sample.Measurement
2016-01-01 00:00:00 3
2016-01-01 00:01:00 4
2016-01-01 00:02:00 1
2016-01-01 00:03:00 999
2016-01-01 00:04:00 4
2016-01-01 00:05:00 5
I am under the impression that the padr package wants a datetime POSIXct column, so I use the command
Mendo$Time.Local <- as.POSIXct(paste(Mendo$Date.Local, Mendo$Time.Local), format = '%Y-%m-%d %H:%M')
to get:
Time.Local Sample.Measurement
2016-01-01 00:00:00 3
2016-01-01 00:01:00 4
2016-01-01 00:02:00 1
2016-01-01 00:04:00 4
2016-01-01 00:05:00 5
Now I try to use the pad function as instructed in the link provided above. My line of code is:
Mendo_padded <- Mendo %>% pad
and I get the error:
Error in if (total_invalid == nrow(x)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In if (unique(nchar(x_char)) == 10) { : the condition has length > 1 and only the first element will be used
If this were to work, I would then use the command
Mendo_padded %>% fill_by_value(Sample.Measurement, value = 999)
to set Sample.Measurement to 999 for all the missing hours.
I would love feedback, suggestions or comments on what I may be doing wrong and how I can go about getting this code to work! Thank you!
It seems that pad can automatically detect which column is of Date / POSIXct / POSIXlt type, so you do not need to supply Mendo$Time.Local to pad. The padding will be applied on hour intervals.
library(magrittr)  # provides the %>% pipe
library(padr)
PM10 <- read.csv(file = "../Downloads/hourly_81102_2016.csv",
                 stringsAsFactors = FALSE)  # don't change the columns to factors
Mendo <- PM10[PM10$County.Name == "Mendocino", ]
# combine the date and time columns into a single POSIXct column
Mendo$Time.Local <- as.POSIXct(paste(Mendo$Date.Local, Mendo$Time.Local),
                               format = '%Y-%m-%d %H:%M')
Mendo <- Mendo[, c("Time.Local", "Sample.Measurement")]
# Mendo$Time.Local is not passed to pad(); pad() finds the datetime column itself
Mendo_padded <- Mendo %>% na.omit %>%
  pad(interval = 'hour',
      start_val = NULL, end_val = NULL, group = NULL,
      break_above = 1)
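To see the two steps together on something small, here is a minimal sketch using the five sample rows from the question. The toy data frame and its name are made up for illustration, and because those sample rows are one minute apart the sketch pads with interval = "min" rather than "hour"; tz = "UTC" is also an assumption, added so the result does not depend on the machine.

```r
library(padr)

# Toy data copied from the question's sample; the 00:03 row is missing.
toy <- data.frame(
  Time.Local = as.POSIXct(
    c("2016-01-01 00:00", "2016-01-01 00:01", "2016-01-01 00:02",
      "2016-01-01 00:04", "2016-01-01 00:05"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Sample.Measurement = c(3, 4, 1, 4, 5)
)

# pad() inserts a row for the missing timestamp, with NA in the other columns.
toy_padded <- pad(toy, interval = "min")

# fill_by_value() replaces the NAs in the named column with 999.
toy_filled <- fill_by_value(toy_padded, Sample.Measurement, value = 999)
```

On the real data the same two calls would be chained after na.omit, with interval = 'hour' as above.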
You may also consider using the columns Time.GMT and Date.GMT, because the local date and time may depend on where you (your computer) are.
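A rough sketch of that alternative, assuming Time.GMT is stored as "HH:MM" strings like Time.Local (I have not checked this against the file). The two rows below are invented; only the column names come from the data set. Passing tz = "UTC" explicitly means the parsed values no longer depend on the time zone of the R session.

```r
# Hypothetical rows; only the column names Date.GMT and Time.GMT are from the data.
gmt <- data.frame(
  Date.GMT = c("2016-03-13", "2016-03-13"),
  Time.GMT = c("09:00", "10:00"),
  stringsAsFactors = FALSE
)

# Parse with an explicit time zone instead of the session default.
gmt$Time.GMT <- as.POSIXct(paste(gmt$Date.GMT, gmt$Time.GMT),
                           format = "%Y-%m-%d %H:%M", tz = "UTC")
```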
Edit: As suggested by OP, na.omit should be used before pad to avoid NA values in the Date column.
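If you want to check how many datetimes failed to parse before dropping them, something like the following base R sketch may help. The toy rows, including the deliberately unparseable "bad" entry, are invented for illustration, and tz = "UTC" is an assumption.

```r
# Toy data with one time string that cannot be parsed.
toy <- data.frame(
  Date.Local = c("2016-01-01", "2016-01-01", "2016-01-01"),
  Time.Local = c("00:00", "bad", "02:00"),
  Sample.Measurement = c(3, 4, 1),
  stringsAsFactors = FALSE
)
toy$Time.Local <- as.POSIXct(paste(toy$Date.Local, toy$Time.Local),
                             format = "%Y-%m-%d %H:%M", tz = "UTC")

# Count the rows whose datetime became NA during parsing.
n_bad <- sum(is.na(toy$Time.Local))

# Drop only those rows, leaving the other columns untouched.
toy_clean <- toy[!is.na(toy$Time.Local), ]
```

Unlike na.omit on the whole data frame, this filters on the datetime column alone, so rows that only have a missing measurement are kept.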