Some values are so different from group because there are missing rows and the data is not continuous make my diffVal unusual.
> df
Date diffVal1 diffVal2
1 2017-05-31 04:01:00 718 483
2 2017-05-31 05:01:00 704 477
3 2017-05-31 06:01:00 741 478
4 2017-05-31 07:01:00 874 483
5 2017-05-31 08:01:00 907 495
6 2017-05-31 09:01:00 887 510
7 2017-05-31 10:01:00 2922 514
8 2017-05-31 13:01:00 1012 529
9 2017-05-31 14:01:00 979 539
10 2017-05-31 15:01:00 886 485
11 2017-05-31 16:01:00 818 471
You can see that there are missing rows at Date (hour;11,12) I need to smooth the unusual value to normal.
I'm trying to set unusual value to NULL but the problem is how to know there are unusual value in data frame from big data frame if it is my example data frame I can set the data that is more then 1200 to NA (It is not good idea at all because it is not reasonable.) then use approximated NA values function na.approx()
and I have to get plot of these later.
df$diffVal1 <- ifelse((df$diffVal1>1300), NA,df$diffVal1)
df$diffVal1 <- na.approx(df$diffVal1)
> df
Date diffVal1 diffVal2
1 2017-05-31 04:01:00 718.0 483
2 2017-05-31 05:01:00 704.0 477
3 2017-05-31 06:01:00 741.0 478
4 2017-05-31 07:01:00 874.0 483
5 2017-05-31 08:01:00 907.0 495
6 2017-05-31 09:01:00 887.0 510
7 2017-05-31 10:01:00 949.5 514
8 2017-05-31 13:01:00 1012.0 529
9 2017-05-31 14:01:00 979.0 539
10 2017-05-31 15:01:00 886.0 485
11 2017-05-31 16:01:00 818.0 471
What should do with this problem? And how to add missing rows by Date to approximate from adding again?
Thank you so much for your helps.
Tell me if that works for you:
data preparation :
df <- read.table(text="Date; diffVal1; diffVal2
1; 2017-05-31 04:01:00; 718; 483
2; 2017-05-31 05:01:00; 704; 477
3; 2017-05-31 06:01:00; 741; 478
4; 2017-05-31 07:01:00; 874; 483
5; 2017-05-31 08:01:00; 907; 495
6; 2017-05-31 09:01:00; 887; 510
7; 2017-05-31 10:01:00; 2922; 514
8; 2017-05-31 13:01:00; 1012; 529
9; 2017-05-31 14:01:00; 979; 539
10; 2017-05-31 15:01:00; 886; 485
11; 2017-05-31 16:01:00; 818; 471",sep=";",header=TRUE,stringsAsFactors=FALSE)
df$Date <- as.POSIXct(df$Date)
df$diffVal1 <- as.numeric(df$diffVal1)
df$diffVal2 <- as.numeric(df$diffVal2)
all_dates <- data.frame(Date = seq(min(df$Date),max(df$Date),by=3600))
work and result :
df2 <- df
df2 <- df2[order(df2$Date,decreasing=TRUE),]
df2$Val1_total <- cumsum(df2$diffVal1)
df2 <- merge(df2,all_dates,all.y = TRUE)
df2$Val1_total[is.na(df2$Val1_total)] <- approx(x = df2$Date, y = df2$Val1_total, xout = df2$Date[is.na(df2$Val1_total)])$y
df2$diffVal1 <- c(-diff(df2$Val1_total),tail(df2$diffVal1,1))
# > df2
# Date diffVal1 diffVal2 Val1_total
# 1 2017-05-31 04:01:00 718 483 11448
# 2 2017-05-31 05:01:00 704 477 10730
# 3 2017-05-31 06:01:00 741 478 10026
# 4 2017-05-31 07:01:00 874 483 9285
# 5 2017-05-31 08:01:00 907 495 8411
# 6 2017-05-31 09:01:00 887 510 7504
# 7 2017-05-31 10:01:00 974 514 6617
# 8 2017-05-31 11:01:00 974 NA 5643
# 9 2017-05-31 12:01:00 974 NA 4669
# 10 2017-05-31 13:01:00 1012 529 3695
# 11 2017-05-31 14:01:00 979 539 2683
# 12 2017-05-31 15:01:00 886 485 1704
# 13 2017-05-31 16:01:00 818 471 818