I'm trying to interpolate data by using the spline function (stats R package).
Specifically, I have a dataset like the following one:
DATE Y
01/01/2020
02/01/2020 0.705547512
04/01/2020 0.760723591
06/01/2020 0.014017642
07/01/2020
09/01/2020 0.579518616
10/01/2020
12/01/2020 0.7747401
15/01/2020 0.289562464
19/01/2020
I would like to learn how to interpolate the missing data on the basis of the other values (e.g. the Y values for January 1st, January 7th, ...). The aim is to fill in the missing data; browsing the internet, I found the R spline function, which should do this task.
Can someone help me to compute the interpolated data? Thanks in advance.
So, I tried to implement the following R code in order to interpolate missing data.
SPLINE<- spline(x=df[2],
y=df[1],
method = "natural")$y
The outcome is a numeric vector with 3 records, all of them equal to 10. I don't understand the rationale behind this kind of interpolation: I expected a vector with 10 records, with all observations equal to the original Y variable except for the records corresponding to 2020-01-07, 2020-01-10 and 2020-01-19, which were missing and which the spline function would populate with the selected method.
It's difficult to tell what your problem is because your data is not reproducible. Are those really empty cells in your data frame? A numeric column can't have empty cells - they would have to be NA values. If they look empty when you print the data frame, then it is a character column and must be converted to numeric, or else spline won't work. Also, are those real date objects, or are they just character strings that represent dates? Again, if they are character strings, spline won't work.
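As a quick way to answer those questions, the column classes can be inspected directly. This is only a sketch: df_check is a hypothetical two-row data frame standing in for your real data, with both columns deliberately stored as text.

```r
# Hypothetical stand-in for the real data: both columns stored as text
df_check <- data.frame(DATE = c("01/01/2020", "02/01/2020"),
                       Y = c("", "0.705547512"),
                       stringsAsFactors = FALSE)

# Both columns report "character", so spline cannot use them as they are
col_classes <- sapply(df_check, class)
print(col_classes)

# Converting turns the empty string into NA and the date string into a Date
y_num <- as.numeric(df_check$Y)
d_date <- as.Date(df_check$DATE, format = "%d/%m/%Y")
print(y_num)
print(class(d_date))
```

If the classes already come back as "numeric" and "Date" on your own data frame, no conversion is needed.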
Let's take your example data as given:
df <- read.table(text = "
DATE Y
01/01/2020 ''
02/01/2020 0.705547512
04/01/2020 0.760723591
06/01/2020 0.014017642
07/01/2020 ''
09/01/2020 0.579518616
10/01/2020 ''
12/01/2020 0.7747401
15/01/2020 0.289562464
19/01/2020 ''
", header = TRUE)
Now we convert to the correct formats:
df$DATE <- as.Date(df$DATE, format = '%d/%m/%Y')
df$Y <- as.numeric(df$Y)
Following which, spline works just fine. Let's use it to generate a smooth line consisting of 100 points:
SPLINE <- spline(x = df$DATE, y = df$Y, n = 100, method = 'natural')
plot(df$DATE, df$Y, ylim = c(-0.1, 1))
lines(SPLINE$x, SPLINE$y)
Created on 2023-09-01 with reprex v2.0.2
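If the goal is to fill in only the missing rows rather than draw a smooth curve, one possible approach is to pass the original dates through the xout argument of spline, so that it returns one value per row. The sketch below rebuilds the example data so that it runs on its own; the names obs, fit and Y_filled are introduced here purely for illustration. It drops the NA rows explicitly before fitting and converts the dates to numbers, which is an assumption made to keep the call independent of how a given R version handles missing values and Date objects.

```r
# Rebuild the example data; NA marks the missing Y values
df <- data.frame(
  DATE = as.Date(c("01/01/2020", "02/01/2020", "04/01/2020", "06/01/2020",
                   "07/01/2020", "09/01/2020", "10/01/2020", "12/01/2020",
                   "15/01/2020", "19/01/2020"), format = "%d/%m/%Y"),
  Y = c(NA, 0.705547512, 0.760723591, 0.014017642, NA,
        0.579518616, NA, 0.7747401, 0.289562464, NA)
)

# Rows that actually have an observed value
obs <- !is.na(df$Y)

# Fit on the observed points only and evaluate at every original date
fit <- spline(x = as.numeric(df$DATE[obs]), y = df$Y[obs],
              xout = as.numeric(df$DATE), method = "natural")

# Keep the observed values and use the spline only where Y was missing
df$Y_filled <- ifelse(obs, df$Y, fit$y)
print(df)
```

Note that 01/01/2020 and 19/01/2020 lie outside the range of the observed dates, so their values are extrapolated rather than interpolated and should be treated with caution.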