Given the following dataframe:
df1 <- data.frame(Company = c('A','B','C','D','E'), `X1980` = c(NA, 5, 3, 8, 13),
`X1981` = c(20, NA, 23, 11, 29),
`X1982` = c(NA, 32, NA, 41, 42),
`X1983` = c(45, 47, 53, 58, NA))
I would like to replace the NAs by interpolating along each row, resulting in the following data frame:
Company 1980 1981 1982 1983
A NA 20 32.5 45
B 5 18.5 32 47
C 3 23 38 53
D 8 11 41 58
E 13 29 42 NA
I tried using na.approx in combination with apply:
df1[-1] <- t(apply(df1[-1], 1, FUN = na.approx))
But this results in the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 't': dim(X) must have a positive length
Thanks in advance for any help!!!
EDIT: I forgot to define how to treat the NA's in the na.approx function.
df1[-1] <- t(apply(df1[-1], 1, na.approx, na.rm=FALSE))
This results in the desired output!
As far as I know, these interpolation methods don't deal with NAs at the start or end of a sequence, which is the case for row E in X1983 (and for row A in X1980). For that reason na.approx (at least the one from the zoo package) has an optional argument na.rm that lets you choose whether those NAs are removed or kept.
By default it is TRUE, so they are removed. The affected rows then come back with only 3 values instead of 4, the results no longer have equal lengths, and they cannot be assembled into a matrix, which produces that error. To deal with that, simply set the argument to FALSE:
df1[-1] <- t(apply(df1[-1], 1, FUN = zoo::na.approx, na.rm=FALSE))
Output:
> df1
Company X1980 X1981 X1982 X1983
1 A NA 20.0 32.5 45
2 B 5 18.5 32.0 47
3 C 3 23.0 38.0 53
4 D 8 11.0 41.0 58
5 E 13 29.0 42.0 NA
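To see the length mismatch described above, here is a minimal sketch that compares both settings on the data from the question. It assumes the zoo package is installed; the names res_default, row_lengths and res_keep are only illustrative.

```r
library(zoo)  # assumed to be installed; provides na.approx()

df1 <- data.frame(Company = c('A', 'B', 'C', 'D', 'E'),
                  X1980 = c(NA, 5, 3, 8, 13),
                  X1981 = c(20, NA, 23, 11, 29),
                  X1982 = c(NA, 32, NA, 41, 42),
                  X1983 = c(45, 47, 53, 58, NA))

# Default (na.rm = TRUE): leading/trailing NAs are dropped after interpolation,
# so the per-row results differ in length and apply() returns a list
res_default <- apply(df1[-1], 1, na.approx)
row_lengths <- unname(lengths(res_default))
print(row_lengths)  # 3 4 4 4 3

# na.rm = FALSE: every row keeps all four values, so apply() returns a matrix
# (one column per original row, hence the t() in the answer)
res_keep <- apply(df1[-1], 1, na.approx, na.rm = FALSE)
print(dim(res_keep))  # 4 5
```

Rows A and E are the short ones, because they are the only rows with an NA at the start or end.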
That still leaves the NAs at the ends of rows A and E. I can help you apply the answers from here to fill those too, if you need me to.
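One possible way to fill those end values, assuming it is acceptable to carry the nearest observed value outward rather than extrapolate the trend, is to pass rule = 2, which na.approx hands on to stats::approx. This is only a sketch under that assumption, not necessarily what the linked answers suggest.

```r
library(zoo)  # assumed to be installed; provides na.approx()

df1 <- data.frame(Company = c('A', 'B', 'C', 'D', 'E'),
                  X1980 = c(NA, 5, 3, 8, 13),
                  X1981 = c(20, NA, 23, 11, 29),
                  X1982 = c(NA, 32, NA, 41, 42),
                  X1983 = c(45, 47, 53, 58, NA))

# rule = 2 is forwarded to stats::approx(): positions outside the observed
# range get the closest observed value instead of NA
df1[-1] <- t(apply(df1[-1], 1, na.approx, na.rm = FALSE, rule = 2))

print(df1)
```

With this setting, X1980 for company A becomes 20 (copied from 1981) and X1983 for company E becomes 42 (copied from 1982). Whether that is appropriate depends on the data.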