I have a dataset which looks like
Date temperature sysid appliance_name
01.03.18 13 abcd123 silverstar 320
02.03.18 15 abcd123 silverstar 320
03.03.18 17 abcd123 silverstar 320
04.03.18 15 abcd123 silverstar 320
01.03.18 18 bcfw5678 silverstar erp30i
02.03.18 14 bcfw5678 NA
03.03.18 18 bcfw5678 silverstar erp30i
04.03.18 10 bcfw5678 silverstar erp30i
01.03.18 16 ygft3456 NA
02.03.18 13 ygft3456 silverstar 30i
03.03.18 12 ygft3456 silverstar 30i
04.03.18 18 ygft3456 silverstar 30i
It is clear for me to fill the NAs with the same appliance_name
as the appliance_name
of a record with same sysid
. For example, the record 02.03.18 14 bcfw5678 NA
should have the NA replaced with silverstar erp30i
because from the other records it is clear that this particular sysid
belongs to silverstar erp30i
. I cannot use do.locf
because it is possible that the previous record belongs to another appliance group.
What are the ways to treat NAs in this case?
One way to handle this is by replacing the NA
's by first non-NA value in each group (sysid
).
library(dplyr)
df %>%
group_by(sysid) %>%
mutate(appliance_name = replace(appliance_name, is.na(appliance_name),
appliance_name[which.max(!is.na(appliance_name))]))
# Date temperature sysid appliance_name
# <fct> <int> <fct> <fct>
# 1 01.03.18 13 abcd123 silverstar320
# 2 02.03.18 15 abcd123 silverstar320
# 3 03.03.18 17 abcd123 silverstar320
# 4 04.03.18 15 abcd123 silverstar320
# 5 01.03.18 18 bcfw5678 silverstarerp30i
# 6 02.03.18 14 bcfw5678 silverstarerp30i
# 7 03.03.18 18 bcfw5678 silverstarerp30i
# 8 04.03.18 10 bcfw5678 silverstarerp30i
# 9 01.03.18 16 ygft3456 silverstar30i
#10 02.03.18 13 ygft3456 silverstar30i
#11 03.03.18 12 ygft3456 silverstar30i
#12 04.03.18 18 ygft3456 silverstar30i