I have a data
as follows:
data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
Wt=c(91,92,85,205,285,43,95,75,76,NA),
Ht=c(185,182,173,171,600,650,NA,890,NA,NA))
Wt
represents the weight in kilograms and Ht
represents the height in centimeters. In this example, I want to treat the values of Wt
bigger than 200 as outliers and change to some specific numbers.
Also, I want to treat the values of Ht
bigger than 250 as outliers and change to NA
.
In my actual data
, there are few outliers in Wt
and many outliers in Ht
.
So, I could find the outliers for Wt
by using the code below:
a1<-data$Wt
a1<-data.frame(a1)
a1<-na.omit(a1)
b1<-a1[a1$a1>200, ]
b1 #205,285
I want to change 205 to 80 and change 285 to 90. (Because, in my actual data, there are few outliers for Wt
, so that I can change them individually.)
Also, I want to make the values of Ht
bigger than 250 as NA
. So my expected output is as follows:
data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
Wt=c(91,92,85,80,90,43,95,75,76,NA),
Ht=c(185,182,173,171,NA,NA,NA,NA,NA,NA))
Do it by reference using data.table
:
library(data.table)
setDT(data)
data[Ht > 250, Ht := NA]
data[Wt == 205, Wt := 80]
data[Wt == 285, Wt := 90]
data
id Wt Ht
1: 1 91 185
2: 2 92 182
3: 3 85 173
4: 4 80 171
5: 5 90 NA
6: 6 43 NA
7: 7 95 NA
8: 8 75 NA
9: 9 76 NA
10: 10 NA NA
For more info, see: Introduction to data.table
.
To implement the ifelse
method suggested by Shawn in data.table
, you could do something like the following. This is still updating by reference, so there is still no need to write something like data <- data %>% ...
.
library(data.table)
setDT(data)
data[, `:=`(Ht = fifelse(Ht > 250, NA_real_, Ht),
Wt = fcase(Wt == 205, 80,
Wt == 285, 90,
rep(TRUE, .N), Wt))]
Note that fifelse
is data.table
's fast ifelse
, and I use fcase
to handle the two Wt
conditions together (although the rep(TRUE, .N)
trick is a bit hacky, so maybe it would be nicer to just use two fifelse
calls for Wt
instead of fcase
).
Update 2023-07-17: If you need a vectorised option, I suggest using plyr::mapvalues
. Even though plyr
is retired, you can just use body(plyr::mapvalues)
and do the same thing for it's few dependencies, and hence put the code into your own project.