I have a data frame with 60,000 rows. In one of the columns, the values alternate between "take-off" and "landing", like this:
Movements | Flight number | Destination |
---|---|---|
take-off | 011 | Paris |
landing | 011 | Paris |
take-off | 053 | Ibiza |
landing | 053 | Ibiza |
take-off | 067 | Mans |
take-off | 123 | Geneva |
landing | 123 | Geneva |
But there are some mistakes, like here in row 5, where I have the take-off row but not the landing one.
Is there any way in R or Excel to highlight or select those rows? To be specific: flight numbers are not unique and appear multiple times throughout the data.
Thanks!
I tried this:
data <- read.csv("Flight data.csv")
# Tried to find the positions where values don't alternate correctly
incorrect_positions <- which(data$event[-1] == data$event[-length(data$event)]) + 1
if (length(incorrect_positions) == 0) {
print("The values in the 'event' column alternate correctly.\n")
} else {
print("The values in the 'event' column do not alternate correctly at positions:", incorrect_positions, "\n")
}
It did not give me an error, but it doesn't do what I want: it kept printing that all the values alternate correctly, when I know for a fact that they don't.
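Use as.factor() and then as.integer() to strip off the levels, which leaves the integer codes 1 and 2. When the values alternate, consecutive differences from diff() are always -1 or 1, so after taking abs() anything other than 1 marks a repeated value, and which() identifies the bad ones. The leading 1L pads the result back to the original length, because diff() returns one element fewer than its input.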
> f <- \(x) c(1L, abs(diff(as.integer(as.factor(x)))))
> f(dat$Movements)
[1] 1 1 1 1 1 0 1
> which(f(dat$Movements) != 1)
[1] 6
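Note that position 6 is the second of the two consecutive take-offs, whereas the question describes row 5 as the one missing its partner. If the goal is to flag the unpaired row itself, and since flight numbers repeat, one option is to check each row against its neighbour. The following is only a sketch, under the assumption that a take-off should be immediately followed by a landing with the same flight number; the helper names (next_move, prev_move, unpaired, and so on) are made up for illustration.

```r
# Example data copied from the question's table
dat <- data.frame(
  Movements = c("take-off", "landing", "take-off", "landing",
                "take-off", "take-off", "landing"),
  Flight.number = c(11L, 11L, 53L, 53L, 67L, 123L, 123L),
  Destination = c("Paris", "Paris", "Ibiza", "Ibiza",
                  "Mans", "Geneva", "Geneva")
)

n <- nrow(dat)

# Movement and flight number of the following / preceding row (NA at the edges)
next_move   <- c(dat$Movements[-1], NA)
next_flight <- c(dat$Flight.number[-1], NA)
prev_move   <- c(NA, dat$Movements[-n])
prev_flight <- c(NA, dat$Flight.number[-n])

# A take-off is paired if the next row is a landing of the same flight
paired_takeoff <- dat$Movements == "take-off" & !is.na(next_move) &
  next_move == "landing" & next_flight == dat$Flight.number

# A landing is paired if the previous row is a take-off of the same flight
paired_landing <- dat$Movements == "landing" & !is.na(prev_move) &
  prev_move == "take-off" & prev_flight == dat$Flight.number

# Rows that are neither a paired take-off nor a paired landing
unpaired <- which(!(paired_takeoff | paired_landing))
dat[unpaired, ]
```

Because the comparison also involves the flight number, a take-off followed by the landing of a different flight would be flagged as well, which the purely alternation-based check would miss.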
Data:
> dput(dat)
structure(list(Movements = c("take-off", "landing", "take-off",
"landing", "take-off", "take-off", "landing"), Flight.number = c(11L,
11L, 53L, 53L, 67L, 123L, 123L), Destination = c("Paris", "Paris",
"Ibiza", "Ibiza", "Mans", "Geneva", "Geneva")), class = "data.frame", row.names = c(NA,
-7L))
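As for why the attempt in the question always reported correct alternation: one possible explanation, assuming the column is really named Movements as in the table shown and not event, is that data$event is NULL, so the comparison has nothing to compare. The sketch below reproduces this on the example values and repeats the same comparison on the existing column. It also uses cat() instead of print(), since print() prints a single object while cat() concatenates several.

```r
# Only the relevant column of the example data
dat <- data.frame(
  Movements = c("take-off", "landing", "take-off", "landing",
                "take-off", "take-off", "landing")
)

# A column that does not exist comes back as NULL, without an error
is.null(dat$event)

# Comparing NULL with NULL gives logical(0), so which() finds nothing
length(which(dat$event[-1] == dat$event[-length(dat$event)]))

# Same comparison on the column that does exist
x <- dat$Movements
incorrect_positions <- which(x[-1] == x[-length(x)]) + 1

if (length(incorrect_positions) == 0) {
  cat("The values in the 'Movements' column alternate correctly.\n")
} else {
  cat("The values in the 'Movements' column do not alternate correctly at positions:",
      incorrect_positions, "\n")
}
```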