How to delete multiple rows matching a compound condition more concisely in R


Below is a sample of a large data set from which I want to delete the quadrats (Qm) numbered greater than 3 in parcels (PARCELLE) 1, 3, 4, and 8.

FIELD   SECTOR  PARCELLE    Qm  Total
North   A   1   1   2
North   A   1   2   3
North   A   1   3   0.5
North   A   1   4   0.5
North   A   1   5   1
North   A   1   6   0.5
North   B   2   1   10
North   B   2   2   3
North   B   2   3   4
North   B   2   4   2
North   B   2   5   7
North   B   2   6   25
North   C   3   1   0
North   C   3   2   0
North   C   3   3   2
North   C   3   4   5
North   C   3   5   0.5
North   C   3   6   1
North   D   4   1   0
North   D   4   2   0
North   D   4   3   0
North   D   4   4   0
North   D   4   5   0
North   D   4   6   85
North   E   5   1   0
North   E   5   2   5
North   E   5   3   0.5
North   E   5   4   0
North   E   5   5   0
North   E   5   6   0
North   F   6   1   0.5
North   F   6   2   0.5
North   F   6   3   0.5
North   F   6   4   0
North   F   6   5   0
North   F   6   6   0
North   G   7   1   0.5
North   G   7   2   0.5
North   G   7   3   2
North   G   7   4   2
North   G   7   5   0.5
North   G   7   6   0
North   H   8   1   0.5
North   H   8   2   1
North   H   8   3   60
North   H   8   4   0.5
North   H   8   5   0.5
North   H   8   6   1

I have achieved this manipulation with one statement for each parcel.

New_Data <- Data_Frame[!(Data_Frame$PARCELLE == "1" & Data_Frame$Qm > 3), ]
New_Data <- New_Data[!(New_Data$PARCELLE == "3" & New_Data$Qm > 3), ]
New_Data <- New_Data[!(New_Data$PARCELLE == "4" & New_Data$Qm > 3), ]
New_Data <- New_Data[!(New_Data$PARCELLE == "8" & New_Data$Qm > 3), ]

I want to condense my code but I can't figure out how to specify a condition on the parcel number. I would like my code to resemble something like this:

New_Data <- Data_Frame[!(Data_Frame$PARCELLE == "1 & 3 & 4 & 8" & Data_Frame$Qm > 3), ]

Solution

  • Use the %in% operator:

    Data_Frame[!(Data_Frame$PARCELLE %in% c(1, 3, 4, 8) & Data_Frame$Qm > 3), ]
    

    You can also use the following:

     subset(Data_Frame, !(PARCELLE %in% c(1, 3, 4, 8) & Qm > 3))
    

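    The two differ only in how they handle NA. If the condition evaluates to NA for a row (for example, Qm is NA in one of the listed parcels), bracket indexing returns a row filled with NA in its place, whereas subset() treats the NA as FALSE and drops that row.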
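
To see both approaches and the NA difference side by side, here is a minimal sketch on a made-up toy data frame. The values are hypothetical, not the asker's data, and the helper names parcels, drop, by_bracket and by_subset are only for illustration.

```r
# Hypothetical toy data; two Qm values are NA on purpose.
Data_Frame <- data.frame(
  PARCELLE = c(1, 1, 1, 2, 2, 3, 3, 8),
  Qm       = c(2, 4, NA, 5, NA, 1, 6, 3)
)

# Parcels in which quadrats numbered above 3 should be removed.
parcels <- c(1, 3, 4, 8)

# TRUE for rows to delete; NA when Qm is NA inside a listed parcel
# (TRUE & NA is NA, while FALSE & NA is FALSE).
drop <- Data_Frame$PARCELLE %in% parcels & Data_Frame$Qm > 3

# Bracket indexing: a row whose condition is NA comes back as an all-NA row.
by_bracket <- Data_Frame[!drop, ]

# subset(): a row whose condition is NA is dropped.
by_subset <- subset(Data_Frame, !(PARCELLE %in% parcels & Qm > 3))

print(by_bracket)
print(by_subset)
```

Note that the row with PARCELLE 2 and a missing Qm survives in both results, because its condition is FALSE rather than NA. Only the row with PARCELLE 1 and a missing Qm is treated differently by the two approaches.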