rspatialrgeo-shapefile

Spatial inverse subset using square brackets in R


I have a spatial point data frame -> spatial_points

and a polygon -> spatial_poly

I can subset all points within the polygon using

subset_within <- spatial_points[spatial_poly,]  which is nice and intuitive.

But if I want to subset all points outside the polygon, I can't use

subset_ouside <- spatial_points[-spatial_poly,]

This question has been asked before, and the answer was to use gDifference() from the rgeos package. Fine.

My question is, why does [ ] work for selecting within, not the inverse? I don't really understand the error message

Error in h(simpleError(msg, call)) : error in evaluating the argument 'i' in selecting a method for function '[': invalid argument to unary operator

Just curious. Thanks.

EDIT

Here is an example borrowed from Subset spatial points with a polygon

require(rgeos)
require(sp)

##create spdf
coords=expand.grid(seq(150,151,0.1),seq(-31,-30,0.1))
spdf=data.frame("lng"=coords[,1],"lat"=coords[,2])
coordinates(spdf) = ~lng+lat
proj4string(spdf)<- CRS("+init=epsg:4326")
plot(spdf)

##create poly
poly1 = SpatialPolygons(list(Polygons(list(Polygon(cbind(c(150.45,150.45,150.75,150.75,150.45),c(-30.75,-30.45,-30.45,-30.75,-30.75)))),ID=1)))
proj4string(poly1)<- CRS("+init=epsg:4326")

##get points withing polygon
points_within <-spdf[poly1,]  # this works

plot(spdf)
plot(poly1, add=T)
plot(points_within,col="blue",pch=16,add=T)

##get points outside polygon
points_outside <-spdf[-poly1,]  # this does not work - why??

In this simple example one can use gDifference(), which works in this example. However, my SpatialPointDataframe is very large, and using gDifference crashes R.


Solution

  • When you do df[2, 1] in R, you are actually calling a function. The function is '['(df, 1, 2). It's just the parser hides this from you, and this allows you to write code in a more natural way.

    If you think about it, the [ operator does different things depending on the type of object you are using, even if the operations are conceptually similar. The actual code that returns a subset of a numeric vector is different from the code that returns the subset of a matrix, or of a list. In fact, there are some objects in R for which calling the [ function doesn't make sense and isn't implemented. For example, if you try to call it on a function name:

    print[1]
    #> Error in print[1] : object of type 'closure' is not subsettable
    

    If you create a complex new class in R with various different members, you need to define what the [ operator means, and you need to implement it. What does it mean to subset a SpatialPoints class by a SpatialPolygon class? R has no way to know that on its own, so when the author of the sp package created the SpatialPolygons class, he had to write the methods that do the subsetting based on the operands that are passed to the operator [. You can see the source code here.

    If you trace the logic through, you will see that in the case of spdf[poly1,], the subset is determined by the use of other spatial functions, and boils down to

    which(!is.na(over(spdf, geometry(poly1))))
    #> 39 40 41 50 51 52 61 62 63 
    #> 39 40 41 50 51 52 61 62 63
    

    And these numeric subsets are then used to subset the actual polygons to return a new object consisting of just the subset. That means we could get the points_outside in a similar way:

    points_within  <- spdf[poly1,] 
    points_outside <- spdf[which(is.na(over(spdf, geometry(poly1))))]
    
    plot(spdf)
    plot(poly1, add = TRUE)
    plot(points_within, col="blue", pch = 16, add = TRUE)
    plot(points_outside, col="red", pch = 16, add = TRUE)
    

    enter image description here

    But to answer your main question, which is why does spdf[-poly1,] not work, you have to realise that this actually means '['(spdf, -poly1). For this to be evaluated, it would first be necessary to evaluate -poly1, but if you try that, then you get:

    -poly1
    #> Error in -poly1 : invalid argument to unary operator
    

    Of course, it doesn't really make sense to apply the - operator to a SpatialPoints object on its own. Take the points away from what?

    In truth, it would be possible to write the function so it works in this way, but it would require a complex bit of non-standard evaluation. You could submit it as a feature request on that GitHub page, but personally I'd be happy using the function above.

    I hope that makes things clearer.