rspatial

How do I sample points within multiple polygons loaded from a kml file in R?


I have a kml file that describes four polygons(the kml file is called "core.kml", and I have put a copy at this link). I would like to choose random points within the area defined by the polygons. Ideally I would like to choose points evenly over the whole area, so choose more points from the larger polygons than the small ones, in proportion to their area. I have borrowed the code from answer 1 here, which works beautifully with the North Carolina polygon (at least when I update it by inserting nc <- st_geometry(nc)). But with my polygons, the st_sample step always gives this error:

st_as_s2(): dropping Z and/or M coordinate Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented, :
Loop 0 is not valid: Edge 15 crosses edge 19

I have tried inserting sf_use_s2(FALSE), but I still get the same error. The plot command does work, and the polygons look as expected. My code:

core<-st_read("core.kml")
core<-st_geometry(core) 
sf_use_s2(FALSE) 
core.points<-st_sample(core,5) 
plot(core)

Solution

  • If it's about to ensure that there are points on each polygon, you might like to try something along the lines

     library(sf)
     sf_use_s2(FALSE) 
    #> Spherical geometry (s2) switched off
     core = st_read("/Users/cara/Desktop/core.kml") |> st_geometry() |> st_make_valid() 
    
     core 
    #> Geometry set for 4 features 
    #> Geometry type: GEOMETRY
    #> Dimension:     XY
    #> Bounding box:  xmin: -79.06133 ymin: 33.69862 xmax: -78.90788 ymax: 33.8351
    #> Geodetic CRS:  WGS 84
    #> MULTIPOLYGON (((-79.014 33.80998, -79.00692 33....
    #> POLYGON ((-79.06133 33.82931, -79.05514 33.8306...
    #> POLYGON ((-78.99585 33.78962, -78.99496 33.7901...
    #> POLYGON ((-78.91353 33.70372, -78.91197 33.7052...
     
     ppwa = \(sfc, pts) { 
       # points per weighted area 
       a = st_area(sfc)
       p = proportions(a)
       as.numeric(p * pts) 
     }
     
     # ensure sum(n) == pts, see below
     floor_preserve = \(x) {
       fx = floor(x)
       i = tail(order(x - fx), round(sum(x)) - sum(fx))
       fx[i] = fx[i] + 1
       fx
     }
     
     n = ppwa(core, 25) |> floor_preserve()
     n 
    #> [1] 14  2  6  3
     sum(n)
    #> [1] 25
     
     set.seed(1119)
     p = st_sample(core, n)
    #> although coordinates are longitude/latitude, st_intersects assumes that they
    #> are planar
    #> ... gets repeated ...
    
     plot(core) 
     points(p, pch = 4, col = palette.colors()[7])
    #> Warning in st_centroid.sfc(x, of_largest_polygon = of_largest_polygon):
    #> st_centroid does not give correct centroids for longitude/latitude data
    

    Notice that (1) we haven't corrected issues with core. Please have a look on the warnings. This might be a good discussion to start your research; islands are difficult; (2) the total number of points is not random, as we set it to 25. The number of points per polygon is therefore not random randon: it's 25 multiplied by the relative area of each polygon, i.e. weighted.