Is it possible to scrape the polygon data from this interactive map in R?
🔗 https://fogocruzado.org.br/mapadosgruposarmados
The map shows territories controlled by armed groups across different years. I'd like to automate the extraction of this data for all years and save each year's layer as a shapefile.
They mention an API, but this specific data doesn’t seem to be available through it.
I tried the following, but it gives me something that makes no sense.
# Load the HTML file
library(rvest)

html_file <- read_html(html_path)

# Extract the SVG path data from the Leaflet polygons
path_data <- html_file %>%
  html_nodes("path.leaflet-interactive") %>%
  html_attr("d")

# Clean a path string and extract its coordinate pairs
clean_coordinates <- function(path) {
  path <- gsub("[MLz]", " ", path)     # replace the M (move), L (line), z (close) commands with spaces so numbers don't fuse
  path <- gsub("[^0-9,.-]", " ", path) # drop anything that is not part of a number
  coords <- as.numeric(unlist(strsplit(path, " +"))) # split on runs of spaces, convert to numeric
  coords <- coords[!is.na(coords)]     # drop empty tokens
  # Pair the numbers up as coordinates
  coords_matrix <- matrix(coords, ncol = 2, byrow = TRUE)
  # Switch to [lon, lat] order for GeoJSON
  coords_matrix <- coords_matrix[, c(2, 1)]
  coords_matrix
}
# Apply the cleaning function to every path
coordinates_list <- lapply(path_data, clean_coordinates)

# Build a GeoJSON FeatureCollection
geojson <- list(
  type = "FeatureCollection",
  features = lapply(coordinates_list, function(coord) {
    list(
      type = "Feature",
      geometry = list(
        type = "Polygon",           # or "LineString", depending on the data
        coordinates = list(coord)
      )
    )
  })
)

# Convert the result to JSON (auto_unbox keeps scalars out of arrays)
library(jsonlite)
geojson_json <- toJSON(geojson, pretty = TRUE, auto_unbox = TRUE)
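To show what I mean by "makes no sense": here is a minimal, self-contained version of the parser run on a made-up path string (the path data is hypothetical). The `d` attribute of a Leaflet SVG path holds pixel offsets within the map pane, not latitudes and longitudes, so the numbers that come out look nothing like geographic coordinates.

```r
# Minimal parser, run on a hypothetical "d" string to inspect the output
clean_coordinates <- function(path) {
  path <- gsub("[MLz]", " ", path)     # replace path commands with spaces
  path <- gsub("[^0-9,.-]", " ", path) # keep only number characters
  coords <- as.numeric(unlist(strsplit(path, " +")))
  coords <- coords[!is.na(coords)]     # drop empty tokens
  matrix(coords, ncol = 2, byrow = TRUE)
}

sample_path <- "M331 204L340 210L338 221z"  # made-up example path
clean_coordinates(sample_path)
#>      [,1] [,2]
#> [1,]  331  204
#> [2,]  340  210
#> [3,]  338  221
```

Those are screen pixels, so no amount of lat/lng reordering will turn them into usable coordinates.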
Ahhh, I finally got it :)

The steps:

1. Use selenider to open your URL.
2. Use execute_js_expr to run JavaScript inside the page: grab the map container with document.querySelectorAll('.leaflet-container') and get the Leaflet map object via map = mapContainers[0].htmlwidget_data_init_result.getMap().
3. Filter the map layers with layer.options.group == "2023". There are multiple layers, and not all are important here. Somehow I could only retrieve the group "2023"; you might need to instruct selenider to click on the other layers to retrieve the polygons for the other years.
4. Collect each matching layer into a polygons list and return it as JSON:

   let polygon = {
     coordinates: layer.getLatLngs(),
     fillColor: layer.options.fillColor,
     label: layer.options.label || 'Unknown'
   };
   polygons.push(polygon);
   return JSON.stringify(polygons);

5. Use jsonlite::fromJSON to turn the returned JSON into a data frame.
library(selenider)
library(jsonlite)
session <- selenider_session("selenium", browser = "chrome")
open_url("https://fogocruzado.org.br/mapadosgruposarmados")
map_analysis <- execute_js_expr("
  // retrieve the Leaflet map object from the htmlwidget container
  let mapContainers = document.querySelectorAll('.leaflet-container');
  let map = mapContainers[0].htmlwidget_data_init_result.getMap();

  let layers = map._layers;
  console.log(layers); // inspect the layers in the browser console

  // collect the polygons
  let polygons = [];
  for (let layerId in layers) {
    let layer = layers[layerId];
    // Filter for group '2023' - the other groups did not show up in map._layers
    if (layer.options && layer.options.group == '2023') {
      // polygon layers (L.Polygon or L.Rectangle) expose getLatLngs()
      if (layer.getLatLngs) {
        let polygon = {
          coordinates: layer.getLatLngs(),
          fillColor: layer.options.fillColor,
          label: layer.options.label || 'Unknown',
          popup: layer.options.popup // get the popup data too!
        };
        polygons.push(polygon);
      }
    }
  }
  return JSON.stringify(polygons);
")
polygons_data <- jsonlite::fromJSON(map_analysis)
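Since the question asks for shapefiles, here is a sketch of turning the parsed polygons into an sf object and writing them out. It assumes the sf package is installed and that each `coordinates` entry unnests to a data frame with `lat` and `lng` columns, as in the loop below; the output file name is made up. The sf-dependent part is guarded so the sketch only runs when its inputs exist.

```r
# Sketch: convert parsed polygons to sf and write a shapefile (one per year)
close_ring <- function(ring) {
  # A polygon ring must end where it starts; append the first vertex if needed
  if (!all(ring[1, ] == ring[nrow(ring), ])) ring <- rbind(ring, ring[1, ])
  ring
}

if (requireNamespace("sf", quietly = TRUE) && exists("polygons_data")) {
  library(sf)

  # keep only entries that actually hold a coordinate data frame
  keep <- vapply(polygons_data$coordinates, function(ce) {
    !is.null(ce) && length(ce) > 0 && is.data.frame(ce[[1]]) && nrow(ce[[1]]) > 2
  }, logical(1))

  geoms <- lapply(polygons_data$coordinates[keep], function(ce) {
    ring <- close_ring(as.matrix(ce[[1]][, c("lng", "lat")]))  # sf wants x = lng, y = lat
    st_polygon(list(ring))
  })

  polygons_sf <- st_sf(
    label = polygons_data$label[keep],
    geometry = st_sfc(geoms, crs = 4326)  # Leaflet latlngs are WGS84
  )
  st_write(polygons_sf, "grupos_armados_2023.shp")  # hypothetical file name
}
```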
# now build our own leaflet map using the data
library(leaflet)
m <- leaflet() %>%
addTiles()
for (i in seq_len(nrow(polygons_data))) {
  coord_entry <- polygons_data$coordinates[[i]]
  # Some coordinate entries are NULL, so guard against them
  if (!is.null(coord_entry) && length(coord_entry) > 0 && is.data.frame(coord_entry[[1]])) {
    coords_df <- coord_entry[[1]]
    # Keep only entries with lat/lng columns and at least three vertices
    if (nrow(coords_df) > 2 && all(c("lat", "lng") %in% colnames(coords_df))) {
      m <- m %>% addPolygons(
        lng = coords_df$lng,
        lat = coords_df$lat,
        fillColor = polygons_data$fillColor[i],
        color = "#000000",
        weight = 1,
        opacity = 1,
        fillOpacity = 0.5,
        popup = polygons_data$popup[i]
      )
    }
  }
}
m
Selenider can also perform clicks on elements (besides running arbitrary JavaScript with execute_js_expr), so the layer info for all years can be crawled over and pasted together into one huge data.frame.
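A hedged sketch of that last idea. The CSS selector for the layer-control inputs and the exact year labels are guesses and must be checked against the live page in the browser's inspector; the helper names are mine. The packages are only needed when actually running the scrape, so the libraries are left commented out.

```r
# library(selenider); library(jsonlite)  # needed when actually running this

# Build the extraction script for a given year (group filter parameterised)
polygons_js <- function(year) {
  sprintf("
    let map = document.querySelectorAll('.leaflet-container')[0]
      .htmlwidget_data_init_result.getMap();
    let polygons = [];
    for (let layerId in map._layers) {
      let layer = map._layers[layerId];
      if (layer.options && layer.options.group == '%s' && layer.getLatLngs) {
        polygons.push({
          coordinates: layer.getLatLngs(),
          fillColor: layer.options.fillColor,
          label: layer.options.label || 'Unknown'
        });
      }
    }
    return JSON.stringify(polygons);
  ", year)
}

# Click through the (assumed) layer control, extract each year, bind the results
scrape_all_years <- function(years) {
  session <- selenider::selenider_session("selenium", browser = "chrome")
  selenider::open_url("https://fogocruzado.org.br/mapadosgruposarmados")
  all_years <- lapply(years, function(y) {
    # The selector below is an assumption about how the layer control is built
    selenider::s(paste0("input.leaflet-control-layers-selector[value='", y, "']")) |>
      selenider::elem_click()
    df <- jsonlite::fromJSON(selenider::execute_js_expr(polygons_js(y)))
    df$year <- y
    df
  })
  do.call(rbind, all_years)
}

# all_polygons <- scrape_all_years(as.character(2020:2023))
```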