I want to compare two vegetation maps (.shp) using an alluvial diagram. One vegetation map is from 2010, one from 2023. Mapping units in both 2010 and 2023 are the same. Mapping area's however where not the same, so i intersected both maps.
Now i have the data (after some thorough restructuring of the dataframe) in the following format (random subset of 15 rows, total is in original dataset is 9220):
ID value total Jaar
<int> <chr> <dbl> <dbl>
1 1927 H0000 2.33e- 8 2023
2 1030 H4030 7.64e+ 5 2010
3 2447 H7120 3.65e- 5 2023
4 301 H0000 2.47e- 8 2023
5 611 H0000 2.73e-17 2023
6 4021 H0000 1.17e+ 5 2010
7 1531 H0000 3.11e+ 4 2023
8 759 H0000 2.84e- 4 2010
9 1339 H6230 6.51e- 7 2010
10 2848 H9999 2.23e- 5 2010
11 1740 H4010A 3.17e- 7 2023
12 335 H4030 5.90e- 5 2023
13 4182 H7120 1.47e- 3 2023
14 2676 H0000 3.81e+ 4 2023
15 2828 H9999 2.89e+ 5 2010
ID = unique id to be able to couple the vegetation in a map-polygon in 2010 to the vegetation in that same polygon in 2023. Up to three different vegetation-types can occur per polygon and it is possible that in 2010 only one type occured in a polygon and in 2023 3 types occured in a polygon. This means that there will be for example one 611 ID in 2010, and three 611 ID's in 2023.
value = vegetation type (habitat type)
total = total surface area in square meters
Jaar = year of mapping
i use the following code to make the alluvial diagram:
data_alluv_long %>%
mutate(Jaar = factor(Jaar, levels = c("2010",
"2023")),
value = factor(value, levels = c("H9999",
"H0000",
"H91D0",
"H7110B",
"H4030",
"H7120",
"H4010A",
"H3160",
"H6230",
"H7150",
"H7110A",
"H2320",
"H0410A",
"H0401A"
))) %>%
ggplot(
aes(x = Jaar,
stratum = value,
alluvium = ID,
y = total)) +
geom_alluvium(aes(fill = value),
alpha = 0.7) +
geom_stratum() +
theme_minimal() +
labs(
title = "Veranderingen in Habitattypen",
x = "Jaar",
y = "Oppervlakte"
)
I keep getting this error:
Error in `geom_alluvium()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `setup_data()`:
! Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).
As far as i can find out in the 'help'option, my df is in the right format.
Some values are extremely small (an artefact from the intersection), so i tried omitting these by adding filter(total > 0.001)
but same error occurs.
What i want from the alluvial: two bars, one for 2010 and one for 2023. The fill of the bars must be the vegetation types (habitat types). The flow must show how certain types stayed the same, or how certain times changed during the 13 years.
My question: Where does the error come from and how is my data not in the correct structure?
Link to complete dataset: https://www.dropbox.com/scl/fo/ov0tmhoujvbg4vohsj8td/AGcnP2Cd4Wt3jC4wuIeqYYA?rlkey=aq9lkm6rb0juod4jepkq03vqr&dl=0
The is_lodes_form()
check reveals that you have duplicated ID-axis pairings, which is what I suspected might happen given your earlier description that some polygons have different numbers of vegetation types between years.
Instead of requiring each ID to appear exactly once in each year, we create a unique flow_id that combines the original ID with an index for each vegetation type within that ID.
# First, let's load required libraries
library(tidyverse)
library(ggalluvial)
setwd(dirname(rstudioapi::getSourceEditorContext()$path)) # set the current script's location as working directory
# Read in data from csv
data_alluv_long <- read.csv("alluv_long.csv")
# Ensure each ID appears in both years
data_complete <- data_alluv_long %>%
group_by(ID) %>%
filter(n_distinct(Jaar) == 2) %>%
ungroup()
is_lodes_form(
data_complete,
Jaar,
value,
ID
)
# is wrong so there is something wrong --> Duplicated id-axis pairings. This is your error
# Fix it
# Function to prepare the data
prepare_alluvial_data <- function(data) {
# Step 1: Create a unique identifier for each ID-vegetation combination
data_prepared <- data %>%
group_by(ID, Jaar) %>%
# Create a unique combination identifier
mutate(veg_index = row_number(),
# Create a unique flow identifier
flow_id = paste(ID, veg_index, sep = "_")) %>%
ungroup()
# Step 2: Ensure the data is properly structured for the alluvial diagram
data_prepared <- data_prepared %>%
# Convert Jaar to factor
mutate(Jaar = factor(Jaar),
# Ensure value is a factor with specified levels if needed
value = factor(value))
return(data_prepared)
}
# Using your data:
data_ready <- prepare_alluvial_data(data_complete)
# Create the plot with the modified data
ggplot(data_ready,
aes(x = Jaar,
stratum = value,
alluvium = flow_id, # Using the new flow_id instead of ID
y = total,
fill = value)) +
geom_flow(alpha = 0.7) +
geom_stratum(alpha = 0.8) +
scale_x_discrete(expand = c(0.1, 0.1)) +
theme_minimal() +
labs(
title = "Veranderingen in Habitattypen",
x = "Jaar",
y = "Oppervlakte (m²)"
) +
scale_fill_discrete(name = "Habitattype") +
theme(legend.position = "right")
# Sanity Check
sum_H0000 <- data_ready %>%
filter(value == "H0000") %>%
group_by(Jaar) %>%
summarise(total_area = sum(total)) %>%
arrange(Jaar)
sum_H0000$total_area[2]/sum_H0000$total_area[1]
# Check total area by year
data_alluv_long %>%
group_by(Jaar) %>%
summarise(total_area = sum(total))