I want to visualize a hierarchical clustering of my dataset from the eurostat package (using the Minkowski distance) as a dendrogram. I want the labels shown in this dendrogram:
to display country names, as in this one:
I can only use base R and/or ggplot2 because of the project's requirements.
Use this code to recreate my situation:
install.packages("eurostat")
install.packages("dplyr")
install.packages("ggplot2")
library(eurostat)
library(dplyr)
library(ggplot2)
# Eurostat uses "EL" (not "GR") as the code for Greece
member_states <- c("AT", "BE", "BG", "HR", "CY", "CZ",
                   "DK", "EE", "FI", "FR", "DE", "EL",
                   "HU", "IE", "IT", "LV", "LT", "LU",
                   "MT", "NL", "PL", "PT", "RO", "SK",
                   "SI", "ES", "SE")
hicp <- get_eurostat("prc_hicp_manr", time_format = "date")
# All-items HICP (CP00) for the member states, Feb 2000 to Sep 2022
hicp_filtered <- hicp %>%
  filter(time >= as.Date("2000-02-01") & time <= as.Date("2022-09-01")) %>%
  filter(coicop == "CP00") %>%
  filter(geo %in% member_states) %>%
  mutate(geo = case_when(
    geo == "AT" ~ "Austria",
    geo == "BE" ~ "Belgium",
    geo == "BG" ~ "Bulgaria",
    geo == "HR" ~ "Croatia",
    geo == "CY" ~ "Cyprus",
    geo == "CZ" ~ "Czech Republic",
    geo == "DK" ~ "Denmark",
    geo == "EE" ~ "Estonia",
    geo == "FI" ~ "Finland",
    geo == "FR" ~ "France",
    geo == "DE" ~ "Germany",
    geo == "EL" ~ "Greece",
    geo == "HU" ~ "Hungary",
    geo == "IE" ~ "Ireland",
    geo == "IT" ~ "Italy",
    geo == "LV" ~ "Latvia",
    geo == "LT" ~ "Lithuania",
    geo == "LU" ~ "Luxembourg",
    geo == "MT" ~ "Malta",
    geo == "NL" ~ "Netherlands",
    geo == "PL" ~ "Poland",
    geo == "PT" ~ "Portugal",
    geo == "RO" ~ "Romania",
    geo == "SK" ~ "Slovakia",
    geo == "SI" ~ "Slovenia",
    geo == "ES" ~ "Spain",
    geo == "SE" ~ "Sweden",
    TRUE ~ geo
  ))
# Keep country, date and value (selected by name, so the result does not
# depend on the column order of the downloaded table)
data <- hicp_filtered[, c("geo", "time", "values")]
# Wide format: id, time, then one column per country ("values.Austria", ...)
data_widened <- reshape(transform(data,
                                  id = ave(seq_along(geo), geo, FUN = seq_along)),
                        idvar = c("id", "time"),
                        direction = "wide", timevar = "geo")
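As an aside, base R's `tapply()` can build a country-by-month matrix directly, with the countries as row names, which avoids the `id`/`time` columns and the positional `3:29` indexing later on. A minimal sketch, using a small hypothetical data frame `toy` shaped like `data` (columns `geo`, `time`, `values`); the numbers are made up for illustration:

```r
# Hypothetical long-format data shaped like `data` (geo, time, values);
# the numbers are made up purely for illustration
toy <- data.frame(
  geo    = rep(c("Austria", "Belgium", "Bulgaria"), each = 3),
  time   = rep(as.Date(c("2000-02-01", "2000-03-01", "2000-04-01")), times = 3),
  values = c(1.9, 2.0, 2.1, 2.2, 2.5, 2.4, 9.8, 9.5, 10.1)
)

# Rows = countries, columns = months; each cell holds that month's value.
# The row names are what dist() and hclust() use as leaf labels.
inflation_mat <- tapply(toy$values, list(toy$geo, format(toy$time)), mean)

# Countries are already rows, so no t() is needed here
hc_toy <- hclust(dist(inflation_mat, method = "minkowski", p = 1.5),
                 method = "ward.D2")
hc_toy$labels
```

With the question's data the analogous call would be `tapply(data$values, list(data$geo, format(data$time)), mean)`; any missing country-month combination comes out as `NA`.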
To perform the cluster analysis, I tried this code:
distance_matrix <- dist(data_widened[3:29, ], method = "minkowski", p = 1.5)
hc <- hclust(distance_matrix, method = "ward.D2")
plot(hc)
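For reference, the Minkowski distance with exponent p between two rows x and y is (Σ|x_i − y_i|^p)^(1/p), so p = 1 is the Manhattan distance and p = 2 the Euclidean distance. A small sketch checking this against `dist()` on a made-up 2×3 matrix, which also shows that `dist()` compares rows (observations), not columns:

```r
# Two made-up observations (rows a and b) measured on three variables (columns)
m <- rbind(a = c(1, 4, 2),
           b = c(3, 1, 2))

p <- 1.5
# Minkowski distance computed by hand: (sum |x_i - y_i|^p)^(1/p)
manual <- sum(abs(m["a", ] - m["b", ])^p)^(1 / p)

# dist() computes the same quantity for each pair of rows
via_dist <- as.numeric(dist(m, method = "minkowski", p = p))

all.equal(manual, via_dist)
```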
How can I replace those odd labels with country names, and align the clusters so that my plot looks like the desired one?
Thanks in advance.
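You have the row and column indices the wrong way round: `data_widened[3:29, ]` selects rows 3 to 29 (months) rather than the country columns. You also need to transpose the data, because `dist()` computes distances between rows, so each country has to be a row. Once the countries are rows, their names become the dendrogram labels.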
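A minimal illustration of the difference, using a hypothetical wide table `wide` shaped like `data_widened` (made-up values, three countries instead of 27):

```r
# Hypothetical wide table shaped like data_widened: id, time, then one
# column per country (values are made up for illustration)
wide <- data.frame(
  id      = 1:4,
  time    = as.Date(c("2000-02-01", "2000-03-01", "2000-04-01", "2000-05-01")),
  Austria = c(1.9, 2.0, 2.1, 2.3),
  Belgium = c(2.2, 2.5, 2.4, 2.6),
  Poland  = c(9.8, 9.5, 10.1, 10.3)
)

wrong <- wide[3:4, ]      # rows 3-4: two *months*, plus the id and time columns
right <- t(wide[, 3:5])   # columns 3-5 transposed: one row per *country*

rownames(wrong)           # "3" "4": row positions, not countries
rownames(right)           # "Austria" "Belgium" "Poland"

hc <- hclust(dist(right, method = "minkowski", p = 1.5), method = "ward.D2")
hc$labels                 # the leaf labels that plot(hc) will draw
```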
# Remove "values." from the names of each column
names(data_widened) <- gsub("values\\.", "", names(data_widened))
distance_matrix <- dist(t(data_widened[,3:29]), method = "minkowski", p = 1.5)
hc <- hclust(distance_matrix, method = "ward.D2")
plot(hc)
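On the alignment part of the question: `plot.hclust()` has a `hang` argument, and a negative value makes every label hang down from height 0, so all leaves line up along the bottom. Since ggplot2 is also allowed, the dendrogram can be drawn with `geom_segment()` from `hc$merge`, `hc$height` and `hc$order` alone. This is a sketch under the assumption that the desired plot is a standard top-down dendrogram with aligned leaves; `dendro_segments()` is a hypothetical helper, and the built-in `USArrests` data stands in for the Eurostat table so the example runs offline:

```r
library(ggplot2)

hc <- hclust(dist(USArrests, method = "minkowski", p = 1.5), method = "ward.D2")

# Base graphics: hang = -1 draws every leaf down to height 0
plot(hc, hang = -1, cex = 0.6)

# Hypothetical helper: turn an hclust object into line segments
# (two vertical legs and one horizontal bar per merge step)
dendro_segments <- function(hc) {
  n <- length(hc$order)
  leaf_x <- match(seq_len(n), hc$order)  # x position of each observation
  node_x <- numeric(n - 1)               # x position of each merged cluster
  # (x, y) of a merge child: negative = single observation, positive = earlier merge
  pos <- function(j) if (j < 0) c(leaf_x[-j], 0) else c(node_x[j], hc$height[j])
  segs <- vector("list", n - 1)
  for (k in seq_len(n - 1)) {
    a <- pos(hc$merge[k, 1])
    b <- pos(hc$merge[k, 2])
    h <- hc$height[k]
    node_x[k] <- (a[1] + b[1]) / 2       # parent sits midway between its children
    segs[[k]] <- data.frame(x    = c(a[1], b[1], a[1]),
                            y    = c(a[2], b[2], h),
                            xend = c(a[1], b[1], b[1]),
                            yend = c(h, h, h))
  }
  do.call(rbind, segs)
}

segs <- dendro_segments(hc)
leaves <- data.frame(x = seq_along(hc$order), label = hc$labels[hc$order])

p <- ggplot() +
  geom_segment(data = segs, aes(x = x, y = y, xend = xend, yend = yend)) +
  # Labels start at height 0 and run downwards, so all leaves are aligned
  geom_text(data = leaves, aes(x = x, y = 0, label = label),
            angle = 90, hjust = 1.1, size = 2.5) +
  expand_limits(y = -0.3 * max(hc$height)) +
  labs(x = NULL, y = "Height") +
  theme_minimal()
print(p)
```

For the question's data, replacing `USArrests` with `t(data_widened[, 3:29])` (after the `gsub()` step) should give country names as the leaf labels.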