I have a problem controlling which of the two class labels (1 and 2) gets assigned in a classification task using k-medoids. I'd like to apply `cluster::clara` separately to two areas (`ID` values `g2` and `g3`) and get the same classification labels in both areas. My example:
# Packages
library(cluster)
library(ggplot2)
my_ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/class_areas_ds.csv")
str(my_ds)
# 'data.frame': 194789 obs. of 5 variables:
# $ x : num 426060 426060 426060 426060 426060 ...
# $ y : num 8217410 8217410 8217410 8217410 8217410 ...
# $ ID : chr "g2" "g2" "g2" "g2" ...
# $ R : num 0.455 0.427 0.373 0.463 0.529 ...
# $ HUE: num -0.00397 -0.00384 -0.0028 -0.00369 -0.00352 ...
# Classification based on the `R` and `HUE` variables (columns 4 and 5)
res <- NULL
areas <- unique(my_ds$ID)
for (i in seq_along(areas)) {
  my_ds_split <- my_ds[my_ds$ID == areas[i], ]
  k.medoids.res <- cluster::clara(my_ds_split[, 4:ncol(my_ds_split)], 2, metric = "manhattan")
  my_ds_split.F <- cbind(my_ds_split, class = k.medoids.res$clustering)
  # Recode clara's labels 1/2 as 0/1
  my_ds_split.F$class <- ifelse(my_ds_split.F$class == 1, 0, 1)
  res <- rbind(res, my_ds_split.F)
}
res <- as.data.frame(res)
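Not a fix for the label problem, but relevant to the memory constraint: growing `res` with `rbind()` inside the loop builds a new, larger data frame on every iteration. A common R idiom is to classify each area with `lapply()` over `split()` and bind the pieces once at the end (note that `split()` keeps a copy of all pieces in memory at the same time). The sketch below runs on simulated data standing in for `my_ds` with the same column names; `classify_area()` and `sim_area()` are hypothetical helpers, not part of the original code.

```r
library(cluster)

# Hypothetical helper: classify one area on R and HUE, recoding 1/2 as 0/1
classify_area <- function(area_df) {
  fit <- cluster::clara(area_df[, c("R", "HUE")], 2, metric = "manhattan")
  area_df$class <- ifelse(fit$clustering == 1, 0, 1)
  area_df
}

# Hypothetical simulated stand-in for my_ds: two well-separated groups per area
set.seed(1)
sim_area <- function(id, n = 50) {
  data.frame(x = runif(2 * n), y = runif(2 * n), ID = id,
             R   = c(rnorm(n, 0.2, 0.02), rnorm(n, 0.8, 0.02)),
             HUE = c(rnorm(n, -0.004, 0.0002), rnorm(n, -0.002, 0.0002)))
}
my_ds_sim <- rbind(sim_area("g2"), sim_area("g3"))

# Classify each area separately, then bind all pieces in a single call
pieces <- lapply(split(my_ds_sim, my_ds_sim$ID), classify_area)
res_sim <- do.call(rbind, pieces)
```

This only changes how the results are collected; each area is still clustered independently, so the labels can still be swapped between areas.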
# Plot the results
plots <- list()
for (g in seq_along(areas)) {
  my_ds_split_class <- res[res$ID == areas[g], ]
  plots[[g]] <- ggplot() +
    geom_point(data = my_ds_split_class,
               aes(x = x, y = y, color = class)) +
    theme_void()
}
plots[[1]]
plots[[2]]
In the plots, the classification of area g2 is the opposite of that of g3 (the labels are swapped). Running a single classification on the g2 and g3 data together is not an option, because my original data set has 90 thousand areas and my machine has only 64 GB of RAM.

How can I make the classification labels agree across several areas?
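Besides looking at the plots, the disagreement can be checked numerically: for each area, find which class has the lower mean `R`. If that class differs between areas, the labels are swapped. The sketch below uses a small hypothetical toy table with the same `ID`, `R` and `class` columns as `res`, with the g3 labels deliberately swapped; the values are made up for illustration only.

```r
# Hypothetical toy result: in g3 the labels are swapped on purpose
res_toy <- data.frame(
  ID    = rep(c("g2", "g3"), each = 4),
  R     = c(0.20, 0.22, 0.80, 0.82, 0.21, 0.19, 0.79, 0.81),
  class = c(0, 0, 1, 1, 1, 1, 0, 0)
)

# Mean R of each class within each area
class_means <- aggregate(R ~ ID + class, data = res_toy, FUN = mean)
print(class_means)

# For each area, the class whose mean R is lowest
low_class <- sapply(split(res_toy, res_toy$ID), function(d) {
  m <- tapply(d$R, d$class, mean)
  names(m)[which.min(m)]
})
print(low_class)
```

Here `low_class` is `"0"` for g2 and `"1"` for g3, which is exactly the inconsistency seen in the plots.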
There is a trick to it! Always start the data set with an extreme observation (the one with the highest or the lowest value): add that observation as the first row before the classification and remove it again afterwards. This works very well; in this case I use the row with the lowest value of the variable `R`:
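library(dplyr)
res <- NULL
for (i in seq_along(areas)) {
  my_ds_split <- my_ds[my_ds$ID == areas[i], ]
  # Prepend the row with the lowest R so it is always the first observation
  min.start.value <- my_ds_split %>% slice(which.min(R))
  my_ds_split <- rbind(min.start.value, my_ds_split)
  k.medoids.res <- cluster::clara(my_ds_split[, 4:ncol(my_ds_split)], 2, metric = "manhattan")
  my_ds_split.F <- cbind(my_ds_split, class = k.medoids.res$clustering)
  # Remove the helper row again after the classification
  my_ds_split.F <- my_ds_split.F[-1, ]
  res <- rbind(res, my_ds_split.F)
}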
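An alternative that does not depend on how `clara` happens to number its clusters is to rename the clusters after fitting, using a variable with a fixed meaning across areas. This is a swapped-in technique, not the method above: the sketch assumes that the class with the lower mean `R` should always be 0. `relabel_by_mean()` is a hypothetical helper, and the data are simulated. It clusters the same points in two row orders and checks that the relabelled classes agree.

```r
library(cluster)

# Hypothetical helper: renumber clusters so that the cluster with the
# lowest mean of `values` becomes 0, the next one 1, and so on
relabel_by_mean <- function(clustering, values) {
  m <- tapply(values, clustering, mean)        # mean of `values` per cluster id
  ord <- as.integer(names(m))[order(m)]        # cluster ids from lowest to highest mean
  match(clustering, ord) - 1L                  # position in that order, starting at 0
}

# Simulated area: first n rows have low R, last n rows have high R
set.seed(42)
n <- 50
area <- data.frame(R   = c(rnorm(n, 0.2, 0.02), rnorm(n, 0.8, 0.02)),
                   HUE = c(rnorm(n, -0.004, 0.0002), rnorm(n, -0.002, 0.0002)))

# Cluster the same points in the original and in reversed row order
fit_a <- cluster::clara(area, 2, metric = "manhattan")
perm  <- rev(seq_len(nrow(area)))
fit_b <- cluster::clara(area[perm, ], 2, metric = "manhattan")

class_a <- relabel_by_mean(fit_a$clustering, area$R)
# Relabel in permuted order, then put the labels back in the original row order
class_b <- relabel_by_mean(fit_b$clustering, area$R[perm])[order(perm)]
```

Inside the question's loop, this would take the place of the `ifelse()` recoding, e.g. `my_ds_split.F$class <- relabel_by_mean(k.medoids.res$clustering, my_ds_split$R)`, and no helper row is needed.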