r k-means factoextra

Importing data for k-means clustering


I'm trying to follow this tutorial:

https://uc-r.github.io/kmeans_clustering

library(tidyverse)  # data manipulation
library(cluster)    # clustering algorithms
library(factoextra) # clustering algorithms & visualization

distance <- get_dist(df)
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

Which, as expected, works great.

It may be something really simple: why is there no column name for what is obviously the state field?
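On the missing column name: assuming df comes from R's built-in USArrests data (which is what the linked tutorial appears to use; the snippet above does not show where df is defined), the state names are stored as row names rather than in a column. A quick way to check, using base R only:

```r
# Assumption: `df` in the snippet above was built from the built-in USArrests data.
# The states are stored as row names, so they have no column name.
state_names <- rownames(USArrests)
head(state_names)

# The actual columns are all numeric measurements.
column_names <- colnames(USArrests)
column_names
```

Functions such as dist() and get_dist() carry row names through as labels, which is why the state names end up on the axes of the plot.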

If I try to use this method with a dataset like this:

ipl <- read.csv("https://query.data.world/s/3kadbuzyj25jwe42k6tgij56gscept?dws=00000", header=TRUE, stringsAsFactors=FALSE)
ipl <- na.omit(ipl)

distanceipl <- get_dist(ipl)
fviz_dist(distanceipl, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

Instead of the player names on each axis, I get what I think are the row numbers. How do I get the player names in PLAYER on the axes?

There are two solutions here: either label the visualisation using ggplot2

+ scale_y_discrete(labels = FIELDFORLABELLING)

or pass the player names to the row names:

rownames(dataframe) <- dataframe$FIELDFORLABELLING
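A minimal sketch of the row-names approach, using a small made-up data frame in place of the real ipl table (the toy object, its RUNS and WKTS columns, and the player names are all hypothetical). It assumes the names are unique, since row names must be, and it drops the text column so that only numeric columns go into the distance calculation:

```r
library(factoextra)  # provides get_dist() and fviz_dist()

# Hypothetical toy data standing in for the real `ipl` table
toy <- data.frame(
  PLAYER = c("Player A", "Player B", "Player C", "Player D"),
  RUNS   = c(520, 310, 45, 480),
  WKTS   = c(0, 2, 18, 1),
  stringsAsFactors = FALSE
)

# Move the names into the row names, then drop the text column
rownames(toy) <- toy$PLAYER
toy$PLAYER <- NULL

# Scale the numeric columns so no single column dominates the distances
toy_scaled <- scale(toy)

# The row names travel with the distance object as its labels
d <- get_dist(toy_scaled)
p <- fviz_dist(d, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))
print(p)
```

Because the labels are attached to the distance object itself, they should stay matched to the right rows even though fviz_dist() reorders the objects by default (order = TRUE). A label vector passed positionally to scale_y_discrete() has no such link, so it is worth checking the ordering if you go that route.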

Thanks for the answers!


Solution

  • From the docs:

    fviz_dist(): returns a ggplot2

    So you can add labels the way you would with any ggplot2 object, e.g.:

    fviz_dist(distanceipl, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07")) + scale_y_discrete(labels = ipl$PLAYER)
