
Plotting clusters of nominal data in R


Imagine we have a nominal variable with 7 categories (e.g. religion), and we would like to plot the individuals not along a line, but in clusters whose positions are chosen automatically so that they are nicely laid out. Individuals within a group share the same response, but they should not all end up on one line (which is what happens when the variable is plotted as if it were ordinal).

So to sum it up:

Are there any packages designed for this purpose? What are keywords I need to look for?

Example data:

    religion <- sample(1:7, 100, replace = TRUE)
    # No overlap here, but I would like the grouping to stand out more.
    plot(religion)

Solution

  • After assigning coordinates to the center of each group, you can use wordcloud::textplot to avoid overlapping labels.

    # Data
    n <- 100
    k <- 7
    religion <- sample(1:k, n, TRUE)
    names(religion) <- outer(LETTERS, LETTERS, paste0)[1:n]
    # Position of the groups
    x <- runif(k)
    y <- runif(k)
    # Plot
    library(wordcloud)
    textplot(
      x[religion], y[religion], names(religion), 
      xlim=c(0,1), ylim=c(0,1), axes=FALSE, xlab="", ylab=""
    )
    

    [Figure: wordcloud output]

    Alternatively, you can build a graph with a clique (or a tree) for each group, and use one of the many graph-layout algorithms in igraph.

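    library(igraph)
    # Adjacency matrix: two individuals are linked when they share a group
    A <- outer(religion, religion, `==`)
    # One clique per group (numeric matrix, undirected, no self-loops)
    g <- graph_from_adjacency_matrix(A * 1, mode = "undirected", diag = FALSE)
    plot(g)
    # One tree per group
    plot(mst(g))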
    

    [Figure: igraph output]
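
If neither package is available, the first idea (pick a center for each group, then draw the members around it) can also be sketched in base R. This is only an illustration and not part of the answer above: placing the centers evenly on a circle, the fixed seed, and the `spread` value are all assumptions chosen to keep the groups visually separate.

```r
# Sketch: put k group centers on a circle, then scatter members around them.
set.seed(1)                      # assumption: fixed seed, for reproducibility only
n <- 100
k <- 7
religion <- sample(1:k, n, replace = TRUE)

# Group centers, evenly spaced on the unit circle (assumed layout)
angle <- 2 * pi * (seq_len(k) - 1) / k
cx <- cos(angle)
cy <- sin(angle)

# Each individual is drawn near the center of its own group;
# `spread` is an arbitrary value that controls how tight the clusters are
spread <- 0.12
px <- cx[religion] + rnorm(n, sd = spread)
py <- cy[religion] + rnorm(n, sd = spread)

# Points colored by group, with a label just outside each cluster
plot(px, py, col = religion, pch = 19, asp = 1,
     xlim = c(-1.6, 1.6), ylim = c(-1.6, 1.6),
     axes = FALSE, xlab = "", ylab = "")
text(1.4 * cx, 1.4 * cy, labels = paste("Group", 1:k))
```

Unlike `runif`, the circular placement guarantees that no two centers land close together, at the cost of imposing an arbitrary order around the circle, so the positions still carry no meaning.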