I have a data matrix as a .csv (output from sourmash). The matrix looks something like this: [screenshot: matrix]
I also have metadata that corresponds to that matrix. It groups the samples represented in the matrix in several different ways. It looks something like this: [screenshot: metadata]
I'd like to plot an MDS and color the points based on their metadata values. So far I've been able to load the matrix and plot the points, but I am lost on how to "link" the metadata values to the matrix so that the points are colored by metadata value when they are plotted. I know it's probably a simple fix, but I would appreciate any help! This is what I have so far:
#import matrix and metadata
sm_matrix <- read.csv("path to .csv", header = TRUE, sep = ",")
md <- read.csv("path to .csv", header = TRUE, sep = ",")
#transform for plotting
sm_matrix <- as.matrix(sm_matrix)
#plot
mds <- sm_matrix %>%
  dist() %>%
  cmdscale() %>%
  as_tibble()
colnames(mds) <- c("dim.1", "dim.2")
I've also tried this to plot:
ggscatter(mds, x = "dim.1", y = "dim.2",
          color = md$Location,
          palette = "jco",
          size = 1,
          ellipse = TRUE,
          ellipse.type = "convex",
          repel = TRUE)
but I get this error:
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (92): colour
Run `rlang::last_error()` to see where the error occurred.
Warning message:
In if (color %in% names(data) & is.null(add.params$color)) add.params$color <- color :
the condition has length > 1 and only the first element will be used
Thank you!
Sam
Here is an approach that works. A warning from ggscatter() remains, but a warning is not an error, and it may be an issue with the package.
First, the data are created directly in the script. This is the preferred way, because otherwise readers have to retype the data from the screenshots. It is also good style to name the packages used explicitly.
The script itself uses two tricks. First, the column names are added after calling as_tibble(), using setNames(). Second, the character variable Location is converted to a numeric by first converting it to a factor and then to a numeric. In addition, I increased size to 4 to make the points easier to see.
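As a small illustration of the second trick, here is a minimal sketch using a made-up character vector (an assumption standing in for the real Location column). Each distinct label becomes a factor level, sorted alphabetically, and the numeric conversion returns the integer code of that level:

```r
# Hypothetical labels standing in for md$Location
location <- c("D", "E", "F", "D")

# as.factor() sorts the distinct labels into levels ("D", "E", "F");
# as.numeric() then returns the integer code of each element's level
codes <- as.numeric(as.factor(location))
print(codes)
# [1] 1 2 3 1
```

The numbers only encode group membership; the original labels are no longer visible in them.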
library("dplyr")
library("ggpubr")

# Example data: a 3 x 3 matrix with sample names as row and column names
sm_matrix <- matrix(c(1, 0.2, 0.7, 0.2, 1, 0.2, 0.3, 0.2, 1), nrow = 3)
rownames(sm_matrix) <- colnames(sm_matrix) <- c("sample_1", "sample_2", "sample_3")

# Example metadata: one row per sample
md <- as.data.frame(matrix(c("sample1", "sample2", "sample3", LETTERS[1:9]), nrow = 3))
colnames(md) <- c("SampleID", "Diet", "Location", "Size")

# Classical MDS on the distances between rows, with named coordinate columns
mds <- sm_matrix %>%
  dist() %>%
  cmdscale() %>%
  as_tibble() %>%
  setNames(c("dim.1", "dim.2"))

# Quick check with base graphics
plot(mds)

# Color the points by Location, converted to numeric codes
ggscatter(mds, x = "dim.1", y = "dim.2",
          color = as.numeric(as.factor(md$Location)),
          palette = "jco",
          size = 4,
          ellipse = TRUE,
          ellipse.type = "convex",
          repel = TRUE)
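Another way to make the "link" explicit is to attach the metadata column to the MDS coordinates and then pass the column name to ggscatter(). The following is only a sketch. It assumes that the row names of the matrix match the SampleID column of the metadata, and the toy data (six samples, two locations, random matrix values) are invented for illustration. The lookup uses match(), so the metadata rows may be in a different order than the matrix rows:

```r
library("ggpubr")

# Hypothetical toy data: 6 samples with sample IDs as row and column names
set.seed(1)
ids <- paste0("sample", 1:6)
sm_matrix <- matrix(runif(36), nrow = 6, dimnames = list(ids, ids))

# Hypothetical metadata, deliberately in reverse row order
md <- data.frame(SampleID = rev(ids),
                 Location = rep(c("A", "B"), each = 3),
                 stringsAsFactors = FALSE)

# cmdscale() keeps the row names of the input as row names of the coordinates
coords <- cmdscale(dist(sm_matrix))
mds <- data.frame(SampleID = rownames(coords),
                  dim.1 = coords[, 1],
                  dim.2 = coords[, 2],
                  stringsAsFactors = FALSE)

# Look up each sample's Location by ID instead of relying on row order
mds$Location <- md$Location[match(mds$SampleID, md$SampleID)]

# Pass the column name, so the color vector always has the same length as the data
ggscatter(mds, x = "dim.1", y = "dim.2",
          color = "Location",
          palette = "jco",
          size = 4)
```

Because Location is now a column of the plotted data frame, the legend shows the original labels, and a sample without a metadata entry would show up as NA rather than shifting the colors of the other points.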