I need some help with an error message returned when using the chordDiagram() function from the circlize package.
I am working with fisheries landings. Fishing vessels start their trip in one port (homeport, PORT_DE) and land their catch (scallops in this case) in another port (landing port, PORT_LA). I am trying to draw a chord diagram using the circlize package to visualise the flow of landings between ports. I have 161 unique ports, and the port names are stored as character strings.
Before calling the chordDiagram() function to draw the chord diagram, I store the relevant columns in a dummy object (m).
# Store relevant column
m <- data.frame(PORT_DE   = VMS_by_trips$PORT_DE_Label,
                PORT_LA   = VMS_by_trips$PORT_LA_Label,
                SCALLOP_W = VMS_by_trips$Trip_SCALLOP_W)
head(m)
#     PORT_DE  PORT_LA SCALLOP_W
# 1  Arbroath Arbroath  2.147143
# 2  Eyemouth Aberdeen  8.791970
# 3    Buckie Aberdeen  2.025833
# 4  Montrose Aberdeen  8.268540
# 5  Aberdeen Aberdeen  1.358286
# 6 Peterhead Aberdeen  0.797500
I then create an adjacency matrix using dcast() and rename the rows.
require(reshape2)
m <- as.matrix(dcast(m, PORT_DE ~ PORT_LA, value.var = "SCALLOP_W", fun.aggregate = sum))
dim(m) # adjacency matrix represents port pairs
#[1] 153 138
row.names(m) <- m[,1]
m <- m[,2:dim(m)[2]]
class(m) <- "numeric"
Finally, I call the plot function chordDiagram().
library(circlize)
chordDiagram(m)
Unfortunately, this results in an error message.
Error in `[.data.frame`(df, c(1, 2, 5)) : undefined columns selected
If I replace the row and column names with numbers, the function runs, and the correct plot is returned.
row.names(m) <- 1:153
colnames(m) <- 1:137
Any ideas how to run the function with the actual port names?
I have already tried removing special characters, replacing " " spaces with "_" underscores, shortening the names to fewer characters, and keeping only a few port pairs. Unfortunately, the same error keeps appearing. Any help appreciated.
Please note that since posting this question, I have managed to create the visualisation needed. Here is a link to another related question, which also includes the code to adjust various settings of a chord diagram.
Adjust highlight.sector() width and placement - Chord diagram (circlize package) in R
With thanks to @ZuguangGu, the reason for the error message was the NAs in my port columns, which ended up in the row and column names of the matrix. If you remove them first, then the chord diagram plots just fine. Following the same notation, please see below.
#create adjacency matrix
m <- data.frame(PORT_DE   = VMS_by_trips$PORT_DE_Label,
                PORT_LA   = VMS_by_trips$PORT_LA_Label,
                SCALLOP_W = VMS_by_trips$Trip_SCALLOP_W)
#Check for NA values in your dataset
which(is.na(m[, 1]))
which(is.na(m[, 2]))
# Remove the rows that have NA in either port column; the error then no longer occurs.
df = m
df = df[!(is.na(df[[1]]) | is.na(df[[2]])), ]
require(reshape2)
m <- dcast(df, PORT_DE ~ PORT_LA, value.var = "SCALLOP_W", fun.aggregate = sum)
row.names(m) <- m[,1]
m <- as.matrix(m[, -1])
# remove self-links
m2 = m
cn = intersect(rownames(m2), colnames(m2))
for(i in seq_along(cn)) {
    m2[cn[i], cn[i]] = 0
}
# Export 3 versions of the chord diagram in a PDF
library(circlize)
pdf("test.pdf")
# Use all data
chordDiagram(m)
title("using all data")
#remove self-links
chordDiagram(m2)
title("remove self-links")
# Here reduce = 0.01 removes ports (sectors) whose share is less than 0.01 (1%) of the total across all ports.
chordDiagram(m2, reduce = 0.01)
title("remove self-links and small sectors")
dev.off()
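For anyone who wants to check the NA-removal and self-link steps without the original data, here is a minimal sketch using base R only. The trips data frame and its values are made up for illustration, and tapply() is swapped in for dcast() purely to avoid the reshape2 dependency; it is not the approach used above, just an equivalent way to sum landings per port pair.

```r
# Toy data (hypothetical values) with missing port labels in rows 3 and 4
trips <- data.frame(
  PORT_DE   = c("Arbroath", "Eyemouth", "Buckie", NA),
  PORT_LA   = c("Arbroath", "Aberdeen", NA, "Aberdeen"),
  SCALLOP_W = c(2.1, 8.8, 2.0, 1.4),
  stringsAsFactors = FALSE
)

# Keep only trips where both port labels are present
keep  <- complete.cases(trips[, c("PORT_DE", "PORT_LA")])
clean <- trips[keep, ]

# Sum landings per (homeport, landing port) pair; base-R stand-in for dcast()
adj <- tapply(clean$SCALLOP_W, list(clean$PORT_DE, clean$PORT_LA), sum)
adj[is.na(adj)] <- 0  # port pairs with no trips get a flow of zero

# Sanity check before plotting: no NA labels left in the dimnames
stopifnot(!anyNA(rownames(adj)), !anyNA(colnames(adj)))

# Remove self-links for ports that appear as both homeport and landing port
shared <- intersect(rownames(adj), colnames(adj))
adj_no_self <- adj
adj_no_self[cbind(shared, shared)] <- 0
```

The resulting adj and adj_no_self matrices can be passed to chordDiagram() in the same way as m and m2 above.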