rsankey-diagramriverplot

Riverplot package in R - error in edges column names


I'm trying to use the Riverplot package in R to make a Sankey diagram, but I'm getting an error message about the column names in the edges frame.

I'm installing the readr and riverplot packages and then doing this:

> my_data <- read_csv("~/RProjects/my_data.csv")
>
> edges = rep(my_data, col.names = c("N1","N2","Value"))
>
> nodes = data.frame(ID = unique(c(edges$N1, edges$N2)))
>
> river <- makeRiver(nodes, edges)
>
> return(plot(river))

But on the penultimate command setting up the riverplot object "river" I get this error:

Error in checkedges(edges, nodes$ID)
  edges must have the columns N1, N2 and Value

The original CSV already has these column headings. I'm not sure what I'm doing wrong. I'm a complete newbie to R, so please be patient if I'm missing the obvious!

dput on my CSV file looks like this:

structure(list(N1 = c("Cambridge", "Cambridge", "Cambridge", 
"Cambridge", "Cambridge", "South Cambs", "South Cambs", "South Cambs", 
"South Cambs", "South Cambs", "Rest of East", "Rest of East", 
"Rest of East", "Rest of East", "Rest of East", "Rest of UK", 
"Rest of UK", "Rest of UK", "Rest of UK", "Rest of UK", "Abroad", 
"Abroad", "Abroad", "Abroad", "Abroad"), N2 = c("Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad", "Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad", "Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad", "Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad", "Cambridge", 
"South Cambs", "Rest of East", "Rest of UK", "Abroad"), Value = c(106068L, 
1616L, 2779L, 13500L, 5670L, 2593L, 138263L, 2975L, 4742L, 1641L, 
2555L, 3433L, 0L, 0L, 0L, 6981L, 3802L, 0L, 0L, 0L, 5670L, 1641L, 
0L, 0L, 0L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-25L), .Names = c("N1", "N2", "Value"), spec = structure(list(
    cols = structure(list(N1 = structure(list(), class = c("collector_character", 
    "collector")), N2 = structure(list(), class = c("collector_character", 
    "collector")), Value = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("N1", "N2", "Value")), default = structure(list(), class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"))

str(edges) gives:

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   25 obs. of  3 variables:
 $ N1   : chr  "Cambridge" "Cambridge" "Cambridge" "Cambridge" ...
 $ N2   : chr  "Cambridge" "South Cambs" "Rest of East" "Rest of UK" ...
 $ Value: int  106068 1616 2779 13500 5670 2593 138263 2975 4742 1641 ...
 - attr(*, "spec")=List of 2
  ..$ cols   :List of 3
  .. ..$ N1   : list()
  .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
  .. ..$ N2   : list()
  .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
  .. ..$ Value: list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  ..$ default: list()
  .. ..- attr(*, "class")= chr  "collector_guess" "collector"
  ..- attr(*, "class")= chr "col_spec"

Solution

  • I believe the problem is that you left out the required ID column, and thus confused the command.

    edges = rep(my_data, col.names = c("N1","N2","Value"))
    edges    <- data.frame(edges)
    edges$ID <- 1:25
    
    nodes = data.frame(ID = unique(c(edges$N1, edges$N2)))
    
    river <- makeRiver(nodes, edges) 
    

    The code above eliminates the error message. Note that it raises an unrelated warning, regarding repeated edge information.

    Warning message:
    In checkedges(edges, nodes$ID) :
      duplicated edge information, removing 10 edges