Consider the following data frame:
Country Provinces City Zone
1 Canada Newfondland St Johns A
2 Canada PEI Charlottetown B
3 Canada Nova Scotia Halifax C
4 Canada New Brunswick Fredericton D
5 Canada Quebec NA NA
6 Canada Quebec Quebec City NA
7 Canada Ontario Toronto A
8 Canada Ontario Ottawa B
9 Canada Manitoba Winnipeg C
10 Canada Saskatchewan Regina D
Would there be a clever way to convert it to a treeNetwork-compatible list (from the networkD3 package) in the form of:
CanadaPC <- list(name = "Canada",
                 children = list(
  list(name = "Newfoundland",
       children = list(list(name = "St. John's",
                            children = list(list(name = "A"))))),
  list(name = "PEI",
       children = list(list(name = "Charlottetown",
                            children = list(list(name = "B"))))),
  list(name = "Nova Scotia",
       children = list(list(name = "Halifax",
                            children = list(list(name = "C"))))),
  list(name = "New Brunswick",
       children = list(list(name = "Fredericton",
                            children = list(list(name = "D"))))),
  list(name = "Quebec",
       children = list(list(name = "Quebec City"))),
  list(name = "Ontario",
       children = list(list(name = "Toronto",
                            children = list(list(name = "A"))),
                       list(name = "Ottawa",
                            children = list(list(name = "B"))))),
  list(name = "Manitoba",
       children = list(list(name = "Winnipeg",
                            children = list(list(name = "C"))))),
  list(name = "Saskatchewan",
       children = list(list(name = "Regina",
                            children = list(list(name = "D")))))))
The goal is to plot a Reingold-Tilford tree that can have an arbitrary number of levels.
I have tried several sub-optimal routines, including a messy combination of for loops, but I can't get the data into the desired format.
Ideally, the function would scale to any number of columns, treating the first column as the root (starting point) and each remaining column as a further level of children.
Edit
A similar question was asked on the same topic, and @MrFlick provided an interesting recursive function. The original data frame had a fixed set of levels. I introduced NAs to add another level of complexity (an arbitrary set of levels) that is not addressed in @MrFlick's initial solution.
Data
structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"),
Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
"Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C",
"D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country",
"Provinces", "City", "Zone"))
A better strategy for this scenario may be a recursive split().
Here's such an implementation. First, here's the sample data:
dd<-structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"),
Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
"Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C",
"D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country",
"Provinces", "City", "Zone"))
Note that I've replaced the "NA" strings with true NA values. Now, the function:
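rsplit <- function(x) {
  # Drop rows whose current level (first column) is missing
  x <- x[!is.na(x[, 1]), , drop = FALSE]
  if (nrow(x) == 0) return(NULL)
  # Last column: every remaining value becomes a leaf node
  if (ncol(x) == 1) return(unname(lapply(x[, 1], function(v) list(name = v))))
  # Group the remaining columns by the current level, then recurse
  s <- split(x[, -1, drop = FALSE], x[, 1])
  unname(mapply(function(v, n) {
    if (!is.null(v)) list(name = n, children = v) else list(name = n)
  }, lapply(s, rsplit), names(s), SIMPLIFY = FALSE))
}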
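If your own data still carries literal "NA" strings, as mentioned in the note above, one way to convert them before calling the function is a logical-matrix assignment. This is a minimal sketch; the data frame `raw` is made up for illustration and is not part of the original question.

```r
# Hypothetical input where missing values arrived as the string "NA"
raw <- data.frame(
  Provinces = c("Quebec", "Quebec", "Ontario"),
  City      = c("NA", "Quebec City", "Toronto"),
  Zone      = c("NA", "NA", "A"),
  stringsAsFactors = FALSE
)

# raw == "NA" is a logical matrix; assigning NA through it replaces
# every matching cell with a true missing value
raw[raw == "NA"] <- NA

# The columns stay character, but is.na() now detects the gaps
is.na(raw$City)
```

After this step, the `is.na()` check inside the function can drop the incomplete levels.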
Then we can run
rsplit(dd)
It seems to work with the test data. The only difference is the order of the child nodes: split() coerces the grouping column to a factor, so siblings come out in sorted order rather than in row order.
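If the original row order matters, one option is to pass split() a factor whose levels follow first appearance. The sketch below is a variant of the function above, not part of the original answer; the name `rsplit_ordered` is hypothetical. It also takes the first element of the result, because the function returns a list with one entry per root value, while the desired structure is the single root node itself.

```r
# Sample data from the post (true NA values, not "NA" strings)
dd <- data.frame(
  Country   = rep("Canada", 10),
  Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
                "Quebec", "Quebec", "Ontario", "Ontario", "Manitoba",
                "Saskatchewan"),
  City      = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
                NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"),
  Zone      = c("A", "B", "C", "D", NA, NA, "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of rsplit() that keeps first-appearance order
rsplit_ordered <- function(x) {
  # Drop rows whose current level is missing
  x <- x[!is.na(x[, 1]), , drop = FALSE]
  if (nrow(x) == 0) return(NULL)
  keys <- as.character(x[, 1])
  # Last column: every remaining value becomes a leaf node
  if (ncol(x) == 1) return(lapply(keys, function(v) list(name = v)))
  # Explicit factor levels make split() follow row order, not sorted order
  groups <- split(x[, -1, drop = FALSE], factor(keys, levels = unique(keys)))
  unname(Map(function(children, name) {
    if (is.null(children)) list(name = name)
    else list(name = name, children = children)
  }, lapply(groups, rsplit_ordered), names(groups)))
}

# The data has a single root value, so take the first (only) element
tree <- rsplit_ordered(dd)[[1]]

# Province names, in the order they appear in the data frame
sapply(tree$children, function(node) node$name)
```

`tree` should then have the shape of the CanadaPC list in the question, up to the spelling differences between the data and the hand-written list.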