What would a Deneb (Vega-Lite) specification look like for a scatter chart like this, with rectangles drawn around the dots that should be grouped together?
The following code creates the scatter chart, but I am not sure how to create the rectangles.
{
"data": {
"values": [
{"Average": 6.01,"Groups": "Group1","Index": 6,"Date": "2023-09-16"},
{"Average": 13.21,"Groups": "Group1","Index": 8,"Date": "2023-11-04"},
{"Average": 3.63,"Groups": "Group1","Index": 8,"Date": "2023-12-23"},
{"Average": 5.91,"Groups": "Group1","Index": 7,"Date": "2024-02-10"},
{"Average": 6.19,"Groups": "Group1","Index": 10,"Date": "2024-03-30"},
{"Average": 3.97,"Groups": "Group1","Index": 10,"Date": "2024-05-18"},
{"Average": -1.52,"Groups": "Group1","Index": 10,"Date": "2024-07-06"},
{"Average": 1.28,"Groups": "Group1","Index": 10,"Date": "2024-08-24"},
{"Average": 3.39,"Groups": "Group1","Index": 10,"Date": "2024-10-15"},
{"Average": 1.21,"Groups": "Group1","Index": 23,"Date": "2024-12-03"},
{"Average": -0.13,"Groups": "Group1","Index": 15,"Date": "2025-01-21"},
{"Average": 4.49,"Groups": "Group1","Index": 16,"Date": "2025-03-11"},
{"Average": 34.97,"Groups": "Group2","Index": 6,"Date": "2023-09-16"},
{"Average": 25.14,"Groups": "Group2","Index": 8,"Date": "2023-11-04"},
{"Average": 27.59,"Groups": "Group2","Index": 8,"Date": "2023-12-23"},
{"Average": 27.2,"Groups": "Group2","Index": 7,"Date": "2024-02-10"},
{"Average": 23.91,"Groups": "Group2","Index": 10,"Date": "2024-03-30"},
{"Average": 26.29,"Groups": "Group2","Index": 10,"Date": "2024-05-18"},
{"Average": 26.43,"Groups": "Group2","Index": 10,"Date": "2024-07-06"},
{"Average": 25.21,"Groups": "Group2","Index": 10,"Date": "2024-08-24"},
{"Average": 25.51,"Groups": "Group2","Index": 10,"Date": "2024-10-15"},
{"Average": 38.46,"Groups": "Group2","Index": 23,"Date": "2024-12-03"},
{"Average": 46.44,"Groups": "Group2","Index": 15,"Date": "2025-01-21"},
{"Average": 56.63,"Groups": "Group2","Index": 16,"Date": "2025-03-11"},
{"Average": 17.39,"Groups": "Group3","Index": 6,"Date": "2023-09-16"},
{"Average": 9.15,"Groups": "Group3","Index": 8,"Date": "2023-11-04"},
{"Average": 7.46,"Groups": "Group3","Index": 8,"Date": "2023-12-23"},
{"Average": 6.62,"Groups": "Group3","Index": 7,"Date": "2024-02-10"},
{"Average": 4.15,"Groups": "Group3","Index": 10,"Date": "2024-03-30"},
{"Average": 5.52,"Groups": "Group3","Index": 10,"Date": "2024-05-18"},
{"Average": 6.08,"Groups": "Group3","Index": 10,"Date": "2024-07-06"},
{"Average": 5.54,"Groups": "Group3","Index": 10,"Date": "2024-08-24"},
{"Average": 5.77,"Groups": "Group3","Index": 10,"Date": "2024-10-15"},
{"Average": 5.23,"Groups": "Group3","Index": 23,"Date": "2024-12-03"},
{"Average": 4.83,"Groups": "Group3","Index": 15,"Date": "2025-01-21"},
{"Average": 9.56,"Groups": "Group3","Index": 16,"Date": "2025-03-11"}
]
},
"layer": [
{
"mark": {
"type": "point"
},
"encoding": {
"y": {
"field": "Average",
"type": "quantitative",
"title": null
},
"x": {
"field": "Date",
"type": "ordinal",
"title": null
},
"color": {
"field": "Groups",
"type": "nominal"
},
"shape": {
"field": "Groups"
}
}
},
{
"mark": {
"type": "rule",
"stroke": "black",
"strokeWidth": 1
},
"encoding": {
"x": {
"field": "MinDate",
"type": "ordinal"
}
},
"transform": [
{
"aggregate": [
{
"op": "min",
"field": "Date",
"as": "MinDate"
}
],
"groupby": ["Index"]
},
{
"calculate": "datum.MinDate - 1",
"as": "MinDate"
}
]
},
{
"mark": {
"type": "rule",
"stroke": "black",
"strokeWidth": 1
},
"encoding": {
"x": {
"field": "MaxDate",
"type": "ordinal"
}
},
"transform": [
{
"aggregate": [
{
"op": "max",
"field": "Date",
"as": "MaxDate"
}
],
"groupby": ["Index"]
},
{
"calculate": "datum.MaxDate + 1",
"as": "MaxDate"
}
]
}
]
}
This code returns the following plot. The dates on the x-axis do not look good, and the axis shows a NaN.
Is it possible to remove the x-axis labels that come from the vertical lines and keep only the ones from the dots?
Desired output created in R. The rectangle is automatically created using the geom_mark_rect function from the ggforce R package. It is based on the Index column.
I can create this plot in Power BI using R visual, but it is slow.
# library() alone is enough; the extra require() calls were redundant
library(ggplot2)
library(ggforce)
gt_shape <- 1:length(unique(dataset$Groups))
gt_ymax <- ceiling(max(dataset$Average) + (0.15 * max(dataset$Average)))
gt_ymin <- floor(min(dataset$Average) - (0.15 * min(dataset$Average)))
ggplot(dataset, aes(x = Date, y = Average, group = as.character(Index))) +
  geom_point(aes(shape = Groups, color = Groups, fill = Groups), stroke = 1, na.rm = TRUE) +
  geom_mark_rect(show.legend = FALSE) +
  theme_minimal() +
  ylim(gt_ymin, gt_ymax) +
  labs(
    y = "",
    x = "Date"
  ) +
  scale_shape_manual(values = gt_shape) +
  scale_colour_viridis_d(option = "turbo") +
  scale_fill_viridis_d(option = "turbo", guide = "none") +
  theme(
    legend.position = "bottom",
    text = element_text(colour = "black"),
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
  )
Thank you.
It looks like you're using the [Index] field in your dataset to group the rectangles, so you can use an aggregate transform to consolidate the min and max values of your x and y encoding fields.
Here's my attempt:
We can then assign the aggregate values to the x/y (and x2/y2) encoding channels as needed. I've modified the intended layer as follows:
{
  "transform": [
    {
      "aggregate": [
        {"op": "min", "field": "Date", "as": "min_date"},
        {"op": "max", "field": "Date", "as": "max_date"},
        {"op": "min", "field": "Average", "as": "min_measure"},
        {"op": "max", "field": "Average", "as": "max_measure"}
      ],
      "groupby": ["Index"]
    }
  ],
  "mark": {
    "type": "rect",
    "stroke": "black",
    "strokeWidth": 1,
    "fill": "transparent",
    "cornerRadius": 5,
    "yOffset": 10,
    "y2Offset": -10
  },
  "encoding": {
    "x": {"field": "min_date", "bandPosition": 0.25},
    "x2": {"field": "max_date", "bandPosition": 0.75},
    "y": {"field": "min_measure"},
    "y2": {"field": "max_measure"}
  }
}
The x and x2 encodings use opposite bandPosition values to shift them so that they aren't plotted in exactly the same position as the points. Otherwise, for groups with a single value, these looked like rule marks.
Similarly, the yOffset and y2Offset values ensure that the plotted y-positions don't overlap your point marks.
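To make the aggregate step concrete, here is a sketch of the rows it should produce from the sample data in the question, one per Index. It assumes Date stays as the ISO-formatted strings shown there, so that min and max compare them as text; if your dataset supplies real date values, the date columns may look different.

```json
[
  {"Index": 6,  "min_date": "2023-09-16", "max_date": "2023-09-16", "min_measure": 6.01,  "max_measure": 34.97},
  {"Index": 8,  "min_date": "2023-11-04", "max_date": "2023-12-23", "min_measure": 3.63,  "max_measure": 27.59},
  {"Index": 7,  "min_date": "2024-02-10", "max_date": "2024-02-10", "min_measure": 5.91,  "max_measure": 27.2},
  {"Index": 10, "min_date": "2024-03-30", "max_date": "2024-10-15", "min_measure": -1.52, "max_measure": 26.43},
  {"Index": 23, "min_date": "2024-12-03", "max_date": "2024-12-03", "min_measure": 1.21,  "max_measure": 38.46},
  {"Index": 15, "min_date": "2025-01-21", "max_date": "2025-01-21", "min_measure": -0.13, "max_measure": 46.44},
  {"Index": 16, "min_date": "2025-03-11", "max_date": "2025-03-11", "min_measure": 4.49,  "max_measure": 56.63}
]
```

Every Index except 8 and 10 covers a single date, so min_date equals max_date for those rows. That is why the rectangle needs some horizontal shift to be visible as a box.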
Note that I have moved some of the encodings in your first layer to the top level so that both layers can inherit them, so you may want to check/review those, too.
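Roughly, the overall structure would look something like the sketch below. Treat it as an outline under assumptions rather than the exact spec from the link: the `dataset` name is the usual Deneb data binding (swap in the inline `values` if you test in the Vega Editor), the `description` text is only there to flag the assumptions, and the rect layer is assumed to pick up the `ordinal` and `quantitative` types from the shared top-level x and y definitions.

```json
{
  "description": "Sketch only: assumes a Deneb dataset named 'dataset' and that layer field definitions inherit 'type' from the top-level encoding.",
  "data": {"name": "dataset"},
  "encoding": {
    "x": {"field": "Date", "type": "ordinal", "title": null},
    "y": {"field": "Average", "type": "quantitative", "title": null}
  },
  "layer": [
    {
      "mark": {"type": "point"},
      "encoding": {
        "color": {"field": "Groups", "type": "nominal"},
        "shape": {"field": "Groups", "type": "nominal"}
      }
    },
    {
      "transform": [
        {
          "aggregate": [
            {"op": "min", "field": "Date", "as": "min_date"},
            {"op": "max", "field": "Date", "as": "max_date"},
            {"op": "min", "field": "Average", "as": "min_measure"},
            {"op": "max", "field": "Average", "as": "max_measure"}
          ],
          "groupby": ["Index"]
        }
      ],
      "mark": {
        "type": "rect",
        "stroke": "black",
        "strokeWidth": 1,
        "fill": "transparent",
        "cornerRadius": 5,
        "yOffset": 10,
        "y2Offset": -10
      },
      "encoding": {
        "x": {"field": "min_date", "bandPosition": 0.25},
        "x2": {"field": "max_date", "bandPosition": 0.75},
        "y": {"field": "min_measure"},
        "y2": {"field": "max_measure"}
      }
    }
  ]
}
```

Keeping color and shape on the point layer only means the rectangles are not split or coloured by Groups, while both layers still share the same x and y scales.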
Here's the working version in Vega Editor for you to explore.
EDIT: I realised after providing this answer that it might just make sense to use offset for all channels and avoid the bandPosition approach altogether. The layer would now look like this:
{
  "transform": [
    {
      "aggregate": [
        {"op": "min", "field": "Date", "as": "min_date"},
        {"op": "max", "field": "Date", "as": "max_date"},
        {"op": "min", "field": "Average", "as": "min_measure"},
        {"op": "max", "field": "Average", "as": "max_measure"}
      ],
      "groupby": ["Index"]
    }
  ],
  "mark": {
    "type": "rect",
    "stroke": "black",
    "strokeWidth": 1,
    "fill": "transparent",
    "cornerRadius": 5,
    "xOffset": -10,
    "x2Offset": 10,
    "yOffset": 10,
    "y2Offset": -10
  },
  "encoding": {
    "x": {"field": "min_date"},
    "x2": {"field": "max_date"},
    "y": {"field": "min_measure"},
    "y2": {"field": "max_measure"}
  }
}