powerbi powerbi-desktop vega-lite deneb

Add rectangles around dots in a scatter plot using Deneb in Power BI


What would a Deneb (Vega-Lite) specification look like for a scatter chart like this, with rectangles drawn around the dots that should be grouped together?

The following code creates the scatter chart, but I am not sure how to create the rectangles.

{
  "data": {
    "values": [
      {"Average": 6.01,"Groups": "Group1","Index": 6,"Date": "2023-09-16"},
      {"Average": 13.21,"Groups": "Group1","Index": 8,"Date": "2023-11-04"},
      {"Average": 3.63,"Groups": "Group1","Index": 8,"Date": "2023-12-23"},
      {"Average": 5.91,"Groups": "Group1","Index": 7,"Date": "2024-02-10"},
      {"Average": 6.19,"Groups": "Group1","Index": 10,"Date": "2024-03-30"},
      {"Average": 3.97,"Groups": "Group1","Index": 10,"Date": "2024-05-18"},
      {"Average": -1.52,"Groups": "Group1","Index": 10,"Date": "2024-07-06"},
      {"Average": 1.28,"Groups": "Group1","Index": 10,"Date": "2024-08-24"},
      {"Average": 3.39,"Groups": "Group1","Index": 10,"Date": "2024-10-15"},
      {"Average": 1.21,"Groups": "Group1","Index": 23,"Date": "2024-12-03"},
      {"Average": -0.13,"Groups": "Group1","Index": 15,"Date": "2025-01-21"},
      {"Average": 4.49,"Groups": "Group1","Index": 16,"Date": "2025-03-11"},
      {"Average": 34.97,"Groups": "Group2","Index": 6,"Date": "2023-09-16"},
      {"Average": 25.14,"Groups": "Group2","Index": 8,"Date": "2023-11-04"},
      {"Average": 27.59,"Groups": "Group2","Index": 8,"Date": "2023-12-23"},
      {"Average": 27.2,"Groups": "Group2","Index": 7,"Date": "2024-02-10"},
      {"Average": 23.91,"Groups": "Group2","Index": 10,"Date": "2024-03-30"},
      {"Average": 26.29,"Groups": "Group2","Index": 10,"Date": "2024-05-18"},
      {"Average": 26.43,"Groups": "Group2","Index": 10,"Date": "2024-07-06"},
      {"Average": 25.21,"Groups": "Group2","Index": 10,"Date": "2024-08-24"},
      {"Average": 25.51,"Groups": "Group2","Index": 10,"Date": "2024-10-15"},
      {"Average": 38.46,"Groups": "Group2","Index": 23,"Date": "2024-12-03"},
      {"Average": 46.44,"Groups": "Group2","Index": 15,"Date": "2025-01-21"},
      {"Average": 56.63,"Groups": "Group2","Index": 16,"Date": "2025-03-11"},
      {"Average": 17.39,"Groups": "Group3","Index": 6,"Date": "2023-09-16"},
      {"Average": 9.15,"Groups": "Group3","Index": 8,"Date": "2023-11-04"},
      {"Average": 7.46,"Groups": "Group3","Index": 8,"Date": "2023-12-23"},
      {"Average": 6.62,"Groups": "Group3","Index": 7,"Date": "2024-02-10"},
      {"Average": 4.15,"Groups": "Group3","Index": 10,"Date": "2024-03-30"},
      {"Average": 5.52,"Groups": "Group3","Index": 10,"Date": "2024-05-18"},
      {"Average": 6.08,"Groups": "Group3","Index": 10,"Date": "2024-07-06"},
      {"Average": 5.54,"Groups": "Group3","Index": 10,"Date": "2024-08-24"},
      {"Average": 5.77,"Groups": "Group3","Index": 10,"Date": "2024-10-15"},
      {"Average": 5.23,"Groups": "Group3","Index": 23,"Date": "2024-12-03"},
      {"Average": 4.83,"Groups": "Group3","Index": 15,"Date": "2025-01-21"},
      {"Average": 9.56,"Groups": "Group3","Index": 16,"Date": "2025-03-11"}
    ]
  },
  "layer": [
    {
      "mark": {
        "type": "point"
      },
      "encoding": {
        "y": {
          "field": "Average",
          "type": "quantitative",
          "title": null
        },
        "x": {
          "field": "Date",
          "type": "ordinal",
          "title": null
        },
        "color": {
          "field": "Groups",
          "type": "nominal"
        },
        "shape": {
          "field": "Groups"
        }
      }
    },
    {
      "mark": {
        "type": "rule",
        "stroke": "black",
        "strokeWidth": 1
      },
      "encoding": {
        "x": {
          "field": "MinDate",
          "type": "ordinal"
        }
      },
      "transform": [
        {
          "aggregate": [
            {
              "op": "min",
              "field": "Date",
              "as": "MinDate"
            }
          ],
          "groupby": ["Index"]
        },
        {
          "calculate": "datum.MinDate - 1",
          "as": "MinDate"
        }
      ]
    },
    {
      "mark": {
        "type": "rule",
        "stroke": "black",
        "strokeWidth": 1
      },
      "encoding": {
        "x": {
          "field": "MaxDate",
          "type": "ordinal"
        }
      },
      "transform": [
        {
          "aggregate": [
            {
              "op": "max",
              "field": "Date",
              "as": "MaxDate"
            }
          ],
          "groupby": ["Index"]
        },
        {
          "calculate": "datum.MaxDate + 1",
          "as": "MaxDate"
        }
      ]
    }
  ]
}

This code returns the following plot. The dates on the x-axis do not look good, and the axis also shows a NaN.

Is it possible to remove the x-axis labels that come from the vertical lines and keep only the ones from the dots?

[Image: plot produced by the spec above]

Desired output created in R. The rectangle is automatically created using the geom_mark_rect function from the ggforce R package. It is based on the Index column.

[Image: desired output created in R]

I can create this plot in Power BI using R visual, but it is slow.

# Load the plotting packages (library() alone is enough; it errors if a package is missing)
library(ggplot2)
library(ggforce)

gt_shape <- 1:length(unique(dataset$Groups))
gt_ymax  <- ceiling(max(dataset$Average) + (0.15 * max(dataset$Average)))
gt_ymin  <- floor(min(dataset$Average) - (0.15 * min(dataset$Average)))

ggplot(dataset, aes(x = Date, y = Average, group = as.character(Index))) +
    geom_point(aes(shape = Groups, color = Groups, fill = Groups), stroke = 1, na.rm = TRUE) +
    geom_mark_rect(show.legend = FALSE) +
    theme_minimal() +
    ylim(gt_ymin, gt_ymax) +
    labs(
        y = "",
        x ="Date"
    ) +
    scale_shape_manual(values = gt_shape) +
    scale_colour_viridis_d(option = "turbo") +
    scale_fill_viridis_d(option = "turbo", guide = "none") +
    theme(
        legend.position = "bottom",
        text = element_text(colour = "black"),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
    )

Thank you.


Solution

  • It looks like you're using the [Index] field in your dataset to group the rectangles, so you can use an aggregate transform, grouped by Index, to compute the min and max of the fields behind your x and y encodings (Date and Average).

    Here's my attempt:

    [Image: replication of the original post's request using Vega-Lite]

    We can then assign the aggregated values to the x/y (and x2/y2) encoding channels as needed. I've modified the relevant layer as follows:

        {
          "transform": [
            {
              "aggregate": [
                {"op": "min", "field": "Date", "as": "min_date"},
                {"op": "max", "field": "Date", "as": "max_date"},
                {"op": "min", "field": "Average", "as": "min_measure"},
                {"op": "max", "field": "Average", "as": "max_measure"}
              ],
              "groupby": ["Index"]
            }
          ],
          "mark": {
            "type": "rect",
            "stroke": "black",
            "strokeWidth": 1,
            "fill": "transparent",
            "cornerRadius": 5,
            "yOffset": 10,
            "y2Offset": -10
          },
          "encoding": {
            "x": {"field": "min_date", "bandPosition": 0.25},
            "x2": {"field": "max_date", "bandPosition": 0.75},
            "y": {"field": "min_measure"},
            "y2": {"field": "max_measure"}
          }
        }
    

    The x and x2 encodings use different bandPosition values (0.25 and 0.75) so that the rectangle's left and right edges are shifted away from the exact position of the points. Without this, a group with only a single date would have x and x2 at the same position and would look like a rule mark.

    Similarly, the yOffset and y2Offset values ensure that the plotted y-positions don't overlap your point marks.

    Note that I have moved some of the encodings in your first layer to the top level so that both layers can inherit them, so you may want to check/review those, too.
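
    To make the inheritance concrete, here is a minimal sketch of how the full spec might be arranged. It is an assumption about the layout rather than a copy of the linked spec: x and y sit at the top level, color and shape stay in the point layer (the aggregated rows no longer carry a Groups field), and the data is a six-row subset of your sample. The rect layer only overrides field, so it should pick up type and title from the top-level channel definitions.

    ```json
    {
      "data": {
        "values": [
          {"Average": 6.01, "Groups": "Group1", "Index": 6, "Date": "2023-09-16"},
          {"Average": 13.21, "Groups": "Group1", "Index": 8, "Date": "2023-11-04"},
          {"Average": 3.63, "Groups": "Group1", "Index": 8, "Date": "2023-12-23"},
          {"Average": 34.97, "Groups": "Group2", "Index": 6, "Date": "2023-09-16"},
          {"Average": 25.14, "Groups": "Group2", "Index": 8, "Date": "2023-11-04"},
          {"Average": 27.59, "Groups": "Group2", "Index": 8, "Date": "2023-12-23"}
        ]
      },
      "encoding": {
        "x": {"field": "Date", "type": "ordinal", "title": null},
        "y": {"field": "Average", "type": "quantitative", "title": null}
      },
      "layer": [
        {
          "mark": {"type": "point"},
          "encoding": {
            "color": {"field": "Groups", "type": "nominal"},
            "shape": {"field": "Groups", "type": "nominal"}
          }
        },
        {
          "transform": [
            {
              "aggregate": [
                {"op": "min", "field": "Date", "as": "min_date"},
                {"op": "max", "field": "Date", "as": "max_date"},
                {"op": "min", "field": "Average", "as": "min_measure"},
                {"op": "max", "field": "Average", "as": "max_measure"}
              ],
              "groupby": ["Index"]
            }
          ],
          "mark": {
            "type": "rect",
            "stroke": "black",
            "strokeWidth": 1,
            "fill": "transparent",
            "cornerRadius": 5,
            "yOffset": 10,
            "y2Offset": -10
          },
          "encoding": {
            "x": {"field": "min_date", "bandPosition": 0.25},
            "x2": {"field": "max_date", "bandPosition": 0.75},
            "y": {"field": "min_measure"},
            "y2": {"field": "max_measure"}
          }
        }
      ]
    }
    ```

    With this subset, pasting the spec into the Vega Editor should draw one rectangle per Index value (6 and 8). Keeping color and shape out of the top level is the design choice that matters here: if the rect layer inherited them, it would look for a Groups field that the aggregate has already dropped.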

    Here's the working version in Vega Editor for you to explore.


    EDIT: After posting this answer, I realised that it might be simpler to use pixel offsets for all four channels and avoid the bandPosition approach altogether. The layer would now look like this:

        {
          "transform": [
            {
              "aggregate": [
                {"op": "min", "field": "Date", "as": "min_date"},
                {"op": "max", "field": "Date", "as": "max_date"},
                {"op": "min", "field": "Average", "as": "min_measure"},
                {"op": "max", "field": "Average", "as": "max_measure"}
              ],
              "groupby": ["Index"]
            }
          ],
          "mark": {
            "type": "rect",
            "stroke": "black",
            "strokeWidth": 1,
            "fill": "transparent",
            "cornerRadius": 5,
            "xOffset": -10,
            "x2Offset": 10,
            "yOffset": 10,
            "y2Offset": -10
          },
          "encoding": {
            "x": {"field": "min_date"},
            "x2": {"field": "max_date"},
            "y": {"field": "min_measure"},
            "y2": {"field": "max_measure"}
          }
        }
    

    Updated spec in the editor
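
    For Deneb itself, the inline data block would normally be swapped for the visual's own dataset, which Deneb exposes under the name dataset. A sketch of the offset version adapted that way, assuming the fields added to the visual are named Date, Average, Groups and Index as in the sample, and that Date arrives as text in the same yyyy-mm-dd form (a true date column may need extra formatting on the x axis):

    ```json
    {
      "data": {"name": "dataset"},
      "encoding": {
        "x": {"field": "Date", "type": "ordinal", "title": null},
        "y": {"field": "Average", "type": "quantitative", "title": null}
      },
      "layer": [
        {
          "mark": {"type": "point"},
          "encoding": {
            "color": {"field": "Groups", "type": "nominal"},
            "shape": {"field": "Groups", "type": "nominal"}
          }
        },
        {
          "transform": [
            {
              "aggregate": [
                {"op": "min", "field": "Date", "as": "min_date"},
                {"op": "max", "field": "Date", "as": "max_date"},
                {"op": "min", "field": "Average", "as": "min_measure"},
                {"op": "max", "field": "Average", "as": "max_measure"}
              ],
              "groupby": ["Index"]
            }
          ],
          "mark": {
            "type": "rect",
            "stroke": "black",
            "strokeWidth": 1,
            "fill": "transparent",
            "cornerRadius": 5,
            "xOffset": -10,
            "x2Offset": 10,
            "yOffset": 10,
            "y2Offset": -10
          },
          "encoding": {
            "x": {"field": "min_date"},
            "x2": {"field": "max_date"},
            "y": {"field": "min_measure"},
            "y2": {"field": "max_measure"}
          }
        }
      ]
    }
    ```

    The 10-pixel offsets are the values from the layer above; they may need adjusting if the point size or the visual's dimensions change.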