How would one add a new shape, with both outline color and fill color, to ggplot2's shape palette?

This question is motivated by a review by Franconeri et al. (2021) (paper available here). On page 122, there is a discussion of "perceptual shape space," i.e., what makes shapes readily distinguishable. The authors conclude that shapes that differ along three dimensions (openness, spikiness, and intersectionality) are more distinguishable than shapes that vary little along these dimensions.

So, I was inspired to try adding some of the shapes they suggest (for example, those in their Figure 6) to ggplot2 for use in creating a "custom shape palette".

Three shapes stand out to me as being particularly exciting: double intersecting crescents, "square pretzels," and "fat crosses". Donuts (hollow circles) are also intriguing.

What would it take to build one of these shapes and incorporate it fully into ggplot's machinery so that "it just works" whenever a user says "shape = XXX" in a ggplot call? Ideally, any shape added would have separate stroke color and interior fill color aesthetics.

I tried to have ChatGPT help me do this, but I was unsuccessful. Basically, it wanted me to create a custom shape-drawing function for each shape, of this basic form:

.custom_shapes$plus <- function(data, col, fill, size, stroke) {
  #UNPACK DATA
  x <- data$x
  y <- data$y
  len <- size * 0.0035   #CONTROLS LENGTH OF ARMS
  bar <- size * 0.001  #CONTROLS 1/2 WIDTH OF EACH ARM.
  stroke_adj = stroke * 0.5
  
  #DRAW X AND Y COORDINATES FOR EACH VERTEX OF THE POLYGON.
  xs <- c(
    x - len, x - len, x - bar, x - bar,
    x + bar, x + bar, x + len, x + len,
    x + len, x + bar, x + bar, x + bar,
    x - bar, x - bar, x - bar, x - len
  )
  
  ys <- c(
    y + bar, y - bar, y - bar, y - len,
    y - len, y - bar, y - bar, y + bar,
    y + bar, y + bar, y + len, y + len,
    y + len, y + bar, y + bar, y + bar
  )
  
  #DRAW A POLYGON TO CONNECT THE VERTICES, THEN GIVE THEM A COLOR, FILL, AND STROKE WIDTH.
  grid::polygonGrob(
    x = unit(xs, "npc"),
    y = unit(ys, "npc"),
    gp = grid::gpar(
      col = col,
      fill = fill,
      lwd = stroke_adj * .pt
    ),
    id = rep(1, length(xs))
  )
}

Then, it wanted me to make a custom geom, geom_custom_point, that would expect shape strings corresponding to the custom shapes, e.g., shape = "plus", and then match those up with the custom shape-drawing function.

However, this didn't work for me. ggplot2 seems determined to map any shape string to its existing set of shape names, and I don't understand how to "override" that. Nor am I confident that ChatGPT's fundamental approach is sound.

I should add that it looks like this package adds shapes that would then be compatible with ggplot2, but it doesn't provide infrastructure to add one's own shapes. Still, somewhere in its code might be the answer to how to do this.

Solution

It is possible to roll your own ggplot extension that takes arbitrary polygons and plots them as points. The solution sketched below allows you to use the points in your image as follows:

ggplot(mtcars, aes(wt, mpg, fill = drat)) + 
  geom_point2(aes(shape = factor(gear)), shapes = shapes, size = 3) + 
  scale_shape_manual(values = c("crescent", "donut", "circle")) +
  scale_fill_distiller(palette = 18) +
  theme_minimal(16)

ggplot(iris, aes(Petal.Width, Petal.Length, fill = Species)) + 
  geom_point2(aes(shape = Species), shapes = shapes, size = 2.5) + 
  scale_shape_manual(values = c("double_cres", "waffle", "fatcross")) +
  theme_minimal(16)

Or even have a stab at recreating your original image like so:

data.frame(x = c(0.1, 0.3, 0.45, 0.9, 0.15, 0.25, 0.6, 0.85,
                 0.2, 0.28, 0.6, 0.85, 0.45, 0.5, 0.65, 0.85,
                 0.4, 0.45, 0.65, 0.7, 0.15, 0.25, 0.78, 0.9),
           y = c(0.4, 0.8, 0.1, 1, 0.75, 0.4, 0.47, 0.7,
                 0.95, 0.2, 0.1, 0.5, 0.3, 0.75, 0.9, 0.05,
                 0.5, 0.95, 0.65, 0.25, 0.1, 0.6, 0.85, 0.25),
           shape = rep(c("fatcross", "waffle", "crescent",
                         "circle", "donut", "double_cres"), each = 4)) |>
  ggplot(aes(x, y)) +
  geom_point2(aes(shape = shape), shapes = shapes, fill = "black", size = 10) +
  scale_shape_identity() +
  coord_cartesian(xlim = c(-0.2, 1.2), ylim = c(-0.2, 1.2)) +
  theme_void()

Writing geom_point2

Note that to achieve this, we have written a new geom_point2 function, which has a parameter called shapes. This is where we pass in the data representing the shapes we want to use as points.

For this setup, the object passed to shapes must be in a very specific format. It has to be a named list of data frames, containing one data frame for each type of shape we wish to draw.

The data frame for each shape must have an x column, a y column and a column called piece. This third column labels each distinct polygon within our shape. A piece that is enclosed by another piece will become a "hole" in the enclosing shape, provided its vertices run in the opposite direction to those of the enclosing piece (the default fill rule is "winding"). This is the format expected by grid::pathGrob, which we will use to do the drawing for us. It allows us to create shapes such as the "square pretzel" (I think it looks more like a waffle!).
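As a minimal sketch of this format, here is a hypothetical "frame" shape (it is not one of the shapes used in this answer) made of an outer square and an inner square. The signed_area helper is likewise just an illustration: it shows that the two pieces run in opposite directions, which is what turns the inner piece into a hole under pathGrob's default "winding" rule.

```r
# A hypothetical "frame" shape: an outer square (piece 1) enclosing a
# smaller inner square (piece 2), in the x / y / piece layout.
frame <- data.frame(
  x     = c(-0.5, -0.5, 0.5,  0.5, -0.25,  0.25, 0.25, -0.25),
  y     = c(-0.5,  0.5, 0.5, -0.5, -0.25, -0.25, 0.25,  0.25),
  piece = rep(1:2, each = 4)
)

# Illustrative helper: shoelace formula. The sign gives the direction of
# travel (negative = clockwise, positive = counter-clockwise).
signed_area <- function(x, y) {
  sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y) / 2
}

# Piece 1 is clockwise (negative), piece 2 counter-clockwise (positive)
areas <- sapply(split(frame, frame$piece),
                function(p) signed_area(p$x, p$y))

# Build the grob the same way GeomPoint2 does: an anchor position in npc
# units plus the shape's offsets in points. `id` splits the vertices into
# sub-paths; the scale factor of 40 is arbitrary.
g <- grid::pathGrob(
  x    = grid::unit(0.5, "npc") + grid::unit(frame$x * 40, "points"),
  y    = grid::unit(0.5, "npc") + grid::unit(frame$y * 40, "points"),
  id   = frame$piece,
  rule = "winding",
  gp   = grid::gpar(col = "black", fill = "steelblue")
)
```

To see the result, call grid::grid.newpage() followed by grid::grid.draw(g). If both pieces ran in the same direction, the inner square would be filled rather than cut out; passing rule = "evenodd" to pathGrob would make the direction irrelevant.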

We specify which of the shapes we want to use in our plot using the values = argument of scale_shape_manual. All we need to do here is supply the names of the shapes we want to use as they appear in the shapes object. An alternative is to have a column with the name of the shape you want to plot included in your data and use scale_shape_identity as in the third example above.

The definition of geom_point2 is straightforward:

geom_point2 <- function (mapping = NULL, data = NULL, stat = "identity", 
                         position = "identity", shapes,
                         ..., na.rm = FALSE, show.legend = NA, 
                         inherit.aes = TRUE) 
{
  ggplot2::layer(data = data, mapping = mapping, stat = stat, geom = GeomPoint2, 
        position = position, show.legend = show.legend, 
        inherit.aes = inherit.aes, 
        params = rlang::list2(na.rm = na.rm, shapes = shapes, ...))
}

Writing the Geom object

The difficult part is that geom_point2 works by invoking a GeomPoint2 ggproto object to draw the layer, and this is where the real work is done. It is inside GeomPoint2 that the actual polygons for the plot and the legend are drawn.

We can define it as follows:

GeomPoint2 <- ggplot2::ggproto("point2", ggplot2::GeomPoint,
  draw_panel = function(self, data, panel_params, coord, shapes, na.rm = FALSE){

    if (is.character(data$shape)) {
      data$shape <- names(shapes)[match(data$shape, names(shapes))]
    }
    coords <- coord$transform(data, panel_params)
    stroke_size <- coords$stroke
    stroke_size[is.na(stroke_size)] <- 0
    
    # Numeric shapes fall back to the standard point-drawing behaviour
    if (!is.character(data$shape)) {
      return(grid::pointsGrob(
        coords$x, coords$y,
        pch = coords$shape,
        gp = grid::gpar(
          col = ggplot2::alpha(coords$colour, coords$alpha),
          fill = ggplot2::fill_alpha(coords$fill, coords$alpha),
          fontsize = coords$size * ggplot2::.pt + stroke_size * ggplot2::.stroke/2,
          lwd = coords$stroke * ggplot2::.stroke/2)))
    }
    g <- Map(function(x, y, shape, size, color, fill, strokewidth) {
      dat <- shapes[[shape]]
      xvals <- grid::unit(x, "npc") + grid::unit(dat$x * size * 2, "points")
      yvals <- grid::unit(y, "npc") + grid::unit(dat$y * size * 2, "points")
      grid::pathGrob(xvals, yvals, dat$piece, 
                     gp = grid::gpar(col = color, fill = fill, 
                                     lwd = strokewidth))
    }, coords$x, coords$y, coords$shape, 
    coords$size * ggplot2::.pt + stroke_size * ggplot2::.stroke/2, 
    ggplot2::alpha(coords$colour, coords$alpha), 
    ggplot2::fill_alpha(coords$fill, coords$alpha), 
    coords$stroke * ggplot2::.stroke/2)
    do.call(grid::grobTree, c(g, list(name = "geom_point2")))
  },
  
  draw_key = function (data, params, size) {
    if (is.null(data$shape)) {
        data$shape <- 19
    }
    
    # Avoid `%||%`, which is only in base R from version 4.4.0 onwards
    stroke_size <- if (is.null(data$stroke)) 0.5 else data$stroke
    stroke_size[is.na(stroke_size)] <- 0
    
    if (!is.character(data$shape)) {
      return(grid::pointsGrob(
        0.5, 0.5,
        pch = data$shape,
        gp = grid::gpar(
          col = ggplot2::alpha(data$colour, data$alpha),
          fill = ggplot2::fill_alpha(data$fill, data$alpha),
          fontsize = (if (is.null(data$size)) 1.5 else data$size) *
            ggplot2::.pt + stroke_size * ggplot2::.stroke/2,
          lwd = stroke_size * ggplot2::.stroke/2)))
    }

    g <- Map(function(shape, size, color, fill, strokewidth) {
      dat <- params$shapes[[shape]]
      xvals <- grid::unit(0.5, "npc") + grid::unit(dat$x * size, "points")
      yvals <- grid::unit(0.5, "npc") + grid::unit(dat$y * size, "points")
      grid::pathGrob(xvals, yvals, dat$piece, 
                     gp = grid::gpar(col = color, fill = fill, 
                                     lwd = strokewidth))
    }, data$shape, 
    data$size * ggplot2::.pt + stroke_size * ggplot2::.stroke/2, 
    ggplot2::alpha(data$colour, data$alpha), 
    ggplot2::fill_alpha(data$fill, data$alpha), 
    stroke_size * ggplot2::.stroke/2)
    do.call(grid::grobTree, c(g, list(name = "geom_point2")))

})

Defining the shapes

That's all the code you need to draw your points in ggplot. Now all we have to do is define some shapes in the correct format. They should all have co-ordinates in the range -1 < x < 1 and -1 < y < 1 and be centered on x = 0, y = 0.
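If a shape has been drawn in arbitrary units, one way to bring it into this range is a small helper like the sketch below. The name normalise_shape and its radius argument are assumptions made for illustration; they are not part of ggplot2 or of the code above.

```r
# Hypothetical helper: centre a shape on (0, 0) and rescale it so that its
# largest absolute coordinate equals `radius`.
normalise_shape <- function(dat, radius = 0.5) {
  # Centre on the midpoint of the bounding box
  x <- dat$x - mean(range(dat$x))
  y <- dat$y - mean(range(dat$y))
  # Use one scale factor for both axes to preserve the aspect ratio
  k <- radius / max(abs(c(x, y)))
  data.frame(x = x * k, y = y * k, piece = dat$piece)
}

# An off-centre triangle drawn in arbitrary units
tri <- data.frame(x = c(10, 30, 20), y = c(5, 5, 25), piece = 1)
tri_norm <- normalise_shape(tri)
```

Here tri_norm has x values of -0.5, 0.5 and 0, and y values of -0.5, -0.5 and 0.5, so it can go straight into the shapes list.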

The shapes in your question are mostly fairly simple, so I just recreated them manually using the code below.

waffle <- data.frame(x = c(-5, -5, 5, 5, -3, -1, -1, -3, -3, -1, -1, 
                           -3, 1, 3, 3, 1, 1, 3, 3, 1) / 10,
                     y = c(-5, 5, 5, -5, -3, -3, -1, -1, 1, 1, 3, 3, 
                           -3, -3, -1, -1, 1, 1, 3, 3) / 10,
                     piece = rep(1:5, each = 4))

fatcross <- data.frame(x = c(-2, -1, -1, 1, 1, 2, 2, 1, 1, -1, -1, -2)/6,
                       y = c(1, 1, 2, 2, 1, 1, -1, -1, -2, -2, -1, -1)/6,
                       piece = 1)

theta <- seq(0, 2*pi, length.out = 100)

donut <- data.frame(x = c(0.4 * cos(theta), 0.2 * cos(theta)[100:1]),
                    y = c(0.4 * sin(theta), 0.2 * sin(theta)[100:1]),
                    piece = rep(1:2, each = 100))

circ <- donut[1:100,]

cres <- data.frame(x = c(0.4 * cos(theta[16:84]), 
                         0.32 * cos(theta)[75:26] + 0.24),
                   y = c(0.4 * sin(theta[16:84]), 
                         0.32 * sin(theta[75:26])),
                   piece = 1)

The double crescent is a little more fiddly:

trans <- function(dat, theta = 0, shift_x = 0, shift_y = 0) {
  r <- sqrt(dat$x^2 + dat$y^2)
  ang <- (atan2(dat$y, dat$x) + 2*pi) %% (2*pi)
  theta <- ang + theta
  data.frame(x = r * cos(theta) + shift_x, y = r * sin(theta) + shift_y, 
             piece = dat$piece)
}

cres_l <- trans(cres, -0.6, -0.24, -0.24)
cres_r <- trans(cres, 2.6, 0.24, 0.24)

double_cres <- data.frame(x = c(cres_l$x[8:110], cres_r$x[8:110])/1.8,
                          y = c(cres_l$y[8:110], cres_r$y[8:110])/1.8,
                          piece = 1)

Now that we have our shapes, we can put them in a list ready for passing to geom_point2:

shapes <- list(waffle = waffle, fatcross = fatcross, double_cres = double_cres,
               circle = circ, donut = donut, crescent = cres)

The usual caveats apply: this is not production-ready code. It doesn't do any checking of the names in your shapes data, nor does it make sure that shapes is in the required format.

A production-ready extension should check the shape validity and possibly normalize the co-ordinates as well as enforcing the correct input format or allowing conversion from other formats.
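As a rough sketch of what such checking might look like, the hypothetical validate_shapes function below (an assumption, not part of the code above) stops with an error if the list is not in the expected format and warns if a shape strays outside the -1 to 1 range.

```r
# Hypothetical validator for the `shapes` argument of geom_point2
validate_shapes <- function(shapes) {
  nms <- names(shapes)
  if (!is.list(shapes) || is.data.frame(shapes) ||
      is.null(nms) || anyNA(nms) || !all(nzchar(nms))) {
    stop("`shapes` must be a named list of data frames", call. = FALSE)
  }
  for (nm in nms) {
    dat <- shapes[[nm]]
    if (!is.data.frame(dat)) {
      stop("Shape '", nm, "' is not a data frame", call. = FALSE)
    }
    missing_cols <- setdiff(c("x", "y", "piece"), names(dat))
    if (length(missing_cols) > 0) {
      stop("Shape '", nm, "' is missing column(s): ",
           paste(missing_cols, collapse = ", "), call. = FALSE)
    }
    if (!is.numeric(dat$x) || !is.numeric(dat$y) ||
        anyNA(dat$x) || anyNA(dat$y)) {
      stop("Shape '", nm, "' must have numeric, non-missing x and y",
           call. = FALSE)
    }
    if (any(abs(c(dat$x, dat$y)) >= 1)) {
      warning("Shape '", nm, "' extends beyond the -1 to 1 range",
              call. = FALSE)
    }
  }
  invisible(shapes)
}

# A valid one-shape list, and a list whose shape lacks the piece column
ok  <- list(square = data.frame(x = c(-0.5, -0.5, 0.5, 0.5),
                                y = c(-0.5, 0.5, 0.5, -0.5),
                                piece = 1))
bad <- list(square = data.frame(x = c(0, 0.5, 0.5), y = c(0, 0, 0.5)))

validate_shapes(ok)   # passes silently
bad_result <- tryCatch(validate_shapes(bad), error = function(e) e)
```

One place to call it would be at the top of geom_point2, before the layer is built, so that a malformed shapes list fails early with a readable message rather than inside the drawing code.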