webgpu

Can I render a point-list over an area larger than the point itself?


I am rendering what is effectively a fancy point cloud. Each point is to occupy multiple pixels on the screen (depending on depth) and carries a bunch of data required for shading. With primitive.topology = 'point-list' I can have each point draw over a single pixel, but I'd like my points to render as larger dots. I could convert the points to a triangle list on the CPU, but that means hugely duplicating the shading data, which only needs to be processed once per point. Is it possible to have a vertex shader that ingests a point-list and emits multiple fragment shader invocations? One approach would be to have the vertex shader convert points into triangles (thus 3x-ing the number of vertices, minus culled points). Looking at the documentation of rasterization (point 4) it seems like no, but such functionality seems so basic and useful that I can't really imagine it being entirely impossible. Is there a standard workaround?


Solution

  • You can do this easily with instancing.

    First let's make a sample that draws some points:

    const { mat4 } = wgpuMatrix;
    
    async function main() {
      const adapter = await navigator.gpu?.requestAdapter();
      const device = await adapter?.requestDevice();
    
      const canvas = document.querySelector('canvas');
      const context = canvas.getContext('webgpu');
    
      const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
      context.configure({
        device,
        format: presentationFormat,
      });
    
      const shaderModule = device.createShaderModule({code: `
      struct Uniforms {
        mat: mat4x4f,
      };
      @group(0) @binding(0) var<uniform> uniforms: Uniforms;
    
      struct MyVSInput {
          @location(0) position: vec4f,
      };
    
      struct MyVSOutput {
        @builtin(position) position: vec4f,
      };
    
      @vertex
      fn myVSMain(v: MyVSInput) -> MyVSOutput {
        var vsOut: MyVSOutput;
        vsOut.position = uniforms.mat * v.position;
        return vsOut;
      }
    
      @fragment
      fn myFSMain(v: MyVSOutput) -> @location(0) vec4f {
        return vec4f(1, 1, 0, 1);
      }
      `});
      const r = (min, max) => Math.random() * (max - min) + min;
    
      const numPoints = 50;
      const positions = [];
      for (let i = 0; i < numPoints; ++i) {
        positions.push(r(-1, 1), r(-1, 1));
      }
      const positionData = new Float32Array(positions);
      const positionSize = 8; // 2 f32s per point
    
      const positionBuffer = device.createBuffer({
        usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
        size: positionData.byteLength,
      });
      device.queue.writeBuffer(positionBuffer, 0, positionData);
    
      const pipeline = device.createRenderPipeline({
        label: 'points',
        layout: 'auto',
        vertex: {
          module: shaderModule,
          entryPoint: 'myVSMain',
          buffers: [
            // position
            {
              arrayStride: positionSize,
              attributes: [
                {shaderLocation: 0, offset: 0, format: 'float32x2' },
              ],
            },
          ],
        },
        fragment: {
          module: shaderModule,
          entryPoint: 'myFSMain',
          targets: [
            {format: presentationFormat},
          ],
        },
        primitive: {
          topology: 'point-list',
        },
      });
    
      const uniformBufferSize = (16) * 4;      // 1 mat4x4f
      const uniformBuffer = device.createBuffer({
        size: uniformBufferSize,
        usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
      });
      const uniformValues = new Float32Array(uniformBufferSize / 4);
      const mat = uniformValues.subarray(0, 16);
    
      const bindGroup = device.createBindGroup({
        layout: pipeline.getBindGroupLayout(0),
        entries: [
          { binding: 0, resource: { buffer: uniformBuffer } },
        ],
      });
    
      const renderPassDescriptor = {
        colorAttachments: [
          {
            // view: undefined, // Assigned later
            // resolveTarget: undefined, // Assigned Later
            clearValue: [0, 0, 0, 1],
            loadOp: 'clear',
            storeOp: 'store',
          },
        ],
      };
    
      const colorTexture = context.getCurrentTexture();
      renderPassDescriptor.colorAttachments[0].view = colorTexture.createView();
    
      // update uniforms
      mat4.identity(mat);
      device.queue.writeBuffer(uniformBuffer, 0, uniformValues);
    
      const commandEncoder = device.createCommandEncoder();
      const passEncoder = commandEncoder.beginRenderPass(renderPassDescriptor);
      passEncoder.setPipeline(pipeline);
      passEncoder.setVertexBuffer(0, positionBuffer);
      passEncoder.setBindGroup(0, bindGroup);
      passEncoder.draw(numPoints);
      passEncoder.end();
    
      device.queue.submit([commandEncoder.finish()]);
    }
    
    main();
    html, body {
      background-color: #333;
    }
    <script src="https://wgpu-matrix.org/dist/2.x/wgpu-matrix.js"></script>
    <canvas></canvas>

    Now, to make it draw larger we can just put some quad points in the vertex shader and draw with instancing. We'll pass in a size for each point in "pixels".

    First update the shader:

      struct Uniforms {
        mat: mat4x4f,
        resolution: vec2f,
      };
      @group(0) @binding(0) var<uniform> uniforms: Uniforms;
    
      struct MyVSInput {
          @location(0) position: vec4f,
          @location(1) size: f32,
      };
    
      struct MyVSOutput {
        @builtin(position) position: vec4f,
      };
    
      @vertex
      fn myVSMain(v: MyVSInput, @builtin(vertex_index) vertexIndex: u32) -> MyVSOutput {
        let quadPos = array(
          vec2f(0, 0),
          vec2f(1, 0),
          vec2f(0, 1),
          vec2f(0, 1),
          vec2f(1, 0),
          vec2f(1, 1),
        );
        var vsOut: MyVSOutput;
    
        let pos = (quadPos[vertexIndex] - 0.5) * v.size * 2.0 / uniforms.resolution;
    
        vsOut.position = uniforms.mat * v.position + vec4f(pos, 0, 0);
        return vsOut;
      }
    

    Back in JS, we need to create a buffer with sizes. That's practically the same as how we made positions; for reference, from the complete listing below:
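
      const sizes = [];
      for (let i = 0; i < numPoints; ++i) {
        sizes.push(r(5, 20));  // size in pixels
      }
      const sizeData = new Float32Array(sizes);
      const sizeSize = 4;  // 1 f32 per point

      const sizeBuffer = device.createBuffer({
        usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
        size: sizeData.byteLength,
      });
      device.queue.writeBuffer(sizeBuffer, 0, sizeData);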

    We need to update the pipeline. Both positions and sizes should step once per instance, so we set stepMode: 'instance' on both buffers.

            // position
            {
              arrayStride: positionSize,
              stepMode: 'instance',
              attributes: [
                {shaderLocation: 0, offset: 0, format: 'float32x2' },
              ],
            },
            // size
            {
              arrayStride: sizeSize,
              stepMode: 'instance',
              attributes: [
                {shaderLocation: 1, offset: 0, format: 'float32'},
              ],
            },
    

    We also switch the topology from point-list to triangle-list:

        primitive: {
          topology: 'triangle-list',
        },
    

    We increase the uniform buffer size to make room for resolution, and at render time we set it:

          // update uniforms
          mat4.identity(mat);
          resolution.set([colorTexture.width, colorTexture.height]);
          device.queue.writeBuffer(uniformBuffer, 0, uniformValues);
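
    The corresponding allocation and views (from the complete listing below):

      const uniformBufferSize = (16 + 2 + 2) * 4;  // 1 mat4x4f + 1 vec2f + 2 floats of padding
      const uniformBuffer = device.createBuffer({
        size: uniformBufferSize,
        usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
      });
      const uniformValues = new Float32Array(uniformBufferSize / 4);
      const mat = uniformValues.subarray(0, 16);
      const resolution = uniformValues.subarray(16, 18);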
    
    

    And finally at draw time we need to include the size vertex buffer:

          passEncoder.setVertexBuffer(0, positionBuffer);
          passEncoder.setVertexBuffer(1, sizeBuffer);
    

    We also need to move numPoints from the first parameter of draw (the vertex count) to the second (the instance count), and pass in 6 for the first (6 vertices per quad):

          passEncoder.draw(6, numPoints);
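
    Putting it all together: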
    

    const { mat4 } = wgpuMatrix;
    
    async function main() {
      const adapter = await navigator.gpu?.requestAdapter();
      const device = await adapter?.requestDevice();
    
      const canvas = document.querySelector('canvas');
      const context = canvas.getContext('webgpu');
    
      const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
      context.configure({
        device,
        format: presentationFormat,
      });
    
      const shaderModule = device.createShaderModule({code: `
      struct Uniforms {
        mat: mat4x4f,
        resolution: vec2f,
      };
      @group(0) @binding(0) var<uniform> uniforms: Uniforms;
    
      struct MyVSInput {
          @location(0) position: vec4f,
          @location(1) size: f32,
      };
    
      struct MyVSOutput {
        @builtin(position) position: vec4f,
      };
    
      @vertex
      fn myVSMain(v: MyVSInput, @builtin(vertex_index) vertexIndex: u32) -> MyVSOutput {
        let quadPos = array(
          vec2f(0, 0),
          vec2f(1, 0),
          vec2f(0, 1),
          vec2f(0, 1),
          vec2f(1, 0),
          vec2f(1, 1),
        );
        var vsOut: MyVSOutput;
    
        let pos = (quadPos[vertexIndex] - 0.5) * v.size * 2.0 / uniforms.resolution;
    
        vsOut.position = uniforms.mat * v.position + vec4f(pos, 0, 0);
        return vsOut;
      }
    
      @fragment
      fn myFSMain(v: MyVSOutput) -> @location(0) vec4f {
        return vec4f(1, 1, 0, 1);
      }
      `});
      const r = (min, max) => Math.random() * (max - min) + min;
    
      const numPoints = 50;
      const positions = [];
      const sizes = [];
      for (let i = 0; i < numPoints; ++i) {
        positions.push(r(-1, 1), r(-1, 1));
        sizes.push(r(5, 20));
      }
      const positionData = new Float32Array(positions);
      const positionSize = 8; // 2 f32s per point
      const sizeData = new Float32Array(sizes);
      const sizeSize = 4;  // 1 f32 per point
    
      const positionBuffer = device.createBuffer({
        usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
        size: positionData.byteLength,
      });
      device.queue.writeBuffer(positionBuffer, 0, positionData);
      const sizeBuffer = device.createBuffer({
        usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
        size: sizeData.byteLength,
      });
      device.queue.writeBuffer(sizeBuffer, 0, sizeData);
    
      const pipeline = device.createRenderPipeline({
        label: 'points',
        layout: 'auto',
        vertex: {
          module: shaderModule,
          entryPoint: 'myVSMain',
          buffers: [
            // position
            {
              arrayStride: positionSize,
              stepMode: 'instance',
              attributes: [
                {shaderLocation: 0, offset: 0, format: 'float32x2' },
              ],
            },
            // size
            {
              arrayStride: sizeSize,
              stepMode: 'instance',
              attributes: [
                {shaderLocation: 1, offset: 0, format: 'float32'},
              ],
            },
          ],
        },
        fragment: {
          module: shaderModule,
          entryPoint: 'myFSMain',
          targets: [
            {format: presentationFormat},
          ],
        },
        primitive: {
          topology: 'triangle-list',
        },
      });
    
      const uniformBufferSize = (16 + 2 + 2) * 4;      // 1 mat4x4f + 2 f32 + 2 padding
      const uniformBuffer = device.createBuffer({
        size: uniformBufferSize,
        usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
      });
      const uniformValues = new Float32Array(uniformBufferSize / 4);
      const mat = uniformValues.subarray(0, 16);
      const resolution = uniformValues.subarray(16, 18);
    
      const bindGroup = device.createBindGroup({
        layout: pipeline.getBindGroupLayout(0),
        entries: [
          { binding: 0, resource: { buffer: uniformBuffer } },
        ],
      });
    
      const renderPassDescriptor = {
        colorAttachments: [
          {
            // view: undefined, // Assigned later
            // resolveTarget: undefined, // Assigned Later
            clearValue: [0, 0, 0, 1],
            loadOp: 'clear',
            storeOp: 'store',
          },
        ],
      };
    
      const colorTexture = context.getCurrentTexture();
      renderPassDescriptor.colorAttachments[0].view = colorTexture.createView();
    
      // update uniforms
      mat4.identity(mat);
      resolution.set([colorTexture.width, colorTexture.height]);
      device.queue.writeBuffer(uniformBuffer, 0, uniformValues);
    
      const commandEncoder = device.createCommandEncoder();
      const passEncoder = commandEncoder.beginRenderPass(renderPassDescriptor);
      passEncoder.setPipeline(pipeline);
      passEncoder.setVertexBuffer(0, positionBuffer);
      passEncoder.setVertexBuffer(1, sizeBuffer);
      passEncoder.setBindGroup(0, bindGroup);
      passEncoder.draw(6, numPoints);
      passEncoder.end();
    
      device.queue.submit([commandEncoder.finish()]);
    }
    
    main();
    html, body {
      background-color: #333;
    }
    <script src="https://wgpu-matrix.org/dist/2.x/wgpu-matrix.js"></script>
    <canvas></canvas>

    I used a unit quad with values of 0 to 1 because it's likely you'd want to pass them as inter-stage variables into the fragment shader so you can shade the quad, for example with a texture. The same quad positions also make it straightforward to rotate each point in the vertex shader.
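
    As a minimal sketch (the uv variable here is my addition, not part of the sample above), passing the quad position through lets the fragment shader shade each point, for example rendering it as a circle:

      struct MyVSOutput {
        @builtin(position) position: vec4f,
        @location(0) uv: vec2f,  // 0 to 1 across the quad
      };

      // in myVSMain, alongside the existing position math:
      // vsOut.uv = quadPos[vertexIndex];

      @fragment
      fn myFSMain(v: MyVSOutput) -> @location(0) vec4f {
        // discard fragments outside the circle inscribed in the quad
        if (distance(v.uv, vec2f(0.5)) > 0.5) {
          discard;
        }
        return vec4f(1, 1, 0, 1);
      }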


    If you're curious why this functionality isn't built in, it's generally because points have always been problematic across APIs and drivers. To take OpenGL as just one example: OpenGL supports sized points, but it's up to the driver whether any size other than 1 pixel is actually supported. The core OpenGL spec even requires points to be 1 pixel (unlike the compatibility spec). Some drivers have a size limit of 1, some 64, some 256, some no limit. Further, some GPUs would not draw a point at all if its center was off the screen, while others would draw the unclipped portion. All of this meant that point rendering with size > 1 was not portable.

    I assume the WebGPU committee decided that, rather than pass all of these non-portability issues on to the web (which would not be good), they would limit points to 1 pixel (which is portable) and require you to do something else if you want larger points (which also ends up being portable).