GL_NV_mesh_shader Simple Mesh Shader Example

What's a Mesh Shader?

A mesh shader is a compute-like shader that is glued to a fragment shader. This completely lops off the entire rendering pipeline prior to rasterization. The mesh shader is compute-like because you specify how many threads make up a work group, you dispatch a bunch of work groups to run it, and you get to use things like group shared memory. Mesh shaders are special because they output triangles, which are then rasterized and fed to a normal pixel shader. There's also an optional stage before the mesh shader called a task shader, which I'll explain later.

When rendering with mesh shaders, you still get early z/stencil culling before the pixel shader (in the circumstances which you would normally get early z culling), and you also get the normal ROP stuff you'd get from an ordinary raster pipeline.

What is this used for? Well, for one, you can do just about everything (and more!) with mesh shaders that the ill-fated tessellation and geometry shaders could do. There's also this "meshlet" rendering technique, where you break your models into small clusters of triangles, and you use the mesh shader to aggressively cull meshlets before rasterizing them. These are big scary techniques that are very case-specific, but I think there's a lot of potential for novel uses that don't involve mimicking the entire vertex pipeline.

In this post I call mesh shaders and task shaders "compute-like", because they are very similar to compute shaders by design. However, in Nvidia's current hardware, they are a type of graphics shader, and thus subject to different limitations and performance characteristics.

And OpenGL Supports This?!

No.

I mean, yes, but only with Nvidia's special vendor extension that adds it.

The new Turing architecture introduced a bunch of new and exciting features, like ray tracing! And and and, they even made OpenGL vendor extensions for some of these! But not for ray tracing >:(.

Also none of the GPU debuggers I've tried support OpenGL mesh shaders, including Nsight, so beware!

Great! What do I Need to OpenGL up Some Mesh Shaders?

For one, you need to have hardware that supports the GL_NV_mesh_shader extension.
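Checking for the extension at startup is cheap, so here's a minimal sketch of it. The helper name has_extension is hypothetical (it's mine, not part of OpenGL), and it only does the string matching. I'm assuming you fill the array yourself by calling glGetIntegerv(GL_NUM_EXTENSIONS, ...) and then glGetStringi(GL_EXTENSIONS, i) for each index, which needs a live GL context.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `wanted` appears in `names`.
 * In a real program, `names` would hold the strings returned by
 * glGetStringi(GL_EXTENSIONS, i) for i in [0, GL_NUM_EXTENSIONS). */
bool has_extension(const char *const *names, size_t count, const char *wanted)
{
    for (size_t i = 0; i < count; ++i) {
        /* Exact match only, so a longer extension name can't match by prefix. */
        if (names[i] != NULL && strcmp(names[i], wanted) == 0) {
            return true;
        }
    }
    return false;
}
```

If has_extension(names, count, "GL_NV_mesh_shader") comes back false, bail out before you try to compile any mesh shaders.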

To compile a shader program that consists of one mesh shader and one pixel shader, you'll use the enums GL_MESH_SHADER_NV (when creating the shader object) and GL_MESH_SHADER_BIT_NV (when binding stages of a separable program to a pipeline) instead of GL_VERTEX_SHADER and GL_VERTEX_SHADER_BIT. And then you'll need a mesh shader and a pixel shader. We'll get to that in a second.

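To draw with this shader program, you'll use one of the draw functions the extension adds: glDrawMeshTasksNV, glDrawMeshTasksIndirectNV, glMultiDrawMeshTasksIndirectNV, or glMultiDrawMeshTasksIndirectCountNV.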

Contrived Simple Mesh Shader

This mesh shader just draws a full screen quad.

    #version 450
    #extension GL_NV_mesh_shader : require

    // One thread per work group (see the disclaimer below).
    layout(local_size_x=1) in;

    // This work group emits at most 4 vertices and 2 triangles.
    layout(max_vertices=4, max_primitives=2) out;
    layout(triangles) out;

    out gl_MeshPerVertexNV {
        vec4 gl_Position;
    } gl_MeshVerticesNV[];

    out uint gl_PrimitiveCountNV;
    out uint gl_PrimitiveIndicesNV[];

    // If we don't redeclare gl_PerVertex, compilation fails with the following error:
    // error C7592: ARB_separate_shader_objects requires built-in block gl_PerVertex to be redeclared before accessing its members
    out gl_PerVertex {
        vec4 gl_Position;
    } gl_Why;

    void main()
    {
        // Clip space positions. With OpenGL's defaults, +Y points up.
        gl_MeshVerticesNV[0].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0); // Bottom Left
        gl_MeshVerticesNV[1].gl_Position = vec4( 1.0, -1.0, 0.0, 1.0); // Bottom Right
        gl_MeshVerticesNV[2].gl_Position = vec4(-1.0,  1.0, 0.0, 1.0); // Upper Left
        gl_MeshVerticesNV[3].gl_Position = vec4( 1.0,  1.0, 0.0, 1.0); // Upper Right

        // First triangle.
        gl_PrimitiveIndicesNV[0] = 0;
        gl_PrimitiveIndicesNV[1] = 1;
        gl_PrimitiveIndicesNV[2] = 2;

        // Second triangle.
        gl_PrimitiveIndicesNV[3] = 2;
        gl_PrimitiveIndicesNV[4] = 1;
        gl_PrimitiveIndicesNV[5] = 3;

        // Assign rather than accumulate: the count starts out undefined.
        gl_PrimitiveCountNV = 2;
    }
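If you turn on back-face culling, the winding of those two triangles matters. Here's a small CPU-side sanity check, sketched in C, that copies the XY positions and indices from the mesh shader above and computes twice the signed area of each triangle. The names signed_area2, quad_positions, quad_indices, and quad_triangle_area2 are all made up for this example. A positive result means counter-clockwise, assuming +Y points up as it does in OpenGL's default clip space.

```c
/* Twice the signed area of a 2D triangle (a, b, c).
 * Positive means counter-clockwise when +Y points up. */
float signed_area2(const float a[2], const float b[2], const float c[2])
{
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]);
}

/* The same XY positions and indices the mesh shader writes. */
const float quad_positions[4][2] = {
    { -1.0f, -1.0f },
    {  1.0f, -1.0f },
    { -1.0f,  1.0f },
    {  1.0f,  1.0f },
};
const unsigned quad_indices[6] = { 0, 1, 2,  2, 1, 3 };

/* Twice the signed area of triangle 0 or 1 of the quad. */
float quad_triangle_area2(unsigned triangle)
{
    const unsigned *i = &quad_indices[triangle * 3];
    return signed_area2(quad_positions[i[0]],
                        quad_positions[i[1]],
                        quad_positions[i[2]]);
}
```

Both triangles come out positive, so they are counter-clockwise and survive culling with the default front face setting.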

A Corresponding Contrived Fragment Shader

There isn't really much to say here.

    #version 450

    layout(location = 0) out vec4 OutColor;

    void main()
    {
        // Solid magenta.
        OutColor = vec4(1.0, 0.0, 1.0, 1.0);
    }

And Then The Draw Call

The two arguments are the index of the first work group and the number of work groups to launch. This one just launches one work group, starting at index 0.

glDrawMeshTasksNV(0, 1);

Disclaimer

I want to note that normally you'll have a higher value of "local_size_x", like "32". Setting this to 1 like my mesh shader above does launches one (1) thread per work group on your GPU, and we're only launching one work group, so it is essentially a scalar process. Which may or may not be what you want! I really doubt this is the best way to draw a full screen quad, however.
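Once you do bump local_size_x up, you need to work out how many work groups to launch. Here's a minimal sketch of that arithmetic in C, assuming a layout where each thread handles exactly one item. Both function names are hypothetical, and the one-item-per-thread mapping is my assumption, not something the extension requires.

```c
#include <stdint.h>

/* Number of work groups needed so every one of `item_count` items gets a
 * thread, with `local_size_x` threads per work group. Rounds up. */
uint32_t mesh_work_group_count(uint32_t item_count, uint32_t local_size_x)
{
    if (local_size_x == 0) {
        return 0; /* Not a valid work group size; avoid dividing by zero. */
    }
    return item_count / local_size_x + (item_count % local_size_x != 0 ? 1u : 0u);
}

/* Item index a given thread would handle. Mirrors computing
 * gl_WorkGroupID.x * local_size_x + gl_LocalInvocationID.x in the shader. */
uint32_t mesh_item_index(uint32_t work_group_id,
                         uint32_t local_size_x,
                         uint32_t local_invocation_id)
{
    return work_group_id * local_size_x + local_invocation_id;
}
```

For 100 items and a local_size_x of 32, that works out to 4 work groups, so the draw would be glDrawMeshTasksNV(0, 4). The last work group has 28 threads with nothing to do, so the shader should skip any thread whose item index is 100 or more.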

Ok What About Task Shaders

Task shaders are another kind of compute-like shader, and their job is to launch mesh shader work groups. I haven't had a chance to play around with them yet, but they're not really any more complicated to use than mesh shaders are.

One really cool thing you can do here is just turn off rasterization and use a shader program that is just a task and mesh shader. Why? Well, what you have in that case is something like a compute shader that can launch a variable number of a different compute-like shader. Well, you can also do that with indirect dispatching, BUT, task shaders can also pass arbitrary output to the mesh shaders they launch without requiring you to dump it into a UAV (a shader storage buffer, in OpenGL terms) first, so that is neat!

What Else Did You Gloss Over?

A lot.

If you just want to play with a simple example, the things I'd look at next are interpolants and using gl_LocalInvocationID.x to figure out what thread you're in.