March 10th, 2022
This is a follow-up to a post I wrote last year, which describes an algorithm for efficiently rendering Signed Distance Functions by first reducing them into a set of locally relevant subsets of the original function. This was accomplished by establishing a bounding box for every node in the CSG tree, finding all unique overlapping regions to calculate a set of new CSG tree permutations, and then using those trees to generate shaders and so on. In the time since I wrote that first post, I found that generating all of the boundary permutations was way too slow for some models, that it was prone to generating razor-thin regions, that the system was way more complex than I was happy with, and that models featuring a rotating sequence of cuts would produce a single very slow shader at the end of a long compile.
I've since abandoned my earlier bounding-box-based approach in favor of a new one that is much faster, simpler to implement, and easier to reason about; it solves the degenerate cases from before, still fits well with contemporary cluster culling algorithms, and can be easily repurposed for a number of useful applications beyond shader generation and rendering.
This post describes my new approach to SDF decomposition: The Voxel Compiler.
The voxel compiler takes an SDF representation of a model, and then eliminates all irrelevant branches for each voxel in the model's volume. These per-voxel variants of the original SDF are then used to compile greatly simplified shaders for fast ray marching within their respective voxel boundaries.
Here is how it works:
For my SDF evaluator implementation, see this header and this implementation.
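Concretely, "eliminating irrelevant branches" means proving that some operands of the CSG operators can never matter anywhere inside a given voxel, and dropping them. Here is a minimal sketch of one way to do that. The types are placeholders (sphere-only leaves, binary union and cut) rather than the evaluator linked above, and the pruning rule leans on the assumption that every subtree is a conservative Lipschitz-1 distance bound, so a single sample at the voxel's center brackets its value across the entire voxel:

```cpp
#include <algorithm>
#include <cmath>
#include <memory>

struct Vec3 { float x, y, z; };

// A deliberately tiny CSG tree: sphere leaves plus binary union and cut.
struct Node {
    enum Kind { Sphere, Union, Cut } kind;
    Vec3 Center {};            // Sphere leaves
    float Radius = 0.f;        // Sphere leaves
    std::shared_ptr<Node> LHS; // Union / Cut operands
    std::shared_ptr<Node> RHS;

    float Eval(Vec3 Point) const {
        switch (kind) {
        case Sphere: {
            const float X = Point.x - Center.x;
            const float Y = Point.y - Center.y;
            const float Z = Point.z - Center.z;
            return std::sqrt(X * X + Y * Y + Z * Z) - Radius;
        }
        case Union: return std::min(LHS->Eval(Point), RHS->Eval(Point));
        case Cut:   return std::max(LHS->Eval(Point), -RHS->Eval(Point));
        }
        return 0.f;
    }
};

using NodeRef = std::shared_ptr<Node>;

// Return a simplified copy of Tree that is exact everywhere within the voxel
// centered at VoxelCenter with half-diagonal VoxelRadius.  Because every
// subtree is a distance bound with a Lipschitz constant of 1, its value over
// the voxel lies within Eval(VoxelCenter) +/- VoxelRadius, which is enough to
// prove that some operands can never win inside the voxel.
NodeRef Clip(const NodeRef& Tree, Vec3 VoxelCenter, float VoxelRadius) {
    if (Tree->kind == Node::Sphere) {
        return Tree;
    }
    NodeRef LHS = Clip(Tree->LHS, VoxelCenter, VoxelRadius);
    NodeRef RHS = Clip(Tree->RHS, VoxelCenter, VoxelRadius);
    const float DistL = LHS->Eval(VoxelCenter);
    const float DistR = RHS->Eval(VoxelCenter);

    if (Tree->kind == Node::Union) {
        // min(LHS, RHS): drop an operand that is provably never the smaller
        // of the two anywhere in this voxel.
        if (DistL - VoxelRadius > DistR + VoxelRadius) return RHS;
        if (DistR - VoxelRadius > DistL + VoxelRadius) return LHS;
    } else { // Cut: max(LHS, -RHS)
        // If the cutting shape is far enough away that -RHS can never exceed
        // LHS within this voxel, the cut is irrelevant here.
        if (VoxelRadius - DistR <= DistL - VoxelRadius) return LHS;
    }
    NodeRef Copy = std::make_shared<Node>(*Tree);
    Copy->LHS = LHS;
    Copy->RHS = RHS;
    return Copy;
}
```

A real implementation wants more operator types and tighter bounds, but even a conservative rule like this is enough to strip most of a model down to the handful of shapes that are actually near each voxel.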
The picture to the right is a model that was rendered using shaders generated by the voxel compiler, with an enhancement which is explained later in this post. The coloration indicates which shader variant was used to draw that region.
The cyan body of the gear indicates the region where the model's SDF was reduced to a flat cylinder with a recess on either side. The green regions indicate a similar region with a single cylinder cut away to form a hole. The red, blue, yellow, purple, and magenta regions around the perimeter of the gear indicate where one or more rotated cubes are subtracted from the edge of the gear to form its teeth.
For this model, the voxel compiler generated 7 shaders, all of which are substantially lower in complexity than the whole model. The variant count is this low thanks to the parameter extraction technique, which is explained later in this post.
The explanation of the SDF evaluator above mentions a constraint where a transform node can only occur as the parent of a shape node (a graph leaf). However, this constraint should not be imposed on your authoring language, for reasons of flexibility, so transform folding is needed to reconcile the difference. Without transform folding, each shape node in the graph has an implicit stack of transforms that are applied to the sampling point to produce the shape function's local input. Transform folding simply collapses this stack down into a single transform per shape.
Transform folding is an important optimization for two reasons. The first reason is that these shape functions will be evaluated many times, and transform folding greatly reduces the amount of ALU work needed to evaluate your model. The second reason is that the constant folding provided by shader compilers is not able to fold your math this aggressively, and will produce substantially slower results without some extra help.
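As a rough sketch of what a folding pass might look like, assuming a simple authoring tree where transform nodes can wrap anything (none of these names come from the linked implementation, and the matrix conventions are up to you):

```cpp
#include <array>
#include <memory>
#include <vector>

using Mat4 = std::array<float, 16>; // row-major 4x4

Mat4 Identity() {
    return { 1,0,0,0,  0,1,0,0,  0,0,1,0,  0,0,0,1 };
}

Mat4 Mul(const Mat4& A, const Mat4& B) {
    Mat4 Out {};
    for (int Row = 0; Row < 4; ++Row)
        for (int Col = 0; Col < 4; ++Col)
            for (int K = 0; K < 4; ++K)
                Out[Row * 4 + Col] += A[Row * 4 + K] * B[K * 4 + Col];
    return Out;
}

struct AuthorNode {
    enum Kind { Shape, Transform, Op } kind;
    Mat4 Matrix = Identity(); // Transform nodes only
    std::vector<std::shared_ptr<AuthorNode>> Children;
    Mat4 Folded = Identity(); // Shape leaves: the single collapsed transform
};

// Push the implicit transform stack down the authoring tree so that each
// shape leaf ends up with exactly one matrix.  Whether Matrix holds the
// forward transform or its inverse (which is what the sampling point needs),
// and which multiplication order is correct, depend on your conventions.
void Fold(AuthorNode& Node, const Mat4& Accumulated) {
    switch (Node.kind) {
    case AuthorNode::Shape:
        Node.Folded = Accumulated;
        break;
    case AuthorNode::Transform:
        for (auto& Child : Node.Children)
            Fold(*Child, Mul(Accumulated, Node.Matrix));
        break;
    case AuthorNode::Op:
        for (auto& Child : Node.Children)
            Fold(*Child, Accumulated);
        break;
    }
}
```

Once every shape leaf carries its folded matrix, the transform nodes themselves can be discarded, which satisfies the evaluator's constraint and hands the shader compiler a handful of pre-multiplied constants instead of a chain of matrix multiplies.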
When generating shaders, replace all numeric constants with an array lookup into an external buffer of parameters. These should be stored such that the array elements are accessed sequentially when the shader is executed to ensure good cache performance. The parameter array should not be embedded in the shader but supplied by a separate parameter buffer at run time. This results in a significant reduction in shader variants which need to be compiled, and has a negligible impact on rendering performance.
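A sketch of what that can look like during shader generation (ParamPool, EmitConstant, and the "Params" buffer are hypothetical names, not my actual implementation):

```cpp
#include <string>
#include <vector>

struct ParamPool {
    std::vector<float> Values; // uploaded as a separate buffer at draw time

    // Record a constant and return the GLSL expression that reads it back.
    std::string EmitConstant(float Value) {
        const size_t Index = Values.size();
        Values.push_back(Value);
        return "Params[" + std::to_string(Index) + "]";
    }
};

// Example: emitting the distance function for a folded sphere leaf.  The
// generated source contains no literals, only sequential buffer reads.
// "Point" is whatever your generated shader calls the sampling point.
std::string EmitSphere(ParamPool& Pool, float X, float Y, float Z, float Radius) {
    return "(distance(Point, vec3(" +
        Pool.EmitConstant(X) + ", " +
        Pool.EmitConstant(Y) + ", " +
        Pool.EmitConstant(Z) + ")) - " +
        Pool.EmitConstant(Radius) + ")";
}
```

Because the constants are appended in the order the generated code reads them, the buffer is walked sequentially during evaluation, and two voxels whose clipped trees share the same structure emit byte-identical source and therefore compile to the same shader.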
The specific reason you want a spatial structure like an octree here is to perform SDF evaluator clipping incrementally instead of re-clipping the original evaluator tree for every voxel. A naive 3D array works reasonably well for relatively small volumes, but it scales very poorly very quickly due to cubic growth. Additionally, models with larger bounding volumes tend to have more empty space. Using a data structure like an octree allows large swaths of dead space to be skipped cheaply.
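Building on the hypothetical NodeRef/Clip sketch from earlier, the incremental version might look something like this: every octree cell clips its parent's already-simplified tree rather than the full model, and any cell the surface provably never enters is discarded without being subdivided.

```cpp
#include <cmath>
#include <memory>
#include <vector>

// Reuses Vec3, NodeRef, and Clip() from the branch elimination sketch above.
struct OctreeCell {
    Vec3 Center;
    float HalfExtent;    // half of the cell's edge length
    NodeRef ClippedTree; // simplified SDF, exact within this cell
    std::vector<std::unique_ptr<OctreeCell>> Children;
};

// Recursively build the octree.  Each cell clips its parent's already
// simplified tree rather than the original model, and cells that the surface
// provably never enters are dropped without being subdivided.
std::unique_ptr<OctreeCell> Build(const NodeRef& ParentTree, Vec3 Center,
                                  float HalfExtent, float LeafSize) {
    const float VoxelRadius = HalfExtent * std::sqrt(3.f); // half-diagonal
    NodeRef Clipped = Clip(ParentTree, Center, VoxelRadius);

    // The Lipschitz bound again: if the center sample is farther from zero
    // than the half-diagonal, the surface cannot pass through this cell.
    if (std::abs(Clipped->Eval(Center)) > VoxelRadius) {
        return nullptr;
    }

    auto Cell = std::make_unique<OctreeCell>();
    Cell->Center = Center;
    Cell->HalfExtent = HalfExtent;
    Cell->ClippedTree = Clipped;

    if (HalfExtent * 2.f > LeafSize) {
        const float Quarter = HalfExtent * 0.5f;
        for (int Octant = 0; Octant < 8; ++Octant) {
            const Vec3 ChildCenter = {
                Center.x + ((Octant & 1) ? Quarter : -Quarter),
                Center.y + ((Octant & 2) ? Quarter : -Quarter),
                Center.z + ((Octant & 4) ? Quarter : -Quarter) };
            if (auto Child = Build(Clipped, ChildCenter, Quarter, LeafSize)) {
                Cell->Children.push_back(std::move(Child));
            }
        }
    }
    return Cell;
}
```

The leaf cells are what shader generation consumes; the interior cells exist mostly so that one cheap test near the root can throw away a huge region of dead space at once.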
Rendering your model means rendering a large number of voxels to produce fragments for ray marching within each voxel's bounds. This creates a lot of overdraw, especially for thicker models with intricate internal geometry. Thankfully, this approach plays well with existing cluster occlusion culling algorithms.
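The per-fragment work itself is ordinary sphere tracing, just clamped to the voxel's bounds so the simplified SDF is never sampled where it is no longer valid. Here is a CPU-flavored sketch of that inner loop (the shader version is structurally the same; SdfFn is any callable distance function, Vec3 is reused from the earlier sketch, and none of these names are from my renderer):

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <utility>

using SdfFn = std::function<float(Vec3)>; // Vec3 as in the earlier sketch

// Sphere trace a ray against a simplified SDF, but only within the AABB of
// the voxel that the SDF variant is valid for.  Returns the hit position, or
// nothing if the ray leaves the voxel without touching the surface.  Dir is
// assumed normalized with non-zero components, to keep the slab test short.
std::optional<Vec3> MarchVoxel(const SdfFn& Sdf, Vec3 Origin, Vec3 Dir,
                               Vec3 BoxMin, Vec3 BoxMax) {
    // Slab test: find where the ray enters and leaves the voxel.
    float TMin = 0.f;
    float TMax = 1e30f;
    const float O[3] = { Origin.x, Origin.y, Origin.z };
    const float D[3] = { Dir.x, Dir.y, Dir.z };
    const float Lo[3] = { BoxMin.x, BoxMin.y, BoxMin.z };
    const float Hi[3] = { BoxMax.x, BoxMax.y, BoxMax.z };
    for (int Axis = 0; Axis < 3; ++Axis) {
        float T0 = (Lo[Axis] - O[Axis]) / D[Axis];
        float T1 = (Hi[Axis] - O[Axis]) / D[Axis];
        if (T0 > T1) std::swap(T0, T1);
        TMin = std::max(TMin, T0);
        TMax = std::min(TMax, T1);
    }
    if (TMin > TMax) {
        return std::nullopt; // the ray misses the voxel entirely
    }

    // Standard sphere tracing, cut off at the voxel's far side.
    float Travel = TMin;
    for (int Step = 0; Step < 64 && Travel <= TMax; ++Step) {
        const Vec3 Point = { Origin.x + Dir.x * Travel,
                             Origin.y + Dir.y * Travel,
                             Origin.z + Dir.z * Travel };
        const float Dist = Sdf(Point);
        if (Dist < 0.001f) {
            return Point;
        }
        Travel += Dist;
    }
    return std::nullopt;
}
```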
Compile your shaders at runtime in a separate thread, and either use a placeholder for uncompiled voxels or implement an interpreter in a shader. If you've implemented parameter extraction, you're already halfway there. Here is some excellent further reading on the subject:
Alas, I'm not going to be sending you back to your team cheering
"SHE DID IT! SIGNED DISTANCE FIELDS FOR EVERYBODY, FREE, AND MAY NO ONE BE LEFT BEHIND!" just yet.
I'm still building my implementation, and there are two performance problems that I need to work out before I'm ready to share benchmarks. The first is that my voxel compiler implementation currently uses a 3D array instead of an octree, which means large, complex models take forever to compile. The second is that my renderer does not perform any kind of occlusion culling, so excessive overdraw makes up most of the draw time. So, uh, stay tuned for part 3!
That said, I am very happy with how the voxel compiler is performing so far, and I plan to continue with this approach for the foreseeable future.
The SDF evaluator is wowie zowie cool beans on its own, and the voxel compiler itself is just a straightforward application of it. With both, you have a really nice way to write SDFs once for use on both the CPU and the GPU.
Of course, there's a lot else you can use the SDF evaluator for. Here are a few ideas:
Be it for rendering or for 3D printing, having a CPU-side SDF evaluator is a prerequisite for every SDF meshing algorithm I'm aware of. But you can also just use it to generate a nice dense point cloud and let MeshLab figure out the rest. Or export MagicaVoxel models with it.
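For the point cloud route, a sketch of the sampling pass might look like this (reusing Vec3 and the SdfFn callable from the marching sketch above; grid resolution and output format are left to you, and none of these names are from my code):

```cpp
#include <cmath>
#include <functional>
#include <vector>

// Central-difference approximation of the SDF gradient.
Vec3 Gradient(const SdfFn& Sdf, Vec3 P, float H = 0.001f) {
    return {
        (Sdf({ P.x + H, P.y, P.z }) - Sdf({ P.x - H, P.y, P.z })) / (2.f * H),
        (Sdf({ P.x, P.y + H, P.z }) - Sdf({ P.x, P.y - H, P.z })) / (2.f * H),
        (Sdf({ P.x, P.y, P.z + H }) - Sdf({ P.x, P.y, P.z - H })) / (2.f * H) };
}

// Walk a regular grid over the model's bounding box; any sample close enough
// to the surface gets pulled onto it by stepping against the gradient.
std::vector<Vec3> SurfacePointCloud(const SdfFn& Sdf, Vec3 BoundsMin,
                                    Vec3 BoundsMax, float CellSize) {
    std::vector<Vec3> Points;
    for (float Z = BoundsMin.z; Z <= BoundsMax.z; Z += CellSize)
    for (float Y = BoundsMin.y; Y <= BoundsMax.y; Y += CellSize)
    for (float X = BoundsMin.x; X <= BoundsMax.x; X += CellSize) {
        Vec3 Point = { X, Y, Z };
        float Dist = Sdf(Point);
        if (std::abs(Dist) > CellSize) {
            continue; // nowhere near the surface
        }
        // A few Newton-style steps along the gradient land the sample on
        // (or very near) the isosurface.
        for (int Step = 0; Step < 3; ++Step) {
            const Vec3 Grad = Gradient(Sdf, Point);
            Point = { Point.x - Grad.x * Dist,
                      Point.y - Grad.y * Dist,
                      Point.z - Grad.z * Dist };
            Dist = Sdf(Point);
        }
        Points.push_back(Point);
    }
    return Points;
}
```

From there, handing the points to a surface reconstruction filter in MeshLab, or quantizing the samples to a grid for MagicaVoxel, is just an export problem.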
You can use an SDF evaluator to perform sphere collision queries against your game level. First, find the voxels that the query sphere is able to collide with. Then, use ray marching per voxel to find the isosurface and produce a hit point or a miss. While this limits you to "sphere hits thing" queries, you can use this general method to build more complex queries.
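Here is a sketch of the marching half of that query, folding the per-voxel lookup away and treating the model as one SdfFn for clarity (SphereCast is a hypothetical helper, not part of the evaluator, and Dir is assumed normalized):

```cpp
#include <functional>
#include <optional>

// A sphere sweep against an SDF is just a ray march against the surface
// inflated by the sphere's radius: stop as soon as the distance from the
// sphere's center to the model drops to Radius.
std::optional<Vec3> SphereCast(const SdfFn& Sdf, Vec3 Start, Vec3 Dir,
                               float Radius, float MaxTravel) {
    float Travel = 0.f;
    for (int Step = 0; Step < 128 && Travel <= MaxTravel; ++Step) {
        const Vec3 Center = { Start.x + Dir.x * Travel,
                              Start.y + Dir.y * Travel,
                              Start.z + Dir.z * Travel };
        const float Dist = Sdf(Center);
        if (Dist <= Radius + 0.001f) {
            return Center; // sphere center at the moment of contact
        }
        Travel += Dist - Radius; // safe step: the sphere cannot overshoot
    }
    return std::nullopt; // the sphere can travel the whole segment freely
}
```

Starting the march at the sphere's current position and limiting MaxTravel to this tick's motion turns the same routine into a swept collision test.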
Use it to make a fluid simulation! Or do audio propagation! Or global illumination stuff!
If the ability to evaluate SDFs is itself exposed to your SDF language, then you can have models do things like query for a uniform distribution of points on a model's surface or within its volume, and use the results to construct more complex models.
For example, say you have an SDF character model that you want to give majestic flowing locks of hair. You could model the hair directly with some trial and error, but that would be no fun. Instead, use the SDF evaluator to generate a set of points on the character's scalp to place strand emitters. Then, generate splines starting at the emitter points. Pull the splines downwards using gravity and/or along a flow map, and use the SDF evaluator's gradient to push the splines out of the model so they rest on its surface instead of intersecting it. Once the splines are generated, union together some nice shapes along the splines to produce the hair, and union the hair with the original model to produce a new evaluator.
This kind of runtime model modification would work best with an octree implementation. At the top of the octree is your SDF evaluator tree, and each child node contains a progressively clipped version of it. From there it would be possible to modify your model further by appending nodes or replacing the tree entirely. When you modify the model, replace the top-level octree node, then reclip the tree for each octree child. If the clipped tree doesn't match the child's current one, replace the child and recurse. This could be useful for real-time modifications to terrain such as digging tunnels, building structures, adding dynamic cutaways, and so on.
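A sketch of that reclip pass, built on the hypothetical OctreeCell/Clip types from earlier; TreesMatch stands in for whatever structural comparison your evaluator supports, and cells flipping between empty and occupied are ignored here for brevity:

```cpp
#include <cmath>

// Stand-in for a structural comparison between two clipped evaluator trees.
bool TreesMatch(const NodeRef& A, const NodeRef& B);

// Called on the octree root with the modified model's evaluator tree.
// Children whose clipped tree comes out unchanged keep their entire subtree
// (and their compiled shaders); everything else gets pushed down and redone.
void Reclip(OctreeCell& Cell, const NodeRef& NewTree) {
    Cell.ClippedTree = NewTree;
    for (auto& Child : Cell.Children) {
        const float ChildRadius = Child->HalfExtent * std::sqrt(3.f);
        NodeRef Clipped = Clip(NewTree, Child->Center, ChildRadius);
        if (TreesMatch(Clipped, Child->ClippedTree)) {
            continue; // this whole subtree is unaffected by the edit
        }
        Reclip(*Child, Clipped);
    }
}
```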