A Better Approach to SDF Decomposition

March 10th, 2022


This is a followup to a post I wrote last year, which describes an algorithm for efficiently rendering Signed Distance Functions by first reducing them into a set of locally relevant subsets of the original function. This was accomplished by establishing a bounding box for every node in the CSG tree, then finding all unique overlapping regions to calculate a set of new CSG tree permutations, and then using those trees to generate shaders and so on. In the time since I wrote that first post, I found that generating all of the boundary permutations was way too slow for some models, that it was prone to generating razor thin regions, that the system was way more complex than I was happy with, and that models featuring a rotating sequence of cuts would produce a single very slow shader at the end of a long compile.

I've since abandoned my earlier bounding box based approach in favor of a new approach that is much faster, is simpler to implement, easier to reason about, solves the degenerate cases from before, still fits well with contemporary cluster culling algorithms, and can be easily repurposed for a number of useful applications beyond shader generation and rendering.

This post describes my new approach to SDF decomposition: The Voxel Compiler.

The Voxel Compiler

The voxel compiler takes a SDF representation of a model, and then eliminates all irrelivant branches for each voxel in the model's volume. These variants of the original SDF are then used to compile greatly simplified shaders for fast ray marching within the respective voxel boundaries.

Here is how it works:

  1. The input SDF is expressed as a symbolic tree of CSG operations, where the graph's leaves are simple SDFs for common shapes, and the other nodes are CSG set operators or transforms. For a simple example, (diff (cube 1) (sphere 1.3)) describes a cube with a sphere subtracted from it. You can find several example models here in the form of simple Racket programs. This is essentially the same input language as my earlier version.
  2. Construct a SDF evaluator from the input SDF. The SDF evaluator is a tree data structure that is trivial to construct from the input model if they are not already one in the same. The graph's leaves are SDFs for common shapes, transform nodes may only occur as a singular optional parent to a shape node, and all other nodes are CSG set operators. All SDF evaluator nodes have the following functions:

    For my SDF evaluator implementation, see this header and this implementation.

  3. Determine the bounding box of the SDF, and the ideal voxel grid. For each voxel, call the evaluator's "clip" method to produce a new evaluator or a null value. If a null value is returned, discard the voxel. Otherwise, call the "compile" method to produce the locally relevant shader variant. Store the generated shader functions and their associated voxels in a map object, such that the generated shader source has a one-to-many mapping to the relevant voxels. The map keys should be unique so as to de-duplicate the generated shader functions.
  4. Combine the generated shader functions with template sources and pass the results to your shader compiler of choice.

A colorful rendering of a gear.

The picture to the right is a model that was rendered using shaders generated by the voxel compiler, with an enhancement which is explained later in this post. The coloration indicates which shader variant was used to draw that region.

The cyan body of the gear indicates the region where the model's SDF was reduced to a flat cylinder with a recess on either side. The green regions indicate a similar region with a single cylinder cut away to form a hole. The red, blue, yellow, purple, and magenta regions around the perimeter of the gear indicate where one or more rotated cubes are subtracted from the edge of the gear to form its teeth.

For this model, the voxel compiler generated 7 shaders, all of which are substantially lower in complexity than the whole model. This number of variations is very low due to the parameter extraction technique, which is explained in the next section.

Implementation Recommendations

  1. Perform transform folding before generating the SDF evaluator.

    The explanation of the SDF evaluator above mentions a constraint where a transform node can only occur as the parent to a shape node (graph leaf). However this constraint should not be imposed on your authoring language for reasons of flexibility, and so transform folding is needed to reconcile this difference. Without transform folding, each shape node in the graph has an implicit stack of transforms that are applied to the sampling point to produce the shape function's local input. Transform folding just collapses this stack down into a single transform per shape.

    Transform folding is an important optimization for two reasons. The first reason is that these shape functions will be evaluated many times, and transform folding greatly reduces the amount of ALU work needed to evaluate your model. The second reason is that the constant folding provided by shader compilers is not able to fold your math this aggressively, and will produce substantially slower results without some extra help.

  2. Remove All Numeric Constants From Generated Shaders.

    When generating shaders, replace all numeric constants with an array lookup into an external buffer of parameters. These should be stored such that the array elements are accessed sequentially when the shader is executed to ensure good cache performance. The parameter array should not be embedded in the shader but supplied by a separate parameter buffer at run time. This results in a significant reduction in shader variants which need to be compiled, and has a negligible impact on rendering performance.

  3. Use an octree instead of a 3D array.

    The specific reason you want this is to perform SDF evaluator clipping incrementally instead of re-clipping the original evaluator tree for every voxel. A naive array 3D works reasonably well for relatively small volumes, but scales very poorly very quickly due to literal cubic growth. Additionally, models with larger bounding volumes tend to have more empty space. Using a data structure like an octree allows for large swaths of dead space to be skipped cheaply.

  4. Use cluster culling.

    Rendering your model means rendering a large number of voxels to produce fragments for ray marching within the voxel's bounds. This creates a lot of overdraw, especially with thicker models with intricate internal geometry. Thankfully, this approach plays well with existing cluster occlusion culling algorithms.

  5. Compile your shaders asynchronously at runtime

    Compile your shaders at runtime in a separate thread, and either use a placeholder for uncompiled voxels or implement an interpreter in a shader. If you've implemented parameter extraction, you're already halfway there. Here is some excellent further reading on the subject:


Alas, I'm not going to be sending you back to your team cheering

I'm still building my implementation, and there are two performance problems that I need to work out before I'm ready to share benchmarks. The first is that my voxel compiler implementation currently uses a 3D array instead of an octree, which means large complex models take forever to compile. The second is that my renderer does not perform any kind of occlusion culling, so excessive overdraw makes up most of the draw time. So, uh, Stay tuned for part 3!

That said, I am very happy with how the voxel compiler is performing so far, and I plan to continue with this approach for the foreseeable future.

More Thoughts on the SDF Evaluator

The SDF evaluator is wowie zowie cool beans on its own, and voxel compiler itself is just a straightforward application of it. With both, you have a really nice way to write SDFs once for use on both the CPU and the GPU.

Of course there's a lot else you can use the SDF evaluator for. Here are few ideas: