Volumetric Bounding Structure: Ray Tracing Accelerator

Volumetric Bounding Structure: Ray Tracing Accelerator


A ray tracing acceleration structure is a system that reduces the time it takes to render each frame. The acceleration structure implemented in LinearMAX is a volumetric bounding structure. While a volumetric bounding hierarchy would be a better solution than the static structure in LinearMAX, HLSL was too restrictive, and implementing the hierarchy was impossible due to the lack of any sufficiently sized editable array type. The implementation of the bounding hierarchy was very difficult due to the restrictive nature of HLSL.


The bounding structure can be broken down into two parts: the parent volume surrounding the entire mesh, and the sub bounding boxes making up the mesh. The goal of this accelerator structure is to determine what triangles need to be checked for an intersection so as to reduce the time complexity of the Mesh tracing by a significant amount. When we check a ray’s collision with a mesh, we first check to see if the ray intersects the parent volumetric bounding box. If it does not intersect we know that we don’t need to check any of the triangles within the mesh. If it does intersect we check the ray for intersections with the sub bounding boxes within the larger parent volume. Then we only have to check triangles within each intersected sub bounding box.


We can implement this accelerator in 3 steps. First check for an intersection with the parent volume, then (if the parent volume is intersected) find the sub volumes which the ray intersects, finally check the intersections with the triangles within the sub volumes we hit with the ray.

Step 1:

The intersection with the parent bounding volume is very straightforward. Just run an intersection check which returns a boolean value for the ray and box.

Step 2:

This is where we start running into a lot of difficult restrictions. The largest editable storage structure HLSL has available is a matrix4x4. The matrix can only hold scalar data types (ie int, float, uint, ect...), so we are extremely limited in how we can store data. We can only store a maximum of 16 numbers in a convenient and fast structure at a time. We can’t run the volumetric bounding box intersections for each vert, so we have to store the sub boxes the ray intersects somehow. Assuming we only need one number to represent each entire bounding box, we can only hold a maximum of 16 intersected bounding boxes. When splitting a 2D square into 4 equal parts by slicing it from the midpoints of the edges, the maximum number of 2D “sub squares” a ray can intersect is 3. When scaling this up to 3 dimensions, the we can split a cube into 8 equal parts, again slicing the box along each axis at its center, the maximum number of “sub cubes” a single ray can intersect is 4. This number, 4, is very important because it tells us many sub boxes we can fit in the bounding volume. Therefore, if we split the parent volume into 64 equal parts, the maximum number of cubes which the ray can intersect is 16, or the number of elements we can fit in the largest editable data structure in HLSL. 

So now we know how many sub cubes we can have in our structure, we need a way to store the cubes we intersect. First we generate each bounding volume in C# and send it to the GPU through a ComputeBuffer. The cubes need to be generated in a specific order for the parsing and storage function explained later. They need to wrap from left to right then from front to back, and from bottom to top. Diagram below.

13 14 15 16

9 10 11 12      --->> next level 

5   6   7   8 21 22 23 24 ect… --->> To 64

1   2  3  4  17 18 19 20

Once we have our cubes in the GPU we can do intersection check in each, indexing them from a ComputeBuffer. If an intersection exists we need to store the cube in the integer matrix4x4. Since we can have the position and scale of the parent volume, all we really need to store each cube is a relative position inside the parent volume, since all cubes are the same scale of ¼ of the parent volumes scale. So we next need to parse the index of the cube in the ComputeShader to turn index 0 into 0, 1 into 1, 13 - 31, 52 - 330, ect. This parsing takes the index and transforms it into a number representing the relative position in the parent volume. 213 would be the second “floor”, the first row, and the 3rd column. 031 or 13 would be the zero “floor”, the third row and the first column. Now we have a relative position for each box which we can store in the int matrix4x4. We can spend more time parsing the numbers than unparsing them because parsing the numbers only happens once per ray, but unparsing can happen hundreds or even thousands of times per ray. To unparse we only need 3 modulo operations and 2 division operations. However we cannot simply just loop over each box in the ComputeBuffer, check for collision, and save the parsed position to the matrix because of restrictions regarding L-Values (values that appear on the left of an equal sign) being used to index an array or matrix. Since we are indexing a matrix we need the x, and y position of the matrix. A normal way of indexing this matrix in a for loop would be to increment the x position everytime we add a new variable to the matrix (which does not happen every run of the loop so we need to increment it within the for loop and not as the incrementation which runs each time the for loop runs) and if the x position is greater than the number of columns in the matrix we set the x to 0 and increment the y position. But as I said before we cannot increment a value which we use to index a matrix. While this is a significant restriction and took a while for me to find a work around, I did solve it. Since there are a maximum of 16 times when a value can be added to the matrix, we can have 16 consecutive for-loops each with a manual entry to our matrix position [x][y],  [2][0]. When we add a value to the matrix (when a sub volumetric bounding box is intersected) we break the for loop and enter the next one (with the next manually entered index for the matrix) but retain the same index for indexing the ComputeBuffer with all the sub volumetric bounding boxes. Finally, we have the intersected cubes saved and able to be quickly parsed and deciphered.

Step 3:

Now we must convert the saved matrix values to usable cube data with a maximum and minimum vertex position. All we need to check if a triangle is within a cube. We can unparse the matrix values like so: let the value be 221 we know the relative Y value is 2, the relative Z value is 2, and the relative X value is 1. Knowing the world position and scale of the parent box we can get the world position and scale of each sub box, and then find their maximum and minimum vertex positions. Now when we check each triangle to see if it is inside of any intersected sub boxes. However, there is a problem in this step [SEE PROBLEMS]. 


The largest problem facing the implementation of the system is that it will not work on some meshes. When checking if a triangle is inside of a bounding box we have two strategies. Check if the triangle and cube intersect, or check if any of the points of the triangle are within the cube. If we check if the triangle and cube intersection we may as well skip the whole acceleration structure and check the ray and triangle as they take equally long, but if we check to see if a vertex is within a bounding cube the acceleration structure saves a good amount of processing time. If all vertices are outside of the bounding cube, but the triangle intersects with the cube however, the algorithm will not check that triangle even though it should. 


Sadly the cases where this accelerator was too far and few between, and on the meshes it did function correctly it did not run too much faster.

Get LinearMAX

Leave a comment

Log in with itch.io to leave a comment.