Sounds strange, right? In reality, there is not enough headroom for the workload my ray marching engine requires. On a high-end GPU, performance goes through the roof and rendering is done in no time. But on the average GPU, this rendering engine has no chance. I did some inspection: while the CPU literally sits and waits for the GPU to complete its task, the GPU's resources are fully occupied. I checked the CPU load with Windows Task Manager and the GPU load with GPU-Z. I got below 1% on the CPU and more than 90% on the GPU. From that I concluded that this method needs to wait a decade before we can use it. I was too naive.

So what happened to the volume rendering? As the title says, CPU assisting GPU: I finally accepted the algorithm offered in GPU Gems. By using proxy geometry, the CPU actually assists the GPU. Texture sampling (texture2d) went down by a factor of almost 64. This is because I’m using a maximum of 64 steps for the ray marching to reach the object surface, and sometimes it hits 128 steps for precision. When doing ray marching, each step requires a texel read. While that is required for the distance function to reach the object surface, it also kills performance, not to mention the other expensive mathematical operations done at each step. I must admit that GPUs are really awesome. But when you imagine the fragment program running for every pixel on the screen, it becomes clear that the requirement is just too big for most GPUs to handle right now.
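
To put rough numbers on the texel reads, here is a minimal sketch of the worst case for ray marching, where every pixel marches the full step count. The 1280x720 resolution and the function name are assumptions for illustration; only the 64 and 128 step counts come from the text.

```python
def raymarch_texel_reads(width, height, steps):
    """Worst-case texel reads per frame: one read per step, for every pixel."""
    return width * height * steps


if __name__ == "__main__":
    # Hypothetical render target size, not taken from the original tests.
    width, height = 1280, 720
    for steps in (64, 128):
        reads = raymarch_texel_reads(width, height, steps)
        print(f"{steps} steps: {reads:,} texel reads per frame")
```

A fragment on a slice reads one texel instead, so the per-fragment cost no longer scales with the step count.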

Back to proxy geometry. The idea of proxy geometry is to simulate the ray traveling from the camera to the end of the volume data. This is done by slicing the volume data with view-aligned planes, from the nearest to the farthest position relative to the camera. By doing so, the tracing operation is gone; it is replaced by an alpha blending operation when rendering each slice.
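
The slicing idea can be sketched on the CPU like this. It is a minimal sketch, not my engine code: it assumes the volume is a unit cube, that `view_dir` points from the camera into the scene, and that slices are drawn back to front with the usual "over" blend (the same result as `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` in OpenGL). All function names here are made up for illustration.

```python
import math

# Corners of the unit cube and the 12 edges joining them (index pairs).
CORNERS = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
EDGES = [(a, b) for a in range(8) for b in range(a + 1, 8)
         if sum(CORNERS[a][i] != CORNERS[b][i] for i in range(3)) == 1]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def slice_polygon(view_dir, d):
    """Intersect the plane dot(view_dir, p) = d with the unit cube.

    Returns the intersection vertices ordered around the polygon.
    """
    points = []
    for a, b in EDGES:
        pa, pb = CORNERS[a], CORNERS[b]
        da, db = dot(view_dir, pa) - d, dot(view_dir, pb) - d
        if da * db < 0.0:  # edge endpoints lie on opposite sides of the plane
            t = da / (da - db)
            points.append(tuple(pa[i] + t * (pb[i] - pa[i]) for i in range(3)))
    if len(points) < 3:
        return []
    # Sort the vertices by angle around their centroid, inside the plane.
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(3))
    helper = (1.0, 0.0, 0.0) if abs(view_dir[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(view_dir, helper))
    v = cross(view_dir, u)

    def angle(p):
        r = tuple(p[i] - center[i] for i in range(3))
        return math.atan2(dot(r, v), dot(r, u))

    return sorted(points, key=angle)


def view_aligned_slices(view_dir, count):
    """Return `count` slice polygons, farthest from the camera first."""
    view_dir = normalize(view_dir)
    depths = [dot(view_dir, c) for c in CORNERS]
    near, far = min(depths), max(depths)
    step = (far - near) / count
    return [slice_polygon(view_dir, far - (i + 0.5) * step) for i in range(count)]


def composite_back_to_front(samples):
    """Blend (r, g, b, a) samples, farthest first, with the 'over' operator."""
    color = (0.0, 0.0, 0.0)
    for r, g, b, a in samples:
        color = tuple(a * s + (1.0 - a) * c for s, c in zip((r, g, b), color))
    return color


if __name__ == "__main__":
    for polygon in view_aligned_slices((1.0, 1.0, 1.0), 3):
        print(len(polygon), "vertices")
```

Looking straight down an axis gives quads. A diagonal view gives triangles near the corners and hexagons in the middle, which is why the proxy geometry has to be rebuilt whenever the camera moves.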

According to the article, there are some major disadvantages to this method. First, generating the proxy geometry takes time, although it can be sped up by doing it in the geometry shader. Second, 3D volume data sets are huge: it takes time to upload them into GPU memory when rendering, not to mention that generating volume data is not an easy task either. In general, while far fewer operations are required, the amount of memory required increases. Third, each slice contains a lot of blank area, which wastes computational resources. Of course, that is easily fixed by segmenting the volume data so that blank areas are skipped.
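
Two of those points are easy to sketch. The code below estimates the memory footprint of a volume and finds the bounding box of the non-blank voxels, so slices only need to cover that box. This is a simplified stand-in for real segmentation, and it assumes a nested-list volume indexed as `volume[z][y][x]` holding opacity values; the names and the 4 bytes per voxel (RGBA8) are assumptions for illustration.

```python
def volume_bytes(width, height, depth, bytes_per_voxel=4):
    """Memory needed for an uncompressed volume (default: RGBA, 8 bits each)."""
    return width * height * depth * bytes_per_voxel


def occupied_bounds(volume, threshold=0.0):
    """Return ((x0, y0, z0), (x1, y1, z1)), the inclusive bounds of all voxels
    whose opacity is above `threshold`, or None if the volume is blank."""
    hits = [(x, y, z)
            for z, plane in enumerate(volume)
            for y, row in enumerate(plane)
            for x, alpha in enumerate(row)
            if alpha > threshold]
    if not hits:
        return None
    lo = tuple(min(h[i] for h in hits) for i in range(3))
    hi = tuple(max(h[i] for h in hits) for i in range(3))
    return lo, hi


if __name__ == "__main__":
    print(volume_bytes(256, 256, 256) / (1024 * 1024), "MiB")

    # Tiny 4x4x4 test volume with two non-blank voxels.
    volume = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    volume[1][2][3] = 1.0
    volume[2][2][1] = 0.5
    print(occupied_bounds(volume))
```

A real implementation would likely split the volume into bricks and skip the empty ones, but even a single tight box cuts down the blank area per slice.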

From the last experiment, I learned that volume data can be stored inside 2D images. There are at least 3 types: rectangular (imagine a tessellated plane), cylindrical, and spherical. Well, some people will say, just do tessellation. I think that might be true; that is what GPU manufacturers and rendering APIs have been improving this whole time. Still, my curiosity got the better of me. I think there are a lot of geometrical shapes that are impossible to achieve “economically” using tessellation. So I just went into coding mode.
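
One way to read the three layouts is as three ways of turning a 3D point into a 2D texture coordinate plus a distance to compare against the value stored in the image. The sketch below is an assumption about the general idea, not the exact mapping from the earlier experiment, and the function names and axis choices are hypothetical.

```python
import math


def rectangular(p):
    """Tiled plane: (u, v) repeat over x and z, height is measured along y."""
    x, y, z = p
    return x % 1.0, z % 1.0, y


def cylindrical(p):
    """Cylinder around the y axis: u is the angle, v the height, plus radius."""
    x, y, z = p
    u = (math.atan2(z, x) / (2.0 * math.pi)) % 1.0
    return u, y, math.hypot(x, z)


def spherical(p):
    """Sphere around the origin: u is longitude, v is latitude, plus radius."""
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # the center has no direction; pick a fixed texel
    u = (math.atan2(z, x) / (2.0 * math.pi)) % 1.0
    v = math.acos(max(-1.0, min(1.0, y / r))) / math.pi
    return u, v, r


if __name__ == "__main__":
    point = (0.5, 0.25, 0.5)
    for mapping in (rectangular, cylindrical, spherical):
        print(mapping.__name__, mapping(point))
```

In each case the third value is what would be compared with the texel at (u, v) to decide whether the point is inside the surface.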

The first step is creating the proxy geometry, with some debugging data made visible. As seen below, the texture coordinates are correct: x is red, z is green, and y is blue. I also drew the outlines of the slices.


The view-aligned slices look pretty, don’t they? Below is the rendering without the slice outlines.


After all the required elements were set up, I did a test rendering using my favorite Indian temple texture. Here is what I got:


Again, tessellation might be much faster here. But volume rendering is where we can get a round surface to be perfectly round, and focal points to look as sharp as possible. As you can see above, even without lighting (yet), we can really feel the bumpiness of the surface. The tests above ran on a laptop with an Intel i5-4200U and its integrated Intel graphics chipset. Don’t worry, it reaches 200+ fps when I’m using my high-performance NVIDIA GT 720M.

Okay, next post? Deferred rendering :).