Tuesday, 18 September 2012

Some more optimizations

When you care about the performance of a system, you need to look everywhere. You start by optimizing the critical parts of your code; in the case of a render engine, that might be ray traversal, shading code, or the acceleration structure build.

Lately I'm quite happy with Glimpse's rendering speed, but something that has always bothered me is how long it takes to get data from the host application, in this case Maya, to the render engine.
In the past I used to wait 10 minutes for a preview render, and in that situation 15 seconds of frame translation is not a big deal. But now I can get a noisy, low-quality frame rendered in a fraction of a second, and in this scenario a 10-second frame translation is plainly unacceptable!

I was doing some tests the other day. Rendering a frame took about 70 seconds at very clean quality at 2K resolution, and about a second for a half-res, low-quality preview, but data translation took 26 seconds.
It turns out that many Maya API calls are rather slow. The MItMesh* iterators are ones to avoid like the plague.
Other particularly slow subsystems are light linking and material assignment. If you ever write your own translator for Maya, do yourself a favor and extract such data directly from plugs and connections rather than relying on the higher-level API to do the work for you.

After some changes, the translator is now between 4 and 7 times faster.
Sorry, no pictures this time :)

Saturday, 1 September 2012

Motion blur

Motion blur is a deal breaker for a production renderer. It is often one of those features that determine whether an engine can be considered "production ready". For many years ray tracers had a bad name because they couldn't do it right: it was either too slow or too poor in quality.

In practice, the concept is fairly simple when applied to bounding volume hierarchies. Each tree node contains multiple bounds, one per discrete motion step. During ray traversal, the motion bounds are first linearly interpolated at the ray time, and the result is then tested for intersection with the ray. The same happens for leaf nodes and primitives.
What is the "ray time", then? When you generate primary samples, you assign each one a random time, most likely in the [0, 1) range, representing the fraction of the interval between shutter open and shutter close. All secondary rays (indirect illumination, shadows, etc.) spawned from a primary sample inherit its time.
If you are using stratified sampling or another low-discrepancy sequence, you should stratify the time too to get a nice distribution, but make sure you scramble the sequence so that there is no correlation between ray direction and time, since that would cause temporal aliasing.
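To make the traversal step concrete, here is a minimal C++ sketch. It assumes each node stores its motion keys as a `std::vector<AABB>`, evenly spaced between shutter open (first key) and shutter close (last key); the names `boundsAtTime`, `hitBox` and `hitMotionNode` are hypothetical, not Glimpse's or Embree's API. Linearly interpolating the key boxes stays conservative as long as the primitives themselves move linearly between the same keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 lo, hi; };

// Component-wise linear interpolation between two points.
static Vec3 lerp(const Vec3& a, const Vec3& b, float w) {
    return {a.x + (b.x - a.x) * w, a.y + (b.y - a.y) * w, a.z + (b.z - a.z) * w};
}

// Bounds at ray time t in [0, 1). keys[k] is the box at time k / (keys.size() - 1).
AABB boundsAtTime(const std::vector<AABB>& keys, float t) {
    assert(!keys.empty());
    if (keys.size() == 1) return keys[0];              // static: nothing to interpolate
    const float f = t * float(keys.size() - 1);        // position in "key space"
    const std::size_t i = std::min<std::size_t>(std::size_t(f), keys.size() - 2);
    const float w = f - float(i);                      // blend weight between keys i and i+1
    return {lerp(keys[i].lo, keys[i + 1].lo, w), lerp(keys[i].hi, keys[i + 1].hi, w)};
}

struct Ray {
    Vec3 org;
    Vec3 invDir;        // 1 / direction, precomputed once per ray
    float tmin, tmax;
    float time;         // in [0, 1), shared by all rays spawned from one primary sample
};

// Standard slab test against an already interpolated box.
// (Sketch: ignores the NaN case of an origin lying exactly on a slab plane
// along an axis where the direction component is zero.)
bool hitBox(const AABB& b, const Ray& r) {
    const float o[3] = {r.org.x, r.org.y, r.org.z};
    const float id[3] = {r.invDir.x, r.invDir.y, r.invDir.z};
    const float lo[3] = {b.lo.x, b.lo.y, b.lo.z};
    const float hi[3] = {b.hi.x, b.hi.y, b.hi.z};
    float t0 = r.tmin, t1 = r.tmax;
    for (int a = 0; a < 3; ++a) {
        float tn = (lo[a] - o[a]) * id[a];
        float tf = (hi[a] - o[a]) * id[a];
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn);
        t1 = std::min(t1, tf);
        if (t0 > t1) return false;
    }
    return true;
}

// Motion-blur node test: interpolate the bounds at the ray's time, then intersect.
bool hitMotionNode(const std::vector<AABB>& keys, const Ray& r) {
    return hitBox(boundsAtTime(keys, r.time), r);
}
```

Leaf primitives follow the same pattern: interpolate the vertex positions at `ray.time`, then run the usual ray/primitive test.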
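One simple way to do the stratify-then-decorrelate step, assuming a plain per-pixel stratified sampler: draw one jittered time per stratum, then randomly permute the times before pairing them with the pixel's (separately stratified) direction or lens samples. The function name `stratifiedShuffledTimes` is hypothetical, and a random shuffle is just one decorrelation choice; scrambled low-discrepancy sequences are another.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// n ray times for one pixel: one jittered sample in each stratum [i/n, (i+1)/n).
// The strata are then randomly permuted, so time sample j is not tied to
// stratum j of the direction / lens samples it will be paired with.
std::vector<float> stratifiedShuffledTimes(int n, std::mt19937& rng) {
    assert(n > 0);
    std::uniform_real_distribution<double> jitter(0.0, 1.0);
    const float below1 = std::nextafter(1.0f, 0.0f);   // largest float below 1
    std::vector<float> times(n);
    for (int i = 0; i < n; ++i) {
        const double t = (i + jitter(rng)) / n;
        times[i] = std::min(float(t), below1);          // float rounding must not produce 1.0
    }
    std::shuffle(times.begin(), times.end(), rng);      // decorrelate time from the other dimensions
    return times;
}
```

Primary ray j of the pixel takes `times[j]`, and every shadow or indirect ray spawned from it simply copies that value.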

Recently I noticed that Intel released Embree 1.1. Some of the memory improvements the Intel engineers implemented remarkably resemble suggestions I made in their forum. That aside, v1.1 also brings a decent implementation of motion blur.

Looking at the code, I noticed that their implementation is quite similar to the one I had in mind, except for the tree build, which I had only a vague idea how to deal with in the presence of motion. A nice trick to simplify BVH construction is to build the tree at the midpoint of the motion, which is probably the best place to minimize bounds overlap. Then, at the end of the build process, refit the tree to each motion step and store the multiple keys. Refitting is a lot quicker than building a new tree from scratch for each motion step.

Refitting is known to underperform a fresh rebuild for animation, but here we are talking about motion over half a frame (and since the tree is built at mid-time, the farthest key is just a quarter of a frame away). Objects generally don't move enough in a fraction of a frame to cause a strong performance degradation.
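A minimal sketch of the build-at-mid-time-then-refit idea, under simplifying assumptions: every primitive is described only by its bounding box at each motion key, the splitter is a plain median split rather than SAH or a Split BVH, and all names (`MotionPrim`, `buildMotionBVH`, `refit`, ...) are hypothetical. The topology is decided once from the mid-shutter boxes; `refit` then recomputes one box per key for every node, bottom-up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

struct AABB {
    Vec3 lo{1e30f, 1e30f, 1e30f};      // starts "empty": the first grow() overwrites it
    Vec3 hi{-1e30f, -1e30f, -1e30f};
    void grow(const AABB& b) {
        lo = {std::min(lo.x, b.lo.x), std::min(lo.y, b.lo.y), std::min(lo.z, b.lo.z)};
        hi = {std::max(hi.x, b.hi.x), std::max(hi.y, b.hi.y), std::max(hi.z, b.hi.z)};
    }
    float center(int axis) const {
        const float l[3] = {lo.x, lo.y, lo.z}, h[3] = {hi.x, hi.y, hi.z};
        return 0.5f * (l[axis] + h[axis]);
    }
};

static AABB makeBox(Vec3 lo, Vec3 hi) { AABB b; b.lo = lo; b.hi = hi; return b; }

static AABB lerpBox(const AABB& a, const AABB& b, float w) {   // w = 0 -> a, w = 1 -> b
    auto l = [w](float p, float q) { return p + (q - p) * w; };
    return makeBox({l(a.lo.x, b.lo.x), l(a.lo.y, b.lo.y), l(a.lo.z, b.lo.z)},
                   {l(a.hi.x, b.hi.x), l(a.hi.y, b.hi.y), l(a.hi.z, b.hi.z)});
}

struct MotionPrim { std::vector<AABB> keys; };   // bounds per motion key (same count for all)

struct Node {
    std::vector<AABB> keys;        // one box per motion key, filled in by refit()
    int left = -1, right = -1;     // children; -1 marks a leaf
    int first = 0, count = 0;      // leaf: range into MotionBVH::order
};

struct MotionBVH {
    std::vector<Node> nodes;       // nodes[0] is the root; children have larger indices
    std::vector<int> order;        // primitive indices, grouped by leaf
};

// Topology from the mid-shutter boxes only (plain median split, not SAH).
static int build(MotionBVH& bvh, const std::vector<AABB>& mid, int first, int count) {
    const int idx = int(bvh.nodes.size());
    bvh.nodes.emplace_back();
    if (count <= 2) {              // small enough: make a leaf
        bvh.nodes[idx].first = first;
        bvh.nodes[idx].count = count;
        return idx;
    }
    AABB cb;                       // bounds of primitive centers, used to pick the split axis
    for (int i = first; i < first + count; ++i) {
        const AABB& m = mid[bvh.order[i]];
        const Vec3 c{m.center(0), m.center(1), m.center(2)};
        cb.grow(makeBox(c, c));
    }
    const float ex = cb.hi.x - cb.lo.x, ey = cb.hi.y - cb.lo.y, ez = cb.hi.z - cb.lo.z;
    const int axis = (ex >= ey && ex >= ez) ? 0 : (ey >= ez ? 1 : 2);
    const auto it = bvh.order.begin() + first;
    std::nth_element(it, it + count / 2, it + count,
                     [&](int p, int q) { return mid[p].center(axis) < mid[q].center(axis); });
    const int l = build(bvh, mid, first, count / 2);
    const int r = build(bvh, mid, first + count / 2, count - count / 2);
    bvh.nodes[idx].left = l;       // index again: emplace_back may have moved the nodes
    bvh.nodes[idx].right = r;
    return idx;
}

// Bottom-up refit: same topology, one box per motion key for every node.
static void refit(MotionBVH& bvh, const std::vector<MotionPrim>& prims, std::size_t numKeys) {
    for (std::size_t i = bvh.nodes.size(); i-- > 0;) {   // reverse order visits children first
        Node& n = bvh.nodes[i];
        n.keys.assign(numKeys, AABB{});
        for (std::size_t k = 0; k < numKeys; ++k) {
            if (n.left < 0) {
                for (int j = n.first; j < n.first + n.count; ++j)
                    n.keys[k].grow(prims[bvh.order[j]].keys[k]);
            } else {
                n.keys[k].grow(bvh.nodes[n.left].keys[k]);
                n.keys[k].grow(bvh.nodes[n.right].keys[k]);
            }
        }
    }
}

MotionBVH buildMotionBVH(const std::vector<MotionPrim>& prims) {
    assert(!prims.empty() && prims[0].keys.size() >= 2);
    const std::size_t numKeys = prims[0].keys.size();
    const float f = 0.5f * float(numKeys - 1);             // t = 0.5 in "key space"
    const std::size_t k0 = std::min<std::size_t>(std::size_t(f), numKeys - 2);
    std::vector<AABB> mid;                                  // primitive bounds at mid-shutter
    for (const MotionPrim& p : prims) mid.push_back(lerpBox(p.keys[k0], p.keys[k0 + 1], f - float(k0)));
    MotionBVH bvh;
    bvh.order.resize(prims.size());
    std::iota(bvh.order.begin(), bvh.order.end(), 0);
    build(bvh, mid, 0, int(prims.size()));
    refit(bvh, prims, numKeys);
    return bvh;
}
```

Because nodes are created in pre-order, children always have larger indices than their parents, so walking the node array backwards visits children before parents and the refit needs no recursion.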

Over the course of last week I had a crack at implementing a mix of my own ideas and inspiration taken from Embree.

It seems that the performance degradation compared to rendering without motion is around 15%. Better than I expected! I know there are several optimizations I can do to make it better. For instance, I am currently sampling motion bounds and primitives for static objects too, which is a big waste of computation and memory.
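For the static-object waste mentioned above, one possible fix (a hypothetical sketch, not what Glimpse does) is to collapse the keys of anything whose boxes are identical at every motion step. A node with a single key can then skip interpolation entirely during traversal, and it stores one box instead of one per step. `collapseStaticKeys` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 lo, hi; };

static bool sameVec(const Vec3& a, const Vec3& b) { return a.x == b.x && a.y == b.y && a.z == b.z; }
static bool sameBox(const AABB& a, const AABB& b) { return sameVec(a.lo, b.lo) && sameVec(a.hi, b.hi); }

// Hypothetical helper: if every motion key holds exactly the same box, the object
// (or node) does not move during the shutter interval, so keep a single key.
// Traversal can then use keys[0] directly instead of interpolating.
void collapseStaticKeys(std::vector<AABB>& keys) {
    for (std::size_t k = 1; k < keys.size(); ++k)
        if (!sameBox(keys[k], keys[0])) return;   // it moves: keep every key
    if (!keys.empty()) {
        keys.resize(1);
        keys.shrink_to_fit();                     // non-binding request to free the extra keys
    }
}
```

Applied to every node once the per-key boxes have been computed, this also collapses interior nodes whose whole subtree is static, since their per-key boxes come out identical.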

If you want to challenge yourself, this is a nice paper:

It extends the Split BVH idea to motion.