Sunday, November 29, 2009

Immediate Mode Instancing

Some definitions: instancing is drawing the same mesh over and over, but each time with a different place/pose/transform/whatever. Instancing comes into play when, for example, you want to draw 8000 of the same runway light fixture or 5000 of the same car. Any time we have a truly huge number of, um, things, we can expect a larger number of instances than art assets.

The idea of instancing is to pay once in CPU time for the entire set of instances that share the same art assets, rather than paying once per instance. Since CPU time is often the limiting factor on rendering, this lets us get a lot closer to the geometry throughput the card is capable of.

Now the simplest way to draw a lot of stuff is something like this:

    Set up all GL state first
    For each instance:
        glPushMatrix();
        glMultMatrixf(...);
        glDrawElements(...);
        glPopMatrix();

This is instancing by matrix transforms, and it is faster than you'd think. (That is, I am surprised that the matrix transforms aren't more expensive.) But we are still paying per instance.

Before continuing, I cannot say this strongly enough: no state change inside the for loop!!! I say this because if you allow state change but try to minimize it, it can sneak up on you. I went to inventory X-Plane's objects for "inner loop state change", expecting to find about 20% of our urban-area art assets doing this. As it turns out, it's more like 80%. We're losing 30% or more fps due to this state change.

Immediate Mode Instancing

Immediate mode instancing goes something like this:

    Set up all OpenGL state
    For each instance:
        glVertexAttrib (to set matrix transform)
        glDrawElements

In conjunction, your vertex shader decodes the matrix from the vertex attributes it is passed down in.
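As a rough illustration of the loop above, here is a minimal sketch of how the per-instance transform could be laid out and decoded. This is not X-Plane's code: the attribute names (inst_row0 to inst_row2), the InstanceXform struct and the instance_transform helper are all hypothetical, and it assumes the transform is passed as the top three rows of the model-view matrix in three vec4 attributes. The GLSL 1.20 shader is kept as a string, and the C function mirrors the same three dot products on the CPU so the math can be checked without a GL context.

```c
#include <assert.h>

/* Hypothetical GLSL 1.20 vertex shader: three vec4 attributes carry the
   top three rows of the instance's model-view matrix. */
const char *const instance_vs_source =
    "attribute vec4 inst_row0;\n"
    "attribute vec4 inst_row1;\n"
    "attribute vec4 inst_row2;\n"
    "void main() {\n"
    "    vec4 p = gl_Vertex;\n"
    "    vec4 eye = vec4(dot(inst_row0, p), dot(inst_row1, p),\n"
    "                    dot(inst_row2, p), p.w);\n"
    "    gl_Position = gl_ProjectionMatrix * eye;\n"
    "}\n";

/* Hypothetical per-instance record: rows[r][c] is row r, column c. */
typedef struct { float rows[3][4]; } InstanceXform;

/* CPU mirror of the shader's decode: out = M * p, where the implied
   bottom row (0 0 0 1) leaves w unchanged. */
static void instance_transform(const InstanceXform *x, const float p[4],
                               float out[4])
{
    assert(x && p && out);
    for (int r = 0; r < 3; ++r)
        out[r] = x->rows[r][0] * p[0] + x->rows[r][1] * p[1]
               + x->rows[r][2] * p[2] + x->rows[r][3] * p[3];
    out[3] = p[3];
}

/* Per-instance submission would then look roughly like this (GL calls are
   shown as comments so the sketch compiles without GL headers; loc0..loc2
   are attribute locations queried with glGetAttribLocation):
       glVertexAttrib4fv(loc0, x->rows[0]);
       glVertexAttrib4fv(loc1, x->rows[1]);
       glVertexAttrib4fv(loc2, x->rows[2]);
       glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, 0);  */
```

Passing rows rather than columns means each output coordinate is a single dot product in the shader, and the fourth row never has to be sent.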

It turns out that this code is at least 30% faster than matrix-transform instancing on OS X. And in hindsight this shouldn't be surprising. I would expect the built-in matrix stack to be uniform state, as it isn't expected to change per vertex. And I would expect the update of attribute state to be faster than the update of uniform state.

It may also be that the GL has to do legacy processing with matrices (such as computing inverse matrices) that can be avoided.

Why would you use immediate mode instancing instead of real hardware instancing? Well, if you are on an OS that, despite the availability of hardware instancing for years, doesn't provide the extension, immediate mode instancing provides a useful half-way point.

(In particular, you can pull your immediate mode instance values directly out of the instance array you would have used.)
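To make that concrete, here is a small sketch of such an instance array, assuming each record is the top three rows of the model-view matrix (12 floats), the layout described in the comments below. The names FLOATS_PER_INSTANCE, pack_instance and instance_row are made up for illustration. The same tightly packed array could later be handed to real hardware instancing unchanged.

```c
#include <assert.h>
#include <stddef.h>

enum { FLOATS_PER_INSTANCE = 12 };  /* 3 rows x 4 columns */

/* Pack a column-major 4x4 matrix (OpenGL's layout, m[col * 4 + row]) into
   a 12-float record: rows 0..2, each stored as 4 consecutive floats. */
static void pack_instance(const float m[16], float out[FLOATS_PER_INSTANCE])
{
    assert(m && out);
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 4; ++col)
            out[row * 4 + col] = m[col * 4 + row];
}

/* Pointer to row `row` of instance `i` in a tightly packed instance array.
   In the immediate-mode loop this pointer is what would be passed to
   glVertexAttrib4fv for the matching attribute. */
static const float *instance_row(const float *instances, size_t i, int row)
{
    assert(instances && row >= 0 && row < 3);
    return instances + i * FLOATS_PER_INSTANCE + (size_t)row * 4;
}
```

For example, a pure translation by (5, 6, 7) ends up with 5, 6 and 7 as the last element of rows 0, 1 and 2 respectively.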

4 comments:

  1. Hi,

    I haven't been using GLSL for long, so forgive my naivety here. What I would love to do with GLSL is load a matrix stack into VRAM (say, a VBO) and switch between the matrices without reloading.

    I soon realised that you probably can't do that... but then I saw your example here where you are talking about loading a matrix as a glVertexAttrib and my ears pricked up.

    Can you point me to a source code example?

    cheers,
    Lindsay

  2. Lindsay - that is precisely what is happening - the "instance" VBO (the one that says where the instances go) is simply the top 3 rows of a model view matrix, 12 floats each. (Since the instance model view transform is always a combination of rotate and translate, the bottom row is 0 0 0 1 and thus doesn't have to be stored.)

    Sorry, no code sample - my implementation is part of proprietary code. :-(

  3. Did you try shader (constant) instancing?
