Saturday, July 7, 2012

Hardware Instancing

I ran into a few problems with hardware instancing while finishing up the basic grass I have been adding to my XNA terrain engine.

Instancing is used to draw multiple copies of a given item.  This is commonly used for grass and foliage (see virtually any game with a landscape).  There can be hundreds of these objects.  A very nice example of this, in my opinion, is Lord of the Rings Online:

 The naive solution is of course to draw them individually.  That in fact is what I did at first, when I was working to simply position them on the terrain (matching the height of the terrain at a given point was the trickiest bit).

This is terribly slow.  Doing so reduced my FPS from 60+ to about one.

Basically each draw call has an overhead, whether it is a large and complicated mesh, or a simple rectangle that will have a small grass texture.  The collective overhead of doing 1000 or more draw calls simply grinds the application to a halt.

So: hardware instancing.  Basically hardware instancing lets you send your normal vertex and index buffers to the video card with another vertex buffer bound into the mix.  There are several uses this additional buffer can be put to, but commonly it carries a position, rotation and maybe a color.  So really you are specifying your mesh once and then declaring 'draw this eleventy bajillion times at these positions'.  This is what it currently looks like in my program:

Not as nice as Lotro, but not bad.  A large improvement over the simple textures.

So, back to the problems I had.  Getting the basic instancing working was not too difficult.  A nice straightforward writeup can be found here at Ben Johnson's blog.  After a bit of fiddling I arrived at something pretty close to what he does: storing a simple world position in the second (instancing) vertex buffer.  Another way to implement this would be to store an entire transformation matrix there.

This is my current draw code:


public void Draw(Matrix view, Matrix projection, GraphicsDevice device)
{
    int vertexCount;

    effect.CurrentTechnique = effect.Techniques["Instancing"];

    // One instanced draw call per model part.
    for (int i = 0; i < instancedModelParts; i++)
    {
        device.Indices = indexBuffers[i];
        vertexCount = bindingsList[i][0].VertexBuffer.VertexCount;
        // For quad parts (four vertices, two triangles) the primitive count is vertexCount / 2.
        device.DrawInstancedPrimitives(PrimitiveType.TriangleList, 0, 0, vertexCount, 0, vertexCount / 2, instanceCount);
    }
}

Pretty straightforward.  The interesting part is the model parts loop.  Initially, my grass model was a simple rectangle I whipped up and textured in Blender.  This worked fine, but I wanted to get a little better look, so I added a second rectangle, forming an 'X'.

Now, I knew this wouldn't work at first as the model would consider this a second part (the whole instancedModelParts loop was added to deal with this).  Ben's code kind of falls down on this point:

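private void GetModelVerticesAndIndices(Model model, out VertexPositionNormalTexture[] vertices, out ushort[] indices)
{
    List<VertexPositionNormalTexture> verticesList = new List<VertexPositionNormalTexture>();
    List<ushort> indicesList = new List<ushort>();

    foreach (ModelMesh mesh in model.Meshes)
    {
        foreach (ModelMeshPart part in mesh.MeshParts)
        {
            // Go through mesh parts building up a list of vertices and indices used for each part.
            VertexPositionNormalTexture[] partVerts = new VertexPositionNormalTexture[part.VertexBuffer.VertexCount];
            part.VertexBuffer.GetData(partVerts);

            ushort[] partIndices = new ushort[part.IndexBuffer.IndexCount];
            part.IndexBuffer.GetData(partIndices);

            verticesList.AddRange(partVerts);
            indicesList.AddRange(partIndices);
        }
    }

    vertices = verticesList.ToArray();
    indices = indicesList.ToArray();
}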

You would THINK this would work with a multi-part model. However, it only draws the FIRST model part. What gives? It is definitely cycling through each model part.

Well, the evil trick turns out to be that this:

part.VertexBuffer.GetData(partVerts);

gets you the vertices (and the equivalent IndexBuffer call, the indices) of the ENTIRE MODEL. So your vertices and indices are not at all what you thought they were when you try to draw them!

I guess this makes some sense, allowing model drawing to copy the vertex buffer only once. But then WHY expose it via Model.Mesh.ModelMeshPart.VertexBuffer? Why not Model.VertexBuffer, which is exactly what it actually is? That would cause far less confusion.

Why does only the first mesh show up (well, in my case; generally you will get some sort of screwiness)?

Well, there are two rectangles, each with four vertices. So eight total vertices in the vertex buffer. The ModelMeshPart IndexBuffer in fact holds twelve indices (you would think it would have six, but remember it in fact holds the indices for the whole model, not just this part!).

So the indices you pull end up looking something like (0,1,2,2,3,0,0,1,2,2,3,0). See the problem? Each of the parts has four vertices and six indices (so the order of the vertices is 0,1,2,2,3,0).

Basically if it was one buffer it should look like (0,1,2,2,3,0,4,5,6,6,7,4). But it isn't. All the vertex and index data is combined into one buffer for the model... but the indices are still numbered per part!
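To make the numbering problem concrete, here is a minimal sketch in plain C# (no XNA types) of what "one buffer" numbering would require: each part's indices get shifted by that part's vertex offset. The class and method names are made up for illustration, and the offsets 0 and 4 assume the two-quad grass model described above.

```csharp
using System;
using System.Collections.Generic;

public static class IndexRebasing
{
    // Hypothetical helper: combines per-part index lists into one list that
    // addresses a single shared vertex array, by adding each part's vertex
    // offset to every index of that part.
    public static ushort[] Rebase(IList<ushort[]> partIndices, IList<int> vertexOffsets)
    {
        var combined = new List<ushort>();
        for (int p = 0; p < partIndices.Count; p++)
        {
            foreach (ushort index in partIndices[p])
            {
                combined.Add((ushort)(index + vertexOffsets[p]));
            }
        }
        return combined.ToArray();
    }
}
```

Feeding it two copies of (0,1,2,2,3,0) with vertex offsets 0 and 4 leaves the first quad's indices alone and moves the second quad's into the 4..7 range. Without that shift, both halves of the index list point at the first four vertices, which is why the first part gets drawn twice.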

Long and short, this code draws the first of the two model mesh parts twice. I was really quite flummoxed as I kept debugging through my code. It was definitely going through each model part and pulling the vertices and indices... why wasn't it drawing?

Eventually I noticed that the vertex positions were the same for the two parts; this shouldn't have been as they were 90 degrees offset from one another. I assumed I'd done something wrong in Blender, and spent quite a while fiddling around there, thinking maybe I had done the transformations wrong or something.

Then it dawned on me that I was seeing eight vertices and twelve indices in each part, exactly twice what I would expect. And looking closely at the vertices, they were the vertices for the entire model!  A bit of Google searching confirmed that it was indeed the entire model.

Well, fine, we can deal with that.  First, instead of one vertex buffer and one index buffer for the model, we need a list of each, with one entry per model part.



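// Before: a single pair of buffers for the whole model
VertexBuffer geometryBuffer;
IndexBuffer indexBuffer;

// After: one pair of buffers per model part
List<IndexBuffer> indexBuffers = new List<IndexBuffer>();
List<VertexBuffer> geometryBuffers = new List<VertexBuffer>();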

The code to get the indices and vertices has to change as well.  There is another 'gotcha' here, too.  One would naively think this would work:


partVerts = new VertexPositionNormalTexture[part.NumVertices];
part.VertexBuffer.GetData(partVerts, part.VertexOffset, part.NumVertices);

Get the data from the part vertexbuffer, starting at the correct offset, and getting the correct number.  This crashes right away, complaining that the array index is out of bounds.  Wha?  From MSDN:

VertexBuffer.GetData (T[], Int32, Int32) Gets a copy of the vertex buffer data, specifying the start index and number of elements.

Hm.  That LOOKS right.  However it definitely blows up.  It turns out the startIndex is an offset into the DESTINATION array.  To me that makes NO sense at all. Not even a little.  But there it is.  I can't quite fathom why anyone would want it to work like this, but it does.
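Here is a tiny stand-in, in plain C#, for the behaviour I ran into. To be clear, this is NOT XNA's implementation; it is just a sketch built on Array.Copy that mimics what I observed: the source is always read from its beginning, and startIndex only decides where writing starts in the destination. The class and method names are made up.

```csharp
using System;

public static class GetDataSketch
{
    // Hypothetical stand-in for the three-argument overload as I observed it:
    // always reads from the start of the buffer and writes into dest
    // beginning at dest[startIndex].
    public static void CopyToDestinationOffset<T>(T[] buffer, T[] dest, int startIndex, int elementCount)
    {
        Array.Copy(buffer, 0, dest, startIndex, elementCount);
    }
}
```

With an eight-element buffer, a four-element destination, startIndex 4 and elementCount 4 (the numbers for my second quad), this tries to write to dest[4] through dest[7], which do not exist, so Array.Copy throws. That matches the out-of-bounds complaint above.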

Criminy.  So that didn't work.  Fine, then I tried the slightly more complicated variant of GetData, where you have to pass in the vertex stride (the number of bytes each vertex struct takes up).


int offsetInBytes = part.VertexBuffer.VertexDeclaration.VertexStride * part.VertexOffset;

partVerts = new VertexPositionNormalTexture[part.NumVertices];
part.VertexBuffer.GetData(offsetInBytes, partVerts, 0, part.NumVertices, part.VertexBuffer.VertexDeclaration.VertexStride);

Got that?  So just asking for the vertices doesn't work, you have to calculate the byte offset and then ask for X vertices after that (but of course it knows the vertex stride already, it HAS to... so why have to jump through the extra silly hoops?).

In any case, that works smashingly and returns exactly the four vertices we expect for each mesh part.

Indices are handled similarly:


partIndicies = new ushort[part.PrimitiveCount * 3];

offsetInBytes = part.StartIndex * sizeof(ushort);
part.IndexBuffer.GetData(offsetInBytes, partIndicies, 0, 
     part.PrimitiveCount * 3);
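For anyone who wants to sanity-check the byte offsets, here is the arithmetic as a small plain C# sketch. The helper names are made up, and the 32-byte stride is an assumption based on VertexPositionNormalTexture holding a Vector3 position, a Vector3 normal and a Vector2 texture coordinate (eight floats); in real code, read the stride from the vertex declaration as above. The index math assumes 16-bit indices.

```csharp
public static class BufferOffsets
{
    // Assumed layout: 3 + 3 + 2 floats per vertex = 8 floats = 32 bytes.
    public const int VertexStrideInBytes = 8 * sizeof(float);

    // Byte position of a part's first vertex in the shared vertex buffer.
    public static int VertexByteOffset(int vertexOffset)
    {
        return vertexOffset * VertexStrideInBytes;
    }

    // Byte position of a part's first index in the shared index buffer
    // (16-bit indices assumed).
    public static int IndexByteOffset(int startIndex)
    {
        return startIndex * sizeof(ushort);
    }
}
```

For my two-quad model, the second part would have a VertexOffset of 4 and a StartIndex of 6, which works out to byte offsets of 128 and 12 respectively; the first part starts at byte 0 in both buffers.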

The entire vertex/index retrieval function looks like this:


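// The element types of the two list parameters are assumed (one array per model part).
private int GetModelVerticesAndIndices(Model model, List<VertexPositionNormalTexture[]> vertices, List<ushort[]> indicies)
{
    int parts = 0;
    boneTransforms = new Matrix[model.Bones.Count];

    foreach (ModelMesh mesh in model.Meshes)
    {
        foreach (ModelMeshPart part in mesh.MeshParts)
        {
            List<VertexPositionNormalTexture> verticesList = new List<VertexPositionNormalTexture>();
            List<ushort> indicesList = new List<ushort>();

            VertexPositionNormalTexture[] partVerts;
            ushort[] partIndicies;

            // Byte offset of this part's first vertex in the model-wide vertex buffer.
            int offsetInBytes = part.VertexBuffer.VertexDeclaration.VertexStride * part.VertexOffset;

            partVerts = new VertexPositionNormalTexture[part.NumVertices];
            part.VertexBuffer.GetData(offsetInBytes, partVerts, 0, part.NumVertices, part.VertexBuffer.VertexDeclaration.VertexStride);

            // Byte offset of this part's first index (16-bit indices).
            partIndicies = new ushort[part.PrimitiveCount * 3];
            offsetInBytes = part.StartIndex * sizeof(ushort);
            part.IndexBuffer.GetData(offsetInBytes, partIndicies, 0, part.PrimitiveCount * 3);

            // Assumed step: keep this part's data as its own entry and count the part.
            verticesList.AddRange(partVerts);
            indicesList.AddRange(partIndicies);
            vertices.Add(verticesList.ToArray());
            indicies.Add(indicesList.ToArray());
            parts++;
        }
    }

    return parts;
}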

After these modifications, the model and all its parts now draw correctly.  This should have been pretty simple, save for the two things I find VERY non-intuitive:  model part buffers returning data for the entire model, and the vertex and index buffer GetData overloads behaving differently.  GetData(destArray, startOffset, numItems) treats the offset as an index into the destination array, while GetData(byteOffset, destArray, startIndex, numItems, ...) treats its first argument as a byte offset into the source buffer.

In any case, hopefully this will help someone else avoid a few evenings of frustration.


  1. Hey (it's Ben Johnson), thanks for pointing out a problem with vertices/indices extraction. You're right about the vertex/index buffer on the model parts. I assumed that it would be an individual buffer for each part, since they are exposed per part. It is indeed very strange to expose it the way it is! I never came across this problem because within my game and in the example I use meshes comprised of one part, which is my bad for not testing it further.

  2. Yeah, everything was working fine for me, too, till I got ambitious and tried a model with multiple parts. Hopefully this will help someone else be less bamboozled by the entire thing.