Multiple Ways to Render Point Sprites in DX11

In Direct3D 11 one can render a point sprite in several different ways.

Presentations that explain how to port DX9 apps to DX10 and DX11 commonly say that point sprites are best done in the geometry shader (GS). In many cases this is not the fastest way to render screen-aligned point sprites.

Below are the timings from my tests on an AMD 6-something in my desktop. Each point sprite comes in from a vertex buffer as a point, then it is expanded into a triangle or a quad in the vertex shader (VS), in the GS, or using the tessellator.

1. GS and VS triangles and quads.
In this method the vertex shader passes the vertex data to the geometry shader, which expands it into either triangles or quads.
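To make the expansion step concrete, here is a minimal CPU-side sketch in C++ of what the geometry shader does for the quad case. The names `Float2` and `ExpandToQuad`, the use of normalized device coordinates, and the triangle-strip corner order are my assumptions for illustration; the real shader may lay the corners out differently.

```cpp
#include <array>
#include <cassert>

struct Float2 { float x, y; };

// Hypothetical helper: expand a sprite center into the four corners of a
// screen-aligned quad, in the order a geometry shader could append them
// to a triangle strip. Coordinates are assumed to be in NDC.
std::array<Float2, 4> ExpandToQuad(Float2 center, Float2 halfSize)
{
    assert(halfSize.x >= 0.0f && halfSize.y >= 0.0f);

    // Strip order: bottom-left, top-left, bottom-right, top-right.
    const Float2 signs[4] = {
        {-1.0f, -1.0f}, {-1.0f, 1.0f}, {1.0f, -1.0f}, {1.0f, 1.0f} };

    std::array<Float2, 4> corners{};
    for (int i = 0; i < 4; ++i) {
        corners[i] = { center.x + signs[i].x * halfSize.x,
                       center.y + signs[i].y * halfSize.y };
    }
    return corners;
}
```

Four strip vertices per point is the usual reason a GS quad costs more than a GS triangle, which emits only three.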

2. GS only triangles and quads.
In this method the vertex shader is empty and the vertex is loaded by the GS. Given a raw byte view of a vertex buffer and the index of the point, the GS loads the vertex data manually first, then it proceeds as the geometry shader in method 1.
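The manual load boils down to computing a byte offset from the point index and reading the vertex fields from there. Below is a rough C++ analogue of that step; the `PointVertex` layout and the `LoadVertex` name are hypothetical, chosen only for illustration. In HLSL the same idea would typically be expressed with `ByteAddressBuffer` loads and `asfloat`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical vertex layout, assumed for illustration only.
struct PointVertex {
    float x, y, z;
    float size;
};

// CPU-side analogue of reading one vertex out of a raw byte view:
// byte offset = index * stride, then copy the bytes into the struct.
PointVertex LoadVertex(const std::vector<std::uint8_t>& raw, std::size_t index)
{
    const std::size_t stride = sizeof(PointVertex);
    const std::size_t offset = index * stride;
    assert(offset + stride <= raw.size()); // stay inside the buffer

    PointVertex v;
    std::memcpy(&v, raw.data() + offset, stride);
    return v;
}
```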

3. Tessellator triangles and quads.
The tessellator can be set up to generate triangles and quads. In this method, the vertex shader reads the vertex as usual, then passes the data to the tessellator, which expands the vertex into a point sprite.

4. Manual vertex load in the VS that generates triangles and quads.
By drawing 3 times as many vertices for triangles, and 6 times as many for quads, we can use SV_VertexID to work out both the point index and the corner index. We then manually load the point from the raw byte view of the vertex buffer and move it to the corner selected by the corner index. No GS is required in this case.
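The index arithmetic is a divide and a modulo. Here is a small C++ sketch of it, mirroring what the vertex shader would do with SV_VertexID. The struct and function names are made up for this example, and the corner ordering for the 6-vertex quad (two triangles sharing an edge) is an assumption, not necessarily the order used in my tests.

```cpp
#include <cassert>
#include <cstdint>

struct SpriteCorner {
    std::uint32_t pointIndex;  // which point in the vertex buffer
    std::uint32_t cornerIndex; // which corner of that point's sprite
};

// verticesPerSprite is 3 for triangles and 6 for quads (two triangles).
SpriteCorner DecodeVertexId(std::uint32_t vertexId,
                            std::uint32_t verticesPerSprite)
{
    assert(verticesPerSprite == 3 || verticesPerSprite == 6);
    return { vertexId / verticesPerSprite, vertexId % verticesPerSprite };
}

struct Float2 { float x, y; };

// Assumed layout for a quad drawn as a 6-vertex triangle list:
// triangle (0,1,2) followed by triangle (2,1,3) over the four corners.
Float2 QuadCornerOffset(std::uint32_t cornerIndex)
{
    assert(cornerIndex < 6);
    static const Float2 corners[4] = {
        {-1.0f, -1.0f}, {-1.0f, 1.0f}, {1.0f, -1.0f}, {1.0f, 1.0f} };
    static const std::uint32_t order[6] = { 0, 1, 2, 2, 1, 3 };
    return corners[order[cornerIndex]];
}
```

The offset returned by `QuadCornerOffset` would then be scaled by the sprite's half-size and added to the loaded point position.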

5. Using instancing to load vertices, VS triangles and quads.
Same as method 4, only instead of loading the vertex data manually, we rely on geometry instancing to do it for us.
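The difference between methods 4 and 5 shows up mostly in the draw arguments. The sketch below uses plain C++ structs as stand-ins for those arguments; they are illustrative and not part of any real API. It assumes one instance per point, with the point data fed at per-instance rate, so the vertex ID within an instance is already the corner index.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for draw-call arguments (not a real API).
struct DrawArgs {
    std::uint32_t vertexCount;
};
struct InstancedDrawArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
};

// Method 4: one long draw; the shader splits the vertex ID itself.
DrawArgs ManualLoadDraw(std::uint32_t pointCount,
                        std::uint32_t verticesPerSprite)
{
    assert(verticesPerSprite == 3 || verticesPerSprite == 6);
    return { pointCount * verticesPerSprite };
}

// Method 5: one instance per point; each instance draws one sprite.
InstancedDrawArgs InstancedDraw(std::uint32_t pointCount,
                                std::uint32_t verticesPerSprite)
{
    assert(verticesPerSprite == 3 || verticesPerSprite == 6);
    return { verticesPerSprite, pointCount };
}
```

Both variants run the vertex shader the same number of times; only where the per-point data comes from changes.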

Results for 2.10 million particles in each run are below, timing in milliseconds along the Y axis, size in pixels along the X axis.

So, from the data above, using a GS isn't the fastest way to render point sprites. In fact, a more or less competent tessellator in a GPU can be faster.

If you're after rendering circles or stars or something like that, the tessellator can actually produce a perfect circle from a quad. I haven't tested the performance of that method.

Comments

  • Anonymous
    January 17, 2012
    So, why does relative performance depend on the size of point sprites? Could it be because of some non-obvious global scheduling issues?

  • Anonymous
    January 17, 2012
    OK, now I have looked closer at the graph and see that the biggest gap is between the triangle-rendering and quad-rendering groups of methods. Other than that, the differences are rather negligible.
