|
Introduction to the Problem
A common performance pitfall in Direct3D is rendering only a few polygons at a time, which results in a large number of DrawPrimitive() or DrawIndexedPrimitive() calls. It is often possible to group polygons into larger batches, which yields far greater performance. The worst case occurs when an application repeatedly renders only one or two triangles per rendering call. I see this in applications far more frequently than one might expect, and the majority of these occurrences seem to fall into several design categories:
Often such applications will also use DrawPrimitiveUP() or DrawIndexedPrimitiveUP() rather than their more efficient buffered counterparts.

Batching Primitives
The solution is to gather primitives into larger batches of polygons and render each batch in a single call. Typically each batch should contain at least 100-200 triangles, though with today's hardware rendering 1000 or more triangles per call is preferred. Batching triangles together for rendering in a single DrawPrimitive (DP) call requires that all of them be rendered with the same render states, including texture, lighting, transformations, etc. This requires sorting at some level. If the application already keeps its polygons segregated in this manner, it will be easy to adapt; otherwise a sorting step will have to be added.

Use of a Vertex Cache
At the end of this article we provide source code for a "vertex cache" class, which your application can stream primitives to during a scene so that they are rendered together in larger batches. Note that the implementation provided is limited to triangle lists and is designed to accept arrays of vertices containing only those vertices to be rendered in a given primitive, but this can easily be modified to suit your needs.

To use the vertex cache class, your application must first create an instance of CVertexCache after D3D initialization. The constructor takes five parameters:
CVertexCache(UINT maxVertices,UINT maxIndices,UINT stride,DWORD fvf,DWORD processing)
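As a rough illustration of how the first three constructor arguments relate to memory use, the sketch below computes the buffer sizes a cache of this kind would request and checks that every vertex stays addressable with a 16-bit index. It is a standalone approximation that does not touch Direct3D; CacheSizing and computeCacheSizing are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: byte sizes implied by the constructor arguments.
struct CacheSizing {
    std::size_t vertexBytes;  // maxVertices * stride
    std::size_t indexBytes;   // maxIndices * size of one 16-bit index
    bool indexable;           // true if a 16-bit index can reach every vertex
};

// Hypothetical helper: mirrors the size arithmetic used when the buffers are created.
CacheSizing computeCacheSizing(unsigned maxVertices, unsigned maxIndices, unsigned stride)
{
    assert(stride > 0);  // a vertex must occupy at least one byte

    CacheSizing s;
    s.vertexBytes = static_cast<std::size_t>(maxVertices) * stride;
    s.indexBytes  = static_cast<std::size_t>(maxIndices) * sizeof(std::uint16_t);
    // A 16-bit index addresses vertices 0..65535, so at most 65536 vertices.
    s.indexable   = maxVertices <= 65536u;
    return s;
}
```

For example, a cache of 3000 vertices with a 32-byte stride and 3000 indices would request 96,000 bytes of vertex data and 6,000 bytes of index data.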
To render with the vertex cache, three functions are used:

HRESULT Start()
Call this function prior to rendering with the vertex cache to set up the index and vertex streams, as well as the vertex shader. Note that these states must not be modified until all primitives subsequently written to the cache have been flushed.

HRESULT Draw(UINT numVertices,UINT numIndices,const WORD *pIndexData,const void *pVertexStreamZeroData)
Writes primitives to the cache, copying numVertices vertices from the pVertexStreamZeroData pointer and numIndices indices from the pIndexData pointer. Note that only 16-bit (WORD) indices are supported in this version, and that the number of indices, rather than the number of primitives, is specified. The value of numIndices should be three times the number of triangles to be rendered. Passing NULL for pIndexData causes the vertices to be treated as a non-indexed triangle list. In that case numIndices should still be set according to the triangle count and must equal numVertices, so numVertices must be evenly divisible by three. In this sample class the values of numIndices and numVertices cannot exceed the maximums set on initialization. This could easily be dealt with, though, by testing for this condition and immediately rendering the passed primitive if it is larger than the buffers. If the data passed to Draw() would not fit in the remaining space of either the index or vertex buffer, the triangles already in the cache are rendered automatically and the cache is cleared to make room for the new primitives.

HRESULT Flush()
This function renders any triangles remaining in the cache. It is called by Draw() whenever the cache is full, and should also be called before changing render states to render primitives that require a different texture or other state.

Basic Usage
The pseudocode below illustrates use of the vertex cache to render groups of polygons sorted by texture:

    // g_vertexCache points to a previously created instance of CVertexCache
    g_vertexCache->Start();
    for (int i=0;i<numTextures;i++)
    {
        lpDev->SetTexture(0,textures[i]);
        // submit the triangles that use this texture, one triangle (3 indices) per call
        for (int j=0;j<numTriangles[i];j++)
            g_vertexCache->Draw(..,3,..,..);
        // render whatever is still cached before the texture changes
        g_vertexCache->Flush();
    }

Source Code
    // VertexCache.h: interface for the CVertexCache class.
    //
    //////////////////////////////////////////////////////////////////////

    #include <d3d8.h>

    class CVertexCache
    {
    public:
        HRESULT Flush();
        HRESULT Start();
        HRESULT Draw(UINT numVertices,UINT numIndices,const WORD *pIndexData,
                     const void *pVertexStreamZeroData);
        CVertexCache(UINT maxVertices,UINT maxIndices,UINT stride,DWORD fvf,DWORD processing);
        virtual ~CVertexCache();

        DWORD m_fvf;
        UINT m_maxVertices;
        UINT m_numVertices;
        UINT m_maxIndices;
        UINT m_numIndices;
        IDirect3DVertexBuffer8 *m_vBuf;
        IDirect3DIndexBuffer8 *m_iBuf;
        UINT m_stride;
        BYTE *m_vertPtr;
        WORD *m_indPtr;
    };

    // VertexCache.cpp: implementation of the CVertexCache class.
    //
    //////////////////////////////////////////////////////////////////////

    #include <string.h>   // memcpy
    #include "VertexCache.h"

    #define SAFE_RELEASE(x) if (x) {x->Release(); x=NULL; }

    // lpDevice is used below but not declared in this listing. It is assumed here
    // to be the application's IDirect3DDevice8 pointer, defined elsewhere.
    extern IDirect3DDevice8 *lpDevice;

    //////////////////////////////////////////////////////////////////////
    // Construction/Destruction
    //////////////////////////////////////////////////////////////////////

    CVertexCache::CVertexCache(UINT maxVertices,UINT maxIndices,UINT stride,DWORD fvf,
                               DWORD processing)
    {
        // create the vertex buffer
        m_vBuf=NULL;
        lpDevice->CreateVertexBuffer(maxVertices*stride,
                                     D3DUSAGE_DYNAMIC|D3DUSAGE_WRITEONLY|processing,
                                     fvf,D3DPOOL_DEFAULT,&m_vBuf);

        // create the index buffer
        m_iBuf=NULL;
        lpDevice->CreateIndexBuffer(maxIndices*sizeof(WORD),
                                    D3DUSAGE_DYNAMIC|D3DUSAGE_WRITEONLY|processing,
                                    D3DFMT_INDEX16,D3DPOOL_DEFAULT,&m_iBuf);

        // clear the vertex and index counters
        m_numVertices=0;
        m_numIndices=0;

        // save buffer sizes, vertex format, and stride
        m_maxVertices=maxVertices;
        m_maxIndices=maxIndices;
        m_stride=stride;
        m_fvf=fvf;

        // clear buffer pointers
        m_indPtr=NULL;
        m_vertPtr=NULL;
    }

    CVertexCache::~CVertexCache()
    {
        // release vertex and index buffers
        SAFE_RELEASE(m_vBuf);
        SAFE_RELEASE(m_iBuf);
    }

    HRESULT CVertexCache::Draw(UINT numVertices,UINT numIndices,const WORD *pIndexData,
                               const void *pVertexStreamZeroData)
    {
        HRESULT hr;

        // will this fit in the cache?
        // not even when the cache is empty: reject instead of overrunning the buffers
        if (numVertices>m_maxVertices||numIndices>m_maxIndices)
            return E_INVALIDARG;
        if (m_numVertices+numVertices>m_maxVertices||
            m_numIndices+numIndices>m_maxIndices)
        {
            // not in the remaining space, flush the cache
            if (FAILED(hr=Flush()))
                return hr;
        }

        // check to see if we have pointers into buffers, lock if needed
        if (!m_indPtr)
            if (FAILED(hr=m_iBuf->Lock(0,0,(BYTE **) &m_indPtr,D3DLOCK_DISCARD)))
                return hr;
        if (!m_vertPtr)
            if (FAILED(hr=m_vBuf->Lock(0,0,&m_vertPtr,D3DLOCK_DISCARD)))
                return hr;

        // copy the vertices into the cache
        memcpy(&m_vertPtr[m_stride*m_numVertices],pVertexStreamZeroData,m_stride*numVertices);

        // save the current vertex count; new indices are offset by it
        UINT startInd=m_numVertices;

        // loop through the indices
        for (UINT i=0;i<numIndices;i++)
        {
            // add the index (generate 0,1,2,... when no index data was passed)
            m_indPtr[m_numIndices]=(WORD) (((pIndexData!=NULL)?pIndexData[i]:i)+startInd);

            // increment the index counter
            m_numIndices++;
        }

        // adjust vertex counter
        m_numVertices+=numVertices;

        // return success
        return S_OK;
    }

    HRESULT CVertexCache::Flush()
    {
        HRESULT hr=S_OK;

        // unlock the vertex and index buffers
        if (m_indPtr)
        {
            m_iBuf->Unlock();
            m_indPtr=NULL;
        }
        if (m_vertPtr)
        {
            m_vBuf->Unlock();
            m_vertPtr=NULL;
        }

        // are there triangles to render?
        if (m_numIndices&&m_numVertices)
            // yes, render them
            hr=lpDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                              0,m_numVertices,
                                              0,m_numIndices/3);

        // clear the vertex and index counters, even if the draw call failed
        m_numVertices=0;
        m_numIndices=0;

        // return the result of the draw call (S_OK if nothing was rendered)
        return hr;
    }

    HRESULT CVertexCache::Start()
    {
        HRESULT hr;

        // set the index buffer, vertex buffer, and shader for the device
        lpDevice->SetIndices(m_iBuf,0);
        lpDevice->SetStreamSource(0,m_vBuf,m_stride);
        lpDevice->SetVertexShader(m_fvf);

        // clear the vertex and index counters
        m_numVertices=0;
        m_numIndices=0;

        // lock the vertex and index buffers
        m_indPtr=NULL;
        if (FAILED(hr=m_iBuf->Lock(0,0,(BYTE **) &m_indPtr,D3DLOCK_DISCARD)))
            return hr;
        m_vertPtr=NULL;
        if (FAILED(hr=m_vBuf->Lock(0,0,&m_vertPtr,D3DLOCK_DISCARD)))
        {
            // do not leave the index buffer locked on failure
            m_iBuf->Unlock();
            m_indPtr=NULL;
            return hr;
        }

        // return success
        return S_OK;
    }
|
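To see the batching behaviour of Draw() and Flush() without a Direct3D device, the following sketch models only the bookkeeping: it rebases incoming indices by the number of vertices already cached, flushes when new data would not fit, and records how many triangles each flush would have rendered. BatchSim and its members are hypothetical names for this illustration, vertex data is not stored at all, and the real class additionally locks buffers and issues DrawIndexedPrimitive().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the vertex cache: it records draw calls
// instead of issuing them to a device.
class BatchSim {
public:
    BatchSim(std::size_t maxVertices, std::size_t maxIndices)
        : m_maxVertices(maxVertices), m_maxIndices(maxIndices) {}

    // Queue a triangle list; pass nullptr for pIndexData to submit non-indexed triangles.
    void Draw(std::size_t numVertices, std::size_t numIndices,
              const std::uint16_t* pIndexData)
    {
        assert(numIndices % 3 == 0);
        assert(numVertices <= m_maxVertices && numIndices <= m_maxIndices);

        // Flush first if the new data would not fit in the remaining space.
        if (m_numVertices + numVertices > m_maxVertices ||
            m_indices.size() + numIndices > m_maxIndices)
            Flush();

        // Rebase each index by the number of vertices already cached.
        const std::size_t base = m_numVertices;
        for (std::size_t i = 0; i < numIndices; ++i) {
            const std::size_t local = pIndexData ? pIndexData[i] : i;
            m_indices.push_back(static_cast<std::uint16_t>(local + base));
        }
        m_numVertices += numVertices;
    }

    // Record one draw call covering everything cached so far, then empty the cache.
    void Flush()
    {
        if (!m_indices.empty())
            m_batches.push_back(m_indices.size() / 3);  // triangles in this call
        m_indices.clear();
        m_numVertices = 0;
    }

    // Triangle count of each recorded draw call, in order.
    const std::vector<std::size_t>& batches() const { return m_batches; }
    // Indices cached since the last flush.
    const std::vector<std::uint16_t>& pendingIndices() const { return m_indices; }

private:
    std::size_t m_maxVertices;
    std::size_t m_maxIndices;
    std::size_t m_numVertices = 0;
    std::vector<std::uint16_t> m_indices;
    std::vector<std::size_t> m_batches;
};
```

With a capacity of 300 vertices and 300 indices, submitting 250 single triangles through Draw(3, 3, nullptr) followed by a final Flush() records three draw calls of 100, 100, and 50 triangles rather than 250 separate calls.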