Multi-engine n-body gravity simulation
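The D3D12nBodyGravity sample demonstrates how to perform compute work asynchronously. The sample spins up a number of threads, each with its own compute command queue, and schedules compute work on the GPU that runs an n-body gravity simulation. Each thread operates on two buffers of position and velocity data. On each iteration, the compute shader reads the current position and velocity data from one buffer and writes the results of the next iteration into the other buffer. When an iteration completes, the roles of the two buffers are swapped: the buffer that was the UAV receiving position/velocity updates becomes the SRV that is read from, and vice versa. This is done by changing the resource state of each buffer.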
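The read-one/write-the-other pattern described above is often called ping-pong (double) buffering. The sketch below is a CPU-only illustration, not the sample's GPU code: `Float4`, `Particle`, `PingPong`, and `Step` are hypothetical names, unit masses and the softening term are assumptions, and the `srvIndex` swap stands in for the resource-state change the sample performs.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-ins for the sample's data; Float4 mimics XMFLOAT4.
struct Float4 { float x, y, z, w; };
struct Particle { Float4 position; Float4 velocity; };

// Two buffers: srvIndex selects the one that is read (the "SRV");
// the other one is written (the "UAV").
struct PingPong {
    std::array<std::vector<Particle>, 2> buffers;
    int srvIndex = 0;  // plays the role of m_srvIndex[threadIndex]

    const std::vector<Particle>& Read() const { return buffers[srvIndex]; }
    std::vector<Particle>& Write() { return buffers[1 - srvIndex]; }
    void Swap() { srvIndex = 1 - srvIndex; }
};

// One naive O(n^2) gravity step with unit masses (an assumption).
// Reads only from the read buffer, writes only to the write buffer, then swaps.
void Step(PingPong& pp, float dt, float softening = 0.01f) {
    const std::vector<Particle>& in = pp.Read();
    std::vector<Particle>& out = pp.Write();
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (std::size_t j = 0; j < in.size(); ++j) {
            const float dx = in[j].position.x - in[i].position.x;
            const float dy = in[j].position.y - in[i].position.y;
            const float dz = in[j].position.z - in[i].position.z;
            // Softening keeps the i == j term (and very close pairs) finite.
            const float invDist = 1.0f / std::sqrt(dx * dx + dy * dy + dz * dz + softening);
            const float invDist3 = invDist * invDist * invDist;
            ax += dx * invDist3;
            ay += dy * invDist3;
            az += dz * invDist3;
        }
        Particle p = in[i];
        p.velocity.x += ax * dt;
        p.velocity.y += ay * dt;
        p.velocity.z += az * dt;
        p.position.x += p.velocity.x * dt;
        p.position.y += p.velocity.y * dt;
        p.position.z += p.velocity.z * dt;
        out[i] = p;
    }
    pp.Swap();  // the next step reads what this step wrote
}
```

Because each step never writes to the buffer it reads, every particle sees a consistent snapshot of the previous iteration, which is the same reason the sample keeps the SRV and UAV on separate buffers.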
Create the root signatures
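First, the LoadAssets method creates both a graphics root signature and a compute root signature. Both root signatures have a root constant buffer view (CBV) and a shader resource view (SRV) descriptor table. The compute root signature also has an unordered access view (UAV) descriptor table.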
// Create the root signatures.
{
    CD3DX12_DESCRIPTOR_RANGE ranges[2];
    ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 0);
    ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_UAV, 1, 0);
    CD3DX12_ROOT_PARAMETER rootParameters[RootParametersCount];
    rootParameters[RootParameterCB].InitAsConstantBufferView(0, 0, D3D12_SHADER_VISIBILITY_ALL);
    rootParameters[RootParameterSRV].InitAsDescriptorTable(1, &ranges[0], D3D12_SHADER_VISIBILITY_VERTEX);
    rootParameters[RootParameterUAV].InitAsDescriptorTable(1, &ranges[1], D3D12_SHADER_VISIBILITY_ALL);
    // The rendering pipeline does not need the UAV parameter. Passing
    // _countof(rootParameters) - 1 drops it because it is the last root parameter.
    CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
    rootSignatureDesc.Init(_countof(rootParameters) - 1, rootParameters, 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);
    ComPtr<ID3DBlob> signature;
    ComPtr<ID3DBlob> error;
    ThrowIfFailed(D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, &error));
    ThrowIfFailed(m_device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&m_rootSignature)));
    // Create compute signature. Must change visibility for the SRV.
    rootParameters[RootParameterSRV].ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;
    CD3DX12_ROOT_SIGNATURE_DESC computeRootSignatureDesc(_countof(rootParameters), rootParameters, 0, nullptr);
    ThrowIfFailed(D3D12SerializeRootSignature(&computeRootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, &error));
    ThrowIfFailed(m_device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&m_computeRootSignature)));
}
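The graphics signature reuses the same parameter array minus its last entry, which only works if the UAV table is last. The portable sketch below models that layout without D3D12 types. The enumerator order is an assumption inferred from the `- 1` trick, and `Kind`, `Visibility`, `Param`, and the builder functions are hypothetical names. The visibility change mirrors the fact that compute work only uses root parameters visible to all stages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the sample's root parameter indices; the UAV must
// come last for "count - 1" to drop exactly the UAV table.
enum RootParameters : std::size_t {
    RootParameterCB,
    RootParameterSRV,
    RootParameterUAV,
    RootParametersCount
};

enum class Kind { RootCbv, SrvTable, UavTable };
enum class Visibility { All, Vertex };

struct Param {
    Kind kind;
    Visibility visibility;
};

// Shared array, initialized the way the graphics signature needs it.
std::vector<Param> SharedParams() {
    std::vector<Param> p(RootParametersCount);
    p[RootParameterCB] = {Kind::RootCbv, Visibility::All};
    p[RootParameterSRV] = {Kind::SrvTable, Visibility::Vertex};  // vertex shader reads positions
    p[RootParameterUAV] = {Kind::UavTable, Visibility::All};
    return p;
}

// Graphics signature: every parameter except the trailing UAV table.
std::vector<Param> GraphicsParams() {
    std::vector<Param> p = SharedParams();
    p.resize(RootParametersCount - 1);
    return p;
}

// Compute signature: all parameters, with the SRV made visible to all stages.
std::vector<Param> ComputeParams() {
    std::vector<Param> p = SharedParams();
    p[RootParameterSRV].visibility = Visibility::All;
    return p;
}
```

If a new parameter were ever appended after the UAV, the graphics signature would silently keep the UAV and lose the new entry, so a `static_assert` on the enum order is a cheap guard.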
Create the SRV and UAV buffers
The SRV and UAV buffers consist of an array of position and velocity data.
// Position and velocity data for the particles in the system.
// Two buffers full of Particle data are utilized in this sample.
// The compute thread alternates writing to each of them.
// The render thread renders using the buffer that is not currently
// in use by the compute shader.
struct Particle
{
    XMFLOAT4 position;
    XMFLOAT4 velocity;
};
| Call flow | Parameters |
|---|---|
| XMFLOAT4 | |
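Each `Particle` is two `XMFLOAT4` values, so its size fixes the element stride and the byte size of each position/velocity buffer. The sketch below checks that arithmetic with a portable stand-in for `XMFLOAT4`; `Float4`, `ParticleBufferSize`, and `kParticleStride` are hypothetical names, and the particle count used in the check is illustrative rather than the sample's value.

```cpp
#include <cstddef>
#include <cstdint>

// Portable stand-in for DirectX::XMFLOAT4 (four 32-bit floats).
struct Float4 { float x, y, z, w; };

struct Particle {
    Float4 position;
    Float4 velocity;
};

static_assert(sizeof(Float4) == 16, "XMFLOAT4 is four 32-bit floats");
static_assert(sizeof(Particle) == 32, "two float4 members, no padding");

// Per-element stride a structured-buffer view over Particle data would use.
constexpr std::uint32_t kParticleStride = sizeof(Particle);

// Byte size of one position/velocity buffer holding particleCount elements.
constexpr std::uint64_t ParticleBufferSize(std::uint32_t particleCount) {
    return static_cast<std::uint64_t>(particleCount) * sizeof(Particle);
}
```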
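Create the CBV and vertex buffers
For the graphics pipeline, the CBV is a struct containing two matrices used by the geometry shader. The geometry shader takes the position of each particle in the system and uses these matrices to generate a quad that represents it.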
struct ConstantBufferGS
{
    XMMATRIX worldViewProjection;
    XMMATRIX inverseView;
    // Constant buffers are 256-byte aligned in GPU memory. Padding is added
    // for convenience when computing the struct's size.
    float padding[32];
};
| Call flow | Parameters |
|---|---|
| XMMATRIX | |
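The `padding[32]` member exists because D3D12 constant buffer views must be sized in multiples of 256 bytes (`D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT`). Two 64-byte matrices plus 32 floats of padding give exactly 256 bytes. The sketch below verifies that with a portable 16-byte-aligned stand-in for `XMMATRIX`; `Mat4` and `AlignUp` are hypothetical helpers, not DirectXMath or D3D12 APIs.

```cpp
#include <cstddef>

// Stand-in for DirectX::XMMATRIX: 4x4 floats, 16-byte aligned (64 bytes).
struct alignas(16) Mat4 { float m[16]; };

struct ConstantBufferGS {
    Mat4 worldViewProjection;
    Mat4 inverseView;
    float padding[32];  // 2 * 64 bytes of matrices + 128 bytes of padding = 256
};

// CBV sizes must be multiples of 256 bytes.
constexpr std::size_t kCbAlignment = 256;

// Rounds size up to the next multiple of alignment (alignment must be a power of two).
constexpr std::size_t AlignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

static_assert(sizeof(Mat4) == 64, "XMMATRIX is 64 bytes");
static_assert(sizeof(ConstantBufferGS) == kCbAlignment, "padded to one 256-byte slot");
```

Without the padding, `AlignUp(sizeof(...), 256)` would be needed wherever the buffer size is computed; padding the struct makes `sizeof` already correct.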
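As a result, the vertex buffer used by the vertex shader doesn't actually contain any position data; the positions come from the particle SRV, which is why the graphics root signature makes the SRV visible to the vertex shader.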
// "Vertex" definition for particles. Triangle vertices are generated
// by the geometry shader. Color data will be assigned to those
// vertices via this struct.
struct ParticleVertex
{
    XMFLOAT4 color;
};
| Call flow | Parameters |
|---|---|
| XMFLOAT4 | |
For the compute pipeline, the CBV is a struct containing some constants used by the n-body gravity simulation in the compute shader.
struct ConstantBufferCS
{
    UINT param[4];
    float paramf[4];
};
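The struct is just two 16-byte rows of integer and float constants; the text does not say which slot means what. The sketch below shows one plausible way to fill it and is labeled hypothetical throughout: the assignment of particle count, thread-group count, time step, and damping to particular slots, the thread-group width of 128, and the names `DispatchGroupCount` and `MakeComputeConstants` are all assumptions, not the sample's confirmed layout.

```cpp
#include <cassert>
#include <cstdint>

// Same layout as the sample's struct; std::uint32_t stands in for UINT.
struct ConstantBufferCS {
    std::uint32_t param[4];
    float paramf[4];
};
static_assert(sizeof(ConstantBufferCS) == 32, "two 16-byte rows of constants");

// Hypothetical compute-shader thread-group width.
constexpr std::uint32_t kThreadGroupSize = 128;

// Thread groups needed so every particle gets a thread (rounds up).
constexpr std::uint32_t DispatchGroupCount(std::uint32_t particleCount) {
    return (particleCount + kThreadGroupSize - 1) / kThreadGroupSize;
}

// Hypothetical slot assignment, for illustration only.
ConstantBufferCS MakeComputeConstants(std::uint32_t particleCount, float deltaTime, float damping) {
    assert(particleCount > 0);
    ConstantBufferCS cb = {};  // unused slots stay zero
    cb.param[0] = particleCount;
    cb.param[1] = DispatchGroupCount(particleCount);
    cb.paramf[0] = deltaTime;
    cb.paramf[1] = damping;
    return cb;
}
```

Rounding the group count up means the last group may contain threads past the end of the buffer, so a shader using this scheme would need a bounds check against the particle count.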
Synchronize the rendering and compute threads
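After the buffers are all initialized, rendering and compute work begins. As the compute thread iterates on the simulation, it switches the two position/velocity buffers back and forth between the SRV and UAV states, and the render thread must make sure that the work it schedules on the graphics pipeline operates on the SRV. Fences are used to synchronize access to the two buffers.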
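A D3D12 fence is essentially a 64-bit value that one side signals and the other waits on. The CPU-only sketch below models that contract with a mutex and condition variable so the handshake can be run anywhere. `CpuFence` and `RunHandshake` are hypothetical names; this is an analogy to `ID3D12Fence`, not a replacement for it, and it does not model GPU-side waits such as `ID3D12CommandQueue::Wait`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical CPU analogue of ID3D12Fence: a 64-bit value that a producer
// signals and a consumer waits on.
class CpuFence {
public:
    std::uint64_t GetCompletedValue() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_value;
    }
    // Analogous to a queue Signal: publish that work up to `value` is done.
    void Signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_value = value;
        }
        m_cv.notify_all();
    }
    // Analogous to SetEventOnCompletion + WaitForSingleObject.
    void WaitFor(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [&] { return m_value >= value; });
    }

private:
    mutable std::mutex m_mutex;
    std::condition_variable m_cv;
    std::uint64_t m_value = 0;
};

// A "compute" thread signals once per finished iteration; the "render" side
// waits for each iteration before consuming its results.
std::uint64_t RunHandshake(std::uint64_t iterations) {
    CpuFence computeFence;
    std::uint64_t consumed = 0;
    std::thread compute([&] {
        for (std::uint64_t v = 1; v <= iterations; ++v) {
            computeFence.Signal(v);  // iteration v finished
        }
    });
    for (std::uint64_t v = 1; v <= iterations; ++v) {
        computeFence.WaitFor(v);  // never read iteration v before it is done
        ++consumed;
    }
    compute.join();
    assert(computeFence.GetCompletedValue() == iterations);
    return consumed;
}
```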
On the render thread:
// Render the scene.
void D3D12nBodyGravity::OnRender()
{
    // Let the compute thread know that a new frame is being rendered.
    for (UINT n = 0; n < ThreadCount; n++)
    {
        InterlockedExchange(&m_renderContextFenceValues[n], m_renderContextFenceValue);
    }
    // Compute work must be completed before the frame can render or else the SRV
    // will be in the wrong state.
    for (UINT n = 0; n < ThreadCount; n++)
    {
        UINT64 threadFenceValue = InterlockedGetValue(&m_threadFenceValues[n]);
        if (m_threadFences[n]->GetCompletedValue() < threadFenceValue)
        {
            // Instruct the rendering command queue to wait for the current
            // compute work to complete.
            ThrowIfFailed(m_commandQueue->Wait(m_threadFences[n].Get(), threadFenceValue));
        }
    }
    // Record all the commands we need to render the scene into the command list.
    PopulateCommandList();
    // Execute the command list.
    ID3D12CommandList* ppCommandLists[] = { m_commandList.Get() };
    m_commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
    // Present the frame.
    ThrowIfFailed(m_swapChain->Present(0, 0));
    MoveToNextFrame();
}
| Call flow | Parameters |
|---|---|
| InterlockedExchange | |
| InterlockedGetValue | |
| ID3D12Fence::GetCompletedValue | |
| ID3D12CommandQueue::Wait | |
| ID3D12CommandList | |
| ID3D12CommandQueue::ExecuteCommandLists | |
| IDXGISwapChain::Present | |
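To keep the sample simple, the compute thread waits for the GPU to complete each iteration before scheduling more compute work. In practice, an application would likely keep the compute queue full to get maximum performance from the GPU.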
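One common way to keep a queue fuller is to allow up to N submissions in flight and block only when the oldest one has not finished. The sketch below demonstrates that throttling idea on a simulated queue; it is a generic pattern, not something the sample implements. `SimQueue`, `RunThrottled`, and `maxInFlight` are hypothetical, the worker thread stands in for the GPU, and with `maxInFlight == 1` the loop roughly corresponds to the sample's wait-every-iteration behavior.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical simulated queue: a worker thread "executes" submissions in
// order and advances a completed value, like a fence.
class SimQueue {
public:
    SimQueue() : m_worker([this] { Run(); }) {}
    ~SimQueue() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        m_worker.join();
    }
    // Enqueue one unit of work and return its fence value.
    std::uint64_t Submit() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push_back(++m_submitted);
        const std::uint64_t inFlight = m_submitted - m_completed;
        if (inFlight > m_maxInFlight) m_maxInFlight = inFlight;
        m_cv.notify_all();
        return m_submitted;
    }
    // Block until the work tagged `value` has completed.
    void WaitFor(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [&] { return m_completed >= value; });
    }
    std::uint64_t MaxObservedInFlight() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_maxInFlight;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_cv.wait(lock, [&] { return m_stop || !m_pending.empty(); });
            if (m_pending.empty()) return;  // stop requested and queue drained
            m_completed = m_pending.front();
            m_pending.pop_front();
            m_cv.notify_all();
        }
    }

    mutable std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::uint64_t> m_pending;
    std::uint64_t m_submitted = 0;
    std::uint64_t m_completed = 0;
    std::uint64_t m_maxInFlight = 0;
    bool m_stop = false;
    std::thread m_worker;  // declared last so it starts after the members above
};

// Keeps up to maxInFlight iterations queued; blocks only when the window is full.
std::uint64_t RunThrottled(SimQueue& queue, std::uint64_t iterations, std::uint64_t maxInFlight) {
    assert(maxInFlight >= 1);
    std::uint64_t last = 0;
    for (std::uint64_t i = 1; i <= iterations; ++i) {
        if (i > maxInFlight) {
            queue.WaitFor(i - maxInFlight);  // oldest outstanding iteration must finish
        }
        last = queue.Submit();
    }
    queue.WaitFor(last);  // drain before returning
    return last;
}
```

In a real D3D12 version, each in-flight iteration would also need its own command allocator, since an allocator cannot be reset while the GPU may still be executing commands recorded from it.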
On the compute thread:
DWORD D3D12nBodyGravity::AsyncComputeThreadProc(int threadIndex)
{
    ID3D12CommandQueue* pCommandQueue = m_computeCommandQueue[threadIndex].Get();
    ID3D12CommandAllocator* pCommandAllocator = m_computeAllocator[threadIndex].Get();
    ID3D12GraphicsCommandList* pCommandList = m_computeCommandList[threadIndex].Get();
    ID3D12Fence* pFence = m_threadFences[threadIndex].Get();
    while (0 == InterlockedGetValue(&m_terminating))
    {
        // Run the particle simulation.
        Simulate(threadIndex);
        // Close and execute the command list.
        ThrowIfFailed(pCommandList->Close());
        ID3D12CommandList* ppCommandLists[] = { pCommandList };
        pCommandQueue->ExecuteCommandLists(1, ppCommandLists);
        // Wait for the compute shader to complete the simulation.
        UINT64 threadFenceValue = InterlockedIncrement(&m_threadFenceValues[threadIndex]);
        ThrowIfFailed(pCommandQueue->Signal(pFence, threadFenceValue));
        ThrowIfFailed(pFence->SetEventOnCompletion(threadFenceValue, m_threadFenceEvents[threadIndex]));
        WaitForSingleObject(m_threadFenceEvents[threadIndex], INFINITE);
        // Wait for the render thread to be done with the SRV so that
        // the next frame in the simulation can run.
        UINT64 renderContextFenceValue = InterlockedGetValue(&m_renderContextFenceValues[threadIndex]);
        if (m_renderContextFence->GetCompletedValue() < renderContextFenceValue)
        {
            ThrowIfFailed(pCommandQueue->Wait(m_renderContextFence.Get(), renderContextFenceValue));
            InterlockedExchange(&m_renderContextFenceValues[threadIndex], 0);
        }
        // Swap the indices to the SRV and UAV.
        m_srvIndex[threadIndex] = 1 - m_srvIndex[threadIndex];
        // Prepare for the next frame.
        ThrowIfFailed(pCommandAllocator->Reset());
        ThrowIfFailed(pCommandList->Reset(pCommandAllocator, m_computeState.Get()));
    }
    return 0;
}
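The outer loop of the compute thread runs until another thread sets `m_terminating`, and each pass ends by flipping `m_srvIndex`. The portable sketch below reproduces just that control flow with `std::atomic` in place of `InterlockedGetValue`; `ComputeWorker` and `RunUntil` are hypothetical names, and the GPU submission and fence waits are reduced to a comment.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the compute thread's outer loop; std::atomic
// replaces the Interlocked* calls on m_terminating.
struct ComputeWorker {
    std::atomic<bool> terminating{false};
    std::atomic<unsigned long long> iterations{0};
    int srvIndex = 0;  // written only by the worker thread while it runs

    void ThreadProc() {
        while (!terminating.load(std::memory_order_acquire)) {
            // ... record, execute, and wait for one simulation step here ...
            srvIndex = 1 - srvIndex;  // swap SRV/UAV roles for the next step
            iterations.fetch_add(1, std::memory_order_relaxed);
        }
    }
};

// Runs the worker until at least minIterations steps have executed, then stops it.
unsigned long long RunUntil(ComputeWorker& worker, unsigned long long minIterations) {
    std::thread t(&ComputeWorker::ThreadProc, &worker);
    while (worker.iterations.load() < minIterations) {
        std::this_thread::yield();
    }
    worker.terminating.store(true, std::memory_order_release);
    t.join();  // after join, reading srvIndex from this thread is safe
    return worker.iterations.load();
}
```

Because every pass flips the index exactly once, the final `srvIndex` equals the iteration count modulo 2, which is a quick sanity check that no swap was skipped.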
Run the sample