Using real-time audio manipulation to apply custom voice effects

07/12/2023

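PlayFab Party is a real-time networking and voice chat solution. When configured for voice chat, PlayFab Party transmits microphone audio and plays it back unmodified. Some games need access to the voice chat audio buffers to implement custom audio effects, such as spatial audio or voice filters. This article walks through how to use the real-time audio manipulation feature to intercept and modify PlayFab Party's voice chat audio.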

Prerequisites

This walkthrough assumes basic familiarity with PlayFab Party's voice chat.

Platform support

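Real-time audio manipulation isn't available on every platform. The methods associated with it are present in the unified cross-platform header, but they are currently implemented only for Windows, Xbox, and PlayStation® 5. On other platforms, these methods return errors.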

Audio streams

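Real-time audio manipulation introduces the concept of audio streams for retrieving audio from the library or submitting audio to it. There are two types of audio streams. The first is a source stream. Source streams are used to retrieve audio from chat controls. Each chat control can have only one source stream, called the voice stream. For local chat controls, the voice stream is used to retrieve microphone input. For remote chat controls, it is used to retrieve incoming voice audio. If a voice stream exists for a chat control, the library redirects that chat control's source audio to the voice stream instead of handling it automatically. For local chat controls, this means the microphone audio is redirected to the voice stream instead of being automatically encoded and transmitted. For remote chat controls, this means the incoming voice audio is redirected to the voice stream instead of being automatically submitted to each local chat control for playback. Source streams are represented by PartyAudioManipulationSourceStream.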

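The second type of stream is a sink stream. Sink streams are used to submit audio to chat controls. Only local chat controls can have sink streams, and each can have two: a capture stream and a render stream. If a capture stream exists for a chat control, the library takes the audio to encode and transmit to other chat controls from the capture stream instead of from the microphone. If a render stream exists for a chat control, the library takes audio from the render stream and plays it in addition to the voice chat audio that is automatically played from remote chat controls. In other words, audio submitted to a capture stream is used as the local chat control's microphone input, and audio submitted to a render stream is played, or "rendered", to the local chat control's audio output device. Sink streams are represented by PartyAudioManipulationSinkStream.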

Configuring audio streams

By default, the library handles audio retrieval, transmission, and playback for you, so chat controls are created without any audio streams. You can create one or more streams for a chat control through the stream configuration methods: PartyLocalChatControl::ConfigureAudioManipulationCaptureStream(), PartyLocalChatControl::ConfigureAudioManipulationRenderStream(), and PartyChatControl::ConfigureAudioManipulationVoiceStream(). Once configured, the streams can be retrieved through PartyLocalChatControl::GetAudioManipulationCaptureStream(), PartyLocalChatControl::GetAudioManipulationRenderStream(), and PartyChatControl::GetAudioManipulationVoiceStream().

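Each stream configuration method lets you specify the format of the audio that will be retrieved from or submitted to the stream. For details on the supported formats, see the reference documentation for each stream configuration method.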
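Because the format you choose determines how large each audio buffer is, it can help to compute buffer sizes up front. The following is a minimal sketch, not library code: ExampleAudioFormat and BytesForDuration are hypothetical names introduced here, and the struct only mirrors the handful of values the calculation needs rather than the library's own format type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the format chosen when configuring a stream.
// This is NOT the library's format type; it carries only what is needed here.
struct ExampleAudioFormat
{
    uint32_t samplesPerSecond;
    uint32_t channelCount;
    uint32_t bitsPerSample;
};

// Returns the number of bytes needed to hold `milliseconds` of audio in the
// given format. Assumes whole-byte sample sizes (8, 16, 32 bits, ...).
inline uint32_t BytesForDuration(const ExampleAudioFormat& format, uint32_t milliseconds)
{
    assert(format.bitsPerSample % 8 == 0);

    // Number of sample frames (one sample per channel) in the duration.
    const uint64_t frames =
        static_cast<uint64_t>(format.samplesPerSecond) * milliseconds / 1000;

    const uint64_t bytesPerFrame =
        static_cast<uint64_t>(format.channelCount) * (format.bitsPerSample / 8);

    return static_cast<uint32_t>(frames * bytesPerFrame);
}
```

For example, under the assumption of a 24 kHz, mono, 16-bit format, a 40 ms buffer works out to 960 frames of 2 bytes each. Whether a particular format is supported is a question for the reference documentation, not for this helper.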

Retrieving audio from source streams

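You can retrieve audio from a source stream through PartyAudioManipulationSourceStream::GetNextBuffer(). While voice activity is detected, a new buffer becomes available approximately every 40 ms. If no buffer is available, the call still succeeds and provides a buffer of length zero. The total number of buffers that are immediately available can be retrieved through PartyAudioManipulationSourceStream::GetAvailableBufferCount().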

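For efficiency, GetNextBuffer() provides a buffer that points into the library's memory instead of copying the entire buffer. You can modify this buffer in place as needed. When you have finished processing a buffer, you must release it through PartyAudioManipulationSourceStream::ReturnBuffer() so that the library can reclaim the memory. Multiple buffers can be retrieved before any are returned, and buffers don't need to be returned in the order in which they were retrieved.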
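Since every retrieved buffer must eventually be returned, one option is to tie the return to scope exit so that early returns in processing code can't skip it. The sketch below is an illustration only: ScopedBufferReturn and FakeSourceStream are hypothetical names that are not part of the library. The guard is templated so that it works with any stream type exposing ReturnBuffer(void*), which is how the examples in this article call it.

```cpp
// Hypothetical helper (not part of the library): hands a borrowed buffer back
// to its stream when the guard goes out of scope.
template <typename Stream>
class ScopedBufferReturn
{
public:
    ScopedBufferReturn(Stream* stream, void* buffer)
        : m_stream(stream), m_buffer(buffer)
    {
    }

    // Non-copyable so a buffer is never returned twice.
    ScopedBufferReturn(const ScopedBufferReturn&) = delete;
    ScopedBufferReturn& operator=(const ScopedBufferReturn&) = delete;

    ~ScopedBufferReturn()
    {
        // A null pointer means nothing was borrowed, so nothing is returned.
        if (m_buffer != nullptr)
        {
            // The result is ignored here; real code might log a failure.
            (void)m_stream->ReturnBuffer(m_buffer);
        }
    }

private:
    Stream* m_stream;
    void* m_buffer;
};

// Test double standing in for a source stream so the sketch runs on its own.
struct FakeSourceStream
{
    int returnedCount = 0;

    int ReturnBuffer(void* /*buffer*/)
    {
        ++returnedCount;
        return 0;
    }
};
```

In real code the guard would be constructed right after a successful GetNextBuffer() call that produced a non-empty buffer, with the buffer's data pointer as the second argument.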

Submitting audio to sink streams

You can submit audio to a sink stream through PartyAudioManipulationSinkStream::SubmitBuffer(). The library copies the buffer, so you can free it as soon as the call completes.

Every 40 ms, the library consumes 40 ms of the audio that was submitted to the sink stream. To avoid audio glitches, submit audio at a constant rate.
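One way to keep submissions evenly spaced is to schedule each tick against an absolute deadline instead of sleeping for a fixed interval after the work finishes, which would let processing time accumulate as drift. The following is a minimal sketch under that assumption; TickPacer is a hypothetical helper, not a library type.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: produces evenly spaced deadlines for an audio thread.
// Deadlines are computed from the start time, so time spent processing one
// tick does not push later ticks back.
class TickPacer
{
public:
    using Clock = std::chrono::steady_clock;

    TickPacer(Clock::time_point start, std::chrono::milliseconds period)
        : m_next(start + period), m_period(period)
    {
        assert(period.count() > 0);
    }

    // Returns the deadline for the upcoming tick and schedules the one after.
    Clock::time_point NextDeadline()
    {
        const Clock::time_point deadline = m_next;
        m_next += m_period;
        return deadline;
    }

private:
    Clock::time_point m_next;
    std::chrono::milliseconds m_period;
};
```

An audio thread could construct the pacer with TickPacer::Clock::now() and a 40 ms period, then call std::this_thread::sleep_until(pacer.NextDeadline()) at the top of each loop iteration before retrieving and submitting audio.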

Scenarios

Microphone audio manipulation (also known as pre-encode buffer manipulation)

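Microphone audio manipulation is the act of intercepting and altering microphone audio before it is transmitted to other chat controls. It is also called "pre-encode buffer manipulation" because the microphone audio is modified before it is encoded and sent. To implement this scenario for a local chat control, first configure both a voice stream and a capture stream for that chat control. Once they are configured, a function that is called on each tick of a dedicated audio thread to process the microphone audio of that single chat control might look like the following: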

// An app-defined function that takes a microphone buffer and generates a new
// buffer that should be transmitted to other chat controls.
std::vector<uint8_t>
ProcessLocalVoiceBuffer(
    PartyMutableDataBuffer* inputBuffer
    );

void
ProcessLocalMicrophoneAudioForSingleChatControl(
    PartyLocalChatControl* chatControl
    )
{
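    // Get the voice stream from which we want to retrieve audio. This provides
    // the audio generated by the chat control's input device.
    // RETURN_VOID_IF_FAILED is assumed to be an app-defined macro that returns
    // from the function when the wrapped call fails.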
    PartyAudioManipulationSourceStream* voiceStream;
    RETURN_VOID_IF_FAILED(chatControl->GetAudioManipulationVoiceStream(&voiceStream));

    // Get the capture stream to which we want to submit audio. This is used to
    // submit audio that will be transmitted to other chat controls.
    PartyAudioManipulationSinkStream* captureStream;
    RETURN_VOID_IF_FAILED(chatControl->GetAudioManipulationCaptureStream(&captureStream));

    // Get the next audio buffer from the voice stream.
    PartyMutableDataBuffer buffer;
    RETURN_VOID_IF_FAILED(voiceStream->GetNextBuffer(&buffer));

    // If we retrieved a buffer, process it.
    if (buffer.bufferByteCount > 0)
    {
        // Use the buffer we retrieved to generate a new buffer that will be
        // treated as the "real" capture input and transmitted to other chat
        // controls.
        std::vector<uint8_t> processedBuffer = ProcessLocalVoiceBuffer(&buffer);

        // Convert the buffer to a Party type.
        PartyDataBuffer partyBuffer;
        partyBuffer.bufferByteCount = static_cast<uint32_t>(processedBuffer.size());
        partyBuffer.buffer = processedBuffer.data();

        // Submit the processed buffer to the capture stream.
        PartyError error = captureStream->SubmitBuffer(&partyBuffer);
        if (PARTY_FAILED(error))
        {
            printf("Failed to submit buffer to sink stream! error = 0x%08x", error);
        }

        // Return the original buffer back to the voice stream.
        error = voiceStream->ReturnBuffer(buffer.buffer);
        if (PARTY_FAILED(error))
        {
            printf("Failed to return buffer to source stream! error = 0x%08x", error);
        }
    }
}
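The example above leaves ProcessLocalVoiceBuffer() to the app. As a rough illustration of what such a function could do, the sketch below applies a gain to the audio and clamps the result. It rests on assumptions that aren't part of the example: the stream is taken to be configured for 16-bit signed integer samples, and ApplyGainToPcm16 is a hypothetical name that works on a raw pointer and byte count so that it compiles without the library headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical effect: scales 16-bit signed PCM samples by `gain` and clamps
// the result to the valid sample range. Returns a new buffer of the same size.
inline std::vector<uint8_t> ApplyGainToPcm16(const void* input, uint32_t byteCount, float gain)
{
    // Assumes the buffer holds whole 16-bit samples.
    assert(byteCount % sizeof(int16_t) == 0);

    std::vector<uint8_t> output(byteCount);
    const uint8_t* bytes = static_cast<const uint8_t*>(input);

    for (uint32_t offset = 0; offset < byteCount; offset += sizeof(int16_t))
    {
        // memcpy avoids alignment and aliasing assumptions about the buffer.
        int16_t sample;
        std::memcpy(&sample, bytes + offset, sizeof(sample));

        float scaled = static_cast<float>(sample) * gain;
        scaled = std::max(-32768.0f, std::min(32767.0f, scaled));

        const int16_t result = static_cast<int16_t>(scaled);
        std::memcpy(output.data() + offset, &result, sizeof(result));
    }

    return output;
}
```

An implementation of ProcessLocalVoiceBuffer() could forward the retrieved buffer's data pointer and byte count to a helper like this one. Clamping matters because a gain above 1.0 would otherwise overflow loud samples and produce harsh distortion.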

Remote audio manipulation (also known as post-decode buffer manipulation)

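Remote audio manipulation is the act of intercepting and altering incoming audio before it is rendered to each local chat control. It is also called "post-decode buffer manipulation" because the incoming audio is modified after it is decoded and before it is rendered. To implement this scenario, first configure a voice stream for each remote chat control and a render stream for each local chat control. Then, on each tick of your audio thread, pull audio from each voice stream, mix the audio into a single stream while applying effects as needed, and submit the mixed buffer to each render stream. Depending on your game scenario, you might need to produce a different mix for each local chat control. A function that is called on each tick of a dedicated audio thread to process incoming voice audio might look like the following: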

// This is an app-defined function that takes a local chat control and list of remote voice buffers and generates
// a single mixed buffer to submit to the local chat control's audio output.
std::vector<uint8_t>
GetOutputMixBuffer(
    PartyLocalChatControl& localChatControl,
    const std::map<PartyAudioManipulationSourceStream*, PartyMutableDataBuffer>& remoteVoiceBuffers
    );

void
ProcessRemoteVoiceAudio(
    const std::vector<PartyChatControl*>& remoteChatControls,
    const std::vector<PartyLocalChatControl*>& localChatControls
    )
{
    std::map<PartyAudioManipulationSourceStream*, PartyMutableDataBuffer> remoteVoiceBuffers;

    // Acquire voice buffers from each remote chat control.
    for (auto remoteChatControl : remoteChatControls)
    {
        // Get the voice stream for this chat control from which we will retrieve audio.
        PartyAudioManipulationSourceStream* voiceStream;
        PartyError error = remoteChatControl->GetAudioManipulationVoiceStream(&voiceStream);
        if (PARTY_FAILED(error))
        {
            printf("Failed to get voice stream! error = 0x%08x", error);
            continue;
        }

        // Get the next audio buffer from the voice stream.
        PartyMutableDataBuffer buffer;
        error = voiceStream->GetNextBuffer(&buffer);
        if (PARTY_FAILED(error))
        {
            printf("Failed to get next buffer! error = 0x%08x", error);
            continue;
        }

        // If we retrieved a buffer, cache it in the map for mixing.
        if (buffer.bufferByteCount > 0)
        {
            remoteVoiceBuffers[voiceStream] = buffer;
        }
    }

    // If we didn't acquire any source buffers, we don't have anything to mix.
    if (remoteVoiceBuffers.empty())
    {
        return;
    }

    // Mix the voice buffers and submit to each render stream.
    for (auto localChatControl : localChatControls)
    {
        // Get the render stream for this chat control to which we will submit audio.
        PartyAudioManipulationSinkStream* renderStream;
        PartyError error = localChatControl->GetAudioManipulationRenderStream(&renderStream);
        if (PARTY_FAILED(error))
        {
            printf("Failed to get render stream! error = 0x%08x", error);
            continue;
        }

        // Mix the voice buffers to generate a new, mixed buffer.
        std::vector<uint8_t> mixedBuffer = GetOutputMixBuffer(*localChatControl, remoteVoiceBuffers);

        // Convert the buffer to a Party type.
        PartyDataBuffer partyBuffer;
        partyBuffer.bufferByteCount = static_cast<uint32_t>(mixedBuffer.size());
        partyBuffer.buffer = mixedBuffer.data();

        // Submit the mixed buffer to the render stream.
        error = renderStream->SubmitBuffer(&partyBuffer);
        if (PARTY_FAILED(error))
        {
            printf("Failed to submit buffer to render stream! error = 0x%08x", error);
        }
    }

    // Release the voice buffers.
    for (const auto& voiceBuffer : remoteVoiceBuffers)
    {
        // Return the voice buffer that we had cached from this voice stream.
        PartyError error = voiceBuffer.first->ReturnBuffer(voiceBuffer.second.buffer);
        if (PARTY_FAILED(error))
        {
            printf("Failed to return buffer! error = 0x%08x", error);
        }
    }
}
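The mixing itself happens inside the app-defined GetOutputMixBuffer(). As one possible approach, the sketch below sums 16-bit signed samples from several inputs into a wider accumulator and clamps the result. It assumes that all inputs share the same 16-bit integer format and sample rate. ExamplePcmBuffer and MixPcm16 are hypothetical names used so that the code stands alone without the library headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view of one input buffer: a data pointer plus its size in bytes.
struct ExamplePcmBuffer
{
    const void* data;
    uint32_t byteCount;
};

// Mixes 16-bit signed PCM inputs by summing samples and clamping to the valid
// range. The output is as long as the longest input.
inline std::vector<uint8_t> MixPcm16(const std::vector<ExamplePcmBuffer>& inputs)
{
    size_t maxSamples = 0;
    for (const ExamplePcmBuffer& input : inputs)
    {
        assert(input.byteCount % sizeof(int16_t) == 0);
        maxSamples = std::max<size_t>(maxSamples, input.byteCount / sizeof(int16_t));
    }

    // Accumulate in 32 bits so that summing several inputs cannot overflow.
    std::vector<int32_t> accumulator(maxSamples, 0);
    for (const ExamplePcmBuffer& input : inputs)
    {
        const uint8_t* bytes = static_cast<const uint8_t*>(input.data);
        const size_t sampleCount = input.byteCount / sizeof(int16_t);
        for (size_t i = 0; i < sampleCount; ++i)
        {
            int16_t sample;
            std::memcpy(&sample, bytes + i * sizeof(int16_t), sizeof(sample));
            accumulator[i] += sample;
        }
    }

    std::vector<uint8_t> output(maxSamples * sizeof(int16_t));
    for (size_t i = 0; i < maxSamples; ++i)
    {
        const int32_t clamped =
            std::max<int32_t>(-32768, std::min<int32_t>(32767, accumulator[i]));
        const int16_t sample = static_cast<int16_t>(clamped);
        std::memcpy(output.data() + i * sizeof(int16_t), &sample, sizeof(sample));
    }

    return output;
}
```

Per-source effects such as attenuation by distance could be applied to each sample before it is added to the accumulator. Plain summation with clamping is only the simplest choice, and a game with many simultaneous speakers might prefer to scale the sum instead.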

Privacy and mixing considerations

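As long as a remote chat control is generating audio, and the chat permission and mute configuration allow that audio to be played to at least one local chat control, the library provides the audio through the remote chat control's voice stream. If the audio should be played to one local chat control but not to another, you must omit it from the audio mix of the latter chat control.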
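In practice this means that the mix for each local chat control should be built only from the remote speakers that this particular listener is allowed to hear. The sketch below shows one way to structure that filtering step. Everything in it is hypothetical: the integer identifiers, the ExampleAllowedPairs set, and the SelectSpeakersForListener function are app-side bookkeeping that the app would derive from its own permission and mute state.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical app-side identifier for a chat control.
using ExampleControlId = uint32_t;

// Hypothetical set of (listener, speaker) pairs the app allows to be heard.
using ExampleAllowedPairs = std::set<std::pair<ExampleControlId, ExampleControlId>>;

// Returns the speakers whose audio may be included in this listener's mix.
inline std::vector<ExampleControlId> SelectSpeakersForListener(
    ExampleControlId listener,
    const std::vector<ExampleControlId>& speakersWithAudio,
    const ExampleAllowedPairs& allowedPairs)
{
    std::vector<ExampleControlId> selected;
    for (ExampleControlId speaker : speakersWithAudio)
    {
        // Skip any speaker this listener is not allowed to hear.
        if (allowedPairs.count(std::make_pair(listener, speaker)) > 0)
        {
            selected.push_back(speaker);
        }
    }
    return selected;
}
```

The per-listener result would then decide which voice buffers are passed to the mixing step for that local chat control.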

Chat indicator considerations

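Configuring a voice stream for a remote chat control doesn't affect the chat indicator. To pick the correct UI indicator, you might need to implement logic that reconciles the chat indicator with your mixing logic. For example, the chat indicator might report that a chat control is talking even though your custom mixing logic chose to drop its audio.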
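A small reconciliation step might combine what the library reports with what the mixer actually did. The sketch below is only an illustration: ExampleUiIndicator and ResolveUiIndicator are hypothetical, the two boolean inputs stand in for the library's chat indicator and for the app's own mixing decision, and the extra "talking but not heard" state is one possible UI choice rather than anything the library defines.

```cpp
// Hypothetical UI states; these are not the library's chat indicator values.
enum class ExampleUiIndicator
{
    Silent,
    Talking,
    TalkingButNotHeard
};

// Combines the library-reported talking state with the app's mixing decision.
inline ExampleUiIndicator ResolveUiIndicator(bool libraryReportsTalking, bool includedInLocalMix)
{
    if (!libraryReportsTalking)
    {
        return ExampleUiIndicator::Silent;
    }

    // The speaker is talking, but the custom mix may have dropped the audio.
    return includedInLocalMix ? ExampleUiIndicator::Talking
                              : ExampleUiIndicator::TalkingButNotHeard;
}
```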

Mixed scenarios

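In some scenarios, you might want to enable audio manipulation for some chat controls but not for others. For example, you might want to apply an effect to opposing players you encounter during a match while leaving the voices of players on your own team unchanged. In such scenarios, follow the steps described earlier for remote audio manipulation, but configure voice streams only for the chat controls whose audio you want to apply effects to. As long as the mute and permission configuration allows it, audio from the remaining remote chat controls is rendered to the local chat controls automatically.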
