PushAudioInputStream write uses high CPU and memory when under load

Question

Hi team,

I am seeing high CPU and memory usage when sending audio with the PushAudioInputStream write method under load.

I am using the Java SDK, version 1.42.0.

Our use case involves receiving multiple audio streams that we need to send to Azure for transcription. Each stream may have a different configuration, such as language and timeouts.

We currently have a shared SpeechConfig and create a new SpeechRecognizer for every request/stream, on which we call the startContinuousRecognitionAsync method.

For every incoming stream, we create a PushAudioInputStream and add it to an AudioConfig, which is passed to the SpeechRecognizer during initialization.

When we start receiving audio, we write it using the pushAudioInputStream.write method.

I need help with the following questions:

1. What are the best practices for this use case, especially under load?
2. How does the PushAudioInputStream work internally? According to the documentation, the write method makes an internal copy of the data. This would result in increased CPU and memory consumption. Can anything be done here, especially under load?
3. Is there a way to multiplex multiple streams on the same SpeechRecognizer and map the outputs of event listeners to the appropriate stream?

Would appreciate a quick response.

Thanks,

Sai Vishnu Soudri

Answer

Hi Sai Vishnu Soudri,
Greetings & Welcome to the Microsoft Q&A forum! Thank you for sharing your query.

Best Practices for High Load Conditions:

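Instead of creating a new SpeechRecognizer for each stream, consider reusing instances where your streams allow it. This can help reduce the overhead associated with creating and destroying recognizers. Note that each recognizer still handles only one audio stream at a time.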

Adjust the size of the audio buffers you are writing to the PushAudioInputStream. Smaller buffers can reduce memory usage but may increase CPU load.

If you have multiple CPU cores available, distribute the processing load across them. This can help manage CPU usage more effectively.
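
One way to read this advice is sketched below, under stated assumptions. StreamLanes is a hypothetical helper, not part of the Speech SDK. It pins each stream to one single-threaded lane, so the streams spread across cores while the writes for any one stream stay in order. The lane count and the use of a stream ID string are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: spreads streams over a fixed set of single-threaded lanes. */
final class StreamLanes {
    private final ExecutorService[] lanes;

    StreamLanes(int laneCount) {
        if (laneCount <= 0) {
            throw new IllegalArgumentException("laneCount must be positive");
        }
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** The same stream ID always maps to the same lane, which preserves write order. */
    int laneFor(String streamId) {
        return Math.floorMod(streamId.hashCode(), lanes.length);
    }

    /** Queues a task (for example, one write call) on the lane owned by this stream. */
    void submit(String streamId, Runnable task) {
        lanes[laneFor(streamId)].execute(task);
    }

    /** Stops accepting work and waits for queued tasks; returns false on timeout. */
    boolean shutdownAndAwait(long timeoutSeconds) {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                if (!lane.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A reasonable starting point for the lane count is Runtime.getRuntime().availableProcessors(), tuned after measuring under your own load.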

Internal Workings of PushAudioInputStream:

The PushAudioInputStream works by making an internal copy of the data you write to it. This is necessary to ensure that the audio data is available for processing even after the original buffer is no longer in use. While this does increase memory usage, it is essential for the stability of the stream.

Reducing CPU and Memory Consumption:

Implement a buffer pool to reuse audio buffers instead of allocating new ones for each write operation. This can help reduce memory fragmentation and improve performance.
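
A minimal sketch of such a pool follows. AudioBufferPool and its methods are hypothetical names, not SDK types. It assumes fixed-size audio frames that fill the whole buffer, and it relies on the write method copying the data, so a buffer should only be released after write has returned.

```java
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical fixed-size byte[] pool; thread-safe via ArrayBlockingQueue. */
final class AudioBufferPool {
    private final ArrayBlockingQueue<byte[]> free;
    private final int bufferSize;

    AudioBufferPool(int capacity, int bufferSize) {
        this.free = new ArrayBlockingQueue<>(capacity);
        this.bufferSize = bufferSize;
    }

    /** Returns a pooled buffer, or allocates a new one if the pool is empty. */
    byte[] acquire() {
        byte[] buf = free.poll();
        return buf != null ? buf : new byte[bufferSize];
    }

    /** Hands a buffer back; wrong-sized or surplus buffers are left to the GC. */
    void release(byte[] buf) {
        if (buf != null && buf.length == bufferSize) {
            free.offer(buf);
        }
    }

    /** Number of buffers currently waiting in the pool. */
    int available() {
        return free.size();
    }
}
```

Typical use would be: acquire a buffer, fill it from the incoming stream, pass it to write, then release it. Reused buffers still contain old audio, so every byte must be overwritten before the next write.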

Instead of writing small chunks of audio data frequently, try to batch the data and write larger chunks less frequently. This can reduce the overhead of the write operations.
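
The batching idea can be sketched as below. ChunkBatcher is a hypothetical class, and the sink stands in for the write call. The sketch assumes the sink consumes or copies the array before returning, as a copying write would, because the full-chunk buffer is reused afterwards.

```java
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical helper: accumulates small frames and emits larger chunks. */
final class ChunkBatcher {
    private final byte[] pending;
    private final Consumer<byte[]> sink;
    private int filled = 0;

    ChunkBatcher(int chunkSize, Consumer<byte[]> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.pending = new byte[chunkSize];
        this.sink = sink;
    }

    /** Appends a frame, emitting a chunk each time the internal buffer fills. */
    void add(byte[] frame) {
        int offset = 0;
        while (offset < frame.length) {
            int n = Math.min(frame.length - offset, pending.length - filled);
            System.arraycopy(frame, offset, pending, filled, n);
            filled += n;
            offset += n;
            if (filled == pending.length) {
                flush();
            }
        }
    }

    /** Emits whatever is buffered; call once more at end of stream. */
    void flush() {
        if (filled == 0) {
            return;
        }
        // A full chunk is passed without an extra allocation; the sink must not keep it.
        // A partial chunk is trimmed so the sink never sees stale trailing bytes.
        sink.accept(filled == pending.length ? pending : Arrays.copyOf(pending, filled));
        filled = 0;
    }
}
```

With the Speech SDK this might be wired as new ChunkBatcher(3200, pushStream::write), where pushStream is your PushAudioInputStream and 3200 bytes is 100 ms of 16 kHz, 16-bit mono PCM. The chunk size is an assumption to tune: larger chunks mean fewer write calls, but they also delay audio reaching the service, which can delay results.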

Multiplexing Multiple Streams:

Currently, the SpeechRecognizer does not support multiplexing multiple streams directly. Each SpeechRecognizer instance is designed to handle a single audio stream. However, you can manage multiple recognizers in parallel and map the outputs to the appropriate streams using a custom implementation.
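
A custom mapping of this kind could look like the sketch below. StreamRouter is a hypothetical class, not an SDK type. It keeps one session per stream ID and returns a callback that tags every transcript with the ID of the stream it came from. The session is typed as AutoCloseable so the sketch stays independent of the SDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical registry: one session per stream, results tagged with the stream ID. */
final class StreamRouter {
    private final Map<String, AutoCloseable> sessions = new ConcurrentHashMap<>();
    private final BiConsumer<String, String> output;

    /** output receives (streamId, transcript) for every result from any stream. */
    StreamRouter(BiConsumer<String, String> output) {
        this.output = output;
    }

    /** Registers a session and returns the callback to attach to its result events. */
    Consumer<String> register(String streamId, AutoCloseable session) {
        if (sessions.putIfAbsent(streamId, session) != null) {
            throw new IllegalStateException("duplicate stream: " + streamId);
        }
        return text -> output.accept(streamId, text);
    }

    /** Removes the stream and closes its session; unknown IDs are ignored. */
    void close(String streamId) {
        AutoCloseable session = sessions.remove(streamId);
        if (session == null) {
            return;
        }
        try {
            session.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close " + streamId, e);
        }
    }

    /** Number of streams currently registered. */
    int activeCount() {
        return sessions.size();
    }
}
```

With the SDK, the returned callback would typically be invoked from the recognizer's own listener, for example recognizer.recognized.addEventListener((s, e) -> callback.accept(e.getResult().getText())), so each recognizer's events carry their stream ID without any shared lookup.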

I hope this information helps.
