PushAudioInputStream write uses high CPU and memory when under load

Sai Vishnu Soudri 60 Reputation points
2025-02-18T09:57:02.8933333+00:00

Hi team,

I observe high CPU and memory usage when sending audio with the PushAudioInputStream write method under load.

I am using version 1.42.0 of the Java SDK.

Our use case involves receiving multiple audio streams that we need to send to Azure for transcription. Each stream may have a different configuration, such as language and timeouts.

We currently have a shared SpeechConfig and create a new SpeechRecognizer for every request/stream on which we call the startContinuousRecognitionAsync method.

For every incoming stream, we create a PushAudioInputStream and wrap it in an AudioConfig, which is passed to a SpeechRecognizer during initialization.

When we start receiving audio, we write it using the pushAudioInputStream.write method.

I need help with the following questions:

  1. Best practices for this use case, especially under load conditions.
  2. How does the PushAudioInputStream work internally? According to the documentation, the write method makes an internal copy of the data. This would result in increased CPU and memory consumption. Can anything be done here, especially under load conditions?
  3. Is there a way to multiplex multiple streams on the same SpeechRecognizer and map the outputs of event listeners to the appropriate stream?

Would appreciate a quick response.

Thanks,

Sai Vishnu Soudri

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Pavankumar Purilla 3,410 Reputation points Microsoft Vendor
    2025-02-19T00:49:12.59+00:00

    Hi Sai Vishnu Soudri,
    Greetings, and welcome to the Microsoft Q&A forum! Thank you for sharing your query.

    Best Practices for High Load Conditions:

    Instead of creating a new SpeechRecognizer for each stream, consider reusing instances where the stream configuration allows it. This can help reduce the overhead associated with creating and destroying recognizers.

    Adjust the size of the audio buffers you are writing to the PushAudioInputStream. Smaller buffers can reduce memory usage but may increase CPU load.

    If you have multiple CPU cores available, distribute the processing load across them. This can help manage CPU usage more effectively.
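One way to spread the write work across cores while keeping each stream's audio in order is to hash every stream to one of a fixed set of single-threaded lanes. The sketch below is only an illustration under assumptions: `StreamLanes` is a hypothetical helper, and the `Consumer<byte[]>` sink stands in for a call such as `pushStream::write`, which is assumed to be safe to invoke from a worker thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: spreads per-stream write work over single-threaded lanes. */
class StreamLanes {
    private final ExecutorService[] lanes;

    StreamLanes(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** The same streamId always maps to the same lane, so chunk order is preserved. */
    int laneFor(String streamId) {
        return Math.floorMod(streamId.hashCode(), lanes.length);
    }

    /** Queues one chunk; the sink is an assumption standing in for pushStream::write. */
    void submit(String streamId, byte[] chunk, Consumer<byte[]> sink) {
        lanes[laneFor(streamId)].execute(() -> sink.accept(chunk));
    }

    /** Stops accepting work and waits for queued chunks to drain. */
    void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A reasonable starting point for the lane count is `Runtime.getRuntime().availableProcessors()`, adjusted after measuring under realistic load.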

    Internal Workings of PushAudioInputStream:

    The PushAudioInputStream works by making an internal copy of the data you write to it. This is necessary to ensure that the audio data is available for processing even after the original buffer is no longer in use. While this does increase memory usage, it is essential for the stability of the stream.

    Reducing CPU and Memory Consumption:

    Implement a buffer pool to reuse audio buffers instead of allocating new ones for each write operation. This can help reduce memory fragmentation and improve performance.
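A buffer pool along these lines could look like the sketch below. It is a minimal illustration, not SDK code: `AudioBufferPool` is a hypothetical class, and it assumes fixed-size chunks because the Java `write(byte[])` call takes a whole array. Since the stream copies the data on write, as noted above, a buffer can be released back to the pool as soon as the write returns.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical pool of equally sized byte arrays for staging audio before a write. */
class AudioBufferPool {
    private final BlockingQueue<byte[]> free;
    private final int bufferSize;

    AudioBufferPool(int buffers, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(buffers);
        for (int i = 0; i < buffers; i++) {
            free.add(new byte[bufferSize]);
        }
    }

    /** Returns a pooled buffer, or allocates a fresh one if the pool is empty. */
    byte[] acquire() {
        byte[] b = free.poll();
        return b != null ? b : new byte[bufferSize];
    }

    /** Returns a buffer to the pool; surplus or wrong-sized buffers are left to the GC. */
    void release(byte[] b) {
        if (b.length == bufferSize) {
            free.offer(b);
        }
    }

    /** Number of buffers currently idle in the pool. */
    int available() {
        return free.size();
    }
}
```

A typical cycle would be `acquire()`, fill the buffer from the network, pass it to the write call, then `release()` it.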

    Instead of writing small chunks of audio data frequently, try to batch the data and write larger chunks less frequently. This can reduce the overhead of the write operations.
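The batching idea can be sketched as a small wrapper that accumulates incoming chunks and forwards them only when a full batch is ready. `BatchingAudioWriter` is a hypothetical name, and the `Consumer<byte[]>` sink stands in for `pushStream::write`; the sketch assumes the sink copies the data before returning, because the same batch array is reused for every full batch.

```java
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical wrapper: accumulates small chunks and forwards larger batches. */
class BatchingAudioWriter {
    private final byte[] batch;
    private final Consumer<byte[]> sink;
    private int filled = 0;

    BatchingAudioWriter(int batchSize, Consumer<byte[]> sink) {
        this.batch = new byte[batchSize];
        this.sink = sink;
    }

    /** Copies the chunk into the batch, forwarding each time the batch fills up. */
    void write(byte[] chunk) {
        int offset = 0;
        while (offset < chunk.length) {
            int n = Math.min(batch.length - filled, chunk.length - offset);
            System.arraycopy(chunk, offset, batch, filled, n);
            filled += n;
            offset += n;
            if (filled == batch.length) {
                // The reusable array itself is passed on; the sink is assumed to copy it.
                sink.accept(batch);
                filled = 0;
            }
        }
    }

    /** Forwards any partial batch, for example when the stream ends. */
    void flush() {
        if (filled > 0) {
            sink.accept(Arrays.copyOf(batch, filled));
            filled = 0;
        }
    }
}
```

For 16 kHz, 16-bit mono PCM, a batch of 3200 bytes corresponds to 100 ms of audio. Larger batches mean fewer write calls but delay the audio by up to one batch, so the size is a trade-off between overhead and latency.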

    Multiplexing Multiple Streams:

    Currently, the SpeechRecognizer does not support multiplexing multiple streams directly. Each SpeechRecognizer instance is designed to handle a single audio stream. However, you can manage multiple recognizers in parallel and map the outputs to the appropriate streams using a custom implementation.
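One possible shape for such a custom mapping is to give each stream its own listener that captures the stream ID, so every result arrives already tagged. The sketch below uses only the JDK: `SessionRouter` is a hypothetical class, and the `Consumer<String>` it returns stands in for the handler you would register on that stream's recognizer (for example on its `recognized` event), with the transcript text extracted from the event by your own code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical router: one listener per stream, each bound to its stream ID. */
class SessionRouter {
    private final Map<String, Consumer<String>> listeners = new ConcurrentHashMap<>();
    private final BiConsumer<String, String> onTranscript;

    /** onTranscript receives (streamId, text) for every result from any stream. */
    SessionRouter(BiConsumer<String, String> onTranscript) {
        this.onTranscript = onTranscript;
    }

    /** Creates the listener to attach to the recognizer that serves this stream. */
    Consumer<String> open(String streamId) {
        Consumer<String> listener = text -> onTranscript.accept(streamId, text);
        if (listeners.putIfAbsent(streamId, listener) != null) {
            throw new IllegalStateException("stream already open: " + streamId);
        }
        return listener;
    }

    /** Forgets the stream once its recognizer has been stopped and closed. */
    void close(String streamId) {
        listeners.remove(streamId);
    }

    /** Number of streams currently tracked. */
    int openSessions() {
        return listeners.size();
    }
}
```

Because the stream ID is captured when the listener is created, no lookup is needed inside the event handler, and results from parallel recognizers cannot be attributed to the wrong stream.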

    I hope this information helps.

