Azure Communication Services WebSocket Disconnection Issue

Mohammed Ibrahim 0 Reputation points
2025-02-26T13:26:59.1966667+00:00

Priority: High

Service: Azure Communication Services

Component: WebSocket Transcription

Summary

We're experiencing frequent WebSocket disconnections with status code 1006 (abnormal closure) when using Azure Communication Services transcription WebSockets, particularly after media operations like Text-to-Speech. These disconnections disrupt our call flow and require frequent reconnection attempts.

Environment Details

  • Service: Azure Communication Services
  • Client: Python FastAPI application
  • SDK Version: azure-communication-callautomation [latest version]
  • Connection Pattern: WebSocket TranscriptionOptions with TranscriptionTransportType.WEBSOCKET

Issue Details

  1. WebSocket abnormally closes (code 1006) approximately 30-40 seconds after establishing connection
  2. Connection failures increase dramatically after TTS operations
  3. Ping attempts fail with error: Unexpected ASGI message 'websocket.send', after sending 'websocket.close'
  4. Connection state becomes inconsistent between client and server
  5. Reconnection attempts often lead to the same pattern of failure
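
The "send after close" errors in item 3 can usually be avoided by tracking close state next to the socket. Below is a minimal sketch of that idea, not the actual application code: `GuardedSender` and the `send` callable it wraps are hypothetical names, and the wrapper only assumes an `async def send(text)` coroutine such as a framework WebSocket's text-send method.

```python
import asyncio


class GuardedSender:
    """Wraps a send coroutine and refuses to send once the socket is closed."""

    def __init__(self, send):
        self._send = send  # hypothetical: any `async def send(text)` callable
        self._closed = False
        self._lock = asyncio.Lock()  # serializes concurrent senders (pings, data)

    @property
    def closed(self) -> bool:
        return self._closed

    async def send_text(self, text: str) -> bool:
        """Send if still open; return False instead of raising after close."""
        async with self._lock:
            if self._closed:
                return False
            try:
                await self._send(text)
                return True
            except Exception:
                # Any send failure is treated as a dead connection.
                self._closed = True
                return False

    def mark_closed(self) -> None:
        """Call from the receive loop when a disconnect or close is observed."""
        self._closed = True
```

A ping task would then call `send_text(...)` and stop when it returns `False`, rather than pinging a socket the receive loop has already seen close.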

Reproduction Steps

  1. Establish WebSocket connection to ACS transcription service
  2. Send and receive messages successfully for ~15-30 seconds
  3. Execute a TTS media operation via call_connection_client.play_media_to_all()
  4. Observe WebSocket closing with code 1006 within seconds/minutes after the TTS operation

Logs and Evidence

Our logs show:

  • Connections disrupted with CloseCode.ABNORMAL_CLOSURE: 1006
  • Error messages: JSON ping failed and Text ping failed
  • Connection state errors: Cannot call "send" once a close message has been sent
  • Connection monitor reports: Connection has missed pings for 22.6s
  • Message: Skipping proactive refresh due to recent media operation

Workarounds Attempted

  1. Implemented multiple ping strategies (JSON, text, standard WebSocket ping)
  2. Added exponential backoff and jitter for reconnections
  3. Created proactive connection refreshing before timeout occurs
  4. Implemented special handling of connections during TTS operations
  5. Added connection state tracking and health monitoring
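
For reference, the backoff-with-jitter approach in item 2 can be sketched as follows. This is an illustrative "full jitter" variant, not the code used in the application; `backoff_delays` and its default values are assumptions chosen for the example.

```python
import random


def backoff_delays(base=0.5, cap=30.0, attempts=6, rng=random.random):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**n)]."""
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))  # exponential growth, capped
        yield rng() * ceiling                # jitter spreads out retries
```

A reconnect loop would sleep for each yielded delay before its next attempt and stop as soon as one attempt succeeds.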

Technical Questions

  1. Is there a specific ping format or heartbeat mechanism ACS WebSockets expect?
  2. What is the exact timeout period for ACS WebSockets if no activity occurs?
  3. Is there a recommended way to maintain WebSocket connections during/after TTS operations?
  4. Are there any specific headers or parameters we should include in our WebSocket connections?
  5. Does ACS provide any official client libraries or recommendations for handling WebSocket reconnections?

Business Impact

These disconnections impact our call center operation by causing:

  • Interrupted customer conversations
  • Lost transcription data
  • Degraded user experience
  • Increased server load from frequent reconnection attempts

Thank you for your assistance in resolving this critical issue.


Would you like me to suggest any other modifications to your codebase to help with the connection stability?

Made changes.


2 answers

  1. Siva Nair 410 Reputation points Microsoft Vendor
    2025-02-27T08:40:57.1933333+00:00

    Hi @Mohammed Ibrahim,

    Welcome to Microsoft Q&A.

    1. ACS WebSockets use the standard WebSocket protocol, which supports ping/pong frames as a heartbeat mechanism to keep the connection alive. Sending regular pings from your client helps detect and prevent idle disconnections. Ensure that your server answers these pings with pong frames. In a Python FastAPI application, you can implement a ping mechanism using the websockets library.
    2. While the specific idle timeout period for ACS WebSockets isn't explicitly documented, it's common for services to enforce timeouts on inactive connections to conserve resources. Implementing regular heartbeat messages, as described above, can help maintain the connection by ensuring periodic activity. Additionally, monitoring the connection for any missed pings and attempting reconnection strategies with exponential backoff can be beneficial.
    3. TTS operations can introduce additional load and potential delays in your application, which might affect WebSocket connections. To maintain stability:
    • Asynchronous Handling: Ensure that TTS operations are handled asynchronously, preventing them from blocking the main event loop and allowing the WebSocket to continue processing incoming and outgoing messages.
    • Resource Management: Monitor system resources during TTS operations to ensure that CPU or memory constraints aren't leading to unresponsive behavior, which could cause the WebSocket connection to drop.
    • Sequential Operations: If TTS operations are resource-intensive, consider queuing them to prevent multiple simultaneous operations from overwhelming the system.
    4. ACS WebSocket connections primarily rely on standard WebSocket protocols. However, ensuring that your WebSocket client correctly handles subprotocols and any required authentication headers is crucial. Refer to the ACS documentation for any specific requirements regarding headers or subprotocols.
    5. ACS offers client libraries that facilitate interaction with its services. While these libraries handle many aspects of communication, implementing robust reconnection logic is often left to the developer to accommodate specific application needs. Implementing exponential backoff strategies for reconnection attempts can prevent overwhelming the server and provide a more stable reconnection process.
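
The "Asynchronous Handling" point under item 3 can be illustrated with a small sketch. It assumes the TTS request is made through a blocking (synchronous) call; `blocking_play_media` is a hypothetical stand-in for that call, and `heartbeat` stands in for the WebSocket receive/ping loop. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def blocking_play_media(text: str) -> str:
    """Hypothetical stand-in for a synchronous SDK call such as a TTS request."""
    time.sleep(0.2)  # simulates a blocking network round trip
    return f"played: {text}"


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    """Stand-in for the WebSocket receive/ping loop; records each tick."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def main() -> tuple:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    # to_thread runs the blocking call in a worker thread,
    # so the event loop keeps servicing the heartbeat task meanwhile.
    result = await asyncio.to_thread(blocking_play_media, "hello")
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return result, len(ticks)
```

Calling `blocking_play_media` directly inside a coroutine would instead freeze the loop for the full duration of the call, which is the kind of stall that can make pings go unanswered.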

    Note:

    • Logging and Monitoring: Implement comprehensive logging around your WebSocket connections and TTS operations. This will help identify patterns leading to disconnections and facilitate troubleshooting.
    • Regularly check the Azure Communication Services documentation and release notes for updates or changes that might affect WebSocket behavior or provide new features to enhance stability.

    If you need any further assistance, do let me know.

    If the answer is helpful, please click Accept Answer and kindly upvote it so that other people who face a similar issue may benefit from it.


  2. Mohammed Ibrahim 0 Reputation points
    2025-02-27T13:50:47.82+00:00

    In order to help humanity move forward 😅:

    The solution was to switch from the Uvicorn server to the Daphne server. Once I changed it, the disconnections disappeared.

    I want to also say thanks to the great people who tried to help.
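
For anyone trying the same swap: both Uvicorn and Daphne serve ASGI applications, so the application code does not need to change. The sketch below is a minimal, framework-free ASGI WebSocket echo app used only to illustrate that point; it is not the application from this thread, the module name `main` in the comments is an assumption, and the thread does not establish why the server change removed the disconnections.

```python
# Assuming this file is saved as main.py, it could be served with either:
#   uvicorn main:app --host 0.0.0.0 --port 8000
#   daphne -b 0.0.0.0 -p 8000 main:app


async def app(scope, receive, send):
    """Minimal ASGI WebSocket echo app; contains no server-specific code."""
    if scope["type"] != "websocket":
        return  # other scope types are ignored in this sketch
    while True:
        event = await receive()
        if event["type"] == "websocket.connect":
            await send({"type": "websocket.accept"})
        elif event["type"] == "websocket.receive":
            # Echo text frames back; binary frames are echoed as empty text.
            await send({"type": "websocket.send", "text": event.get("text") or ""})
        elif event["type"] == "websocket.disconnect":
            # Stop here: sending after a disconnect is what produces
            # "Unexpected ASGI message 'websocket.send'" style errors.
            break
```

A FastAPI application object is an ASGI app as well, so the same two commands apply with the application's own module and variable name in place of `main:app`.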

