I found the answer in this GitHub issue (not directly, but it gave me some ideas):
https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2552
To use the speech-to-text service through a private endpoint, you have to specify the endpoint manually, the same way as with endpoint + key authentication, e.g.:
var uri = new Uri($"wss://{domain}/stt/speech/recognition/conversation/cognitiveservices/v1");
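If you build this URI in more than one place, a small helper keeps the path in one spot. This is only a sketch: the class and method names (SpeechEndpoint, BuildSttUri) and the host name in the usage line are made up for illustration; only the path itself comes from the line above.

```csharp
using System;

public static class SpeechEndpoint
{
    // Speech-to-text WebSocket path, copied from the example above.
    private const string SttPath =
        "/stt/speech/recognition/conversation/cognitiveservices/v1";

    // Takes the private endpoint URL (any scheme) and returns the wss:// URI
    // on the same host. The method name is hypothetical, not part of the SDK.
    public static Uri BuildSttUri(string privateEndpoint)
    {
        var host = new Uri(privateEndpoint).DnsSafeHost;
        return new Uri($"wss://{host}{SttPath}");
    }
}
```

Usage, with a placeholder host: SpeechEndpoint.BuildSttUri("https://my-speech-resource.cognitiveservices.azure.com/") gives a URI you can pass to SpeechConfig.FromEndpoint.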
This is mentioned here:
To authenticate with Microsoft Entra, pass the AAD token you received to the AuthorizationToken property unchanged, e.g.:
config.AuthorizationToken = "my aad token";
With this setup, do not modify the authorization token the way this page describes:
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-configure-azure-ad-auth?tabs=portal&pivots=programming-language-csharp
That is, do not build a string like this:
var authorizationToken = $"aad#{resourceId}#{aadToken}";
Finally, this is the code that works for me:
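using Azure.Core;                          // TokenRequestContext
using Azure.Identity;                      // WorkloadIdentityCredential
using Microsoft.CognitiveServices.Speech;  // SpeechConfig

// Request an AAD token for the Cognitive Services scope.
var tokenRequest = new TokenRequestContext(["https://cognitiveservices.azure.com/.default"]);
var azureCredential = new WorkloadIdentityCredential();
var aadToken = await azureCredential.GetTokenAsync(tokenRequest, default);
var endpoint = "my-private-endpoint uri";
var domain = new Uri(endpoint).DnsSafeHost;
var uri = new Uri($"wss://{domain}/stt/speech/recognition/conversation/cognitiveservices/v1");
var config = SpeechConfig.FromEndpoint(uri);
// GetTokenAsync returns an AccessToken struct; the token string is in its Token property.
config.AuthorizationToken = aadToken.Token;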
I hope this helps anyone who is struggling to authenticate against a private endpoint using Microsoft Entra.
I also hope the documentation will be updated to cover this case.