The Azure Speech SDK offers real-time speaker diarization, which lets you distinguish between different speakers during live transcription.
How does it work? It is simple: the service assigns a unique speaker ID to each participant, which allows you to identify who is speaking in real time.
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization
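
For example, here is a minimal sketch using the Python SDK's `ConversationTranscriber`, roughly following the quickstart linked above (the key, region, and WAV file name are placeholders you would replace):

```python
import time
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials and audio file; replace with your own values.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
speech_config.speech_recognition_language = "en-US"
audio_config = speechsdk.audio.AudioConfig(filename="meeting.wav")

transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config)

done = False

def on_transcribed(evt):
    # Each final result carries the recognized text plus a speaker ID such as Guest-1.
    print(f"{evt.result.speaker_id}: {evt.result.text}")

def on_stopped(evt):
    global done
    done = True

transcriber.transcribed.connect(on_transcribed)
transcriber.session_stopped.connect(on_stopped)
transcriber.canceled.connect(on_stopped)

transcriber.start_transcribing_async().get()
while not done:
    time.sleep(0.5)
transcriber.stop_transcribing_async().get()
```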
For sentence segmentation during real-time transcription, you can use the continuous recognition feature of the Speech SDK: it processes the speech input continuously and returns transcription results with appropriate sentence boundaries.
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-speech-to-text
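
A small sketch of continuous recognition from the default microphone (key and region are placeholders; the 30-second sleep is just for the demo):

```python
import time
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials; uses the default microphone for live input.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# 'recognizing' fires with partial hypotheses; 'recognized' fires once a
# final, punctuated phrase (roughly a sentence) is ready.
recognizer.recognizing.connect(lambda evt: print(f"PARTIAL: {evt.result.text}"))
recognizer.recognized.connect(lambda evt: print(f"FINAL:   {evt.result.text}"))

recognizer.start_continuous_recognition()
time.sleep(30)  # transcribe for 30 seconds, then stop
recognizer.stop_continuous_recognition()
```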
If you want to improve recognition of low-volume speech, audio preprocessing is a must: applying amplification and noise-suppression techniques before transcription improves audio clarity.
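
As a rough illustration of the amplification step only, here is a sketch that boosts a quiet recording before sending it to the service. It assumes 16-bit mono PCM WAV input, the file names are placeholders, and noise suppression would typically be handled by a dedicated library rather than this snippet:

```python
import wave
import numpy as np

def amplify_wav(in_path: str, out_path: str, target_peak: float = 0.9) -> None:
    """Scale samples so the loudest peak reaches target_peak of full scale."""
    with wave.open(in_path, "rb") as wf:
        params = wf.getparams()
        samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)

    # Cast to int32 before abs() to avoid overflow on -32768, and avoid dividing by zero on silence.
    peak = max(int(np.max(np.abs(samples.astype(np.int32)))), 1)
    gain = (target_peak * 32767) / peak
    boosted = np.clip(samples.astype(np.float64) * gain, -32768, 32767).astype(np.int16)

    with wave.open(out_path, "wb") as wf:
        wf.setparams(params)
        wf.writeframes(boosted.tobytes())

amplify_wav("quiet_meeting.wav", "boosted_meeting.wav")
```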
Alternatively, you can train a custom speech model with audio data that includes low-volume speech; by giving the model examples of low-volume audio, it can learn to transcribe such inputs more accurately.
https://azure.microsoft.com/en-us/blog/improve-speechtotext-accuracy-with-azure-custom-speech
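
Once a custom model is trained and deployed, you point the SDK at it by setting the endpoint ID on the speech config. A small sketch, where the key, region, endpoint ID, and file name are all placeholders:

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder key/region and the endpoint ID from your Custom Speech deployment.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
speech_config.endpoint_id = "YOUR_CUSTOM_ENDPOINT_ID"

audio_config = speechsdk.audio.AudioConfig(filename="quiet_sample.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

result = recognizer.recognize_once_async().get()
print(result.text)
```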
This is a good tutorial you can use: https://blog.gopenai.com/real-time-transcription-with-diarization-using-azure-speech-sdk-a9bd801499a8