Streaming issue with @azure-rest/ai-inference package using Mistral-Large deployment
I'm trying to use the Mistral-Large-2407 model for chat completions via an Azure AI Services deployment. We followed the docs to deploy the model as a serverless deployment, and the resource is up and running, with the endpoint and the API key.
However, I'm struggling to get streaming responses when making calls from my Node.js app.
I have a Fastify server running on Node.js 22.
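The server side is wired up roughly like this (a simplified sketch; the `/chat` route and the `getModelStream` helper are placeholders standing in for my actual code, which makes the model call shown further down):

```typescript
import Fastify from "fastify";

// Placeholder for the @azure-rest/ai-inference call shown below
// (returns the SSE body as a Node readable stream)
declare function getModelStream(body: unknown): Promise<NodeJS.ReadableStream>;

const app = Fastify();

app.post("/chat", async (request, reply) => {
  // Take over the raw response so chunks can be written as they arrive
  reply.hijack();
  reply.raw.writeHead(200, { "content-type": "text/event-stream" });

  const stream = await getModelStream(request.body);
  for await (const chunk of stream) {
    reply.raw.write(chunk);
  }
  reply.raw.end();
});

await app.listen({ port: 3000 });
```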
I've followed the documentation for the Node.js `@azure-rest/ai-inference` package: I create the model client, make the call with the `stream: true` parameter in the body, and use the `asNodeStream()` function.
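For reference, the client setup and the call look roughly like this (simplified; the environment variable names are placeholders for my endpoint and key):

```typescript
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

// Serverless deployment endpoint and key (placeholder env var names)
const client = ModelClient(
  process.env.AZURE_INFERENCE_ENDPOINT!,
  new AzureKeyCredential(process.env.AZURE_INFERENCE_KEY!)
);

// Request a streaming chat completion and get the body as a Node stream
const response = await client
  .path("/chat/completions")
  .post({
    body: {
      messages: [{ role: "user", content: "Say hello in one sentence." }],
      stream: true,
    },
  })
  .asNodeStream();
```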
But it seems that I'm not getting any chunks, and then suddenly I get all the chunks at once. I tried several approaches with the same result (the `createSseStream` variant is sketched after this list):
- use async iterators
- consume the stream with an `on('data')` callback
- consume the stream with an `on('readable')` callback and `stream.read()`
- use `@azure/core-sse`'s `createSseStream` function
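The `createSseStream` variant, continuing from the snippet above, looks roughly like this (the other variants only swap the consumption loop):

```typescript
import { createSseStream } from "@azure/core-sse";

const stream = response.body;
if (!stream) {
  throw new Error("The response body is undefined");
}

// Iterate over server-sent events as they arrive
const events = createSseStream(stream);
for await (const event of events) {
  if (event.data === "[DONE]") {
    break;
  }
  for (const choice of JSON.parse(event.data).choices) {
    // Expected: tokens printed progressively; observed: everything arrives at once
    process.stdout.write(choice.delta?.content ?? "");
  }
}
```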
I observe the same behaviour every time: it feels like the code waits for the full response to be received and then returns it as a stream all at once, whereas I would expect to get chunks progressively, as I do with our Azure OpenAI deployment or when using Mistral's API directly.
I also tried to use the `@mistralai/mistralai-azure` package, but making a request yields a 500 response.
I'm a bit lost here; any help would be much appreciated. @Mistral