Streaming issue with @azure-rest/ai-inference package using Mistral-Large deployment

Francois Roland
2025-01-10T09:44:23.9966667+00:00

I'm trying to use the Mistral-Large-2407 model for chat completions via an Azure AI Services deployment. We followed the docs for deploying the model as a serverless deployment, and the resource is up and running, with the endpoint and the API key.

However, I'm struggling to get streaming responses when making calls from my Node.js app.

I have a Fastify server running on Node.js 22.

I've followed the documentation for the Node.js @azure-rest/ai-inference package: I create the ModelClient, make the call with the stream: true parameter in the request body, and consume the response via the asNodeStream() function.
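
Here's a trimmed-down sketch of what that call looks like on my side (the endpoint/key env var names are just placeholders for my own config, and error handling is simplified):

```javascript
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

// Placeholder env var names -- in the real app these come from our own config.
const endpoint = process.env.AZURE_AI_ENDPOINT;
const apiKey = process.env.AZURE_AI_API_KEY;

const client = ModelClient(endpoint, new AzureKeyCredential(apiKey));

// Request a streamed chat completion and keep the raw Node.js response stream.
const response = await client
  .path("/chat/completions")
  .post({
    body: {
      messages: [{ role: "user", content: "Tell me a short story." }],
      stream: true,
    },
  })
  .asNodeStream();

if (response.status !== "200" || !response.body) {
  throw new Error(`Unexpected response: ${response.status}`);
}

const stream = response.body; // NodeJS.ReadableStream carrying the SSE data
```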

But it seems that I don't receive any chunks for a long time, and then suddenly I get them all at once. I tried several approaches, all with the same result:

  • use async iterators
  • consume the stream with an on('data') callback
  • consume the stream with an on('readable') callback and stream.read()
  • use the createSseStream function from @azure/core-sse (see the sketch after this list)
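
For example, the @azure/core-sse variant looks roughly like this (continuing from the `stream` variable in the snippet above):

```javascript
import { createSseStream } from "@azure/core-sse";

// Wrap the raw Node.js stream in an SSE event iterator.
const events = createSseStream(stream);

for await (const event of events) {
  if (event.data === "[DONE]") break;
  for (const choice of JSON.parse(event.data).choices) {
    // I would expect these writes to appear progressively, token by token.
    process.stdout.write(choice.delta?.content ?? "");
  }
}

// The raw-stream variants behave the same way for me, e.g.:
// stream.on("data", (chunk) => console.log(Date.now(), chunk.length));
// prints nothing for the whole generation, then everything in one burst.
```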

I observe the same behaviour every time: it feels like the code waits for the full request to complete and then returns the response as a stream all at once, whereas I would expect to receive chunks progressively, as I do with our Azure OpenAI deployment or when using Mistral's API directly.

I also tried the @mistralai/mistralai-azure package, but making a request yields a 500 response.

I'm a bit lost here; any help would be much appreciated. @Mistral
