Streaming issue with @azure-rest/ai-inference package using Mistral-Large deployment
I'm trying to use the Mistral-Large-2407 model for chat completions via an Azure AI Services deployment. We followed the docs to deploy the model as a serverless deployment, and the resource is up and running, with the endpoint and the API key.
However, I'm struggling to get streaming responses when making calls from my Node.js app.
I have a Fastify server running on Node.js 22.
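The server side is wired up roughly like this (a simplified sketch; the `/chat` route and the `getModelStream` helper are placeholders standing in for my actual code, which makes the model call shown further down):

```typescript
import Fastify from "fastify";

// Placeholder for the @azure-rest/ai-inference call shown below
// (returns the SSE body as a Node readable stream)
declare function getModelStream(body: unknown): Promise<NodeJS.ReadableStream>;

const app = Fastify();

app.post("/chat", async (request, reply) => {
  // Take over the raw response so chunks can be written as they arrive
  reply.hijack();
  reply.raw.writeHead(200, { "content-type": "text/event-stream" });

  const stream = await getModelStream(request.body);
  for await (const chunk of stream) {
    reply.raw.write(chunk);
  }
  reply.raw.end();
});

await app.listen({ port: 3000 });
```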
I've followed the documentation for the Node.js `@azure-rest/ai-inference` package: I create the model client, make the call with the `stream: true` parameter in the body, and use the `asNodeStream()` function.
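For reference, the client setup and the call look roughly like this (simplified; the environment variable names are placeholders for my endpoint and key):

```typescript
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

// Serverless deployment endpoint and key (placeholder env var names)
const client = ModelClient(
  process.env.AZURE_INFERENCE_ENDPOINT!,
  new AzureKeyCredential(process.env.AZURE_INFERENCE_KEY!)
);

// Request a streaming chat completion and get the body as a Node stream
const response = await client
  .path("/chat/completions")
  .post({
    body: {
      messages: [{ role: "user", content: "Say hello in one sentence." }],
      stream: true,
    },
  })
  .asNodeStream();
```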
But it seems that I'm not getting any chunks, and then suddenly I get all the chunks at once. I tried several approaches with the same result (the `createSseStream` variant is sketched after this list):
- use async iterators
- consume the stream with an `on('data')` callback
- consume the stream with an `on('readable')` callback and `stream.read()`
- use `@azure/core-sse`'s `createSseStream` function
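The `createSseStream` variant, continuing from the snippet above, looks roughly like this (the other variants only swap the consumption loop):

```typescript
import { createSseStream } from "@azure/core-sse";

const stream = response.body;
if (!stream) {
  throw new Error("The response body is undefined");
}

// Iterate over server-sent events as they arrive
const events = createSseStream(stream);
for await (const event of events) {
  if (event.data === "[DONE]") {
    break;
  }
  for (const choice of JSON.parse(event.data).choices) {
    // Expected: tokens printed progressively; observed: everything arrives at once
    process.stdout.write(choice.delta?.content ?? "");
  }
}
```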
I observe the same behaviour every time: it feels like the code waits for the full response to be received and then returns it as a stream all at once, whereas I would expect to get chunks progressively, as I do with our Azure OpenAI deployment or when using Mistral's API directly.
I also tried to use the `@mistralai/mistralai-azure` package, but making a request yields a 500 response.
I'm a bit lost here; any help would be much appreciated. @Mistral