Is Azure AI Foundry Meta Llama a quantized model?
Are the Meta models in the Azure AI Foundry Model catalog running quantized versions of the models?
I believe the Meta Llama models in the Model catalog are quantized.
I created Serverless API deployments of Meta Llama 3.1 8B Instruct and Meta Llama 3.2 11B Vision Instruct and tested them. The results are similar to those of the q4_0 quantizations of those models. I was expecting results comparable to the FP16 versions.
I have been using Ollama to run the Meta Llama models locally, and I have benchmarks for the various quantized versions. The Azure AI Foundry deployed model isn't performing like FP16; it is performing like the q4_0 model.
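For context, the comparison I'm doing looks roughly like this. It's a sketch: the score values and the `nearest_variant` helper are placeholders for illustration, not my actual benchmark numbers or tooling.

```python
# Sketch: compare an endpoint's benchmark scores against local Ollama
# baselines and report which quantization variant it most resembles.
# All numbers here are hypothetical placeholders.

def nearest_variant(observed, baselines):
    """Return the baseline label whose scores are closest to `observed`
    by mean absolute difference."""
    def distance(ref):
        return sum(abs(o - r) for o, r in zip(observed, ref)) / len(observed)
    return min(baselines, key=lambda label: distance(baselines[label]))

# Hypothetical per-task scores (e.g. accuracy on a set of local test prompts).
baselines = {
    "fp16": [0.82, 0.77, 0.90],
    "q4_0": [0.74, 0.69, 0.83],
}
azure_scores = [0.73, 0.70, 0.82]  # hypothetical Serverless API results

print(nearest_variant(azure_scores, baselines))  # → q4_0
```

In my runs, the deployed endpoint's scores consistently land closer to the q4_0 baseline than to FP16, which is what prompted this question.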
This impacts our ability to use this deployment, because the model will not perform as needed.
I would like to know if these models are quantized.