Is Azure AI Foundry Meta Llama a quantized model?

Mark Ward 20 Reputation points
2024-12-21T19:40:17.6833333+00:00

Are the Meta models in the Azure AI Foundry Model catalog running quantized versions of the models?
I believe the Meta Llama models in the Model catalog are quantized.

I created serverless API deployments of Meta Llama 3.1 8B Instruct and Meta Llama 3.2 11B Vision Instruct and tested them. The results are similar to the q4_0 quantizations of those models; I was expecting results in line with the FP16 versions.
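For reference, a minimal sketch of the kind of call I'm making against the serverless deployment, using the azure-ai-inference Python SDK (the endpoint URL, key variable, and prompt here are placeholders, not my exact test harness):

```python
# Minimal sketch: query a serverless Meta Llama deployment through the
# Azure AI Model Inference API. Endpoint and key are placeholders.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.<region>.models.ai.azure.com",  # placeholder
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),   # placeholder env var
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain the difference between FP16 and q4_0 quantization."),
    ],
    temperature=0.0,   # keep output as deterministic as possible for comparison
    max_tokens=512,
)
print(response.choices[0].message.content)
```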

I have been using Ollama to run the Meta Llama models locally, and I have benchmarks for the various quantized versions. The model deployed through Azure AI Foundry isn't performing like the FP16 version; it is performing like the q4_0 model.
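For the local comparison, a sketch of how the same prompt can be sent to quantized and FP16 variants through Ollama's REST API (the model tags and prompt are examples, and Ollama is assumed to be running on its default port):

```python
# Minimal sketch: run the same prompt against local Ollama variants
# so the output can be compared with the Azure serverless deployment.
import requests

PROMPT = "Explain the difference between FP16 and q4_0 quantization."

for tag in ["llama3.1:8b-instruct-q4_0", "llama3.1:8b-instruct-fp16"]:  # example tags
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": tag,
            "prompt": PROMPT,
            "stream": False,
            "options": {"temperature": 0.0},
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(f"--- {tag} ---")
    print(resp.json()["response"])
```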

This impacts our ability to use this deployment because the model will not perform as needed.

I would like to know if these models are quantized.

Azure AI services
