Is Azure AI Foundry Meta Llama a quantized model?
Are the Meta models in the Azure AI Foundry Model catalog running quantized versions of the models?
I believe the Meta Llama models in the Model catalog are quantized.
I created Serverless API deployments of Meta Llama 3.1 8B Instruct and Meta Llama 3.2 11B Vision Instruct and tested them. The results are similar to those of the q4_0 quantizations of those models. I was expecting results comparable to the FP16 versions.
I have been using Ollama to run the Meta Llama models locally, and I have benchmarks for the various quantized versions. The Azure AI Foundry deployed model isn't performing like FP16; it is performing like the q4_0 model.
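For context, the comparison I'm doing looks roughly like this. It's a sketch: the score values and the `nearest_variant` helper are placeholders for illustration, not my actual benchmark numbers or tooling.

```python
# Sketch: compare an endpoint's benchmark scores against local Ollama
# baselines and report which quantization variant it most resembles.
# All numbers here are hypothetical placeholders.

def nearest_variant(observed, baselines):
    """Return the baseline label whose scores are closest to `observed`
    by mean absolute difference."""
    def distance(ref):
        return sum(abs(o - r) for o, r in zip(observed, ref)) / len(observed)
    return min(baselines, key=lambda label: distance(baselines[label]))

# Hypothetical per-task scores (e.g. accuracy on a set of local test prompts).
baselines = {
    "fp16": [0.82, 0.77, 0.90],
    "q4_0": [0.74, 0.69, 0.83],
}
azure_scores = [0.73, 0.70, 0.82]  # hypothetical Serverless API results

print(nearest_variant(azure_scores, baselines))  # → q4_0
```

In my runs, the deployed endpoint's scores consistently land closer to the q4_0 baseline than to FP16, which is what prompted this question.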
This impacts our ability to use this deployment, because the model will not perform as needed.
I would like to know if these models are quantized.