Azure ML job logging issues - Transformers model
I am working in Azure ML, trying to run a job that calls a training notebook. I can train and even evaluate my model just fine within that notebook, but when I try to log the model at the end, it throws errors. The error I am seeing is:
HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './models/finetuned_llama3/'. Use `repo_type` argument if needed.
From some research, it seems this means the library is trying to pull the model straight from the Hugging Face Hub based on my artifact path. I know the model exists at the path I am referencing, because I log the contents of that directory and can see it there. I have tried setting arguments and environment variables telling it not to look for a remote repo, with no success.
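The workarounds I have tried are along these lines (a sketch; the exact combinations varied, and none of them helped):

```python
import os

# Hugging Face offline switches, set before importing transformers/mlflow,
# to stop any lookup against the remote Hub:
os.environ["HF_HUB_OFFLINE"] = "1"        # disable all Hub network calls
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # make transformers use local files only

# And the equivalent per-call argument:
# AutoPeftModelForCausalLM.from_pretrained(job_model_path, local_files_only=True, ...)
```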
Here is what my logging logic looks like:
import mlflow
from peft import AutoPeftModelForCausalLM, LoraConfig

# lora_config_dict and tokenizer are defined earlier in the training notebook
job_model_path = "models/finetuned_llama3"

peft_model = AutoPeftModelForCausalLM.from_pretrained(
    job_model_path,
    config=LoraConfig(
        r=lora_config_dict["r"],
        lora_alpha=lora_config_dict["lora_alpha"],
        target_modules=lora_config_dict["target_modules"],
        lora_dropout=lora_config_dict["lora_dropout"],
        bias=lora_config_dict["bias"],
        task_type=lora_config_dict["task_type"],
    ),
    device_map="cuda",
)

peft_model.model.config.quantization_config.use_exllama = True
peft_model.model.config.quantization_config.exllama_config = {"version": 2}

mlflow.transformers.log_model(
    transformers_model={"model": peft_model, "tokenizer": tokenizer},
    artifact_path="finetuned_llama3",  # ensure the artifact path is correct
    registered_model_name="huggingface-finetuned-model",
    task="text-generation",  # specify the task type here
)
When I log the model this same way from an ML Studio notebook, it works as expected, so the problem must be something about how we configure the job.
Because the MLflow transformers flavor is relatively new, it has been hard to find much information about it. I have looked for other posts and forum threads about this issue but haven't found anything helpful. GPT and Copilot don't seem to know how to solve it either.
I've seen people say that the model path cannot look like a full URL, so I have changed that variable many times, from full URLs to relative paths. I have also experimented with the `transformers_model` argument, from referencing the objects themselves to just passing the path.
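The path variants I have cycled through look roughly like this (illustrative values, not an exhaustive list):

```python
import os

# Variants tried for job_model_path, none of which avoided the HFValidationError:
job_model_path_variants = [
    "models/finetuned_llama3",                   # plain relative path
    "./models/finetuned_llama3",                 # explicit relative path
    os.path.abspath("models/finetuned_llama3"),  # absolute path on the compute
]
```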
I am expecting this to log a model to the Azure ML model registry.
For reference, this is the model we are fine-tuning: astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit on Hugging Face.
I've posted this question on the Hugging Face forums and Stack Overflow, but it seems like an Azure-specific issue.