ResourceNotReady Error when deploying an imported Hugging Face model for inferencing

Jonathan Reyes 55 Reputation points
2024-01-19T16:32:24.3933333+00:00

Hello, I'm a student trying to create an online endpoint and deploy a model imported from Hugging Face for inferencing. This is the model I have imported: https://huggingface.co/defog/sqlcoder-7b

I am on an Azure for Students account and have a compute instance running on STANDARD_E4DS_V4. To import the model, I followed this notebook within Azure ML Studio: https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/import/import_model_into_registry.ipynb

The import succeeds, but I'm running into problems when creating an endpoint and deploying the model. For the deployment I am following this notebook: https://github.com/Azure/azureml-examples/blob/main/sdk/python/using-mlflow/deploy/mlflow_sdk_online_endpoints.ipynb
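For context, the failing cell passes a `deployment_config_path` through the `deploy-config-file` option. A minimal sketch of how such a file might be written is shown here. It assumes the config is a small JSON document with `instance_type` and `instance_count` keys, as I understand the deployment notebook to use. The SKU and count are illustrative placeholders, not the values from my actual run.

```python
import json

# Assumption: the deployment config is a JSON file with these two keys.
# The SKU below is a placeholder; it must be one the subscription has quota for.
deploy_config = {
    "instance_type": "Standard_DS3_v2",  # hypothetical value for illustration
    "instance_count": 1,
}

deployment_config_path = "deployment_config.json"

# Write the config to disk so its path can be handed to create_deployment
# via config={"deploy-config-file": deployment_config_path}.
with open(deployment_config_path, "w") as outfile:
    json.dump(deploy_config, outfile)

# Read it back as a quick sanity check before starting a long deployment.
with open(deployment_config_path) as infile:
    loaded_config = json.load(infile)

print(loaded_config)
```

Checking the file contents before deploying is cheap compared with waiting for a deployment to time out.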

However, I am encountering this error when running the following cell:

deployment = deployment_client.create_deployment(
    name=deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)
................................................................................................................................................---------------------------------------------------------------------------
OperationFailed                           Traceback (most recent call last)
File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/polling/base_polling.py:466, in LROBasePolling.run(self)
    465 try:
--> 466     self._poll()
    468 except BadStatus as err:

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/polling/base_polling.py:500, in LROBasePolling._poll(self)
    499 if _failed(self.status()):
--> 500     raise OperationFailed("Operation failed or canceled")
    502 final_get_url = self._operation.get_final_get_url(self._pipeline_response)

OperationFailed: Operation failed or canceled

During handling of the above exception, another exception occurred:

HttpResponseError                         Traceback (most recent call last)
Cell In[23], line 1
----> 1 deployment = deployment_client.create_deployment(
      2     name=deployment_name,
      3     endpoint=endpoint_name,
      4     model_uri=f"models:/{model_name}/{version}",
      5     config={"deploy-config-file": deployment_config_path},
      6 )

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azureml/mlflow/deploy/deployment_client.py:137, in AzureMLDeploymentClient.create_deployment(self, name, model_uri, flavor, config, endpoint)
    134     deployment = self._v1_create_deployment(name, model_name, model_version, config,
    135                                             v1_deploy_config, no_wait)
    136 else:
--> 137     deployment = self._v2_create_deployment_new(name, model_name, model_version, v2_deploy_config, endpoint)
    139 if 'flavor' not in deployment:
    140     deployment['flavor'] = flavor if flavor else 'python_function'

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azureml/mlflow/deploy/deployment_client.py:522, in AzureMLDeploymentClient._v2_create_deployment_new(self, name, model_name, model_version, v2_deploy_config, endpoint)
    520 # Create Deployment using v2_deploy_config
    521 endpoint_name = endpoint if endpoint else name
--> 522 self._mir_client.create_online_deployment(deployment_config=v2_deploy_config,
    523                                           deployment_name=name,
    524                                           endpoint_name=endpoint_name, model_name=model_name,
    525                                           model_version=model_version)
    527 if not endpoint:
    528     _logger.info('Updating endpoint to serve 100 percent traffic to deployment {}'.format(name))

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azureml/mlflow/deploy/_mir/mir_deployment_client.py:132, in MirDeploymentClient.create_online_deployment(self, deployment_config, deployment_name, endpoint_name, model_name, model_version, **kwargs)
    130 if no_wait is False:
    131     _logger.info("Creating deployment {}".format(deployment_name))
--> 132     poller.result(timeout=3600)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/polling/_poller.py:230, in LROPoller.result(self, timeout)
    222 def result(self, timeout: Optional[float] = None) -> PollingReturnType:
    223     """Return the result of the long running operation, or
    224     the result available after the specified timeout.
    225 
   (...)
    228     :raises ~azure.core.exceptions.HttpResponseError: Server problem with the query.
    229     """
--> 230     self.wait(timeout)
    231     return self._polling_method.resource()

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/core/tracing/decorator.py:76, in distributed_trace.
Azure Machine Learning