Azure Machine Learning Endpoint Deployment

Gupta, Shalu 30 Reputation points
2024-11-22T10:58:37.6933333+00:00

Hello everyone,

I am facing an issue while deploying an Azure Machine Learning setup with private endpoints.

Here’s what I have done so far:

  1. Provisioned an Azure Storage account with private endpoints for both Blob and File services.
  2. Deployed an Azure Machine Learning workspace, Container Registry, Application Insights, and Key Vault.
  3. Configured all resources in a shared virtual network with private endpoints enabled.

The Issue: While running a training pipeline stage that registers a dataset, I encounter the following error:

Error in creating data asset with Exception:

Azure Machine Learning
An Azure machine learning service for building and deploying models.
2,989 questions
Azure Storage Accounts
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
3,250 questions

1 answer

  1. Sina Salam 12,976 Reputation points
    2024-11-22T12:30:55.59+00:00

    Hello Gupta, Shalu,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having an issue while deploying an Azure Machine Learning setup with private endpoints.

    First of all, the error you got, "Failed to establish a new connection: [Errno -2] Name or service not known," typically points to a DNS resolution issue. The steps below, together with the linked articles, give more detail on the related errors and how you can resolve the issue.

    1. Make sure your DNS settings are correctly configured. If you're using Azure DNS, verify that the private DNS zones privatelink.blob.core.windows.net and privatelink.file.core.windows.net are linked to your virtual network. If you're using a custom DNS solution, make sure it can resolve the necessary FQDNs. You can test this by running nslookup <fqdn> from a VM within the same virtual network. - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-secure-connection-workspace?view=azureml-api-2
    2. Double-check that your private endpoints are associated with the appropriate resources. Recreating the private endpoints can sometimes resolve configuration issues. - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-private-link?view=azureml-api-2
    3. Verify that your NSGs allow traffic to and from the necessary services. - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-private-link?view=azureml-api-2
    4. If there is a proxy in the path, it might be interfering with the connection. Try disabling the proxy temporarily to see if that resolves the issue. - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-secure-connection-workspace?view=azureml-api-2
    5. Use the Azure Machine Learning workspace diagnostic API to identify any configuration problems with your workspace. - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-secure-connection-workspace?view=azureml-api-2
    6. You've mentioned that the roles are configured, but it's worth double-checking that no permissions are missing. - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-private-link?view=azureml-api-2
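
    As an alternative to nslookup for the DNS check in step 1, here is a minimal Python sketch that you could run from a VM or compute instance inside the virtual network. It assumes the standard public hostnames <account>.blob.core.windows.net and <account>.file.core.windows.net; the function names (storage_fqdns, resolve_ips, classify, report) are my own and are not part of any Azure SDK. With working private endpoints, these names would normally resolve to private IP addresses, so "public" or "unresolved" suggests a problem with the private DNS zone or its link to the virtual network.

```python
import ipaddress
import socket


def storage_fqdns(account):
    """Build the public Blob and File hostnames for a storage account name."""
    return [
        f"{account}.blob.core.windows.net",
        f"{account}.file.core.windows.net",
    ]


def resolve_ips(fqdn, port=443):
    """Return the sorted, de-duplicated IPs the local resolver gives for fqdn.

    An empty list means the lookup raised socket.gaierror, the kind of
    failure that surfaces as "[Errno -2] Name or service not known".
    """
    try:
        infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first item is the address string.
    # Drop any IPv6 zone suffix such as "%eth0" so ipaddress can parse it.
    return sorted({info[4][0].split("%")[0] for info in infos})


def classify(ips):
    """Label a lookup result as 'unresolved', 'private', or 'public'."""
    if not ips:
        return "unresolved"
    if all(ipaddress.ip_address(ip).is_private for ip in ips):
        return "private"
    return "public"


def report(account):
    """Map each storage hostname for the account to its classification."""
    return {fqdn: classify(resolve_ips(fqdn)) for fqdn in storage_fqdns(account)}
```

    Calling report("<your-storage-account-name>") returns one label per hostname. This only checks name resolution from the machine where it runs; it does not test NSG rules, proxies, or role assignments from the other steps.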

    These are the main areas to troubleshoot for this error, unless there is a general (temporary) outage and/or an IP address issue.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting this answer and accepting it if it is helpful.

