How to call AzureML Batch Model endpoint from Azure Data Factory (ADF Pipeline).
Using Azure ML, I am able to create and deploy models to a batch endpoint and run them through the UI. When I try to execute them through the Machine Learning Execute Pipeline activity in ADF, I get the error "Data set node xxx references parameter dataset_param which doesn't have a specified value or a default value." and the pipeline fails. I have tried following the guidance here: https://learn.microsoft.com/en-us/answers/questions/2006922/providing-data-paths-to-azure-machine-learning-pip but I am still not getting the data sent through the call.
This is the setup I have in ADF. Any help would be awesome.
Azure Data Factory
-
Ganesh Gurram • 5,135 Reputation points • Microsoft External Staff
2025-02-04T06:25:37.13+00:00 Hi @Kirby Jackson
Greetings & Welcome to the Microsoft Q&A forum! Thank you for sharing your query.
To call an Azure Machine Learning (Azure ML) batch model endpoint from an Azure Data Factory (ADF) pipeline, you need to ensure that the parameters for your datasets are correctly configured. The error message you are encountering indicates that the dataset parameter (dataset_param) does not have a specified value or a default value. Here are some steps to troubleshoot and resolve the issue:
Parameterize the Dataset - Make sure that you have parameterized your dataset correctly in the ADF pipeline. You can do this by selecting the dataset in the designer and setting it as a pipeline parameter.
Provide Default Values - Ensure that the parameters you are using in your dataset have default values assigned. This can be done in the pipeline parameters section.
Check Dataset Configuration - Verify that the datasets you are using are registered correctly in your Azure ML workspace and are in the supported format (e.g., .csv for tabular datasets).
Pipeline Configuration - When configuring the ADF pipeline, ensure that you are passing the correct values for all required parameters, especially those related to the dataset.
Use the Correct Activity - Make sure you are using the appropriate activity in ADF to invoke the batch endpoint. The "Web Activity" can be used to call the batch endpoint directly, or you can use the "Azure Machine Learning" activity if you are working with Azure ML assets.
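As an illustration of the Web Activity option above: a batch endpoint is invoked with an authenticated POST whose JSON body names the input data. The sketch below builds such a body in Python; the endpoint URI, datastore, and path are hypothetical placeholders, and the exact body shape should be checked against the batch endpoint REST documentation for your API version.

```python
import json

# Hypothetical scoring URI -- replace with your own batch endpoint's
# region and name before use.
scoring_uri = "https://mybatchendpoint.eastus.inference.ml.azure.com/jobs"

# Request body for invoking a batch endpoint over REST: each entry in
# InputData maps an input name to a job input type and a data URI.
body = {
    "properties": {
        "InputData": {
            "dataset_param": {
                "JobInputType": "UriFolder",
                "Uri": "azureml://datastores/workspaceblobstore/paths/input_data/",
            }
        }
    }
}

payload = json.dumps(body)
print(payload)
```

In ADF, the same JSON would go in the Web Activity's Body field, with an Authorization: Bearer token header obtained for the endpoint's audience.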
By following these steps, you should be able to resolve the issue and successfully call the Azure ML batch model endpoint from your ADF pipeline.
For more details, refer to the linked documentation.
I hope this information helps.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
-
Ganesh Gurram • 5,135 Reputation points • Microsoft External Staff
2025-02-05T05:46:17.1266667+00:00 @Kirby Jackson - We haven't heard from you on the last response and were just checking back to see if you have a resolution yet. If you have found a resolution, please share it with the community, as it can be helpful to others. Otherwise, let us know and we will respond with more details and try to help.
-
KirbyJackson-1614 • 5 Reputation points
2025-02-05T14:35:04.2233333+00:00 Ganesh, thank you for the response. I looked through both links and tried a few things yesterday. Also, the pipelines I am trying to run were not built using the designer; they were created through a code-first approach, as I have many different models to train with a lot of feature engineering, so the designer version did not fit my application well. First, I tried adding dataset_param as a parameter in the Azure ML activity, which gave me the same issues and error messages as before.
Also, I tried publishing one of the pipelines that did run to a pipeline endpoint in Azure ML. I can call that one from ADF, but it just runs the same dataset each time and won't recognize that I am sending in the parameter for a different version of the dataset.
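For what it's worth, when a published (v1) pipeline endpoint is invoked over REST, dataset overrides are typically passed in the request body rather than as plain parameters. The sketch below builds such a submit body; the experiment name and dataset GUID are hypothetical placeholders, and the field names should be verified against the published-pipeline REST documentation.

```python
import json

# Hypothetical submit body for a published Azure ML (v1) pipeline.
# ParameterAssignments carries plain pipeline parameters, while
# DataSetDefinitionValueAssignments carries dataset-typed PipelineParameters,
# referencing a registered (saved) dataset by its ID.
body = {
    "ExperimentName": "feature-engineering-score",
    "ParameterAssignments": {},
    "DataSetDefinitionValueAssignments": {
        "dataset_param": {
            "SavedDataSetReference": {"Id": "00000000-0000-0000-0000-000000000000"}
        }
    },
}

payload = json.dumps(body)
print(payload)
```

If only ParameterAssignments is populated, a dataset-typed parameter keeps its default, which would match the "runs the same dataset each time" behavior described above.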
-
Ganesh Gurram • 5,135 Reputation points • Microsoft External Staff
2025-02-06T14:01:14.68+00:00 @Kirby Jackson - Thanks for providing the additional screenshots and details. It looks like the main confusion is how the input data gets to your "FeatureEngineeringScore" component within your Azure ML pipeline.
dataset_param is meant to hold the input data for your "FeatureEngineeringScore" component (like the "34" you're showing). It's not directly related to the output path ("output_data/"). Think of it this way: dataset_param is what the component uses to do its work; the output path is where it puts the results.
Identify the Correct Input Spot - Inside your Azure ML pipeline, find the "FeatureEngineeringScore" component. It has a specific place where it expects to receive its input, and that place has a name (e.g., "input_data", "training_data", or something similar). This is the most important thing to find. It is not likely called dataset_param.
Link the Input in ADF - In your Azure Data Factory pipeline, you have an activity that runs your Azure ML pipeline. In that activity, you need to connect your dataset_param to the specific input spot you found in step 1. You're essentially saying, "Take the value of dataset_param (like '34') and give it to the 'FeatureEngineeringScore' component at its 'input_data' (or whatever it's called) spot."
Set Up dataset_param in ADF - In your ADF pipeline, you need to define dataset_param. This is where you'll put the actual data that the "FeatureEngineeringScore" component needs (like "34"). When you run the ADF pipeline, you can change this value to provide different input data.
Output Path is Internal - The "output_data/" path is something the "FeatureEngineeringScore" component handles itself. You set that up inside the component's settings, not in your ADF pipeline.
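As a sketch of what "Set Up dataset_param in ADF" can look like: the Machine Learning Execute Pipeline activity's JSON has an mlPipelineParameters section for scalar pipeline parameters and a dataPathAssignments section (per the data-paths Q&A linked in the question) for DataPath-typed parameters. All names, GUIDs, and paths below are placeholders, not taken from the asker's setup.

```python
import json

# Hypothetical typeProperties for an ADF "Machine Learning Execute Pipeline"
# activity. mlPipelineParameters passes scalar pipeline parameters;
# dataPathAssignments points a DataPath-typed PipelineParameter at a
# datastore location.
type_properties = {
    "mlPipelineEndpointId": "<pipeline-endpoint-guid>",
    "experimentName": "feature-engineering-score",
    "mlPipelineParameters": {
        "dataset_param": "34"
    },
    "dataPathAssignments": {
        "input_datapath": {
            "DataStoreName": "workspaceblobstore",
            "RelativePath": "input_data/"
        }
    },
}

print(json.dumps(type_properties, indent=2))
```

The key point from the steps above is that the name on the left of each assignment must match the parameter name the Azure ML pipeline actually declares, not an arbitrary label.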
I hope this information helps!
Thank you.
-
KirbyJackson-1614 • 5 Reputation points
2025-02-06T15:30:22.5666667+00:00 Ganesh, thank you again for the information. I am still working on finding the exact input name from the Azure ML pipeline. I am finding a few things, but nothing seems to be exactly what I am looking for.
From the ML pipeline that I call from ADF, I can see what the code to run a job would look like. This made me think I should be sending in either "data_asset" or "inputs" as the parameter, but neither of those changed anything.
Also, when I am in the details of the original pipeline, I can see the settings for the pipeline. I can see that there is a dataset_param that should be editable.
But each time I change that to a different data source and hit save, nothing happens and it reverts to the original dataset that was first submitted.
Also, when I try to change it by setting up a new pipeline job through the UI I am seeing this message.
I think that is telling me that it is not supported to modify this type of asset.
Do you think this is something to do with my original job setup or something in Azure ML itself?
-
Ganesh Gurram • 5,135 Reputation points • Microsoft External Staff
2025-02-06T19:28:32.03+00:00 @Kirby Jackson - Apologies for the inconvenience.
Please reach out to our support team to gain deeper insights and explore potential solutions; their expertise will be invaluable in suggesting the most appropriate approach.
After creating a Support ticket, please provide the ticket number as it would help us to track for more information. Azure support
Thank you.
-
KirbyJackson-1614 • 5 Reputation points
2025-02-06T19:30:34.8766667+00:00 Thank you for the help and suggestions. I will get a support ticket going. If I get a useful resolution, I will post it back in this thread.