Handling Multiple Directories in Azure ML Component YAML for Modular Code

ngavranovic 20 Reputation points
2024-12-13T11:34:42.89+00:00

Dear Microsoft Support Team,

I am working on an Azure ML pipeline to train two different models (model_1 and model_2), each with its own scripts and associated files. My project structure is as follows:

src/
├── model_1/
│   ├── train_model_1.py
│   ├── evaluation.py
│   └── score.py
├── model_2/
│   ├── train_model_2.py
│   ├── evaluation.py
│   └── score.py
└── common/
    ├── preprocess.py
    └── utils.py

I would like to specify distinct directories (src/model_1 and src/model_2) in the code field of their respective component YAML files, as these components execute independently. This approach would:

  • Include only the relevant scripts (e.g., src/model_1 for the model_1 component) in the component's execution environment.
  • Exclude unnecessary scripts and directories that are irrelevant to a particular execution (e.g., src/model_2 when training model_1).

I understand that I could point the code field to the top-level src directory, but this would unnecessarily include unrelated code in the component environment, impacting modularity and deployment efficiency.

Questions:

  1. Is there a way to specify multiple directories in the code field or restrict the inclusion of code to specific subdirectories?
  2. If not, what is the recommended way to achieve this while maintaining modularity and avoiding duplication of shared logic (e.g., src/common) across directories?
  3. Are there plans to enhance the code attribute to support this type of use case in the future?

For now, I am considering:

  • Duplicating shared logic (src/common) in each model directory, which is not ideal for maintainability.
  • Bundling all scripts into a single directory, which defeats the purpose of separating model-specific code.
  • Packaging shared logic as a Python library and installing it via pip during the component setup.

Your guidance on the best practices for this scenario would be greatly appreciated.

Thank you for your time and support.

Best regards,
Nikola

Azure Machine Learning

Accepted answer
  1. Pavankumar Purilla 1,965 Reputation points Microsoft Vendor
    2024-12-14T01:00:01.1266667+00:00

    Hi ngavranovic,
    Hope you are doing well.
    To address your modularity concerns in Azure ML, the best approach is to package the shared logic (e.g., common/) as a Python package and install it via pip in each component's execution environment. To do this, create a setup.py for the common/ directory so that it becomes installable, then add the package as a pip dependency of the environment referenced in the environment section of each component YAML file. Each component's code field can then point at its own directory only (src/model_1 or src/model_2).
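As a rough illustration of the packaging step, the sketch below generates a minimal setup.py next to src/common so that the shared code can be installed with pip. The distribution name `ml_common`, the version, and the helper name `make_common_installable` are hypothetical and not part of the original answer. How the built package reaches the component's environment (for example a private index or a custom image) is left open here.

```python
from pathlib import Path

# Hypothetical distribution name and version; choose your own.
PACKAGE_NAME = "ml_common"
PACKAGE_VERSION = "0.1.0"

SETUP_PY_TEMPLATE = '''\
from setuptools import setup

setup(
    name="{name}",
    version="{version}",
    # Install the existing common/ directory as the importable package "common".
    packages=["common"],
)
'''


def make_common_installable(src_dir: Path) -> Path:
    """Write a minimal setup.py into src_dir so that `pip install <src_dir>` works.

    Returns the path of the generated setup.py.
    """
    common_dir = src_dir / "common"
    if not common_dir.is_dir():
        raise FileNotFoundError(f"expected a common/ directory under {src_dir}")

    # Make sure common/ is a regular package (has an __init__.py).
    (common_dir / "__init__.py").touch(exist_ok=True)

    setup_py = src_dir / "setup.py"
    setup_py.write_text(
        SETUP_PY_TEMPLATE.format(name=PACKAGE_NAME, version=PACKAGE_VERSION),
        encoding="utf-8",
    )
    return setup_py
```

After calling `make_common_installable(Path("src"))`, running `pip install ./src` should install only the `common` package, and the model scripts can use `from common import utils` without src/common being part of their code directory.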
    Alternatively, you can host the common/ code in an external location such as Azure Blob Storage or GitHub and pull it in as an external dependency of your components. Another option is to adjust sys.path in your model scripts so that they can import from common/ without duplicating it; note that this only works if common/ is part of the code uploaded with the component.
    To answer your first and third questions: the code attribute in the component YAML currently supports only a single directory. Azure ML is updated frequently, so this could change in the future.
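The sys.path option mentioned above could look roughly like the following sketch. The helper name `add_common_to_path` is hypothetical, and the code assumes the layout from the question (a script in src/model_1 with common/ one level up). It also assumes common/ is actually present at runtime, which likely means the code field points at a parent directory such as src rather than src/model_1.

```python
import sys
from pathlib import Path


def add_common_to_path(script_file: str) -> Path:
    """Put the parent of the script's folder (e.g. src/) on sys.path.

    script_file is normally __file__ of the calling training script,
    e.g. src/model_1/train_model_1.py. Returns the directory that was added.
    """
    # src/model_1/train_model_1.py -> src/model_1 -> src
    root = Path(script_file).resolve().parent.parent
    if not (root / "common").is_dir():
        raise FileNotFoundError(f"no common/ directory under {root}")

    root_str = str(root)
    if root_str not in sys.path:
        # Insert at the front so this copy of common/ is found first.
        sys.path.insert(0, root_str)
    return root
```

At the top of train_model_1.py this would be used as `add_common_to_path(__file__)`, followed by `from common import utils`. It avoids duplicating common/, but it ties each script to the directory layout, which is one reason the installable package is usually the cleaner choice.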

    I hope this information helps. Thank you!

