Handling Multiple Directories in Azure ML Component YAML for Modular Code

Question

Dear Microsoft Support Team,

I am working on an Azure ML pipeline to train two different models (model_1 and model_2), each with its own scripts and associated files. My project structure is as follows:

src/
├── model_1/
│   ├── train_model_1.py
│   ├── evaluation.py
│   └── score.py
├── model_2/
│   ├── train_model_2.py
│   ├── evaluation.py
│   └── score.py
└── common/
    ├── preprocess.py
    └── utils.py

I would like to specify distinct directories (src/model_1 and src/model_2) in the code field of their respective component YAML files, as these components execute independently. This approach would:

Include only the relevant scripts (e.g., src/model_1 for the model_1 component) in the component's execution environment.
Exclude unnecessary scripts and directories that are irrelevant to a particular execution (e.g., src/model_2 when training model_1).

I understand that I could point the code field to the top-level src directory, but this would unnecessarily include unrelated code in the component environment, impacting modularity and deployment efficiency.

Questions:

Is there a way to specify multiple directories in the code field or restrict the inclusion of code to specific subdirectories?
If not, what is the recommended way to achieve this while maintaining modularity and avoiding duplication of shared logic (e.g., src/common) across directories?
Are there plans to enhance the code attribute to support this type of use case in the future?

For now, I am considering:

Duplicating shared logic (src/common) in each model directory, which is not ideal for maintainability.
Bundling all scripts into a single directory, which defeats the purpose of separating model-specific code.
Packaging shared logic as a Python library and installing it via pip during the component setup.

Your guidance on the best practices for this scenario would be greatly appreciated.

Thank you for your time and support.

Best regards,
Nikola

Accepted Answer

Hi ngavranovic,
Hope you are doing well.
To address your modularity concerns in Azure ML, the recommended approach is to package the shared logic (e.g., common/) as a Python package and install it with pip in each component's execution environment. To do this, add a setup.py to the common/ directory so that it becomes installable, then list the package as a pip dependency in the environment that each component YAML references (for example, in the environment's conda specification).
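
As a rough illustration of the environment side of this approach, the sketch below builds a conda-style environment specification whose pip section lists the shared package. The package name `my-common`, its version, the environment name, and the helper `build_conda_spec` are all hypothetical; the thread does not say how the package is named or published.

```python
import json


def build_conda_spec(env_name, shared_package, python_version="3.10"):
    """Return a conda environment spec whose pip section installs the shared package.

    All names passed in are illustrative assumptions, not values from the thread.
    """
    return {
        "name": env_name,
        "channels": ["conda-forge"],
        "dependencies": [
            f"python={python_version}",
            "pip",
            # The shared code arrives as an installed package, so the
            # component's code directory can stay limited to src/model_1.
            {"pip": [shared_package]},
        ],
    }


# Hypothetical environment for the model_1 component.
spec = build_conda_spec("model-1-env", "my-common==0.1.0")
print(json.dumps(spec, indent=2))
```

The same structure would normally be written out as the conda file that the component's environment points to. Each model component can then keep its own code directory and still import the shared package.
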
Alternatively, you can host the common/ code externally, for example in Azure Blob Storage or a GitHub repository, and pull it in as an external dependency of your components. Another option is to extend sys.path in your model scripts so that they can import from common/ without duplicating it. Note that this only works if common/ is part of the code directory uploaded with the component, so the code field would need to point at a directory that contains it, such as src.
The code attribute in the component YAML currently supports only a single directory. Azure ML is updated frequently, so this could change in the future.
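
The sys.path option mentioned above could look roughly like the following. This is a minimal sketch that assumes the component's code field points at src, so that both model_1/ and common/ are uploaded together; the helper name `add_src_root_to_path` is hypothetical.

```python
import sys
from pathlib import Path


def add_src_root_to_path(script_file):
    """Put the parent of the model directory (i.e. src/) on sys.path and return it.

    Assumes the layout src/<model_dir>/<script>.py with src/common/ alongside.
    """
    src_root = Path(script_file).resolve().parent.parent
    if str(src_root) not in sys.path:
        # Insert at the front so the uploaded copy of common/ is found first.
        sys.path.insert(0, str(src_root))
    return src_root


# In src/model_1/train_model_1.py, before importing any shared code:
#     add_src_root_to_path(__file__)
#     from common import utils
```

This avoids duplicating common/, but it brings back the original concern: the whole of src, including model_2/, is uploaded with the model_1 component.
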

I hope this information helps. Thank you!
