FQDN rules do not propagate to serverless Spark jobs in an Azure ML managed VNet

François V 0 Reputation points
2025-01-25T14:03:33.59+00:00

As you wrote in your documentation:

When Allow Only Approved Outbound is enabled (isolation_mode: allow_only_approved_outbound), conda package dependencies defined in the Spark session configuration will fail to install. To resolve this problem, upload a self-contained Python package wheel with no external dependencies to an Azure storage account and create a private endpoint to this storage account. Use the path to the Python package wheel as the py_files parameter in your Spark job. Setting an FQDN outbound rule will not bypass this issue, as FQDN rule propagation is not supported by Spark.
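
For a standalone job, that workaround looks roughly like the sketch below (Azure ML CLI v2 Spark job YAML). The wheel name, datastore path, and entry script are placeholders; the assumption is that the wheel has already been uploaded to a datastore reachable through a private endpoint.

```yaml
# Sketch of a serverless Spark job using a self-contained wheel via py_files
# instead of conda dependencies, since FQDN rules (e.g. for conda.anaconda.org)
# are not propagated to serverless Spark. Paths and names are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/sparkJob.schema.json
type: spark
code: ./src
entry:
  file: main.py
py_files:
  - azureml://datastores/workspaceblobstore/paths/wheels/my_package-0.1.0-py3-none-any.whl
conf:
  spark.driver.cores: 1
  spark.driver.memory: 2g
  spark.executor.cores: 2
  spark.executor.memory: 2g
  spark.executor.instances: 2
resources:
  instance_type: standard_e4s_v3
  runtime_version: "3.4"
```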

But I'm using a built-in component from your registry: generation_safety_quality_signal_monitor (v0.5.23).

A few things to know:

  • I'm using Azure ML with a managed VNet and the allow-only-approved-outbound rule.
  • I'm using a connection and a private endpoint to access Azure OpenAI resources.
  • I'm using FQDN outbound rules to allow some traffic (conda, for example).
  • This component (generation_safety_quality_signal_monitor) contains nested components, and you only have access to some parameters, such as workspace_connection_arm_id.
  • One of the nested components defines a conda dependency directly in its YAML via the conf.spark.synapse.library.python.env key.
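
For context, a conda dependency declared that way looks roughly like the following. This is an illustrative excerpt, not the actual component source; the package name is hypothetical. Under allow_only_approved_outbound this inline install fails because the FQDN rule for the conda channel is not applied to serverless Spark.

```yaml
# Hypothetical excerpt: a nested Spark component declaring conda packages
# inline via the Spark session configuration. The dependency name below
# is a placeholder, not the component's real dependency.
conf:
  spark.synapse.library.python.env: |
    channels:
      - conda-forge
    dependencies:
      - python=3.10
      - pip:
        - some-package  # hypothetical
```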

Different scenarios:

Scenario 1:

AML workspace:

  • include-spark: false
  • isolationMode: AllowOnlyApprovedOutbound

AzureOpenAI private endpoint outbound rules:

  • sparkEnabled: true
  • sparkStatus: Inactive

Result of scenario 1: unable to access the Azure OpenAI resource (Access denied due to Virtual Network/Firewall rules). OK, I realize I probably need to enable the Spark configuration on my private endpoint outbound rule.
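
In workspace YAML terms, the rule I mean is a private endpoint outbound rule with spark_enabled set, sketched below under stated assumptions (resource IDs are placeholders):

```yaml
# Sketch of a managed-network private endpoint outbound rule to an
# Azure OpenAI account with spark_enabled, so the endpoint is also
# provisioned for serverless Spark. All IDs/names are placeholders.
managed_network:
  isolation_mode: allow_only_approved_outbound
  outbound_rules:
    - name: openai-pe
      type: private_endpoint
      destination:
        service_resource_id: /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<aoai-name>
        subresource_target: account
        spark_enabled: true
```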

Scenario 2:

AML workspace:

  • include-spark: true
  • isolationMode: AllowOnlyApprovedOutbound

AzureOpenAI private endpoint outbound rules:

  • sparkEnabled: true
  • sparkStatus: Active

Result of scenario 2: unable to access the Azure OpenAI resource (Access denied due to Virtual Network/Firewall rules). OK, I realize I probably need to update the Azure ML workspace to provision the managed network for serverless Spark.
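
That update can be done by re-provisioning the managed network with Spark support, along the lines of the CLI call below (workspace and resource group names are placeholders):

```shell
# Re-provision the workspace managed network with serverless Spark support,
# so spark-enabled outbound rules become Active for Spark jobs.
az ml workspace provision-network \
  --name <workspace-name> \
  --resource-group <resource-group> \
  --include-spark
```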

Scenario 3:

AML workspace:

  • include-spark: true
  • isolationMode: AllowOnlyApprovedOutbound

AzureOpenAI private endpoint outbound rules:

  • sparkEnabled: true
  • sparkStatus: Active

Result of scenario 3: unable to access conda.anaconda.org to install the dependencies of the nested components of the generation_safety_quality_signal_monitor built-in component.
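
This is the FQDN rule I had hoped would cover the conda install, sketched below; per the documentation quoted at the top, rules of this type are simply not propagated to serverless Spark, which is why the install still fails:

```yaml
# FQDN outbound rules like this allow conda traffic from compute clusters
# and compute instances, but are not applied to serverless Spark jobs.
managed_network:
  outbound_rules:
    - name: allow-conda
      type: fqdn
      destination: conda.anaconda.org
```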

Question

What do you suggest to make this work? Is there a planned fix to enable FQDN outbound rule propagation for serverless Spark jobs?


1 answer

  1. François V 0 Reputation points
    2025-01-27T15:17:04.54+00:00

    Thank you, Manas, for your reply and for validating the Spark configuration to access Azure OpenAI resources.

    But you're missing my point: when using the built-in component generation_safety_quality_signal_monitor, I don't have access to the nested jobs, and therefore I can't configure py_files.

    Moreover, it's a bit painful to re-create the whole pipeline and the wheel package. That's why I was wondering whether propagation of outbound rules into serverless Spark is coming one day.

