As you wrote in your documentation:

"When Allow Only Approved Outbound is enabled (isolation_mode: allow_only_approved_outbound), conda package dependencies defined in Spark session configuration will fail to install. To resolve this problem, upload a self-contained Python package wheel with no external dependencies to an Azure storage account and create a private endpoint to this storage account. Use the path to the Python package wheel as the py_files parameter in your Spark job. Setting an FQDN outbound rule will not bypass this issue, as FQDN rule propagation is not supported by Spark."
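For reference, my understanding is that the documented workaround would look roughly like this in a standalone Spark job YAML (the datastore path, entry file, and instance type below are illustrative placeholders, not values from your docs):

```yaml
# Sketch of a serverless Spark job that ships a self-contained wheel
# via py_files instead of resolving conda packages at job runtime.
# All names and paths are placeholders.
type: spark
code: ./src
entry:
  file: main.py
py_files:
  - azureml://datastores/workspaceblobstore/paths/wheels/my_pkg-0.1.0-py3-none-any.whl
resources:
  instance_type: standard_e4s_v3
  runtime_version: "3.4"
```

This works for jobs I author myself, but as described below, it is not applicable when the conda dependency is baked into a built-in component.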
But I'm using the built-in component generation_safety_quality_signal_monitor (v0.5.23) from your registry.
A few things to know:
- I'm using AML with a managed VNet and the Allow Only Approved Outbound isolation mode.
- I'm using a connection and a private endpoint to access the Azure OpenAI resource.
- I'm using FQDN outbound rules to allow some traffic (conda, for example).
- This component (generation_safety_quality_signal_monitor) contains nested components, and you only have access to some parameters, like workspace_connection_arm_id.
- One of the nested components defines a conda dependency directly in its YAML, via the conf.spark.synapse.library.python.env key.
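For context, I assume the nested component's session configuration looks something like the following (illustrative only; I do not have access to the component's actual YAML, and the package name is a placeholder):

```yaml
# Hypothetical excerpt of a nested component's Spark configuration.
# The inline conda environment below is what triggers the runtime
# install from conda.anaconda.org, which my isolation mode blocks.
conf:
  spark.synapse.library.python.env: |
    channels:
      - defaults
    dependencies:
      - python=3.10
      - pip:
          - some-package
```

Since this key is defined inside the registry component, I cannot replace it with the py_files workaround myself.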
Different scenarios:

Scenario 1:
AML workspace:
- include-spark=False
- isolationMode=AllowOnlyApprovedOutbound
Azure OpenAI private endpoint outbound rule:
- sparkEnabled=true
- sparkStatus=Inactive
Result scenario 1: unable to access the Azure OpenAI resource (Access denied due to Virtual Network/Firewall rules). OK, I realize I probably need to enable the Spark configuration on my private endpoint outbound rule.
Scenario 2:
AML workspace:
- include-spark: true
- isolationMode: AllowOnlyApprovedOutbound
Azure OpenAI private endpoint outbound rule:
- sparkEnabled: true
- sparkStatus: Active
Result scenario 2: unable to access the Azure OpenAI resource (Access denied due to Virtual Network/Firewall rules). OK, I realize I probably need to update the AML workspace to allow the serverless Spark configuration.
Scenario 3:
AML workspace:
- include-spark: true
- isolationMode: AllowOnlyApprovedOutbound
Azure OpenAI private endpoint outbound rule:
- sparkEnabled: true
- sparkStatus: Active
Result scenario 3: unable to access conda.anaconda.org to install the dependencies of the nested components of the generation_safety_quality_signal_monitor built-in component.
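For completeness, here is roughly what the workspace's managed network configuration looks like in scenario 3, expressed in the workspace YAML schema as I understand it (resource IDs redacted; spark_enabled corresponds to the sparkEnabled/sparkStatus fields above):

```yaml
# Sketch of the relevant managed_network section of my workspace YAML.
# The FQDN rule for conda.anaconda.org is present, but per your docs
# FQDN rule propagation is not supported for serverless Spark.
managed_network:
  isolation_mode: allow_only_approved_outbound
  outbound_rules:
    - name: aoai-pe
      type: private_endpoint
      destination:
        service_resource_id: /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<aoai-name>
        subresource_target: account
        spark_enabled: true
    - name: conda-fqdn
      type: fqdn
      destination: conda.anaconda.org
```

So the Azure OpenAI traffic now works over the Spark-enabled private endpoint, but the component's inline conda dependency still fails because the FQDN rule does not reach the Spark runtime.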
Question
What do you suggest to make this work? Is there a planned fix to enable FQDN outbound rule propagation in serverless Spark jobs?