SparkComponent Class

Spark component version, used to define a Spark component or job.

Constructor

SparkComponent(*, code: str | PathLike | None = '.', entry: Dict[str, str] | SparkJobEntry | None = None, py_files: List[str] | None = None, jars: List[str] | None = None, files: List[str] | None = None, archives: List[str] | None = None, driver_cores: int | str | None = None, driver_memory: str | None = None, executor_cores: int | str | None = None, executor_memory: str | None = None, executor_instances: int | str | None = None, dynamic_allocation_enabled: bool | str | None = None, dynamic_allocation_min_executors: int | str | None = None, dynamic_allocation_max_executors: int | str | None = None, conf: Dict[str, str] | None = None, environment: Environment | str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, args: str | None = None, additional_includes: List | None = None, **kwargs: Any)

Keyword-Only Parameters

Name Description
code

The source code to run the job. Can be a local path or an "http:", "https:", or "azureml:" URL pointing to a remote location. Defaults to ".", indicating the current directory.

Default value: .
entry

The file or class entry point.

Default value: None
py_files

The list of .zip, .egg or .py files to place on the PYTHONPATH for Python apps. Defaults to None.

Default value: None
jars

The list of .jar files to include on the driver and executor classpaths. Defaults to None.

Default value: None
files

The list of files to be placed in the working directory of each executor. Defaults to None.

Default value: None
archives

The list of archives to be extracted into the working directory of each executor. Defaults to None.

Default value: None
driver_cores

The number of cores to use for the driver process, only in cluster mode.

Default value: None
driver_memory

The amount of memory to use for the driver process, formatted as a string with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g").

Default value: None
executor_cores

The number of cores to use on each executor.

Default value: None
executor_memory

The amount of memory to use per executor process, formatted as a string with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g").

Default value: None
executor_instances

The initial number of executors.

Default value: None
dynamic_allocation_enabled

Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. Defaults to False.

Default value: None
dynamic_allocation_min_executors

The lower bound for the number of executors if dynamic allocation is enabled.

Default value: None
dynamic_allocation_max_executors

The upper bound for the number of executors if dynamic allocation is enabled.

Default value: None
conf

A dictionary of pre-defined Spark configuration keys and values (see the sketch after this parameter list). Defaults to None.

Default value: None
environment

The Azure ML environment to run the job in.

Default value: None
inputs
Optional[dict[str, Union[NodeOutput, Input, str, bool, int, float, Enum]]]

A mapping of input names to input data sources used in the job. Defaults to None.

Default value: None
outputs

A mapping of output names to output data sources used in the job. Defaults to None.

Default value: None
args

The arguments for the job. Defaults to None.

Default value: None
additional_includes

A list of shared additional files to be included in the component. Defaults to None.

Default value: None
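
The conf parameter passes raw Spark configuration properties through to the job as plain key/value strings. A minimal sketch, where the keys are ordinary Spark settings chosen purely for illustration and "main.py" is a hypothetical entry file:


    from azure.ai.ml.entities import SparkComponent

    # Minimal sketch: pass standard Spark properties through `conf` as-is.
    # The keys below are ordinary Spark configuration settings shown only
    # for illustration; "main.py" is a hypothetical entry file.
    component = SparkComponent(
        code="./src",
        entry={"file": "main.py"},
        conf={
            "spark.driver.maxResultSize": "2g",
            "spark.sql.shuffle.partitions": "200",
        },
    )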

Examples

Creating a SparkComponent.


   from azure.ai.ml.entities import SparkComponent

   component = SparkComponent(
       name="add_greeting_column_spark_component",
       display_name="Aml Spark add greeting column test module",
       description="Aml Spark add greeting column test module",
       version="1",
       inputs={
           "file_input": {"type": "uri_file", "mode": "direct"},
       },
       driver_cores=2,
       driver_memory="1g",
       executor_cores=1,
       executor_memory="1g",
       executor_instances=1,
       code="./src",
       entry={"file": "add_greeting_column.py"},
       py_files=["utils.zip"],
       files=["my_files.txt"],
       args="--file_input ${{inputs.file_input}}",
       base_path="./sdk/ml/azure-ai-ml/tests/test_configs/dsl_pipeline/spark_job_in_pipeline",
   )
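
For workloads whose executor needs vary, the dynamic allocation parameters can replace a fixed executor_instances. A minimal sketch of that variant, reusing the entry file from the example above:


    from azure.ai.ml.entities import SparkComponent

    # Minimal sketch: let Spark scale executors between a lower and an
    # upper bound instead of fixing their count with executor_instances.
    scaling_component = SparkComponent(
        name="add_greeting_column_spark_component",
        version="1",
        code="./src",
        entry={"file": "add_greeting_column.py"},
        driver_cores=2,
        driver_memory="1g",
        executor_cores=1,
        executor_memory="1g",
        dynamic_allocation_enabled=True,
        dynamic_allocation_min_executors=1,
        dynamic_allocation_max_executors=4,
    )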


Methods

dump

Dump the component content into a file in YAML format.

dump(dest: str | PathLike | IO, **kwargs: Any) -> None

Parameters

Name Description
dest
Required
Union[PathLike, str, IO[AnyStr]]

The destination to receive this component's content. Must be either a path to a local file or an already-open file stream. If dest is a file path, a new file will be created; an exception is raised if the file already exists. If dest is an open file, the file will be written to directly; an exception is raised if the file is not writable.
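
A minimal usage sketch (the output file name is arbitrary; the component shown is a hypothetical stand-in):

    from azure.ai.ml.entities import SparkComponent

    component = SparkComponent(code="./src", entry={"file": "add_greeting_column.py"})
    # Serialize the component definition to a new YAML file;
    # raises if "spark_component.yml" already exists.
    component.dump("spark_component.yml")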

Attributes

base_path

The base path of the resource.

Returns

Type Description
str

The base path of the resource.

creation_context

The creation context of the resource.

Returns

Type Description

The creation metadata for the resource.

display_name

Display name of the component.

Returns

Type Description
str

Display name of the component.

entry

The file or class entry point.

environment

The Azure ML environment to run the Spark component or job in.

Returns

Type Description

The Azure ML environment to run the Spark component or job in.

id

The resource ID.

Returns

Type Description

The global ID of the resource, an Azure Resource Manager (ARM) ID.

inputs

Inputs of the component.

Returns

Type Description

Inputs of the component.

is_deterministic

Whether the component is deterministic.

Returns

Type Description

Whether the component is deterministic.

outputs

Outputs of the component.

Returns

Type Description

Outputs of the component.

type

Type of the component. Defaults to 'command'; for a SparkComponent this is 'spark'.

Returns

Type Description
str

Type of the component.

version

Version of the component.

Returns

Type Description
str

Version of the component.

CODE_ID_RE_PATTERN

CODE_ID_RE_PATTERN = re.compile('\\/subscriptions\\/(?P<subscription>[\\w,-]+)\\/resourceGroups\\/(?P<resource_group>[\\w,-]+)\\/providers\\/Microsoft\\.MachineLearningServices\\/workspaces\\/(?P<workspace>[\\w,-]+)\\/codes\\/(?P<co)