Sdílet prostřednictvím


SparkComponent Class

Spark component version, used to define a Spark Component or Job.

Inheritance
azure.ai.ml.entities._component.component.Component
SparkComponent
azure.ai.ml.entities._job.parameterized_spark.ParameterizedSpark
SparkComponent
azure.ai.ml.entities._job.spark_job_entry_mixin.SparkJobEntryMixin
SparkComponent
azure.ai.ml.entities._component._additional_includes.AdditionalIncludesMixin
SparkComponent

Constructor

SparkComponent(*, code: PathLike | str | None = '.', entry: Dict[str, str] | SparkJobEntry | None = None, py_files: List[str] | None = None, jars: List[str] | None = None, files: List[str] | None = None, archives: List[str] | None = None, driver_cores: int | str | None = None, driver_memory: str | None = None, executor_cores: int | str | None = None, executor_memory: str | None = None, executor_instances: int | str | None = None, dynamic_allocation_enabled: bool | str | None = None, dynamic_allocation_min_executors: int | str | None = None, dynamic_allocation_max_executors: int | str | None = None, conf: Dict[str, str] | None = None, environment: Environment | str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, args: str | None = None, additional_includes: List | None = None, **kwargs: Any)

Keyword-Only Parameters

Name Description
code

The source code to run the job. Can be a local path or "http:", "https:", or "azureml:" url pointing to a remote location. Defaults to ".", indicating the current directory.

Default value: .
entry

The file or class entry point.

py_files

The list of .zip, .egg or .py files to place on the PYTHONPATH for Python apps. Defaults to None.

jars

The list of .JAR files to include on the driver and executor classpaths. Defaults to None.

files

The list of files to be placed in the working directory of each executor. Defaults to None.

archives

The list of archives to be extracted into the working directory of each executor. Defaults to None.

driver_cores

The number of cores to use for the driver process, only in cluster mode.

driver_memory

The amount of memory to use for the driver process, formatted as strings with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g").

executor_cores

The number of cores to use on each executor.

executor_memory

The amount of memory to use per executor process, formatted as strings with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g").

executor_instances

The initial number of executors.

dynamic_allocation_enabled

Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. Defaults to False.

dynamic_allocation_min_executors

The lower bound for the number of executors if dynamic allocation is enabled.

dynamic_allocation_max_executors

The upper bound for the number of executors if dynamic allocation is enabled.

conf

A dictionary with pre-defined Spark configurations key and values. Defaults to None.

environment

The Azure ML environment to run the job in.

inputs
Optional[dict[str, Union[ <xref:azure.ai.ml.entities._job.pipeline._io.NodeOutput>, Input, str, bool, int, float, <xref:Enum>, ]]]

A mapping of input names to input data sources used in the job. Defaults to None.

outputs

A mapping of output names to output data sources used in the job. Defaults to None.

args

The arguments for the job. Defaults to None.

additional_includes

A list of shared additional files to be included in the component. Defaults to None.

Examples

Creating SparkComponent.


   from azure.ai.ml.entities import SparkComponent

   component = SparkComponent(
       name="add_greeting_column_spark_component",
       display_name="Aml Spark add greeting column test module",
       description="Aml Spark add greeting column test module",
       version="1",
       inputs={
           "file_input": {"type": "uri_file", "mode": "direct"},
       },
       driver_cores=2,
       driver_memory="1g",
       executor_cores=1,
       executor_memory="1g",
       executor_instances=1,
       code="./src",
       entry={"file": "add_greeting_column.py"},
       py_files=["utils.zip"],
       files=["my_files.txt"],
       args="--file_input ${{inputs.file_input}}",
       base_path="./sdk/ml/azure-ai-ml/tests/test_configs/dsl_pipeline/spark_job_in_pipeline",
   )


Methods

dump

Dump the component content into a file in yaml format.

dump

Dump the component content into a file in yaml format.

dump(dest: str | PathLike | IO, **kwargs: Any) -> None

Parameters

Name Description
dest
Required
Union[<xref:PathLike>, str, IO[AnyStr]]

The destination to receive this component's content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable.

Attributes

base_path

The base path of the resource.

Returns

Type Description
str

The base path of the resource.

creation_context

The creation context of the resource.

Returns

Type Description

The creation metadata for the resource.

display_name

Display name of the component.

Returns

Type Description
str

Display name of the component.

entry

environment

The Azure ML environment to run the Spark component or job in.

Returns

Type Description

The Azure ML environment to run the Spark component or job in.

id

The resource ID.

Returns

Type Description

The global ID of the resource, an Azure Resource Manager (ARM) ID.

inputs

Inputs of the component.

Returns

Type Description

Inputs of the component.

is_deterministic

Whether the component is deterministic.

Returns

Type Description

Whether the component is deterministic

outputs

Outputs of the component.

Returns

Type Description

Outputs of the component.

type

Type of the component, default is 'command'.

Returns

Type Description
str

Type of the component.

version

Version of the component.

Returns

Type Description
str

Version of the component.

CODE_ID_RE_PATTERN

CODE_ID_RE_PATTERN = re.compile('\\/subscriptions\\/(?P<subscription>[\\w,-]+)\\/resourceGroups\\/(?P<resource_group>[\\w,-]+)\\/providers\\/Microsoft\\.MachineLearningServices\\/workspaces\\/(?P<workspace>[\\w,-]+)\\/codes\\/(?P<co)