DataTransferStep Class


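Creates an Azure ML Pipeline step that transfers data between storage options.

DataTransferStep supports common storage types, such as Azure Blob Storage and Azure Data Lake, as sources and sinks. For more information, see the Remarks section.

For an example of using DataTransferStep, see the notebook https://aka.ms/pl-data-trans.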


Inheritance: azureml.pipeline.core._data_transfer_step_base._DataTransferStepBase → DataTransferStep

Constructor

DataTransferStep(name, source_data_reference=None, destination_data_reference=None, compute_target=None, source_reference_type=None, destination_reference_type=None, allow_reuse=True)

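Parameters

Name	Description
name	str. Required. The name of the step.
source_data_reference	Union[InputPortBinding, DataReference, PortDataReference, PipelineData]. Required. The input connection that serves as the source of the data transfer operation. Default value: None
destination_data_reference	Union[InputPortBinding, PipelineOutputAbstractDataset, DataReference]. Required. The output connection that serves as the destination of the data transfer operation. Default value: None
compute_target	DataFactoryCompute or str. Required. The Azure Data Factory to use for transferring data. Default value: None
source_reference_type	str. An optional string that specifies the type of `source_data_reference`. Possible values are 'file' and 'directory'. When not specified, the type of the existing path is used. Use this parameter to differentiate between a file and a directory of the same name. Default value: None
destination_reference_type	str. An optional string that specifies the type of `destination_data_reference`. Possible values are 'file' and 'directory'. When not specified, Azure ML uses the type of the existing path, the type of the source reference, or 'directory', in that order. Default value: None
allow_reuse	bool. Indicates whether the step should reuse previous results when re-run with the same settings. Reuse is enabled by default. If the step arguments remain unchanged, the output from the previous run of this step is reused: instead of transferring the data again, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed. Default value: True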
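The fallback order described for `source_reference_type` and `destination_reference_type` can be written as a small function. This is a minimal sketch of the documented precedence, not the SDK's implementation: the function `resolve_reference_types` and its arguments (the type of an already existing path, given as a string or `None`) are hypothetical stand-ins for what Azure ML inspects.

```python
from typing import Optional, Tuple

VALID_TYPES = {"file", "directory"}


def resolve_reference_types(
    source_type: Optional[str],
    destination_type: Optional[str],
    existing_source_path_type: Optional[str],
    existing_destination_path_type: Optional[str],
) -> Tuple[Optional[str], str]:
    """Hypothetical model of the documented fallback order (not the SDK's code)."""
    for value in (source_type, destination_type):
        if value is not None and value not in VALID_TYPES:
            raise ValueError(f"reference type must be 'file' or 'directory', got {value!r}")

    # source_reference_type: explicit value, else the type of the existing path.
    resolved_source = source_type or existing_source_path_type

    # destination_reference_type: explicit value, else the existing destination
    # path's type, else the source reference's type, else 'directory'.
    resolved_destination = (
        destination_type
        or existing_destination_path_type
        or resolved_source
        or "directory"
    )
    return resolved_source, resolved_destination


# The destination does not exist yet, so it inherits the source's type.
print(resolve_reference_types(None, None, "file", None))  # ('file', 'file')
# Nothing is known about either side, so the destination falls back to 'directory'.
print(resolve_reference_types(None, None, None, None))  # (None, 'directory')
```

An explicit value always wins, which is why the parameter is the way to disambiguate a file and a directory that share a name.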
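The reuse behavior of `allow_reuse` can be pictured as a cache keyed on the step's configuration. The sketch below is a hypothetical model, not how Azure ML actually computes its reuse key: `reuse_key`, `run_step`, `fake_transfer`, and the dataset-definition dictionary are illustrative names. It shows why new files behind an unchanged dataset definition still produce a reused result, while a changed definition triggers a new transfer.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical store of previous results, keyed by a configuration fingerprint.
_previous_results: Dict[str, Any] = {}


def reuse_key(step_arguments: Dict[str, Any], dataset_definition: Dict[str, Any]) -> str:
    """Fingerprint the step arguments and the dataset *definition*, not its data."""
    payload = json.dumps({"args": step_arguments, "dataset": dataset_definition}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(
    step_arguments: Dict[str, Any],
    dataset_definition: Dict[str, Any],
    transfer: Callable[[], Any],
    allow_reuse: bool = True,
) -> Any:
    """Return the cached result when reuse is allowed and the fingerprint matches."""
    key = reuse_key(step_arguments, dataset_definition)
    if allow_reuse and key in _previous_results:
        return _previous_results[key]  # reused: no data is transferred again
    result = transfer()
    _previous_results[key] = result
    return result


# Usage: count how often the simulated transfer actually runs.
calls = []


def fake_transfer() -> str:
    calls.append("transfer")
    return f"run-{len(calls)}"


args = {"name": "copy data"}
definition = {"path": "raw/"}
first = run_step(args, definition, fake_transfer)               # transfers
second = run_step(args, definition, fake_transfer)              # reused
changed = run_step(args, {"path": "clean/"}, fake_transfer)     # definition changed
forced = run_step(args, definition, fake_transfer, allow_reuse=False)
print(first, second, changed, forced)  # run-1 run-1 run-2 run-3
```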

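Remarks

This step supports the following storage types as sources and sinks, except where noted:

Azure Blob Storage
Azure Data Lake Storage Gen1 and Gen2
Azure SQL Database
Azure Database for PostgreSQL
Azure Database for MySQL

For Azure SQL Database, you must use service principal authentication. For more information, see Service Principal Authentication. For an example of using service principal authentication with Azure SQL Database, see https://aka.ms/pl-data-trans.

To create a data dependency between steps, use the get_output method to get a PipelineData object that represents the output of this data transfer step and can be used as an input for later steps in the pipeline.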


   from azureml.pipeline.steps import DataTransferStep, PythonScriptStep

   # Remaining constructor arguments elided
   data_transfer_step = DataTransferStep(name="copy data", ...)

   # Use output of data_transfer_step as input of another step in pipeline
   # This will make training_step wait for data_transfer_step to complete
   # (aml_compute and source_directory are assumed to be defined earlier)
   training_input = data_transfer_step.get_output()
   training_step = PythonScriptStep(script_name="train.py",
                                    arguments=["--model", training_input],
                                    inputs=[training_input],
                                    compute_target=aml_compute,
                                    source_directory=source_directory)
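Consuming `data_transfer_step.get_output()` in another step adds an edge from the producing step to the consuming step, so the consumer runs only after the producer finishes. The sketch below models that ordering with the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the step names and the `consumes` mapping are hypothetical and do not describe how Azure ML schedules steps internally.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps whose outputs it consumes.
consumes = {
    "copy data": set(),      # the DataTransferStep has no upstream step
    "train": {"copy data"},  # consumes data_transfer_step.get_output()
    "evaluate": {"train"},   # consumes the training step's output
}

# static_order() yields every step after all of the steps it depends on.
run_order = list(TopologicalSorter(consumes).static_order())
print(run_order)  # ['copy data', 'train', 'evaluate']
```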

To create an InputPortBinding with a specific name, you can combine the output of get_output() with the as_input or as_mount method of PipelineData.


   data_transfer_step = DataTransferStep(name="copy data", ...)
   training_input = data_transfer_step.get_output().as_input("my_input_name")

Methods

create_node

Creates a node from the DataTransfer step and adds it to the given graph.

get_output

Gets the output of the step as PipelineData.

create_node

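Creates a node from the DataTransfer step and adds it to the given graph.

This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the required parameters through this method so that the step can be added to the pipeline graph that represents the workflow.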

create_node(graph, default_datastore, context)

Parameters

Name	Description
graph (required)	Graph. The graph object to add the node to.
default_datastore (required)	Union[AbstractAzureStorageDatastore, AzureDataLakeDatastore]. The default datastore.
context (required)	azureml.pipeline.core._GraphContext. The graph context.

Returns

Type	Description
Node	The created node.

get_output

Gets the output of the step as PipelineData.

get_output()

Returns

Type	Description
PipelineData	The output of the step.
