HDInsight SDK for Python

Overview

The HDInsight SDK for Python provides classes and methods for managing your HDInsight clusters. It includes operations to create, delete, update, list, and resize clusters, execute script actions, monitor clusters, get cluster properties, and more.

Prerequisites

SDK Installation

The HDInsight SDK for Python is available from the Python Package Index and can be installed by running:

pip install azure-mgmt-hdinsight

Authentication

The SDK must first be authenticated with your Azure subscription. Follow the example below to create a service principal and use it to authenticate. Once this is done, you will have an instance of HDInsightManagementClient, which contains many methods (outlined in the sections below) for performing management operations.

Note

There are other ways to authenticate besides the example below that may be better suited to your needs. All methods are outlined here: Authenticate with the Azure Management Libraries for Python

Authentication Example Using a Service Principal

First, log in to Azure Cloud Shell. Verify that you are currently using the subscription in which you want the service principal created.

az account show

Your subscription information is displayed as JSON.

{
  "environmentName": "AzureCloud",
  "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "isDefault": true,
  "name": "XXXXXXX",
  "state": "Enabled",
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "user": {
    "cloudShellID": true,
    "name": "XXX@XXX.XXX",
    "type": "user"
  }
}

If you're not logged in to the correct subscription, select the correct one by running:

az account set -s <name or ID of subscription>

Important

If you have not already registered the HDInsight resource provider by another method (such as by creating an HDInsight cluster through the Azure portal), you need to do this once before you can authenticate. This can be done from Azure Cloud Shell by running the following command:

az provider register --namespace Microsoft.HDInsight

Next, choose a name for your service principal and create it with the following command:

az ad sp create-for-rbac --name <Service Principal Name> --sdk-auth

The service principal information is displayed as JSON.

{
  "clientId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "clientSecret": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "subscriptionId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

Copy the Python snippet below and fill in TENANT_ID, CLIENT_ID, CLIENT_SECRET, and SUBSCRIPTION_ID with the strings from the JSON returned by the command that created the service principal.

from azure.mgmt.hdinsight import HDInsightManagementClient
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.hdinsight.models import *

# Tenant ID for your Azure Subscription
TENANT_ID = ''
# Your Service Principal App Client ID
CLIENT_ID = ''
# Your Service Principal Client Secret
CLIENT_SECRET = ''
# Your Azure Subscription ID
SUBSCRIPTION_ID = ''

credentials = ServicePrincipalCredentials(
    client_id = CLIENT_ID,
    secret = CLIENT_SECRET,
    tenant = TENANT_ID
)

client = HDInsightManagementClient(credentials, SUBSCRIPTION_ID)
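
Rather than pasting each value by hand, one option is to save the service principal JSON and read the four fields from it. The following is a minimal sketch under that assumption; read_service_principal is a hypothetical helper, not part of the SDK, and it relies only on the key names shown in the JSON output above.

```python
import json

# Hypothetical helper, not part of the HDInsight SDK.
# Assumes the key names shown in the service principal JSON above.
REQUIRED_KEYS = ("tenantId", "clientId", "clientSecret", "subscriptionId")


def read_service_principal(json_text):
    """Return the four values needed for authentication as a dict."""
    data = json.loads(json_text)
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {
        "TENANT_ID": data["tenantId"],
        "CLIENT_ID": data["clientId"],
        "CLIENT_SECRET": data["clientSecret"],
        "SUBSCRIPTION_ID": data["subscriptionId"],
    }


# Example with placeholder values only; real values come from the CLI output.
sample = json.dumps({
    "clientId": "client-placeholder",
    "clientSecret": "secret-placeholder",
    "subscriptionId": "subscription-placeholder",
    "tenantId": "tenant-placeholder",
})
settings = read_service_principal(sample)
print(sorted(settings))
```

The returned values can then be assigned to the TENANT_ID, CLIENT_ID, CLIENT_SECRET, and SUBSCRIPTION_ID variables used in the snippet above, which keeps the secret out of the source file.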

Cluster Management

Note

This section assumes you have already authenticated and constructed an HDInsightManagementClient instance and stored it in a variable called client. Instructions for authenticating and obtaining an HDInsightManagementClient can be found in the Authentication section above.

Create a Cluster

A new cluster can be created by calling client.clusters.create().

Samples

Code samples for creating several common types of HDInsight clusters are available: HDInsight Python Samples.

Example

This example demonstrates how to create a Spark cluster with 2 head nodes and 1 worker node.

Note

You first need to create a resource group and a storage account, as explained below. If you have already created these, you can skip these steps.

Creating a Resource Group

You can create a resource group using Azure Cloud Shell by running:

az group create -l <Region Name (e.g. eastus)> -n <Resource Group Name>

Creating a Storage Account

You can create a storage account using Azure Cloud Shell by running:

az storage account create -n <Storage Account Name> -g <Existing Resource Group Name> -l <Region Name (e.g. eastus)> --sku <SKU (e.g. Standard_LRS)>

Now run the following command to get the key for your storage account (you will need it to create a cluster):

az storage account keys list -n <Storage Account Name>
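
If you capture the output of this command, the key can be extracted programmatically. The sketch below assumes the command prints a JSON array of objects that each carry a "value" field; first_storage_key is a hypothetical helper name, and the sample data is a placeholder rather than real output.

```python
import json


def first_storage_key(keys_json):
    """Return the value of the first key in the JSON list printed by the CLI.

    Assumption: the output is a JSON array of objects with a "value" field.
    """
    keys = json.loads(keys_json)
    if not keys:
        raise ValueError("no keys found in the output")
    return keys[0]["value"]


# Placeholder data shaped like the assumed CLI output.
sample_output = json.dumps([
    {"keyName": "key1", "value": "placeholder-key-1"},
    {"keyName": "key2", "value": "placeholder-key-2"},
])
storage_account_key = first_storage_key(sample_output)
print(storage_account_key)  # prints placeholder-key-1
```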

The Python snippet below creates a Spark cluster with 2 head nodes and 1 worker node. Fill in the blank variables as explained in the comments, and change the other parameters to suit your needs.

# The name for the cluster you are creating
cluster_name = ""
# The name of your existing Resource Group
resource_group_name = ""
# Choose a username
username = ""
# Choose a password
password = ""
# Replace <> with the name of your storage account
storage_account = "<>.blob.core.windows.net"
# Storage account key you obtained above
storage_account_key = ""
# Choose a region
location = ""
container = "default"

params = ClusterCreateProperties(
    cluster_version="3.6",
    os_type=OSType.linux,
    tier=Tier.standard,
    cluster_definition=ClusterDefinition(
        kind="spark",
        configurations={
            "gateway": {
                "restAuthCredential.enabled_credential": "True",
                "restAuthCredential.username": username,
                "restAuthCredential.password": password
            }
        }
    ),
    compute_profile=ComputeProfile(
        roles=[
            Role(
                name="headnode",
                target_instance_count=2,
                hardware_profile=HardwareProfile(vm_size="Large"),
                os_profile=OsProfile(
                    linux_operating_system_profile=LinuxOperatingSystemProfile(
                        username=username,
                        password=password
                    )
                )
            ),
            Role(
                name="workernode",
                target_instance_count=1,
                hardware_profile=HardwareProfile(vm_size="Large"),
                os_profile=OsProfile(
                    linux_operating_system_profile=LinuxOperatingSystemProfile(
                        username=username,
                        password=password
                    )
                )
            )
        ]
    ),
    storage_profile=StorageProfile(
        storageaccounts=[StorageAccount(
            name=storage_account,
            key=storage_account_key,
            container=container,
            is_default=True
        )]
    )
)

client.clusters.create(
    cluster_name=cluster_name,
    resource_group_name=resource_group_name,
    parameters=ClusterCreateParametersExtended(
        location=location,
        tags={},
        properties=params
    ))
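
Because every blank variable in the snippet above must be filled in before the request can succeed, a small check run before client.clusters.create() can catch omissions early. This is only a sketch; find_unfilled is a hypothetical helper, and it treats a value as unfilled if it is empty or still contains the "<>" placeholder.

```python
def find_unfilled(settings):
    """Return the names of settings that are blank or still contain '<>'."""
    return sorted(
        name for name, value in settings.items()
        if not value.strip() or "<>" in value
    )


# Values as they appear in the unedited snippet, plus one that is filled in.
settings = {
    "cluster_name": "",
    "resource_group_name": "",
    "storage_account": "<>.blob.core.windows.net",
    "location": "eastus",
}
unfilled = find_unfilled(settings)
if unfilled:
    print("Fill in before creating the cluster:", ", ".join(unfilled))
```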

Get Cluster Details

To get properties for a given cluster:

client.clusters.get("<Resource Group Name>", "<Cluster Name>")

Example

You can use get to confirm that you have successfully created your cluster.

my_cluster = client.clusters.get("<Resource Group Name>", "<Cluster Name>")
print(my_cluster)

The output should look like:

{'additional_properties': {}, 'id': '/subscriptions/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourceGroups/<Resource Group Name>/providers/Microsoft.HDInsight/clusters/<Cluster Name>', 'name': '<Cluster Name>', 'type': 'Microsoft.HDInsight/clusters', 'location': '<Location>', 'tags': {}, 'etag': 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX', 'properties': <azure.mgmt.hdinsight.models.cluster_get_properties_py3.ClusterGetProperties object at 0x0000013766D68048>}
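
The id field in this output packs the subscription, resource group, and cluster name into a single path. The sketch below splits such a path into its parts, assuming the alternating key/value layout shown above; parse_cluster_id is a hypothetical helper and the ID used here is a placeholder.

```python
def parse_cluster_id(resource_id):
    """Split a cluster resource ID into subscription, resource group, and name."""
    parts = resource_id.strip("/").split("/")
    # Segments alternate between a key and its value:
    # subscriptions/<id>/resourceGroups/<name>/providers/<namespace>/clusters/<name>
    pairs = dict(zip(parts[0::2], parts[1::2]))
    return {
        "subscription_id": pairs["subscriptions"],
        "resource_group": pairs["resourceGroups"],
        "cluster_name": pairs["clusters"],
    }


example_id = (
    "/subscriptions/sub-placeholder/resourceGroups/my-group"
    "/providers/Microsoft.HDInsight/clusters/my-cluster"
)
print(parse_cluster_id(example_id))
```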

List Clusters

List Clusters Under the Subscription

client.clusters.list()

List Clusters by Resource Group

client.clusters.list_by_resource_group("<Resource Group Name>")

Note

Both list() and list_by_resource_group() return a ClusterPaged object. Calling advance_page() returns the list of clusters on that page and advances the ClusterPaged object to the next page. This can be repeated until a StopIteration exception is raised, indicating that there are no more pages.

Example

The following example prints the properties of all clusters in the current subscription:

clusters_paged = client.clusters.list()
while True:
  try:
    for cluster in clusters_paged.advance_page():
      print(cluster)
  except StopIteration: 
    break
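
The same paging loop recurs for every paged result in this article, so it can be wrapped once in a generator. The sketch below relies only on the advance_page() and StopIteration behavior described above; iter_all and FakePaged are hypothetical names, and FakePaged is a stand-in so the example runs without an Azure connection.

```python
def iter_all(paged):
    """Yield every item from an object that exposes advance_page()."""
    while True:
        try:
            page = paged.advance_page()
        except StopIteration:
            # No more pages: end the generator cleanly.
            return
        for item in page:
            yield item


class FakePaged:
    """Stand-in for a paged result, used only to demonstrate the helper."""

    def __init__(self, pages):
        self._pages = list(pages)

    def advance_page(self):
        if not self._pages:
            raise StopIteration("no more pages")
        return self._pages.pop(0)


names = list(iter_all(FakePaged([["cluster-a", "cluster-b"], ["cluster-c"]])))
print(names)  # prints ['cluster-a', 'cluster-b', 'cluster-c']
```

With a real client, the call would look like iter_all(client.clusters.list()).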

Delete a Cluster

To delete a cluster:

client.clusters.delete("<Resource Group Name>", "<Cluster Name>")

Update Cluster Tags

You can update the tags of a given cluster like so:

client.clusters.update("<Resource Group Name>", "<Cluster Name>", tags={<Dictionary of Tags>})

Example

client.clusters.update("<Resource Group Name>", "<Cluster Name>", tags={"tag1Name" : "tag1Value", "tag2Name" : "tag2Value"})
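
If update() replaces the cluster's tag dictionary rather than merging into it (an assumption worth verifying for your SDK version), existing tags can be preserved by merging them with the changes first. A minimal sketch follows; merge_tags is a hypothetical helper.

```python
def merge_tags(existing, changes):
    """Return a new dict with the existing tags overlaid by the changes."""
    merged = dict(existing or {})
    merged.update(changes)
    return merged


current_tags = {"tag1Name": "tag1Value"}
new_tags = merge_tags(current_tags, {"tag2Name": "tag2Value"})
print(new_tags)

# With a real client, the merged dictionary would be passed as the tags argument:
# my_cluster = client.clusters.get("<Resource Group Name>", "<Cluster Name>")
# client.clusters.update("<Resource Group Name>", "<Cluster Name>",
#                        tags=merge_tags(my_cluster.tags, {"tag2Name": "tag2Value"}))
```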

Resize Cluster

You can change the number of worker nodes in a given cluster by specifying a new size like so:

client.clusters.resize("<Resource Group Name>", "<Cluster Name>", target_instance_count=<Num of Worker Nodes>)

Cluster Monitoring

The HDInsight Management SDK can also be used to manage monitoring on your clusters via the Operations Management Suite (OMS).

Enable OMS Monitoring

Note

To enable OMS monitoring, you must have an existing Log Analytics workspace. If you have not already created one, you can learn how to do that here: Create a Log Analytics workspace in the Azure portal.

To enable OMS monitoring on your cluster:

client.extension.enable_monitoring("<Resource Group Name>", "<Cluster Name>", workspace_id="<Workspace Id>")

View Status of OMS Monitoring

To get the status of OMS on your cluster:

client.extension.get_monitoring_status("<Resource Group Name>", "<Cluster Name>")

Disable OMS Monitoring

To disable OMS on your cluster:

client.extension.disable_monitoring("<Resource Group Name>", "<Cluster Name>")

Script Actions

HDInsight provides a configuration method called script actions that invokes custom scripts to customize the cluster.

Note

More information on how to use script actions can be found here: Customize Linux-based HDInsight clusters using script actions

Execute Script Actions

To execute script actions on a given cluster:

script_action1 = RuntimeScriptAction(name="<Script Name>", uri="<URL To Script>", roles=[<List of Roles>]) #valid roles are "headnode", "workernode", "zookeepernode", and "edgenode"

client.clusters.execute_script_actions("<Resource Group Name>", "<Cluster Name>", <persist_on_success (bool)>, script_actions=[script_action1]) #add more RuntimeScriptActions to the list to execute multiple scripts
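
Since only four role names are valid, a typo in the roles list is easy to catch before the request is sent. The sketch below checks a list against the role names given in the comment above; check_roles is a hypothetical helper, not an SDK function.

```python
# Role names taken from the comment in the snippet above.
VALID_ROLES = {"headnode", "workernode", "zookeepernode", "edgenode"}


def check_roles(roles):
    """Return the roles unchanged, or raise ValueError if any are not valid."""
    invalid = sorted(set(roles) - VALID_ROLES)
    if invalid:
        raise ValueError("invalid roles: " + ", ".join(invalid))
    return list(roles)


roles = check_roles(["headnode", "workernode"])
print(roles)  # prints ['headnode', 'workernode']
```

The checked list can then be passed as the roles argument of RuntimeScriptAction.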

Delete Script Action

To delete a specified persisted script action on a given cluster:

client.script_actions.delete("<Resource Group Name>", "<Cluster Name>", "<Script Name>")

List Persisted Script Actions

Note

list() and list_persisted_scripts() return a RuntimeScriptActionDetailPaged object. Calling advance_page() returns the list of RuntimeScriptActionDetail objects on that page and advances the RuntimeScriptActionDetailPaged object to the next page. This can be repeated until a StopIteration exception is raised, indicating that there are no more pages. See the example below.

To list all persisted script actions for the specified cluster:

client.script_actions.list_persisted_scripts("<Resource Group Name>", "<Cluster Name>")

Example

scripts_paged = client.script_actions.list_persisted_scripts(resource_group_name, cluster_name)
while True:
  try:
    for script in scripts_paged.advance_page():
      print(script)
  except StopIteration:
    break

List All Scripts' Execution History

To list all scripts' execution history for the specified cluster:

client.script_execution_history.list("<Resource Group Name>", "<Cluster Name>")

Example

This example prints all the details for all past script executions.

script_executions_paged = client.script_execution_history.list("<Resource Group Name>", "<Cluster Name>")
while True:
  try:
    for script in script_executions_paged.advance_page():
      print(script)
  except StopIteration:
    break