Quickstart: Create an Apache Spark cluster in Azure HDInsight using Azure CLI

Article
11/25/2024

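In this quickstart, you learn how to create an Apache Spark cluster in Azure HDInsight using the Azure CLI. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight uses in-memory processing to speed up data analytics and cluster computing. The Azure CLI is Microsoft's cross-platform command-line experience for managing Azure resources.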

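If you're using multiple clusters together, you may want to create a virtual network. If you're using a Spark cluster, you may also want to use the Hive Warehouse Connector. For more information, see Plan a virtual network for Azure HDInsight and Integrate Apache Spark and Apache Hive with the Hive Warehouse Connector.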

If you don't have an Azure subscription, create an Azure free account before you begin.

Prerequisites

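- Use the Bash environment in Azure Cloud Shell. For more information, see Quickstart for Bash in Azure Cloud Shell.
- If you prefer to run CLI reference commands locally, install the Azure CLI. If you're running on Windows or macOS, consider running the Azure CLI in a Docker container. For more information, see How to run the Azure CLI in a Docker container.
  - If you're using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For other sign-in options, see Sign in with the Azure CLI.
  - When you're prompted, install the Azure CLI extension on first use. For more information about extensions, see Use extensions with the Azure CLI.
  - Run az version to find the version and dependent libraries that are installed. To upgrade to the latest version, run az upgrade.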
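
For a local installation, a quick preflight check can confirm that the Azure CLI is on the PATH before you go further. The following is a minimal sketch; `require_command` is a hypothetical helper name introduced here for illustration and is not part of the Azure CLI.

```bash
# Hypothetical helper: succeed only if the named command is on PATH.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing required command: $1" >&2
    return 1
}

# Print the installed version details only when the Azure CLI is present.
if require_command az; then
    az version
fi
```

In Azure Cloud Shell the check should always pass, because the Azure CLI is preinstalled there.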

Create an Apache Spark cluster

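Sign in to your Azure subscription. If you plan to use Azure Cloud Shell, select Try it in the upper-right corner of the following code block. Otherwise, enter the following command: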
```
az login

# If you have multiple subscriptions, set the one to use
# az account set --subscription "SUBSCRIPTIONID"
```

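Set environment variables. The variables in this quickstart use Bash syntax, so other environments need slight variations. Replace RESOURCEGROUPNAME, LOCATION, CLUSTERNAME, STORAGEACCOUNTNAME, and PASSWORD in the following snippet with the desired values. Then enter the CLI commands to set the environment variables.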

export resourceGroupName=RESOURCEGROUPNAME
export location=LOCATION
export clusterName=CLUSTERNAME
export AZURE_STORAGE_ACCOUNT=STORAGEACCOUNTNAME
export httpCredential='PASSWORD'
export sshCredentials='PASSWORD'

export AZURE_STORAGE_CONTAINER=$clusterName
export clusterSizeInNodes=1
export clusterVersion=4.0
export clusterType=spark
export componentVersion=Spark=2.3
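
Because the snippet above reuses the cluster name as the storage container name, a cluster name with uppercase letters produces an invalid container name. The sketch below checks the two storage-related names before any resources are created. The helper names are hypothetical, and the rules encoded here (storage account: 3-24 lowercase letters and digits; container: 3-63 lowercase letters, digits, and single hyphens, starting and ending with a letter or digit) are stated as assumptions that you should verify against the current Azure Storage naming documentation.

```bash
# Hypothetical helpers; the naming rules below are assumptions to verify.

# Storage account names: 3-24 characters, lowercase letters and digits only.
valid_storage_account_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# Container names: 3-63 characters, lowercase letters, digits, and single
# hyphens; the first and last characters must be a letter or digit.
valid_container_name() {
    case $1 in
        *--*) return 1 ;;   # consecutive hyphens are rejected
    esac
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$'
}

# Warn (without exiting) if the values set above look invalid.
if ! valid_storage_account_name "${AZURE_STORAGE_ACCOUNT:-}"; then
    echo "AZURE_STORAGE_ACCOUNT does not look like a valid account name" >&2
fi
if ! valid_container_name "${AZURE_STORAGE_CONTAINER:-}"; then
    echo "AZURE_STORAGE_CONTAINER does not look like a valid container name" >&2
fi
```

If the container check fails, one option is to set AZURE_STORAGE_CONTAINER to a separate lowercase value instead of reusing the cluster name.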

Enter the following command to create the resource group:

az group create \
    --location $location \
    --name $resourceGroupName

Enter the following command to create an Azure Storage account:

az storage account create \
    --name $AZURE_STORAGE_ACCOUNT \
    --resource-group $resourceGroupName \
    --https-only true \
    --kind StorageV2 \
    --location $location \
    --sku Standard_LRS
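
Storage account names must be unique across Azure, so the create command can fail if the name is already taken. As a sketch, the name can be checked first with `az storage account check-name`. The wrapper name `storage_name_available` is hypothetical, and the code assumes the command's JSON result exposes a `nameAvailable` field.

```bash
# Hypothetical helper: succeed if the storage account name is still available.
# Assumes the check-name result contains a boolean field named nameAvailable.
storage_name_available() {
    result=$(az storage account check-name \
        --name "$1" \
        --query nameAvailable \
        --output json) || return 1
    [ "$result" = "true" ]
}

# Example usage (commented out so nothing runs by accident):
# storage_name_available "$AZURE_STORAGE_ACCOUNT" || echo "Pick another name" >&2
```

JSON output is requested explicitly so that the boolean is compared in a known lowercase form.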

Enter the following command to retrieve the primary key from the Azure Storage account and store it in a variable:

export AZURE_STORAGE_KEY=$(az storage account keys list \
    --account-name $AZURE_STORAGE_ACCOUNT \
    --resource-group $resourceGroupName \
    --query "[0].value" -o tsv)
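
Note that `export VAR=$(command)` reports the exit status of `export`, not of the command inside the substitution, so a failed key lookup can silently leave the variable empty. A minimal guard, using a hypothetical helper named `require_nonempty`, might look like this:

```bash
# Hypothetical helper: fail loudly when a required value is empty.
# $1 is the variable name (for the message), $2 is its current value.
require_nonempty() {
    if [ -z "$2" ]; then
        echo "Required value is empty: $1" >&2
        return 1
    fi
}

# Warn if the key lookup above did not produce a value.
require_nonempty AZURE_STORAGE_KEY "${AZURE_STORAGE_KEY:-}" \
    || echo "Re-run the key lookup before creating the container." >&2
```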

Enter the following command to create an Azure Storage container:

az storage container create \
    --name $AZURE_STORAGE_CONTAINER \
    --account-key $AZURE_STORAGE_KEY \
    --account-name $AZURE_STORAGE_ACCOUNT

Enter the following command to create the Apache Spark cluster:

az hdinsight create \
    --name $clusterName \
    --resource-group $resourceGroupName \
    --type $clusterType \
    --component-version $componentVersion \
    --http-password "$httpCredential" \
    --http-user admin \
    --location $location \
    --workernode-count $clusterSizeInNodes \
    --ssh-password "$sshCredentials" \
    --ssh-user sshuser \
    --storage-account $AZURE_STORAGE_ACCOUNT \
    --storage-account-key $AZURE_STORAGE_KEY \
    --storage-container $AZURE_STORAGE_CONTAINER \
    --version $clusterVersion
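
Cluster creation can take a while. The sketch below shows one way to check on the cluster afterward and to derive its default endpoints from the cluster name. All three helper names are hypothetical. The `properties.clusterState` query path is an assumption about the shape of the `az hdinsight show` output, so confirm it against the actual JSON before relying on it.

```bash
# Hypothetical helper: cluster web endpoint (Ambari and other web UIs).
cluster_https_url() {
    printf 'https://%s.azurehdinsight.net\n' "$1"
}

# Hypothetical helper: SSH target. $1 is the SSH user, $2 is the cluster name.
cluster_ssh_target() {
    printf '%s@%s-ssh.azurehdinsight.net\n' "$1" "$2"
}

# Hypothetical helper: print the state reported for the cluster.
# Assumes the state is exposed at properties.clusterState.
cluster_state() {
    az hdinsight show \
        --name "$1" \
        --resource-group "$2" \
        --query properties.clusterState \
        --output tsv
}

# Example usage (commented out so nothing runs by accident):
# cluster_state "$clusterName" "$resourceGroupName"
# ssh "$(cluster_ssh_target sshuser "$clusterName")"
```

The SSH user in the example matches the `--ssh-user sshuser` value passed to the create command above.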

Clean up resources

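After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're charged for an HDInsight cluster even when it isn't in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they aren't in use.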

Enter all or some of the following commands to remove resources:

# Remove cluster
az hdinsight delete \
    --name $clusterName \
    --resource-group $resourceGroupName

# Remove storage container
az storage container delete \
    --account-name $AZURE_STORAGE_ACCOUNT \
    --name $AZURE_STORAGE_CONTAINER

# Remove storage account
az storage account delete \
    --name $AZURE_STORAGE_ACCOUNT \
    --resource-group $resourceGroupName

# Remove resource group
az group delete \
    --name $resourceGroupName
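
Deleting the resource group removes every resource inside it, and `az group delete` asks for confirmation unless `--yes` is passed. To confirm afterward that the group is gone, a small check around `az group exists` can be used. The helper name `resource_group_gone` is hypothetical, and this is a sketch, not a required step.

```bash
# Hypothetical helper: succeed once the resource group no longer exists.
# az group exists prints true or false; JSON output keeps it lowercase.
resource_group_gone() {
    exists=$(az group exists --name "$1" --output json) || return 1
    [ "$exists" = "false" ]
}

# Example usage (commented out so nothing runs by accident):
# resource_group_gone "$resourceGroupName" && echo "Cleanup complete"
```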

Next steps

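In this quickstart, you learned how to create an Apache Spark cluster in Azure HDInsight using the Azure CLI. Advance to the next tutorial to learn how to use an HDInsight cluster to run interactive queries on sample data.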

Run interactive queries on Apache Spark
