Collect Apache Spark application metrics using APIs

Article
10/18/2023

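Overview

In this tutorial, you learn how to integrate your existing on-premises Prometheus server with an Azure Synapse workspace by using the Synapse Prometheus connector, so that you get near real-time Apache Spark application metrics.

This tutorial also introduces the Azure Synapse REST metrics APIs. You can fetch Apache Spark application metrics data through the REST APIs to build your own monitoring and diagnosis toolkit or to integrate with your monitoring systems.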

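Use the Azure Synapse Prometheus connector for your on-premises Prometheus servers

The Azure Synapse Prometheus connector is an open-source project. It uses a file-based service discovery method to let you:

Authenticate to Synapse workspaces through a Microsoft Entra service principal.
Fetch the list of Apache Spark applications in a workspace.
Pull Apache Spark application metrics through Prometheus file-based configuration.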

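1. Prerequisite

You need a Prometheus server deployed on a Linux VM.

2. Create a service principal

To use the Azure Synapse Prometheus connector with your on-premises Prometheus server, follow the steps below to create a service principal.

2.1 Create a service principal: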

az ad sp create-for-rbac --name <service_principal_name> --role Contributor --scopes /subscriptions/<subscription_id>

The result should look like this:

{
  "appId": "abcdef...",
  "displayName": "<service_principal_name>",
  "name": "http://<service_principal_name>",
  "password": "abc....",
  "tenant": "<tenant_id>"
}

Note the appId, password, and tenant ID.

2.2 Add the corresponding permissions to the service principal created in the previous step.

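Sign in to your Azure Synapse Analytics workspace as a Synapse Administrator.
In Synapse Studio, on the left-side pane, select Manage > Access control.
Click the Add button on the upper left to add a role assignment.
For Scope, choose Workspace.
For Role, choose Synapse Compute Operator.
For Select user, enter your <service_principal_name> and click your service principal.
Click Apply. (Wait 3 minutes for the permission to take effect.)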

3. Download the Azure Synapse Prometheus connector

Use the following commands to install the Azure Synapse Prometheus connector.

git clone https://github.com/microsoft/azure-synapse-spark-metrics.git
cd ./azure-synapse-spark-metrics/synapse-prometheus-connector/src
python -m pip install -r requirements.txt

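4. Create a config file for Azure Synapse workspaces

In the config folder, create a config.yaml file and fill in the following fields: workspace_name, tenant_id, service_principal_name, and service_principal_password. You can add multiple workspaces to the YAML config.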

workspaces:
  - workspace_name: <your_workspace_name>
    tenant_id: <tenant_id>
    service_principal_name: <service_principal_app_id>
    service_principal_password: "<service_principal_password>"
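
As a minimal sketch of the layout above, the following Python snippet checks that each workspace entry carries the four required fields and renders them in the same shape as the sample. The helper names `missing_fields` and `render_config` are hypothetical and are not part of the connector project; the snippet only illustrates the file format.

```python
import json

# Field names taken from the config.yaml sample above.
REQUIRED_FIELDS = (
    "workspace_name",
    "tenant_id",
    "service_principal_name",
    "service_principal_password",
)


def missing_fields(workspace):
    """Return the required field names that are absent or empty (hypothetical helper)."""
    return [field for field in REQUIRED_FIELDS if not workspace.get(field)]


def render_config(workspaces):
    """Render workspace entries in the config.yaml layout shown above (hypothetical helper)."""
    lines = ["workspaces:"]
    for ws in workspaces:
        missing = missing_fields(ws)
        if missing:
            raise ValueError("workspace entry is missing: " + ", ".join(missing))
        lines.append(f"  - workspace_name: {ws['workspace_name']}")
        lines.append(f"    tenant_id: {ws['tenant_id']}")
        lines.append(f"    service_principal_name: {ws['service_principal_name']}")
        # json.dumps emits a double-quoted string with escapes, which is also
        # a valid YAML double-quoted scalar, so special characters survive.
        quoted_password = json.dumps(ws["service_principal_password"])
        lines.append(f"    service_principal_password: {quoted_password}")
    return "\n".join(lines) + "\n"
```

Calling `render_config` with one dictionary per workspace yields text that can be saved as config/config.yaml. Note that service_principal_name holds the service principal's appId, as the sample indicates.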

5. Update the Prometheus config

Add the following config section to your Prometheus scrape_config, replacing <your_workspace_name> with your workspace name and <path_to_synapse_connector> with the path to your cloned synapse-prometheus-connector folder.

- job_name: synapse-prometheus-connector
  static_configs:
  - labels:
      __metrics_path__: /metrics
      __scheme__: http
    targets:
    - localhost:8000
- job_name: synapse-workspace-<your_workspace_name>
  bearer_token_file: <path_to_synapse_connector>/output/workspace/<your_workspace_name>/bearer_token
  file_sd_configs:
  - files:
    - <path_to_synapse_connector>/output/workspace/<your_workspace_name>/application_discovery.json
    refresh_interval: 10s
  metric_relabel_configs:
  - source_labels: [ __name__ ]
    target_label: __name__
    regex: metrics_application_[0-9]+_[0-9]+_(.+)
    replacement: spark_$1
  - source_labels: [ __name__ ]
    target_label: __name__
    regex: metrics_(.+)
    replacement: spark_$1
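
The two metric_relabel_configs rules above rename scraped metrics so that they share a spark_ prefix. The sketch below imitates that renaming in Python to show the effect; it is an illustration, not how Prometheus itself is implemented. It relies on the fact that Prometheus anchors relabel regexes at both ends, which `re.fullmatch` mirrors. The second metric name in the usage note is a hypothetical example.

```python
import re

# The two rules from metric_relabel_configs above, in the same order.
# Prometheus anchors relabel regexes at both ends, hence re.fullmatch.
RULES = [
    (r"metrics_application_[0-9]+_[0-9]+_(.+)", r"spark_\1"),
    (r"metrics_(.+)", r"spark_\1"),
]


def relabel(name):
    """Apply each rule in turn; a rule rewrites the name only if it matches fully."""
    for pattern, replacement in RULES:
        match = re.fullmatch(pattern, name)
        if match:
            name = match.expand(replacement)
    return name
```

For example, `relabel("metrics_executor_rddBlocks")` gives `spark_executor_rddBlocks`, and a name such as `metrics_application_1605509647837_0001_driver_jvm_heap_used` loses its application ID segment and becomes `spark_driver_jvm_heap_used`. Names that match neither rule pass through unchanged.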

6. Start the connector on the Prometheus server VM

Start the connector server on the Prometheus server VM as follows.

python main.py

Wait a few seconds for the connector to start working. You can then see "synapse-prometheus-connector" on the Prometheus service discovery page.

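Use Azure Synapse Prometheus or REST metrics APIs to collect metrics data

1. Authentication

You can use the client credentials flow to get an access token. To access the metrics API, get a Microsoft Entra access token for the service principal, which must have the proper permission to access the APIs.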

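Parameter	Required	Description
tenant_id	True	The tenant ID of your Azure service principal (application)
grant_type	True	Specifies the requested grant type. In a client credentials grant flow, the value must be client_credentials.
client_id	True	The application (service principal) ID of the application you registered in the Azure portal or Azure CLI.
client_secret	True	The secret generated for the application (service principal)
resource	True	The Synapse resource URI; should be 'https://dev.azuresynapse.net'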

curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials&client_id=<service_principal_app_id>&resource=<azure_synapse_resource_id>&client_secret=<service_principal_secret>' \
  https://login.microsoftonline.com/<tenant_id>/oauth2/token

The response looks like this:

{
  "token_type": "Bearer",
  "expires_in": "599",
  "ext_expires_in": "599",
  "expires_on": "1575500666",
  "not_before": "1575499766",
  "resource": "2ff8...f879c1d",
  "access_token": "ABC0eXAiOiJKV1Q......un_f1mSgCHlA"
}
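
The same token request can be sketched in Python using only the standard library. The function names `build_token_request` and `fetch_access_token` are hypothetical; the URL, form fields, and resource URI come from the parameter table above. The form is sent as a POST request, and the sketch assumes the response carries an access_token field as in the sample response.

```python
import json
import urllib.parse
import urllib.request

# Resource URI from the parameter table above.
SYNAPSE_RESOURCE = "https://dev.azuresynapse.net"


def build_token_request(tenant_id, client_id, client_secret):
    """Build the client-credentials token request (hypothetical helper)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": SYNAPSE_RESOURCE,
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(tenant_id, client_id, client_secret):
    """Send the request and return the access_token field of the JSON response."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["access_token"]
```

The returned token is then passed as a bearer token in the Authorization header of the metrics API calls. Because the sample response reports an expires_in of 599 seconds, a long-running tool would need to request a fresh token periodically.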

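2. List running applications in the Azure Synapse workspace

To get the list of Apache Spark applications for a Synapse workspace, follow the document Monitoring - Get Apache Spark Job List.

3. Collect Apache Spark application metrics with the Prometheus or REST APIs

Collect Apache Spark application metrics with the Prometheus API

Get the latest metrics of the specified Apache Spark application through the Prometheus API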

GET https://{endpoint}/livyApi/versions/{livyApiVersion}/sparkpools/{sparkPoolName}/sessions/{sessionId}/applications/{sparkApplicationId}/metrics/executors/prometheus?format=html

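Parameter	Required	Description
endpoint	True	The workspace development endpoint, for example `https://myworkspace.dev.azuresynapse.net`.
livyApiVersion	True	A valid API version for the request. Currently, it is 2019-11-01-preview.
sparkPoolName	True	Name of the Spark pool.
sessionId	True	Identifier for the session.
sparkApplicationId	True	Spark application ID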

Sample request:

GET https://myworkspace.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkpools/mysparkpool/sessions/1/applications/application_1605509647837_0001/metrics/executors/prometheus?format=html
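
A request like the sample above can be assembled from the documented URL template. The following is a minimal sketch; the function names are hypothetical, and it assumes the access token is passed as a bearer token in the Authorization header.

```python
import urllib.request
from urllib.parse import quote


def prometheus_metrics_url(endpoint, spark_pool_name, session_id,
                           spark_application_id,
                           livy_api_version="2019-11-01-preview"):
    """Fill in the URL template for the Prometheus metrics API (hypothetical helper)."""
    return (
        f"{endpoint.rstrip('/')}/livyApi/versions/{quote(livy_api_version, safe='')}"
        f"/sparkpools/{quote(spark_pool_name, safe='')}"
        f"/sessions/{quote(str(session_id), safe='')}"
        f"/applications/{quote(spark_application_id, safe='')}"
        "/metrics/executors/prometheus?format=html"
    )


def build_metrics_request(url, access_token):
    """Attach the token as a bearer token (assumed authentication scheme)."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

Passing the request to `urllib.request.urlopen` and decoding the body as text returns the metrics in the Prometheus text format.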

Sample response:

Status code: 200. The response looks like this:

metrics_executor_rddBlocks{application_id="application_1605509647837_0001", application_name="mynotebook_mysparkpool_1605509570802", executor_id="driver"} 0
metrics_executor_memoryUsed_bytes{application_id="application_1605509647837_0001", application_name="mynotebook_mysparkpool_1605509570802", executor_id="driver"} 74992
metrics_executor_diskUsed_bytes{application_id="application_1605509647837_0001", application_name="mynotebook_mysparkpool_1605509570802", executor_id="driver"} 0
metrics_executor_totalCores{application_id="application_1605509647837_0001", application_name="mynotebook_mysparkpool_1605509570802", executor_id="driver"} 0
metrics_executor_maxTasks{application_id="application_1605509647837_0001", application_name="mynotebook_mysparkpool_1605509570802", executor_id="driver"} 0
metrics_executor_activeTasks{application_id="application_1605509647837_0001", application_name="mynotebook_mysparkpool_1605509570802", executor_id="driver"} 1
metrics_executor_failedTasks_total{application_id="application_1605509647837_0001", application_name="mynotebook_mysparkpool_1605509570802", executor_id="driver"} 0
metrics_executor_completedTasks_total{application_id="application_1605509647837_0001", application_name="mynotebook_mysparkpool_1605509570802", executor_id="driver"} 2
...
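
For a custom tool, each line of a response like the one above can be split into a metric name, a label dictionary, and a numeric value. The sketch below handles only the simple form shown in the sample (no timestamps, and escaped characters in label values are left as-is); it is not a full parser for the Prometheus text format. The function names are hypothetical.

```python
import re

# name{label="value", ...} value   (the label block is optional)
SAMPLE_LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for one sample line, or None if it does not match."""
    match = SAMPLE_LINE_RE.match(line.strip())
    if not match:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def parse_metrics(text):
    """Parse every sample line, skipping blanks, comments, and unparsable lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sample = parse_sample(line)
        if sample is not None:
            samples.append(sample)
    return samples
```

Applied to the sample response, this yields one tuple per metric line, for instance the name `metrics_executor_memoryUsed_bytes` with `executor_id` set to `driver` and a value of 74992.0.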

Collect Apache Spark application metrics with the REST API

GET https://{endpoint}/livyApi/versions/{livyApiVersion}/sparkpools/{sparkPoolName}/sessions/{sessionId}/applications/{sparkApplicationId}/executors

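Parameter	Required	Description
endpoint	True	The workspace development endpoint, for example `https://myworkspace.dev.azuresynapse.net`.
livyApiVersion	True	A valid API version for the request. Currently, it is 2019-11-01-preview.
sparkPoolName	True	Name of the Spark pool.
sessionId	True	Identifier for the session.
sparkApplicationId	True	Spark application ID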

Sample request:

GET https://myworkspace.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkpools/mysparkpool/sessions/1/applications/application_1605509647837_0001/executors

Sample response (status code: 200):

[
    {
        "id": "driver",
        "hostPort": "f98b8fc2aea84e9095bf2616208eb672007bde57624:45889",
        "isActive": true,
        "rddBlocks": 0,
        "memoryUsed": 75014,
        "diskUsed": 0,
        "totalCores": 0,
        "maxTasks": 0,
        "activeTasks": 0,
        "failedTasks": 0,
        "completedTasks": 0,
        "totalTasks": 0,
        "totalDuration": 0,
        "totalGCTime": 0,
        "totalInputBytes": 0,
        "totalShuffleRead": 0,
        "totalShuffleWrite": 0,
        "isBlacklisted": false,
        "maxMemory": 15845975654,
        "addTime": "2020-11-16T06:55:06.718GMT",
        "executorLogs": {
            "stdout": "http://f98b8fc2aea84e9095bf2616208eb672007bde57624:8042/node/containerlogs/container_1605509647837_0001_01_000001/trusted-service-user/stdout?start=-4096",
            "stderr": "http://f98b8fc2aea84e9095bf2616208eb672007bde57624:8042/node/containerlogs/container_1605509647837_0001_01_000001/trusted-service-user/stderr?start=-4096"
        },
        "memoryMetrics": {
            "usedOnHeapStorageMemory": 75014,
            "usedOffHeapStorageMemory": 0,
            "totalOnHeapStorageMemory": 15845975654,
            "totalOffHeapStorageMemory": 0
        },
        "blacklistedInStages": []
    },
    // ...
]
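
Once the executor list is decoded from JSON, a monitoring tool can reduce it to the few figures it cares about. The sketch below is one possible summary, using field names from the sample response; the function name and the choice of fields are assumptions for illustration. Note that the `// ...` line in the sample marks omitted entries and is not valid JSON.

```python
def summarize_executors(executors):
    """Reduce the executor list to a few health indicators (hypothetical helper)."""
    summary = []
    for executor in executors:
        max_memory = executor.get("maxMemory", 0)
        memory_used = executor.get("memoryUsed", 0)
        summary.append({
            "id": executor["id"],
            "isActive": executor.get("isActive", False),
            # Fraction of storage memory in use; 0.0 when maxMemory is missing or zero.
            "memoryUsedFraction": memory_used / max_memory if max_memory else 0.0,
            "failedTasks": executor.get("failedTasks", 0),
        })
    return summary
```

A typical use is to pass in the result of `json.loads` on the response body and alert when failedTasks rises or memoryUsedFraction approaches 1.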

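4. Build your own diagnosis and monitoring tools

The Prometheus and REST APIs provide rich metrics data about how Apache Spark applications are running. You can collect application-related metrics data through these APIs and build your own diagnosis and monitoring tools that better suit your needs.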