How to create and update a Spark Job Definition with the Microsoft Fabric REST API

Article
11/15/2023

The Microsoft Fabric REST API provides a service endpoint for CRUD operations on Fabric items. This tutorial walks through an end-to-end scenario for creating and updating a Spark Job Definition artifact. There are three high-level steps:

Create a Spark Job Definition item with an initial state.
Upload the main definition file and other lib files.
Update the Spark Job Definition item with the OneLake URL of the main definition file and other lib files.

Prerequisites

A Microsoft Entra token is required to access the Fabric REST API. The MSAL library is recommended for acquiring the token. For more information, see Authentication flow support in MSAL.
A storage token is required to access the OneLake API. For more information, see MSAL for Python.

Create a Spark Job Definition item with the initial state

The Microsoft Fabric REST API defines a unified endpoint for CRUD operations on Fabric items. The endpoint is https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items.
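
Because the same base path is reused for both the create and the update calls, it can help to build it in one place. The following is a minimal sketch; `fabric_items_url` and `FABRIC_API_BASE` are hypothetical names introduced here for illustration and are not part of any Fabric SDK.

```python
# Base URL taken from the endpoint quoted above.
FABRIC_API_BASE = "https://api.fabric.microsoft.com/v1"


def fabric_items_url(workspace_id, item_id=None, operation=None):
    """Build an items URL for a workspace (hypothetical helper, for illustration only)."""
    parts = [FABRIC_API_BASE, "workspaces", workspace_id, "items"]
    if item_id is not None:
        parts.append(item_id)
        if operation is not None:
            parts.append(operation)
    # Trim stray slashes on each segment so the joined path never contains "//".
    return "/".join(part.strip("/") for part in parts)


# Collection URL used when creating an item.
print(fabric_items_url("my-workspace-id"))
# Item-level URL with an operation suffix, as used by the update step later on.
print(fabric_items_url("my-workspace-id", "my-item-id", "updateDefinition"))
```

The IDs shown are placeholders; substitute the real workspace and item IDs.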

The item details are specified in the request body. Here is an example of the request body for creating a Spark Job Definition item:

{
    "displayName": "SJDHelloWorld",
    "type": "SparkJobDefinition",
    "definition": {
        "format": "SparkJobDefinitionV1",
        "parts": [
            {
                "path": "SparkJobDefinitionV1.json",
                "payload":"eyJleGVjdXRhYmxlRmlsZSI6bnVsbCwiZGVmYXVsdExha2Vob3VzZUFydGlmYWN0SWQiOiIiLCJtYWluQ2xhc3MiOiIiLCJhZGRpdGlvbmFsTGFrZWhvdXNlSWRzIjpbXSwicmV0cnlQb2xpY3kiOm51bGwsImNvbW1hbmRMaW5lQXJndW1lbnRzIjoiIiwiYWRkaXRpb25hbExpYnJhcnlVcmlzIjpbXSwibGFuZ3VhZ2UiOiIiLCJlbnZpcm9ubWVudEFydGlmYWN0SWQiOm51bGx9",
                "payloadType": "InlineBase64"
            }
        ]
    }
}

In this example, the Spark Job Definition item is named SJDHelloWorld. The payload field holds the Base64-encoded content of the detailed configuration. After decoding, the content is:

{
    "executableFile":null,
    "defaultLakehouseArtifactId":"",
    "mainClass":"",
    "additionalLakehouseIds":[],
    "retryPolicy":null,
    "commandLineArguments":"",
    "additionalLibraryUris":[],
    "language":"",
    "environmentArtifactId":null
}

Here are two helper functions for encoding and decoding the detailed configuration:

import base64
import json

def json_to_base64(json_data):
    # Serialize the JSON data to a string
    json_string = json.dumps(json_data)
    
    # Encode the JSON string as bytes
    json_bytes = json_string.encode('utf-8')
    
    # Encode the bytes as Base64
    base64_encoded = base64.b64encode(json_bytes).decode('utf-8')
    
    return base64_encoded

def base64_to_json(base64_data):
    # Decode the Base64-encoded string to bytes
    base64_bytes = base64_data.encode('utf-8')
    
    # Decode the bytes to a JSON string
    json_string = base64.b64decode(base64_bytes).decode('utf-8')
    
    # Deserialize the JSON string to a Python dictionary
    json_data = json.loads(json_string)
    
    return json_data
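
As a quick check of the encoding described above, the sketch below decodes the sample payload from the request body and encodes it again using only the standard library. One detail worth noting: `json.dumps` inserts spaces after separators by default, so the resulting Base64 text differs from the sample unless compact separators are used. Both forms decode to the same configuration.

```python
import base64
import json

# Sample payload copied from the request body shown earlier.
payload = "eyJleGVjdXRhYmxlRmlsZSI6bnVsbCwiZGVmYXVsdExha2Vob3VzZUFydGlmYWN0SWQiOiIiLCJtYWluQ2xhc3MiOiIiLCJhZGRpdGlvbmFsTGFrZWhvdXNlSWRzIjpbXSwicmV0cnlQb2xpY3kiOm51bGwsImNvbW1hbmRMaW5lQXJndW1lbnRzIjoiIiwiYWRkaXRpb25hbExpYnJhcnlVcmlzIjpbXSwibGFuZ3VhZ2UiOiIiLCJlbnZpcm9ubWVudEFydGlmYWN0SWQiOm51bGx9"

# Decode: Base64 text -> bytes -> UTF-8 JSON string -> Python dict.
config = json.loads(base64.b64decode(payload).decode("utf-8"))
print(sorted(config))

# Re-encode with compact separators (no spaces after "," and ":").
compact_json = json.dumps(config, separators=(",", ":"))
reencoded = base64.b64encode(compact_json.encode("utf-8")).decode("utf-8")
print("compact form matches sample:", reencoded == payload)

# Re-encode with the json.dumps defaults, which add spaces after separators.
default_encoded = base64.b64encode(json.dumps(config).encode("utf-8")).decode("utf-8")
print("default form matches sample:", default_encoded == payload)
```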

Here is the code snippet for creating a Spark Job Definition item:

import requests

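workspaceId = "workspaceId"  # replace this placeholder with the ID of your workspace
bearerToken = "breadcrumb"  # replace this placeholder with a real Microsoft Entra token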

headers = {
    "Authorization": f"Bearer {bearerToken}", 
    "Content-Type": "application/json"  # Set the content type based on your request
}

payload = "eyJleGVjdXRhYmxlRmlsZSI6bnVsbCwiZGVmYXVsdExha2Vob3VzZUFydGlmYWN0SWQiOiIiLCJtYWluQ2xhc3MiOiIiLCJhZGRpdGlvbmFsTGFrZWhvdXNlSWRzIjpbXSwicmV0cnlQb2xpY3kiOm51bGwsImNvbW1hbmRMaW5lQXJndW1lbnRzIjoiIiwiYWRkaXRpb25hbExpYnJhcnlVcmlzIjpbXSwibGFuZ3VhZ2UiOiIiLCJlbnZpcm9ubWVudEFydGlmYWN0SWQiOm51bGx9"

# Define the payload data for the POST request
payload_data = {
    "displayName": "SJDHelloWorld",
    "type": "SparkJobDefinition",
    "definition": {
        "format": "SparkJobDefinitionV1",
        "parts": [
            {
                "path": "SparkJobDefinitionV1.json",
                "payload": payload,
                "payloadType": "InlineBase64"
            }
        ]
    }
}

# Make the POST request with Bearer authentication
sjdCreateUrl = f"https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items"
response = requests.post(sjdCreateUrl, json=payload_data, headers=headers)
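
The upload step needs the ID of the item that was just created, which is returned in the response body. The following is a minimal sketch of reading it. It assumes that the ID is returned in a top-level `id` field and that a successful call returns HTTP 200 or 201; both are assumptions to verify against the API reference. `extract_item_id` is a hypothetical helper name, and a 202 (long-running operation) response is not handled here.

```python
def extract_item_id(status_code, body):
    """Return the item ID from a create-item response (assumed response shape)."""
    # Assumption: success is reported as 200 or 201.
    if status_code not in (200, 201):
        raise RuntimeError(f"Item creation failed with HTTP {status_code}: {body}")
    # Assumption: the new item's ID is in a top-level "id" field.
    item_id = body.get("id")
    if not item_id:
        raise KeyError("The response body has no 'id' field")
    return item_id


# With the response from the snippet above, the call would look like:
#   sjdartifactid = extract_item_id(response.status_code, response.json())
# Demonstration with a made-up response body:
sample_body = {
    "id": "00000000-0000-0000-0000-000000000000",
    "type": "SparkJobDefinition",
    "displayName": "SJDHelloWorld",
}
print(extract_item_id(201, sample_body))
```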

Upload the main definition file and other lib files

A storage token is required to upload the file to OneLake. Here is a helper function for getting the storage token:


import msal

def getOnelakeStorageToken():
    app = msal.PublicClientApplication(
        "{client id}", # this field should be the client id
        authority="https://login.microsoftonline.com/microsoft.com") # replace microsoft.com with your own tenant

    result = app.acquire_token_interactive(scopes=["https://storage.azure.com/.default"])

    # The token is a credential, so report success without printing it
    print("Successfully acquired a Microsoft Entra token with the storage audience.")

    return result['access_token']
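
The helper above indexes `result['access_token']` directly, which raises a bare `KeyError` if sign-in fails. MSAL for Python returns a dictionary that carries `error` and `error_description` keys on failure, so a small guard gives a clearer message. This is a sketch only; `extract_access_token` is a hypothetical helper name.

```python
def extract_access_token(result):
    """Return the access token from an MSAL result dict, or raise with the reported error."""
    if result and "access_token" in result:
        return result["access_token"]
    # On failure, MSAL reports details under "error" and "error_description".
    details = result or {}
    error = details.get("error", "unknown_error")
    description = details.get("error_description", "no description returned")
    raise RuntimeError(f"Token acquisition failed: {error}: {description}")


# In the helper above, the final line could become:
#   return extract_access_token(result)
```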

Now that the Spark Job Definition item has been created, we need to set the main definition file and the required properties so that it can run. The endpoint for uploading the file for this SJD item is https://onelake.dfs.fabric.microsoft.com/{workspaceId}/{sjdartifactid}. Use the same workspaceId as in the previous step. The value of sjdartifactid can be found in the response body of the previous step. Here is the code snippet for setting up the main definition file:

import requests

# three steps are required: create file, append file, flush file

onelakeEndPoint = "https://onelake.dfs.fabric.microsoft.com/workspaceId/sjdartifactid"  # replace workspaceId and sjdartifactid with the real workspace and artifact IDs
onelakeStorageToken = getOnelakeStorageToken()  # storage token from the helper function above
mainExecutableFile = "main.py"  # the name of the main executable file
mainSubFolder = "Main"  # the sub folder name of the main executable file. Don't change this value


onelakeRequestMainFileCreateUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?resource=file" # the url for creating the main executable file via the 'file' resource type
onelakePutRequestHeaders = {
    "Authorization": f"Bearer {onelakeStorageToken}", # the storage token can be obtained from the helper function above
}

onelakeCreateMainFileResponse = requests.put(onelakeRequestMainFileCreateUrl, headers=onelakePutRequestHeaders)
if onelakeCreateMainFileResponse.status_code == 201:
    # Request was successful
    print(f"Main File '{mainExecutableFile}' was successfully created in onelake.")

# with previous step, the main executable file is created in OneLake, now we need to append the content of the main executable file

appendPosition = 0
appendAction = "append"

### Main File Append.
mainFileContents = "filename = 'Files/' + Constant.filename; tablename = 'Tables/' + Constant.tablename"  # the content of the main executable file, please replace this with the real content of the main executable file
mainExecutableFileSizeInBytes = len(mainFileContents.encode("utf-8"))  # the size of the content in bytes (83 for this sample); it is used as the flush position
onelakeRequestMainFileAppendUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={appendPosition}&action={appendAction}"

onelakePatchRequestHeaders = {
    "Authorization": f"Bearer {onelakeStorageToken}",
    "Content-Type" : "text/plain"
}

onelakeAppendMainFileResponse = requests.patch(onelakeRequestMainFileAppendUrl, data=mainFileContents.encode("utf-8"), headers=onelakePatchRequestHeaders)  # send UTF-8 bytes so the byte count matches the flush position
if onelakeAppendMainFileResponse.status_code == 202:
    # Request was successful
    print(f"Successfully Accepted Main File '{mainExecutableFile}' append data.")

# with previous step, the content of the main executable file is appended to the file in OneLake, now we need to flush the file

flushAction = "flush"

### Main File flush
onelakeRequestMainFileFlushUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={mainExecutableFileSizeInBytes}&action={flushAction}"
print(onelakeRequestMainFileFlushUrl)
onelakeFlushMainFileResponse = requests.patch(onelakeRequestMainFileFlushUrl, headers=onelakePatchRequestHeaders)
if onelakeFlushMainFileResponse.status_code == 200:
    print(f"Successfully Flushed Main File '{mainExecutableFile}' contents.")
else:
    print(onelakeFlushMainFileResponse.json())

Follow the same process to upload the other lib files, if needed.
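
Since every file goes through the same create, append, and flush sequence, the three calls can be wrapped in one function. The following is a minimal sketch. `upload_to_onelake` is a hypothetical helper, and `http` stands for any object with requests-style `put` and `patch` methods (for example the `requests` module itself or a `requests.Session`). The expected status codes (201, 202, 200) are the ones checked in the snippet above, and the `Libs` sub folder name is taken from the library path used in the update step.

```python
def upload_to_onelake(http, endpoint, sub_folder, file_name, content, token):
    """Create, append to, and flush one file under the item's OneLake folder.

    Returns the number of bytes written. Hypothetical helper for illustration.
    """
    data = content.encode("utf-8") if isinstance(content, str) else content
    file_url = f"{endpoint.rstrip('/')}/{sub_folder}/{file_name}"
    put_headers = {"Authorization": f"Bearer {token}"}
    patch_headers = {"Authorization": f"Bearer {token}", "Content-Type": "text/plain"}

    # Step 1: create an empty file.
    created = http.put(f"{file_url}?resource=file", headers=put_headers)
    if created.status_code != 201:
        raise RuntimeError(f"Create failed for {file_name}: HTTP {created.status_code}")

    # Step 2: append the content starting at position 0.
    appended = http.patch(f"{file_url}?position=0&action=append", data=data, headers=patch_headers)
    if appended.status_code != 202:
        raise RuntimeError(f"Append failed for {file_name}: HTTP {appended.status_code}")

    # Step 3: flush at the position equal to the total number of bytes appended.
    flushed = http.patch(f"{file_url}?position={len(data)}&action=flush", headers=patch_headers)
    if flushed.status_code != 200:
        raise RuntimeError(f"Flush failed for {file_name}: HTTP {flushed.status_code}")

    return len(data)


# Example call for a lib file (requires the requests package and a real storage token):
#   import requests
#   upload_to_onelake(requests, onelakeEndPoint, "Libs", "lib.py", "x = 1\n", onelakeStorageToken)
```

Passing the HTTP client in as a parameter keeps the function free of network dependencies, so it can be exercised with a stub before it is pointed at OneLake.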

Update the Spark Job Definition item with the OneLake URL of the main definition file and other lib files

So far, we have created a Spark Job Definition item with an initial state and uploaded the main definition file and other lib files. The last step is to update the Spark Job Definition item to set the URL properties of the main definition file and other lib files. The endpoint for updating the Spark Job Definition item is https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{sjdartifactid}; the snippet posts to the updateDefinition operation under this path. Use the same workspaceId and sjdartifactid as in the previous steps. Here is the code snippet for updating the Spark Job Definition item:


libsFile = "lib.py"  # placeholder: replace with the name of the lib file uploaded in the previous step
mainAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Main/{mainExecutableFile}" # the workspaceId and sjdartifactid are the same as previous steps, the mainExecutableFile is the name of the main executable file
libsAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Libs/{libsFile}"  # the workspaceId and sjdartifactid are the same as previous steps, the libsFile is the name of the libs file
defaultLakehouseId = 'defaultLakehouseid'  # replace this with the real default lakehouse id

updateRequestBodyJson = {
    "executableFile":mainAbfssPath,
    "defaultLakehouseArtifactId":defaultLakehouseId,
    "mainClass":"",
    "additionalLakehouseIds":[],
    "retryPolicy":None,
    "commandLineArguments":"",
    "additionalLibraryUris":[libsAbfssPath],
    "language":"Python",
    "environmentArtifactId":None}

# Encode the bytes as a Base64-encoded string
base64EncodedUpdateSJDPayload = json_to_base64(updateRequestBodyJson)

# Print the Base64-encoded string
print("Base64-encoded JSON payload for SJD Update:")
print(base64EncodedUpdateSJDPayload)

# Define the API URL
updateSjdUrl = f"https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{sjdartifactid}/updateDefinition"

updatePayload = base64EncodedUpdateSJDPayload
payloadType = "InlineBase64"
path = "SparkJobDefinitionV1.json"
definitionFormat = "SparkJobDefinitionV1"
itemType = "SparkJobDefinition"

# Define the headers with Bearer authentication
bearerToken = "breadcrumb"  # replace this placeholder with a real Microsoft Entra token

headers = {
    "Authorization": f"Bearer {bearerToken}",
    "Content-Type": "application/json"  # Set the content type based on your request
}

# Define the payload data for the POST request
payload_data = {
    "displayName": "SJDHelloWorld",  # the display name used when the item was created
    "type": itemType,
    "definition": {
        "format": definitionFormat,
        "parts": [
            {
                "path": path,
                "payload": updatePayload,
                "payloadType": payloadType
            }
        ]
    }
}

# Make the POST request with Bearer authentication
response = requests.post(updateSjdUrl, json=payload_data, headers=headers)
if response.status_code == 200:
    print("Successfully updated SJD.")
else:
    print(response.json())
    print(response.status_code)
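
When more than one lib file is involved, it can be convenient to assemble the update request body in a single function. The sketch below mirrors the body built above; `build_update_body` is a hypothetical helper, and the `Main` and `Libs` folder names and the field names are copied from the snippets in this article rather than from a formal schema.

```python
import base64
import json

# Host name taken from the abfss paths used above.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def build_update_body(workspace_id, item_id, display_name, main_file, lib_files,
                      lakehouse_id, language="Python"):
    """Assemble the update request body for a Spark Job Definition (illustrative sketch)."""
    root = f"abfss://{workspace_id}@{ONELAKE_HOST}/{item_id}"
    config = {
        "executableFile": f"{root}/Main/{main_file}",
        "defaultLakehouseArtifactId": lakehouse_id,
        "mainClass": "",
        "additionalLakehouseIds": [],
        "retryPolicy": None,
        "commandLineArguments": "",
        # One abfss URI per uploaded lib file.
        "additionalLibraryUris": [f"{root}/Libs/{name}" for name in lib_files],
        "language": language,
        "environmentArtifactId": None,
    }
    # Serialize the configuration and Base64-encode it for the inline payload.
    payload = base64.b64encode(json.dumps(config).encode("utf-8")).decode("utf-8")
    return {
        "displayName": display_name,
        "type": "SparkJobDefinition",
        "definition": {
            "format": "SparkJobDefinitionV1",
            "parts": [
                {
                    "path": "SparkJobDefinitionV1.json",
                    "payload": payload,
                    "payloadType": "InlineBase64",
                }
            ],
        },
    }


# Example with placeholder IDs and two lib files.
body = build_update_body("ws-id", "item-id", "SJDHelloWorld", "main.py",
                         ["lib1.py", "lib2.py"], "lakehouse-id")
print(json.dumps(body, indent=2))
```

The returned dictionary can be passed as the `json` argument of the POST request shown above.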

To summarize the whole process, both the Fabric REST API and the OneLake API are needed to create and update a Spark Job Definition item. The Fabric REST API is used to create and update the Spark Job Definition item, and the OneLake API is used to upload the main definition file and other lib files. The main definition file and other lib files are uploaded to OneLake first. Then the URL properties of the main definition file and other lib files are set in the Spark Job Definition item.

Related content

Schedule and run an Apache Spark job definition
