Azure Synapse Spark client library for .NET - version 1.0.0-preview.8
This directory contains the open source subset of the .NET SDK. For documentation of the complete Azure SDK, please see the Microsoft Azure .NET Developer Center.
Use the client library for Synapse to:
- Submit Spark batch jobs and Spark session jobs
Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.
Getting started
The complete Microsoft Azure SDK can be downloaded from the Microsoft Azure Downloads Page and ships with support for building deployment packages, integrating with tooling, rich command line tooling, and more.
For the best development experience, developers should use the official Microsoft NuGet packages for libraries. NuGet packages are regularly updated with new functionality and hotfixes.
Install the package
Install the Spark client library for Azure Synapse Analytics for .NET with NuGet:
dotnet add package Azure.Analytics.Synapse.Spark --version 1.0.0-preview.8
Prerequisites
- Azure Subscription: To use Azure services, including Azure Synapse, you'll need a subscription. If you do not have an existing Azure account, you may sign up for a free trial or use your Visual Studio Subscription benefits when you create an account.
- An existing Azure Synapse workspace. If you need to create an Azure Synapse workspace, you can use the Azure Portal or Azure CLI.
If you use the Azure CLI, the command looks like this:
az synapse workspace create \
--name <your-workspace-name> \
--resource-group <your-resource-group-name> \
--storage-account <your-storage-account-name> \
--file-system <your-storage-file-system-name> \
--sql-admin-login-user <your-sql-admin-user-name> \
--sql-admin-login-password <your-sql-admin-user-password> \
--location <your-workspace-location>
Authenticate the client
In order to interact with the Azure Synapse Analytics service, you'll need to create an instance of the SparkBatchClient or SparkSessionClient class. You need a workspace endpoint, which you may see as "Development endpoint" in the portal, and client secret credentials (client id, client secret, tenant id) to instantiate a client object.
This getting started section uses client secret credential authentication, but Azure Identity offers other ways to authenticate. To use the DefaultAzureCredential provider, or other credential providers provided with the Azure SDK, install the Azure.Identity package:
Install-Package Azure.Identity
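As a minimal sketch of the step described above, the snippet below creates a SparkBatchClient with DefaultAzureCredential. The endpoint and Spark pool name are placeholders, and the constructor shape (endpoint, Spark pool name, credential) is an assumption about this preview version, so check it against the package you installed.

```csharp
using System;
using Azure.Analytics.Synapse.Spark;
using Azure.Identity;

// Placeholder values (assumptions): replace with your workspace's
// "Development endpoint" from the portal and your Spark pool name.
string endpoint = "https://<your-workspace-name>.dev.azuresynapse.net";
string sparkPoolName = "<your-spark-pool-name>";

// DefaultAzureCredential tries several credential sources in turn.
// For client secret authentication it reads the AZURE_CLIENT_ID,
// AZURE_CLIENT_SECRET, and AZURE_TENANT_ID environment variables.
var credential = new DefaultAzureCredential();

// Assumed constructor shape: (Uri endpoint, string sparkPoolName, TokenCredential credential).
SparkBatchClient client = new SparkBatchClient(new Uri(endpoint), sparkPoolName, credential);
```

If you prefer to pass the client id, client secret, and tenant id explicitly, Azure.Identity also provides ClientSecretCredential(tenantId, clientId, clientSecret), which can be used in place of DefaultAzureCredential here.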
Examples
The Azure.Analytics.Synapse.Spark package supports CRUD operations on Spark batch jobs.
Spark Batch Job examples
List Spark batch jobs
List the Spark batch jobs under a specific Spark pool of a specific Synapse workspace.
Response<SparkBatchJobCollection> jobs = client.GetSparkBatchJobs();
foreach (SparkBatchJob job in jobs.Value.Sessions)
{
    Console.WriteLine(job.Name);
}
Create a Spark batch job
Create a Spark batch job under a specific workspace and Spark pool.
// Give the job a unique name and point it at the packaged application in storage.
string name = $"batch-{Guid.NewGuid()}";
string file = string.Format("abfss://{0}@{1}.dfs.core.windows.net/samples/net/wordcount/wordcount.zip", fileSystem, storageAccount);
SparkBatchJobOptions request = new SparkBatchJobOptions(name, file)
{
    ClassName = "WordCount",
    // Arguments passed to the application: the input file, then the output folder.
    Arguments =
    {
        string.Format("abfss://{0}@{1}.dfs.core.windows.net/samples/net/wordcount/shakespeare.txt", fileSystem, storageAccount),
        string.Format("abfss://{0}@{1}.dfs.core.windows.net/samples/net/wordcount/result/", fileSystem, storageAccount),
    },
    DriverMemory = "28g",
    DriverCores = 4,
    ExecutorMemory = "28g",
    ExecutorCores = 4,
    ExecutorCount = 2
};

// Submit the job, then poll every two seconds until the operation completes.
SparkBatchOperation createOperation = client.StartCreateSparkBatchJob(request);
while (!createOperation.HasCompleted)
{
    System.Threading.Thread.Sleep(2000);
    createOperation.UpdateStatus();
}
SparkBatchJob jobCreated = createOperation.Value;
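The polling loop above blocks the calling thread. In asynchronous code, a possible alternative is to await the operation instead. The sketch below assumes that SparkBatchOperation derives from Azure.Core's Operation<SparkBatchJob>, so that WaitForCompletionAsync is available, and that the model types live in the Azure.Analytics.Synapse.Spark.Models namespace. The class and method names (SparkBatchSamples, SubmitAndWaitAsync) are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure;
using Azure.Analytics.Synapse.Spark;
using Azure.Analytics.Synapse.Spark.Models; // assumed namespace for the model types

public static class SparkBatchSamples
{
    // Hypothetical helper: submits the job and awaits its completion
    // without blocking a thread between polls.
    public static async Task<SparkBatchJob> SubmitAndWaitAsync(
        SparkBatchClient client,
        SparkBatchJobOptions request,
        CancellationToken cancellationToken = default)
    {
        SparkBatchOperation createOperation = client.StartCreateSparkBatchJob(request);

        // Poll every two seconds, matching the interval used in the loop above.
        Response<SparkBatchJob> response = await createOperation.WaitForCompletionAsync(
            TimeSpan.FromSeconds(2), cancellationToken);

        return response.Value;
    }
}
```

Passing a CancellationToken lets the caller stop waiting. It does not cancel the Spark job itself.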
Cancel a Spark batch job
Cancel a Spark batch job by its batch id under a specific workspace and Spark pool.
Response operation = client.CancelSparkBatchJob(jobCreated.Id);
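A cancel request can fail, for example when the batch id no longer exists. Azure SDK clients report service errors as RequestFailedException, so a hedged sketch of handling that case might look like the following. The helper name TryCancel is hypothetical, and the int type of the batch id is an assumption.

```csharp
using System;
using Azure;
using Azure.Analytics.Synapse.Spark;

public static class SparkBatchCancelSample
{
    // Hypothetical helper: returns true if the service accepted the cancel
    // request, false if it responded with an error.
    public static bool TryCancel(SparkBatchClient client, int batchId)
    {
        try
        {
            Response response = client.CancelSparkBatchJob(batchId);
            Console.WriteLine($"Cancel request returned HTTP {response.Status}.");
            return true;
        }
        catch (RequestFailedException ex)
        {
            // Status holds the HTTP status code returned by the service.
            Console.WriteLine($"Cancel failed: HTTP {ex.Status}: {ex.Message}");
            return false;
        }
    }
}
```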
To build
For information on building the Azure Synapse client library, please see Building the Microsoft Azure SDK for .NET.
Target frameworks
For information about the target frameworks of the Azure Synapse client library, please refer to the Target Frameworks of the Microsoft Azure SDK for .NET.
Key concepts
Submit Spark job.
Thread safety
We guarantee that all client instance methods are thread-safe and independent of each other (guideline). This ensures that the recommendation of reusing client instances is always safe, even across threads.
Additional concepts
Client options | Accessing the response | Long-running operations | Handling failures | Diagnostics | Mocking | Client lifetime
Troubleshooting
Please open an issue on GitHub.
Next steps
The next step is adding more examples.
Contributing
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.