SQL Server Big Data Clusters runtime for Apache Spark Guide

Applies to: SQL Server 2019 (15.x)

Important

The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.

Introducing the SQL Server Big Data Clusters runtime for Apache Spark

The SQL Server Big Data Clusters runtime for Apache Spark is a standardized specification for Apache Spark that enables seamless interoperability between distributions. This Spark runtime is a consistent, versioned block of programming language distributions, engine optimizations, core libraries, and packages.

Every product that uses this runtime specification contains the same versions of Apache Spark Core, PySpark, Scala Spark, SparkR, sparklyr, and .NET for Spark.

All distributed packages and libraries are also the same. One of the primary goals of the specification is to give Data Engineers and Data Scientists a first-class experience by providing a continually curated and updated list of packages and connectors out of the box.

Benefits of the SQL Server Big Data Clusters runtime for Apache Spark:

  1. Spark engine optimizations and features available on all products and services
  2. Established release cadence
  3. Seamless interoperability between Spark products and services
  4. Curated packages for Data Engineers and Data Scientists
  5. Consistent package management story

Release cadence and naming standards

The SQL Server Big Data Clusters runtime for Apache Spark specification defines the following runtime naming standard:

"PRODUCT_NAME.SPARK_MAJOR_VERSION.CALENDAR_YEAR.RELEASE#"

For example: "BDC.3.2021.1".

RELEASE# is a sequential semantic number. It is not bound to months or any other standard. Once a runtime release is created, it is immutable. Each release of SQL Server Big Data Clusters ships with one version of the runtime.
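
The naming standard above can be illustrated with a small parser. This is only a sketch and not part of the runtime or any Microsoft tooling: the names `RuntimeName` and `parse_runtime_name` are hypothetical, and the pattern assumes that the product name is alphanumeric, the calendar year has four digits, and the Spark major version and release number are plain integers. The specification itself does not state these constraints.

```python
import re
from typing import NamedTuple


class RuntimeName(NamedTuple):
    """Components of a runtime name (hypothetical helper type)."""
    product: str       # PRODUCT_NAME, e.g. "BDC"
    spark_major: int   # SPARK_MAJOR_VERSION, e.g. 3
    year: int          # CALENDAR_YEAR, e.g. 2021
    release: int       # RELEASE#, sequential within the year


# Assumed shape: PRODUCT_NAME.SPARK_MAJOR_VERSION.CALENDAR_YEAR.RELEASE#
_RUNTIME_NAME = re.compile(
    r"(?P<product>[A-Za-z][A-Za-z0-9_]*)"
    r"\.(?P<spark_major>\d+)"
    r"\.(?P<year>\d{4})"
    r"\.(?P<release>\d+)"
)


def parse_runtime_name(name: str) -> RuntimeName:
    """Split a runtime name into its four components.

    Raises ValueError if the name does not match the assumed pattern.
    """
    match = _RUNTIME_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid runtime name: {name!r}")
    return RuntimeName(
        product=match["product"],
        spark_major=int(match["spark_major"]),
        year=int(match["year"]),
        release=int(match["release"]),
    )


if __name__ == "__main__":
    print(parse_runtime_name("BDC.3.2021.1"))
```

Because each runtime release is immutable, a parsed name can serve as a stable identifier, for example as a dictionary key when recording which runtime a given cluster release shipped with.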

What's in the current runtime release?

The SQL Server Big Data Clusters platform release notes have the runtime name and complete contents of the release.

Next steps

For more information, see Introducing SQL Server Big Data Clusters.