Machine Learning Services in SQL Server 2017

This post is authored by Sumit Kumar, Senior Program Manager at Microsoft.

You may already know that the first release candidate (RC1) of SQL Server 2017 became available earlier in July. This release includes several powerful enhancements to Machine Learning Services, the core intelligence engine in SQL Server, and further expands on SQL Server’s value proposition as ‘the first commercial database with AI built-in’. We now have comprehensive Python interfaces for Microsoft’s scalable and fast analytics and ML algorithms in SQL Server. Additionally, in-database Python support brings open source innovations such as Microsoft Cognitive Toolkit, TensorFlow, scikit-learn, and Caffe into SQL Server as embedded intelligence.

Some of the main highlights of this release are:

In-database Python Integration

With full support for in-database Python (in addition to R) in the newly rebranded SQL Server Machine Learning Services (formerly SQL Server R Services), the vast population of Python developers and ML practitioners can now leverage the power of SQL Server. SQL Server developers, in turn, now have access to the extensive Python ML and AI libraries from the open source ecosystem, along with the latest innovations from Microsoft (the revoscalepy and microsoftml libraries), for developing intelligent applications with in-database analytics. Some of the top innovations included in this release are:

revoscalepy

This package provides Python versions of Microsoft’s proprietary Parallel External Memory Algorithms (APIs for linear and logistic regression, decision trees, boosted trees, and random forests) and a rich set of APIs for ETL, remote compute contexts, and data sources. These are the same scalable, parallelized algorithms (with the ‘rx’ prefix) that have been the differentiating value proposition of Microsoft R Server, and they allow analytics to scale to datasets far larger than the available memory.
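To make the external memory idea concrete, here is a minimal sketch in plain Python. It is not revoscalepy code and does not reflect how the rx algorithms are implemented; it only illustrates the general principle that a model can be fitted by streaming over chunks and keeping a few running totals, so the full dataset never has to sit in memory. The function names and the synthetic data source are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Chunk = List[Tuple[float, float]]  # (x, y) pairs small enough to hold in memory


def fit_linear_in_chunks(chunks: Iterable[Chunk]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x in a single pass over the chunks."""
    n = 0
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for chunk in chunks:
        # Only the running totals survive from one chunk to the next.
        for x, y in chunk:
            n += 1
            sum_x += x
            sum_y += y
            sum_xx += x * x
            sum_xy += x * y
    denom = n * sum_xx - sum_x * sum_x
    if n < 2 or denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


def make_chunks(n_rows: int, chunk_size: int) -> Iterator[Chunk]:
    """Hypothetical data source: rows of y = 3 + 2x, produced chunk by chunk."""
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        yield [(float(i), 3.0 + 2.0 * i) for i in range(start, stop)]


intercept, slope = fit_linear_in_chunks(make_chunks(1_000, 100))
print(intercept, slope)
```

Because the totals are simple sums, separate workers could each process their own chunks and have their totals added together at the end, which is the property that makes this style of algorithm easy to parallelize.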

microsoftml

This package is a set of state-of-the-art, battle-tested ML algorithms and transforms with Python bindings, including deep neural networks, one-class SVM, fast tree, fast forest, and linear and logistic regression. In addition, this package contains pre-trained models for extracting features from images using ResNet models and for sentiment analysis of English-language text, which dramatically simplifies the creation and deployment of complex AI scenarios on image and text data.

Python operationalization with T-SQL

Full Python integration with the sp_execute_external_script infrastructure in SQL Server enables enterprise-grade operationalization of Python models and scripts as simple stored procedures. Streaming data from SQL to Python processes, together with MPI ring parallelization support, gives Python scripts very high performance.
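As a rough illustration of what this looks like, the sketch below uses a small Python helper to build the T-SQL batch that hands a Python script to sp_execute_external_script. The helper name, the dbo.Orders table, and its columns are hypothetical; the @language, @script, and @input_data_1 parameters and the default InputDataSet and OutputDataSet variable names are the ones the procedure uses.

```python
def wrap_python_in_tsql(script: str, input_query: str) -> str:
    """Return a T-SQL batch that runs `script` via sp_execute_external_script."""

    def quote(text: str) -> str:
        # T-SQL escapes a single quote inside a string literal by doubling it.
        return "N'" + text.replace("'", "''") + "'"

    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'Python',\n"
        f"    @script = {quote(script)},\n"
        f"    @input_data_1 = {quote(input_query)};"
    )


# By default the query result reaches the script as the pandas data frame
# InputDataSet, and whatever is assigned to OutputDataSet goes back to SQL.
script = "OutputDataSet = InputDataSet[InputDataSet['amount'] > 100]"

# dbo.Orders and its columns are made up for this example.
tsql = wrap_python_in_tsql(script, "SELECT id, amount FROM dbo.Orders")
print(tsql)
```

Wrapping the generated EXEC statement in a CREATE PROCEDURE is what turns the script into an ordinary stored procedure that applications can call.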

Python remote compute in SQL Server

With the SQL Server remote compute context, data scientists and developers can execute Python code remotely from their development environments to explore data and develop models without moving data around.

In-database Python integration is not limited to machine learning and AI solutions. It is equally useful for general-purpose data analysis, combining Python and SQL in ways that draw on the strengths of each language.

Advancing the performance promise of in-database R and Python

After demonstrating industry-leading batch scoring performance of 1 million+ rows/sec, we have now pushed the boundaries of single-row scoring with real-time scoring. Models trained by RevoScaleR, revoscalepy, and MicrosoftML algorithms can be used to score data in under 10 milliseconds, an improvement of more than two orders of magnitude for scoring a single row at a time. We have built embedded scoring capabilities into SQL Server that eliminate the need to call the R and Python runtimes when scoring supported models. Real-time scoring for R is also available to SQL Server 2016 customers who upgrade in-database R to the latest release of Microsoft R Server.

A subset of these algorithms (those in RevoScaleR and revoscalepy) is also natively supported by the new PREDICT function (a system table-valued function), which makes it easy to embed this fast scoring functionality in regular T-SQL SELECT statements.
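For a sense of how PREDICT sits inside a SELECT statement, the sketch below assembles such a query in Python. Every table, column, and model name here is hypothetical, the identifiers are assumed to come from trusted code rather than user input, and the output column in the WITH clause must match what the stored model actually produces.

```python
def predict_query(model_table: str, model_name: str,
                  data_table: str, output_column: str) -> str:
    """Build a T-SQL batch that scores `data_table` with a stored model."""
    safe_name = model_name.replace("'", "''")  # escape quotes in the literal
    return (
        "DECLARE @model varbinary(max) =\n"
        f"    (SELECT model FROM {model_table} WHERE name = N'{safe_name}');\n"
        f"SELECT d.*, p.{output_column}\n"
        f"FROM PREDICT(MODEL = @model, DATA = {data_table} AS d)\n"
        f"WITH ({output_column} float) AS p;"
    )


# dbo.models, dbo.new_orders, 'order_value' and Score are illustrative names.
query = predict_query("dbo.models", "order_value", "dbo.new_orders", "Score")
print(query)
```

The model is read from a table as a binary value, so scoring needs nothing beyond the database itself.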

Improving the usability of R packages in SQL Server

We have further improved R package management in SQL Server. A rich set of R functions lets users install, uninstall, and manage packages in various roles and scopes. In addition, we have now added support for managing packages using SQL commands (DDL statements). This approach ensures that previously installed packages remain available when the server fails over.

I encourage you to explore the enhancements described above in the RC1 release and to reach out to us with any feedback. We will cover these features in more detail in subsequent posts. We are actively working with a set of early adopter customers and getting ready for the GA release later this year. If you are interested in becoming an early adopter and working more closely with us in a lab setting, please contact me at sumit dot kumar at Microsoft dot com.

 

Sumit