Share via


Perform analytics with Machine Learning Studio (classic) using a SQL Server database

APPLIES TO: Applies to.Machine Learning Studio (classic) Does not apply to.Azure Machine Learning

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

Often enterprises that work with on-premises data would like to take advantage of the scale and agility of the cloud for their machine learning workloads. But they don't want to disrupt their current business processes and workflows by moving their on-premises data to the cloud. Machine Learning Studio (classic) now supports reading your data from a SQL Server database and then training and scoring a model with this data. You no longer have to manually copy and sync the data between the cloud and your on-premises server. Instead, the Import Data module in Machine Learning Studio (classic) can now read directly from your SQL Server database for your training and scoring jobs.

This article provides an overview of how to ingress SQL Server data into Machine Learning Studio (classic). It assumes that you're familiar with Studio (classic) concepts like workspaces, modules, datasets, experiments, etc..

Note

This feature is not available for free workspaces. For more information about Machine Learning pricing and tiers, see Machine Learning Studio (classic) Pricing.

Install the Data Factory Self-hosted Integration Runtime

To access a SQL Server database in Machine Learning Studio (classic), you need to download and install the Data Factory Self-hosted Integration Runtime, formerly known as the Data Management Gateway. When you configure the connection in Machine Learning Studio (classic), you have the opportunity to download and install the Integration Runtime (IR) using the Download and register data gateway dialog described below.

You also can install the IR ahead of time by downloading and running the MSI setup package from the Microsoft Download Center.The MSI can also be used to upgrade an existing IR to the latest version, with all settings preserved.

The Data Factory Self-Hosted Integration Runtime has the following prerequisites:

  • The Data Factory Self-Hosted Integration requires a 64-bit Operating System with .NET Framework 4.6.1 or above.
  • The supported Windows operating system versions are Windows 10 , Windows Server 2012, Windows Server 2012 R2, Windows Server 2016.
  • The recommended configuration for the IR machine is at least 2 GHz, 4 Core CPU, 8GB RAM, and 80GB disk.
  • If the host machine hibernates, the IR won't respond to data requests. Therefore, configure an appropriate power plan on the computer before installing the IR. If the machine is configured to hibernate, the IR installation displays a message.
  • Because copy activity occurs at a specific frequency, the resource usage (CPU, memory) on the machine also follows the same pattern with peak and idle times. Resource utilization also depends heavily on the amount of data being moved. When multiple copy jobs are in progress, you'll observe resource usage go up during peak times. While the minimum configuration listed above is technically sufficient, you may want to have a configuration with more resources than the minimum configuration depending on your specific load for data movement.

Consider the following when setting up and using a Data Factory Self-hosted Integration Runtime:

  • You can install only one instance of IR on a single computer.

  • You can use a single IR for multiple on-premises data sources.

  • You can connect multiple IRs on different computers to the same on-premises data source.

  • You configure an IRs for only one workspace at a time. Currently, IRs can't be shared across workspaces.

  • You can configure multiple IRs for a single workspace. For example, you may want to use an IR that's connected to your test data sources during development and a production IR when you're ready to operationalize.

  • The IR does not need to be on the same machine as the data source. But staying closer to the data source reduces the time for the gateway to connect to the data source. We recommend that you install the IR on a machine that's different from the one that hosts the on-premises data source so that the gateway and data source don't compete for resources.

  • If you already have an IR installed on your computer serving Power BI or Azure Data Factory scenarios, install a separate IR for Machine Learning Studio (classic) on another computer.

    Note

    You can't run Data Factory Self-hosted Integration Runtime and Power BI Gateway on the same computer.

  • You need to use the Data Factory Self-hosted Integration Runtime for Machine Learning Studio (classic) even if you are using Azure ExpressRoute for other data. You should treat your data source as an on-premises data source (that's behind a firewall) even when you use ExpressRoute. Use the Data Factory Self-hosted Integration Runtime to establish connectivity between Machine Learning and the data source.

You can find detailed information on installation prerequisites, installation steps, and troubleshooting tips in the article Integration Runtime in Data Factory.

Ingress data from your SQL Server database into Machine Learning

In this walkthrough, you will set up an Azure Data Factory Integration Runtime in an Azure Machine Learning workspace, configure it, and then read data from a SQL Server database.

Tip

Before you start, disable your browser's pop-up blocker for studio.azureml.net. If you're using the Google Chrome browser, download and install one of the several plug-ins available at Google Chrome WebStore Click Once App Extension.

Note

Azure Data Factory Self-hosted Integration Runtime was formerly known as Data Management Gateway. The step by step tutorial will continue to refer to it as a gateway.

Step 1: Create a gateway

The first step is to create and set up the gateway to access your SQL database.

  1. Log in to Machine Learning Studio (classic) and select the workspace that you want to work in.

  2. Click the SETTINGS blade on the left, and then click the DATA GATEWAYS tab at the top.

  3. Click NEW DATA GATEWAY at the bottom of the screen.

    New Data Gateway

  4. In the New data gateway dialog, enter the Gateway Name and optionally add a Description. Click the arrow on the bottom right-hand corner to go to the next step of the configuration.

    Enter gateway name and description

  5. In the Download and register data gateway dialog, copy the GATEWAY REGISTRATION KEY to the clipboard.

    Download and register data gateway

  6. If you have not yet downloaded and installed the Microsoft Data Management Gateway, then click Download data management gateway. This takes you to the Microsoft Download Center where you can select the gateway version you need, download it, and install it. You can find detailed information on installation prerequisites, installation steps, and troubleshooting tips in the beginning sections of the article Move data between on-premises sources and cloud with Data Management Gateway.

  7. After the gateway is installed, the Data Management Gateway Configuration Manager will open and the Register gateway dialog is displayed. Paste the Gateway Registration Key that you copied to the clipboard and click Register.

  8. If you already have a gateway installed, run the Data Management Gateway Configuration Manager. Click Change key, paste the Gateway Registration Key that you copied to the clipboard in the previous step, and click OK.

  9. When the installation is complete, the Register gateway dialog for Microsoft Data Management Gateway Configuration Manager is displayed. Paste the GATEWAY REGISTRATION KEY that you copied to the clipboard in a previous step and click Register.

    Register gateway

  10. The gateway configuration is complete when the following values are set on the Home tab in Microsoft Data Management Gateway Configuration Manager:

    • Gateway name and Instance name are set to the name of the gateway.

    • Registration is set to Registered.

    • Status is set to Started.

    • The status bar at the bottom displays Connected to Data Management Gateway Cloud Service along with a green check mark.

      Data Management Gateway Manager

      Machine Learning Studio (classic) also gets updated when the registration is successful.

    Gateway registration successful

  11. In the Download and register data gateway dialog, click the check mark to complete the setup. The Settings page displays the gateway status as "Online". In the right-hand pane, you'll find status and other useful information.

    Gateway settings

  12. In the Microsoft Data Management Gateway Configuration Manager switch to the Certificate tab. The certificate specified on this tab is used to encrypt/decrypt credentials for the on-premises data store that you specify in the portal. This certificate is the default certificate. Microsoft recommends changing this to your own certificate that you back up in your certificate management system. Click Change to use your own certificate instead.

    Change gateway certificate

  13. (optional) If you want to enable verbose logging in order to troubleshoot issues with the gateway, in the Microsoft Data Management Gateway Configuration Manager switch to the Diagnostics tab and check the Enable verbose logging for troubleshooting purposes option. The logging information can be found in the Windows Event Viewer under the Applications and Services Logs -> Data Management Gateway node. You can also use the Diagnostics tab to test the connection to an on-premises data source using the gateway.

    Enable verbose logging

This completes the gateway setup process in Machine Learning Studio (classic). You're now ready to use your on-premises data.

You can create and set up multiple gateways in Studio (classic) for each workspace. For example, you may have a gateway that you want to connect to your test data sources during development, and a different gateway for your production data sources. Machine Learning Studio (classic) gives you the flexibility to set up multiple gateways depending upon your corporate environment. Currently you can't share a gateway between workspaces and only one gateway can be installed on a single computer. For more information, see Move data between on-premises sources and cloud with Data Management Gateway.

Step 2: Use the gateway to read data from an on-premises data source

After you set up the gateway, you can add an Import Data module to an experiment that inputs the data from the SQL Server database.

  1. In Machine Learning Studio (classic), select the EXPERIMENTS tab, click +NEW in the lower-left corner, and select Blank Experiment (or select one of several sample experiments available).

  2. Find and drag the Import Data module to the experiment canvas.

  3. Click Save as below the canvas. Enter "Machine Learning Studio (classic) On-Premises SQL Server Tutorial" for the experiment name, select the workspace, and click the OK check mark.

    Save experiment with a new name

  4. Click the Import Data module to select it, then in the Properties pane to the right of the canvas, select "On-Premises SQL Database" in the Data source dropdown list.

  5. Select the Data gateway you installed and registered. You can set up another gateway by selecting "(add new Data Gateway…)".

    Select data gateway for Import Data module

  6. Enter the SQL Database server name and Database name, along with the SQL Database query you want to execute.

  7. Click Enter values under User name and password and enter your database credentials. You can use Windows Integrated Authentication or SQL Server Authentication depending upon how your SQL Server is configured.

    Enter database credentials

    The message "values required" changes to "values set" with a green check mark. You only need to enter the credentials once unless the database information or password changes. Machine Learning Studio (classic) uses the certificate you provided when you installed the gateway to encrypt the credentials in the cloud. Azure never stores on-premises credentials without encryption.

    Import Data module properties

  8. Click RUN to run the experiment.

Once the experiment finishes running, you can visualize the data you imported from the database by clicking the output port of the Import Data module and selecting Visualize.

Once you finish developing your experiment, you can deploy and operationalize your model. Using the Batch Execution Service, data from the SQL Server database configured in the Import Data module will be read and used for scoring. While you can use the Request Response Service for scoring on-premises data, Microsoft recommends using the Excel Add-in instead. Currently, writing to a SQL Server database through Export Data is not supported either in your experiments or published web services.