Import from Azure SQL Database
Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.
- See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
- Learn more about Azure Machine Learning.
ML Studio (classic) documentation is being retired and may not be updated in the future.
This article describes how to use the Import Data module in Machine Learning Studio (classic) to get data from an Azure SQL Database or Azure SQL Data Warehouse.
Note
Applies to: Machine Learning Studio (classic) only
Similar drag-and-drop modules are available in Azure Machine Learning designer.
To import data from a database, you must specify the server name, the database name, and a SQL statement that defines the table, view, or query.
In general, storing data in Azure databases is more expensive than using tables or blobs in Azure. There may also be limits on the amount of data that you can store in a database, depending on your subscription type. However, there are no transaction fees against Azure SQL Database, so that option is ideal for fast access to smaller amounts of frequently used information, such as data lookup tables or data dictionaries.
Storing data in an Azure database is also preferred if you need to be able to filter data before reading it, or if you want to save predictions or metrics back to the database for reporting.
How to import data from Azure SQL Database or SQL Data Warehouse
Use the Data Import Wizard
The module includes a wizard that helps you choose a storage option, select from among existing subscriptions and accounts, and quickly configure all options.
Add the Import Data module to your experiment. You can find the module in Studio (classic), in the Data Input and Output category.
Click Launch Import Data Wizard and follow the prompts.
When configuration is complete, to actually copy the data into your experiment, right-click the module, and select Run Selected.
If you need to edit an existing data connection, the wizard loads all previous configuration details so that you don't have to start again from scratch.
Manually set properties in the Import Data module
The following steps describe how to manually configure the import source.
Add the Import Data module to your experiment. You can find this module in Studio (classic), in the Data Input and Output category.
For Data source, select Azure SQL Database.
Set the following options specific to Azure SQL Database or Azure SQL Data Warehouse.
Database server name: Type the server name that is generated by Azure. Typically it has the form `<generated_identifier>.database.windows.net`.
Database name: Type the name of an existing database on the server you just specified.
Server user account name: Type the user name of an account that has access permissions for the database.
Server user account password: Provide the password for the specified user account.
Database query: Type or paste a SQL statement that describes the data you want to read (see the example query after the note below). Always validate the SQL statement and verify the query results beforehand, using a tool such as Visual Studio Server Explorer or SQL Server Data Tools.
Note
The Import Data module supports only the database name, user account name, and password as credentials.
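As a point of reference, a query for the Database query property might look like the following. This is a minimal sketch; the dbo.Sales table and its columns are hypothetical placeholders for objects in your own database:

```sql
-- Hypothetical example: read a few columns from a sales table.
-- Replace dbo.Sales and the column names with objects from your own database.
SELECT
    CustomerId,
    OrderDate,
    TotalAmount
FROM dbo.Sales
WHERE OrderDate >= '2021-01-01';
```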
If the dataset that you read into Machine Learning is not expected to change between runs of the experiment, select the Use cached results option.
When this is selected, if there are no other changes to module parameters, the experiment loads the data the first time the module is run, and thereafter uses a cached version of the dataset.
If you want to re-load the dataset on each iteration of the experiment, deselect this option. The dataset is reloaded from the source each time any parameters are changed in Import Data.
Run the experiment.
As Import Data loads the data into Studio (classic), some implicit type conversion might also be performed, depending on the data types used in the source database.
Results
When import is complete, click the output dataset and select Visualize to see if the data was imported successfully.
Optionally, you can change the dataset and its metadata using the tools in Studio (classic):
Use Edit Metadata to change column names, convert a column to a different data type, or indicate which columns are labels or features.
Use Select Columns in Dataset to select a subset of columns.
Use Partition and Sample to separate the dataset by criteria, or get the top n rows.
Examples
For an example of how to use data from Azure databases in machine learning, see these articles and experiments:
Retail Forecasting Step 1 of 6 - data preprocessing: The Retail forecasting template illustrates a typical scenario that uses data stored in Azure SQL Database for analysis.
It also demonstrates some useful techniques, such as using Azure SQL Database to pass datasets between experiments in different accounts, saving and combining forecasts, and creating an Azure SQL Database for machine learning.
Use Machine Learning with SQL Data Warehouse: This article demonstrates how to create a regression model to predict prices using Azure SQL Data Warehouse.
How to use Azure ML with Azure SQL Data Warehouse: This article builds a clustering model on AdventureWorks, using Import Data and Export Data with Azure SQL Data Warehouse.
Technical notes
This section contains implementation details, tips, and answers to frequently asked questions.
Common questions
Can I filter data as it is being read from the source?
The Import Data module does not support filtering as data is being read. We recommend that you create a view or define a query that generates only the rows you need.
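For example, rather than importing a whole table and discarding rows in the experiment, you could create a view that applies the filter on the server. This is a minimal sketch; dbo.Orders, dbo.RecentCompletedOrders, and the column names are hypothetical:

```sql
-- Hypothetical example: a view that pre-filters rows on the server,
-- so that Import Data reads only completed, recent orders.
CREATE VIEW dbo.RecentCompletedOrders AS
SELECT OrderId, CustomerId, OrderDate, TotalAmount
FROM dbo.Orders
WHERE Status = 'Completed'
  AND OrderDate >= '2021-01-01';
```

In the Database query property, you would then read from the view, for example with `SELECT * FROM dbo.RecentCompletedOrders;`.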
Note
If you find that you have loaded more data than you need, you can overwrite the cached dataset by reading a new dataset and saving it with the same name as the older, larger dataset.
Why do I get the error, “Type Decimal is not supported”?
When reading data from a SQL database, you might encounter an error message reporting an unsupported data type.
If the data you get from the SQL database includes data types that are not supported in Machine Learning, you should cast or convert the decimals to a supported data type before reading the data. Import Data cannot automatically perform any conversions that would result in a loss of precision.
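For example, in Transact-SQL a decimal column can be cast to float in the query itself, which makes the loss of precision an explicit choice. The dbo.Products table and UnitPrice column are hypothetical:

```sql
-- Hypothetical example: explicitly convert a decimal column to float
-- so that Import Data receives a supported type.
SELECT
    ProductId,
    CAST(UnitPrice AS float) AS UnitPrice
FROM dbo.Products;
```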
For more information about supported data types, see Module Data Types.
What happens if the database is in a different geographical region? Can Import Data still access the database? Where is the data stored?
If the database is in a different region from the machine learning account, data access might be slower. Further, you are charged for data ingress and egress on the subscription if the compute node is in a different region than the storage account.
Data that you read into your workspace for an experiment is saved in the storage account associated with the experiment.
Why are some characters not displayed correctly?
Machine Learning supports the UTF-8 encoding. If string columns in your database use a different encoding, the characters might not be imported correctly.
One option is to export the data to a CSV file in Azure storage, and use the option CSV with encoding to specify parameters for custom delimiters, the code page, and so forth.
Module parameters
Name | Range | Type | Default | Description |
---|---|---|---|---|
Data source | List | Data Source Or Sink | Azure Blob Storage | Data source can be HTTP, FTP, anonymous HTTPS or FTPS, a file in Azure Blob storage, an Azure table, an Azure SQL Database, an on-premises SQL Server database, a Hive table, or an OData endpoint. |
Database server name | any | String | none | Name of the Azure SQL database server |
Database name | any | String | none | Name of an existing database on the server |
Server user account name | any | String | none | Name of a user account that has access permissions for the database |
Server user account password | any | SecureString | none | Password for the specified user account |
Database query | any | String | none | SQL statement that describes the data to read |
Use cached results | TRUE/FALSE | Boolean | FALSE | If TRUE, results from a previous run are reused when module parameters have not changed; otherwise the data is reloaded from the source on each run |
Outputs
Name | Type | Description |
---|---|---|
Results dataset | Data Table | Dataset with downloaded data |
Exceptions
Exception | Description |
---|---|
Error 0027 | An exception occurs when two objects have to be the same size, but they are not. |
Error 0003 | An exception occurs if one or more inputs are null or empty. |
Error 0029 | An exception occurs when an invalid URI is passed. |
Error 0030 | An exception occurs when it is not possible to download a file. |
Error 0002 | An exception occurs if one or more parameters could not be parsed or converted from the specified type to the type required by the target method. |
Error 0009 | An exception occurs if the Azure storage account name or the container name is specified incorrectly. |
Error 0048 | An exception occurs when it is not possible to open a file. |
Error 0015 | An exception occurs if the database connection has failed. |
Error 0046 | An exception occurs when it is not possible to create a directory on the specified path. |
Error 0049 | An exception occurs when it is not possible to parse a file. |
For a list of errors specific to Studio (classic) modules, see Machine Learning Error codes.
For a list of API exceptions, see Machine Learning REST API Error Codes.
See also
Import Data
Export Data
Import from Web URL via HTTP
Import from Hive Query
Import from Azure Table
Import from Azure Blob Storage
Import from Data Feed Providers
Import from On-Premises SQL Server Database