How to fix AzureMLException: AzureMLException: Message: Execution failed in operation 'to_pandas_dataframe' for Dataset(id='id', name='TEMP_DATA_ASSET', version=1, error_code=ScriptExecution.Database.Unexpected,error_message and ErrorCode: ScriptExecution
Issue: AzureMLException When Consuming Datastore in AzureML via to_pandas_dataframe()
Description of the Setup
I have manually set up a datastore in AzureML, which is linked to an Azure Synapse Analytics server. During the creation of the datastore, I selected:
Datastore Type: Azure SQL Database
Authentication Type: SQL Authentication
Inputs Provided: Server Name, Database Name
To consume this datastore, I created a data asset using the following SQL query:
SELECT * FROM TABLE_NAME
The dataset contains 100,000 rows, and the schema appeared to be correct in the preview. After confirming the schema, I successfully created a data asset named TEMP_DATA_ASSEST.
Issue Encountered
I use my company (organization) Azure ML studio. I use kernels: Python 3.10 - SDK v2, Python 3.8-AzureML
I have replaced some variables with temp values due to security reasons:session_id with session_idid with iddata asset name with TEMP_DATA_ASSESTserver name with server-name
I attempted to consume the dataset in a Python script using the AzureML Pandas SDK 1 available in the consume tab of data asset (azureml-core and azureml-dataprep[pandas]) with the following code:
Required Libraries
from azureml.core import Workspace, Dataset
AzureML Workspace Details
subscription_id = 'my_subscription_id'
resource_group = 'my_resource_group'
workspace_name = 'my_workspace'
Connect to the AzureML Workspace
workspace = Workspace(subscription_id, resource_group, workspace_name)
Load the Dataset
dataset = Dataset.get_by_name(workspace, name='TEMP_DATA_ASSEST')
Convert to Pandas DataFrame
df = dataset.to_pandas_dataframe()
When executing the to_pandas_dataframe() function, I encountered the following AzureMLException:
Error Message
AzureMLException: AzureMLException:
Message: Execution failed in operation 'to_pandas_dataframe' for Dataset(id='id', name='TEMP_DATA_ASSEST', version=1,
error_code=ScriptExecution.Database.Unexpected,
error_message=Database execution failed with "SQLError(Server(TokenError { code: 103010, state: 1, class: 16,
message: "Parse error at line: 1, column: 22: Incorrect syntax near 'stmt'.", server: "server-name", procedure: "", line: 1 }))".
Observations
The SQL syntax error suggests an issue with query parsing, but the query was accepted when creating the data asset.
The schema preview showed all fields correctly, and the dataset creation was successful.
The issue occurs only when calling to_pandas_dataframe(), preventing me from loading the data into a Pandas DataFrame.
The error message mentions "Incorrect syntax near 'stmt'", which is unclear because no direct query is provided in the function call.
Troubleshooting Attempts
Checked the SQL Query in Azure Synapse:
The query SELECT * FROM TABLE_NAME executes fine in Synapse, returning all rows.
Verified the Dataset Schema in AzureML Studio:
The dataset schema was detected correctly in the AzureML UI.
Attempted an Alternative Dataset Load Method:
df = dataset.to_pandas_dataframe(sample_size=1000)
Still fails with the same SQL parsing error.
Checked SDK Versions:
Installed versions:
azureml-core: 1.54.0
azureml-dataprep[pandas]: 1.40.0
Upgraded to the latest version:
pip install --upgrade azureml-core azureml-dataprep[pandas]
Issue persists after upgrading.
Explicitly Set SQL Query Instead of Dataset Name:
dataset = Dataset.Tabular.from_sql_query(workspace, "SELECT * FROM TABLE_NAME")
df = dataset.to_pandas_dataframe()
Same error message appears.
Questions for the Community
What could be causing this SQL parsing error when calling to_pandas_dataframe()? How do I resolve this?
Does AzureML modify the query internally when executing the dataset retrieval?
Are there any known limitations when consuming large datasets (~100,000 rows) via to_pandas_dataframe() from an Azure SQL Datastore?
Is there an alternative approach to directly load this dataset into a Pandas DataFrame?