Data flow is not valid, Argument writer expected parquet|preppy|delimited in Azure ML Pipeline

Question

Gunjan Kanani 45

I have an Azure ML pipeline that has been running successfully on a daily schedule. However, the pipeline failed during its latest run, producing the following error message:

AzureMLException: Message:
Error Code: ScriptExecution.StreamAccess.Unexpected
Native Error: error in streaming from input data sources
    StreamError(Unknown("Dataflow at inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5 is not valid.", Some(DataflowInvalid("inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "parquet|preppy|delimited", actual: "dfd" })))))))
=> Dataflow at inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5 is not valid.
    Unknown("Dataflow at inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5 is not valid.", Some(DataflowInvalid("inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "parquet|preppy|delimited", actual: "dfd" }))))))
=> Dataflow at inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5 is not valid.
    DataflowInvalid("inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "parquet|preppy|delimited", actual: "dfd" }))))
Error Message: Got unexpected error: Dataflow at inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5 is not valid.. DataflowInvalid("inmemory://dataflow/82a5f49706292dab81a220c3f28b76e5", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "parquet|preppy|delimited", actual: "dfd" }))))| session_id=b6b7c584-5e20-4d06-9882-74bb3f22eab4
InnerException None

Pipeline Overview:

Datasets:
- I am using two datasets created within Azure ML itself. These datasets pull data from Azure SQL Database tables.
- These datasets have been used successfully without issues until now.
Pipeline Step:
- The error occurs in the Execute Python Script step.

Python Code in the Script: Below is the code used in the Execute Python Script module:

   def azureml_main(dataframe1=None, dataframe2=None):
       import subprocess
       import sys

       # Install the required packages at run time in the component environment.
       def install(package):
           subprocess.check_call([sys.executable, "-m", "pip", "install", package])

       install('sentence-transformers')
       install('numpy')
       install('scikit-learn')
       install('pandas')

       import pandas as pd
       import re
       from sentence_transformers import SentenceTransformer
       from sklearn.metrics.pairwise import cosine_similarity
       import numpy as np

       # Replace non-word characters with spaces, then collapse whitespace runs.
       def exactmatch_preprocess_text(text):
           text = re.sub(r'\W', ' ', text)
           text = re.sub(r'\s+', ' ', text)
           return text

       # Flag every text whose cleaned form equals a cleaned keyword and
       # append the original keyword to that row's MatchingKeywords value.
       def exact_matching(text_df_match, label_df_match, cln_text_col, cln_label_col, label_col):
           any_matches_found = False
           for index, row in label_df_match.iterrows():
               keyword = row[cln_label_col]
               is_match = row['IsMatch']
               matches = pd.Series([False] * len(text_df_match), index=text_df_match.index)
               if is_match == 1:
                   pattern = r'^\b' + re.escape(keyword) + r'\b$'
                   matches = text_df_match[cln_text_col].apply(lambda x: bool(re.fullmatch(pattern, x)))
               if matches.any():
                   any_matches_found = True
                   text_df_match.loc[matches, "ContainsKeyword"] = True
                   text_df_match.loc[matches, "MatchingKeywords"] += row[label_col]
           return text_df_match

       # dataframe1 holds the texts (column 'Desc'); dataframe2 holds the
       # keywords (columns 'Keyword' and 'IsMatch').
       text_df_match = dataframe1[['Desc']].rename(columns={'Desc': 'text'})
       text_df_match['cleaned_text'] = text_df_match['text'].apply(exactmatch_preprocess_text)
       label_df_match = dataframe2.rename(columns={'Keyword': 'label'})
       label_df_match['cleaned_keyword'] = label_df_match['label'].apply(exactmatch_preprocess_text)
       label_df_match = label_df_match[label_df_match['IsMatch'] == 1].reset_index(drop=True)
       text_df_match['ContainsKeyword'] = False
       text_df_match['MatchingKeywords'] = ''
       text_df_match = exact_matching(text_df_match, label_df_match, 'cleaned_text', 'cleaned_keyword', 'label')

       # Keep only the matched rows; .copy() avoids modifying a slice of text_df_match.
       exact_match_condition = (text_df_match["ContainsKeyword"] == True)
       exact_match_df = text_df_match.loc[exact_match_condition, ['text', 'MatchingKeywords']].copy()
       exact_match_df.reset_index(inplace=True, drop=True)
       exact_match_df.rename(columns={'MatchingKeywords': 'assigned_labels'}, inplace=True)
       exact_match_df['confidence_score'] = 1.0
       exact_match_df = exact_match_df.astype({
           'text': 'string',
           'assigned_labels': 'string',
           'confidence_score': 'float'
       })
       return exact_match_df,
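
For readers who want to check what the exact-match step does without Azure ML or pandas, here is a small stdlib-only sketch of the same rule. Both sides are cleaned with the two substitutions from the script, and a text counts as a match only when its whole cleaned form equals a cleaned keyword (case-sensitively). The names `clean` and `exact_matches` and the toy strings are made up for illustration; this is not the pipeline code itself.

```python
import re


def clean(text):
    # Same two substitutions as exactmatch_preprocess_text in the script above.
    text = re.sub(r'\W', ' ', text)
    return re.sub(r'\s+', ' ', text)


def exact_matches(texts, keywords):
    """Return (text, concatenated matching keywords) for every matched text."""
    results = []
    for text in texts:
        cleaned = clean(text)
        # A keyword matches only if the whole cleaned text equals it.
        hits = [
            kw for kw in keywords
            if re.fullmatch(r'^\b' + re.escape(clean(kw)) + r'\b$', cleaned)
        ]
        if hits:
            # The script appends labels with no separator, so do the same here.
            results.append((text, ''.join(hits)))
    return results


# Hypothetical toy data, not taken from the original datasets.
texts = ["Office-Supplies", "Office supplies and more", "Travel"]
keywords = ["Office Supplies", "Travel"]
matches = exact_matches(texts, keywords)
print(matches)
```

With these toy inputs, only "Office-Supplies" and "Travel" are reported; "Office supplies and more" is neither a whole-string match nor the same case.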

Gunjan Kanani 45 Reputation points

2024-11-27T14:02:17.6866667+00:00

I am attaching the pipeline image here. Any help would be much appreciated.
Ash007 0 Reputation points

2024-11-28T01:15:45.1233333+00:00

I have been running into the same error (AzureMLException: Message: Error Code: ScriptExecution.StreamAccess.Unexpected) since Nov 23rd. The pipeline used to work for me, and I haven't changed any pill or any setting.

The exception occurs on any pill I connect to my data import pill. Screenshot attached for reference.

AzureMLInsuranceError.PNG
romungi-MSFT 48,771 Reputation points Microsoft Employee

2024-11-28T08:50:09.6233333+00:00
@Gunjan Kanani This looks like an error in processing the file passed from the earlier component.

expected: "parquet|preppy|delimited", actual: "dfd"

Do you think the file is in the correct format?
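
The key part of the message is the InvalidArgument entry: the dataflow asked for the writer "dfd", while only parquet, preppy, or delimited were accepted. When comparing failures across runs, it can help to pull those fields out of the raw error text. Below is a minimal sketch using only the standard library; `parse_invalid_argument` is a hypothetical helper written for this thread, not part of any Azure ML SDK, and it assumes the error text keeps the layout shown in the question.

```python
import re

# Matches the first `InvalidArgument { argument: "...", expected: "...", actual: "..." }`
# entry in a raw error string (layout assumed from the error in this thread).
INVALID_ARG = re.compile(
    r'InvalidArgument \{ argument: "(?P<argument>[^"]*)", '
    r'expected: "(?P<expected>[^"]*)", actual: "(?P<actual>[^"]*)" \}'
)


def parse_invalid_argument(error_text):
    """Return the argument name, accepted values, and actual value, or None."""
    match = INVALID_ARG.search(error_text)
    if match is None:
        return None
    info = match.groupdict()
    # Accepted values are pipe-separated; sort them because the order varies
    # between error messages in this thread.
    info["expected"] = sorted(info["expected"].split("|"))
    return info


sample = (
    'VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", '
    'expected: "parquet|preppy|delimited", actual: "dfd" })))'
)
result = parse_invalid_argument(sample)
print(result)
# {'argument': 'writer', 'expected': ['delimited', 'parquet', 'preppy'], 'actual': 'dfd'}
```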
Farley, Andrew 0 Reputation points

2024-11-28T16:23:30.8633333+00:00

I am experiencing the same error on a pipeline that worked a month ago. The data itself is in the correct format; there has clearly been a recent update that broke how data is loaded into the designer. Additionally, my error happens on the "Select Columns in Dataset" component right after loading the data. I don't have any custom scripts and I still get the same issue.
Ash007 0 Reputation points

2024-11-28T22:29:49.5+00:00

@romungi-MSFT This is happening on pipelines which were working well before. I did not change any format or any setting. I just reran the pipeline and I'm running into this exception.

Answer 1

Gunjan Kanani 45

@Ash007 @romungi-MSFT

We had a call with the Microsoft Support team, where they attempted to resolve the issue, but the pipelines are still failing to run.

These pipelines were running successfully until November 27th, after which they suddenly began encountering the same error across all pipelines created in Azure ML.

The Microsoft team also confirmed that their internal teams had made changes to the Azure ML code, after which users started facing these issues.

I will provide further updates once we hear back from them.

Phillip A Danley 20 Reputation points

2024-12-03T03:22:13.7166667+00:00

Good evening. I have been seeking assistance on this too. Has there been any update from the Microsoft team? Thank you!
romungi-MSFT 48,771 Reputation points Microsoft Employee

2024-12-04T07:24:24.1366667+00:00

@Gunjan Kanani Thanks for the update. I see this is still an issue; the team has identified the cause and is working on a fix. The current workaround is to use the Enter Data Manually component instead of a dataset. I understand this might not work for all users, but until a fix is rolled out, you can use this workaround.
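
For small, static inputs, this workaround amounts to pasting the data as text into the component. Below is a minimal sketch for producing that text, assuming the component is configured to accept CSV; the sample rows are hypothetical and reuse the Keyword and IsMatch column names from the script in the question. `rows_to_csv_text` is an illustrative helper, not an Azure ML API.

```python
import csv
import io


def rows_to_csv_text(header, rows):
    """Serialize a header and rows to CSV text that can be pasted as-is."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows; real data would come from the SQL tables.
csv_text = rows_to_csv_text(
    ["Keyword", "IsMatch"],
    [["Office Supplies", 1], ["Travel, local", 0]],
)
print(csv_text)
```

Fields that contain commas are quoted by the `csv` module, so values such as "Travel, local" stay in one column. As noted later in the thread, this approach does not scale to large or daily-changing datasets.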
Gunjan Kanani 45 Reputation points

2024-12-04T09:12:55.1533333+00:00

Hello @Phillip A Danley
We had a call with them, and after that, I posted the update here. However, they are still working on resolving this issue.
Gunjan Kanani 45 Reputation points

2024-12-04T09:47:03.1666667+00:00

@romungi-MSFT Thank you for the workaround. However, our data is large and dynamic: we fetch records from APIs into SQL tables every day, and those tables are then used as inputs in Azure ML. Because of this, we are unable to use the workaround. I do appreciate your effort and will wait for the team to resolve the issue.
Phillip A Danley 20 Reputation points

2024-12-04T21:10:26.1766667+00:00

Thank you for the update Gunjan and romungi. I'll have to wait for the resolution as well due to dataset sizes.

Answer 2

Gunjan Kanani 45

Hello everyone, @Ash007 , @Phillip A Danley

This is to inform you that the Microsoft support team has provided an ETA of December 13, 2024, to roll back the changes in Azure ML.

After that, we will be able to execute our pipelines as before.

Danish Ashfaq 5 Reputation points

2024-12-16T07:18:32.3033333+00:00

Has the rollback been completed? Where can we verify the status of the rollback?
Jorge A. Campos 0 Reputation points

2024-12-16T07:32:09.5+00:00

@Gunjan Kanani Please help, I still get the error:

AzureMLException: Message:
Error Code: ScriptExecution.StreamAccess.Unexpected
Native Error: error in streaming from input data sources
    StreamError(Unknown("Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.", Some(DataflowInvalid("inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "delimited|parquet|preppy", actual: "dfd" })))))))
=> Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.
    Unknown("Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.", Some(DataflowInvalid("inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "delimited|parquet|preppy", actual: "dfd" }))))))
=> Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.
    DataflowInvalid("inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "delimited|parquet|preppy", actual: "dfd" }))))
Error Message: Got unexpected error: Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.. DataflowInvalid("inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "delimited|parquet|preppy", actual: "dfd" }))))| session_id=393a9d2c-ce50-4067-aabc-0a19214e300c
InnerException None
ErrorResponse { "error": { "message": "\nError Code: ScriptExecution.StreamAccess.Unexpected\nNative Error: error in streaming from input data sources\n\tStreamError(Unknown("Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.", Some(DataflowInvalid("inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "delimited|parquet|preppy", actual: "dfd" })))))))\n=> Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.\n\tUnknown("Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.", Some(DataflowInvalid("inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "delimited|parquet|preppy", actual: "dfd" }))))))\n=> Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.\n\tDataflowInvalid("inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer", expected: "delimited|parquet|preppy", actual: "dfd" }))))\nError Message: Got unexpected error: Dataflow at inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c is not valid.. DataflowInvalid("inmemory://dataflow/ada6eb1f10ef08bd645f7151b416c05c", VisitError(ExecutionError(ArgumentError(InvalidArgument { argument: "writer"
Gunjan Kanani 45 Reputation points

2024-12-16T10:53:10.5833333+00:00

Hi All @Danish Ashfaq , @Jorge A. Campos

We are also facing the same issue with the Azure ML pipeline execution, even after the rollback of the changes. I have already sent an email to the support team regarding this issue and am currently awaiting their response.
romungi-MSFT 48,771 Reputation points Microsoft Employee

2024-12-17T05:33:34.7866667+00:00

@Gunjan Kanani @Jorge A. Campos @Danish Ashfaq @Phillip A Danley @Ash007

I have confirmation that all regions are now patched and this should work as expected. I have run a test with my workspace in eastus and it works as expected. Thanks!!

Could you please check and let me know? Thanks!!
Danish Ashfaq 5 Reputation points

2024-12-17T07:31:30.1733333+00:00

It is now working. Thank you all for providing updates.
Gunjan Kanani 45 Reputation points

2024-12-17T13:08:03.7966667+00:00

@romungi-MSFT

Thank you for the updates, finally our pipeline ran successfully today.
