Issue with Manually Specifying Data Types During Data Import in Azure

NovaShi 0 Reputation points
2023-10-29T21:05:09.7+00:00

Hi,

I'm facing a problem while trying to manually set data types during data import in Azure MLops (Submit an Automated ML job). Specifically, in the "Task type & data" step, I'm working with CSV files and attempting to prevent certain columns (0, 1) from being incorrectly identified as decimal type (0.0, 1.0).

Here's what I've tried:

In the CSV file, I specified data types in the first row, like "Column1 (string), Column2 (integer), ...". However, Azure seems to be disregarding this information.

During the data import process in Azure Data Factory, I manually set the data types for each column. Despite this, some columns are still being recognized as decimal, which is causing issues.

Additionally, I tried altering the example values, but I received an error message stating "unable to parse column value: original..".

Has anyone encountered a similar issue? Any insights or suggestions would be greatly appreciated. Thank you!

Azure Open Datasets
An Azure service that provides curated open data for machine learning workflows.

1 answer

  1. Ramr-msft 17,741 Reputation points
    2023-10-30T13:21:59.8666667+00:00

    Thanks for the question. You can try the suggestions below.

    1. Specify Data Types in Code: Instead of specifying data types in the CSV file, you could try specifying them in your code when you load the data. For example, if you’re using pandas to load your data, you can specify data types for each column using the dtype parameter in the read_csv function.
    2. Check Your CSV Format: Make sure your CSV file is correctly formatted. Sometimes, incorrect formatting or special characters can cause issues with data type recognition.
    3. Use Azure Data Factory: In Azure Data Factory, every column in a CSV source defaults to String. You can set the data type conversion in the Mapping settings. Alternatively, you could create a stored procedure to convert the CSV data, then call that stored procedure in the Sink.
    4. Enter Data Manually: If you’re using Azure Machine Learning, you could try using the “Enter Data Manually” component to manually enter your data and specify the data types.
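
To illustrate suggestion 1, here is a minimal sketch of passing `dtype` to pandas `read_csv`. The column names (`id`, `is_active`, `score`) and the inline sample data are hypothetical, not taken from the question. It assumes the 0/1 column turns into decimals because of a missing value, which is one common cause and may not be what is happening in your file.

```python
import io

import pandas as pd

# Hypothetical sample data: 'is_active' holds 0/1 flags with one missing value.
csv_text = "id,is_active,score\n1,0,3.5\n2,1,4.0\n3,,2.25\n"

# Without dtype, the missing value makes pandas infer float64 for the
# 0/1 column, so the values show up as 0.0 and 1.0.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype, each column is read as the requested type. "Int64" (capital I)
# is the pandas nullable integer type, which can hold missing values.
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": "string", "is_active": "Int64", "score": "float64"},
)

print(inferred.dtypes)
print(typed.dtypes)
```

If the column has no missing values, a plain `"int64"` or `"string"` entry in the `dtype` mapping may be enough. Whether Automated ML keeps these types depends on how the data is registered afterwards, so this only addresses the loading step.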
