ADF - CSV file with more than 2,000 columns, need to create multiple tables in SQL Server using Azure Data Factory

Question

I have a CSV file with more than 2,000 columns and a size of 20 GB. I have to create multiple tables in SQL Server from this CSV file using Azure Data Factory.

Here are the file's columns:

col1 col2 col3 col4 col5 col5 ......

I have to create tables with the columns below from the CSV file, so I need to split the columns across multiple tables.

First Table    Second Table    Third Table
col1           col1            col1
col2           col2            col2
col3           col4            col5

I am trying to use the Data Flow activity in ADF, but I am not sure how to split the file by columns (as shown above) and create the tables in SQL Server.

Any help would be greatly appreciated.

Thank you,

Anil

Accepted Answer

I would start with a preprocessing step: identify the column grouping for each table and determine the primary key or identifier that links these tables.
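As a rough illustration of that planning step, here is a minimal Python sketch (not ADF code) that reads the header row and builds a table-to-columns mapping. The table names, the helper names, and the choice of col1 and col2 as the shared linking columns are assumptions for illustration only; in practice the grouping for 2,000+ columns would come from your own mapping.

```python
import csv
import io


def read_header(csv_stream):
    """Return the column names from the first row of a CSV stream."""
    return next(csv.reader(csv_stream))


def build_column_groups(header, key_columns, payload_groups):
    """Map each table name to the key columns plus that table's own columns.

    Raises ValueError if a requested column is not present in the header.
    """
    known = set(header)
    groups = {}
    for table, columns in payload_groups.items():
        wanted = list(key_columns) + [c for c in columns if c not in key_columns]
        missing = [c for c in wanted if c not in known]
        if missing:
            raise ValueError(f"{table}: columns not in file header: {missing}")
        groups[table] = wanted
    return groups


# Tiny in-memory stand-in for the real file (hypothetical data).
sample = io.StringIO("col1,col2,col3,col4,col5\n1,a,x,y,z\n")
header = read_header(sample)

# Assumption: col1 and col2 are the linking columns repeated in every table.
groups = build_column_groups(
    header,
    key_columns=["col1", "col2"],
    payload_groups={
        "FirstTable": ["col3"],
        "SecondTable": ["col4"],
        "ThirdTable": ["col5"],
    },
)
```

The resulting mapping can then be used as the checklist for the column sets you configure in the data flow.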

In ADF, create a Delimited Text dataset for your CSV file, then either define the schema manually or treat the first row as the header and import the schema from it.

Then create a new Data Flow, add a Source transformation, and link it to the CSV dataset.

Add one Select transformation per target table, each keeping only the columns that table needs:

First Table: Select col1, col2, col3
Second Table: Select col1, col2, col4
Third Table: Select col1, col2, col5
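To make the effect of those Select transformations concrete, here is a small Python sketch of the same column split done outside ADF. It is only an illustration under assumptions: the function name, the table names, and the in-memory streams are hypothetical, and it is not how the data flow is implemented internally. It streams the file row by row, so the whole 20 GB file never has to sit in memory.

```python
import csv
import io


def split_columns(source, sinks, groups):
    """Stream rows from `source` and write each table's column subset to its sink.

    `groups` maps a table name to the list of columns to keep;
    `sinks` maps the same table names to writable text streams.
    Returns the number of data rows processed.
    """
    reader = csv.DictReader(source)
    writers = {}
    for table, columns in groups.items():
        writer = csv.DictWriter(
            sinks[table],
            fieldnames=columns,
            extrasaction="ignore",  # silently drop columns not in this group
            lineterminator="\n",
        )
        writer.writeheader()
        writers[table] = writer

    row_count = 0
    for row in reader:
        for writer in writers.values():
            writer.writerow(row)
        row_count += 1
    return row_count


# Hypothetical sample data standing in for the real CSV file.
source = io.StringIO("col1,col2,col3,col4,col5\n1,a,x,y,z\n2,b,p,q,r\n")
groups = {
    "FirstTable": ["col1", "col2", "col3"],
    "SecondTable": ["col1", "col2", "col4"],
    "ThirdTable": ["col1", "col2", "col5"],
}
sinks = {table: io.StringIO() for table in groups}
rows = split_columns(source, sinks, groups)
```

With real files you would pass open file handles instead of `io.StringIO` objects.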

Then add a Sink transformation for each table and configure Azure SQL Database as the destination. Either enable Auto Create Table or define the table schema in SQL beforehand.
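If you go the route of defining the schema beforehand, writing CREATE TABLE statements by hand for hundreds of columns is tedious, so you could generate them. The Python sketch below is one possible way, under loud assumptions: the `dbo` schema, the table name, and especially the `NVARCHAR(255) NULL` type for every column are placeholders, since the real data types are not given in the question.

```python
def quote_name(name):
    """Quote a SQL Server identifier with brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def create_table_sql(table, columns, schema="dbo", column_type="NVARCHAR(255)"):
    """Build a CREATE TABLE statement with one nullable column per name.

    Assumption: every column gets the same placeholder type; replace it
    with the real types once they are known.
    """
    column_lines = ",\n".join(
        f"    {quote_name(column)} {column_type} NULL" for column in columns
    )
    return (
        f"CREATE TABLE {quote_name(schema)}.{quote_name(table)} (\n"
        f"{column_lines}\n"
        ");"
    )


# Hypothetical table name; columns taken from the example in the question.
ddl = create_table_sql("FirstTable", ["col1", "col2", "col3"])
print(ddl)
```

Run the generated statements against the database before the pipeline executes, then point each sink at its existing table.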
