Creating external table by using for each loop and SQL procedure

Nimesha Hansani Amarasinghe 0 Reputation points
2025-01-30T10:50:29.7866667+00:00

Hi

I am completely new to azure synapse analytics. I am trying to create external tables using parquet files stored in the list of folders in a container (azure data lake storage)

I have created note book which return the sub folder name and the parquet file location as dictionary.Here this is the main function.

def main():
   
    file_dict = {}
    for sub_folder in SUB_FOLDERS:
       
        file_name= get_most_recent_parquet_file(sub_folder)
        recent_file_path = file_name

        # Add the subfolder and file name to the dictionary
        file_dict[sub_folder] = file_name

        
    # Return the dictionary as output
    mssparkutils.notebook.exit(file_dict)

I want to loop through this each item in the dictionary and run a sql script or procedure to create an external tables.

In the sql script key is used as the external table name and value is used as file location. I have connected the for loop activity to the notebook .

ImageForQuestions

Could someone guide me through the next steps and configurations?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,165 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 28,376 Reputation points
    2025-01-30T13:19:07+00:00

    Your notebook currently returns a dictionary. You need to modify it to loop through the dictionary and execute SQL commands to create external tables :

    from notebookutils import mssparkutils
    def main():
        file_dict = {}
        for sub_folder in SUB_FOLDERS:
            file_name = get_most_recent_parquet_file(sub_folder)
            recent_file_path = file_name
            file_dict[sub_folder] = recent_file_path
      
        for table_name, file_location in file_dict.items():
            create_external_table(table_name, file_location)
        mssparkutils.notebook.exit(file_dict)
    def create_external_table(table_name, file_location):
        
        sql_script = f"""
        CREATE EXTERNAL TABLE {table_name} (
            -- Define your schema here
            column1 STRING,
            column2 INT,
            ...
        )
        WITH (
            LOCATION = '{file_location}',
            DATA_SOURCE = your_data_source,  -- here you refer your external data source
            FILE_FORMAT = your_file_format   -- and here to your file format
        );
        """
        
        # Execute the SQL script
        mssparkutils.sql.execute(sql_script)
    
    SUB_FOLDERS = [...]  
    main()
    

    Just before running the notebook, make you have created an external data source and file format in your Azure Synapse SQL pool.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.