Version control (GitHub) in Azure Synapse

Anya Pasko 0 Reputation points
2025-02-04T15:04:24.2066667+00:00

Screenshot 2025-02-04 150327

Hello,

I am trying to integrate Azure Synapse Analytics with my GitHub account. I managed to connect them by following this guide: https://learn.microsoft.com/en-us/azure/synapse-analytics/cicd/source-control#connect-with-github, but I don't understand how I can see this data in the notebook. Do I need to import it somehow?

Any help with this would be appreciated, thanks!

Azure Synapse Analytics
GitHub Training

1 answer

  1. Ganesh Gurram 5,125 Reputation points Microsoft External Staff
    2025-02-04T19:28:43.33+00:00

    Hi @Anya Pasko

    Greetings & Welcome to the Microsoft Q&A forum! Thank you for sharing your query.

    It appears you've successfully connected your Azure Synapse workspace to your GitHub repository, as evidenced by the screenshot showing the linked repository (ac-test) and branch (importing_data_branch). The presence of "Validate all," "Commit all," and "Publish" buttons further confirms this connection.

    Here is how the GitHub integration relates to your notebooks and your data:

    Notebooks in GitHub - When you connect Synapse to GitHub, your notebook definitions are stored in your GitHub repository (Synapse saves them as JSON files rather than .ipynb files). You don't need to download or import them separately. They appear under Notebooks in the Develop hub of Synapse Studio, and Synapse Studio loads them from the branch you currently have selected.

    Data Files - The GitHub integration versions workspace artifacts such as notebooks, pipelines, and SQL scripts. It does not make data files in the repository available to a notebook. If your notebook uses external data files (CSV, JSON, etc.), the best practice is to store them in Azure Blob Storage or Azure Data Lake Storage, which Synapse is designed to read efficiently. You can then use code within your notebook (PySpark, SQL, etc.) to load the data from Azure Storage.
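
    As a rough illustration of loading a file from Azure Data Lake Storage Gen2 in a PySpark notebook, here is a minimal sketch. The container, storage account, and file path are hypothetical placeholders, and the helper functions `abfss_uri` and `load_csv` are names invented for this example. It assumes the identity running the notebook has read access to the storage account.

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def load_csv(spark, uri: str):
    """Read a CSV file with a header row into a Spark DataFrame."""
    return spark.read.csv(uri, header=True, inferSchema=True)


# Hypothetical names: replace with your own container, account, and file path.
uri = abfss_uri("mycontainer", "mystorageaccount", "raw/sales.csv")

# In a Synapse notebook a SparkSession named `spark` is already defined.
# The guard lets this cell run elsewhere without failing.
if "spark" in globals():
    df = load_csv(spark, uri)
    df.show(5)
```

    The same pattern should work for other formats by swapping the reader call, for example `spark.read.json(uri)` or `spark.read.parquet(uri)`.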

    GitHub for Version Control - The real power of the GitHub integration is version control. Any changes you make to your notebooks in Synapse can be committed to your GitHub repository directly from Synapse Studio, which keeps a complete history of your work.

    In short, GitHub holds the versioned notebooks and other workspace artifacts, while the data itself should live in Azure Storage and be loaded from code in the notebook. This lets you combine the version control capabilities of GitHub with the data processing features of Synapse.

    Hope this helps. Do let us know if you have any further queries.
