Migration to Azure

Anshal 2,251 Reputation points
2024-07-08T14:32:15.9266667+00:00

Hi friends, we have a requirement to migrate Denodo to Azure, and I have these questions:

What challenges should we anticipate and be aware of?

Does the size of the data impact the complexity and duration of the migration?

Which tools are best suited: ADF, Databricks, or a combination of both?

Any suggestions for a smooth migration?

Is the Common Data Model (CDM) suitable for Azure and Denodo? Should we consider it, or do you suggest simply going for a Synapse data warehouse, given that multiple reporting tools are involved?


Accepted answer
  1. Amira Bedhiafi 26,186 Reputation points
    2024-07-08T19:17:42.9066667+00:00

    What could be the challenges we can anticipate and be aware of?

    Some common challenges include:

    - Ensuring data compatibility between Denodo and Azure services.
    - Tuning and optimizing performance for queries and data processing.
    - Maintaining security and compliance standards (such as GDPR and HIPAA).
    - Managing network latency based on the geographic location of your Azure data center.
    - Controlling costs associated with storage, data transfer, and compute resources.
    - Minimizing downtime during the migration to avoid business disruption.

    Does the data size impact the migration complexity and duration?

    Yes, the size of the data significantly impacts both the complexity and duration of the migration process. Larger datasets will naturally take longer to migrate, necessitating extended migration windows. Handling large volumes of data requires robust data management strategies, such as partitioning and incremental data migration, to ensure efficiency. Additionally, ensuring that the Azure infrastructure is appropriately scaled to handle the data load and that there is sufficient network bandwidth is crucial for performance.
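
    To make the incremental-migration idea concrete, here is a minimal PySpark sketch of a watermark-based extract over Denodo's JDBC interface. The endpoint, credentials, table, and column names are placeholder assumptions, not values from any specific environment:

    ```python
    # Minimal sketch: watermark-based incremental extract from Denodo over JDBC.
    # All names (host, VDB, table, columns, paths) are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("denodo-incremental-extract").getOrCreate()

    DENODO_URL = "jdbc:vdb://denodo-host:9999/my_vdb"  # placeholder Denodo endpoint
    LAST_WATERMARK = "2024-07-01 00:00:00"             # in practice, read from a control table

    # Push the watermark filter down to Denodo so only new/changed rows cross the wire.
    query = f"(SELECT * FROM sales WHERE modified_at > '{LAST_WATERMARK}') AS delta_rows"

    df = (spark.read.format("jdbc")
          .option("url", DENODO_URL)
          .option("dbtable", query)
          .option("driver", "com.denodo.vdp.jdbc.Driver")  # Denodo JDBC driver class
          .option("user", "svc_migration")                 # placeholder credentials
          .option("password", "***")
          .load()
          .withColumn("load_date", F.current_date()))

    # Land the slice in ADLS Gen2 as Delta, partitioned by load date for easy reruns.
    (df.write.format("delta")
       .mode("append")
       .partitionBy("load_date")
       .save("abfss://raw@mystorageacct.dfs.core.windows.net/denodo/sales"))
    ```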

    What tools are best suited for the migration: ADF, Databricks, or a combination of both?

    ADF is well-suited for ETL operations, orchestrating data workflows, and scheduling jobs. It provides built-in connectors for a wide range of data sources; Denodo itself is typically reached through its ODBC/JDBC interfaces using ADF's generic connectors, which makes ADF a good fit for simple to moderately complex data movement and transformation tasks. Azure Databricks, on the other hand, excels at big data processing, complex transformations, and machine learning workloads, and offers a collaborative environment for data scientists and engineers. A combination of both tools is often the best approach: use ADF to orchestrate and schedule the data pipeline, and Databricks for complex transformations, analytics, and machine learning, giving you a comprehensive and scalable solution.
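
    As a sketch of what the orchestration side can look like from code, the snippet below uses the azure-mgmt-datafactory SDK to trigger and poll an ADF pipeline (one that might, for example, run a Copy activity followed by a Databricks notebook activity). The subscription, resource group, factory, and pipeline names are placeholders:

    ```python
    # Sketch: trigger an ADF pipeline run and poll it to completion.
    # Assumes azure-identity and azure-mgmt-datafactory are installed; names are placeholders.
    import time
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    SUBSCRIPTION_ID = "<subscription-id>"
    RESOURCE_GROUP = "rg-migration"          # placeholder
    FACTORY_NAME = "adf-denodo-migration"    # placeholder
    PIPELINE_NAME = "pl_copy_and_transform"  # placeholder

    adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # Start the run; parameters flow into the pipeline's copy/notebook activities.
    run = adf.pipelines.create_run(
        RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME,
        parameters={"source_schema": "sales", "load_date": "2024-07-08"},
    )

    # Poll until the run reaches a terminal state.
    status = "Queued"
    while status not in ("Succeeded", "Failed", "Cancelled"):
        time.sleep(30)
        status = adf.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status

    print(f"Pipeline run {run.run_id} finished with status: {status}")
    ```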

    Any suggestions for a smooth migration?

    For a smooth migration, start with a thorough assessment of the current environment and plan the migration in phases. Conduct a small Proof of Concept (PoC) to test the migration strategy, identify potential issues, and optimize the process. Automate as much of the migration as possible to reduce manual intervention and errors, and implement monitoring and logging so you can track progress and troubleshoot issues in real time. Finally, provide adequate training to the team on Azure tools and services to ensure a smooth transition.
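
    One simple, automatable check worth building into the PoC is row-count reconciliation between source and target after each load. A hedged PySpark sketch follows (connection details, table names, and paths are placeholder assumptions); real validation would usually add checksums or sampled column comparisons, but even a count check catches most load failures early:

    ```python
    # Sketch: post-load reconciliation comparing Denodo source and Delta target row counts.
    # Connection details, table names, and paths are placeholder assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("migration-validation").getOrCreate()

    source_count = (spark.read.format("jdbc")
        .option("url", "jdbc:vdb://denodo-host:9999/my_vdb")         # placeholder
        .option("dbtable", "(SELECT COUNT(*) AS n FROM sales) AS c")
        .option("driver", "com.denodo.vdp.jdbc.Driver")
        .option("user", "svc_migration")
        .option("password", "***")
        .load()
        .first()["n"])

    target_count = (spark.read.format("delta")
        .load("abfss://raw@mystorageacct.dfs.core.windows.net/denodo/sales")
        .count())

    # Fail loudly on a mismatch so the orchestrator can alert and retry.
    assert source_count == target_count, (
        f"Row count mismatch: source={source_count}, target={target_count}")
    print(f"Validated: {target_count} rows present in both source and target.")
    ```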

    Is the Common Data Model (CDM) suitable for Azure and Denodo, or should we simply go for a Synapse data warehouse since multiple reporting tools are involved?

    The CDM provides a standardized way to define and exchange data, making it suitable for integrating multiple data sources and providing a unified schema across different systems. However, if you require advanced analytics and reporting capabilities, Azure Synapse Analytics might be preferable. Synapse offers a powerful data warehouse solution that integrates with various reporting tools, supporting both structured and unstructured data and providing high-performance analytics and reporting.
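
    If you do land on Synapse, loading curated data from Databricks into a dedicated SQL pool is straightforward with the Databricks Synapse connector, which stages data in ADLS and loads it via PolyBase/COPY. A sketch with placeholder workspace, storage, and table names:

    ```python
    # Sketch: load a curated Delta table into a Synapse dedicated SQL pool from
    # Databricks. Workspace, database, storage, and table names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("load-synapse").getOrCreate()

    curated = spark.read.format("delta").load(
        "abfss://curated@mystorageacct.dfs.core.windows.net/sales")

    (curated.write
        .format("com.databricks.spark.sqldw")  # Databricks Synapse connector
        .option("url", "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;"
                       "database=sqlpool01;encrypt=true")
        .option("forwardSparkAzureStorageCredentials", "true")
        .option("dbTable", "dbo.sales_curated")
        .option("tempDir", "abfss://staging@mystorageacct.dfs.core.windows.net/synapse-tmp")
        .mode("overwrite")
        .save())
    ```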


1 additional answer

  1. Nehruji R 8,146 Reputation points Microsoft Vendor
    2024-07-09T08:07:30.9+00:00

    Hello Anshal,

    Greetings! Welcome to Microsoft Q&A Platform.

    Adding to the above information: Azure Data Factory (ADF), Synapse pipelines, and Azure Databricks make a rock-solid combination for building your Lakehouse on Azure Data Lake Storage Gen2 (ADLS Gen2). ADF can natively ingest data into the Azure cloud from over 100 different data sources, and it provides graphical data orchestration and monitoring capabilities that are easy to build, configure, deploy, and monitor in production. ADF also integrates natively with Azure Databricks via the Azure Databricks linked service and can execute notebook, JAR, and Python code activities, which enables organizations to build scalable data orchestration pipelines that ingest data from various sources and curate it in the Lakehouse.
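
    As a rough sketch of the kind of notebook step an ADF Databricks activity would execute (paths, column names, and the business key are illustrative placeholders):

    ```python
    # Sketch of a curation notebook step: read raw files landed in ADLS Gen2,
    # apply light cleanup, and write a Delta table to the curated zone.
    # Paths, column names, and the business key are placeholder assumptions.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("curate-sales").getOrCreate()

    raw = spark.read.parquet(
        "abfss://raw@mystorageacct.dfs.core.windows.net/denodo/sales")

    curated = (raw
        .dropDuplicates(["order_id"])                                 # dedupe on business key
        .withColumn("amount", F.col("amount").cast("decimal(18,2)"))  # normalize types
        .withColumn("ingested_at", F.current_timestamp()))            # audit column

    (curated.write.format("delta")
        .mode("overwrite")
        .save("abfss://curated@mystorageacct.dfs.core.windows.net/sales"))
    ```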

    For more details, refer to https://azure.microsoft.com/en-us/solutions/migration/migration-journey/?activetab=pivot:planningtab and https://techcommunity.microsoft.com/t5/analytics-on-azure-blog/azure-data-factory-and-azure-databricks-best-practices/ba-p/3074262.

    Azure Synapse Analytics is a powerful solution for data warehousing and analytics. It integrates seamlessly with various Azure services and supports multiple reporting tools, making it a robust choice for your data management needs.

    Change Data Capture (CDC) and ETL with ADF

    CDC: This technique allows you to capture and track changes in your data sources in real time, ensuring that your data warehouse is always up to date.

    ETL with Azure Data Factory (ADF): ADF is well-suited for extracting, transforming, and loading data from various sources into Synapse. It provides a visual interface and supports a wide range of connectors, making it easier to manage your data pipelines.
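
    Once ADF lands the change rows in the lake, applying them is typically a Delta MERGE in Databricks. A hedged sketch follows; the key column and the `op` change-type flag are assumptions about how the CDC feed is shaped:

    ```python
    # Sketch: apply a batch of CDC rows to a Delta target with MERGE.
    # Assumes each change row carries an 'op' flag ('I'/'U'/'D') and a business key.
    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.appName("apply-cdc").getOrCreate()

    changes = spark.read.format("delta").load(
        "abfss://raw@mystorageacct.dfs.core.windows.net/cdc/sales_changes")

    target = DeltaTable.forPath(
        spark, "abfss://curated@mystorageacct.dfs.core.windows.net/sales")

    (target.alias("t")
        .merge(changes.alias("s"), "t.order_id = s.order_id")
        .whenMatchedDelete(condition="s.op = 'D'")      # source row was deleted
        .whenMatchedUpdateAll(condition="s.op = 'U'")   # source row was updated
        .whenNotMatchedInsertAll()                      # brand-new rows
        .execute())
    ```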

    For connector details, refer to https://learn.microsoft.com/en-us/azure/data-factory/connector-overview and https://learn.microsoft.com/en-us/azure/databricks/connect/.

    For architectural insights on Azure Synapse Analytics, refer to https://learn.microsoft.com/en-us/azure/architecture/example-scenario/dataplate2e/data-platform-end-to-end?tabs=portal.

    Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.


    Please "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.

