Load Data Directly from Azure Data Lake into Azure SQL Data Warehouse Using Polybase
Re-posted from the Azure blog.
NOTE: This post was updated on Tuesday February 14th 2017, including an update to the title.
Azure SQL Data Warehouse (Azure SQL DW, or just SQL DW for short) is a SQL-based fully managed, petabyte-scale data warehousing solution in the cloud. It is highly elastic, enabling you to provision in minutes and scale capacity in seconds. You can scale compute and storage independently, allowing you to burst compute for complex analytical workloads or scale down your warehouse for archival scenarios. What's more, you can pay by usage, rather than being locked into expensive predefined cluster configurations. PolyBase is a feature of SQL DW that allow users to connect to HDFS compatible file systems for import/export or in-place query scenarios using External Tables.
Azure Data Lake (ADL) is a no-limits data lake optimized for massively parallel processing, and it lets you store and analyze petabyte-size files and trillions of objects.
Both Azure SQL DW and ADL are key components of Microsoft's Cortana Intelligence Suite, which helps enterprises convert their big data into actionable business insights.
A common use case involving ADL Store (ADLS) and SQL DW is the following: Raw data is ingested into ADLS from a variety of sources. ADL Analytics (ADLA) is used to clean and process the data into a loading-ready format. From there, high value data is imported into Azure SQL DW for interactive analytics.
Until recently, the data in ADLS would be loaded into SQL DW using row-by-row insertion which, obviously, consumed time and meant delays in how quickly data could be explored to gain useful business insights.
However, as we recently announced, with SQL DW PolyBase support for ADLS, you can now load data directly from ADLS into your SQL DW instance using External Tables. Because SQL DW can now ingest data directly from Azure Storage Blob and ADLS, you can load data from any Azure storage service, giving you the flexibility to choose what's right for your application.
The picture below captures the "Before" and "After" situation.
Intrigued? Read this post to learn more, including how to connect ADLS to SQL DW, and best practices for loading data. Learn more about the new PolyBase capability here. You can also check out a short video clip on how to use this new feature:
If you already have an Azure Data Lake Store, you can try loading your data into SQL Data Warehouse. For those of you still exploring Azure Data Lake, check out these nice ADLS tutorials which will get you up and running.
CIML Blog Team