How to: Query nested data in Microsoft Fabric mirrored databases from Azure Cosmos DB (Preview)
Use the mirrored database in Microsoft Fabric to query nested JSON data sourced from Azure Cosmos DB for NoSQL.
Important
Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts are supported.
Prerequisites
- An existing Azure Cosmos DB for NoSQL account.
- If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free.
- If you have an existing Azure subscription, create a new Azure Cosmos DB for NoSQL account.
- An existing Fabric capacity. If you don't have an existing capacity, start a Fabric trial.
- Enable Mirroring in your Fabric tenant or workspace. If the feature isn't already enabled, enable mirroring in your Fabric tenant.
- The Azure Cosmos DB for NoSQL account must be configured for Fabric mirroring. For more information, see account requirements.
Tip
During the public preview, it's recommended to use a test or development copy of your existing Azure Cosmos DB data that can be recovered quickly from a backup.
Create nested data within the source database
Create JSON items within your Azure Cosmos DB for NoSQL account that contain varying levels of nested JSON data.
Navigate to your Azure Cosmos DB account in the Azure portal.
Select Data Explorer from the resource menu.
Use + New container to create a new container. For this guide, name the container
TestC
. The corresponding database name is arbitrary.Use the + New item option multiple times to create and save these five JSON items.
{ "id": "123-abc-xyz", "name": "A 13", "country": "USA", "items": [ { "purchased": "11/23/2022", "order_id": "3432-2333-2234-3434", "item_description": "item1" }, { "purchased": "01/20/2023", "order_id": "3431-3454-1231-8080", "item_description": "item2" }, { "purchased": "02/20/2023", "order_id": "2322-2435-4354-2324", "item_description": "item3" } ] }
{ "id": "343-abc-def", "name": "B 22", "country": "USA", "items": [ { "purchased": "01/20/2023", "order_id": "2431-2322-1545-2322", "item_description": "book1" }, { "purchased": "01/21/2023", "order_id": "3498-3433-2322-2320", "item_description": "book2" }, { "purchased": "01/24/2023", "order_id": "9794-8858-7578-9899", "item_description": "book3" } ] }
{ "id": "232-abc-x43", "name": "C 13", "country": "USA", "items": [ { "purchased": "04/03/2023", "order_id": "9982-2322-4545-3546", "item_description": "clothing1" }, { "purchased": "05/20/2023", "order_id": "7989-9989-8688-3446", "item_description": "clothing2" }, { "purchased": "05/27/2023", "order_id": "9898-2322-1134-2322", "item_description": "clothing3" } ] }
{ "id": "677-abc-yuu", "name": "D 78", "country": "USA" }
{ "id": "979-abc-dfd", "name": "E 45", "country": "USA" }
Setup mirroring and prerequisites
Configure mirroring for the Azure Cosmos DB for NoSQL database. If you're unsure how to configure mirroring, refer to the configure mirrored database tutorial.
Navigate to the Fabric portal.
Create a new connection and mirrored database using your Azure Cosmos DB account's credentials.
Wait for replication to finish the initial snapshot of data.
Query basic nested data
Now, use the SQL analytics endpoint to create a query that can handle simple nested JSON data.
Navigate to the mirrored database in the Fabric portal.
Switch from Mirrored Azure Cosmos DB to SQL analytics endpoint.
Open the context menu for the test table and select New SQL Query.
Run this query to expand on the
items
array withOPENJSON
. This query usesOUTER APPLY
to include extra items that might not have an items array.SELECT t.name, t.id, t.country, P.purchased, P.order_id, P.item_description FROM OrdersDB_TestC AS t OUTER APPLY OPENJSON(t.items) WITH ( purchased datetime '$.purchased', order_id varchar(100) '$.order_id', item_description varchar(200) '$.item_description' ) as P
Tip
When choosing the data types in
OPENJSON
, usingvarchar(max)
for string types could worsen query performance. Instead, usevarchar(n)
whern
could be any number. The lowern
is, the more likely you will see better query performance.Use
CROSS APPLY
in the next query to only show items with anitems
array.SELECT t.name, t.id, t.country, P.purchased, P.order_id, P.item_description FROM OrdersDB_TestC as t CROSS APPLY OPENJSON(t.items) WITH ( purchased datetime '$.purchased', order_id varchar(100) '$.order_id', item_description varchar(200) '$.item_description' ) as P
Create deeply nested data
To build on this nested data example, let's add a deeply nested data example.
Navigate to your Azure Cosmos DB account in the Azure portal.
Select Data Explorer from the resource menu.
Use + New container to create a new container. For this guide, name the container
TestD
. The corresponding database name is arbitrary.Use the + New item option multiple times to create and Save this JSON item.
{ "id": "eadca09b-e618-4090-a25d-b424a26c2361", "entityType": "Package", "packages": [ { "packageid": "fiwewsb-f342-jofd-a231-c2321", "storageTemperature": "69", "highValue": true, "items": [ { "id": "1", "name": "Item1", "properties": { "weight": "2", "isFragile": "no" } }, { "id": "2", "name": "Item2", "properties": { "weight": "4", "isFragile": "yes" } } ] }, { "packageid": "d24343-dfdw-retd-x414-f34345", "storageTemperature": "78", "highValue": false, "items": [ { "id": "3", "name": "Item3", "properties": { "weight": "12", "isFragile": "no" } }, { "id": "4", "name": "Item4", "properties": { "weight": "12", "isFragile": "no" } } ] } ], "consignment": { "consignmentId": "ae21ebc2-8cfc-4566-bf07-b71cdfb37fb2", "customer": "Humongous Insurance", "deliveryDueDate": "2020-11-08T23:38:50.875258Z" } }
Query deeply nested data
Finally, create a T-SQL query that can find data deeply nested in a JSON string.
Open the context menu for the
TestD
table and select New SQL Query again.Run this query to expand all levels of nested data using
OUTER APPLY
with consignment.SELECT P.id, R.packageId, R.storageTemperature, R.highValue, G.id, G.name, H.weight, H.isFragile, Q.consignmentId, Q.customer, Q.deliveryDueDate FROM OrdersDB_TestD as P CROSS APPLY OPENJSON(P.packages) WITH ( packageId varchar(100) '$.packageid', storageTemperature INT '$.storageTemperature', highValue varchar(100) '$.highValue', items nvarchar(MAX) AS JSON ) as R OUTER APPLY OPENJSON (R.items) WITH ( id varchar(100) '$.id', name varchar(100) '$.name', properties nvarchar(MAX) as JSON ) as G OUTER APPLY OPENJSON(G.properties) WITH ( weight INT '$.weight', isFragile varchar(100) '$.isFragile' ) as H OUTER APPLY OPENJSON(P.consignment) WITH ( consignmentId varchar(200) '$.consignmentId', customer varchar(100) '$.customer', deliveryDueDate Date '$.deliveryDueDate' ) as Q
Note
When expanding
packages
,items
is represented as JSON, which can optionally expand. Theitems
property has sub-properties as JSOn which also can optionally expand.Finally, run a query that chooses when to expand specific levels of nesting.
SELECT P.id, R.packageId, R.storageTemperature, R.highValue, R.items, Q.consignmentId, Q.customer, Q.deliveryDueDate FROM OrdersDB_TestD as P CROSS APPLY OPENJSON(P.packages) WITH ( packageId varchar(100) '$.packageid', storageTemperature INT '$.storageTemperature', highValue varchar(100) '$.highValue', items nvarchar(MAX) AS JSON ) as R OUTER APPLY OPENJSON(P.consignment) WITH ( consignmentId varchar(200) '$.consignmentId', customer varchar(100) '$.customer', deliveryDueDate Date '$.deliveryDueDate' ) as Q
Note
Property limits for nested levels are not enforced in this T-SQL query experience.