Best data architecture strategy in Azure Data Explorer

Marcos Olmedilla 0 Reputation points
2024-09-06T12:09:54.3133333+00:00

Hey everyone,

I have some questions regarding the best pracetices for the data architecture within Azure Data Explorer.

We currently have multiple databases with similar telemetry data in tables from multiple clients. We would like to do some transformations on this data and then have it ready for reporting purposes:

Regarding the union and transformation of the data, what makes more sense for performance purposes: unifying it first and then transforming it, or the opposite? Also, does it make more sense to have a materialized view of the aggregation or are there any situations where having a new table with the aggregated data is better?

What kinds of overhead should I be worried about when doing these transformations?

Finally, when querying this data from powerBI, what would be the advantages of Direct Query versus Importing the data?

Azure Data Explorer
Azure Data Explorer
An Azure data analytics service for real-time analysis on large volumes of data streaming from sources including applications, websites, and internet of things devices.
522 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Sander van de Velde | MVP 32,086 Reputation points MVP
    2024-09-07T10:57:50.32+00:00

    Hello @Marcos Olmedilla,

    welcome to this moderated Azure community forum.

    The Kusto Query Language works like a funnel. You start with many rows and and with an aggregated, filter subset at the end.

    So, filtering (especially on time ranges) first Will speed up your queries definitely.

    Having the data in multiple databases (within the same cluster) will not have a large impact on performance, compared with having all the data in the same database. Data is stored in the underlying storage per table and the same compute cluster is in action in both cases.

    Putting custom aggregations in a separate table (like making use of the bronze, silver, gold architecture) will speed up your queries (and less impact on CPU usage) in the same way.

    Materialized views are views with some kind of aggregation (like a summarize) that is updated constantly. This takes some compute (overhead) during the day but it speeds up your queries that make use of these views. Check out the performance considerations.

    As with any query language, it takes practice to optimize queries in the most optimal way. Check this documentation for guidance.

    Regarding PowerBI, we prefer to both make use of the compute of the ADX cluster and leaving the data as much as possible in ADX instead of making a copy.

    Using the internal ADX data flows made available by ADX Table update policies, we can turn raw 'bronze' data into 'gold' optimized data for consumption by eg. Power BI.


    If the response helped, do "Accept Answer". If it doesn't work, please let us know the progress. All community members with similar issues will benefit by doing so. Your contribution is highly appreciated.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.