Kusto Code Hygiene and Performance Tips
Now that you've started authoring your first Kusto queries in Azure Data Explorer, there are a few things you can do to make your life a lot easier going forward.
In terms of performance, there are three things that I recommend to new users:
- Use a where clause as soon as you can and always filter by timestamps first. Reducing your data size is usually a major key to good performance, and Azure Data Explorer is very good at filtering by timestamps.
- Project only the columns that you need. We sometimes work with datasets that have hundreds of columns when we only need a few of them. Getting rid of extra columns not only helps performance but also makes debugging a lot easier.
- When you perform a join, the big dataset goes on the right side. If the two datasets are close to the same size, it doesn't make a big difference, but if you have 10 rows on the left and a million on the right, play with doing the join both ways and notice the perf difference.
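Here is a minimal sketch of those three tips working together. The Events and Devices tables, their columns, and the sample rows are all hypothetical; they are defined inline with datatable so the query can run on any cluster without touching real data.

```kusto
// Hypothetical sample data, defined inline so the sketch is self-contained.
let Events = datatable(Timestamp:datetime, DeviceId:string, Level:string, Message:string)
[
    datetime(2024-01-01 10:00:00), "dev-1", "Error", "disk full",
    datetime(2024-01-01 10:05:00), "dev-2", "Info", "heartbeat",
    datetime(2023-12-01 09:00:00), "dev-1", "Error", "old failure"
]
;
let Devices = datatable(DeviceId:string, Owner:string)
[
    "dev-1", "team-a"
]
;
// The small table goes on the left, the big one on the right.
Devices
| join kind=inner (
    Events
    // Filter by timestamp first, then by anything else.
    | where Timestamp between (datetime(2024-01-01) .. datetime(2024-01-02))
    | where Level == "Error"
    // Keep only the columns the rest of the query needs.
    | project Timestamp, DeviceId, Message
    ) on DeviceId
| project Timestamp, DeviceId, Owner, Message
```

With three rows you won't see a timing difference, of course. Against a real table, try swapping the two sides of the join and compare the run times.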
The next set of tips is more subjective. Nothing technically bad will happen if you don't follow them, but I encourage you to agree on some guidelines with your team to make code reviews and maintenance easier.
- Let statements are an important tool for organizing your code. They let you bind an expression to a name. You might be able to write your query in 20 lines with no let statements, but it is probably easier for others to follow if you break it up into smaller chunks and assign each chunk to its own name. These let statements also give you a great way to debug your query because you can quickly verify that each small chunk is working as expected.
- If you need to use the same logic in more than one place, create a function. Functions can optionally take parameters.
- Create a simple style guidelines document. Style guidelines sometimes carry a negative connotation but these are just the min-bar things we need to do to get work done in a collaborative environment.
  - Indent code in a way that is readable:
    - Start the query of a let statement on a new, indented line
    - Place each pipe on a new line with the same indentation as the previous line, except for pipes that are part of join queries, which should be indented relative to the left-side query
    - For project/extend commands with many fields, consider placing each field on its own indented line
  - Place semicolons on new lines
  - For functions, place commented-out parameter values as let statements at the top of the query, with documentation
Style guidelines example:
.create-or-alter function with (docstring = "Best practices for kusto queries", folder = "Style")
['MyFunction'](paramScanStartUTC:datetime, paramScanEndUTC:datetime){
    // The start time of the events we'd like to gather
    // let paramScanStartUTC=ago(20m);
    //
    // The end time of the events we'd like to gather
    // let paramScanEndUTC=ago(10m);
    let Query1 =
        // It's nice for readability to always start queries on their own line, instead of
        // part of the 'let' line so that subsequent let statements have all the query text line up
        cluster('my-cluster').database('my-database').Table
        | where * contains ""
    // Semi-colon should be placed on its own line to help space let statements apart, but maintain
    // all the query code in the same text block for easy F5 execution
    ;
    let Query2 =
        cluster('my-cluster').database('my-database').Table2
        | where * contains ""
        | join kind=leftouter (
            // join queries should be indented relative to the outer query
            cluster('my-cluster').database('my-database').Table3
            | project Key1, Value1
            ) on Key1
    ;
    Query1
    | union Query2
}
- Both the Kusto Explorer desktop tool and the web interface have convenient ways to share queries. In the desktop client, look for the "Query to Clipboard" button; in the web UI, it's under "Share". These buttons put HTML strings in your clipboard that contain nice code highlighting and links to open the query in various environments.
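To make the earlier tips on let statements and functions a bit more concrete, here is a small sketch. Everything in it is hypothetical (the Events data, the ErrorsSince function, and the step names), and the function is defined inside the query with let rather than stored in the database, so nothing needs to be created on your cluster to try it.

```kusto
// Hypothetical sample data, defined inline so the sketch is self-contained.
let Events = datatable(Timestamp:datetime, DeviceId:string, Level:string)
[
    datetime(2024-01-01 10:00:00), "dev-1", "Error",
    datetime(2024-01-01 10:05:00), "dev-2", "Info",
    datetime(2024-01-01 10:10:00), "dev-1", "Error"
]
;
// Reusable logic lives in a function that takes parameters:
// a table with at least these two columns, and a start time.
let ErrorsSince = (T:(Timestamp:datetime, Level:string), since:datetime)
{
    T
    | where Timestamp >= since
    | where Level == "Error"
}
;
// Each let names one small step of the query.
let RecentErrors =
    ErrorsSince(Events, datetime(2024-01-01))
;
let ErrorsPerDevice =
    RecentErrors
    | summarize ErrorCount = count() by DeviceId
;
// To debug, swap this last line for RecentErrors and check that step on its own.
ErrorsPerDevice
```

Because every step has a name, checking an intermediate result is a one-word change to the last line instead of commenting out half the query.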
Keep calm and Kusto on!