Grokking Report Builder Queries: This is NOT your mother's SQL!
IT People and Report Builder
The natural inclination when IT people see and start to use Report Builder is that they immediately try think of the entire experience in terms of the underlying SQL statements -- in particular the type of SQL statements they are used to writing. Most immediately start to guess what RB's generated SQL statements look like, and many start to conjure up ways they can manipulate RB and/or the underlying report models into generating the SQL statements they think they want. This is a bad thing.
The Report Builder user interface and underlying report models were designed for people who haven't the foggiest idea what "SQL" stands for, and wouldn't know a SQL statement if they saw it on the six o'clock news. To accommodate this, the SQL queries generated are rather non-standard in a number of ways. Accordingly, all the technical reporting knowledge that an IT person brings with them to Report Builder largely gets in the way of their understanding it, and presumably reduces their effectiveness in helping others get up to speed with its more advanced capabilities.
The Inside Scoop
The two most important concepts to realize about RB queries are: (a) filters and selection are totally independent, and (b) row identity is totally inviolate.
Filters and field selection are independent in that selecting a particular field to display in your report, even if that field is defined on a related table, will never reduce the set of rows you get back in your report. In short, RB queries generally use LEFT OUTER joins instead of INNER joins, to ensure that joining to another table to pull a field from it will never remove any rows from the result set. If the user wants rows to go away for any reason (including rows that would normally go away due to an INNER join), they must create an explicit filter to do so.
Row identity is held inviolate in that no field may be added to a group that would change the identity of the rows in that group. For example, Order Date may not be added to a Customer group, because to display all the Order Dates for each customer, the group would have to return one row per Customer-Order, not one row per Customer. In contrast, the user may add #Orders to a Customer group, because #Orders is an aggregate that can return a single value for each Customer. The effect this rigorous constraint on row identity has on the generated SQL is that fields from related tables are (almost) always pulled into the main query by joining to a nested SELECT statement. The primary responsibility of the nested SELECT statement is to internally group and aggregate its results until it arrives at the cardinality required by the containing SELECT statement. Hence, the joins never affect the "row identity" of the main result.
Why We Do It That Way
The cold, hard, truth is that most human beings know nothing about SQL. They never think of their business questions in terms of joins, group by's, having clauses, foreign keys, and optimizer hints. In fact, they generally don't think of their questions as a single "query" or "statement" any more than they would describe the minutiae of an entire day at work in a single sentence. To them, the most natural analogy for query design (and you might want to use this the next time you explain Report Builder to someone) is the manual process that would be required to obtain the same information, i.e. sitting down in front of a big file cabinet with a pad of paper and a calculator, and methodically going through each desired set of files, gathering information, calculating totals, and recording the results. Report Builder's query design experience, and by necessity the resulting SQL queries, have been modeled precisely after this kind of approach. In fact, you may rightly consider each individual field in an RB report to be its own independent query, because in fact it is . Now of course, RB has optimizations in query generation that try to reduce the number of joins used to obtain the information the user wants, but the base case or initial assumption is that every single field is its own query.
Once an IT person groks these essential truths, a whole new world of (IMO) amazingly cool functionality will open up to them as they drop all that SQL baggage and start to think about their queries the way regular people do. Sure, there are limitations: RB can't do some things in this release, and there are occasionally perfomance implications, but by and large it is a fantastically intuitive way to think about real business questions, and it really doesn't perform that badly, considering that the users don't give one iota of thought to their join strategy.
Comments
- Anonymous
March 21, 2006
Bob,
Really appreciate both this post and the 'philosophy behind...' one. Initially I was not a big fan of RB's i/f but on reading the 'intention' behind it, I now 'get it' (at least i hope i do :D )and will be much better placed to give a simple explanation to my users (and tech guys).
many thanks! - Anonymous
April 25, 2006
Bob,
This might not be the best thread for this post, but it is connected by the nested query concept...
In Report Builder, it appears as though the aggregation level (or GROUP BY clause) is determined automatically. I am assuming that this is determined by the grouping in the report, but I haven't found a way to arrange the groups in such a way that I can accomplish what I want.
Some reporting tools allow for calculations like SUM(Orders for Customer) or SUM(Orders for Report). This allows someone to include summary aggregations as detail rows. I ran into this problem as I was trying to create a '% of Total' field.
There might already be a way to do this, but I haven't found it yet. In SQL, I would do this by joining to a nested query, and if every single field is its own query, then this ability should fit in well. - Anonymous
June 13, 2006
Trained a number of users from around Australia last week on Report builder. The number one issue that emerged was that table joins do exhibit inner join behaviour and how to get around it. Simply put if a service provider (provider table) belongs to an organisation (organisation) and I put a report together that shows provider name and organisation name then only those providers who have an organisation are listed. Take away the organisation and all the providers are listed again. Be good for this to work in the next release. - Anonymous
October 30, 2006
Not sure if this is due to the philosophy of keeping it simple, but youre restricted from doing stuff that may be stupid ie results in cross joins etc. but in some cases may be required in a report.e.g.A-<BA-<Ccan only include fields from A and B not A, B and C. Our users would tend to do this but ensure with filters only a current record was returned from table C. This could be an issue for some users especially users familiar with crystal, eureka etc. I dont think that its satisfactory to have counts of records in table C to drill into as the users run reports to print out and take with them which has all the information they require. - Anonymous
April 16, 2007
I would like to know if it would be possible to create a report that uses the "having" condition to do something like this from the report builder using the entities...Let's say I have a customer table and orders table (that has the list of orders placed by the customers along with the amt for each order, order date etc). The corresponding entities have been created in the report model.I would like to retrieve the list of customers who have placed orders above a certain limit over a calendar year. I tried doing it and was able to do the group by year and get the sum of the order amt for all the customers. But when I use the sum field in the filter criteria to get the list of customers whose order amts over a year is greater than a value, I get an empty set of results. Apparently, the query is not doing the "haivng" condition. Instead it is applying the filter against the field and not against the sum field...Is there a way to achieve this result using the report builder?