Design Notes: Row Count

One of the scenarios we have heard feedback around is the ability for the a client of a data service to determine the total number of entities in a set without having to retrieve them all. The following video discusses this scenario in more detail and some of our thoughts regarding how to make the experience better. 

 

Astoria Design Walkthrough: Introducing $count as an URI query option

 

Additional design details:

- The count value would be calculated after $filter expressions are applied.  For example /Customers?$filter=City eq 'London' would return a count value made up of only the customers living in London and not all the customers.

- If a query included $top, $skip or $orderby, the count value would be unaffected.  For example /Customers?$top=1&$skip=2&$orderby=Name would return the total number of customers stored by the service.

- If a query includes a $expand, then the count applies to the outer entity set.  For example /Customers?$expand=Orders would return the number of customers stored by the service

 

What do you think?  Would this be useful in your data services applications?

 

Mike Flasko

ADO.NET Data Services, Program Manager

 

This post is part of the transparent design exercise in the Astoria Team. To understand how it works and how your feedback will be used please look at this post.

Comments

  • Anonymous
    December 15, 2008
    PingBack from http://blog.a-foton.ru/index.php/2008/12/16/design-notes-row-count/
  • Anonymous
    December 15, 2008
    Adding the $count would be helpful. Its not needed all the time in my experiences, but when it is, its nice to be able to do a quick query without all the baggage just to get a count. I had been doing custom service methods to get the counts, which is a lot of code overhead for something simple, so a $count would be nice.
  • Anonymous
    December 15, 2008
    I also think your plan is good and that $count will be very welcome.
  • Anonymous
    December 15, 2008
    The comment has been removed
  • Anonymous
    December 17, 2008
    The comment has been removed
  • Anonymous
    December 17, 2008
    The comment has been removed
  • Anonymous
    December 17, 2008
    The comment has been removed
  • Anonymous
    December 18, 2008
    The comment has been removed
  • Anonymous
    December 18, 2008
    It's great to see this post!$count=true or omission/$count=false would be a perfectly acceptable syntax to me for notifying a DS of our intentions.$filter precedence is how I would expect also... So long as the number of records actually being returned equals the record count, then it's proper. $top/$skip are irrelevant when it comes to record counts so that, too, is peachy.  $expand behavior is ideal.How about some more info on how you guys were hoping to implement things with the client libraries?
  • Anonymous
    December 18, 2008
    The ExtensionDataAttribute could be used to mark a property so the client framework knows when use it.again, i feel thart going that length to give count on query results is a bit over the top.just give the http header thing and the count.that will solve the problem and still make things simple and if people want to do harder stuff use something like their own intercepting of the entity writing on the server side and injecting XElements into the feed before it is sent. But they will need something to allow them to get the value on the client side.
  • Anonymous
    December 18, 2008
    dont need $count=boolean just $count as having there or not having there is already a boolean.to be able to get the scalar value of a count like thehttp://server/context.svc/clients('1')/name/$valuethat return just "daniel"or http://server/context.svc/clients('1')/namewill return <name xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices">daniel</name>will be fine.but filtering needs count and bad.
  • Anonymous
    December 18, 2008
    To answer the question about "where to put the count": if you only ask for the count we'd give you just that value. However, often times applications will want to render the first page of data and it would be great (from the latency perspective) to obtain the first page of data and the count in the same query.The plan is to include the count as part of the feed itself. Atom feed elements -just like Atom entries- allow custom elements to be added (in a separate XML namespace), so we have the perfect spot to transport this information to the client. The JSON case is trickier, but we'll manage.Note that it's pretty hard to do this today with a service operation, because it means you'd have to fully parse the expression language that goes into $filter in order to compute the count of the appropriate subset.As for the client API...things can always change, but current thinking is that we'll add both support for the LINQ .Count() operator (to obtain only the count) and also a separate .IncludeCount() "chainable" operator to request for the count and the results, something like this:from c in svc.Customers.IncludeCount()where c.State == "WA"select c;Then on the client we'd surface the count in the QueryOperationResponse object we already have. It's not completely nice (because you have to cast the reference to it), but it's pretty clean.-pablo
  • Anonymous
    December 18, 2008
    The count alone is useful in situations such as search fields that show number of results for a given query and the likes, but I do imagine the majority of interaction will be done for use in communicating the number of pages to the user.I was anticipating, worst case, having to do something like $top=1&$count and ignoring the data returned and just taking the count, but it sounds like you guys have things covered.  .Count and .IncludeCount sound great to me.Are there any particular areas (beyond what has been discussed already) that you guys are fishing for feedback in?
  • Anonymous
    December 18, 2008
    Actually, I just remembered something that needs to be adjusted from my comment above:We can't really map to LINQ's .Count(), because Astoria's count ignores top/skip (by design, the point is to know the total number of entities regardless of paging directives), so we mapped it to a custom method...can't remember the name, probably a bad name, but distinct from Count().@Nick: I think we got good feedback on this one...but there will be more questions to come on other areas we're designing, so keep it coming!Thanks,-pablo
  • Anonymous
    December 20, 2008
    Great news.  Seems exactly what we need.
  • Anonymous
    December 21, 2008
    Right Pablo, that is cool, so linq count will only return the total number of records on the returned query, that is fine as expected.I think you meant on yur linq example to so something like this.(from c in svc.Customers.IncludeCount()where c.State == "WA"select c).Skip(10).Take(10);Then the include count would make sense.Would you add support to allow the include of count on expansions too?I understand your willingness to give in some form the total count of records on available set of data all in one call, for Latency sake, but would that break the  "query" design? After all isn't it the mantra that if the consumers of the service wanted to do something special that the system was simple enough that they could build anything on to it?daniel Portella
  • Anonymous
    December 21, 2008
    Perhaps giving some way that consumers can get extra stuff would be better. Like PagedQuery<T>. that requires Skip and Top and returns total available records. all then you need to so is think about how the query translates to the client.In the end of the day we are always hitting the database twice, just not sure if advocating how people should do things is the right way of going about it.daniel portella
  • Anonymous
    December 22, 2008
    They way you have spoken is that this is written into stone already. I am sure include count wont be available on expands silly thing to have ^^.Just thought of something:[quote]The count value would be calculated after $filter expressions are applied[/quote]But would we be able to use count on filters? like one of my examples above?.daniel portella
  • Anonymous
    December 22, 2008
    The comment has been removed
  • Anonymous
    December 22, 2008
    @pablo: On my last question I was thinking about Counts nested inside filter expressions if counts are only applied after filter are applied doesn't mean that only the top entity count would be returned?I think that just for latency sake is not good enough reason to do all that work for that. If people are getting 4000 milliseconds of latency don't you think there would be something wrong with the application first? I think returning count would be last thing I would be worrying about.About other domain specific stuff and things, that is fine it is good that we are catering to them and all I think that is a must, but also mostly (I suppose) of the user base of data services uses the entity framework, which doesnt really support this scenarios properly see below:Using multiple active result sets in stored procedures would be a good case where "metadata" could be returned about a query. On these cases then you would like to return all the data.the entity framework only allow actual entities or scalar variables to be returned by the stored procedure, and to make it worse the entity framework doesnt allow entities that dont actually map to areal table of view. Hopefully that will be fixed in together with computed variables.As far as I know part of that is also the case for astoria, as it only allow entities to be returned that are contained on the data source (entity data context).I would rather see astoria being able to return non data source complex objects (that i can build up onto it) then build options and capabilities may not fit well on the current designing blocks.I just like something more intuitive, anyhow, the PagedQuery class could have much more then just a boolean property to set a flag for the IncludeCount, It could manage the whole paging aspect. But again I see now that something like that would not be added..daniel portella
  • Anonymous
    December 22, 2008
    The comment has been removed
  • Anonymous
    December 23, 2008
    You got to love the '@' very impersonal. I still find it funny.@Pablo:Forget about the "nesting" question I have got my answer else where.I know the latency can be a big thing for other people I am not discrediting that, I think you have mis understood me, I am saying that the astoria team can come up with a much better solution to this problem.Complex types: I hope that version 2 will have support for them. At the moment is pain like you have said. I haven't looked at all the code for astoria yet (hopefully I will finish in this holidays) I don't know how difficult it would be to de-serialize an strongly typed object that is returned on a service operation. I haven't read the "System.Data.Services.Serializers.SyndicationSerializer" class yet so I am not sure if it doesn't serialize complex types that are not part of the context model.You could hit the database in many different ways like you have said before and you are absolutely right to say that is is matter of balance, I think the solution can be more simple and elegant than this. My example using multiple active results sets was a hypothetical example and not a solution to how Astoria is going to implement their internal retrieval of the count of total available records, so whether is appropriate for this or not it is irrelevant otherwise to do that in the very essence should be wrong, in both parts.The feeling we are getting is that the decision has already been made and that may be the design notes posted was more of a medium to broadcast the fact that it will be implemented, don't want to think that the protocol is being undermined but I think this put this process in a dangerous position that what is said is in vain and that may be it more a method of lifting ideas into the design process and not making any real significant impact.The forum is open to discussion I asking if astoria I whiling to show the effort to find solutions with the input and help of their community. This transparent development process seems more like the name implies that a barrier (transparent as it is) is present and it is a transparent sound proof wall.Apologies for my pessimism I just say as I see it,No harm meant.daniel portella
  • Anonymous
    March 01, 2009
    It's been a little while since we've written about the future direction of data services, so this post
  • Anonymous
    March 13, 2009
    Count is essential of course.  Total table count as well as after a filter is applied.  Also important is count as an aggregate function, to count the number of grouped rows: "select x, y, count(*) from z group by x, y".  And when talking about aggregates, functions other than count are also very important for analysis of all that data.  Are these supported, or are there plans for this, through Astoria and in SDS specifically?
  • Anonymous
    March 13, 2009
    When doing paging, especially with SDS, I imagine some of the datasets will be changing quite a bit.  If I'm viewing page 1000 of many, and hundreds of rows are added or removed at lower page counts by the time I go to the next or previous page, it seems I'm going to have a problem with data page continuity.For these circumstances, it would be very helpful to have a paging mechanism that can operate relative to a specific row, such as the first row in a currently-reitreved page, or the next existing one in the case that row has been removed in the meantime.
  • Anonymous
    April 06, 2009
    For you guys still looking for a work arround for paging I can provide you with a very simple solution:(for the example I use Northwind):In the Web Data Service class add something like[WebGet]public int SelectOrdersCount(){ return this.CurrentDataSource.Orders.Count();}In the client you could use:proxy.CreateQuery<int>("SelectOrdersCount").ToArray().FirstOrDefault()This way you don't have to return all the results at client side and you could use a query using Skip(), Take() for paging purposes.
  • Anonymous
    April 06, 2009
    @dvanderboom: we are not planning to support aggregates in general in the upcoming version. SDS itself will support full relational queries using SQL, so you'll be able to do aggregates there, and you can chose to expose them as a custom type if you happen to be using Astoria as the service front-end.Regarding volatility of data and the implications of paging: stateless paging has this as an intrinsic problem. Note that we do actually do server-driven paging by exchanging a continuation token that is value-based. You can do the same thing from the client by ensuring a known full order and then use the order keys in a $filter clause such that the next page has rows that are "greater than" the current page, and then use $top to limit the number of results. This is also typically better in terms of performance if the underlying database happens to have the right indexes.In any case, note that this does not mean that you'll have no holes or no repeats.-pablo