Some meeting notes [Kit George]
All, thought I would post some notes from a recent meeting we had with the WebData team. This is in relation to a Tree class they have created internally in their namespaces, and what we should consider doing with it. The reference to 'our redblack' tree reflects a type you'll see in beta2.
Tree: created by WebData
- The webdata team has made a tree, which is currently non-public. They wanted to consider whether this should be public, and whether they could use a generic version. Gang explained our current position on SortedDictionary, our generic redblack tree
- Webdata asserted that our redblack tree does not satisfy their requirements
- They also stated that some of our collections don’t meet the requirements for particular scenarios for our users. For example, adding a million rows to a SortedList takes around 30 minutes, but a redblack tree takes around 30 seconds. This is huge
- End issue was: they believe their tree is good for general use, should we not consider having it in the BCL? One of the key things it does is that it has a key, value, and index
o We have not heard the mass of request for this, but as with all collections, we’ll keep our heads up looking for requests. If the number of requests becomes significant, we’ll consider adding this in a future version
NOTE: Redblack tree implementation underneath SortedList is all goodnesss for scalability and performance while operating with 100s of 1000s of rows but while operating in the lower end of the spectrum say less than 5000 rows (I don't have the exact perf data here), it would actually add to the overhead than any perf again compared to using simple list. What I'm getting at is there would be a break-even point before which using redblack tree is probbaly worse for your perf. Memory usage would be another point to consider as well.
- Things we can consider for the future:
o Do we have to handle duplicates in the class? Do we want to?
o GC friendliness
o Do we want to provide positioning (index)
- Goal for VNext: consider how we can provide a tree, or tree like structure(s) which could meet the needs of the webdata team
Other issues:
- Fix Dictionary, so that it is single writer safe. This means consumers have to stick locks everywhere to ensure the right thing happens (or, use Hashtable)
- WebData suggestion: Consider making a HybridDictionary<T>, a generic version of HybridDictionary
Comments
Anonymous
October 09, 2004
The comment has been removedAnonymous
October 10, 2004
Just a week ago we had a need for a collection with the key AND the index. So, please, count it as a request. After all, if there is an implementation that works faster than current one and supplies bigger functionality, i don't see any good reasons not to use it.Anonymous
October 11, 2004
+1Anonymous
October 11, 2004
The comment has been removedAnonymous
October 11, 2004
One point to add to Kit's post above:
Redblack tree implementation underneath SortedList is all goodnesss for scalability and performance while operating with 100s of 1000s of rows but while operating in the lower end of the spectrum say less than 5000 rows (I don't have the exact perf data here), it would add to the overhead and would actually fair poorly compared to using simple list. What I'm getting at is there would be a break-even point before which using redblack tree is probbaly worse for your perf. Memory usage would be another point to consider as well.Anonymous
October 11, 2004
Ravi, perhaps SortedList should change its underlying implementation at the break even point you speak of. Like HybridDictionary. That lowers the friction of having multiple classes for novice developers to choose from and provides the optimized performance for a larger percentage of scenarios.Anonymous
October 12, 2004
I'll vote for the red-black tree. There have been several cases where I could have used one of those.Anonymous
October 14, 2004
For those of you who want an implementation of R-B Trees now, you can do a lot worse than checking out Scott Mitchell's article on MSDN (part of his excellent Data Structures series):
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/datastructures_guide.asp
(It's part 4 if the URL doesn't take you straight there.)
I'd love to see a replacement or enhancement of the current Sort-related implementations as perf can be a problem with very large sets. I'd also love to see a decent reference string radix sort and suffix tree implementation, even if they were in a special namespace, but I realise that's reaching a bit ;)
Keep up the great work!Anonymous
May 31, 2009
PingBack from http://portablegreenhousesite.info/story.php?id=15784Anonymous
June 02, 2009
PingBack from http://portablegreenhousesite.info/story.php?id=25591