Why are NameValueCollection lookups slower than Hashtable? [Kim Hamilton]
An internal discussion came up recently on the performance difference of lookups in Hashtable versus NameValueCollection. Benchmarks revealed that NVC lookups were ~2-8 times slower than Hashtable.
For example, when doing 40,000 lookups on a collection size of 100,000, NameValueCollection is about 2.6x worse:
NameValueCollection time: | 0.050 sec |
Hashtable time: | 0.019 sec |
This leads to the question of whether NameValueCollection and Hashtable have different asymptotic behavior or if it simply has additional overhead. By repeating the benchmark over increasing collection sizes, it’s clear that the asymptotic behavior is the same for lookups.
n = 100,000 | |
NameValueCollection time: | 0.050 sec |
Hashtable time: | 0.019 sec |
n = 200,000 | |
NameValueCollection time: | 0.051 sec |
Hashtable time: | 0.019 sec |
n = 400,000 | |
NameValueCollection time: | 0.050 sec |
Hashtable time: | 0.019 sec |
This confirms what we expected – that lookup cost is constant for both. But what causes the additional overhead for NameValueCollection? NameValueCollection allows a key to be associated with one or more values. So the additional cost is caused by the way lookups are performed internally: NameValueCollection actually delegates the hash key lookups to an internal Hashtable, which may contain multiple entries associated with that key.
If there are multiple entries associated with the key, it will return them appended together. So in fact, if you’ve assigned multiple entries to the same key (the trials above did not), then the lookup cost will be linear in the number of items you’ve assigned to the key, because the accessor will append them together. This is demonstrated in the following NameValueCollection.
NameValueCollection c = new NameValueCollection();
c.Add("dogs", "Bubba");
c.Add("dogs", "Mojo");
Console.WriteLine(c["dogs"]);
// returns Bubba,Mojo
But back to the original question – even if multiple values aren’t assigned to the same key, the additional bookkeeping in place causes the additional constant overhead seen above.
Let’s move on to some other interesting performance aspects of NameValueCollection. NameValueCollection allows items to be looked up by index as well. Expanding the above example, you can see how “Bob” can be looked up by key or by index.
NameValueCollection c = new NameValueCollection();
c.Add("dogs", "Bubba");
c.Add("dogs", "Mojo");
Console.WriteLine(c["dogs"]);
// returns Bubba,Mojo
c.Add("cats", "Bob");
Console.WriteLine(c["cats"]);
// returns Bob
Console.WriteLine(c[1]);
// returns Bob
The internal bookkeeping allowing index-based lookups causes even more noticeable performance differences in adds and removes. Adding is again constant, but with a slightly larger overhead than lookups, caused by the fact that the item is also being added to an array that allows lookup by index. The following shows 6,000 iterations of adds performed on collections of varying starting size.
n = 10,000 | |
NameValueCollection time: | 0.017 sec |
Hashtable time: | 0.002 sec |
n = 20,000 | |
NameValueCollection time: | 0.018 sec |
Hashtable time: | 0.002 sec |
n = 40,000 | |
NameValueCollection time: | 0.017 sec |
Hashtable time: | 0.002 sec |
In this test, NameValueCollection is performing about 8.5x worse for adds.
The difference is most apparent on removes. For NameValueCollection, the cost of removes is linear in the size of the collection, caused by the need to shift the index-based lookup array. Note that this means NameValueCollection and Hashtable have different asymptotic behavior for removes.
n = 10,000 | |
NameValueCollection time: | 8.699 sec |
Hashtable time: | 0.002 sec |
n = 20,000 | |
NameValueCollection time: | 24.657 sec |
Hashtable time: | 0.002 sec |
n = 40,000 | |
NameValueCollection time: | 39.541 sec |
Hashtable time: | 0.002 sec |
So you may be wondering when you’d want to use NameValueCollection. NameValueCollection only accepts keys and values that are Strings, so this is a very specialized collection. It’s useful in a situation in which you either need to associate multiple values with a key, or to do hash-based lookups as well as lookup by index (and hopefully not perform too many removes).
However, if you need to store string key/value pairs and you don’t need to perform index-based lookups or associate multiple values with a key, you may prefer to use the generic Dictionary class. This has the same asymptotic behavior as Hashtable in all cases and furthermore avoids any costs due to boxing.
Credits go to Anthony Moore and Joe Huffman for their input into this internal thread.
Comments
Anonymous
September 05, 2006
NameValueCollection also has a hideous API, probably one of the worst APIs in the .NET framework, IMHO. Every time I'm forced to use it I shudder, and wish the writer of the class in question (i.e. the class that uses NameValueCollection) had used some other method, such as descending from CollectionBase or Collection<T> and adding appropriate extra methods.Anonymous
September 05, 2006
We had a good discussion on the NameValueCollection internally not too long ago and I asked my friends...Anonymous
June 13, 2007
Last update: June 13 , 2007 Document version 0.6 Preface If you have something to add, or want to takeAnonymous
June 13, 2007
Last update: June 13 , 2007 Document version 0.6 Preface If you have something to add, or want to takeAnonymous
May 29, 2009
PingBack from http://paidsurveyshub.info/story.php?title=bcl-team-blog-why-are-namevaluecollection-lookups-slower-thanAnonymous
June 09, 2009
PingBack from http://greenteafatburner.info/story.php?id=2430