Performance Quiz #6 -- Looking at the fourth cut

Article
05/16/2005

Raymond is definately making some great headway. Having targetted the single-character at a time conversion problem in version 4 of his program he's 1.84 times faster than the previous version.

Version	Execution Time(seconds)
Unmanaged v1	1.328
Unmanaged v2	0.828
Unmanaged v3	0.343
Unmanaged v4	0.187
Unoptimized Managed port of v1	0.124

As before I'm including the function costs in tree view, but only showing functions with an inclusive cost of at least 5% to keep the size of the report manageable.

Function Name (Sanitized)	ExclusivePercent	InclusivePercent
_mainCRTStartup	0.00	97.97
_main	0.00	97.97
Dictionary::Dictionary(void)	1.45	77.97
MultiByteToWideChar	6.67	7.25
std::vector<struct DictionaryEntry, ...>::insert(...)	0.29	14.20
std::vector<struct DictionaryEntry, ...>::insert(...)	0.58	13.62
std::vector<struct DictionaryEntry, ...>::_Ucopy(...)	0.00	6.67
std::_Construct<struct DictionaryEntry,struct DictionaryEntry>(...)	0.29	6.67
DictionaryEntry::DictionaryEntry(struct DictionaryEntry const &)	0.29	6.38
std::basic_string<...>::basic_string<...>(class std::basic_string<...> const &)	2.03	6.09
std::basic_string<...>::basic_string<...>(unsigned short const *,...)	0.29	7.83
std::basic_string<...>::assign(unsigned short const *,unsigned int)	0.00	7.54
DictionaryEntry::Parse(class std::basic_string<….> const &)	1.16	33.33
std::basic_string<...>::assign(...)	0.29	28.70
std::basic_string<...>::_Grow(unsigned int,bool)	0.58	26.09
std::basic_string<...>::_Copy(unsigned int)	1.16	25.22
operator new(unsigned int)	0.58	24.06
…	0.29	23.19
…	2.61	22.90
AllocateHeap	4.06	21.45
std::vector<struct DictionaryEntry,...>::~vector<...>(void)	0.00	18.84
std::vector<struct DictionaryEntry,class std::allocator<….>::_Destroy(...)	0.00	18.84
DictionaryEntry::~DictionaryEntry(void)	0.29	18.26
std::basic_string<...>::~basic_string<...>(void)	0.58	17.68
std::basic_string<...>::_Tidy(bool)	1.16	17.10
_free	0.29	15.65
FreeHeap	4.35	15.36

So looking at this new program it seems like its time to start targetting string management and the allocations. We're spending 24% of our time just allocating strings. Now what's interesting is that at this point things are fast enough that the tear-down at the end is getting to be signficant -- at 18.84% it's a good chunk of the cost.

Comments

Anonymous
May 16, 2005
Hey Rico!

I'm still aching to know how the Everett vs. Whidbey numbers look. :-)
Anonymous
May 16, 2005
I'm saving the managed analysis for a bit... partly waiting to see if any of my readers have more to say based on their own measurements and partly waiting for Raymond's unmanaged to finally beat the managed time.

I think it will be interesting to see what it takes and what finally did the trick. Hang in there a few more days.

I think the whole exercise is just full of surprises so I'm kinda hoping more people will say what's on their mind.
Anonymous
May 16, 2005
Hehe, you going to hold your trump card in reserve until the unmanaged one is as fast as it'll get? :)
Anonymous
May 16, 2005
I've been trying to figure if you and Raymond are doing this in concert - especially since you seem to be giving the hints to where Raymond's going to be going next...

And I'm still waiting for him to totally rip the CRT out of the picture.

He's pretty close right now. All he has to do is get rid of those darned string thingies (which are fundimentally pointless since he's dealing with what are effectively fixed length items).

And no, I didn't peek at his next posts.
Anonymous
May 16, 2005
Just out of curiosity I did a direct port of the managed version to Java. I'm getting around 203ms.
Anonymous
May 16, 2005
I did see Raymond's program quite some time ago, but I didn't do any analysis then. Still it's not hard to predict where he's likely to go because he keeps picking off the biggest performance problem and it's easy enough to spot that by looking at what he's done so far. It's not like I'm doing some exotic analysis here :)
Anonymous
May 16, 2005
The managed version takes around 50ms to run here, built with VS2005B2. (My PC is faster than I thought...)

About 53% of the time (including children) is spent in Parse, about 23% in Substring in Parse.

I don't know if the VS profiler can be made to tell me more about what's happening INSIDE something like Substring.

My approach (as a C programmer, really) my be to load the whole string and just use parse to calculate the offsets and lengths of everything, thereby avoiding all the string copying.

But of course, that would just be a trade-off of load time vs. access time, and I don't know enough about the rest of the application to know if that would be a good thing.
Anonymous
May 17, 2005
Rico.. Do you have any working set numbers to compare between the two implementations?
Anonymous
May 17, 2005
Rico, do you have any comments to my comment at:

http://blogs.msdn.com/ricom/archive/2005/05/12/416977.aspx#418471
Anonymous
May 31, 2005
So I was reading through one of my favorite MSDN blogs (http://blogs.msdn.com/oldnewthing/)

And he...

Share via

Performance Quiz #6 -- Looking at the fourth cut

Comments

Additional resources