Investigating a highly intermittent automation failure

04/13/2009

One of our developers here ran my napkin math automation script the other day and reported it had a bogus failure while evaluating an equation. Now, napkin math is a fairly bulletproof feature so I was a bit skeptical that a bug in OneNote had been uncovered. The specific test that had failed was this:

cos(75)-cos(30)*cos(45)-sin(-30)*sin(45)=

and the answer it was looking for is 0.
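
For reference, the expected answer of 0 follows from the angle-addition identity for cosine. A short derivation (it holds whether the arguments are read as degrees or radians, since 30 + 45 = 75 either way):

```latex
\begin{align*}
\cos(75) - \cos(30)\cos(45) - \sin(-30)\sin(45)
  &= \cos(75) - \cos(30)\cos(45) + \sin(30)\sin(45)
     && \text{since } \sin(-x) = -\sin(x) \\
  &= \cos(75) - \bigl[\cos(30)\cos(45) - \sin(30)\sin(45)\bigr] \\
  &= \cos(75) - \cos(30 + 45)
     && \text{since } \cos(a+b) = \cos a\cos b - \sin a\sin b \\
  &= 0
\end{align*}
```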

When I looked at what had happened, the text typed was this:

cos(75)-cos(30)*cos(45)-sin(-30)*sin

which is incomplete. The final argument (45) and the equals sign were missing, either of which would cause the test to fail. Napkin math triggers its computation on the equals sign, and a missing argument renders the equation invalid. When my comparison runs to see whether the answer was computed, it fails because the equation was never typed correctly.

From here, I had a few different starting points. I could check whether the test had been altered and gotten corrupted (it hadn't). I could go to the machine that had failed to see if there were any clues there. I could look up the results to see if the error had happened on more than one machine. I could also try the test manually to see if there was a new bug in OneNote. In fact, that last option is where I started: typed by hand, the formula was evaluated correctly.

Next, I looked at the results. The test had passed 1499 out of 1500 times and failed only once. Hmm - that seems like a clue. I looked at what the test was actually doing: it uses APIs straight out of Windows to find the dialog, then uses the .NET Framework to type on the page and wait for the application to process the messages (in this case, the keystrokes to be typed).
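
A rough back-of-the-envelope sketch of what that failure rate means for reproducing the problem. It assumes each run fails independently with probability p = 1/1500, which is only an estimate since a single observed failure is thin evidence:

```latex
\begin{align*}
P(\text{at least one failure in } n \text{ runs}) &= 1 - (1 - p)^n, \qquad p = \tfrac{1}{1500} \\
n = 1500:\quad & 1 - \left(1 - \tfrac{1}{1500}\right)^{1500} \approx 1 - e^{-1} \approx 0.63 \\
P \ge 0.95:\quad & n \ge \frac{\ln(0.05)}{\ln\!\left(1 - \tfrac{1}{1500}\right)} \approx 4493
\end{align*}
```

Under that assumption, another full pass of 1500 runs would show the failure only about 63% of the time, and it would take roughly 4,500 runs to be 95% sure of seeing it again.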

The code looked something like this (with the needed imports to get it running):

IntPtr myPointer = GetForegroundWindow();   // Win32 (user32.dll): handle of the active window
System.Windows.Forms.SendKeys.SendWait(mathString);   // mathString: the string of math to evaluate

The test then reads the answer and compares it to the expected value. Once out of every 1500 runs, the string passed to the API was not getting fully typed.
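
For readers who want to try the typing step, here is a minimal self-contained sketch of it. It is not the actual test script: the class name, the `EscapeForSendKeys` helper, and the hard-coded equation are assumptions for illustration. The helper is included because `SendKeys` gives `+ ^ % ~ ( ) { } [ ]` special meaning, so a literal equation with parentheses has to have those characters wrapped in braces; the post does not show how the real script builds its string.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;
using System.Windows.Forms;

static class NapkinMathTyping
{
    // Win32 call that returns a handle to the window currently in the foreground.
    [DllImport("user32.dll")]
    private static extern IntPtr GetForegroundWindow();

    // Hypothetical helper: wraps each character that SendKeys treats as
    // special in braces so that it is typed literally.
    public static string EscapeForSendKeys(string text)
    {
        var escaped = new StringBuilder();
        foreach (char c in text)
        {
            if ("+^%~(){}[]".IndexOf(c) >= 0)
                escaped.Append('{').Append(c).Append('}');
            else
                escaped.Append(c);
        }
        return escaped.ToString();
    }

    [STAThread]
    static void Main()
    {
        // Record which window is active before typing.
        IntPtr myPointer = GetForegroundWindow();
        Console.WriteLine("Foreground window handle: " + myPointer);

        // Type the equation and block until the keystrokes have been processed.
        string equation = "cos(75)-cos(30)*cos(45)-sin(-30)*sin(45)=";
        SendKeys.SendWait(EscapeForSendKeys(equation));
    }
}
```

This needs Windows and a reference to System.Windows.Forms, and it types into whatever window happens to be in the foreground when it runs.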

At first glance, there doesn't seem to be much I can do with this. I'm passing a string to the .NET Framework and only a subset of the characters are getting processed. If ALL the characters were lost, I could do something like this to see where my keystrokes are being sent:

// Read the title text of the window the keystrokes were sent to and log it
StringBuilder dlgText = new StringBuilder("", 16384);
GetWindowText(myPointer, dlgText, 16384);
Log("I sent my keys to the window with this text: " + dlgText);

But I know that the keys got passed to the correct window - after all, it was a single atomic call, and most of its characters did arrive in OneNote.

(And I’m adding that logging anyway just to see where the keys are getting sent in case a script fails here in the future. Lesson learned over the years: you can’t have too much logging.)

So this line of thinking is pretty much a dead end. This is looking more like a timing problem in that my verification is getting called before the text on the screen is fully typed.
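
One way to probe the timing hypothesis is to poll for the typed text instead of checking once. The sketch below is an assumption about how that could look, not the fix that was actually used: `Wait.Until` and the `ReadPageText` helper mentioned in the usage comment are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class Wait
{
    // Polls `condition` until it returns true or `timeout` elapses.
    // Returns true if the condition was met in time, false otherwise.
    public static bool Until(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            if (clock.Elapsed >= timeout) return false;
            Thread.Sleep(pollInterval);
        }
    }
}

// Usage (ReadPageText is a hypothetical helper returning the text on the page):
//   bool typed = Wait.Until(() => ReadPageText().Contains(equation),
//                           TimeSpan.FromSeconds(5),
//                           TimeSpan.FromMilliseconds(100));
```

If the full equation shows up after a short wait, the verification was simply running too early. If the poll times out with the text still truncated, the keystrokes were really dropped and the cause lies elsewhere.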

I'll keep you posted on this one. Since the test "only" fails once out of 1500 times, this may take a while for me to get to.

Questions, comments, concerns and criticisms always welcome,

John

Comments

Anonymous
April 13, 2009
Hmmm.. that's interesting. Out of curiosity, does it fail once out of 1500 every single time? what about if you run it a 1000 times? 3000 times?
Anonymous
April 14, 2009
This is great - did you notice that www.live.com now supports variables in its napkin math functionality?
Anonymous
May 16, 2009
I don't know where to post this, but I'm getting incorrect order of operation evaluation in my oneNote math in oneNote 2007 Particularly: 3/4*12=0.0625 (3/4)*12=9 since * and / should have the same precedence, shouldn't these two equations be equivalent?
Anonymous
June 04, 2009
So my tablet PC is still being used for debugging and it is taking longer than anyone expected. No big
