Use streams to simplify Stitching problems
Sometimes you need to take several different streams of information and "stitch" them together into a single stream. This sounds simple, but can be very meticulous because you need to deal with things like:
* off-by-1 bugs and lots of corner cases
* generally iterators / sliding windows for each stream, which translates to lots of locals and lots of potential corner cases.
* ordering which stream gets precedence
* corner cases if one stream runs out first
In my case, I'm adding an IL-diassembly window to the MDbg GUI. This means stitching IL code, source-code, and native information together.
Improved simplicity:
My goal with stitching problems is to convert them into something simple, meaning: easy to understand, maintain, and change. I think you need to make a distinction between "total lines" and "complicated lines". I'd rather refactor one 100-line function of 20 locals into four 50-lines functions of 3 locals each. Why? Because I personally find that often:
1) complexity increases exponentially with the number of locals you're juggling. I find this to be because the locals tend to interact with each other and so you need to keep all the relationships straight between all the locals. You always have to make sure that each local is being used correctly and that you're not accidentally manipulating the wrong one, and that the invariants between each uphold, etc. When hunting down a bug in a function with 30 separate local variables that are all indices into an array, you need to make sure that each index is being used correctly.
2) complexity increases linearly with lines of forward-flowing code (no loops, no function calls, no raw gotos) . That's because each line only has 1 state (remember, no loops here), and you don't have to compare every line to every other line.
(I'm sure there are formal studies out there - feel free to point me at them. But in the absence of them, I'm talking about my personal observations).
For stitching problems, practically this means creating an enumerator class to wrap each stream. This means writing more helper classes, but then you're just juggling around a few classes instead of an army of locals.
An experiment:
So here's an experiment. Below are two code snippets to do the stitching for the IL-disassembly window. Imagine that you wanted to do something like change the order that the streams get put together. Which snippet would you rather manipulate? The old snippet has no abstractions and exposes everything directly. It has 15 local variables. The new snippet creates a stream class for each stream to be stitched in (il-disassembly, source-code, and native), and thus only has 3 locals.
Old Code:
This is about 160+ lines, and has 15 local variables tracking state.
int il2NativeEndIl;
int il2NativeCursor = 0;
if (il2nativeMapping == null)
{
// No native info, so don't display it. Put range just out of reach.
il2NativeEndIl = (code.Length + 1);
}
else
{
SortMapByIL(il2nativeMapping);
// Skip past initial entries for prolog, epilogue, unmapped, etc.
while ((il2NativeCursor < il2nativeMapping.Length) && (il2nativeMapping[il2NativeCursor].IlOffset < 0))
{
il2NativeCursor++;
}
// What's the IL range of the current (il2NativeCursor) IL2Native entry?
il2NativeEndIl = (il2NativeCursor + 1 >= il2nativeMapping.Length) ?
(code.Length - 1) :
(il2nativeMapping[il2NativeCursor + 1].IlOffset - 1);
}
string lineFormatPrefix = "//{0,7}:{1}"; // args: line #, source contents.
// We assume sequence points are sorted by IL offset. We assert that in the loop below.
// Now we need to go through and stitch the IL + Source together.
// This also works even if we have no source (since that's just the degenerate case of 0 sequence points)
int adjust = 0;
int ilCursor =0;
int lastSeqIlOffset = -1;
// @@ - it seems like we could seriously benefit from some abstractions here.
// Header information
finalLines.Add(String.Format(CultureInfo.InvariantCulture,
"> Function name:{0} (token={1:x})", fullName, token));
adjust++;
int lastRow = -1; // unique value
// Now the real stuff
for(int i = 0; i < seqCount; i++)
{
// Assert out assumption that sequence points are sorted by IL offsets.
Debug.Assert(seqIlOffsets[i] > lastSeqIlOffset);
lastSeqIlOffset = seqIlOffsets[i];
string path = seqPaths[i]; ;
SourceViewerForm file = GetSourceFile(parent, path);
// Add IL snippets that occur before this sequence point.
while (ilCursor < seqIlOffsets[i])
{
int thisRow = m_il2RowMapping[ilCursor];
if (thisRow != lastRow)
{
// Add lines for IL.
finalLines.Add(" " + lines[thisRow]);
Debug.Assert(thisRow > lastRow);
lastRow = thisRow;
}
// Adjust mapping for the source lines that we've injected.
m_il2RowMapping[ilCursor] += adjust;
// check if this matches any native code? Print this after the IL code.
// We want to print the Native range after
int ilNext = ilCursor + 1;
if (ilNext > il2NativeEndIl)
{
string nativeInfo = String.Format(CultureInfo.InstalledUICulture,
" IL:0x{0:x},0x{1:x} --> Native:0x{2:x},0x{3:x} (N. size=0x{4:x} bytes)",
il2nativeMapping[il2NativeCursor].IlOffset,
il2NativeEndIl+1,
il2nativeMapping[il2NativeCursor].NativeStartOffset,
il2nativeMapping[il2NativeCursor].NativeEndOffset,
il2nativeMapping[il2NativeCursor].NativeEndOffset - il2nativeMapping[il2NativeCursor].NativeStartOffset);
finalLines.Add(nativeInfo);
adjust++;
finalLines.Add(""); adjust++; // extra line for easier viewing
il2NativeCursor++;
il2NativeEndIl = (il2NativeCursor + 1 >= il2nativeMapping.Length) ?
(code.Length-1) :
(il2nativeMapping[il2NativeCursor + 1].IlOffset - 1);
}
ilCursor++;
}
// Now add text for this sequence point
if (file != null)
{
for (int j = seqStartLines[i]; j <= seqEndLines[i]; j++)
{
// Need to count how many source lines we've injected so that we can adjust
// the il2row mapping accordingly.
adjust++;
if (j == 0xFeeFee)
{
finalLines.Add("// !hidden!");
}
else if (j > file.Count)
{
finalLines.Add(String.Format(CultureInfo.InvariantCulture, lineFormatPrefix, j, "<out of range>"));
}
else
{
// Write out: // line# : <contents>
finalLines.Add(String.Format(CultureInfo.InvariantCulture, lineFormatPrefix, j, file.GetLine(j)));
}
}
}
else
{
// Source file not found.
adjust++;
finalLines.Add("// File '" + path + "' missing. Line " + seqStartLines[i] + " to line " + seqEndLines[i]);
}
} // end for sequence points.
// @@ - this is all redundant with the stuff above.
// Now add the last tidbit of IL.
while (ilCursor < code.Length)
{
int thisRow = m_il2RowMapping[ilCursor];
if (thisRow != lastRow)
{
finalLines.Add(lines[thisRow]);
Debug.Assert(thisRow > lastRow);
lastRow = thisRow;
}
m_il2RowMapping[ilCursor] += adjust;
// check if this matches any native code? Print this after the IL code.
// We want to print the Native range after
int ilNext = ilCursor + 1;
if (ilNext > il2NativeEndIl)
{
string nativeInfo = String.Format(CultureInfo.InstalledUICulture,
" IL:0x{0:x},0x{1:x} --> Native:0x{2:x},0x{3:x} (N. size=0x{4:x} bytes)",
il2nativeMapping[il2NativeCursor].IlOffset,
il2NativeEndIl+1,
il2nativeMapping[il2NativeCursor].NativeStartOffset,
il2nativeMapping[il2NativeCursor].NativeEndOffset,
il2nativeMapping[il2NativeCursor].NativeEndOffset - il2nativeMapping[il2NativeCursor].NativeStartOffset);
finalLines.Add(nativeInfo);
adjust++;
finalLines.Add(""); adjust++; // extra line for easier viewing
il2NativeCursor++;
il2NativeEndIl = (il2NativeCursor +1 >= il2nativeMapping.Length) ?
(code.Length - 1) :
(il2nativeMapping[il2NativeCursor + 1].IlOffset - 1);
}
ilCursor++;
}
The new code:
The simplified code is much easy to understand. It creates stream classes and single local variable for each of the streams (seqIterator for sequence points, ilDasm for the il-disassembly, il2NativeIterator for the native ranges) . Thus you're down to 3 locals instead of 15, and 20 lines of stitching code instead of 160. Yes, you have additional code for the stream helper classes, but each stream class is separated from the others and thus simple. So the overall solution is more total lines, but fewer complex lines, and thus simpler.
// Walk through the IL in order and write out interleaved IL and Sequence Points.
while (!seqIterator.IsDone)
{
// Add IL snippets that occur before this sequence point.
WriteIlAndNative(ilDasm, il2nativeIterator, seqIterator.IlOffset);
seqIterator.WriteSource();
seqIterator.Next();
}
// Write the IL that's after the last sequence point
WriteIlAndNative(ilDasm, il2nativeIterator, ilDasm.IlLength);
And the WriteILAndNative helper function is:
// Helper to write the IL and corresponding native code up to the end offset
static void WriteIlAndNative(ILDasmIterator ilDasm, Il2NativeIterator il2nativeIterator, int ilEndOffset)
{
while (ilDasm.IlOffset < ilEndOffset)
{
ilDasm.WriteIl();
il2nativeIterator.WriteNative(ilDasm.IlOffset);
ilDasm.Next();
}
}
The helper classes are about 250 lines.
Some hacky math:
Just for kicks, let's compute a "Complexity Factor" according to my observations above:
So the first snippet has 15 locals spread across 160 lines. That would be 152 * 160 = 36000.
The second snippet has two parts. The helper classes are average 5 locals across 350 lines, 52 * 250 = 6250. The stitching has 3 locals across 20 lines, 32 * 20=180. The helper and stitching sum to 6430.
So refactoring the streams into classes really reduces the "complexity factor" by over 80%. Yeah, the math is silly and informal, but you get the point.
Comments
- Anonymous
November 22, 2005
good one :-)