Passing Strings to Unmanaged Code
I've just come across a nasty bug in some sample code (from us, I'm ashamed to say) that highlights the pitfalls of passing string buffers between managed and unmanaged code.
To go back a step or two, I've been trying to create a small application to pull metadata out of Windows Media files so that I can catalogue my music collection. (Incidentally, there are several supported ways to achieve this, including the Windows Media Player SDK and the Windows Media Format SDK.) I'd come across a little function that iterated through all the metadata attributes in a file and dumped them to the console. But for some reason, the function only seemed to be printing the attribute index and name, and not the associated values. The statement looked something like this:
Console.WriteLine("* {0, 3} {1, 25} {2, 3} {3, 3} {4, 7} {5}",
    wIndex, pwszName, wStream, wLangID, pwszType, pwszValue);
In the output, I was seeing the contents of wIndex and pwszName, but none of the other parameters. Stranger still, when I preceded the Console.WriteLine call with a similar call to MessageBox.Show, the message box displayed all the parameters. Needless to say, when you get into the kind of debugging situation where you're seeing truly unexpected results, you often disappear down a blind alley trying to solve a problem that doesn't exist. In my case, I started testing the hypothesis that it was a timing issue that the message box display eradicated; I wasted several hours experimenting with wait loops and searching through the documentation for references to file status that, with hindsight, could never have fixed the problem.
Suddenly it came to me in a flash: the debugger was showing the value of pwszName as "Duration\0". Of course! There was a null-termination character at the end of the string that shouldn't have been there. It wasn't that the call to Console.WriteLine didn't contain the right parameters - it was simply seeing the \0 and terminating the string at that point. MessageBox.Show obviously deals with this differently.
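A stray terminator is easy to miss because it is invisible in most output. As a minimal sketch (the NullCharDiagnostics class and its method names are invented for this illustration and are not part of any SDK), a helper along these lines makes null characters visible before a string is logged:

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper; the names are illustrative only.
static class NullCharDiagnostics
{
    // Returns a copy of the text in which every null character is written
    // as the two visible characters backslash and zero.
    public static string ShowNulls(string text)
    {
        if (text == null) throw new ArgumentNullException("text");

        StringBuilder visible = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == '\0')
                visible.Append("\\0");
            else
                visible.Append(c);
        }
        return visible.ToString();
    }

    // Counts the null characters anywhere in the text.
    public static int CountNulls(string text)
    {
        if (text == null) throw new ArgumentNullException("text");

        int count = 0;
        foreach (char c in text)
        {
            if (c == '\0') count++;
        }
        return count;
    }
}
```

For a name such as "Duration\0", the Length property reports 9 rather than 8, CountNulls returns 1, and ShowNulls returns the text with a visible \0 on the end, which is enough to spot the problem without a debugger.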
So how had pwszName got created like this? Looking back at the sample code that generated the values, I saw something like the following:
string pwszName = null;
ushort wNameLen = 0;
// First call: pass a null name buffer to retrieve the required length.
HeaderInfo3.GetAttributeByIndex( wAttribIndex,
                                 ref wStreamNum,
                                 pwszName,
                                 ref wNameLen,
                                 out wAttribType,
                                 pbAttribValue,
                                 ref wAttribValueLen );
// Second call: pass a string pre-filled with wNameLen null characters.
pwszName = new String( (char)0, wNameLen );
HeaderInfo3.GetAttributeByIndex( wAttribIndex,
                                 ref wStreamNum,
                                 pwszName,
                                 ref wNameLen,
                                 out wAttribType,
                                 pbAttribValue,
                                 ref wAttribValueLen );
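It's pretty clear from this piece of code what's wrong: the creator (presumably a C++ programmer, judging by the code style) has called the function once to determine the length of the retrieved string and then called it a second time to fill a string pre-populated with that many null characters. The reported length evidently includes room for the terminator (hence "Duration\0"), so the null written by the unmanaged code ends up inside the managed string. They forgot to trim the trailing null character(s) afterwards, with a statement such as the following:
pwszName = pwszName.TrimEnd('\0');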
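Another way to trim is to cut the buffer at the first null character, which also copes with a buffer that holds leftover data after the terminator. This is only a sketch; InteropStrings and TruncateAtNull are hypothetical names, not part of the sample or of the .NET Framework:

```csharp
using System;

// Hypothetical helper for cleaning up strings filled in by unmanaged code.
static class InteropStrings
{
    // Returns the part of the buffer that precedes the first null character,
    // or the whole buffer if it contains no null character at all.
    public static string TruncateAtNull(string buffer)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");

        // The char overload of IndexOf performs an ordinal search.
        int end = buffer.IndexOf('\0');
        return end < 0 ? buffer : buffer.Substring(0, end);
    }
}
```

With this in place, the cleanup step would read pwszName = InteropStrings.TruncateAtNull(pwszName); and a value such as "Duration\0" comes back as the eight-character string "Duration".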
Even this is not a great way of handling string buffers: a .NET string is immutable, so letting unmanaged code write into one is asking for trouble. A far better approach would have been to use the System.Text.StringBuilder class, a mutable string type that can be passed wherever an API function expects a writable string buffer. Rather than trimming the returned string, I rewrote the API declaration to use a StringBuilder rather than a fixed-length string and changed the sample code accordingly:
StringBuilder pwszName = null;
ushort wNameLen = 0;
// First call: pass a null buffer to retrieve the required length.
HeaderInfo3.GetAttributeByIndex( wAttribIndex,
                                 ref wStreamNum,
                                 pwszName,
                                 ref wNameLen,
                                 out wAttribType,
                                 pbAttribValue,
                                 ref wAttribValueLen );
// Second call: pass a StringBuilder with enough capacity for the name.
pwszName = new StringBuilder(wNameLen);
HeaderInfo3.GetAttributeByIndex( wAttribIndex,
                                 ref wStreamNum,
                                 pwszName,
                                 ref wNameLen,
                                 out wAttribType,
                                 pbAttribValue,
                                 ref wAttribValueLen );
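The same two-call pattern can be tried out against a plain Win32 function. The sketch below does not use the Windows Media interface from the sample; it assumes GetCurrentDirectoryW from kernel32.dll as a stand-in because its behaviour is simple to check, and the wrapper class and method names are invented for the example. It runs on Windows only.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

// Illustrative wrapper; the class and method names are assumptions.
static class CurrentDirectoryInterop
{
    // The StringBuilder parameter is marshalled as a writable wide-character buffer.
    [DllImport("kernel32.dll", EntryPoint = "GetCurrentDirectoryW",
               CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint GetCurrentDirectory(uint nBufferLength,
                                                   StringBuilder lpBuffer);

    public static string GetCurrentDirectoryViaWin32()
    {
        // First call: no buffer. The return value is the required size in
        // characters, including the terminating null.
        uint required = GetCurrentDirectory(0, null);
        if (required == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // Second call: a StringBuilder with enough capacity to be filled in.
        StringBuilder buffer = new StringBuilder((int)required);
        uint written = GetCurrentDirectory((uint)buffer.Capacity, buffer);
        if (written == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());
        if (written >= (uint)buffer.Capacity)
            throw new InvalidOperationException(
                "The current directory changed between the two calls.");

        // The marshaller stops copying at the terminator, so the result
        // contains no trailing null character.
        return buffer.ToString();
    }
}
```

The string returned by GetCurrentDirectoryViaWin32 needs no trimming: its Length matches the number of visible characters, which is exactly what the string-based version failed to deliver.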
The moral of the story: whenever you need to pass a string buffer to a Windows API call, use StringBuilder. (Of course, string is just fine if the unmanaged function doesn't modify its contents.) And if you're wondering why a string is being prematurely truncated, make sure you check for rogue null-termination characters!
Comments
- Anonymous
March 25, 2004
I've endeavoured to do the same thing on a number of occasions, trying to get my media collection metadata into a DB ..
I've mused a couple of times how nice it would be if Media Player had the option to store media metadata in MSDE -- or even better, with the advent of Longhorn, use WinFS..
I'm glad that someone at MS sees the value in this ;)
- Anonymous
March 25, 2004
Oh trust me, you'll be delighted by Longhorn :-)
WinFS has a much stronger understanding of the underlying metadata and will be able to do filtering, grouping and complex queries with a sub-second response. You can expect to see all these properties properly managed and surfaced via an XML schema. Check out the promotion / demotion stuff in this post:
http://blogs.msdn.com/tims/archive/2003/10/29/57428.aspx
Cheers,
Tim
- Anonymous
March 25, 2004
Oh - I meant to add in the article that by far the best book on this subject is ".NET and COM - The Complete Interoperability Guide" - as per the following:
http://www.amazon.com/exec/obidos/ASIN/067232170X
- Anonymous
April 14, 2004
Tim, any chance of you posting your code? Or at least the metadata interrogator class?
Thanks!!
- Anonymous
April 14, 2004
Never mind! http://blogs.msdn.com/tims/articles/100730.aspx. :)
- Anonymous
April 12, 2009
In this article, I'll describe how to use the Windows Media Format SDK to access the metadata embedded