System.Uri in .Net 4.5 and un-escaping "%27"
I lost access to my MSDN blog account for some reason and regained it recently. I also spent some time figuring out how to get Windows Live Writer installed on Win10 which is much harder than I first thought (an easier way is described here).
This post is a follow-up to the previous two posts complaining about the System.Uri class. There is a Chinese idiom that roughly translates to ‘Three and Out’, but I have now hit an issue related to System.Uri for the third time :-(.
The corresponding Connect bug is here. The repro:
void Test()
{
    Uri uri1 = new Uri("https://en.wikipedia.org/wiki/%2751_Dons");
    Uri uri2 = new Uri("https://en.wikipedia.org/wiki/%2738_%E2%80%93_Vienna_Before_the_Fall");
    System.Net.WebClient client = new System.Net.WebClient();
    // https://en.wikipedia.org/wiki/%2751_Dons
    Console.WriteLine("{0}", uri1.AbsoluteUri);
    // https://en.wikipedia.org/wiki/'38_%E2%80%93_Vienna_Before_the_Fall
    Console.WriteLine("{0}", uri2.AbsoluteUri);
    // As you can see, the second Uri unescapes "%27" to "'" while the first doesn't.
    // The second download will throw System.Net.WebException.
    // This is because wiki redirects '38_%E2%80%93_Vienna_Before_the_Fall back to %2738_%E2%80%93_Vienna_Before_the_Fall,
    // which becomes an infinite loop.
    client.DownloadString(uri1);
    client.DownloadString(uri2);
}
The un-escaping happens inside ‘System.Uri.EnsureParseRemaining’ and its behavior is determined by the following helper ‘System.UriHelper.IsReservedUnreservedOrHash’:
private const string RFC2396ReservedMarks = @";/?:@&=+$,";
private const string RFC3986ReservedMarks = @":/?#[]@!$&'()*+,;=";
private const string RFC2396UnreservedMarks = @"-_.!~*'()";
private const string RFC3986UnreservedMarks = @"-._~";

private static unsafe bool IsReservedUnreservedOrHash(char c)
{
    if (IsUnreserved(c))
    {
        return true;
    }
    if (UriParser.ShouldUseLegacyV2Quirks)
    {
        return ((RFC2396ReservedMarks.IndexOf(c) >= 0) || c == '#');
    }
    return (RFC3986ReservedMarks.IndexOf(c) >= 0);
}
So it appears to be a breaking change in .Net 4.5 mentioned here. .Net 4.5 follows RFC3986 while older versions follow RFC2396, and the two have different sets of reserved characters (%27 un-escapes to ', which is reserved only in RFC3986; RFC2396 lists it as an unreserved mark).
private enum UriQuirksVersion {
    // V1 = 1, // RFC 1738 - Not supported
    V2 = 2, // RFC 2396
    V3 = 3, // RFC 3986, 3987
}

// Store in a static field to allow for test manipulation and emergency workarounds via reflection.
// Note this is not placed in the Uri class in order to avoid circular static dependencies.
private static readonly UriQuirksVersion s_QuirksVersion =
    (BinaryCompatibility.TargetsAtLeast_Desktop_V4_5
    // || BinaryCompatibility.TargetsAtLeast_Silverlight_V6
    // || BinaryCompatibility.TargetsAtLeast_Phone_V8_0
    ) ? UriQuirksVersion.V3 : UriQuirksVersion.V2;

internal static bool ShouldUseLegacyV2Quirks {
    get {
        return s_QuirksVersion <= UriQuirksVersion.V2;
    }
}
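To make the reserved-set difference concrete, here is a minimal sketch that looks up the apostrophe in the four mark strings quoted earlier. The class name `ReservedSetCheck` is made up for illustration; only the string constants are copied from the framework listing, and the sketch does not call into System.Uri at all.

```csharp
using System;

// Hypothetical demo class; only the four constants come from the listing above.
public static class ReservedSetCheck
{
    public const string RFC2396ReservedMarks = @";/?:@&=+$,";
    public const string RFC3986ReservedMarks = @":/?#[]@!$&'()*+,;=";
    public const string RFC2396UnreservedMarks = @"-_.!~*'()";
    public const string RFC3986UnreservedMarks = @"-._~";

    // True if the character appears in the given mark string.
    public static bool IsIn(string marks, char c)
    {
        return marks.IndexOf(c) >= 0;
    }

    public static void Main()
    {
        char apostrophe = '\''; // the character that "%27" decodes to

        Console.WriteLine("RFC2396 reserved:   {0}", IsIn(RFC2396ReservedMarks, apostrophe));   // False
        Console.WriteLine("RFC2396 unreserved: {0}", IsIn(RFC2396UnreservedMarks, apostrophe)); // True
        Console.WriteLine("RFC3986 reserved:   {0}", IsIn(RFC3986ReservedMarks, apostrophe));   // True
        Console.WriteLine("RFC3986 unreserved: {0}", IsIn(RFC3986UnreservedMarks, apostrophe)); // False
    }
}
```

So the apostrophe moves from the unreserved marks in RFC2396 to the reserved set in RFC3986, which is the classification change the quirks version switches between.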
However, I believe this is actually a bug because the un-escaping isn’t applied consistently (as shown by the repro above). It only happens when the following condition holds, that is, when the URI contains non-ASCII Unicode and IRI parsing is enabled:
if (m_iriParsing && hasUnicode){
    // In this scenario we need to parse the whole string
    EnsureParseRemaining();
}
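As a rough way to predict which URI strings take this code path, the sketch below decodes the percent-escapes with the public `Uri.UnescapeDataString` and looks for any character above 0x7F. This only approximates the internal `hasUnicode` check; the class and method names (`UnicodeProbe`, `HasNonAscii`) are hypothetical and not part of the framework.

```csharp
using System;
using System.Linq;

// Hypothetical helper; a stand-in for the internal hasUnicode test, not the real one.
public static class UnicodeProbe
{
    // True if any character is above 0x7F once percent-escapes are decoded.
    public static bool HasNonAscii(string uriString)
    {
        string decoded = Uri.UnescapeDataString(uriString);
        return decoded.Any(ch => ch > 0x7F);
    }

    public static void Main()
    {
        // Only "%27" is escaped here, and it decodes to an ASCII apostrophe.
        Console.WriteLine(HasNonAscii("https://en.wikipedia.org/wiki/%2751_Dons")); // False

        // "%E2%80%93" decodes to U+2013 (en dash), which is non-ASCII.
        Console.WriteLine(HasNonAscii("https://en.wikipedia.org/wiki/%2738_%E2%80%93_Vienna_Before_the_Fall")); // True
    }
}
```

This matches the repro: only the second URI contains escaped non-ASCII text, and only the second URI has its "%27" un-escaped.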
The hack below depends on this inconsistency to work, though :-).
Anyway, unfortunately, my code depends on some 4.5-only features, so I can’t simply switch it back to an older .Net version, even though the component which triggers the issue only uses .Net 2.0.
So I had to hack again (after hours of debugging). BTW, when I was debugging the .Net source, the debugger incorrectly used my own source file instead of downloading the correct one from the source server (both files are named ‘Uri.cs’). I had enabled ‘Require source files to exactly match the original version’, so I would have expected the debugger to at least tell me that the wrong source file doesn’t match the hash in the pdb.
Unlike the previous two hacks, this one impacts all Uri objects because it has to change a static class member. Also, due to the nature of the issue (the Uri object is created by System.Net.WebClient internally, not me), it looks like the only viable option to me.
Here is the code snippet of the hack (setting this property to false will revert to pre-4.5 behavior):
public static bool AllowIriParsing
{
    get
    {
        FieldInfo fieldInfoIriParsing = typeof(Uri).GetField("s_IriParsing", BindingFlags.Static | BindingFlags.NonPublic);
        if (fieldInfoIriParsing == null)
        {
            throw new MissingFieldException("'s_IriParsing' field not found");
        }
        return (bool)fieldInfoIriParsing.GetValue(null);
    }
    set
    {
        FieldInfo fieldInfoIriParsing = typeof(Uri).GetField("s_IriParsing", BindingFlags.Static | BindingFlags.NonPublic);
        if (fieldInfoIriParsing == null)
        {
            throw new MissingFieldException("'s_IriParsing' field not found");
        }
        fieldInfoIriParsing.SetValue(null, value);
    }
}
How does the hack work? The trick is that if s_IriParsing is false, m_iriParsing will be false as well, which disables the un-escaping logic. Note that IriParsing is disabled by default prior to .Net 4.5 (see this).
// Value from config Uri section
// On by default in .NET 4.5+ and cannot be disabled by config.
private static volatile bool s_IriParsing =
    (UriParser.ShouldUseLegacyV2Quirks ? IriParsingElement.EnabledDefaultValue : true);
Ideally, I would prefer to alter ‘UriParser.ShouldUseLegacyV2Quirks’ instead. But it is a property that checks against a read-only member we can’t modify even with reflection.
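Because the hack flips a process-wide static, one way to limit its reach is to switch it off only around the call that needs it and restore the old value afterwards. The following is a minimal, untested sketch of that idea. The class `IriParsingScope` and its method are hypothetical names; the field name "s_IriParsing" is the one used in the hack above and is assumed to exist only in the .Net Framework implementation of System.Uri.

```csharp
using System;
using System.Reflection;

// Hypothetical wrapper around the reflection hack; not part of the framework.
public static class IriParsingScope
{
    // Private static field of System.Uri; null on runtimes that do not have it.
    private static readonly FieldInfo s_field =
        typeof(Uri).GetField("s_IriParsing", BindingFlags.Static | BindingFlags.NonPublic);

    // Runs 'action' with IRI parsing switched off, then restores the previous value.
    public static T WithIriParsingDisabled<T>(Func<T> action)
    {
        if (s_field == null)
        {
            throw new MissingFieldException("'s_IriParsing' field not found");
        }

        bool previous = (bool)s_field.GetValue(null);
        s_field.SetValue(null, false);
        try
        {
            return action();
        }
        finally
        {
            // Restore even if the action throws.
            s_field.SetValue(null, previous);
        }
    }
}
```

With this, the failing download from the repro could be written as `IriParsingScope.WithIriParsingDisabled(() => client.DownloadString("https://en.wikipedia.org/wiki/%2738_%E2%80%93_Vienna_Before_the_Fall"))`. The switch is still global while the action runs, so Uri objects created on other threads during that window are affected too.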
Comments
- Anonymous
February 25, 2016
Here's another weird encountering with URI: danielwertheim.se/uri-behaves-differently-in-net4-0-vs-net4-5