The User-Agent String: Use and Abuse

When I first joined the IE team five years ago, I became responsible for the User-Agent string. While I’ve owned significantly more “important” features over the years, on a byte-for-byte basis, few have proved as complicated as the “simple” UA string.

I (and others) have written a lot about the UA string over the years. This post largely assumes that you’re familiar with what the user-agent string is and what it’s commonly (mis)used for. 

In this post, I’ll try to summarize why the UA string causes so many problems (beyond browser version sniffing), and expose the complex tradeoff between compatibility and extensibility.

Background

First things first: you can check the UA string currently sent by your browser using my User-Agent string test page.

Do you see anything in there that you weren’t expecting?

Changing the User-Agent String at Runtime

For IE8, we fixed significant bugs in the UrlMkSetSessionOption API, which allows setting of the User-Agent for the current process. Before IE8, calling this API inside IE would (depending on timing) set the User-Agent sent to the server by WinINET, or set the User-Agent property in the DOM, but never properly set both.

I developed a simple User-Agent Picker Add-on for IE8 that allows you to change your User-Agent string to whatever you like. You can then easily see how websites react to various UA strings. For instance, try sending the GoogleBot UA string to MSDN to see how that site is optimized for search.

Internally, the add-on simply exercises the URLMon API:

UrlMkSetSessionOption(URLMON_OPTION_USERAGENT, szNewUA, strlen(szNewUA), 0)

Alternatively, Web Browser Control hosts can change the User-Agent string sent by hyperlink navigations by overriding the OnAmbientProperty method for DISPID_AMBIENT_USERAGENT. However, the overridden property is not used when programmatically calling the Navigate method, and it will not impact the userAgent property of the DOM's navigator or clientInformation objects.

Extending the User-Agent String in the Registry

It’s trivial to add tokens to the User-Agent string using simple registry modifications. Tokens added to the registry keys are sent by all requests from Internet Explorer and other hosts of the Web Browser control. These registry keys have been supported since IE5, meaning that all currently supported IE versions will send these tokens.

Other browsers (Firefox, Chrome, etc.) do not make it as easy to extend the UA string, so it’s uncommon for software to extend the UA string in non-IE browsers.

Update 3/23/2010: IEBlog announces that IE9 will no longer send registry tokens to the server.

The Fiasco

Unfortunately, the ease of extending IE’s UA string means that it’s a very common practice. That, in turn, leads to a number of major problems that impact normal folks who don’t even know what a UA string is. 

A few of the problems include:

  1. Many websites will return only error pages upon receiving a UA header over a fixed length (often 256 characters).
  2. In IE7 and below, if the UA string grows to over 260 characters, the navigator.userAgent property is incorrectly computed.
  3. Poorly designed UA-sniffing code may be confused and misinterpret tokens in the UA.
  4. Poorly designed browser add-ons are known to misinterpret how the registry keys are used, and shove an entire UA string into one of the tokens, resulting in a “nested” UA string.
  5. Because UA strings are sent for every HTTP request, they entail a significant performance cost. In degenerate cases, sending the UA string might consume 50% of the overall request bandwidth.
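Problems 1, 4, and 5 above are easy to check for mechanically. The following is a minimal sketch, not code from IE or any real server: the function name `audit_user_agent`, the `SERVER_LIMIT` constant, and the "more than one `Mozilla/` token means nesting" heuristic are all assumptions made for illustration.

```python
import re

# Assumption: 256 is the fixed limit some servers enforce (the figure cited above).
SERVER_LIMIT = 256


def audit_user_agent(ua: str, other_request_bytes: int = 0) -> dict:
    """Report length, nesting, and bandwidth share for one UA string.

    other_request_bytes is the size of everything else in the request
    (request line, other headers, body); it is supplied by the caller.
    """
    header_line = f"User-Agent: {ua}\r\n"
    header_bytes = len(header_line.encode("utf-8"))

    # Heuristic (an assumption): a well-formed IE UA string carries a single
    # "Mozilla/<digit>" product. A second one suggests that an entire UA
    # string was shoved into one of the registry tokens.
    mozilla_count = len(re.findall(r"Mozilla/\d", ua))

    return {
        "length": len(ua),
        "over_limit": len(ua) > SERVER_LIMIT,
        "nested": mozilla_count > 1,
        "share_of_request": header_bytes / (header_bytes + other_request_bytes),
    }


if __name__ == "__main__":
    nested = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; "
              "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1))")
    print(audit_user_agent(nested, other_request_bytes=200))
```

A check like this is only a diagnostic aid; it does nothing to fix the servers that reject long headers.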

Two real-world examples:

My bank has problem #1. They have security software on their firewall looking for “suspicious” requests, and the developers assumed that they’d never see a UA over 256 bytes.

Some major sites are using super-liberal UA parsing code (problem #3) to detect mobile browsers. Unfortunately, for instance, Creative Labs adds the token “Creative AutoUpdate” to the UA string. Naive server code sees the characters pda inside that token and decides that the user must be on a mobile browser. The server might then return WML content that the desktop browser will not even render, or provide an otherwise degraded experience. Worse still, some sites don’t send a Vary: User-Agent response header when returning the mobile content, meaning that network proxies will sometimes start sending everyone content designed for mobile devices.
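To make the “pda” failure concrete, here is a small sketch contrasting a substring match with a match on whole tokens. The function names are hypothetical, and the `PDA` token in the second test string is invented for illustration; this is not the code any particular site runs, and token-based sniffing is still sniffing, just less fragile.

```python
import re


def naive_is_mobile(ua: str) -> bool:
    # Fragile: matches "pda" anywhere in the header, including inside "AutoUpdate".
    return "pda" in ua.lower()


def token_is_mobile(ua: str) -> bool:
    # Split on the delimiters used inside the UA comment, then match whole words only.
    tokens = [t.strip().lower() for t in re.split(r"[;()]", ua) if t.strip()]
    return any(re.search(r"\bpda\b", t) for t in tokens)


def response_headers(ua: str) -> dict:
    # Whenever the response depends on the UA, say so with Vary, so that
    # shared caches do not serve one client's variant to everyone else.
    content_type = "text/vnd.wap.wml" if token_is_mobile(ua) else "text/html"
    return {"Content-Type": content_type, "Vary": "User-Agent"}


if __name__ == "__main__":
    desktop = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Creative AutoUpdate)"
    handheld = "Mozilla/4.0 (compatible; MSIE 6.0; Windows CE; PDA)"  # hypothetical UA
    print(naive_is_mobile(desktop), token_is_mobile(desktop))
    print(naive_is_mobile(handheld), token_is_mobile(handheld))
    print(response_headers(desktop))
```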

Ultimately, the problem is what economists call the Tragedy of the Commons, although personally I prefer the visual representation. You might remember that the extensibility of the Accept header leads to the same problem, although that header is sent so unreliably that no sane website would depend upon it.

Standards

It’s tempting to look to the standards for restrictions on the UA string. Unfortunately, RFC 2616, the HTTP/1.1 specification, has little to say on the topic:

14.43 User-Agent

The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations. User agents SHOULD include this field with requests. The field can contain multiple product tokens (section 3.8) and comments identifying the agent and any subproducts which form a significant part of the user agent. By convention, the product tokens are listed in order of their significance for identifying the application.

User-Agent = "User-Agent" ":" 1*( product | comment )

Example:

User-Agent: CERN-LineMode/2.15 libwww/2.17b3

Notably, the RFC does not define a maximum length for the header value, and does not provide much guidance into what “subproducts which form a significant part of the user agent” means. It suggests a few broad uses of the UA string on the server-side, without discussion of what problems such usage might introduce.
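The grammar quoted above is loose enough that a parser for it fits in a few lines. The sketch below is one possible reading of `1*( product | comment )`, with comments allowed to nest and backslash quoted-pairs skipped. The function name `parse_user_agent` is an assumption, and the code does not validate the characters allowed in a product token.

```python
def parse_user_agent(value: str) -> list:
    """Split a User-Agent value into ("product", text) and ("comment", text) parts."""
    parts = []
    i, n = 0, len(value)
    while i < n:
        ch = value[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            # Comment: scan to the matching ")", allowing nested comments.
            depth, start = 1, i + 1
            i += 1
            while i < n and depth:
                if value[i] == "\\":   # quoted-pair: skip the escaped character
                    i += 1
                elif value[i] == "(":
                    depth += 1
                elif value[i] == ")":
                    depth -= 1
                i += 1
            if depth:
                raise ValueError("unbalanced comment in User-Agent value")
            parts.append(("comment", value[start:i - 1]))
        else:
            # Product: runs until whitespace or the start of a comment.
            start = i
            while i < n and not value[i].isspace() and value[i] != "(":
                i += 1
            parts.append(("product", value[start:i]))
    return parts


if __name__ == "__main__":
    print(parse_user_agent("CERN-LineMode/2.15 libwww/2.17b3"))
    print(parse_user_agent("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"))
```

Note that a typical IE UA string parses as a single product followed by a single comment: every token added through the registry lands inside that one comment, where the grammar imposes no structure at all.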

Motivations for UA Modification

OEMs and ISVs have a number of motivations for adding to the UA string.

  1. Metrics. Every server on the web can easily tell if your software is installed.
  2. Client capability detection. JavaScript can easily detect if your (ActiveX control / Protocol Handler / Client application / etc) is available.
  3. User Tracking. I don’t know of any current offenders, but at some point in the past some software would add a GUID token to the UA string. This token would effectively act as an invisible “super-cookie” that would be sent to every site the user ever visited.

Now, scenario #3 is clearly evil, and we have no desire to support it. Scenarios #1 and #2 aren’t inherently bad—but advertising to every site in the world that a given piece of software is available on the client is probably the wrong design.

Known UA Tokens

Here are some explanations of common tokens found in real-world IE UA strings.

Token                      Meaning / Component
SV1                        Security Version 1 - Indicates that XP SP2 was installed. Removed from IE7.
SLCC1                      Software Licensing Commerce Client - Indicates Vista+ AnyTime Upgrade component is available.
MS-RTC LM 8                Microsoft Real Time Conferencing Live Meeting version 8
InfoPath.2                 InfoPath XML MIME Filter
GTB6                       Google Toolbar
Creative AutoUpdate        Creative AutoUpdate software
Trident/4.0                IE8 version of HTML Renderer installed
Zune 3.0                   Zune Software client
Media Center PC 6.0        It's a Media Center PC
Tablet PC 2.0              It's a TabletPC
.NET CLR 3.5.30729         The .NET Common Language Runtime
chromeframe                Google ChromeFrame addon
fdm                        FreeDownloadManager.org add-on
Comcast Install 1.0        Comcast High-speed Internet installer
OfficeLiveConnector.1.3    Office Connector
OfficeLivePatch.0.0        ??
WOW64                      Running in 32bit IE on 64bit Windows
Win64; x64                 Running in 64bit IE
msn OptimizedIE8           Installed with MSN branding and services
yie8                       Installed with Yahoo! branding and services

Alternatives to UA Modification

In many cases, allowing client-side script to detect a capability without forcing the browser to send that information to the server would be sufficient. While new APIs might be proposed for this purpose, we need an alternative that already works in all versions of IE prior to Internet Explorer 10 Standards Mode.

You probably know that Conditional Comments can be used to detect the IE version, but they can also be used to detect custom information about any component listed in the registry’s version vector key. For instance, Windows 7 uses the new WindowsVersion entry to allow script to detect the OperatingSystemSKU.

To expose your capabilities via conditional comments, simply create a REG_SZ inside HKLM\SOFTWARE\Microsoft\Internet Explorer\Version Vector. The new entry should be named uniquely (e.g. EricLawSampleAddon) and contain a string in the format x.xxxx (e.g. 1.0002).

You can then detect the version (or absence) of your component using conditional comments:

<!--[if !EricLawSampleAddon]><script>alert("You don’t have my IE add-on yet. Go install it!");</script><![endif]-->
<!--[if lt EricLawSampleAddon 1.0002]><b>You have an outdated version. Go upgrade!</b><![endif]-->

These conditional comments are hidden from non-IE browsers, and will work properly in IE5 to IE9. IE10 Standards Mode removes support for Conditional Comments.

Conclusions?

Extensibility is an important aspect for any major software project, but can also be the source of severe compatibility problems that are extremely painful to fix in the future. As we increase the power of the web platform, we need to find ways to ensure that extension points and the tragedy of the commons don’t destroy the user’s experience.

Until next time,

-Eric

Update 7/6/2011: IEBlog announces that IE10 Standards Mode will not support Conditional Comments.

Update 9/26/2012: Windows 8 "HTML and JavaScript" applications do not send User-Agent extension tokens from the registry.

Comments

  • Anonymous
    October 07, 2009
    I've had quite a few problems with a user-agent that's too long. The biggest offender is actually Microsoft .Net. After trimming mine down here's what it looks like:

    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.0.3705; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; WWTClient2; Zune 4.0)

    It's shorter than it used to be because I removed the 2 other .NET 3.0.x references and the one extra .NET 3.5.x reference. It was originally long enough that a simple HTTP request header was more than 1024 bytes and more than one server wasn't ready for that.

    Thanks for explaining the issues with user-agent. There isn't a lot of good info out there about it.

  • Anonymous
    October 07, 2009
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 1.1.4322)

    This is mine. Do all versions of the CLR need to be included?

  • Anonymous
    October 08, 2009
    @Steve: You didn't trim far enough. Your IE8 UA string has an IE6 UA string shoved in the middle (problem #4). That is causing some sites to detect your browser as IE6 rather than IE8.

    @Vaibhav: That's a non-trivial question, but it's generally safe to remove earlier duplicates of major versions (e.g. leave only one 3.0., one 3.5., etc). If you want, you might choose to remove all but the latest version (3.5.21022).

  • Anonymous
    October 08, 2009
    You may have just changed the design for something I was going to implement next week.

  • Anonymous
    October 08, 2009
    Are there any plans to release a Windows Update patch to clean up nested UAs, GUIDs, redundant CLR versions, and other UA spam? Hopefully the Media Center, Tablet PC, Zune, Comcast Install, OfficeLive stuff, MSN stuff, Media Player, eMusic DLM, InfoPath, &c., too. Are there long-term plans to put the Internet on notice that "Mozilla/4.0 (compatible;" is going away?

  • Anonymous
    October 08, 2009
    I spend a lot of time working with UA strings ( http://www.stevesouders.com/blog/2009/01/18/user-agents-in-the-morning/ ). I plan on rolling out the UA parsing code in Browserscope ( http://www.browserscope.org/ ) as an open source project, but need someone to run the project. Contact me if you're interested ( http://stevesouders.com/contact.php ). Finally, make sure to read the "History of the browser user-agent string" ( http://webaim.org/blog/user-agent-string-history/ ).

  • Anonymous
    October 08, 2009
    >navigator.plugins. Clearly this, or something like it is the ideal.

    Yes, it would be better if navigator.plugins returned a non-empty array in IE, but that wouldn't be enough. You'd also want something like navigator.OS.supports() or something of that nature, to account for all of the iTunes, Zune, etc. clients that try to advertise in the UA string.

    >plans to put the Internet on notice that "Mozilla/4.0 (compatible;" is going away?

    The arrogant practice of "putting the Internet on notice" doesn't really work. There are tons of examples of this, but the most glaring is that the documentation on MSDN for the STRICT doctype clearly indicated that opting into STRICT would break pages in future versions. No one cared, and it led to massive compatibility problems in IE7, and the introduction of CompatView in IE8.

    As with all such proposed UA strings, I encourage you to use the UAPick tool to give your proposed UA a try. Keep in mind, however, that IE is used to render billions of public and non-public pages, and you're unlikely to visit any significant fraction of them.

    Regarding your thoughts about HTTP ACCEPT, I've covered those in a prior post, and don't wish to rehash them here.

  • Anonymous
    October 08, 2009
    "The arrogant practice of "putting the Internet on notice" doesn't really work." So, we can expect the IE6, IE7, and IE8 rendering engines in IE9? ;) I just meant that even Microsoft can't withstand the crushing weight of infinite backward compatibility, and that means letting people know you're serious when breaking changes occur (once the market has adapted enough to minimize the damage, of course). I was just wondering if anyone is thinking four, five, twenty years down the road to see when "Mozilla" goes away. I mentioned Accept only because of your overly broad dismissal of it; there is certainly some opportunity to use it. I'll just leave it at that.

  • Anonymous
    October 09, 2009
    Can you comment on the article below from MSFT stating that "There's a known issue where IE has a bug where too many UA string willl cause problems." https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=362923&wa=wsignin1.0 I do not believe this is really an IE error, is it?  The same article mentions that .Net extends the string, which is a problem, but is not what I believe to be the real "bug".   I believe the "bug" is what you have described as "Fiasco" #1.  Applications need to be ready to handle strings longer than 255 bytes.  What are your thoughts?

  • Anonymous
    October 09, 2009
    @Nomen: IE6 and 7 did have a bug (navigator.userAgent property is incorrectly computed when the string is over 260 bytes) but generally speaking, yes, it's various server code that has problems with longer strings, not IE itself.

  • Anonymous
    November 12, 2009
    Thank you! Saved me a lot of grief: http://codecorner.galanter.net/2009/11/12/internet-explorer-renders-incorrect-html/

  • Anonymous
    March 31, 2010
    Actually Firefox and other Gecko-based browsers do now provide an easy way for add-ins to extend the UA string. For instance, one add-in provides a general.useragent.extra.microsoftdotnet value.

  • Anonymous
    March 31, 2010
    @Neil: I didn't say they didn't allow for extension of the UA, I said it wasn't common because it wasn't as simple as writing a registry key.

  • Anonymous
    May 27, 2010
    Hi Eric, You mentioned "Before IE8, calling this API inside IE would (depending on timing) set the User-Agent sent to the server by WinINET, or set the User-Agent property in the DOM, but never properly set both. " can you comment further on the 'timing' required for a urlmksetsessionoption call to have the UA set for both requests to the server and the UA property on the DOM (for IE7 users)?

  • Anonymous
    May 27, 2010
    @JakeC: As far as I know, it wasn't possible to set both correctly prior to IE8.

  • Anonymous
    May 27, 2010
    Hey Eric - just FYI... I played with this a bit more and found that if I set the UA during startup of my process and when handling DISPID_AMBIENT_USERAGENT, then the UA is set up properly for both cases in IE7. To ensure no duplication, I just check for the existence of my UA customization prior to setting it again. (Note that the very first call to ObtainUserAgentString in the handling of DISPID_AMBIENT_USERAGENT does not show the customization I added during startup of my process.)

  • Anonymous
    November 16, 2010
    I ran into an interesting issue using the web browser control and ie8... I specify 8000 as the value for FEATURE_BROWSER_EMULATION in the feature controls section of the registry because I want the WebOC to default to ie8 mode but honor the user's compatibility settings set via IE settings. This works great (UA reports MSIE 7.0 and webOC renders in compat mode) - except if I try to tweak IE's useragent string with my customization. To do this, I do an ObtainUserAgentString( 0, ...). Even if the user has selected 'show all sites in compatibility mode', this call returns a UA with MSIE 8.0. I add "; JakesBrowser 1.1.1.1;" to the string and set it with UrlMkSetSessionOption. From that point forward, the WebOC reports the UA to sites as MSIE 8.0, but renders the sites in compatibility mode (since it honors the users compatibility settings and in this case I had 'show all sites in compatibility view' set). This results in rendering issues if the site switches styles based on the UA. Is this expected behavior? I can understand why embedded IE might not update the MSIE version after I've messed with it - it probably assumes I know what I am doing :-) and doesn't try to update it again even if various compatibility view settings have been made. However, I'm not touching the string anywhere around the MSIE version. Is there a way to tweak the UA string with my customization and still have the WebOC automagically update the MSIE version based on the compatibility view settings (have my cake and eat it too :-) )?

  • Anonymous
    November 16, 2010
    @JakeC: Alas, no. When you use UrlMkSetSessionOption to set the UA string, it sets the UA string permanently until you call it again. We do not attempt to "tweak" that string based on the CompatView mode.

  • Anonymous
    March 02, 2011
    Just following up the above post (RE: useragent string issue I encountered using the webbrowser control with sites in compatibility view). This has come back up with our product as we are seeing a number of users encountering rendering problems with sites listed in the compatibility view settings. Unfortunately, the useragent customization I do is a product requirement (and for better or worse a number of sites look for it), but that prevents IE from updating the UA to report the proper IE version for sites in compatibility view. Any advice on how to workaround this? I thought of skipping the call to UrlMkSetSessionOption and postpone our useragent customization to http request time (perhaps by implementing ihttpnegotiate and tweaking the UA as each request goes out). That way, IE will already have (I think) updated the MSIE version. My concern there is that may not update the DOM's navigator.userAgent property with our UA customization. Any idea if tweaks to the UA via ihttpnegotiate will be reflected in the DOM's various userAgent properties?

  • Anonymous
    March 02, 2011
    @JakeC: No, UrlMkSetSessionOption is the right way to change the UA string for the DOM and for network requests. Manual editing in IHTTPNegotiate will not be reflected in the DOM.

  • Anonymous
    March 22, 2011
    One correction to my prior post... even if I do not touch the useragent string at all (i.e., no call to urlmksetsessionoption), I'm still seeing an issue with the webbrowser control's UA for sites in compatibility view. The useragent header sent in the http request looks good - it reports MSIE 7.0 for a site in compat view. However, the DOM properties navigator.userAgent and clientInformation.userAgent both still specify MSIE 8.0. Just a heads up - maybe it can be addressed in a future IE release.

  • Anonymous
    March 22, 2011
    @JakeC: The Web Browser Control doesn't use the Compatibility View feature offered by the desktop browser. Hence, I have no idea how you could get that result without calling URLMkSetSessionOption. www.enhanceie.com/ua.aspx is a nice test page which shows off the various versioning-related properties, and it properly shows "7.0" in both the HTTP headers and the DOM property.

  • Anonymous
    March 22, 2011
    Sorry if I have misunderstood... The documentation I read suggests that if your app specifies the decimal value 8000 for the FEATURE_BROWSER_EMULATION registry control, it will get compatibility view support. If I add enhanceie.com to the compatibility view list in IE, I agree that everything looks as expected for www.enhanceie.com/ua.aspx in standalone IE - host ua header specifies MSIE 7.0, DOM UA props specify MSIE 7.0, and document mode is 7. However, in my app hosting the WebOC (with urlmksetsessionoption removed) and in the Microsoft sample webbrowser control app (where I have set up both in the registry to specify feature_browser_emulation of 8000 decimal), browsing to www.enhanceie.com/ua.aspx shows a host UA header with MSIE 7.0, DOM UA props specify MSIE 8.0, and document mode of 7. So, the apps hosting WebOC appear to be getting compat view list support, but the DOM UA props don't look right. Browsing to sites not in the compatibility view list in the hosted web browser control, I see MSIE 8.0 for the UA and a document mode of 8 (again, with the feature_browser_emulation registry entry set to 8000 decimal).

  • Anonymous
    March 05, 2013
    Eric, Could you please comment on a related issue about which I have just inquired at: social.msdn.microsoft.com/.../da94c031-1e13-4146-9923-ca2ff14f00ff  ? Thanks!