XML's overhead will open wallets?
There's a new article on the overhead that XML creates on networks, and what can be done about it: "Eyes, wallets wide open to XML traffic woes in 2005". This is a topic near and dear to my heart: I've been involved in long-running threads about it on the xml-dev mailing list, and gave a paper / presentation on it at the XML 2004 conference. Let's look at the points raised in the searchwebservices article in some detail. I think it addresses a real challenge that people are having with XML, but it paints a somewhat misleading picture of the alternative solutions.
First, the article begins: "Enterprise affection for XML Web services may have C-level hearts fluttering over the immediate efficiency and productivity gains, but the other shoe is about to drop in this relationship." The obvious rejoinder is that in most organizations, human efficiency and productivity gains add vastly more to the bottom line than savings on hardware and wired network bandwidth, which gets cheaper and cheaper all the time.
Next, the claim that many people are starting to "realize en masse how taxing XML is on enterprise networks" is true, but only in a couple of fairly specific scenarios. As I put it in the XML 2004 paper:
It is quite clear from surveying the research in this area that XML really does impose a significant overhead on a significant set of real-world applications, especially those in enterprise-class transaction processing environments and those involving wireless communication. In both scenarios it is clear that developers, vendors, and customers desire the benefits of standards-based portability and interoperability, but are unable to use XML in its current form.
Furthermore, currently deployed technological fixes do not alleviate this pain for these two classes of users. As for reducing size, conventional text compression algorithms do not work at all on the short messages with little redundant text that are common in web services applications and preferred by wireless developers. Likewise, the studies noted above generally show that the processing cost of these algorithms often negates any perceived performance benefit from reducing the amount of bandwidth needed to send a message. Furthermore, "throwing hardware at the problem" is not a viable option for battery-powered mobile devices with intrinsically limited bandwidth, where every extra CPU cycle drains the battery all the sooner.
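To make the compression point above concrete, here's a quick sketch (the message contents are invented for illustration, not taken from any real system) of how little a generic compressor such as zlib saves on a short, low-redundancy message, versus how well it does on a large, repetitive document:

```python
# Sketch: generic text compression on a short web-services-style message
# vs. a long, redundant document. Message contents are made up.
import zlib

short_msg = b'<quote sym="MSFT" bid="26.91" ask="26.93"/>'
long_doc = (
    b"<quotes>"
    + b'<quote sym="MSFT" bid="26.91" ask="26.93"/>' * 200
    + b"</quotes>"
)

short_zipped = zlib.compress(short_msg)
long_zipped = zlib.compress(long_doc)

# On ~40 bytes of non-repetitive text, the zlib framing and coding
# overhead eats essentially all of the savings (it may even grow).
print(len(short_msg), len(short_zipped))
# The long, highly redundant document compresses dramatically.
print(len(long_doc), len(long_zipped))
```

This is the same effect the paper describes: the messages that matter in these scenarios are too short for dictionary-based compressors to find exploitable redundancy, while the CPU cost of compressing is still paid in full.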
Let's be clear, however -- this refers to a relatively small number of use cases in which XML could be valuable, but its size or processing overhead stands in the way of its widespread use today. The article says "Users and experts expect 2005 to be the year companies realize en masse how taxing XML is on enterprise networks, sparking a spending spree on XML acceleration products and optimized appliances that offload this burden." Time will tell, of course, but I would find these predictions more credible if the article itself didn't have some factual errors.
For example, the author asserts that "standards bodies like the World Wide Web Consortium (W3C) work in the shadows on the ratification of a single binary XML standard that could bring an about-face to the commitment companies have to the ASCII text encoding that is currently the foundation of XML 1.0." This is not even close to being true. Besides the technical point that XML 1.x is defined in terms of Unicode text, not an ASCII encoding :-) the real objective of the W3C XML Binary Characterization Working Group is:
gathering information about use cases where the overhead of generating, parsing, transmitting, storing, or accessing XML-based data may be deemed too great for a particular application, characterizing the properties that XML provides as well as those that are required by the use cases, and establishing objective, shared measurements to help judge whether XML 1.x and alternate (binary) encodings provide the required properties.
The W3C is explicitly not ratifying a single binary XML standard, it is investigating whether that is even worth attempting. The early indication seems to be that while specialized, proprietary binary formats are widespread across the XML industry, finding a generalized standard binary format will be somewhere between politically difficult and technically impossible.
Finally, the article quotes James Kobielus of the Burton Group:
Network managers are going to implement these XML acceleration appliances to offload the overhead of XML processing from application servers so [the app servers] can focus on their core competency, which is business logic.
I'm highly skeptical of this, although I am intrigued by the capabilities of the XML acceleration appliances. First, as it stands now, the acceleration appliances can only be used by network managers to offload processing of standalone operations such as XSLT transformations or processing WS-Security SOAP headers. Using them to offload the time-consuming aspects of XML processing from general-purpose hardware requires more involvement and investment from the industry as a whole. Examples would include software products that detect the presence of XML acceleration hardware and use APIs that exploit it, and standards for efficiently exchanging parsed XML Infosets across hardware components in a distributed system.
Where does that leave us? As I see it (and stealing from my XML 2004 presentation):
- We have to deal with the reality that XML really does require too much bandwidth for many wireless scenarios, and requires far more processing resources than equivalent formats in transaction processing scenarios. Moore's Law won't make these costs go away, because it doesn't apply to wireless bandwidth or batteries. The bare facts are not really in dispute; what is in dispute is how to reduce the costs without destroying the benefits of XML. There are numerous alternatives being researched, including XML-specific compression algorithms and improved XML text parsers, that would not require end-user eyes or wallets to be opened.
- No known alternative offers anything resembling a silver bullet. There are probably plenty of alternative serializations of the XML Infoset that would be both smaller to transmit and faster to parse than XML text, but whether they offer enough value to justify putting them into the XML core is not at all clear. Likewise it is clear that dedicated hardware components can parse XML more efficiently than conventional parsers, but it is much less clear whether this translates into more cost-effective systems.
- As with all software optimization, the first thing to do is to determine where the bottlenecks are and figure out how to address them. Many of the "XML is bloated and slow" complaints I hear could be alleviated by being more clever about how the technology is used. "Doctor, doctor, it costs me lots of money per megabyte of XML I download to my mobile device." Uhh, get a better mobile data plan? Or, "Doctor, doctor, it hurts when I try to process a 1-MB file to find the two attribute values I need!" Uhh, so don't DO that. Don't use expensive validation unless you really get value from it, restructure the XML so that a pull parser or SAX can find what you need quickly, use the right tool for the job, whatever it takes.
- Use enterprise-class tools to do the heavy lifting: Leverage the support for XML in database products such as SQL Server 2005 to pull out small chunks of relevant XML rather than forcing the parser to do that job. Use the fastest XML technology available, even if it costs money.
- Accept that premature standardization is the problem, not the solution. It is probably best to let individual industries such as wireless figure out serializations that meet their needs and then come to more global organizations such as W3C for standardization. It may be that experimentation and evolution brings us to a single, optimal serialization format toward which we can all migrate, but it is a very good bet that design-by-committee and consortium politics will not. Yes, there will be a period of confusion and inefficiency as developers have to support multiple formats for different user bases, and it will probably be obvious what we should have done in 20/20 hindsight. But so long as the alternative formats are relatively simple, it should be no more difficult to handle diversity than it is to handle the multiple graphics formats that are in widespread use -- even mobile devices typically support JPEG, GIF, and PNG.
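The "don't parse the whole document to find two attribute values" advice above can be sketched concretely. This is a toy example (the element and attribute names are invented) of using a streaming pull parser to stop as soon as the wanted value is found, instead of building a full tree:

```python
# Sketch: pull-parse a large document and stop at the first match,
# rather than loading the whole thing into a DOM. Names are invented.
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file: thousands of irrelevant records
# surrounding the one record we actually care about.
big_doc = io.BytesIO(
    b"<log>"
    + b"<entry level='info' msg='noise'/>" * 5000
    + b"<entry level='error' msg='disk full'/>"
    + b"<entry level='info' msg='noise'/>" * 5000
    + b"</log>"
)

def first_error(stream):
    """Return the msg attribute of the first level='error' entry."""
    for event, elem in ET.iterparse(stream, events=("start",)):
        if elem.tag == "entry" and elem.get("level") == "error":
            # Found it -- no need to parse the rest of the document.
            return elem.get("msg")
    return None

result = first_error(big_doc)
print(result)  # prints "disk full"
```

In a long-running process you would also clear elements as you pass them to keep memory flat, but even this minimal version avoids paying the full parse-and-build cost for data you never look at.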
Comments
- Anonymous
December 28, 2004
For me, binary XML is way overdue. I'm working on real-time financial applications, and have found that serialization and deserialization of XML messages is a very significant overhead.
I know that XML is text, and therefore 'open', but I don't believe that would be a problem if binary XML was standardised. As noted in the text of the paper, most "view source" commands are actually displaying a serialised version of their internal data structures. A standardised binary XML would have a plethora of "view source" tools available for it. - Anonymous
December 28, 2004
What about measuring tools? (http://www.arstdesign.com/articles/xmloptimization.html) - Anonymous
December 28, 2004
By the way, I think those pointing out XML overhead are very right when it comes to using taxonomies rather than simple XML.
As a corollary, what about a "simple XML" proposal (without entities, and I'll go as far as suggesting no support for attributes either)? - Anonymous
December 28, 2004
Stephanie Rodriguez: Thanks for the XML Optimization link! That is the kind of "work smarter" thing I was talking about.
damien morton: I agree; the paper http://www2003.org/cdrom/papers/alternate/P872/p872-kohlhoff.html I cited in the XML 2004 stuff opened my eyes on that. The question is whether a standard binary XML format that meets the needs of the financial industry will also meet the needs of others, e.g. the wireless folks. That's an open question, but from what I've seen, I'm not terribly optimistic. I hope I'm unduly pessimistic! - Anonymous
December 28, 2004
- Anonymous
December 28, 2004
- Mike
From a slightly different perspective, and on the design front, I have been advocating against the use of XML in all layers of an enterprise application, especially when tightly bound object technology is much more desirable. In my presentations on SO(A), I have always preached using service-messaging as communication between applications, NOT between tiers of an application.
However, many businesses are using XML Services as a communication mechanism JUST SO they can be seen as implementing an SO(A)... and of course, all for the wrong reasons.
Hence, many of them complain when performance suffers. What do they expect when they are making verbose calls to their own Data Access Layer via SOAP?
Anonymous
December 28, 2004
Both the links to your paper and presentation are broken. Could you please share the latest links? Thanks. - Anonymous
December 29, 2004
Jack: Thanks! There was a problem in the PPT link in that it had some escaped spaces at the end. That's fixed. The other link worked ... It's possible that these are password protected and I just have a cookie set, but I've checked this on a couple of machines and a couple of browsers.