Why (good) Xml is much better than plain text

There are many reasons, of course, and there are probably also reasons why plain text files can be better, but I would like to point out just one, because I am fighting with it right now:

Xml is human readable

Or at least, it should be.

I’m dealing with the HL7 standard for healthcare. HL7 files are text files with some strange delimiters, such as ^ and |. Luckily, we can use the BizTalk HL7 Accelerator, which lets us abstract away the HL7 details.

A sample of an HL7 file:

MSH|^~\&|REG|MCM|BTS||199601121005||ADT^A04|000001|P|2.2
EVN|A04|199601121005||01||199601121000
PID|||191919^^^MYHOS^MR~123-45-6789^^^USSSA^SS|253763|SMITH^JOHN^Q||19560129|M|||123MAIN^^BUFFALO^NY^98052^""||(123)555-0100||S|M|10199925^^^MYHOS^AN|123-45-6789
PD1|S|F|NormalString^A^+1^-1^ISO^simpletext&Test&HCD^GI^simpletext&NormalString&ISO^I|NormalString^Test&Test^Test^Test^Test^Test^AE^simpletext^simpletext&Test&ISO^P^NormalString^M10^MC^simpletext&NormalString&HCD^A|N|simpletext|I|I|N|NormalString^+1^M11^simpletext&NormalString&L,M,N^RRI^simpletext&NormalString&HCD|NOVALUE^NormalString^Test^Test^NormalString^Test|N
PV1|1|I|2000^2012^01^hey&test&DNS^test^test^test^test^test||||004777^MILLER^CONNIE^A.|||SUR||||2|A0

Where is the Patient Name? It is “the substring between the fifth and the sixth | (pipe) in the third line (the line starting with PID)”. And remember, the parts of the name are separated by ^ (the strange little hat), not by spaces.
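To make that lookup concrete, here is a minimal Python sketch of the positional rule described above. It is only an illustration, not how the HL7 Accelerator works: the function name patient_name is made up, the delimiters | and ^ are hard-coded instead of being read from the MSH segment, and escape sequences and repetitions (~) are ignored.

```python
# Hypothetical helper for illustration only; not part of any HL7 library.
# Assumes the default delimiters: | between fields, ^ between components.
PID_SEGMENT = (
    'PID|||191919^^^MYHOS^MR~123-45-6789^^^USSSA^SS|253763|SMITH^JOHN^Q||'
    '19560129|M|||123MAIN^^BUFFALO^NY^98052^""||(123)555-0100||S|M|'
    '10199925^^^MYHOS^AN|123-45-6789'
)


def patient_name(segment: str) -> dict:
    """Return the name parts found between the fifth and sixth pipe."""
    fields = segment.split("|")
    if fields[0] != "PID":
        raise ValueError("not a PID segment")
    # fields[0] is the segment name, so fields[5] is the text
    # between the fifth and the sixth pipe.
    components = fields[5].split("^")
    # Pad with empty strings so short names do not break the unpacking.
    components += [""] * (3 - len(components))
    family, given, middle = components[:3]
    return {"family": family, "given": given, "middle": middle}


name = patient_name(PID_SEGMENT)
print(name)
```

Nothing in the code says "patient name" except the names I chose myself; the meaning lives entirely in the magic number 5, which is exactly the readability problem.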

The HL7 Accelerator comes with Xsd schemas to map these flat files. A sample message of type ADT A04 (the one above) looks something like this (just a small piece):

<ns0:ADT_A04_22_GLO_DEF xmlns:ns0="https://microsoft.com/HealthCare/HL7/2X">
<EVN_EventType>
<EVN.1_EventTypeCode>A04</EVN.1_EventTypeCode>
<EVN.2_DateTimeOfEvent>199601121005</EVN.2_DateTimeOfEvent>
<EVN.3_DateTimePlannedEvent>199601121000</EVN.3_DateTimePlannedEvent>
<EVN.4_EventReasonCode>01</EVN.4_EventReasonCode>
</EVN_EventType>
<PID_PatientIdentification>
<PID.1_SetIdPatientId>191919</PID.1_SetIdPatientId>
<PID.2_PatientIdExternalId>
<PID.5_PatientName>
<PN.0_FamilyName>Doe</PN.0_FamilyName>
<PN.1_GivenName>John</PN.1_GivenName>
</PID.5_PatientName>
[…]

We still deal with HL7 codes and semantic structure, but it’s much easier to work with the Patient Name. It's located in “the FamilyName element under PatientIdentification” :-)
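For comparison, here is a rough sketch of the same lookup against the Xml, using Python's standard xml.etree.ElementTree. The fragment below is a cut-down, properly closed version of the sample written just for this example (the real schema has many more elements), and first_text is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed fragment assumed for illustration only.
XML_FRAGMENT = """\
<ns0:ADT_A04_22_GLO_DEF xmlns:ns0="https://microsoft.com/HealthCare/HL7/2X">
  <PID_PatientIdentification>
    <PID.5_PatientName>
      <PN.0_FamilyName>Doe</PN.0_FamilyName>
      <PN.1_GivenName>John</PN.1_GivenName>
    </PID.5_PatientName>
  </PID_PatientIdentification>
</ns0:ADT_A04_22_GLO_DEF>
"""


def first_text(parent, tag):
    """Return the text of the first element named `tag` at or below `parent`."""
    for element in parent.iter(tag):
        return element.text
    return None


root = ET.fromstring(XML_FRAGMENT)
# "The FamilyName element under PatientIdentification", spelled out:
pid = next(root.iter("PID_PatientIdentification"))
family = first_text(pid, "PN.0_FamilyName")
given = first_text(pid, "PN.1_GivenName")
print(family, given)
```

The element names still carry the HL7 codes (PID.5, PN.0), but the lookup is by name instead of by counting pipes.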

Comments

  • Anonymous
    June 13, 2005
    XML is de facto a standard in cross-platform data transfer.
    And now that all the major companies have sat down in one room and come to an agreement, let's take advantage of this opportunity and make a newer version with a binary format. Anyway, XML is targeting machines and not the human eye. Binary will make it faster and lighter!
  • Anonymous
    June 13, 2005
    While XML does add a large amount of overhead to the file, this is totally irrelevant if the file is compressed before it is sent over the wire, as text compresses 11 to 1 or better.

    ex: that 5.5meg XML file (which was 2.5megs as HL7) zips down to 500k.

    There are many articles about zipping webservices and using .zip in .Net
  • Anonymous
    June 13, 2005
    I also forgot to mention that since HIPAA is now in place, I would greatly frown upon sending a file as plain text over the wire, even if the connection is encrypted.

    I see compression as a form of weak encryption and it blocks the most basic forms of sniffing.
  • Anonymous
    June 14, 2005
    Actually, it would be interesting to compare the performance of compressing the data + encrypting the dictionary vs 3DES.
  • Anonymous
    June 14, 2005
    Lots of comments! I just wanted to talk about ONE reason pro-Xml-against-Txt. But there are reasons in both directions. The worst thing about Xml is the space that it takes, of course.
    <FamilyName>Smith</FamilyName> takes 30 characters just to say Smith! So if you need to save space or bandwidth, Xml is not good.

    I think that Xml (or any tagged format) was not popular in the late 80's and early 90's just because of the space and bandwidth. Now, these things are not an issue (usually).
  • Anonymous
    August 19, 2005
    XML is great. HL7 V3 uses XML BUT it's very poorly done. You should have a look. You'll soon LOVE pipes and carets.