Improvements in Password Hashing

National bodies raised a number of comments about the password hashing (encryption) support defined in Open XML. We had initially added some extra hurdles to ensure that existing binary documents could be upgraded properly. On further discussion in TC45, though, we decided that this extra step should only be necessary for a legacy document where the password hasn't already been entered. For any new document, or any time the password has been verified, we'll now use a much cleaner approach.

The reason we have a different approach for legacy documents is pretty straightforward. These aren't passwords that secure the file; they are merely passwords that change what can happen within the application (i.e. they change the editability of portions of the document). The problem is that since XML is just plain text, you don't want to store someone's password without first hashing it. Even though the document isn't secure, the password itself may be something the user has used in other situations, and you don't want it to be discoverable. So while it's possible for an application to open the file without the password (and save it into a different format), it's not possible to read the password. You instead need the user to enter the password, and then you compare the hashes. So, if the user doesn't enter the password and saves the document into the new format, we need to keep the password hashed using the old method, and then apply the new standards-based hashing on top of that.

This added complexity isn't necessary in any other scenario, though, which is why TC45 simplified the approach so that only the standards-based hashing is used. Here's some of TC45's response:

Agreed; several national bodies provided valid concerns that the existing legacy hashing mechanisms were insufficient within the context of the specification, and we believe that these concerns warrant the replacement of the existing mechanisms with a new mechanism which:

· Takes as input the UTF-16 encoding of the input string (and therefore does not have problems with characters above the Latin-1 Unicode subrange)

· Feeds them directly to hash algorithms which have been standardized after appropriate cryptographic review

· Stores the resulting hash value, salt, and spin count within the Office Open XML document

This change will be made in each instance where a password is stored throughout WordprocessingML, SpreadsheetML, and PresentationML.

However, in order to ensure that existing binary documents which contain the “legacy” hash can be correctly interpreted and migrated across multiple vendors/platforms, we will remove these mechanisms from their current location in the specification, and place them into a new annex for deprecated features. 
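To make the first bullet of the response concrete, here is a minimal Python sketch of turning a password into the bytes that would be fed to the hash algorithm. The function name encode_password is hypothetical. The choice of little-endian UTF-16 with no byte order mark follows the clarification given in the comments at the end of this post, not the quoted response text itself.

```python
def encode_password(password: str) -> bytes:
    """Encode a password as UTF-16LE with no byte order mark.

    Assumption: little-endian and BOM-free, per the clarification in the
    comment thread. Python's plain "utf-16" codec would add a BOM, so the
    explicit "utf-16-le" codec is used instead.
    """
    return password.encode("utf-16-le")


if __name__ == "__main__":
    # Characters outside Latin-1 are encoded like any other character.
    print(encode_password("pässwörd").hex())
    # Characters above U+FFFF become a surrogate pair (four bytes).
    print(len(encode_password("\U0001F600")))
```

Because the whole Unicode range goes through the same encoding step, characters above the Latin-1 subrange need no special casing.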

Here's what the updated documentation will look like:

2.15.1.28 documentProtection (Document Editing Restrictions)

This element specifies the set of document protection restrictions which have been applied to the contents of a WordprocessingML document. These restrictions shall be enforced by applications editing this document when the enforcement attribute is turned on, and should be ignored (but persisted) otherwise. Document protection is a set of restrictions used to prevent unintentional changes to all or part of a WordprocessingML document - since this protection does not encrypt the document, malicious applications may circumvent its use. This protection is not intended as a security feature and may be ignored.

If this element is omitted, then no protection shall be applied to this document. When a password is to be hashed and stored in this element, it shall be hashed as defined below, starting from a UTF-16 encoded string value.

Parent Elements

settings (§2.15.1.78)

Attributes

Description

algorithmName (Cryptographic Algorithm Name)

Specifies the specific cryptographic hashing algorithm which shall be used along with the salt attribute and input password in order to compute the hash value.

The following values are reserved:

Value

Algorithm

MD2

Specifies that the MD2 algorithm, as defined by RFC 1319, shall be used.

[Note: It is recommended that applications should avoid using this algorithm to store new hash values, due to publically known breaks. end note]

MD4

Specifies that the MD4 algorithm, as defined by RFC 1320, shall be used.

[Note: It is recommended that applications should avoid using this algorithm to store new hash values, due to publically known breaks. end note]

MD5

Specifies that the MD5 algorithm, as defined by RFC 1321, shall be used.

[Note: It is recommended that applications should avoid using this algorithm to store new hash values, due to publically known breaks. end note]

RIPEMD-128

Specifies that the RIPEMD-128 algorithm, as defined by ISO/IEC 10118-3:2004 shall be used.

[Note: It is recommended that applications should avoid using this algorithm to store new hash values, due to publically known breaks. end note]

RIPEMD-160

Specifies that the RIPEMD-160 algorithm, as defined by ISO/IEC 10118-3:2004 shall be used.

SHA-1

Specifies that the SHA-1 algorithm, as defined by ISO/IEC 10118-3:2004 shall be used.

SHA-256

Specifies that the SHA-256 algorithm, as defined by ISO/IEC 10118-3:2004 shall be used.

SHA-384

Specifies that the SHA-384 algorithm, as defined by ISO/IEC 10118-3:2004 shall be used.

SHA-512

Specifies that the SHA-512 algorithm, as defined by ISO/IEC 10118-3:2004 shall be used.

WHIRLPOOL

Specifies that the WHIRLPOOL algorithm, as defined by ISO/IEC 10118-3:2004 shall be used.

[Example: Consider an Office Open XML document with the following information stored in one of its protection elements:

< … algorithmName="SHA-1" hashValue="9oN7nWkCAyEZib1RomSJTjmPpCY=" />

The algorithmName attribute value of “SHA-1” specifies that the SHA-1 hashing algorithm shall be used to generate a hash from the user-defined password. end example]

The possible values for this attribute are defined by the ST_String simple type (§2.18.19).
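As an illustration of how a consumer might act on algorithmName, here is a small Python sketch that maps some of the reserved values onto the standard hashlib module. The names HASHLIB_NAMES, DISCOURAGED, and new_hasher are hypothetical. Only algorithms that hashlib normally provides are mapped; MD2, MD4, the RIPEMD variants, and WHIRLPOOL depend on the local OpenSSL build, so this sketch simply rejects them.

```python
import hashlib

# Reserved algorithmName values mapped to hashlib names.
# Assumption: only algorithms commonly available in hashlib are listed.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}

# Values the notes above recommend against for storing new hash values.
DISCOURAGED = {"MD2", "MD4", "MD5", "RIPEMD-128"}


def new_hasher(algorithm_name: str):
    """Return a fresh hashlib object for a reserved algorithmName value."""
    try:
        return hashlib.new(HASHLIB_NAMES[algorithm_name])
    except KeyError:
        raise ValueError(f"unsupported algorithmName: {algorithm_name!r}") from None


if __name__ == "__main__":
    hasher = new_hasher("SHA-1")
    hasher.update(b"abc")
    print(hasher.hexdigest())
```

A producer could consult DISCOURAGED before writing a new hash while still accepting those values when reading existing documents.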

edit (Document Editing Restrictions)

Specifies the set of editing restrictions which shall be enforced on a given WordprocessingML document, as defined by the simple type referenced below.

If this attribute is omitted, the consumer shall behave as though there are no editing restrictions applied to this document; equivalent to an attribute value of none.

[Example: Consider a WordprocessingML document that contains the following WordprocessingML specifying that hosting applications shall enforce read-only protection for a given document:

<w:documentProtection w:edit="readOnly" w:enforcement="1" />

The edit attribute has a value of readOnly and the enforcement attribute has a value of 1, specifying that read-only document protection shall be enforced on the given document. end example]

The possible values for this attribute are defined by the ST_DocProtect simple type (§2.18.22).

enforcement (Enforce Document Protection Settings)

Specifies if the document protection settings shall be enforced for a given WordprocessingML document. If the value of this attribute is off, 0, or false, all the WordprocessingML pertaining to document protection is still preserved in the document, but is not enforced. If the value of this attribute is on, 1, or true, the document protection is enforced.

If this attribute is omitted, then document protection settings shall not be enforced by applications.

[Example: Consider a WordprocessingML document that contains the following WordprocessingML specifying that hosting applications shall apply read-only protection for a given document:

<w:documentProtection w:edit="readOnly" w:enforcement="1" />

The enforcement attribute has a value of 1, specifying that the document protection specified shall be enforced on the given document. end example]

The possible values for this attribute are defined by the ST_OnOff simple type (§2.18.67).
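The on/off rules for enforcement can be sketched as a small Python helper. The function name is_enforced is hypothetical, and raising an error on an unrecognized value is an assumption of this sketch; the text above only defines the six listed values and the omitted case.

```python
from typing import Optional


def is_enforced(enforcement: Optional[str]) -> bool:
    """Interpret the enforcement attribute as described above.

    None stands for an omitted attribute, which means "not enforced".
    """
    if enforcement is None:
        return False
    if enforcement in ("on", "1", "true"):
        return True
    if enforcement in ("off", "0", "false"):
        return False
    # Assumption: anything else is treated as invalid input.
    raise ValueError(f"not a valid on/off value: {enforcement!r}")


if __name__ == "__main__":
    print(is_enforced("1"))   # enforcement turned on
    print(is_enforced(None))  # attribute omitted
```

Either way, the protection settings themselves stay in the document; the flag only decides whether an application applies them.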

formatting (Only Allow Formatting With Unlocked Styles)

Specifies if formatting restrictions are in effect for a given WordprocessingML document. This enables the document to restrict the types of styles that may exist in a given WordprocessingML document. Specifically, by setting this attribute's value equal to true, every style whose locked element (§2.7.3.7) has a value of true (or latent styles (§2.7.3.5) whose locked attribute is true) shall not be available for use in the application, nor should any direct formatting. Only styles with a locked value of false may be used.

If this attribute is omitted, then no formatting restrictions shall be applied, even when document protection is enforced.

[Example: Consider a WordprocessingML document that shall apply formatting protection. This requirement would be specified using the following WordprocessingML in the document settings:

<w:documentProtection w:formatting="true" w:enforcement="true" />

If the following definition for a style was also present in the document:

<w:style w:type="paragraph" w:styleId="Heading1">

  <w:name w:val="heading 1" />

  <w:locked w:val="1" />

  …

</w:style>

The formatting attribute has a value of true specifying that the applications shall not allow the style above to be added to the WordprocessingML document. This does not preclude previous uses of that style (which shall not be removed), but does prevent new uses of this style from being added. end example]

The possible values for this attribute are defined by the ST_OnOff simple type (§2.18.67).

hashValue (Password Hash Value)

Specifies the hash value for the password stored with this document. This value shall be compared with the resulting hash value after hashing the user-supplied password using the algorithm specified by the preceding attributes and parent XML element, and if the two values match, the protection shall no longer be enforced.

If this value is omitted, then the reservationPassword attribute shall contain the password hash for the workbook.

 

[Example: Consider an Office Open XML document with the following information stored in one of its protection elements:

<… algorithmName="SHA-1" hashValue="9oN7nWkCAyEZib1RomSJTjmPpCY=" />

The hashValue attribute value of 9oN7nWkCAyEZib1RomSJTjmPpCY= specifies that the user-supplied password shall be hashed using the pre-processing defined by the parent element (if any) followed by the SHA-1 algorithm (specified via the algorithmName attribute value of SHA-1) and that the resulting hash value must be 9oN7nWkCAyEZib1RomSJTjmPpCY= for the protection to be disabled. end example]

The possible values for this attribute are defined by the XML Schema base64Binary datatype.
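The comparison step can be sketched in Python as follows. The function name hash_matches is hypothetical, and it assumes the candidate digest has already been computed from the user-supplied password. Using hmac.compare_digest for a constant-time comparison is a choice made for this sketch, not something the text above requires.

```python
import base64
import hmac


def hash_matches(stored_hash_b64: str, candidate_digest: bytes) -> bool:
    """Compare a stored base64 hashValue with a freshly computed digest."""
    stored = base64.b64decode(stored_hash_b64)
    # Constant-time comparison (a design choice of this sketch).
    return hmac.compare_digest(stored, candidate_digest)


if __name__ == "__main__":
    stored = "9oN7nWkCAyEZib1RomSJTjmPpCY="
    # The example value decodes to 20 bytes, the size of a SHA-1 digest.
    print(len(base64.b64decode(stored)))
```

When the two values match, the application can stop enforcing the protection.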

saltValue (Salt Value for Password Verifier)

Specifies the salt which was prepended to the user-supplied password before it was hashed using the hashing algorithm defined by the preceding attribute values to generate the hashValue attribute, and which shall also be prepended to the user-supplied password before attempting to generate a hash value for comparison. A salt is a random string which is added to a user-supplied password before it is hashed in order to prevent a malicious party from pre-calculating all possible password/hash combinations and simply using those pre-calculated values (often referred to as a "dictionary attack").

If this attribute is omitted, then no salt shall be prepended to the user-supplied password before it is hashed for comparison with the stored hash value.

[Example: Consider an Office Open XML document with the following information stored in one of its protection elements:

<… saltValue="ZUdHa+D8F/OAKP3I7ssUnQ==" hashValue="9oN7nWkCAyEZib1RomSJTjmPpCY=" />

The saltValue attribute value of ZUdHa+D8F/OAKP3I7ssUnQ== specifies that the user-supplied password shall have this value prepended before it is run through the specified hashing algorithm to generate a resulting hash value for comparison. end example]

The possible values for this attribute are defined by the XML Schema base64Binary datatype.

spinCount (Iterations to Run Hashing Algorithm)

Specifies the number of times the hashing function shall be iteratively run (using each iteration's result as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute.

[Rationale: Running the algorithm many times increases the cost of exhaustive search attacks correspondingly. Storing this value allows for the number of iterations to be increased over time to accommodate faster hardware (and hence the ability to run more iterations in less time). end rationale]

[Example: Consider an Office Open XML document with the following information stored in one of its protection elements:

<… spinCount="100000" hashValue="9oN7nWkCAyEZib1RomSJTjmPpCY=" />

The spinCount attribute value of 100000 specifies that the hashing function shall be run one hundred thousand times to generate a hash value for comparison with the hashValue attribute. end example]

The possible values for this attribute are defined by the ST_DecimalNumber simple type (§2.18.16).
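Putting saltValue and spinCount together, here is a rough Python sketch of the computation as it is worded in this excerpt. The function name compute_hash and its parameters are hypothetical. It assumes a UTF-16LE password, that the initial salted hash is not counted as one of the spinCount iterations, and that each round hashes only the previous digest. The published standard may pin down details this excerpt leaves open, so the sketch should not be treated as interoperable with any particular application.

```python
import base64
import hashlib
from typing import Optional


def compute_hash(password: str, salt_b64: Optional[str] = None,
                 algorithm: str = "sha1", spin_count: int = 0) -> str:
    """Return a base64 hash following the wording of this excerpt.

    Assumptions: UTF-16LE input, salt bytes prepended to the password bytes,
    and spin_count extra rounds applied after the initial hash.
    """
    data = password.encode("utf-16-le")
    if salt_b64 is not None:
        # Prepend the decoded salt bytes to the password bytes.
        data = base64.b64decode(salt_b64) + data
    digest = hashlib.new(algorithm, data).digest()
    for _ in range(spin_count):
        # Feed each round's result into the next round.
        digest = hashlib.new(algorithm, digest).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(compute_hash("secret", "ZUdHa+D8F/OAKP3I7ssUnQ==", "sha1", 100000))
```

Storing spinCount alongside the hash means a producer can raise the count for newly protected documents without breaking verification of older ones.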

The following XML Schema fragment defines the contents of this element:

<complexType name="CT_DocProtect">

   <attribute name="algorithmName" type="ST_String" use="optional"/>

   <attribute name="edit" type="ST_DocProtect" use="optional"/>

   <attribute name="formatting" type="ST_OnOff" use="optional"/>

   <attribute name="enforcement" type="ST_OnOff"/>

   <attribute name="hashValue" type="xsd:base64Binary" use="optional"/>

   <attribute name="saltValue" type="xsd:base64Binary" use="optional"/>

   <attribute name="spinCount" type="ST_DecimalNumber" use="optional"/>

   <attributeGroup ref="AG_Password"/>

</complexType>
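For readers who want to see these attributes pulled out of a settings part, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The SAMPLE markup and the function name read_protection are hypothetical, and the sketch assumes the new attributes carry the same w: prefix as edit and enforcement do in the examples above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the w: prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical settings fragment assembled from the example values above.
SAMPLE = (
    f'<w:settings xmlns:w="{W_NS}">'
    '<w:documentProtection w:edit="readOnly" w:enforcement="1" '
    'w:algorithmName="SHA-1" w:spinCount="100000" '
    'w:saltValue="ZUdHa+D8F/OAKP3I7ssUnQ==" '
    'w:hashValue="9oN7nWkCAyEZib1RomSJTjmPpCY=" />'
    '</w:settings>'
)


def read_protection(settings_xml: str):
    """Return the documentProtection attributes as a dict, or None if absent."""
    root = ET.fromstring(settings_xml)
    elem = root.find(f"{{{W_NS}}}documentProtection")
    if elem is None:
        return None  # element omitted: no protection applies
    prefix = f"{{{W_NS}}}"
    attrs = {key[len(prefix):]: value
             for key, value in elem.attrib.items() if key.startswith(prefix)}
    attrs.setdefault("edit", "none")  # an omitted edit is equivalent to none
    if "spinCount" in attrs:
        attrs["spinCount"] = int(attrs["spinCount"])
    return attrs


if __name__ == "__main__":
    print(read_protection(SAMPLE))
```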

Comments

  • Anonymous
    January 24, 2008
    I initially had the exact same impression as Jirka Kosek above. There are three variations of UTF-16: BE, LE, and unmarked. Unmarked, as in this case, is actually an alias for BE. Windows, however, is primarily a LE environment, so I would imagine that Microsoft would desire UTF-16 LE. Also, please specify the behavior of a BOM. Will it be ignored or will it be used to determine BE vs. LE? And if so, how exactly will it be used to determine this? More info: http://www.unicode.org/faq/utf_bom.html#36 Further, I think you should specify the valid range of Unicode characters. Unicode by definition has a range of U+0000 through U+10FFFF. However, older implementations often do not take surrogates into consideration and end up limiting valid characters at U+FFFF. Specifying the range, while not strictly necessary, would be useful in ensuring full Unicode compliance.

  • Anonymous
    January 25, 2008
    hAl, I haven't heard of technical errors yet, but if folks find them we can obviously fix them easily enough. In this case, good catch guys, we should have provided a bit more info. It should read UTF-16LE with no byte order mark. Thanks for catching that. -Brian

  • Anonymous
    January 25, 2008
    Brian - So right off the bat you post a change and you get corrections from the community that you can use to fix the spec.  That tells me that maybe you should post more of these sorts of things.

  • Anonymous
    January 25, 2008
    That's the plan bill :-)

  • Anonymous
    January 25, 2008
    Jirka: if a field MUST be UTF-16, there actually must NOT be a BOM - it'd be just considered a deprecated encoding of a zero-width space. The BOM is for ambiguous cases like flat binary files/blobs. Little endian is pretty much a given and should be a spec-wide requirement

  • Anonymous
    January 26, 2008
    The Open XML specification is one of the most scrutinized specs ever to go through a standards process,
