Equivalence class partitioning - Part 2: Character/String data decomposition

Again, I am remiss in my postings; too many irons in the fire these days. Two weeks ago, I posted a challenge to decompose a set of character data (the ANSI Latin 1 character set) into valid and invalid equivalence class subsets in order to test the base filename parameter of a filename passed to COMDLG32.DLL on the Windows XP platform from the user interface, using the File Save As... dialog of Notepad.

As illustrated below, the filename on a Windows platform is composed of two separate parameters. Although the file name parameter of the Save As... dialog will accept a base filename, a base filename with an extension, or a path with a filename with or without an extension, the purpose of the challenge was to decompose the limited set of characters into equivalence class subsets for the base filename component only (the part outlined in green). (Of course, complete testing will include testing with and without extensions, but let's first build a foundation of tests that adequately evaluates the base filename parameter; we can then expand our tests from there to include extensions.)

[Figure: Windows filename]

As suggested in the earlier post, in order to adequately decompose this set of data within the defined, real-world context (and not in alternate philosophical universes), a professional tester would need to understand programming concepts, file naming conventions on a Windows platform, the Windows XP file system, the default character encoding on the Windows XP operating system (Unicode), some history of the FAT file system, and even a bit of the PC/AT architecture. The following table illustrates how I would decompose the data set into equivalence class subsets.

Input/Output Parameter: Filename

Valid Class Subsets

V1 – escape sequence literal strings (SOH, STX, ETX, EOT, ENQ, ACK, BEL, BS, HT, LF, VT, FF, CR, SO, SI, DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB, CAN, EM, SUB, ESC, FS, GS, RS, US, DEL)

V2 – space character (0x20) (but not as only, first, or last character in the base file name)

V3 – period character (0x2E) (but not as only character in the base file name)

V4 – ASCII characters: punctuation (0x21, 0x23 – 0x29, 0x2B – 0x2D, 0x3B, 0x3D, 0x40, 0x5B, 0x5D – 0x60, 0x7B, 0x7D, 0x7E), numbers (0x30 – 0x39), alpha (0x41 – 0x5A, 0x61 – 0x7A)

V5 – 0x80 through 0xFF

V6 – 0x81, 0x8D, 0x8F, 0x90, 0x9D

V7 – component length between 1 – 251 characters (assuming a default 3-letter extension and a maximum path length of 260 characters)

V8 – literal string CLOCK$ (NT 4.0 code base)

V9 – a valid string with a reserved character 0x22 as the first and last character in the string

Invalid Class Subsets

I1 – control codes (Ctrl + @, Ctrl + B, Ctrl + C, Ctrl + ], Ctrl + N, etc.)

I2 – escape sequence literal string NUL

I3 – tab character

I4 – reserved words (LPT1 – LPT4, COM1 – COM4, CON, PRN, AUX, etc.)

I5 – reserved words (LPT5 – LPT9, COM5 – COM9)

I6 – reserved characters (/ : < > |) (0x2F, 0x3A, 0x3C, 0x3E, 0x7C) by themselves or as part of a string of characters

I7 – reserved character 0x22 as the only character or > 2 characters in the string

I8 – a string composed of > 1 reserved character 0x5C

I9 – a string containing only 2 reserved characters 0x22

I10 – period character (0x2E) as only character in a string

I11 – two period characters (0x2E) as only characters in a string

I12 – > 2 period characters (0x2E) as only characters in a string

I13 – reserved character 0x5C as the only character in the string

I14 – space character (0x20) as only character in a string

I15 – space character (0x20) as first character in a string

I16 – space character (0x20) as last character in a string

I17 – reserved characters (* ?) (0x2A, 0x3F)

I18 – a string of valid characters that contains at least one reserved character (* ?) (0x2A, 0x3F)

I19 – a string of valid characters that contains at least one reserved character 0x5C but not in the first position

I20 – string > 251 characters

I21 – character case sensitivity

I22 – empty

Discussion of valid equivalence class subsets

  • Valid subset V1 is composed of the literal strings for control characters (or escape sequences) between 0x01 and 0x1F, plus 0x7F. These literal strings may cause problems under various configurations or in unusual situations. The book How to Break Software: A Practical Guide to Testing goes into great detail explaining fault models for these character values. The literal strings in this subset should be tested as the base filename component and possibly, in a separate test, as an extension component. On the Windows platform, however, the probability that one string in this subset is handled differently from any of the others is very low, which removes the need to test every string in the subset. That said, the overhead of testing them all would be minimal, and once done it would probably not need to be repeated during a project cycle.
  • Valid subset V2 provides guidance on the use of the space character in valid filenames. On the Windows operating system a space character (0x20) is allowed in a base filename, but is not permitted as the only character in a file name. Typical behavior on the Windows platform also truncates the space character if it is the first or last character of a base filename. However, if the extension is appended to the base filename in the Filename edit control on the Save or Save As… dialog, a space character can be the last character in the base filename. Note also that a space character by itself or as the first character in a filename is acceptable on a UNIX-based operating system. And although we can force the Windows platform to save a file name with only a space character by typing “ .txt” (including the quotes) in the Filename edit control on the Save/Save As… dialog, this practice is not typical of reasonable Windows users’ expectations.
  • Valid subset V3 is the period character (0x2E), which is allowed in a base filename but is not a valid filename if it is the only character in the base filename (see invalid subset I10 for the period character).
  • Valid subset V4 is composed of ‘printable’ ASCII characters that are valid ASCII characters in a Windows filename. The subset includes punctuation characters, numeric characters, and alpha characters. We could also decompose this subset further into additional subsets including valid punctuation characters, numbers, upper case, and lower case characters if we wanted to ensure that we test at least one element from the superset at least once.
  • Valid subset V5 is the set of character code points between 0x80 and 0xFF.
  • Valid subset V6 is a subset of V5; its members are listed separately only because they are code points that have no character glyphs assigned to them. These would be especially interesting if we needed to test filenames for backward compatibility on Windows 9x platforms.
  • Valid subset V7 is the minimum and maximum component length assuming the filename is saved in the root directory (C:\). 
  • Valid subset V8 is probably a red herring. On the NT 4 platform the string CLOCK$ was a reserved word. On an older application first created for the Windows NT 4 platform that does not use the system Save/Save As dialog, we might want to test this string just to make sure the developer did not hard-code the string in an error handling routine.
  • Valid subset V9 is an interesting case because this otherwise invalid reserved character (0x22) is handled differently when used in the first and last character positions of a base filename. When used in both the first and last positions, the characters are stripped, and if the remaining string is valid the filename is saved. If only one 0x22 character is used, or if two or more 0x22 characters are used in positions other than the first and last, the result is an error message.
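
The hex ranges in V4 through V6 are easy to mistype, so it can help to generate the characters rather than list them by hand. The following is a minimal Python sketch, not part of the original challenge. The names `V4_RANGES`, `expand`, `v4_characters`, and `undefined_in_cp1252` are hypothetical, and it assumes that "ANSI Latin 1" corresponds to Windows code page 1252 as implemented by Python's `cp1252` codec.

```python
# Inclusive code point ranges transcribed from subset V4 in the table.
# Reading the "0x5D ... 0x60" entry as one range is an assumption.
V4_RANGES = {
    "punctuation": [(0x21, 0x21), (0x23, 0x29), (0x2B, 0x2D), (0x3B, 0x3B),
                    (0x3D, 0x3D), (0x40, 0x40), (0x5B, 0x5B), (0x5D, 0x60),
                    (0x7B, 0x7B), (0x7D, 0x7E)],
    "numbers": [(0x30, 0x39)],
    "alpha": [(0x41, 0x5A), (0x61, 0x7A)],
}

# Code points listed for subset V6 (no glyph assigned).
V6_CODE_POINTS = [0x81, 0x8D, 0x8F, 0x90, 0x9D]


def expand(ranges):
    """Expand inclusive (low, high) ranges into a list of characters."""
    return [chr(cp) for low, high in ranges for cp in range(low, high + 1)]


def v4_characters():
    """Return the members of V4, grouped the way the table groups them."""
    return {group: expand(ranges) for group, ranges in V4_RANGES.items()}


def undefined_in_cp1252():
    """Return the bytes in 0x80-0xFF that Python's cp1252 codec cannot decode."""
    undefined = []
    for byte in range(0x80, 0x100):
        try:
            bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            undefined.append(byte)
    return undefined


if __name__ == "__main__":
    for group, chars in v4_characters().items():
        print(group, len(chars), "".join(chars))
    print([hex(b) for b in undefined_in_cp1252()])
```

Under the code page assumption above, the bytes that the codec refuses to decode should match the five values listed for V6, which gives an independent check on that subset.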

Discussion of invalid equivalence class subsets

  • Invalid subset I1 consists of the control code inputs for escape sequences in the range of 0x01 through 0x1F, and also includes 0x7F. Pressing the control key (CTRL) and any of the control code keys will cause a system beep.
  • Invalid subset I2 is the literal string “nul”. Nul is a reserved word but could be processed differently than other reserved words on the Windows platform because it is also used in many coding languages as a character for string termination.
  • Invalid subset I3 is the tab character, which can be copied and pasted into the Filename textbox control. Pasting a tab into the control and pressing the Save button will generate an error message.
  • Invalid subset I4 includes literal strings for reserved device names on the PC/AT machine and the Windows platform. Using any string in this subset results in an error message indicating the filename is a reserved device name.
  • Invalid subset I5 also includes reserved device names, for LPT5 – LPT9 and COM5 – COM9. However, these must be separated into their own subset because using these specific device names as the base filename on the Windows XP operating system results in a different error message, one indicating the filename is invalid.
  • Invalid subsets I6, I7, and I8 include reserved characters on a Windows platform. When characters in these subsets are used by themselves or in any position in a string of characters, the result is an error message indicating the above file name is invalid.
  • Invalid subsets I9, I10, and I13 also include reserved characters and the space and period characters. When these subsets are tested as defined, no error message is displayed and focus is restored to the File name control on the Save/Save As… dialog.
  • Invalid subsets I11 and I12 also use the period character (0x2E), as exactly 2 characters in the string and as more than 2 characters in the string. The state machine changes are different for each.
  • Invalid subsets I15 and I16 define the space character when used in the first or last character position of a string. These are placed in the invalid class because the normal behavior on Windows is to truncate a leading or trailing space character in a file name. If the leading or trailing space character were not truncated and were saved as part of the file name on a Windows platform, that would constitute a defect.
  • Invalid subsets I17 and I18 contain two additional reserved characters: the asterisk and the question mark (0x2A and 0x3F respectively). If these characters are used by themselves or as characters in a string of other valid characters, a file will not be saved and no error message will occur. However, the state of the Save/Save As… dialog does change. If the default file type is .txt and there are text files displayed in the Folder View control on the Save As… dialog, the files with the .txt extension will be removed after the Save button is pressed. If the default file type is All files, then all files will be removed from the Folder View control on the Save As… dialog after the Save button is pressed.
  • Invalid subset I19 is a string of valid characters which contains at least one backslash character anywhere except as the lead character in the string. (Of course, this assumes the string is random and the backslash is not in a position which would resolve to a valid path.) The backslash character is a reserved character used as a path delimiter in the file system. An error message will appear indicating the path is invalid.
  • Invalid subset I20 tests for extremely long base file names of more than 251 characters. Note that an interesting anomaly occurs with string lengths. A base file name of 252 or 253 valid characters will cause an error message to display indicating the file name is invalid. However, a base file name of 254 or 255 characters will actually get saved as a file name but is not associated with any file type. Any base file name longer than 255 characters again produces an error message.
  • Invalid subset I21 describes the tests for case sensitivity. The Windows platform does not consider character case of characters that have an upper case and a lower case representation. For example, a file name with a lower case Latin character ‘a’ is considered the same as a file name with the upper case Latin character ‘A’.
  • Invalid subset I22 is, of course, an empty string.
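
One way to keep the invalid subsets straight is to write the partitioning down as a small classifier. The Python sketch below is only an illustration under stated assumptions. The function name `classify`, the constants, and the order in which overlapping subsets are checked are all hypothetical choices of mine. It models the table above, not the actual behavior of the Save As… dialog, and it leaves out I1 (keyboard input), I7 and I9 (the 0x22 cases), and I21 (case sensitivity).

```python
# Device names from subsets I4 and I5; the "etc." in I4 is not expanded here.
RESERVED_I4 = {"CON", "PRN", "AUX"} | {
    f"{dev}{n}" for dev in ("LPT", "COM") for n in range(1, 5)}
RESERVED_I5 = {f"{dev}{n}" for dev in ("LPT", "COM") for n in range(5, 10)}
RESERVED_CHARS_I6 = set('/:<>|')
WILDCARDS = set("*?")
MAX_BASE_LENGTH = 251  # upper bound of V7 in the table


def classify(name):
    """Return the label of the first invalid subset that matches, else None.

    None only means that no subset modeled in this sketch matched; it is
    not a claim that Windows would accept the name.
    """
    if name == "":
        return "I22"
    upper = name.upper()  # device names are compared without regard to case
    if upper == "NUL":
        return "I2"
    if upper in RESERVED_I4:
        return "I4"
    if upper in RESERVED_I5:
        return "I5"
    if set(name) == {"."}:  # periods only: one, two, or more than two
        return {1: "I10", 2: "I11"}.get(len(name), "I12")
    if name == " ":
        return "I14"
    if "\t" in name:
        return "I3"
    if set(name) & RESERVED_CHARS_I6:
        return "I6"
    if set(name) <= WILDCARDS:  # wildcards by themselves
        return "I17"
    if set(name) & WILDCARDS:  # wildcards mixed with other characters
        return "I18"
    if set(name) == {"\\"}:  # backslashes only
        return "I13" if len(name) == 1 else "I8"
    if "\\" in name[1:]:  # backslash somewhere after the first position
        return "I19"
    if name.startswith(" "):
        return "I15"
    if name.endswith(" "):
        return "I16"
    if len(name) > MAX_BASE_LENGTH:
        return "I20"
    return None


if __name__ == "__main__":
    for sample in ["", "nul", "com3", "LPT7", "..", "a|b", "a?b", "a\\b", "report"]:
        print(repr(sample), classify(sample))
```

A name can fall into more than one subset (a string of 300 periods, for example), so the check order decides which label is reported. A real test design would need to decide deliberately how to treat such overlaps.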

Of course, this is only a partial list of the complete data set, since a filename on the Windows XP operating system can contain any valid Unicode value, of which there are many thousands of character code points, including surrogate pair characters.
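
Surrogate pairs matter for length-based subsets because a single character outside the Basic Multilingual Plane occupies two 16-bit code units in UTF-16. The short Python sketch below shows the difference between counting characters and counting code units. The helper name `utf16_code_units` is hypothetical, and how the dialog itself counts such characters against its length limits is not something this post establishes.

```python
def utf16_code_units(text):
    """Return the number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    bmp_char = "A"                   # U+0041, inside the Basic Multilingual Plane
    supplementary = "\U00020000"     # U+20000, outside the BMP
    print(len(bmp_char), utf16_code_units(bmp_char))
    print(len(supplementary), utf16_code_units(supplementary))
    # The surrogate pair itself, as hexadecimal code units.
    raw = supplementary.encode("utf-16-be")
    print(raw[:2].hex(), raw[2:].hex())
```

A length test built from supplementary characters would therefore be worth a subset of its own once the data set is extended beyond Latin 1.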

The first and by far the most complex step in applying the functional technique of equivalence class partitioning is data decomposition. This requires an incredible amount of knowledge about the system. Data decomposition is an exercise in modeling data. The less one understands the data set or the system under test, the greater the probability of missing something. Next week we will analyze the equivalence class subsets to define our baseline set of tests to evaluate the base filename component.

Comments

  • Anonymous
    November 14, 2007
    Very Good

  • Anonymous
    March 03, 2008
    In my previous message I typoed 0x5C as 0xC5.  I wonder how I managed to do that twice.  Anyway all three of those should be 0x5C, which Windows interprets as a path separator when it occurs as a single-byte code point in any ANSI code page. The strange things that happen with U+00A5 still happen with U+00A5; I typed that one correctly.

  • Anonymous
    April 15, 2008
    Security Considerations for Character Sets in File Names: "Windows code page and OEM character sets used on Japanese-language systems contain the Yen symbol (¥) instead of a backslash (\). Thus, the Yen character is a prohibited character for those file systems. When mapping Unicode to a Japanese-language code page, conversion functions map both backslash (U+005C) and the normal Unicode Yen symbol (U+00A5) to this same character. For security reasons, your applications should not typically allow the character U+00A5 in a Unicode string that might be converted for use as a FAT file name." The above is a quotation. My observation is that security isn't the only reason to hesitate over whether to allow the character U+00A5. As mentioned before, even in cases where security isn't (or shouldn't be) a problem, Windows sometimes has problems. Here is where that quotation came from: http://msdn2.microsoft.com/en-us/library/ms776406(VS.85).aspx