Remove hidden text from a word processing document

This topic shows how to use the classes in the Open XML SDK for Office to programmatically remove hidden text from a word processing document.


Structure of a WordprocessingML Document

The basic structure of a WordprocessingML document consists of the document and body elements, followed by one or more block-level elements such as p, which represents a paragraph. A paragraph contains one or more r elements. The r stands for run, which is a region of text with a common set of properties, such as formatting. A run contains one or more t elements. The t element contains a range of text. The following code example shows the WordprocessingML markup for a document that contains the text "Example text."

    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
      <w:body>
        <w:p>
          <w:r>
            <w:t>Example text.</w:t>
          </w:r>
        </w:p>
      </w:body>
    </w:document>

Using the Open XML SDK, you can create document structure and content using strongly typed classes that correspond to WordprocessingML elements. You will find these classes in the DocumentFormat.OpenXml.Wordprocessing namespace. The following table lists the classes that correspond to the document, body, p, r, and t elements.

| WordprocessingML Element | Open XML SDK Class | Description |
|---|---|---|
| <document/> | Document | The root element for the main document part. |
| <body/> | Body | The container for the block-level structures such as paragraphs, tables, annotations, and others specified in the ISO/IEC 29500 specification. |
| <p/> | Paragraph | A paragraph. |
| <r/> | Run | A run. |
| <t/> | Text | A range of text. |

For more information about the overall structure of the parts and elements of a WordprocessingML document, see Structure of a WordprocessingML document.
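
As a minimal sketch of how the classes in the table map onto the markup, the following builds the same element tree in memory. BuildDocument is a hypothetical helper name used only for illustration, and the example assumes the Open XML SDK package (DocumentFormat.OpenXml) is referenced by the project.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;
using System;

// Hypothetical helper (not part of the SDK): builds the element tree
// document > body > p > r > t for a single range of text.
static Document BuildDocument(string text)
{
    return new Document(
        new Body(
            new Paragraph(
                new Run(
                    new Text(text)))));
}

Document document = BuildDocument("Example text.");

// InnerText concatenates the text of all descendant elements.
Console.WriteLine(document.InnerText);
```

Each constructor takes its child elements as arguments, so the nesting of the C# expression mirrors the nesting of the XML.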


Structure of the Vanish Element

The vanish element marks the text of a run as hidden in a Word file. The Hidden formatting property is a toggle property, which means that its behavior differs between using it within a style definition and using it as direct formatting. When used as part of a style definition, setting this property toggles its current state. Setting it to false (or an equivalent) leaves the current setting unchanged. However, when used as direct formatting, setting it to true or false sets the absolute state of the resulting property.

The following information from the ISO/IEC 29500 specification introduces the vanish element.

vanish (Hidden Text)

This element specifies whether the contents of this run shall be hidden from display at display time in a document. [Note: The setting should affect the normal display of text, but an application can have settings to force hidden text to be displayed. end note]

This formatting property is a toggle property (§17.7.3).

If this element is not present, the default value is to leave the formatting applied at previous level in the style hierarchy. If this element is never applied in the style hierarchy, then this text shall not be hidden when displayed in a document.

[Example: Consider a run of text which shall have the hidden text property turned on for the contents of the run. This constraint is specified using the following WordprocessingML:

    <w:rPr>
      <w:vanish />
    </w:rPr>

This run declares that the vanish property is set for the contents of this run, so the contents of this run will be hidden when the document contents are displayed. end example]

© ISO/IEC 29500: 2016

The following XML schema segment defines the contents of this element.

    <complexType name="CT_OnOff">
       <attribute name="val" type="ST_OnOff"/>
    </complexType>

The val attribute in the schema above is an on/off value. A value of on, 1, or true turns the property on; a value of off, 0, or false turns it off. When the attribute is omitted, as in <w:vanish />, the property is on.
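
The on/off rules above can be sketched with plain LINQ to XML, which keeps the example free of any dependency on the SDK. IsOn is a hypothetical helper, not an SDK method, and treating a missing val attribute as on follows the example markup quoted earlier, where <w:vanish /> alone hides the run.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: interprets the w:val attribute of an on/off
// element such as <w:vanish/>.
static bool IsOn(XElement onOffElement)
{
    XNamespace ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    XAttribute attribute = onOffElement.Attribute(ns + "val");

    // Assumption: an element without w:val turns the property on.
    if (attribute is null)
    {
        return true;
    }

    return attribute.Value switch
    {
        "on" or "1" or "true" => true,
        "off" or "0" or "false" => false,
        _ => throw new FormatException($"Unexpected val value: {attribute.Value}")
    };
}

// Try the helper on a few variants of the vanish element.
foreach (string attr in new[] { "", " w:val=\"true\"", " w:val=\"0\"", " w:val=\"off\"" })
{
    XElement vanish = XElement.Parse(
        $"<w:vanish xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"{attr} />");
    Console.WriteLine($"{vanish} => {IsOn(vanish)}");
}
```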

How the Code Works

The WDDeleteHiddenText method works with the document you specify, removing every run that is hidden and any remaining vanish elements. The code starts by opening the document with the Open method, passing true as the final parameter to request read/write access. Given the open document, the code uses the MainDocumentPart property to reach the main document part.

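using (WordprocessingDocument doc = WordprocessingDocument.Open(docName, true))
{
    // The rest of the method works with doc inside this block;
    // the document is disposed when the block ends.
}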

Get a List of Vanish Elements

The code first checks that doc.MainDocumentPart and doc.MainDocumentPart.Document.Body are not null and throws an exception if either is missing. It then calls the Descendants<T>() method with Vanish as the type argument to get an IEnumerable<Vanish>, and converts the result to a list with ToList(). Working from a list means the later removals do not modify the collection being enumerated.

if (doc.MainDocumentPart is null || doc.MainDocumentPart.Document.Body is null)
{
    throw new InvalidOperationException("MainDocumentPart and/or Body is null.");
}

// Get a list of all the Vanish elements
List<Vanish> vanishes = doc.MainDocumentPart.Document.Body.Descendants<Vanish>().ToList();

Remove Runs with Hidden Text and Extra Vanish Elements

To remove the hidden text, the code next loops over the list of Vanish elements. A Vanish element is a child of a run properties (rPr) element, which can itself be a child of a Run or of ParagraphProperties, so the code gets the parent and grandparent of each Vanish and checks the grandparent's type. If the grandparent is a Run, the code removes that run; otherwise, it removes the Vanish child elements from the parent.

// Loop over the list of Vanish elements
foreach (Vanish vanish in vanishes)
{
    var parent = vanish?.Parent;
    var grandparent = parent?.Parent;

    // If the grandparent is a Run remove it
    if (grandparent is Run)
    {
        grandparent.Remove();
    }
    // If it's not a run remove the Vanish
    else if (parent is not null)
    {
        parent.RemoveAllChildren<Vanish>();
    }
}
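
The loop above removes a run whenever a Vanish element is present, regardless of its val attribute. As a hedged variation, the following sketch removes a run only when its Vanish element actually turns hiding on. RemoveHiddenRuns is a hypothetical helper, not part of the SDK or of the original sample, and the example assumes the Open XML SDK package is referenced.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: removes runs whose direct formatting hides them
// and returns the number of runs removed.
static int RemoveHiddenRuns(Body body)
{
    List<Run> hiddenRuns = body.Descendants<Vanish>()
        // A Vanish with no val, or with a true val, turns hiding on.
        .Where(v => v.Val is null || v.Val.Value)
        // Vanish -> run properties -> Run; other grandparents are skipped.
        .Select(v => v.Parent?.Parent)
        .OfType<Run>()
        .Distinct()
        .ToList();

    foreach (Run run in hiddenRuns)
    {
        run.Remove();
    }

    return hiddenRuns.Count;
}

// In-memory body with one visible run, one hidden run, and one run
// whose vanish element is explicitly switched off.
Body sampleBody = new Body(
    new Paragraph(
        new Run(new Text("Visible. ")),
        new Run(new RunProperties(new Vanish()), new Text("Hidden. ")),
        new Run(
            new RunProperties(new Vanish { Val = new OnOffValue(false) }),
            new Text("Also visible."))));

int removed = RemoveHiddenRuns(sampleBody);
Console.WriteLine($"{removed} run(s) removed; remaining text: {sampleBody.InnerText}");
```

This sketch only handles direct formatting on runs. Hidden text that comes from a style is outside its scope, as it is for the original loop.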

Sample Code

Note

This example assumes that the file being opened contains some hidden text. To hide part of the text in Word, select it and press Ctrl+D to open the Font dialog box. Select the Hidden check box and click OK.
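
If no suitable file is at hand, a test document can also be generated in code. The following is a sketch under stated assumptions: CreateDocumentWithHiddenText and the file name HiddenTextSample.docx are hypothetical, chosen for illustration only, and the Open XML SDK package is assumed to be referenced.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.IO;

// Hypothetical helper: creates a document with one visible run
// followed by one run marked as hidden with a vanish element.
static void CreateDocumentWithHiddenText(string path)
{
    using (WordprocessingDocument doc =
        WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = doc.AddMainDocumentPart();
        mainPart.Document = new Document(
            new Body(
                new Paragraph(
                    // Preserve the trailing space between the two runs.
                    new Run(new Text("Visible text. ") { Space = SpaceProcessingModeValues.Preserve }),
                    new Run(
                        new RunProperties(new Vanish()),
                        new Text("Hidden text.")))));
    }
}

string samplePath = Path.Combine(Path.GetTempPath(), "HiddenTextSample.docx");
CreateDocumentWithHiddenText(samplePath);
Console.WriteLine($"Created {samplePath}");
```

The resulting file can then be passed to WDDeleteHiddenText as its docName argument.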

The following is the complete sample code in C#.

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.Collections.Generic;
using System.Linq;


static void WDDeleteHiddenText(string docName)
{
    // Given a document name, delete all the hidden text.

    using (WordprocessingDocument doc = WordprocessingDocument.Open(docName, true))
    {

        if (doc.MainDocumentPart is null || doc.MainDocumentPart.Document.Body is null)
        {
            throw new InvalidOperationException("MainDocumentPart and/or Body is null.");
        }

        // Get a list of all the Vanish elements
        List<Vanish> vanishes = doc.MainDocumentPart.Document.Body.Descendants<Vanish>().ToList();

        // Loop over the list of Vanish elements
        foreach (Vanish vanish in vanishes)
        {
            var parent = vanish?.Parent;
            var grandparent = parent?.Parent;

            // If the grandparent is a Run remove it
            if (grandparent is Run)
            {
                grandparent.Remove();
            }
            // If it's not a run remove the Vanish
            else if (parent is not null)
            {
                parent.RemoveAllChildren<Vanish>();
            }
        }
    }
}
