Retrieve the Numbered Bullet Number from Word Document Table Cell with OpenXml

Solanke, Sunil (Pune) 0 Reputation points
2024-08-06T11:48:38.8+00:00

We are working on an application that reads an MS Word document template in which information is entered in a specific table. The first column of the table holds the step numbers as a numbered list. We have tried many ways to read the cell value, but we always get an empty string. We can retrieve the cell value using MS Word Office Interop, but we want a solution based on Open XML. Any help with sample code would be much appreciated.

We have tried looking up the abstractNumId and the levels, but could not find the cell value.

Office Open Specifications

1 answer

  1. Mike Bowen 1,791 Reputation points Microsoft Employee
    2024-08-07T23:53:16.89+00:00

    Hi Solanke, Sunil (Pune),

    Unfortunately, when a cell uses a numbered list, the displayed number is not stored in the XML. Word calculates it when the file is opened, so no element in the cell directly contains the number.

    Documentation for Office Open XML file formats such as docx is in ISO/IEC 29500-1:2016(E). The Numbering Part (11.3.11 Numbering Definitions Part) contains a definition for the structure of each unique numbering definition in a docx document.

    In the document.xml you sent, the first numbered cell has this Paragraph (<w:p/>), whose Paragraph Properties (<w:pPr/>) contain a <w:numPr /> element. A paragraph in a list refers to its numbering definition through the child elements of the numPr element.

    <w:p w14:paraId="3163B08A" w14:textId="390B0962" w:rsidR="0089403A" w:rsidRPr="00AC1015"
      w:rsidRDefault="0089403A" w:rsidP="00C52ED8">
      <w:pPr>
        <w:pStyle w:val="CellBodyGrid" />
        <w:numPr>
          <w:ilvl w:val="0" />
          <w:numId w:val="50" />
        </w:numPr>
        <w:ind w:left="342" />
      </w:pPr>
    </w:p>
    

    From this XML we can see <w:ilvl w:val="0" /> and <w:numId w:val="50" />, which are the values needed to look up the numbering style in numbering.xml.
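The lookup values described above can be pulled out of a paragraph with a few lines of code. The following is only a minimal sketch in Python using the standard library, not the Open XML SDK itself; the function name `read_num_pr` is hypothetical, and the sample markup is a trimmed copy of the paragraph above with the `w` namespace declared so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Trimmed copy of the paragraph from the answer, made standalone for parsing.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:pPr>
    <w:pStyle w:val="CellBodyGrid" />
    <w:numPr>
      <w:ilvl w:val="0" />
      <w:numId w:val="50" />
    </w:numPr>
    <w:ind w:left="342" />
  </w:pPr>
</w:p>
""".strip()


def read_num_pr(paragraph):
    """Return (numId, ilvl) for a paragraph, or None if it has no w:numPr.

    Hypothetical helper: it only looks at direct paragraph properties and
    ignores numbering inherited from a paragraph style.
    """
    num_pr = paragraph.find("w:pPr/w:numPr", NS)
    if num_pr is None:
        return None
    num_id = num_pr.find("w:numId", NS)
    if num_id is None:
        return None
    ilvl = num_pr.find("w:ilvl", NS)
    # Assumption: treat a missing w:ilvl as level 0.
    level = int(ilvl.get(f"{{{W}}}val")) if ilvl is not None else 0
    return int(num_id.get(f"{{{W}}}val")), level


print(read_num_pr(ET.fromstring(PARAGRAPH_XML)))
```

For the sample paragraph this yields the pair `(50, 0)`, that is, numId 50 at level 0. The same two values are what an Open XML SDK based implementation would read from the paragraph's numbering properties.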

    From section 17.9.3 ilvl (Numbering Level Reference): "This element specifies the numbering level of the numbering definition instance which shall be applied to the parent paragraph."

    From section 17.9.18 numId (Numbering Definition Instance Reference): "This element specifies the numbering definition instance which shall be used for the given parent numbered paragraph in the WordprocessingML document."

    So, to find the number shown in each cell, you will have to calculate it: examine the table to determine the cell's position in the list, then use the values of the w:ilvl and w:numId elements to look up the numbering style in the Numbering part.
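The calculation described above could look roughly like the following Python sketch, again using only the standard library rather than the Open XML SDK. Everything in it is an assumption made for illustration: the helper names `load_levels` and `number_labels`, the sample numbering.xml content (the abstractNumId value 7 is invented), and the three-row table. It only handles a single-level list, counts each numId independently, and ignores level overrides, style-inherited numbering, restarts, and number formats other than plain digits, so treat it as a starting point and not as a reproduction of Word's algorithm.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"

# Invented numbering.xml fragment: numId 50 points at abstractNum 7.
NUMBERING_XML = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="7">
    <w:lvl w:ilvl="0">
      <w:start w:val="1" />
      <w:numFmt w:val="decimal" />
      <w:lvlText w:val="%1." />
    </w:lvl>
  </w:abstractNum>
  <w:num w:numId="50"><w:abstractNumId w:val="7" /></w:num>
</w:numbering>
""".strip()

# Invented table: three rows whose first cell holds a numbered paragraph.
ROW = (
    '<w:tr><w:tc><w:p><w:pPr><w:numPr><w:ilvl w:val="0"/>'
    '<w:numId w:val="50"/></w:numPr></w:pPr></w:p></w:tc>'
    "<w:tc><w:p><w:r><w:t>Step text</w:t></w:r></w:p></w:tc></w:tr>"
)
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W}"><w:body><w:tbl>{ROW * 3}'
    "</w:tbl></w:body></w:document>"
)


def load_levels(numbering_root):
    """Map (numId, ilvl) -> (start value, lvlText pattern)."""
    abstract = {}
    for a in numbering_root.findall("w:abstractNum", NS):
        a_id = a.get(f"{{{W}}}abstractNumId")
        for lvl in a.findall("w:lvl", NS):
            start = lvl.find("w:start", NS)
            text = lvl.find("w:lvlText", NS)
            ilvl = int(lvl.get(f"{{{W}}}ilvl"))
            abstract[(a_id, ilvl)] = (
                # Assumed fallback of 0 when w:start is absent; check the spec.
                int(start.get(VAL)) if start is not None else 0,
                text.get(VAL) if text is not None else f"%{ilvl + 1}",
            )
    levels = {}
    for num in numbering_root.findall("w:num", NS):
        a_ref = num.find("w:abstractNumId", NS).get(VAL)
        for (a_id, ilvl), info in abstract.items():
            if a_id == a_ref:
                levels[(int(num.get(f"{{{W}}}numId")), ilvl)] = info
    return levels


def number_labels(document_root, levels):
    """Walk paragraphs in document order and compute each list label."""
    counters, labels = {}, []
    for p in document_root.iter(f"{{{W}}}p"):
        num_pr = p.find("w:pPr/w:numPr", NS)
        num_id = num_pr.find("w:numId", NS) if num_pr is not None else None
        if num_id is None:
            continue  # not a numbered paragraph
        ilvl_el = num_pr.find("w:ilvl", NS)
        ilvl = int(ilvl_el.get(VAL)) if ilvl_el is not None else 0
        key = (int(num_id.get(VAL)), ilvl)
        if key not in levels:
            continue  # unknown definition; skipped in this sketch
        start, pattern = levels[key]
        counters[key] = counters.get(key, start - 1) + 1
        labels.append(pattern.replace(f"%{ilvl + 1}", str(counters[key])))
    return labels


levels = load_levels(ET.fromstring(NUMBERING_XML))
print(number_labels(ET.fromstring(DOCUMENT_XML), levels))
```

With the invented sample data this produces the labels `1.`, `2.`, and `3.` for the three rows. In a real docx the two XML strings would come from the word/document.xml and word/numbering.xml parts of the package, and the same counting logic could be ported to the Open XML SDK.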

    I hope this helped clear up what's going on here.

    Best regards,

    Michael Bowen

    Microsoft Open Specifications

