2.4.2 Determining Paragraph Boundaries

Article
11/15/2022

This section specifies how to find the beginning and end character positions of the paragraph that contains a given character position. The character at the end character position of a paragraph MUST be a paragraph mark, an end-of-section character, a cell mark, or a TTP mark (see Overview of Tables). Negative character positions are not valid.

To find the character position of the first character in the paragraph that contains a given character position cp:

1. Follow the algorithm from Retrieving Text up to and including step 3 to find i. Also remember the FibRgFcLcb97 and PlcPcd found in step 1 of Retrieving Text. If the algorithm from Retrieving Text specifies that cp is invalid, leave the algorithm.
2. Let pcd be PlcPcd.aPcd[i].
3. Let fcPcd be pcd.fc.fc. Let fc be fcPcd + 2 × (cp – PlcPcd.aCp[i]). If pcd.fc.fCompressed is one, set fc to fc / 2 and set fcPcd to fcPcd / 2.
4. Read a PlcBtePapx, plcbtePapx, at offset FibRgFcLcb97.fcPlcfBtePapx in the Table Stream, and of size FibRgFcLcb97.lcbPlcfBtePapx. Let fcLast be the last element of plcbtePapx.aFc. If fcLast is less than or equal to fc, examine fcPcd. If fcLast is less than fcPcd, go to step 8. Otherwise, set fc to fcLast. If pcd.fc.fCompressed is one, set fcLast to fcLast / 2. Set fcFirst to fcLast and go to step 7.
5. Find the largest j such that plcbtePapx.aFc[j] ≤ fc. Read a PapxFkp at offset plcbtePapx.aPnBtePapx[j].pn * 512 in the WordDocument Stream.
6. Find the largest k such that PapxFkp.rgfc[k] ≤ fc. If the last element of PapxFkp.rgfc is less than or equal to fc, then cp is outside the range of character positions in this document, and is not valid. Let fcFirst be PapxFkp.rgfc[k].
7. If fcFirst is greater than fcPcd, then let dfc be (fcFirst – fcPcd). If pcd.fc.fCompressed is zero, then set dfc to dfc / 2. The first character of the paragraph is at character position PlcPcd.aCp[i] + dfc. Leave the algorithm.
8. If PlcPcd.aCp[i] is 0, then the first character of the paragraph is at character position 0. Leave the algorithm.
9. Set cp to PlcPcd.aCp[i]. Set i to i - 1. Go to step 2.
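
The nine steps above can be sketched over in-memory data. This is an illustrative sketch only, not a reference implementation: the function names and the list-based inputs are hypothetical stand-ins. `a_cp` stands for PlcPcd.aCp, `pcds` for a list of (Pcd.fc.fc, Pcd.fc.fCompressed) pairs, `bte_fc` for plcbtePapx.aFc, and `fkp_rgfc[j]` for the rgfc array of the PapxFkp that would be read for bin table entry j. Divisions are assumed to be integer divisions, and the sample data at the end is invented for the example.

```python
from bisect import bisect_right


def largest_le(values, x):
    """Return the index of the largest element <= x in an ascending list."""
    idx = bisect_right(values, x) - 1
    if idx < 0:
        # Case not covered by the steps; treated as invalid here (assumption).
        raise ValueError("no element is <= the requested value")
    return idx


def paragraph_first_cp(cp, a_cp, pcds, bte_fc, fkp_rgfc):
    """Follow steps 1-9 literally; all parameter names are hypothetical."""
    # Step 1: locate the piece i that contains cp (Retrieving Text, step 3).
    if cp < 0 or cp >= a_cp[-1]:
        raise ValueError("cp is not valid")
    i = largest_le(a_cp, cp)
    while True:
        # Steps 2-3: file offsets of cp (fc) and of the piece start (fc_pcd).
        fc_pcd, compressed = pcds[i]
        fc = fc_pcd + 2 * (cp - a_cp[i])
        if compressed:
            fc //= 2
            fc_pcd //= 2
        # Step 4: compare against the last offset in the bin table.
        fc_first = None          # None means "go to step 8"
        fc_last = bte_fc[-1]
        if fc_last <= fc:
            if fc_last >= fc_pcd:
                fc = fc_last     # kept for fidelity; not read again below
                if compressed:
                    fc_last //= 2
                fc_first = fc_last
        else:
            # Steps 5-6: pick the FKP, then the paragraph run inside it.
            rgfc = fkp_rgfc[largest_le(bte_fc, fc)]
            if rgfc[-1] <= fc:
                raise ValueError("cp is not valid")
            fc_first = rgfc[largest_le(rgfc, fc)]
        # Step 7: the paragraph starts inside the current piece.
        if fc_first is not None and fc_first > fc_pcd:
            dfc = fc_first - fc_pcd
            if not compressed:
                dfc //= 2
            return a_cp[i] + dfc
        # Step 8: the start of the document has been reached.
        if a_cp[i] == 0:
            return 0
        # Step 9: continue the search in the previous piece.
        cp = a_cp[i]
        i -= 1


# Invented sample: piece 0 holds cps 0-4 as 8-bit text at byte offset 200
# (stored fc 400); piece 1 holds cps 5-11 as 16-bit text at byte offset 1000.
# Paragraph runs start at file offsets 200 and 1006.
SAMPLE_A_CP = [0, 5, 12]
SAMPLE_PCDS = [(400, True), (1000, False)]
SAMPLE_BTE_FC = [200, 1014]
SAMPLE_FKP_RGFC = [[200, 1006, 1014]]
```

With this sample, character positions 2 and 6 both resolve to a paragraph start of 0 (the paragraph spans both pieces, so the search at cp 6 walks back through step 9), and character position 9 resolves to 8.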

To find the character position of the last character in the paragraph that contains a given character position cp:

1. Follow the algorithm from Retrieving Text up to and including step 3 to find i. Also remember the FibRgFcLcb97 and PlcPcd found in step 1 of Retrieving Text. If the algorithm from Retrieving Text specifies that cp is invalid, leave the algorithm.
2. Let pcd be PlcPcd.aPcd[i].
3. Let fcPcd be pcd.fc.fc. Let fc be fcPcd + 2 × (cp – PlcPcd.aCp[i]). Let fcMac be fcPcd + 2 × (PlcPcd.aCp[i+1] – PlcPcd.aCp[i]). If pcd.fc.fCompressed is one, set fc to fc / 2, set fcPcd to fcPcd / 2, and set fcMac to fcMac / 2.
4. Read a PlcBtePapx, plcbtePapx, at offset FibRgFcLcb97.fcPlcfBtePapx in the Table Stream, and of size FibRgFcLcb97.lcbPlcfBtePapx. Then find the largest j such that plcbtePapx.aFc[j] ≤ fc. If the last element of plcbtePapx.aFc is less than or equal to fc, then go to step 7. Read a PapxFkp at offset plcbtePapx.aPnBtePapx[j].pn * 512 in the WordDocument Stream.
5. Find the largest k such that PapxFkp.rgfc[k] ≤ fc. If the last element of PapxFkp.rgfc is less than or equal to fc, then cp is outside the range of character positions in this document, and is not valid. Let fcLim be PapxFkp.rgfc[k+1].
6. If fcLim ≤ fcMac, then let dfc be (fcLim – fcPcd). If pcd.fc.fCompressed is zero, then set dfc to dfc / 2. The last character of the paragraph is at character position PlcPcd.aCp[i] + dfc – 1. Leave the algorithm.
7. Set cp to PlcPcd.aCp[i+1]. Set i to i + 1. Go to step 2.
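
The forward search can be sketched the same way. Again, this is a hedged illustration under stated assumptions rather than a definitive implementation: the function names and list-based inputs are hypothetical (`a_cp` for PlcPcd.aCp, `pcds` for (Pcd.fc.fc, Pcd.fc.fCompressed) pairs, `bte_fc` for plcbtePapx.aFc, `fkp_rgfc[j]` for the rgfc array of the PapxFkp selected by bin table entry j), divisions are assumed to be integer divisions, and the sample data is invented. The guard against running past the last piece is an addition of this sketch; the steps above do not describe that case.

```python
from bisect import bisect_right


def largest_le(values, x):
    """Return the index of the largest element <= x in an ascending list."""
    idx = bisect_right(values, x) - 1
    if idx < 0:
        # Case not covered by the steps; treated as invalid here (assumption).
        raise ValueError("no element is <= the requested value")
    return idx


def paragraph_last_cp(cp, a_cp, pcds, bte_fc, fkp_rgfc):
    """Follow steps 1-7 literally; all parameter names are hypothetical."""
    # Step 1: locate the piece i that contains cp (Retrieving Text, step 3).
    if cp < 0 or cp >= a_cp[-1]:
        raise ValueError("cp is not valid")
    i = largest_le(a_cp, cp)
    while True:
        if i >= len(pcds):
            # Assumption of this sketch: no paragraph end was found.
            raise ValueError("ran past the last piece")
        # Steps 2-3: offsets of cp (fc), piece start (fc_pcd), piece end (fc_mac).
        fc_pcd, compressed = pcds[i]
        fc = fc_pcd + 2 * (cp - a_cp[i])
        fc_mac = fc_pcd + 2 * (a_cp[i + 1] - a_cp[i])
        if compressed:
            fc //= 2
            fc_pcd //= 2
            fc_mac //= 2
        # Step 4: only consult an FKP if the bin table extends past fc.
        if bte_fc[-1] > fc:
            rgfc = fkp_rgfc[largest_le(bte_fc, fc)]
            # Step 5: find the paragraph run containing fc and its limit.
            if rgfc[-1] <= fc:
                raise ValueError("cp is not valid")
            k = largest_le(rgfc, fc)
            fc_lim = rgfc[k + 1]
            # Step 6: the paragraph ends inside the current piece.
            if fc_lim <= fc_mac:
                dfc = fc_lim - fc_pcd
                if not compressed:
                    dfc //= 2
                return a_cp[i] + dfc - 1
        # Step 7: continue the search in the next piece.
        cp = a_cp[i + 1]
        i += 1


# Invented sample: piece 0 holds cps 0-4 as 8-bit text at byte offset 200
# (stored fc 400); piece 1 holds cps 5-11 as 16-bit text at byte offset 1000.
# Paragraph runs end at file offsets 1006 and 1014.
SAMPLE_A_CP = [0, 5, 12]
SAMPLE_PCDS = [(400, True), (1000, False)]
SAMPLE_BTE_FC = [200, 1014]
SAMPLE_FKP_RGFC = [[200, 1006, 1014]]
```

With this sample, character positions 2 and 6 both resolve to a paragraph end of 7 (the search at cp 2 moves forward through step 7 into the second piece), and character position 9 resolves to 11.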
