Get all the text in a slide in a presentation
This topic shows how to use the classes in the Open XML SDK for Office to get all the text in a slide in a presentation programmatically.
Getting a PresentationDocument object
In the Open XML SDK, the PresentationDocument class represents a
presentation document package. To work with a presentation document,
first create an instance of the PresentationDocument class, and then
work with that instance. To create the class instance from the document,
call the Open method that takes a file path and, as the second
parameter, a Boolean value that specifies whether the document is
editable. To open a document for read/write access, pass true for this
parameter; for read-only access, pass false, as shown in the following
using statement. In this code, the presentationFile parameter is a
string that represents the path of the file from which you want to open
the document.
// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
With v3.0.0+ the Close() method has been removed in favor of relying on
the using statement. The using statement ensures that the Dispose()
method is automatically called when the closing brace is reached. The
block that follows the using statement establishes a scope for the
object that is created or named in the using statement, in this case
presentationDocument.
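As a minimal sketch of this pattern, the following helper opens a presentation read-only inside a using statement and counts the entries in its slide ID list. The class name OpenPresentationSketch and the method name CountSlides are hypothetical; they are not part of the Open XML SDK or of the sample in this topic.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;

// Hypothetical helper class, for illustration only.
public static class OpenPresentationSketch
{
    // Returns the number of slide IDs listed in the presentation,
    // or 0 if the package has no presentation part or slide ID list.
    public static int CountSlides(string presentationFile)
    {
        // Passing false opens the package read-only. Dispose() is called
        // automatically when execution leaves the using block.
        using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
        {
            var slideIdList = presentationDocument.PresentationPart?.Presentation?.SlideIdList;
            return slideIdList?.Elements<SlideId>().Count() ?? 0;
        }
    }
}
```

Because the return statement sits inside the using block, the package is still disposed before the value reaches the caller.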
Basic Presentation Document Structure
The basic document structure of a PresentationML document consists of a
number of parts, among which is the main part that contains the
presentation definition. The following text from the ISO/IEC 29500
specification introduces the overall form of a PresentationML package.
The main part of a PresentationML package starts with a presentation root element. That element contains a presentation, which, in turn, refers to a slide list, a slide master list, a notes master list, and a handout master list. The slide list refers to all of the slides in the presentation; the slide master list refers to the entire slide masters used in the presentation; the notes master contains information about the formatting of notes pages; and the handout master describes how a handout looks.
A handout is a printed set of slides that can be provided to an audience.
As well as text and graphics, each slide can contain comments and notes, can have a layout, and can be part of one or more custom presentations. A comment is an annotation intended for the person maintaining the presentation slide deck. A note is a reminder or piece of text intended for the presenter or the audience.
Other features that a PresentationML document can include the following: animation, audio, video, and transitions between slides.
A PresentationML document is not stored as one large body in a single part. Instead, the elements that implement certain groupings of functionality are stored in separate parts. For example, all authors in a document are stored in one authors part while each slide has its own part.
ISO/IEC 29500: 2016
The following XML code example represents a presentation that contains two slides denoted by the IDs 267 and 256.
<p:presentation xmlns:p="…" … >
  <p:sldMasterIdLst>
    <p:sldMasterId
      xmlns:rel="https://…/relationships" rel:id="rId1"/>
  </p:sldMasterIdLst>
  <p:notesMasterIdLst>
    <p:notesMasterId
      xmlns:rel="https://…/relationships" rel:id="rId4"/>
  </p:notesMasterIdLst>
  <p:handoutMasterIdLst>
    <p:handoutMasterId
      xmlns:rel="https://…/relationships" rel:id="rId5"/>
  </p:handoutMasterIdLst>
  <p:sldIdLst>
    <p:sldId id="267"
      xmlns:rel="https://…/relationships" rel:id="rId2"/>
    <p:sldId id="256"
      xmlns:rel="https://…/relationships" rel:id="rId3"/>
  </p:sldIdLst>
  <p:sldSz cx="9144000" cy="6858000"/>
  <p:notesSz cx="6858000" cy="9144000"/>
</p:presentation>
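In the sample code later in this topic, the slide index is the position of a sldId element in the slide list, not the value of its id attribute. The following sketch illustrates that ordering by using LINQ to XML from the .NET base class library, rather than the Open XML SDK, to list the id and relationship ID of each sldId element in document order. It matches elements and attributes by local name so that it does not depend on the namespace URIs, which are elided in the example above. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class SlideIdListSketch
{
    // Returns the (id, relationship ID) pair of every sldId element, in document order.
    public static List<(string Id, string RelationshipId)> ReadSlideIds(string presentationXml)
    {
        XDocument document = XDocument.Parse(presentationXml);

        return document.Descendants()
            .Where(element => element.Name.LocalName == "sldId")
            .Select(element => (
                // The slide ID is the unqualified id attribute.
                Id: element.Attribute("id")?.Value ?? string.Empty,
                // The relationship ID is the namespace-qualified id attribute (rel:id).
                RelationshipId: element.Attributes()
                    .FirstOrDefault(a => a.Name.LocalName == "id" && a.Name.Namespace != XNamespace.None)
                    ?.Value ?? string.Empty))
            .ToList();
    }
}
```

Applied to a document shaped like the example above, the result lists ID 267 first and ID 256 second, so index 0 refers to the slide with ID 267.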
Using the Open XML SDK, you can create document structure and
content using strongly-typed classes that correspond to PresentationML
elements. You can find these classes in the
DocumentFormat.OpenXml.Presentation namespace. The following table lists
the class names of the classes that correspond to the sld, sldLayout,
sldMaster, and notesMaster elements.
PresentationML Element | Open XML SDK Class | Description |
---|---|---|
<sld/> | Slide | Presentation Slide. It is the root element of SlidePart. |
<sldLayout/> | SlideLayout | Slide Layout. It is the root element of SlideLayoutPart. |
<sldMaster/> | SlideMaster | Slide Master. It is the root element of SlideMasterPart. |
<notesMaster/> | NotesMaster | Notes Master (or handoutMaster). It is the root element of NotesMasterPart. |
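The correspondence in the table can be checked in code. The following sketch, which assumes only that the Open XML SDK package is referenced, creates an empty instance of each class and reads its LocalName property, which holds the element name without the namespace prefix. The class and method names are hypothetical.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Presentation;

// Hypothetical helper class, for illustration only.
public static class ElementNameSketch
{
    // Returns the PresentationML element name behind each root element class.
    public static string[] RootElementNames()
    {
        OpenXmlElement[] roots =
        {
            new Slide(),
            new SlideLayout(),
            new SlideMaster(),
            new NotesMaster(),
        };

        // LocalName is the element name without its namespace prefix.
        return Array.ConvertAll(roots, root => root.LocalName);
    }
}
```

The returned array is expected to contain sld, sldLayout, sldMaster, and notesMaster, in that order.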
How the Sample Code Works
The sample code consists of three overloads of the GetAllTextInSlide
method. In the following segment, the first overloaded method opens the
source presentation that contains the slide as read-only and passes it,
together with the slide index, to the second overloaded method, which
gets the slide part. The first overloaded method then returns the array
of strings that the second method returns to it; each string represents
a paragraph of text in the specified slide.
// Get all the text in a slide.
public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns.
        return GetAllTextInSlide(presentationDocument, slideIndex);
    }
}
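The second overloaded method takes the presentation document passed in and gets a slide part to pass to the third overloaded method. The slideIndex parameter is the zero-based position of the slide in the slide ID list; the method throws an ArgumentOutOfRangeException for a negative index and returns an empty array if the index is past the end of the list or the slide cannot be resolved. Otherwise, it returns to the first overloaded method the array of strings that the third overloaded method returns to it, each of which represents a paragraph of text in the specified slide.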
static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the slide index is not negative.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException(nameof(slideIndex));
    }
    // Get the presentation part of the presentation document.
    PresentationPart? presentationPart = presentationDocument.PresentationPart;
    // Verify that the presentation part and presentation exist.
    if (presentationPart is not null && presentationPart.Presentation is not null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;
        // Verify that the slide ID list exists.
        if (presentation.SlideIdList is not null)
        {
            // Get the collection of slide IDs from the slide ID list.
            DocumentFormat.OpenXml.OpenXmlElementList slideIds = presentation.SlideIdList.ChildElements;
            // If the slide index is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string? slidePartRelationshipId = ((SlideId)slideIds[slideIndex]).RelationshipId;
                if (slidePartRelationshipId is null)
                {
                    return [];
                }
                // Get the specified slide part from the relationship ID.
                SlidePart slidePart = (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);
                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlide(slidePart);
            }
        }
    }
    // Otherwise, return an empty array.
    return [];
}
The following code segment shows the third overloaded method, which
takes the slide part passed in and returns to the second overloaded
method a string array of text paragraphs. It starts by verifying that
the slide part passed in exists, and then it creates a linked list of
strings. It iterates through the paragraphs in the slide and, using a
StringBuilder object to concatenate all the text elements in a
paragraph, adds each non-empty paragraph to the linked list as a string.
It then returns to the second overloaded method an array of strings that
represents all the text in the specified slide in the presentation.
static string[] GetAllTextInSlide(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart is null)
    {
        throw new ArgumentNullException(nameof(slidePart));
    }
    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();
    // If the slide exists...
    if (slidePart.Slide is not null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (DocumentFormat.OpenXml.Drawing.Paragraph paragraph in
            slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.
            StringBuilder paragraphText = new StringBuilder();
            // Iterate through the text elements of the paragraph.
            foreach (DocumentFormat.OpenXml.Drawing.Text text in
                paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each piece of text to the paragraph text.
                paragraphText.Append(text.Text);
            }
            if (paragraphText.Length > 0)
            {
                // Add each non-empty paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }
    // Return an array of strings.
    return texts.ToArray();
}
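To see the concatenation step in isolation, the following sketch builds a Drawing paragraph in memory from two runs and joins the text of its Text descendants, much as the inner loop above does. It uses string.Concat in place of a StringBuilder; the class name, the method names, and the A namespace alias are choices made for this illustration.

```csharp
using System.Linq;
using A = DocumentFormat.OpenXml.Drawing;

// Hypothetical helper class, for illustration only.
public static class ParagraphTextSketch
{
    // Joins the text of every a:t (Text) descendant of the paragraph, in document order.
    public static string GetParagraphText(A.Paragraph paragraph)
    {
        return string.Concat(paragraph.Descendants<A.Text>().Select(text => text.Text));
    }

    // Builds a paragraph whose text is split across two runs.
    public static A.Paragraph BuildSampleParagraph()
    {
        return new A.Paragraph(
            new A.Run(new A.Text("Hello, ")),
            new A.Run(new A.Text("world")));
    }
}
```

Calling GetParagraphText(BuildSampleParagraph()) returns "Hello, world", which shows that the Text elements are fragments of one paragraph rather than separate lines.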
Sample Code
Following is the complete sample code that you can use to get all the
text in a specific slide in a presentation file. For example, you can
use the following foreach loop in your program to iterate over the array
of strings returned by the GetAllTextInSlide method, which represents
the text in the slide at the index slideIndex of the presentation file
found at filePath.
foreach (string text in GetAllTextInSlide(filePath, int.Parse(slideIndex)))
{
Console.WriteLine(text);
}
Following is the complete sample code in C#.
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Get all the text in a slide.
public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns.
        return GetAllTextInSlide(presentationDocument, slideIndex);
    }
}

static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the slide index is not negative.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException(nameof(slideIndex));
    }
    // Get the presentation part of the presentation document.
    PresentationPart? presentationPart = presentationDocument.PresentationPart;
    // Verify that the presentation part and presentation exist.
    if (presentationPart is not null && presentationPart.Presentation is not null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;
        // Verify that the slide ID list exists.
        if (presentation.SlideIdList is not null)
        {
            // Get the collection of slide IDs from the slide ID list.
            DocumentFormat.OpenXml.OpenXmlElementList slideIds = presentation.SlideIdList.ChildElements;
            // If the slide index is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string? slidePartRelationshipId = ((SlideId)slideIds[slideIndex]).RelationshipId;
                if (slidePartRelationshipId is null)
                {
                    return [];
                }
                // Get the specified slide part from the relationship ID.
                SlidePart slidePart = (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);
                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlide(slidePart);
            }
        }
    }
    // Otherwise, return an empty array.
    return [];
}

static string[] GetAllTextInSlide(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart is null)
    {
        throw new ArgumentNullException(nameof(slidePart));
    }
    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();
    // If the slide exists...
    if (slidePart.Slide is not null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (DocumentFormat.OpenXml.Drawing.Paragraph paragraph in
            slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.
            StringBuilder paragraphText = new StringBuilder();
            // Iterate through the text elements of the paragraph.
            foreach (DocumentFormat.OpenXml.Drawing.Text text in
                paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each piece of text to the paragraph text.
                paragraphText.Append(text.Text);
            }
            if (paragraphText.Length > 0)
            {
                // Add each non-empty paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }
    // Return an array of strings.
    return texts.ToArray();
}