Get all the text in all slides in a presentation
This topic shows how to use the classes in the Open XML SDK to get all of the text in all of the slides in a presentation programmatically.
Getting a PresentationDocument object
In the Open XML SDK, the PresentationDocument class represents a
presentation document package. To work with a presentation document,
first create an instance of the PresentationDocument
class, and then work with
that instance. To create the class instance from the document call the
Open
method that uses a file path, and a Boolean value as the second
parameter to specify whether a document is editable. To open a document
for read/write access, assign the value true
to this parameter; for read-only access
assign it the value false
as shown in the
following using
statement. In this code,
the presentationFile
parameter is a string
that represents the path for the file from which you want to open the
document.
// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
With v3.0.0+ the Close() method
has been removed in favor of relying on the using statement.
This ensures that the Dispose() method is automatically called
when the closing brace is reached. The block that follows the using
statement establishes a scope for the
object that is created or named in the using
statement, in this case presentationDocument
.
Basic Presentation Document Structure
The basic document structure of a PresentationML
document consists of a number of
parts, among which is the main part that contains the presentation
definition. The following text from the ISO/IEC 29500 specification
introduces the overall form of a PresentationML
package.
The main part of a
PresentationML
package starts with a presentation root element. That element contains a presentation, which, in turn, refers to a slide list, a slide master list, a notes master list, and a handout master list. The slide list refers to all of the slides in the presentation; the slide master list refers to the entire slide masters used in the presentation; the notes master contains information about the formatting of notes pages; and the handout master describes how a handout looks.A handout is a printed set of slides that can be provided to an audience.
As well as text and graphics, each slide can contain comments and notes, can have a layout, and can be part of one or more custom presentations. A comment is an annotation intended for the person maintaining the presentation slide deck. A note is a reminder or piece of text intended for the presenter or the audience.
Other features that a
PresentationML
document can include the following: animation, audio, video, and transitions between slides.A
PresentationML
document is not stored as one large body in a single part. Instead, the elements that implement certain groupings of functionality are stored in separate parts. For example, all authors in a document are stored in one authors part while each slide has its own part.ISO/IEC 29500: 2016
The following XML code example represents a presentation that contains two slides denoted by the IDs 267 and 256.
<p:presentation xmlns:p="…" … >
<p:sldMasterIdLst>
<p:sldMasterId
xmlns:rel="https://…/relationships" rel:id="rId1"/>
</p:sldMasterIdLst>
<p:notesMasterIdLst>
<p:notesMasterId
xmlns:rel="https://…/relationships" rel:id="rId4"/>
</p:notesMasterIdLst>
<p:handoutMasterIdLst>
<p:handoutMasterId
xmlns:rel="https://…/relationships" rel:id="rId5"/>
</p:handoutMasterIdLst>
<p:sldIdLst>
<p:sldId id="267"
xmlns:rel="https://…/relationships" rel:id="rId2"/>
<p:sldId id="256"
xmlns:rel="https://…/relationships" rel:id="rId3"/>
</p:sldIdLst>
<p:sldSz cx="9144000" cy="6858000"/>
<p:notesSz cx="6858000" cy="9144000"/>
</p:presentation>
Using the Open XML SDK, you can create document structure and
content using strongly-typed classes that correspond to PresentationML
elements. You can find these classes in the
namespace. The following table lists the class names of the classes that
correspond to the sld
, sldLayout
, sldMaster
, and notesMaster
elements.
PresentationML Element | Open XML SDK Class | Description |
---|---|---|
<sld/> |
Slide | Presentation Slide. It is the root element of SlidePart. |
<sldLayout/> |
SlideLayout | Slide Layout. It is the root element of SlideLayoutPart. |
<sldMaster/> |
SlideMaster | Slide Master. It is the root element of SlideMasterPart. |
<notesMaster/> |
NotesMaster | Notes Master (or handoutMaster). It is the root element of NotesMasterPart. |
Sample Code
The following code gets all the text in all the slides in a specific
presentation file. For example, you can pass the name of the file as an argument,
and then use a foreach
loop in your program to get the array of
strings returned by the method GetSlideIdAndText
as shown in the following
example.
if (args is [{ } path])
{
int numberOfSlides = CountSlides(path);
Console.WriteLine($"Number of slides = {numberOfSlides}");
for (int i = 0; i < numberOfSlides; i++)
{
GetSlideIdAndText(out string text, path, i);
Console.WriteLine($"Side #{i + 1} contains: {text}");
}
}
The following is the complete sample code in both C# and Visual Basic.
static int CountSlides(string presentationFile)
{
// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
{
// Pass the presentation to the next CountSlides method
// and return the slide count.
return CountSlidesFromPresentation(presentationDocument);
}
}
// Count the slides in the presentation.
static int CountSlidesFromPresentation(PresentationDocument presentationDocument)
{
// Check for a null document object.
if (presentationDocument is null)
{
throw new ArgumentNullException("presentationDocument");
}
int slidesCount = 0;
// Get the presentation part of document.
PresentationPart? presentationPart = presentationDocument.PresentationPart;
// Get the slide count from the SlideParts.
if (presentationPart is not null)
{
slidesCount = presentationPart.SlideParts.Count();
}
// Return the slide count to the previous method.
return slidesCount;
}
static void GetSlideIdAndText(out string sldText, string docName, int index)
{
using (PresentationDocument ppt = PresentationDocument.Open(docName, false))
{
// Get the relationship ID of the first slide.
PresentationPart? part = ppt.PresentationPart;
OpenXmlElementList slideIds = part?.Presentation?.SlideIdList?.ChildElements ?? default;
if (part is null || slideIds.Count == 0)
{
sldText = "";
return;
}
string? relId = ((SlideId)slideIds[index]).RelationshipId;
if (relId is null)
{
sldText = "";
return;
}
// Get the slide part from the relationship ID.
SlidePart slide = (SlidePart)part.GetPartById(relId);
// Build a StringBuilder object.
StringBuilder paragraphText = new StringBuilder();
// Get the inner text of the slide:
IEnumerable<A.Text> texts = slide.Slide.Descendants<A.Text>();
foreach (A.Text text in texts)
{
paragraphText.Append(text.Text);
}
sldText = paragraphText.ToString();
}
}