Get all the text in a slide in a presentation

01/21/2025

This topic shows how to use the classes in the Open XML SDK for Office to programmatically get all the text in a slide of a presentation.

Getting a PresentationDocument object

In the Open XML SDK, the PresentationDocument class represents a presentation document package. To work with a presentation document, first create an instance of the PresentationDocument class, and then work with that instance. To create the class instance from a document, call the Open method that takes a file path and, as its second parameter, a Boolean value that specifies whether the document is editable. To open a document for read/write access, set this parameter to true; for read-only access, set it to false, as shown in the following using statement. In this code, the presentationFile parameter is a string that represents the path of the file from which you want to open the document.

C#
Visual Basic

// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))

' Open the presentation as read-only.
Using presentationDocument As PresentationDocument = PresentationDocument.Open(presentationFile, False)

In version 3.0.0 and later, the Close() method has been removed in favor of relying on the using statement. The using statement ensures that the Dispose() method is called automatically when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case presentationDocument.
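
The disposal order described above can be observed with a small stand-in type. This is only a sketch: TrackedResource and UsingScopeDemo are hypothetical names that are not part of the Open XML SDK, and the class merely records when Dispose() runs, in the same way that PresentationDocument would be disposed at the end of its using block.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for PresentationDocument: it only logs its lifetime.
public sealed class TrackedResource : IDisposable
{
    private readonly List<string> _log;

    public TrackedResource(List<string> log)
    {
        _log = log;
        _log.Add("opened");
    }

    // Called automatically when the enclosing using block ends.
    public void Dispose() => _log.Add("disposed");
}

public static class UsingScopeDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();

        using (var resource = new TrackedResource(log))
        {
            log.Add("inside using block");
        } // Dispose() runs here, at the closing brace.

        log.Add("after using block");
        return log;
    }
}
```

Calling UsingScopeDemo.Run() yields the entries "opened", "inside using block", "disposed", "after using block", in that order, which shows that disposal happens before any code that follows the block.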

Basic presentation document structure

The basic structure of a PresentationML document consists of a number of parts, among which is the main part that contains the presentation definition. The following text from the ISO/IEC 29500 specification introduces the overall form of a PresentationML package.

A PresentationML package's main part starts with a presentation root element. That element contains a presentation, which, in turn, refers to a slide list, a slide master list, a notes master list, and a handout master list. The slide list refers to all of the slides in the presentation; the slide master list refers to all of the slide masters used in the presentation; the notes master contains information about the formatting of notes pages; and the handout master describes how a handout looks.

A handout is a printed set of slides that can be provided to an audience.

As well as text and graphics, each slide can contain comments and notes, can have a layout, and can be part of one or more custom presentations. A comment is an annotation intended for the person maintaining the presentation slide deck. A note is a reminder or piece of text intended for the presenter or the audience.

Other features that a PresentationML document can include are animation, audio, video, and transitions between slides.

A PresentationML document is not stored as one large body in a single part. Instead, the elements that implement certain groupings of functionality are stored in separate parts. For example, all authors in a document are stored in one authors part, while each slide has its own part.

ISO/IEC 29500: 2016

The following XML code example represents a presentation that contains two slides, identified by the IDs 267 and 256.

    <p:presentation xmlns:p="…" … > 
       <p:sldMasterIdLst>
          <p:sldMasterId
             xmlns:rel="https://…/relationships" rel:id="rId1"/>
       </p:sldMasterIdLst>
       <p:notesMasterIdLst>
          <p:notesMasterId
             xmlns:rel="https://…/relationships" rel:id="rId4"/>
       </p:notesMasterIdLst>
       <p:handoutMasterIdLst>
          <p:handoutMasterId
             xmlns:rel="https://…/relationships" rel:id="rId5"/>
       </p:handoutMasterIdLst>
       <p:sldIdLst>
          <p:sldId id="267"
             xmlns:rel="https://…/relationships" rel:id="rId2"/>
          <p:sldId id="256"
             xmlns:rel="https://…/relationships" rel:id="rId3"/>
       </p:sldIdLst>
       <p:sldSz cx="9144000" cy="6858000"/>
       <p:notesSz cx="6858000" cy="9144000"/>
    </p:presentation>
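
To make the link between a slide's position in the slide list and its relationship ID concrete, the following sketch reads the slide entries with System.Xml.Linq rather than the Open XML SDK. It is an illustration under stated assumptions: SlideIdListSketch, SampleXml, and ReadSlideIds are hypothetical names, the sample markup is trimmed to the slide list only, and the namespace URIs are the standard PresentationML and relationships URIs written out in full, whereas the listing above elides them.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SlideIdListSketch
{
    // Standard PresentationML and relationships namespace URIs.
    private static readonly XNamespace P = "http://schemas.openxmlformats.org/presentationml/2006/main";
    private static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Illustrative markup only; a real presentation part carries more elements.
    public const string SampleXml =
        "<p:presentation xmlns:p=\"http://schemas.openxmlformats.org/presentationml/2006/main\" " +
        "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">" +
        "<p:sldIdLst>" +
        "<p:sldId id=\"267\" r:id=\"rId2\"/>" +
        "<p:sldId id=\"256\" r:id=\"rId3\"/>" +
        "</p:sldIdLst>" +
        "</p:presentation>";

    // Returns (slide ID, relationship ID) pairs in presentation order.
    public static List<(string Id, string RelationshipId)> ReadSlideIds(string presentationXml)
    {
        XElement root = XElement.Parse(presentationXml);

        return root.Elements(P + "sldIdLst")
            .Elements(P + "sldId")
            .Select(e => (
                Id: (string?)e.Attribute("id") ?? string.Empty,
                RelationshipId: (string?)e.Attribute(R + "id") ?? string.Empty))
            .ToList();
    }
}
```

For SampleXml, ReadSlideIds returns two entries: index 0 maps to slide ID 267 with relationship ID rId2, and index 1 maps to slide ID 256 with relationship ID rId3. The relationship ID is what identifies the slide part within the package.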

Using the Open XML SDK, you can create document structure and content by using strongly typed classes that correspond to PresentationML elements. You can find these classes in the DocumentFormat.OpenXml.Presentation namespace. The following table lists the class names that correspond to the sld, sldLayout, sldMaster, and notesMaster elements.

PresentationML element	Open XML SDK class	Description
`<sld/>`	Slide	Presentation slide. It is the root element of SlidePart.
`<sldLayout/>`	SlideLayout	Slide layout. It is the root element of SlideLayoutPart.
`<sldMaster/>`	SlideMaster	Slide master. It is the root element of SlideMasterPart.
`<notesMaster/>`	NotesMaster	Notes master (or handoutMaster). It is the root element of NotesMasterPart.

How the sample code works

The sample code consists of three overloads of the GetAllTextInSlide method. In the following segment, the first overload opens the source presentation that contains the slide whose text you want to get, and passes the presentation to the second overload, which gets the slide part. The first overload then returns the array of strings that the second overload returns to it; each string represents a paragraph of text in the specified slide.

C#
Visual Basic

// Get all the text in a slide.
public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns. 
        return GetAllTextInSlide(presentationDocument, slideIndex);
    }
}

' Get all the text in a slide.
Public Shared Function GetAllTextInSlide(presentationFile As String, slideIndex As Integer) As String()
    ' Open the presentation as read-only.
    Using presentationDocument As PresentationDocument = PresentationDocument.Open(presentationFile, False)
        ' Pass the presentation and the slide index
        ' to the next GetAllTextInSlide method, and
        ' then return the array of strings it returns. 
        Return GetAllTextInSlide(presentationDocument, slideIndex)
    End Using
End Function

The second overload takes the presentation document passed in and gets the slide part to pass to the third overload. It returns to the first overload the array of strings that the third overload returns to it; each string represents a paragraph of text in the specified slide.

C#
Visual Basic

static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the slide index is not out of range.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException(nameof(slideIndex));
    }

    // Get the presentation part of the presentation document.
    PresentationPart? presentationPart = presentationDocument.PresentationPart;

    // Verify that the presentation part and presentation exist.
    if (presentationPart is not null && presentationPart.Presentation is not null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;

        // Verify that the slide ID list exists.
        if (presentation.SlideIdList is not null)
        {
            // Get the collection of slide IDs from the slide ID list.
            DocumentFormat.OpenXml.OpenXmlElementList slideIds = presentation.SlideIdList.ChildElements;

            // If the slide ID is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string? slidePartRelationshipId = ((SlideId)slideIds[slideIndex]).RelationshipId;

                if (slidePartRelationshipId is null)
                {
                    return [];
                }

                // Get the specified slide part from the relationship ID.
                SlidePart slidePart = (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);

                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlide(slidePart);
            }
        }
    }

    // Otherwise, return an empty array.
    return [];
}

Private Shared Function GetAllTextInSlide(presentationDocument As PresentationDocument, slideIndex As Integer) As String()
    ' Verify that the slide index is not out of range.
    If slideIndex < 0 Then
        Throw New ArgumentOutOfRangeException(NameOf(slideIndex))
    End If

    ' Get the presentation part of the presentation document.
    Dim presentationPart As PresentationPart = presentationDocument.PresentationPart

    ' Verify that the presentation part and presentation exist.
    If presentationPart IsNot Nothing AndAlso presentationPart.Presentation IsNot Nothing Then
        ' Get the Presentation object from the presentation part.
        Dim presentation As Presentation = presentationPart.Presentation

        ' Verify that the slide ID list exists.
        If presentation.SlideIdList IsNot Nothing Then
            ' Get the collection of slide IDs from the slide ID list.
            Dim slideIds As DocumentFormat.OpenXml.OpenXmlElementList = presentation.SlideIdList.ChildElements

            ' If the slide ID is in range...
            If slideIndex < slideIds.Count Then
                ' Get the relationship ID of the slide.
                Dim slidePartRelationshipId As String = CType(slideIds(slideIndex), SlideId).RelationshipId

                If slidePartRelationshipId Is Nothing Then
                    Return Array.Empty(Of String)()
                End If

                ' Get the specified slide part from the relationship ID.
                Dim slidePart As SlidePart = CType(presentationPart.GetPartById(slidePartRelationshipId), SlidePart)

                ' Pass the slide part to the next method, and
                ' then return the array of strings that method
                ' returns to the previous method.
                Return GetAllTextInSlide(slidePart)
            End If
        End If
    End If

    ' Else, return an empty array.
    Return Array.Empty(Of String)()
End Function

The following code segment shows the third overload, which takes the slide part passed in and returns to the second overload an array of strings, each of which represents a paragraph of text. It starts by verifying that the slide part passed in exists, and then creates a linked list of strings. It iterates through the paragraphs in the slide and, using a StringBuilder object to concatenate all the text elements in a paragraph, adds each non-empty paragraph to the linked list as a string. It then returns to the second overload an array of strings that represents all the text in the specified slide of the presentation.

C#
Visual Basic

static string[] GetAllTextInSlide(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart is null)
    {
        throw new ArgumentNullException(nameof(slidePart));
    }

    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();

    // If the slide exists...
    if (slidePart.Slide is not null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (DocumentFormat.OpenXml.Drawing.Paragraph paragraph in
            slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.                    
            StringBuilder paragraphText = new StringBuilder();

            // Iterate through the text elements (a:t) of the paragraph.
            foreach (DocumentFormat.OpenXml.Drawing.Text text in
                paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each text element to the text gathered so far.
                paragraphText.Append(text.Text);
            }

            if (paragraphText.Length > 0)
            {
                // Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }

    // Return an array of strings.
    return texts.ToArray();
}

Private Shared Function GetAllTextInSlide(slidePart As SlidePart) As String()
    ' Verify that the slide part exists.
    If slidePart Is Nothing Then
        Throw New ArgumentNullException(NameOf(slidePart))
    End If

    ' Create a new linked list of strings.
    Dim texts As New LinkedList(Of String)()

    ' If the slide exists...
    If slidePart.Slide IsNot Nothing Then
        ' Iterate through all the paragraphs in the slide.
        For Each paragraph As DocumentFormat.OpenXml.Drawing.Paragraph In slidePart.Slide.Descendants(Of DocumentFormat.OpenXml.Drawing.Paragraph)()
            ' Create a new string builder.                    
            Dim paragraphText As New StringBuilder()

            ' Iterate through the text elements (a:t) of the paragraph.
            For Each text As DocumentFormat.OpenXml.Drawing.Text In paragraph.Descendants(Of DocumentFormat.OpenXml.Drawing.Text)()
                ' Append each text element to the text gathered so far.
                paragraphText.Append(text.Text)
            Next

            If paragraphText.Length > 0 Then
                ' Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString())
            End If
        Next
    End If

    ' Return an array of strings.
    Return texts.ToArray()
End Function
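
The concatenation step can also be tried without a presentation file. The sketch below applies the same idea (one string per paragraph, built from the paragraph's text elements, with empty paragraphs skipped) to a raw XML string by using System.Xml.Linq instead of the Open XML SDK. ParagraphTextSketch, SampleXml, and GetParagraphTexts are hypothetical names, and the sample markup is deliberately simplified: a real slide nests its paragraphs inside shapes and text bodies.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

public static class ParagraphTextSketch
{
    // Standard DrawingML namespace URI, which defines a:p and a:t.
    private static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Simplified, illustrative slide markup: two runs, an empty paragraph, one run.
    public const string SampleXml =
        "<p:sld xmlns:p=\"http://schemas.openxmlformats.org/presentationml/2006/main\" " +
        "xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\">" +
        "<a:p><a:r><a:t>Hello, </a:t></a:r><a:r><a:t>world</a:t></a:r></a:p>" +
        "<a:p/>" +
        "<a:p><a:r><a:t>Second paragraph</a:t></a:r></a:p>" +
        "</p:sld>";

    public static string[] GetParagraphTexts(string slideXml)
    {
        XElement root = XElement.Parse(slideXml);
        var texts = new List<string>();

        foreach (XElement paragraph in root.Descendants(A + "p"))
        {
            // Join the text of every a:t element in this paragraph.
            var paragraphText = new StringBuilder();
            foreach (XElement text in paragraph.Descendants(A + "t"))
            {
                paragraphText.Append(text.Value);
            }

            // Skip paragraphs that contain no text.
            if (paragraphText.Length > 0)
            {
                texts.Add(paragraphText.ToString());
            }
        }

        return texts.ToArray();
    }
}
```

For SampleXml, GetParagraphTexts returns two strings, "Hello, world" and "Second paragraph". The first paragraph's two runs are merged into one string and the empty paragraph is dropped, which mirrors what the third overload does with the SDK's typed classes.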

Sample code

This section contains the complete sample code that you can use to get all the text in a specific slide of a presentation file. For example, you can use the following foreach loop in your program to go through the array of strings returned by the GetAllTextInSlide method, which represents the text in the slide at index slideIndex of the presentation file located at filePath. Note that the C# snippet passes slideIndex through int.Parse, so there it is a string (for example, a command-line argument).

C#
Visual Basic

foreach (string text in GetAllTextInSlide(filePath, int.Parse(slideIndex)))
{
    Console.WriteLine(text);
}

For Each text As String In TextInSlide.GetAllTextInSlide(filePath, slideIndex)
    Console.WriteLine(text)
Next
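
Because int.Parse throws on malformed input, a caller that takes the index from user input may prefer to validate it first. The following is a minimal sketch, not part of the original sample: SlideIndexArgument and TryParseSlideIndex are hypothetical names, and the helper assumes a zero-based index, which matches how the sample indexes into the slide ID list.

```csharp
using System;

public static class SlideIndexArgument
{
    // Hypothetical helper: converts user input to a zero-based slide index.
    // Returns false for non-numeric input and for negative numbers.
    public static bool TryParseSlideIndex(string? input, out int slideIndex)
    {
        // int.TryParse reports failure instead of throwing on malformed input.
        return int.TryParse(input, out slideIndex) && slideIndex >= 0;
    }
}
```

A caller could check the result of TryParseSlideIndex before invoking GetAllTextInSlide and report a usage error when it returns false.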

The following is the complete sample code in both C# and Visual Basic.

C#
Visual Basic

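using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Get all the text in a slide.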
public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns. 
        return GetAllTextInSlide(presentationDocument, slideIndex);
    }
}
static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the slide index is not out of range.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException(nameof(slideIndex));
    }

    // Get the presentation part of the presentation document.
    PresentationPart? presentationPart = presentationDocument.PresentationPart;

    // Verify that the presentation part and presentation exist.
    if (presentationPart is not null && presentationPart.Presentation is not null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;

        // Verify that the slide ID list exists.
        if (presentation.SlideIdList is not null)
        {
            // Get the collection of slide IDs from the slide ID list.
            DocumentFormat.OpenXml.OpenXmlElementList slideIds = presentation.SlideIdList.ChildElements;

            // If the slide ID is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string? slidePartRelationshipId = ((SlideId)slideIds[slideIndex]).RelationshipId;

                if (slidePartRelationshipId is null)
                {
                    return [];
                }

                // Get the specified slide part from the relationship ID.
                SlidePart slidePart = (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);

                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlide(slidePart);
            }
        }
    }

    // Otherwise, return an empty array.
    return [];
}
static string[] GetAllTextInSlide(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart is null)
    {
        throw new ArgumentNullException(nameof(slidePart));
    }

    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();

    // If the slide exists...
    if (slidePart.Slide is not null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (DocumentFormat.OpenXml.Drawing.Paragraph paragraph in
            slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.                    
            StringBuilder paragraphText = new StringBuilder();

            // Iterate through the text elements (a:t) of the paragraph.
            foreach (DocumentFormat.OpenXml.Drawing.Text text in
                paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each text element to the text gathered so far.
                paragraphText.Append(text.Text);
            }

            if (paragraphText.Length > 0)
            {
                // Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }

    // Return an array of strings.
    return texts.ToArray();
}

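Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Presentation
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

' Get all the text in a slide.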
Public Shared Function GetAllTextInSlide(presentationFile As String, slideIndex As Integer) As String()
    ' Open the presentation as read-only.
    Using presentationDocument As PresentationDocument = PresentationDocument.Open(presentationFile, False)
        ' Pass the presentation and the slide index
        ' to the next GetAllTextInSlide method, and
        ' then return the array of strings it returns. 
        Return GetAllTextInSlide(presentationDocument, slideIndex)
    End Using
End Function
Private Shared Function GetAllTextInSlide(presentationDocument As PresentationDocument, slideIndex As Integer) As String()
    ' Verify that the slide index is not out of range.
    If slideIndex < 0 Then
        Throw New ArgumentOutOfRangeException(NameOf(slideIndex))
    End If

    ' Get the presentation part of the presentation document.
    Dim presentationPart As PresentationPart = presentationDocument.PresentationPart

    ' Verify that the presentation part and presentation exist.
    If presentationPart IsNot Nothing AndAlso presentationPart.Presentation IsNot Nothing Then
        ' Get the Presentation object from the presentation part.
        Dim presentation As Presentation = presentationPart.Presentation

        ' Verify that the slide ID list exists.
        If presentation.SlideIdList IsNot Nothing Then
            ' Get the collection of slide IDs from the slide ID list.
            Dim slideIds As DocumentFormat.OpenXml.OpenXmlElementList = presentation.SlideIdList.ChildElements

            ' If the slide ID is in range...
            If slideIndex < slideIds.Count Then
                ' Get the relationship ID of the slide.
                Dim slidePartRelationshipId As String = CType(slideIds(slideIndex), SlideId).RelationshipId

                If slidePartRelationshipId Is Nothing Then
                    Return Array.Empty(Of String)()
                End If

                ' Get the specified slide part from the relationship ID.
                Dim slidePart As SlidePart = CType(presentationPart.GetPartById(slidePartRelationshipId), SlidePart)

                ' Pass the slide part to the next method, and
                ' then return the array of strings that method
                ' returns to the previous method.
                Return GetAllTextInSlide(slidePart)
            End If
        End If
    End If

    ' Else, return an empty array.
    Return Array.Empty(Of String)()
End Function
Private Shared Function GetAllTextInSlide(slidePart As SlidePart) As String()
    ' Verify that the slide part exists.
    If slidePart Is Nothing Then
        Throw New ArgumentNullException(NameOf(slidePart))
    End If

    ' Create a new linked list of strings.
    Dim texts As New LinkedList(Of String)()

    ' If the slide exists...
    If slidePart.Slide IsNot Nothing Then
        ' Iterate through all the paragraphs in the slide.
        For Each paragraph As DocumentFormat.OpenXml.Drawing.Paragraph In slidePart.Slide.Descendants(Of DocumentFormat.OpenXml.Drawing.Paragraph)()
            ' Create a new string builder.                    
            Dim paragraphText As New StringBuilder()

            ' Iterate through the text elements (a:t) of the paragraph.
            For Each text As DocumentFormat.OpenXml.Drawing.Text In paragraph.Descendants(Of DocumentFormat.OpenXml.Drawing.Text)()
                ' Append each text element to the text gathered so far.
                paragraphText.Append(text.Text)
            Next

            If paragraphText.Length > 0 Then
                ' Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString())
            End If
        Next
    End If

    ' Return an array of strings.
    Return texts.ToArray()
End Function

See also

Open XML SDK class library reference
