Beginning with the PowerPoint Document Stream

This blog expands on my previous blog, “Parsing Pictures in a PowerPoint binary file,” which details the Pictures Stream and how you might parse it to extract the pictures contained in a PowerPoint document. Here I’ll extend the concepts of that blog to parsing the “PowerPoint Document” stream.

As you read through the following, you’ll notice that you could simply use the Pictures Stream, as shown previously, instead of the PowerPoint Document stream (and its OfficeArtDggContainer) to accomplish the same thing. In the strictest sense, yes, this is “almost” another way of doing the same thing. However, if you are building a generalized parser for PowerPoint binary files, rather than a one-off tool for enumerating or extracting pictures, you will go beyond the Pictures Stream and implement a larger portion of the binary specification. This is an evolutionary step in that direction.

Both the previous blog and this one show how to parse Office binary files manually, with a hex editor and with code that uses IStorage to browse the streams. At the end of this blog I’ll introduce a new tool from the Microsoft Security Response Center (MSRC) team that makes analyzing Office binary files much, much easier.

The binary file specification is, of course, still the PowerPoint specification (MS-PPT), which you can use to follow along. However, as before, many of the parsing details derive from structure definitions in MS-ODRAW.

Let’s get started…

The PowerPoint Document Stream is defined in MS-PPT section 2.1.2:

A required stream whose name MUST be "PowerPoint Document".

The contents of this stream are specified by a sequence of top-level records.

Let a top-level record be specified as any one of the following: DocumentContainer, MasterOrSlideContainer, HandoutContainer, SlideContainer, NotesContainer, ExOleObjStg, ExControlStg, VbaProjectStg, PersistDirectoryAtom, or UserEditAtom record.
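To make the idea of a flat sequence of top-level records concrete, here is a minimal Python sketch that walks the stream one record at a time. It assumes the stream bytes are already in memory and relies only on the 8-byte record header that every record starts with: a 4-bit recVer, a 12-bit recInstance, a 16-bit recType, and a 32-bit recLen, all little-endian. The RT_ values in the table are a partial list taken from MS-PPT's RecordType enumeration. The function names are mine, and the sample stream is synthetic.

```python
import struct
from typing import Iterator, NamedTuple

# Partial list of MS-PPT RecordType values for records that can appear at the top level.
TOP_LEVEL_NAMES = {
    0x03E8: "DocumentContainer (RT_Document)",
    0x03EE: "SlideContainer (RT_Slide)",
    0x03F0: "NotesContainer (RT_Notes)",
    0x03F8: "MainMasterContainer (RT_MainMaster)",
    0x0FF5: "UserEditAtom (RT_UserEditAtom)",
    0x1772: "PersistDirectoryAtom (RT_PersistDirectoryAtom)",
}


class RecordHeader(NamedTuple):
    offset: int        # where the 8-byte header starts in the stream
    rec_ver: int       # low 4 bits of the first 16-bit word (0xF marks a container)
    rec_instance: int  # high 12 bits of the first 16-bit word
    rec_type: int
    rec_len: int       # number of bytes of record data after the header


def read_header(data: bytes, offset: int) -> RecordHeader:
    ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", data, offset)
    return RecordHeader(offset, ver_inst & 0x000F, ver_inst >> 4, rec_type, rec_len)


def walk_top_level(stream: bytes) -> Iterator[RecordHeader]:
    """Yield the header of each top-level record, skipping over record bodies."""
    offset = 0
    while offset + 8 <= len(stream):
        rh = read_header(stream, offset)
        yield rh
        offset += 8 + rh.rec_len  # the next sibling starts right after this body


# Synthetic stream: an empty DocumentContainer, then an RT_UserEditAtom
# record with a 4-byte placeholder body (real UserEditAtoms are larger).
sample = (struct.pack("<HHI", 0x000F, 0x03E8, 0)
          + struct.pack("<HHI", 0x0000, 0x0FF5, 4) + bytes(4))
for rh in walk_top_level(sample):
    print(f"{rh.offset:#06x}", TOP_LEVEL_NAMES.get(rh.rec_type, hex(rh.rec_type)))
```

Skipping by recLen means the walker never needs to understand a record's body. That is what lets a partial parser step over record types it does not implement yet.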

You can build on what we did previously by reusing the same code snippet with the stream name replaced by “PowerPoint Document,” and/or by following the steps defined in MS-PPT section 2.1.2 to parse the “PowerPoint Document” stream. Those steps take you to the DocumentContainer (section 2.4.1) and from there to the DrawingGroupContainer (section 2.4.3) and its OfficeArtDgg field. OfficeArtDgg is defined in MS-ODRAW section 2.2.12, where it is called OfficeArtDggContainer: “This record specifies the container for all OfficeArt file records containing document-wide data.” The OfficeArt record types are defined in MS-ODRAW section 2.2.
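As a rough shortcut, the Python sketch below descends from the top-level records to the OfficeArtDggContainer by matching recType values one nesting level at a time:

- RT_Document (0x03E8) for the DocumentContainer
- RT_DrawingGroup (0x040B) for the DrawingGroupContainer
- 0xF000 for the OfficeArtDggContainer

It only looks inside records whose recVer is 0xF, which are containers.

This is a simplification that rests on an assumption: it takes the first DocumentContainer it finds. The steps in MS-PPT section 2.1.2 instead locate the current DocumentContainer through the Current User stream, the UserEditAtom, and the persist directory. The difference matters for incrementally saved files, which can hold more than one DocumentContainer. The function name `find_path` is hypothetical.

```python
import struct
from typing import Optional, Sequence, Tuple

RT_DOCUMENT = 0x03E8               # DocumentContainer
RT_DRAWING_GROUP = 0x040B          # DrawingGroupContainer
OFFICE_ART_DGG_CONTAINER = 0xF000  # OfficeArtDggContainer (MS-ODRAW)


def find_path(data: bytes, path: Sequence[int], start: int = 0,
              end: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Follow `path` (one recType per nesting level) and return the
    (header offset, recLen) of the last record in it, or None."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", data, offset)
        if rec_type == path[0]:
            if len(path) == 1:
                return offset, rec_len
            if (ver_inst & 0x000F) == 0xF:  # only containers (recVer 0xF) have children
                body_start = offset + 8
                found = find_path(data, path[1:], body_start, body_start + rec_len)
                if found is not None:
                    return found
        offset += 8 + rec_len  # move on to the next sibling at this level
    return None
```

If you happen to use the third-party olefile package, `olefile.OleFileIO("deck.ppt").openstream("PowerPoint Document").read()` would supply the stream bytes. Here "deck.ppt" is a placeholder file name. Calling `find_path(stream, [RT_DOCUMENT, RT_DRAWING_GROUP, OFFICE_ART_DGG_CONTAINER])` would then return the offset of the OfficeArtDggContainer header.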

This is where we’ll start: the OfficeArtDggContainer structure. In the dump below, the fields appear in the same sequence as the definitions that follow it:

73A0h: 0F 00 00 F0 30 01 00 00 00 00 06 F0 78 00 00 00 ...ð0......ðx...
73B0h: 01 34 00 00 0E 00 00 00 15 00 00 00 0D 00 00 00 .4..............
73C0h: 01 00 00 00 07 00 00 00 0D 00 00 00 05 00 00 00 ................
73D0h: 0C 00 00 00 01 00 00 00 0B 00 00 00 01 00 00 00 ................
73E0h: 0A 00 00 00 01 00 00 00 09 00 00 00 01 00 00 00 ................
73F0h: 08 00 00 00 01 00 00 00 07 00 00 00 01 00 00 00 ................
7400h: 06 00 00 00 01 00 00 00 05 00 00 00 01 00 00 00 ................
7410h: 04 00 00 00 01 00 00 00 03 00 00 00 01 00 00 00 ................
7420h: 02 00 00 00 01 00 00 00 2F 00 01 F0 58 00 00 00 ......../..ðX...
7430h: 52 00 07 F0 24 00 00 00 05 05 B1 A6 19 08 D8 C3 R..ð$.....±¦..ØÃ
7440h: 0B 6F B5 6C A3 98 8C 9E F4 65 FF 00 B6 03 00 00 .oµl£˜Œžôeÿ.¶...
7450h: 01 00 00 00 00 00 00 00 00 00 00 00 32 00 07 F0 ............2..ð
7460h: 24 00 00 00 03 04 27 CF 5A 3B 2E DC 9E 2E 16 A1 $.....'ÏZ;.Üž..¡
7470h: A1 59 52 1B 76 E9 FF 00 40 6D 00 00 01 00 00 00 ¡YR.véÿ.@m......
7480h: B6 03 00 00 00 00 00 00 ¶.......
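You can decode the record headers in this dump by hand, but a few lines of Python make the bit-splitting explicit. This sketch copies five headers out of the dump above, keyed by the offset where each starts. It splits the first 16-bit word into recVer (low 4 bits) and recInstance (high 12 bits), following MS-ODRAW's OfficeArtRecordHeader layout. The comments naming each record are my reading of the recType values.

```python
import struct


def decode_header(raw: bytes) -> dict:
    """Split an 8-byte OfficeArtRecordHeader into its four fields."""
    ver_inst, rec_type, rec_len = struct.unpack("<HHI", raw)
    return {
        "recVer": ver_inst & 0x000F,   # low 4 bits
        "recInstance": ver_inst >> 4,  # high 12 bits
        "recType": rec_type,
        "recLen": rec_len,
    }


# Headers copied from the hex dump, keyed by the offset where each starts.
DUMP_HEADERS = {
    0x73A0: "0F 00 00 F0 30 01 00 00",  # OfficeArtDggContainer
    0x73A8: "00 00 06 F0 78 00 00 00",  # OfficeArtFDGGBlock
    0x7428: "2F 00 01 F0 58 00 00 00",  # OfficeArtBStoreContainer
    0x7430: "52 00 07 F0 24 00 00 00",  # OfficeArtFBSE (JPEG)
    0x745C: "32 00 07 F0 24 00 00 00",  # OfficeArtFBSE (WMF)
}

for offset, hex_text in DUMP_HEADERS.items():
    h = decode_header(bytes.fromhex(hex_text))
    print(f"{offset:04X}h: recVer=0x{h['recVer']:X} recInstance={h['recInstance']} "
          f"recType=0x{h['recType']:04X} recLen={h['recLen']}")
```

Decoded this way, the container at 73A0h has recLen 0x130 (304 bytes). That is more than the 224 bytes of its body shown here, so the dump stops partway through the container. The OfficeArtBStoreContainer at 7428h has recInstance 2, and the two OfficeArtFBSE headers carry recInstance 5 (JPEG) and 3 (WMF).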


2.2.12 OfficeArtDggContainer

2.2.48 OfficeArtFDGGBlock

Referenced by: OfficeArtDggContainer
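The OfficeArtFDGGBlock body in the dump runs from 73B0h to 7427h. Its field names are spidMax, cidcl, cspSaved, cdgSaved, and an Rgidcl array of OfficeArtIDCL (dgid, cspidCur) pairs. I'm taking those names from MS-ODRAW's OfficeArtFDGG and OfficeArtIDCL definitions, which this blog doesn't reproduce, so treat this Python sketch as a reading aid rather than a complete implementation. The one layout rule it depends on is that Rgidcl holds cidcl - 1 entries of 8 bytes each.

```python
import struct
from typing import List, NamedTuple, Tuple


class FDGG(NamedTuple):
    spidMax: int                   # current maximum shape identifier used in any drawing
    cidcl: int                     # number of OfficeArtIDCL entries, plus 1
    cspSaved: int                  # total number of shapes saved in all drawings
    cdgSaved: int                  # total number of drawings saved in the file
    rgidcl: List[Tuple[int, int]]  # OfficeArtIDCL entries as (dgid, cspidCur)


def parse_fdgg_block(body: bytes) -> FDGG:
    """Parse the data that follows the 8-byte OfficeArtFDGGBlock header."""
    spid_max, cidcl, csp_saved, cdg_saved = struct.unpack_from("<4I", body, 0)
    # Rgidcl holds cidcl - 1 OfficeArtIDCL entries of 8 bytes each, after the 16-byte head.
    rgidcl = [struct.unpack_from("<2I", body, 16 + 8 * i) for i in range(cidcl - 1)]
    return FDGG(spid_max, cidcl, csp_saved, cdg_saved, rgidcl)


# The 120 (0x78) bytes from 73B0h through 7427h of the dump.
FDGG_BODY = bytes.fromhex(
    "01 34 00 00 0E 00 00 00 15 00 00 00 0D 00 00 00 "
    "01 00 00 00 07 00 00 00 0D 00 00 00 05 00 00 00 "
    "0C 00 00 00 01 00 00 00 0B 00 00 00 01 00 00 00 "
    "0A 00 00 00 01 00 00 00 09 00 00 00 01 00 00 00 "
    "08 00 00 00 01 00 00 00 07 00 00 00 01 00 00 00 "
    "06 00 00 00 01 00 00 00 05 00 00 00 01 00 00 00 "
    "04 00 00 00 01 00 00 00 03 00 00 00 01 00 00 00 "
    "02 00 00 00 01 00 00 00"
)

f = parse_fdgg_block(FDGG_BODY)
print(hex(f.spidMax), f.cidcl, f.cspSaved, f.cdgSaved, f.rgidcl[:3])
```

For this document that gives:

- spidMax 0x3401
- cidcl 14, so 13 OfficeArtIDCL entries
- cspSaved 21
- cdgSaved 13

The sizes check out: 16 + 13 * 8 = 120 bytes, which matches the 0x78 recLen in the block's header.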

2.2.20 OfficeArtBStoreContainer

Referenced by: OfficeArtDggContainer

This record specifies the container for all BLIPs used in all drawings associated with the parent OfficeArtDggContainer record.
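In MS-ODRAW, the recInstance of an OfficeArtBStoreContainer header gives the number of file blocks it contains, and in the dump that value is 2. The Python sketch below iterates the container's children and cross-checks that count. The function name is hypothetical, and the sample bytes are the container from 7428h through 7487h.

```python
import struct
from typing import Iterator, Tuple


def iter_file_blocks(container: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (recInstance, recType, offset, body) for each child record of an
    OfficeArtBStoreContainer; `container` must start at the container's header."""
    ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", container, 0)
    if rec_type != 0xF001:
        raise ValueError("not an OfficeArtBStoreContainer")
    expected = ver_inst >> 4  # recInstance: number of contained file blocks
    offset, end, seen = 8, 8 + rec_len, 0
    while offset + 8 <= end:
        child_vi, child_type, child_len = struct.unpack_from("<HHI", container, offset)
        yield child_vi >> 4, child_type, offset, container[offset + 8:offset + 8 + child_len]
        offset += 8 + child_len
        seen += 1
    if seen != expected:
        raise ValueError(f"recInstance says {expected} file blocks, found {seen}")


# The OfficeArtBStoreContainer from 7428h through 7487h of the dump.
BSTORE = bytes.fromhex(
    "2F 00 01 F0 58 00 00 00 "
    "52 00 07 F0 24 00 00 00 05 05 B1 A6 19 08 D8 C3 "
    "0B 6F B5 6C A3 98 8C 9E F4 65 FF 00 B6 03 00 00 "
    "01 00 00 00 00 00 00 00 00 00 00 00 32 00 07 F0 "
    "24 00 00 00 03 04 27 CF 5A 3B 2E DC 9E 2E 16 A1 "
    "A1 59 52 1B 76 E9 FF 00 40 6D 00 00 01 00 00 00 "
    "B6 03 00 00 00 00 00 00"
)

for instance, rtype, off, body in iter_file_blocks(BSTORE):
    print(f"+{off}: recType=0x{rtype:04X} recInstance={instance} {len(body)} body bytes")
```

Both children have recType 0xF007, so both are OfficeArtFBSE records. Each has a 36-byte body, and their record instances (5 and 3) are the BLIP types.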

2.2.22 OfficeArtBStoreContainerFileBlock

Referenced by: OfficeArtBStoreContainer, OfficeArtBStoreDelay, OfficeArtInlineSpContainer

2.2.32 OfficeArtFBSE

Referenced by: OfficeArtBStoreContainerFileBlock

This record specifies a File BLIP Store Entry (FBSE) that contains information about the BLIP.

The rh field is an OfficeArtRecordHeader with the following values:

rh.recVer: MUST be 0x2.

rh.recInstance: MUST be the BLIP type.

   ** See MS-ODRAW section 2.4.1, MSOBLIPTYPE:

   msoblipJPEG: 0x05 JPEG format.

   msoblipWMF: 0x03 WMF format.

rh.recType: MUST be 0xF007.

rh.recLen: An unsigned integer that specifies the number of bytes following the header. MUST be the size of nameData in bytes plus 36 if the BLIP is not embedded in this record, or the size of nameData plus size plus 36 if the BLIP is embedded.

btWin32 (1 byte): An MSOBLIPTYPE enumeration value that specifies the BLIP type.

In the example above, btWin32 is 0x05 (msoblipJPEG, JPEG format) in the first entry and 0x03 (msoblipWMF, WMF format) in the second; see MS-ODRAW section 2.4.1, MSOBLIPTYPE.

rgbUid (16 bytes): An MD4 digest, as specified in [RFC1320], that specifies the unique identifier of the pixel data in the BLIP.

size (4 bytes): An unsigned integer that specifies the size of the BLIP in bytes in the stream. In the example, size is 0x000003B6 (950 bytes) for the JPEG and 0x00006D40 (27,968 bytes) for the WMF. ** You will notice these are the same JPEG and WMF pictures used in the previous blog with the Pictures Stream, since this is the same document.

cRef (4 bytes): An unsigned integer that specifies the number of references to the BLIP. A value of 0x00000000 specifies an empty slot in the OfficeArtBStoreContainer.

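foDelay (4 bytes): An MSOFO data type that specifies the file offset into the associated OfficeArtBStoreDelay (delay stream). A value of 0xFFFFFFFF specifies that the file is not in the delay stream and cRef MUST be 0x00000000. In the example, foDelay is 0x00000000 for the JPEG and 0x000003B6 (950) for the WMF. In a PowerPoint file the delay stream is the Pictures Stream, so the WMF begins immediately after the 950-byte JPEG.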
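Putting the field definitions together, the Python sketch below parses one OfficeArtFBSE starting at its header. Besides the fields described above, it unpacks btMacOS, tag, the unused padding bytes, and the cbName/nameData pair, following the full OfficeArtFBSE layout in MS-ODRAW. It also checks recLen against the two cases in the rh.recLen rule.

A few caveats:

- The MSOBLIPTYPE table is only a partial list.
- The class and function names are mine.
- The sample bytes are the two FBSE records from the dump.

```python
import struct
from typing import NamedTuple

# Partial MSOBLIPTYPE table (MS-ODRAW section 2.4.1).
MSOBLIPTYPE = {0x02: "EMF", 0x03: "WMF", 0x04: "PICT", 0x05: "JPEG",
               0x06: "PNG", 0x07: "DIB", 0x11: "TIFF"}


class FBSE(NamedTuple):
    rec_instance: int  # BLIP type repeated in the record header
    bt_win32: int      # MSOBLIPTYPE
    bt_mac_os: int     # MSOBLIPTYPE
    rgb_uid: bytes     # 16-byte MD4 digest of the pixel data
    tag: int
    size: int          # size of the BLIP in bytes in the stream
    c_ref: int         # number of references; 0 means an empty slot
    fo_delay: int      # offset into the delay stream; 0xFFFFFFFF = not in it
    cb_name: int       # length of nameData in bytes
    name_data: bytes


def parse_fbse(record: bytes) -> FBSE:
    """Parse an OfficeArtFBSE starting at its 8-byte record header."""
    ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", record, 0)
    if rec_type != 0xF007 or (ver_inst & 0x000F) != 0x2:
        raise ValueError("not an OfficeArtFBSE record")
    # 36 fixed bytes: btWin32, btMacOS, rgbUid, tag, size, cRef, foDelay,
    # unused1, cbName, unused2, unused3.
    (bt_win32, bt_mac_os, uid, tag, size, c_ref, fo_delay,
     _unused1, cb_name, _unused2, _unused3) = struct.unpack_from("<BB16sHIIIBBBB", record, 8)
    name_data = record[44:44 + cb_name]
    # recLen is cbName + 36, or cbName + size + 36 when the BLIP is embedded.
    if rec_len not in (cb_name + 36, cb_name + size + 36):
        raise ValueError(f"unexpected recLen {rec_len}")
    return FBSE(ver_inst >> 4, bt_win32, bt_mac_os, uid, tag, size,
                c_ref, fo_delay, cb_name, name_data)


# The two OfficeArtFBSE records from the dump (7430h and 745Ch).
JPEG_FBSE = bytes.fromhex(
    "52 00 07 F0 24 00 00 00 05 05 B1 A6 19 08 D8 C3 0B 6F B5 6C A3 98 8C 9E "
    "F4 65 FF 00 B6 03 00 00 01 00 00 00 00 00 00 00 00 00 00 00")
WMF_FBSE = bytes.fromhex(
    "32 00 07 F0 24 00 00 00 03 04 27 CF 5A 3B 2E DC 9E 2E 16 A1 A1 59 52 1B "
    "76 E9 FF 00 40 6D 00 00 01 00 00 00 B6 03 00 00 00 00 00 00")

for raw in (JPEG_FBSE, WMF_FBSE):
    e = parse_fbse(raw)
    print(MSOBLIPTYPE.get(e.bt_win32, hex(e.bt_win32)), e.size, e.c_ref,
          hex(e.fo_delay), e.rgb_uid.hex())
```

Both entries pass the recLen check with cbName 0 and no embedded BLIP (recLen 36). One detail the hex shows that the field list above does not mention: the WMF entry's btMacOS is 0x04 (msoblipPICT), while its btWin32 is 0x03.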

2.2.21 OfficeArtBStoreDelay

This record specifies the delay loaded container of BLIPs in the host application. There is no OfficeArtRecordHeader for this container.


rgfb (variable): An array of OfficeArtBStoreContainerFileBlock records that specifies BLIP data. The array continues while the rh.recType field of the OfficeArtBStoreContainerFileBlock record is equal to 0xF007 or between 0xF018 and 0xF117, inclusive.
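Since OfficeArtBStoreDelay has no header of its own, walking it means reading record headers back to back until one falls outside the rgfb range. Here is a minimal Python sketch, assuming the delay stream (in a PowerPoint file, the Pictures Stream) is already in memory. The sample is synthetic: I built it so its first block is 950 bytes long, which mirrors this document's two foDelay values (0 for the JPEG, 0x3B6 for the WMF). The helper names are hypothetical.

```python
import struct
from typing import Iterator, Tuple


def is_file_block_type(rec_type: int) -> bool:
    """rgfb continues while recType is 0xF007 (an FBSE) or in 0xF018..0xF117 (a BLIP)."""
    return rec_type == 0xF007 or 0xF018 <= rec_type <= 0xF117


def walk_delay_stream(delay: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, recType, total bytes including header) for each file block."""
    offset = 0
    while offset + 8 <= len(delay):
        _, rec_type, rec_len = struct.unpack_from("<HHI", delay, offset)
        if not is_file_block_type(rec_type):
            break  # the array ends at the first record of any other type
        yield offset, rec_type, 8 + rec_len
        offset += 8 + rec_len


def block_for_fbse(delay: bytes, fo_delay: int, size: int) -> bytes:
    """Slice out the file block an FBSE points at (foDelay = start, size = length)."""
    if fo_delay == 0xFFFFFFFF:
        raise ValueError("this BLIP is not in the delay stream")
    return delay[fo_delay:fo_delay + size]


# Synthetic delay stream: a 950-byte block, a 24-byte block, then an unrelated record.
demo = (struct.pack("<HHI", 0x0000, 0xF01D, 942) + bytes(942)
        + struct.pack("<HHI", 0x0000, 0xF01B, 16) + bytes(16)
        + struct.pack("<HHI", 0x0000, 0x1234, 0))
print(list(walk_delay_stream(demo)))
```

Stopping at the first out-of-range recType follows the rgfb definition quoted above. A stricter parser might also cross-check each FBSE's foDelay against the offsets this walk yields.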

 

Having once again detailed how to browse these Office binary files manually with a hex editor, I’ll turn to the tool I mentioned at the beginning of this blog, which makes this exercise a lot easier. They’ve done all the parsing for you! And the application includes file Defragmentation and Repair tools!

Introducing the Microsoft Security Response Center team’s OffVis!

If you followed along with a hex editor and IStorage up to this point, I doubt you need much help with OffVis, since it is a drill-down GUI. Everything we parsed with the hex editor above you can do in OffVis. Simply drill down on the objects in the hierarchical view, and the hex view of the data appears on the left side of the window. You can also double-click in the hex view, and OffVis automatically moves your position to the matching record in the hierarchical view. To get started, pick your file from the menu ribbon (File -> Open), pick your parser from the list box (in this case PowerPoint97_2003BinaryFormat), click Parse, and then drill down on the structures.

You can find more information about OffVis, including the download link, on the MSRC OffVis Blog site.

MSRC even provides a training video for OffVis. Although the tool is great for understanding the Office binary file formats, its original purpose, and the focus of the training video, is researching potential Office binary file exploits.

I hope this helps advance your understanding as you investigate the Office binary file formats.

Stay tuned for more…