Extracting Microsoft Office Application Properties without automation

Every file created by a Microsoft Office application supports a set of built-in document properties. In addition, you can add your own custom properties to an Office document, either manually or through code. You can use document properties to create, maintain, and track information about an Office document, such as when it was created, who the author is, where it is stored, and so on. To get or set these properties, you can use Office automation.

Take a look at the following links for samples:

https://support.microsoft.com/default.aspx?scid=KB;EN-US;Q303296&

https://msdn2.microsoft.com/en-us/library/4e0tda25.aspx

But what if you are working with a Web-based application and want to avoid running automation on a Web server?

I found a nice workaround for extracting Office document properties without automation: Dsofile, an in-process ActiveX component that lets you read and edit the OLE document properties associated with Microsoft Office files, such as the following:
• Microsoft Excel workbooks
• Microsoft PowerPoint presentations
• Microsoft Word documents
• Microsoft Project projects
• Microsoft Visio drawings
• Other OLE structured storage files

You can do this even on a machine where those Office products are not installed.
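Dsofile reads the property sets stored in OLE structured storage (compound document) files, the container format of classic .doc, .xls, and .ppt files. When an uploaded file does not behave as expected, it can help to check which container you actually received. The sketch below is a hypothetical helper (`OfficeFileKind` and `OfficeContainer` are illustrative names, not part of Dsofile or the .NET Framework) that looks only at the first bytes of a file: OLE compound files begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, while ZIP-based packages such as the Office 2007 .docx format begin with "PK" followed by 0x03 0x04.

```csharp
using System;
using System.IO;

// Hypothetical classification of a file's container format (not a Dsofile type).
public enum OfficeContainer
{
    Unknown,
    OleCompoundFile, // e.g. .doc, .xls, .ppt (OLE structured storage)
    ZipPackage       // e.g. .docx, .xlsx, .pptx (Office Open XML)
}

// Hypothetical helper: inspects only the header bytes, it does not parse the file.
public static class OfficeFileKind
{
    // 8-byte signature at the start of every OLE compound file
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header signature: "PK\x03\x04"
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static OfficeContainer Detect(byte[] header)
    {
        if (StartsWith(header, OleSignature)) return OfficeContainer.OleCompoundFile;
        if (StartsWith(header, ZipSignature)) return OfficeContainer.ZipPackage;
        return OfficeContainer.Unknown;
    }

    public static OfficeContainer DetectFile(string path)
    {
        byte[] header = new byte[8];
        int read = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            // Stream.Read may return fewer bytes than requested, so loop until 8 bytes or EOF
            while (read < header.Length)
            {
                int n = stream.Read(header, read, header.Length - read);
                if (n == 0) break;
                read += n;
            }
        }

        // Only compare the bytes that were actually read
        byte[] actual = new byte[read];
        Array.Copy(header, actual, read);
        return Detect(actual);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

For example, calling `OfficeFileKind.DetectFile(serverTempFilePath)` right after the upload is saved would let you decide how to treat the file before handing it to Dsofile. The check is only a heuristic: a matching header does not guarantee that the rest of the file is intact.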

If you are working with a managed application, follow these steps:

  1. Download and install the DSO File control.

  2. Add a reference to Interop.DSOFile.dll to your managed Web application.

  3. Create a new Web form and paste the following code into it.
    <%@ Page Language="C#" %>

    <script runat="server">
        protected void btnLoadFile_Click(object sender, EventArgs e)
        {
            // Nothing to do if the user did not choose a file
            if (!FileUpload1.HasFile)
                return;

            // Define a path to save the file on the server
            string serverTempFilePath = Server.MapPath(@"/yourpath/" + FileUpload1.FileName);
            FileUpload1.PostedFile.SaveAs(serverTempFilePath);

            // Create the DSOFile document and open the file read-only
            DSOFile.OleDocumentPropertiesClass oleDocument = new DSOFile.OleDocumentPropertiesClass();
            oleDocument.Open(serverTempFilePath,
                true,
                DSOFile.dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);

            try
            {
                // Extract the properties
                DSOFile.SummaryProperties summaryProperties = oleDocument.SummaryProperties;
                tbTitle.Text = summaryProperties.Title;
                tbAuthors.Text = summaryProperties.Author;
                tbCompany.Text = summaryProperties.Company;
                tbNumPages.Text = summaryProperties.PageCount.ToString();
                tbWordCount.Text = summaryProperties.WordCount.ToString();
            }
            finally
            {
                // Always close the document, without saving changes
                oleDocument.Close(false);
            }
        }
    </script>

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
        <title>DSOFileDemo</title>
    </head>
    <body>
        <form id="form1" runat="server">
            <div>
                <strong>DSOFileDemo</strong><br />
                <br />
                <table border="1">
                    <tr>
                        <td valign="top">File upload:</td>
                        <td>
                            <asp:FileUpload ID="FileUpload1" runat="server" />
                            <asp:Button ID="btnLoadFile" runat="server" OnClick="btnLoadFile_Click" Text="Load File Properties" /><br />
                        </td>
                    </tr>
                    <tr>
                        <td>Title:</td>
                        <td>
                            <asp:TextBox ID="tbTitle" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                    <tr>
                        <td>Author:</td>
                        <td>
                            <asp:TextBox ID="tbAuthors" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                    <tr>
                        <td>Company:</td>
                        <td>
                            <asp:TextBox ID="tbCompany" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                    <tr>
                        <td>Number of Pages:</td>
                        <td>
                            <asp:TextBox ID="tbNumPages" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                    <tr>
                        <td>Word count:</td>
                        <td>
                            <asp:TextBox ID="tbWordCount" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                </table>
            </div>
        </form>
    </body>
    </html>

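  4. Run the Web form, choose an Office file, and click Load File Properties. The title, author, company, number of pages, and word count of the uploaded file appear in the text boxes.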

You can also extract custom properties using the DSOFile control.

Have a peek and enjoy!

Comments

  • Anonymous
    January 05, 2006
    G'day,

    Just wondering if you had any luck setting or extracting OLE properties for PDF or even Outlook message files?

    cheers
    Bill

  • Anonymous
    January 06, 2006
    Hi Bill,

    I have only tried the DSOFile control with Office files. However, you can always try extracting generic file properties using the System.IO.FileInfo class:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemiofileinfoclasstopic.asp

  • Anonymous
    February 13, 2006
    It is possible to change (custom) properties of other files (like PDF, TXT, BMP), BUT when these files are compressed or burned to CD, these properties are lost.

    This is not the case for MS-Office files.
    WHY???

  • Anonymous
    March 02, 2006
    Hi,

    How do I extract document properties from PDF files in C#? The FileInfo class does not give details like author, keywords, comments, and the other properties I am interested in.

    Regards
    Namrata

  • Anonymous
    March 03, 2006

    Looks like the Microsoft DSOfile DLL v2.0 (09 Feb 06) is buggy: I can update file summary fields only if they have been set manually before (in particular the "title" field). Otherwise, I get a stupid "permission is denied" error message, even though I'm running locally with admin rights.

    The DLL's VB6 and .NET demos crash the same way if these fields were not set manually before!

    This is pretty annoying. I've been looking for an explanation on the web for hours but couldn't find any. Microsoft should care more about the quality of its code.


  • Anonymous
    May 04, 2006
    Just a little comment to clarify the scope of the DSOfile DLL. The Microsoft Developer Support OLE File Property Reader 2.0 is a code sample that demonstrates how to use the OLE IPropertyStorage interface to read and write the document properties of OLE files, such as the properties of native Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Microsoft Publisher, and Microsoft Visio files.

    The sample was not intended to work with PDF files...

  • Anonymous
    May 05, 2006
    Thanks for your help Erika, but I had the same problems with Word files: I couldn't update the "title" and "category" fields if they had not been set manually before.

    But I found the solution, and here it is:
    http://www.codecomments.com/message813451.html

    All you have to do is debug Dsofile yourself. Just follow the instructions given in the link; it's not very difficult, and it works fine, even for PDF files.

  • Anonymous
    May 11, 2006
    Emeric and everyone,

    I am sorry that the DSOFile control has problems setting fields (title and summary) that were not set manually before.

    I loved this control because it was great for extracting properties, and I shared it with the community with the best of intentions. I am sure some people will still find it useful.

    I also want to share that Ken Getz just wrote a new column on how to extract document properties using Office 2007.

    http://msdn.microsoft.com/msdnmag/issues/06/06/AdvancedBasics/default.aspx

    You will see it's quite interesting, and I love the fact that the new file formats offer better ways to extract and update document properties and remove the need to use automation or the DSOFile control.

    Extracting and writing document properties contained in an XML document is very simple; I am sure everyone will be delighted with this new option.



  • Anonymous
    June 22, 2006
    If you want to access files on a remote machine and you are using impersonation, you will need to set AspCompat="true" on the page that accesses the file. Take a look at this KB for more information: http://support.microsoft.com/kb/325791

    -Sunith Nair

  • Anonymous
    July 18, 2006
    What is the best way to get PDF properties? Can this be used?

  • Anonymous
    September 27, 2006
    I got it working; I had to fix the security settings for COM applications.

  • Anonymous
    November 15, 2006
    Since I've received a few emails on how I fixed my issue, I'll drop the link to my solution here --> http://forums.asp.net/thread/1409599.aspx

  • Anonymous
    December 01, 2006
    Please forgive me, as this may be a very simple fix. I am very new to programming but have been spending many late nights on this. I keep getting the following error: Compiler Error Message: CS0246: The type or namespace name 'DSOFile' could not be found (are you missing a using directive or an assembly reference?) I do have the reference, and the DLL is registered both on my local machine and on the server. I even tried to Import Namespace="DSOFile", "Interop.DSOFile", and "InteropDSOFile", and none of that worked. Please help. Thanks, James

  • Anonymous
    December 05, 2006
    Maybe a simple thought, but if there's a fix for this issue, why doesn't MS release a new version of DSOFile.dll with the fix included? Seems logical to me...

  • Anonymous
    December 22, 2006
    So how can I extract, let's say, a document's subject from a .docx (Word 2007) file? DSOfile doesn't seem to work with .docx files. Thanks, Roy

  • Anonymous
    December 23, 2006
    Part 2: I have studied Erika's letter and Ken Getz' article, but am still confounded as to how I can, within a VBA project, extract, let's say, the 'subject' of an unopened Word 2007 document. (I can do it with an open document without any problem. It's the unopened ones that give me a problem.) Given how simple 2007 is supposed to make investigating the various parts of a docx document, it would seem that it should be an easier process. (Ken's article is way beyond a VBA person such as myself.) Is there a simple 'replacement' for dsofile in 2007? --Roy

  • Anonymous
    January 11, 2007
    Sorry, my English isn't very good, but I'll try. Does anybody use dsofile.dll under Windows XP x64? Does it work? I have some problems with it and I want to fix them. Rif

  • Anonymous
    January 18, 2007
    Has anyone tried using DSO for files on a mapped drive or a remote network location? I would be obliged if someone could explain how it works... Thanks, -Liesha.

  • Anonymous
    February 26, 2007
    Hello, I am trying to get the digital signature with DSO File. It is enough for me to find out whether an Office file is signed or not (I don't need to verify the signature or retrieve it...). Is this possible? Regards, Alex

  • Anonymous
    February 27, 2007
    Hello, another question: with DSOFile 1.4 it was possible to find out whether a macro was attached to an Office file. With DSOFile 2.0 I have not found this option. Am I doing something wrong? Regards, Alex

  • Anonymous
    February 27, 2007
    Hi Erika, I have a question. I need to add summary properties to files (xls, pdf, txt, ...) from C#. But after my program has run and I right-click the file, I can no longer change the file's summary properties manually. When I open the file with DSOFile, I get the error: "A lock violation has occurred. (Exception from HRESULT: 0x80030021 (STG_E_LOCKVIOLATION))" Thank you, Pachara

  • Anonymous
    February 27, 2007
    Hi again: does anyone have a full list of the document properties and their types? Then I can extend the DSOFile source... Cheers, Alex

  • Anonymous
    February 27, 2007
    Hi! I wrote a C# program to edit the summary properties of files, but it can't open .txt files. Code example:

        String filePath = @"C:\myfile.txt";
        myDSOOleDocument.Open(filePath, false, DSOFile.dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);
        myDSOOleDocument.SummaryProperties.Keywords = txtKeywords.Text;
        myDSOOleDocument.Save();

  • Anonymous
    March 09, 2007
    Hello. I'm afraid the sample doesn't work for me either. It doesn't throw any exception when I simply open a file, but it can't write anything, neither with the sample application nor with my own C# programme. I get an access denied error every time I try, and the programme crashes. The file I tried to use is a simple txt file I created using C#. When using the OleDocumentPropertiesClass to open a file in my C# code, I get an exception saying that the file has no OLE storage. ???

  • Anonymous
    March 13, 2007
    Hello! Could you show me your sample code? Thank you very much.

  • Anonymous
    March 29, 2007
    Hi, I don't have a C compiler to rebuild DSOFILE.dll as described by Emeric in May. Can anyone e-mail me the patched DSOFILE.DLL at tjg001@tpg.com.au? Regards ..... Trevor G

  • Anonymous
    May 16, 2007
    Hi, the DSO document properties do not match the document properties shown in the application's File->Properties window. Please check it. I think this is a bug in the DSO DLL. Thanks, Amit

  • Anonymous
    July 07, 2007
    Column handlers in XP do not look at the same property set fields as in Windows 2K. I have an application that allows you to update property fields like author and comments for a variety of file types, including PDFs. The information is properly displayed for all file types in Windows 2K. In Windows XP, the author and comments fields are not displayed in Windows Explorer for multimedia files (jpg, gif, etc.), although if you reinvoke my shell extension the values are there. Is there any documentation on other property sets, and on which property sets are referenced by the different column handlers in Explorer? Thanks, Tom

  • Anonymous
    July 18, 2007
    Hi, I have a problem with DSOFile 2.0. I am trying to determine whether a macro is attached to my document using DSOFile. How can this be done? Regards, Sandeep

  • Anonymous
    July 19, 2007
    Hi Erika, I have a problem with DSOFile 2.0. I am trying to determine whether a macro is attached to my document using DSOFile. How can this problem be solved? There is no direct property in DSOFile 2.0 that identifies a macro in my document. Looking forward to your reply; it's urgent. Does anyone else know the solution to this? Regards, Sandeep

  • Anonymous
    August 06, 2007
    I installed the latest version of dsofile and found that using the FilePropDemoVB7 program and merely looking at a file adds alternate data streams (ADS) to the file being looked at, even while it shows you there are no extended properties. At least it did for me with .txt, .rtf, and .htm files. Has anyone else experienced this? Hopefully dsofile will be fixed not to do this. This is a bug in either the demo program or dsofile, is it not? Thanks, Dave

  • Anonymous
    August 06, 2007
    Also, is there any way to get the Vista Explorer to show the new properties? I added a "description" to a .txt file and turned on "description" in the Explorer view for that folder, but did not see the description data I entered. Dave

  • Anonymous
    August 16, 2007
    Hi, I have used DSOFile.dll v2.1 to read and write the custom properties of documents. When I open the custom properties tab by right-clicking the file, no properties are visible, even though they are present in the file. The properties are visible if I open the document and view them from the File menu. Why can't I see them by right-clicking the file itself? What's more, this happens with only a few documents. If I re-open the file using the following code snippet and close it, right-clicking shows the same custom properties. But this solution doesn't help me in my project.

        If strWindowsFilePath.Substring(strWindowsFilePath.IndexOf(".") + 1, 3) = "doc" Then
            Dim oWord As Word.Application
            Dim oDoc As Word.Document
            Dim oBuiltInProps As Object
            Dim oCustomProps As Object
            Dim oProp As Object
            'Create Word Application
            oWord = CreateObject("Word.Application")
            'Open the document
            oDoc = oWord.Documents.Open(strWindowsFilePath)
            'Get custom properties collection
            oCustomProps = oDoc.CustomDocumentProperties
            'This will let Word know that the document should be saved
            oWord.ActiveDocument.Saved = False
            'Save changes in the document
            oWord.ActiveDocument.Save()
            'Quit Word
            oWord.Quit(savechanges:=True)
        End If

    Your help is greatly appreciated!!!

  • Anonymous
    October 03, 2007
    Thanks for this information, this was very useful to me, because I didn't really want to use Word Interop to simply set and get some document properties. Using DSO is a much cleaner and leaner solution.

  • Anonymous
    October 11, 2007
    Hi Pachara, the error that you (and probably others facing problems with Office 2007) are getting might be due to the registration of the DLL. If you change the location of dsofile.dll after extracting it, you need to register it using regsvr32 [File path]\dsofile.dll (in Windows -> Run). Hope this helps.

  • Anonymous
    October 31, 2007
    Replying to my previous issue: I have resolved it. I was not using the correct file uploader, so I just changed it to the Mediachase.FileUploader and it worked. Yeppie doooooooooooooooo! Happy programming

  • Anonymous
    April 07, 2008
    Hi, I tried to extract text from Excel files. After opening the Excel files, some get extracted but others do not. I am using VS2005 with C# and am new to the automation process.

  • Anonymous
    May 14, 2008
    I am using dso.dll (ver 2.1) to read OLE file properties. But we did not use the COM DSO DLL. Actually we rewrote our own general DLL (not COM). Using our own DLL, we were able to read the OLE properties of doc files, but we had problems reading docx properties. Here is the code snippet:

        extern "C" TZDSOFILE_API IStorage* GetIStorage(char *filename)
        {
            //Translate filename to Unicode.
            WCHAR wcFilename[1024];
            setlocale( LC_ALL, "" );
            int i = mbstowcs(wcFilename, filename, strlen(filename));
            setlocale( LC_ALL, "C" );
            wcFilename[i] = 0;
            IStorage *pStorage = NULL;
            HRESULT hr;
            // Open the document as an OLE compound document.
            hr = ::StgOpenStorageEx(wcFilename, NULL, STGM_READ | STGM_SHARE_EXCLUSIVE, NULL, 0, &pStorage);
            if(SUCCEEDED(hr))
                return pStorage;
            else
                return NULL;
        }

    For doc files, this function successfully returns an IStorage, but for docx files it returns null. Should we use a different approach for docx files? Any idea... Highly appreciated... Thank you

  • Anonymous
    May 15, 2008
    Sorry, my previous function was wrong. Here is the correct one:

        extern "C" TZDSOFILE_API IStorage* GetIStorage(char *filename)
        {
            //Translate filename to Unicode.
            WCHAR wcFilename[1024];
            setlocale( LC_ALL, "" );
            int i = mbstowcs(wcFilename, filename, strlen(filename));
            setlocale( LC_ALL, "C" );
            wcFilename[i] = 0;
            IStorage *pStorage = NULL;
            HRESULT hr;
            // Open the document as an OLE compound document.
            hr = ::StgOpenStorage(wcFilename, NULL, STGM_READ | STGM_SHARE_EXCLUSIVE, NULL, 0, &pStorage);
            if(SUCCEEDED(hr))
                return pStorage;
            else
                return NULL;
        }

  • Anonymous
    May 30, 2008
    Erika, I have been trying to query the custom properties of a Word document using DSOFile, but I cannot get it to work correctly. Do you have any example C# code that would allow me to query and edit these custom properties? Cheers, Intel96

  • Anonymous
    July 03, 2008
    Hello, I'm coding in C#. Is it possible to programmatically hide a custom property (in Word, Excel, or PowerPoint files) from the user, or make it read-only? Currently the user can see the custom property by right-clicking the file or by opening the file in Word and viewing its properties. Regards, Urvish

  • Anonymous
    July 22, 2008
    Hi, I am using dsofile.dll to retrieve the page count property of doc, xls, and ppt documents. It returns a default page count of 1 for all Word documents and 0 for all Excel documents. I need the page count for Word documents and the sheet count for Excel documents. Sample code:

        int noofPages;
        DSOFile.OleDocumentPropertiesClass objDocument = new DSOFile.OleDocumentPropertiesClass();
        objDocument.Open(uploadedFile, true, dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);
        noofPages = objDocument.SummaryProperties.PageCount;

    Please let me know how I can get the page count for the above documents. Thanks and regards, Ganesh

  • Anonymous
    October 09, 2008
    Has anyone tried this on Vista? It does not seem to work on Vista. Please suggest...

  • Anonymous
    May 20, 2009
    Hi there, I am trying to read the file content, i.e. the text inside a Word document, to save it into a database; however, it returns an error saying: document.Template = null. Following is my code:

        // dsofile operation to read word docs
        DSOFile.OleDocumentPropertiesClass mDoc = null;
        DSOFile.SummaryProperties docContent;
        //  DSOFile.CustomProperties dcContent;
        mDoc = new DSOFile.OleDocumentPropertiesClass();
        mDoc.Open(filePath, true, DSOFile.dsoFileOpenOptions.dsoOptionDefault);
        docContent = mDoc.SummaryProperties;
        strMsg = Convert.ToString(docContent.Template);
        mDoc.Close(false);
        strMsg = strMsg.Replace("\r", "<br>");
        strMsg = strMsg.Replace("\n", "<br>");

  • Anonymous
    September 28, 2009
    I am able to set the properties in a given file. The problem is that if I send that file via email to anybody, the properties, as well as the custom properties I have written, are lost. My requirement is that one should be able to set the properties programmatically, and they should then persist across email exchanges. How do we handle this?

  • Anonymous
    January 11, 2010
    Context: a Visual Basic 6 program to modify the custom properties of a Word document using dsofile.dll as described at http://support.microsoft.com/kb/224351

    Symptom: attempting to add properties to a document that initially has no custom properties fails when calling dsofile.OleDocumentProperties.Save. Yes, I mean Save, not Add!

    My assertion of a bug: dsofile.dll treats files differently depending on whether they have zero custom properties or at least one custom property present when the program runs.

    Narrative: my program opens a Word document with

        Private m_oTargetDocProps As dsofile.OleDocumentProperties
        m_oTargetDocProps.Open filename, False, dsoOptionDontAutoCreate

    (Reminder: the file just opened has no custom properties.) It attempts to add properties with

        m_oTargetDocProps.CustomProperties.Add propertyName, propertyValue

    This call to Add does NOT cause an error. Inspecting the object named m_oTargetDocProps with the Visual Basic object browser shows custom properties present (because the program added them). It then attempts to save the properties into the file with

        m_oTargetDocProps.Save

    Save fails with Error -2147217147 (&H80041105): The command is not available because document was opened in read-only mode.

    Code workaround: I know of no workaround in code, unless one is inclined to build a modified dsofile.dll as suggested at http://www.codecomments.com/message813451.html

    Manual workaround: use Word to add a custom property, the first and only custom property in the document. I just named the property "dummy" with a value of "dummy text".

    Expedient suggestion: have a program trap the situation just described and ask the human to use Word to add a dummy custom property.

    A related mystery: ironically, if the above problem is avoided (because there was at least one custom property in the file) and if, according to Windows Explorer, the file is read-only, dsofile.dll is happy to ignore the file system's read-only attribute and modify the custom properties! How does that work? Isn't the file system supposed to prevent writing to a file with the read-only flag set?

    Search words: bug dsofile.dll version 2.1 modifies document properties even though Windows Explorer says file is read-only. dsofile.OleDocumentProperties.save fails with read only error even though file is writable empty dsofile.CustomProperty list

    Author can be reached by reversing the letters of my name: ttoillej at validatedsoftware dot com

  • Anonymous
    March 19, 2010
    It may be my imagination, but I just upgraded from Windows XP to Windows 7, and dsofile.dll fails to even display, much less change, the extended properties of JPG files. I had a great utility to change batches of photo properties with 'Subject' and 'Comments' to document them, and now it is broken. Tell me again why we use Microsoft at all?
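Several comments above ask how to read a property such as the subject from an Office 2007 file (.docx) without opening it through automation. As the May 11, 2006 reply notes, the new formats keep document properties as XML: a .docx, .xlsx, or .pptx file is a ZIP package, and its core properties (title, subject, creator, keywords, and so on) conventionally live in the part docProps/core.xml. The following C# sketch is a hypothetical helper (`OpenXmlCoreProperties` is an illustrative name, not an Office or Dsofile API) that reads one of those values with System.IO.Compression and LINQ to XML, which requires .NET Framework 4.5 or later. It assumes the core properties part sits at that conventional path instead of resolving it through the package relationships in _rels/.rels.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not an Office or Dsofile API): reads a core property from an
// Office Open XML package (.docx, .xlsx, .pptx) without automation.
public static class OpenXmlCoreProperties
{
    // Returns the text of a core property such as "title", "subject" or "creator",
    // or null when the package has no docProps/core.xml or the property is absent.
    public static string Read(Stream package, string propertyName)
    {
        // leaveOpen = true so the caller keeps ownership of the stream
        using (ZipArchive zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            // Assumption: the core properties part is at its conventional location.
            // A stricter reader would resolve it through _rels/.rels instead.
            ZipArchiveEntry entry = zip.GetEntry("docProps/core.xml");
            if (entry == null)
                return null;

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);

                // The properties are direct children of <cp:coreProperties> and use
                // several namespaces (dc, cp, dcterms), so match on the local name only.
                XElement element = doc.Root.Elements()
                    .FirstOrDefault(e => e.Name.LocalName == propertyName);
                return element == null ? null : element.Value;
            }
        }
    }
}
```

To read a file on disk, open it with File.OpenRead inside a using block and pass the stream, for example `OpenXmlCoreProperties.Read(stream, "subject")`. Matching on the local name keeps the sketch short; custom properties are stored in a separate part (conventionally docProps/custom.xml) and are not covered here.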