

Binary to Open XML (B2X) Translator: Interoperability for the Office binary file formats

[05/18 update: this translator was highlighted at today's Document Interoperability Initiative (DII) event in London.]

In support of Microsoft’s ongoing efforts to increase the interoperability of its various technologies, we have partnered with Dialogika to create a translator that converts the Microsoft Office binary file formats (.DOC, .XLS, and .PPT) into the Office Open XML standard format (.DOCX, .XLSX, .PPTX).

A majority of the world’s documents exist in the binary Office formats (.DOC, .PPT, and .XLS). For developers working with these formats, Microsoft published the specifications under the Open Specification Promise (OSP) in June 2008.


A new version of the Binary to Open XML (B2X) Translator has just been released; this version adds support for PowerPoint (.PPT) and Excel (.XLS) files:

Supported .XLS Features

  • Shared Formulas
  • String Formatting
  • Data Type Formatting (number, date, currency, etc.)
  • Cell Formatting

Supported .PPT Features

  • Textbox Formatting
  • Shapes
  • Animations
  • Notes (including Formatting)

(For a detailed feature mapping, see http://b2xtranslator.sourceforge.net/architecture.html#mapping )

From an architectural point of view, the translator can be seen as a series of pipelines in which transformation steps are applied to translate the binary format into Open XML.


(More details are available at http://b2xtranslator.sourceforge.net/architecture.html )
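The pipeline idea can be illustrated with a minimal Python sketch. It is not the translator's actual code (B2X is a .NET project); the `Doc` class, the stage names, and the toy input format are hypothetical, chosen only to show how parsing, mapping, and packaging steps can be chained so that each stage consumes the previous stage's output.

```python
import io
import zipfile
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Dict, List
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


@dataclass
class Doc:
    """Hypothetical state object handed from one pipeline stage to the next."""
    raw: bytes
    records: List[str] = field(default_factory=list)
    parts: Dict[str, str] = field(default_factory=dict)
    archive: bytes = b""


Stage = Callable[[Doc], Doc]


def parse_binary(doc: Doc) -> Doc:
    # ASSUMPTION: a toy "binary" format of NUL-separated UTF-8 text records,
    # standing in for the real record-based .DOC structures.
    doc.records = [r.decode("utf-8") for r in doc.raw.split(b"\x00") if r]
    return doc


def map_to_xml(doc: Doc) -> Doc:
    # Map each parsed record to a WordprocessingML paragraph (<w:p>).
    paras = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in doc.records
    )
    doc.parts["word/document.xml"] = (
        f'<w:document xmlns:w="{W_NS}"><w:body>{paras}</w:body></w:document>'
    )
    return doc


def package_zip(doc: Doc) -> Doc:
    # Write the XML parts into a ZIP container. A valid .docx also needs
    # [Content_Types].xml and relationship parts, which are omitted here.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in doc.parts.items():
            zf.writestr(name, xml)
    doc.archive = buf.getvalue()
    return doc


def run_pipeline(doc: Doc, stages: List[Stage]) -> Doc:
    # Apply each transformation step in order, feeding each output to the next stage.
    return reduce(lambda d, stage: stage(d), stages, doc)


result = run_pipeline(Doc(b"Hello\x00A & B"), [parse_binary, map_to_xml, package_zip])
```

Keeping each stage a plain function makes it easy to test, replace, or insert a stage without touching the others, which is the main appeal of a pipeline design.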

While it has been possible to convert documents between formats manually, by opening each file in the relevant application and saving it in the other format, before the translator's release there was no software tool to automate this task as a stand-alone application or in batch mode.

From the end user's point of view, the translator therefore offers two options: converting files from the Windows context menu, or running a command-line utility.

While translating files from Windows’ context menus is self-explanatory (right-click, Convert to…), doing so from the command line warrants a bit more study. The command-line utility consists of three separate executables, one for each file type: doc2x.exe for documents, xls2x.exe for spreadsheets, and ppt2x.exe for presentations. The executables share the same command-line syntax and support the usual basic command-line options.
These include the input filename, the output filename, and the level of debug verbosity. The resulting command is easy to include in automation scripts and batch processes.
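As an illustration of scripting the utility, the following Python sketch converts every supported file in a folder by picking the executable that matches each file's extension. The `-o <output>` flag used to name the output file is an assumption, not confirmed here; check the utility's own usage text before relying on it. The helper names `build_command` and `convert_folder` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Executable for each binary format, as listed above.
CONVERTERS = {".doc": "doc2x.exe", ".xls": "xls2x.exe", ".ppt": "ppt2x.exe"}


def build_command(src: Path, install_dir: Path) -> Optional[List[str]]:
    """Build the command line for one file, or return None for unsupported types.

    ASSUMPTION: "-o <output>" names the output file; verify against the
    utility's own usage text before relying on it.
    """
    exe = CONVERTERS.get(src.suffix.lower())
    if exe is None:
        return None
    dst = src.with_suffix(src.suffix.lower() + "x")  # e.g. Report.DOC -> Report.docx
    return [str(install_dir / exe), "-o", str(dst), str(src)]


def convert_folder(folder: Path, install_dir: Path) -> List[Path]:
    """Convert every supported file in folder and return the ones that failed."""
    failed = []
    for src in sorted(folder.resolve().iterdir()):  # absolute paths, independent of cwd
        cmd = build_command(src, install_dir)
        if cmd is None or not src.is_file():
            continue
        # Run the executable in place from its installation directory rather than
        # copying the .exe elsewhere, so the DLLs it depends on are found.
        if subprocess.run(cmd, cwd=install_dir).returncode != 0:
            failed.append(src)
    return failed
```

For example, `convert_folder(Path(r"D:\docs"), Path(r"C:\Program Files (x86)\DIaLOGIKa\b2xtranslator"))` would attempt to convert every .doc, .xls, and .ppt file in the (hypothetical) D:\docs folder and return the files whose conversion exited with a non-zero status.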

The command-line architecture allows the translators to be integrated into existing systems such as document management systems running on a server.

Because the source of the B2X translator (ppt2x.exe, doc2x.exe, xls2x.exe) is available, you can rebuild the tools with the .NET Framework on Windows or Mono on Linux, which makes them portable across operating systems and platforms.

As an open source project, the translator is a solid foundation for engineering work around the Office binary formats. Dialogika’s development team has put together a few “how to” guides, including the Freeform Shapes in the Office Drawing Format guide, which help explain the specification and give some valuable tips. Developers and ISVs can reuse the translator's code in their own applications, enabling a wide range of document interoperability solutions.

We’re excited that this latest release makes the translators more functional and addresses practical document conversion scenarios. Of course, there’s still work ahead of us! We are currently planning the next version. In addition to the goals outlined above, it is very important to us that the translator adequately addresses practical user scenarios. To this end, we would love to hear your feedback on this release as well as your feature requests for the next version. Please provide your feedback on the SourceForge site.

Comments

  • Anonymous
    May 11, 2009
    PingBack from http://asp-net-hosting.simplynetdev.com/binary-to-open-xml-b2x-translator-interoperability-for-the-office-binary-file-formats/

  • Anonymous
    May 12, 2009
    I installed it on Windows 7 RC.  The doc2x program crashed as soon as it was started.

  • Anonymous
    May 13, 2009
    Hi Ian, We are not able to reproduce the problem on 32-bit Windows 7 RC (build 7100). Can you please share the details of your environment: the platform (32-bit vs. 64-bit), the Windows build, the B2X Translator build you are using, and the steps that lead to the error? Also, do you get an error message with the crash? Thanks, Sumit

  • Anonymous
    May 13, 2009
    I just tried it again.  It failed, but for a different reason. I am running 32-bit Windows 7 RC. The build of B2X translator is the one available just after your blog posting came out. Here is what I did:

    • Installed the software, with no errors.
    • Ran cmd as an administrator.
    • CD'd to a directory with some .doc files in it.
    • Tried the doc2x.exe command, but it wasn't in my PATH, so it wasn't found.
    • I didn't feel like updating the PATH variable, so instead I copied it from the installation directory to the directory with the .doc files.
    • Issued the command "doc2x.exe" with no parameters. The first time I tried it (yesterday), it just crashed with no messages. Today, I got the message "doc2x.exe is not recognized as an internal or external command, operable program or batch file." As a result, I checked its permissions, and everything seems OK.

  • Anonymous
    May 13, 2009
    Ian, thanks for the additional information. A couple of things to check:
  1. doc2x.exe has dependencies that won't be loaded if you copy just doc2x.exe. Can you please try running doc2x.exe from the directory it is installed in? The default location is "C:\Program Files (x86)\DIaLOGIKa\b2xtranslator".
  2. Can you also verify that you chose the b2xtranslator_setup_r478.msi package for installing the translator on x86 Win7 RC? Thanks, Sumit
  • Anonymous
    May 14, 2009
    The comment has been removed

  • Anonymous
    May 14, 2009
    Hi Ian, I am glad it worked for you! We'll look into the right-click selection problem. You may find it easier to create a batch script to convert multiple docs. If you run into further issues, please start a discussion on the project page on SourceForge. The URL is http://sourceforge.net/forum/forum.php?forum_id=781705 Thanks for using the B2X translator. We appreciate your feedback. Sumit

  • Anonymous
    May 29, 2009
    The comment has been removed

  • Anonymous
    June 03, 2009
    Hi Iain, How was the document created? Are you able to share the document with the project team for the sole purpose of reproducing the issue? Please submit your comments, and any files you can share, to the B2X Translator team on the project's forum at http://sourceforge.net/forum/forum.php?forum_id=781705 We appreciate your feedback. Thanks, Sumit

  • Anonymous
    August 15, 2009
    Microsoft has gone to great lengths to provide interoperability across its technologies. This conversion of Microsoft Office binary file formats into the Office Open XML standard format is yet another step. I'm comfortable with the default .doc, .xls and .ppt formats and don't need to take the additional step of adopting the open standard format.

  • Anonymous
    August 23, 2009
    The comment has been removed

  • Anonymous
    February 09, 2010
    Hi, I have a problem with .doc to .docx conversion on the amd64 architecture. My debug output:
      Welcome to doc2x (r649)
      Copyright (c) 2009, DIaLOGIKa. All rights reserved.
      02/10/2010 14:18:41 [W] Unexpected length of DOP (544 bytes) in input file.
      02/10/2010 14:18:41 [I] Converting file /tmp/Validation.doc into /tmp/Validation.docx
      02/10/2010 14:18:41 [E] Conversion of file /tmp/Validation.doc failed.
      02/10/2010 14:18:41 [D] System.DllNotFoundException: zlibwapi.dll
        at (wrapper managed-to-native) DIaLOGIKa.b2xtranslator.ZipUtils.ZipLib:zipOpen (string,int)
        at DIaLOGIKa.b2xtranslator.ZipUtils.ZlibZipWriter..ctor (System.String path) [0x00000]
        at (wrapper remoting-invoke-with-check) DIaLOGIKa.b2xtranslator.ZipUtils.ZlibZipWriter:.ctor (string)
        at DIaLOGIKa.b2xtranslator.ZipUtils.ZipFactory.CreateArchive (System.String path) [0x00000]
        at DIaLOGIKa.b2xtranslator.OpenXmlLib.OpenXmlWriter.Open (System.String fileName) [0x00000]
        at DIaLOGIKa.b2xtranslator.OpenXmlLib.OpenXmlPackage.Close () [0x00000]
        at DIaLOGIKa.b2xtranslator.OpenXmlLib.OpenXmlPackage.Dispose () [0x00000]
        at DIaLOGIKa.b2xtranslator.WordprocessingMLMapping.Converter.Convert (DIaLOGIKa.b2xtranslator.DocFileFormat.WordDocument doc, DIaLOGIKa.b2xtranslator.OpenXmlLib.WordprocessingML.WordprocessingDocument docx) [0x00000]
        at DIaLOGIKa.b2xtranslator.doc2x.Program.Main (System.String[] args) [0x00000]
    Please help me with this. Thanks, Manoj

  • Anonymous
    June 11, 2012
    Dear team, I am currently converting my files with excelcnv.exe, ppcnvcom.exe, and wordcnv.exe from the FileConversionPack. Your solution is easier to use (context-menu integration), but it seems that you have developed your own conversion engine and that, I quote, "there’s still work ahead of us!" I suppose not all file features are supported (see the mapping)? My question is: which tool is better to use? Is the end result the same between the B2X translator and the converters from the FileConversionPack? If not, which one is better for producing a 100% Open XML file? Thank you in advance, Quentin Denis

  • Anonymous
    May 26, 2013
    Hi, I am not able to find sample code for converting a DOC file to DOCX. Please share sample code if available. Thanks