More on cleaning up Word's HTML

Brian Alvey points to a Word HTML cleaner and explains why Word puts all those extra tags into HTML when you save a Word document as HTML or copy from Word into another HTML editor.

Also see my previous post about reducing the size of the HTML that Word creates when used as an email editor.

[via Scoble]

Comments

  • Anonymous
    January 29, 2004
    Too bad there is no source code to clean up Word's HTML. Or is there somewhere? Someone?
  • Anonymous
    January 29, 2004
    I think HtmlTidy can clean up Word HTML, and it has .NET and COM bindings. You can look at the source too if you really want to.

    http://tidy.sourceforge.net/
  • Anonymous
    January 29, 2004
    I found that the combination of the Office 2000 HTML Filter 2.0 (http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN) and the Textism web tool worked well for me. I had more than 20K, so the Office tool got me part way and the Textism filter did the remaining cleanup.

    The tool says it's for Office 2000, but it worked okay on Office XP.

    This should be a feature of Word - export as clean HTML or something.
  • Anonymous
    January 29, 2004
    The comment has been removed
  • Anonymous
    January 29, 2004
    Oops. Lucky. Thanks. You can delete my post to cover for me any time... (in my defence, I'm stuck using Office 2000 at work)
  • Anonymous
    March 23, 2004
    I have already used the service (Textism) on a paid annual subscription, and it is not what it promises.

    The servers are always down and it only cleans part of the code.

    There are still a lot of login problems, as well as misrepresentation.

    Because it is a web-based service, he can cut off access whenever he likes. I only got to use it for 4 months before we were cut off without prior notice.

    He does not have ethics and professionalism.

    I would never recommend it. But I will recommend Word HTML Cleaner by Mambosoft; at least you get to keep the software, and server downtime won't interfere. You also get free lifetime updates.

    My suggestion: use Textism only to clean small files for free, that is, one-page documents; it is not worth paying for.

    Magnolia