Dreamweaver

Clean up Microsoft Word HTML files

You can open documents saved by Microsoft Word as HTML files, and then use the Clean Up Word HTML command to remove the extraneous HTML code generated by Word. The Clean Up Word HTML command is available for documents saved as HTML files by Word 97 or later.

The code that Dreamweaver removes is primarily used by Word to format and display documents in Word and is not needed to display the HTML file. Retain a copy of your original Word (.doc) file as a backup, because you may not be able to reopen the HTML document in Word once you’ve applied the Clean Up Word HTML feature.

To clean up HTML or XHTML that was not generated by Microsoft Word, use the Clean Up HTML command.
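If you are unsure which of the two commands applies to a file, the following Python sketch shows one way to guess whether the file was saved from Word. It is an illustration, not a description of how Dreamweaver detects Word documents. It assumes the file still contains the Generator meta tag or the Office XML namespace that Word normally writes into the head, and the function name looks_like_word_html is hypothetical.

```python
import re

# Assumption: Word-saved HTML usually carries a tag such as
#   <meta name=Generator content="Microsoft Word 11">
# and declares the namespace urn:schemas-microsoft-com:office:word.
GENERATOR_META = re.compile(
    r"<meta\s+name\s*=\s*[\"']?generator[\"']?\s+content\s*=\s*[\"']?([^\"'>]*)",
    re.IGNORECASE,
)
WORD_NAMESPACE = "urn:schemas-microsoft-com:office:word"


def looks_like_word_html(html: str) -> bool:
    """Return True if the markup appears to have been saved by Microsoft Word."""
    match = GENERATOR_META.search(html)
    if match and "microsoft word" in match.group(1).lower():
        return True
    # Fall back to the Office namespace declared on the <html> element.
    return WORD_NAMESPACE in html.lower()
```

A file for which this returns True is a candidate for Clean Up Word HTML. This heuristic does not flag a file that has been stripped of both markers.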

  1. Save your Microsoft Word document as an HTML file.
    Note: In Windows, close the file in Word to avoid a sharing violation.
  2. Open the HTML file in Dreamweaver.

    To view the HTML code generated by Word, switch to Code view (View > Code).

  3. Select Commands > Clean Up Word HTML.
    Note: If Dreamweaver is unable to determine which version of Word was used to save the file, select the correct version from the pop‑up menu.
  4. Select (or deselect) options for the cleanup. The preferences you enter are saved as default cleanup settings.

    When you click OK, Dreamweaver applies the cleanup settings to the HTML document and a log of the changes appears (unless you deselected that option in the dialog box).

    Remove All Word Specific Markup
    Removes all Microsoft Word-specific HTML, including XML from HTML tags, Word custom metadata and link tags in the head of the document, Word XML markup, conditional tags and their contents, and empty paragraphs and margins from styles. You can select each of these options individually using the Detailed tab.

    Clean Up CSS
    Removes all Word-specific CSS, including inline CSS styles when possible (where the parent style has the same style properties), style attributes beginning with “mso,” non-CSS style declarations, CSS style attributes from tables, and all unused style definitions from the head. You can further customize this option using the Detailed tab.

    Clean Up <font> Tags
    Removes HTML <font> tags, converting the default body text to size 2 HTML text.

    Fix Invalidly Nested Tags
    Removes the font markup tags inserted by Word outside the paragraph and heading (block-level) tags.

    Set Background Color
    Lets you enter a hexadecimal value to set the background color of your document. If you do not set a background color, your Word HTML page has a gray background. The default value is white (#ffffff).

    Apply Source Formatting
    Applies the source formatting options you specify in HTML Format preferences and SourceFormat.txt to the document.

    Show Log On Completion
    Displays an alert box with details about the changes made to the document as soon as the cleanup is finished.

  5. Click OK, or click the Detailed tab if you want to further customize the Remove All Word Specific Markup and Clean Up CSS options, and then click OK.
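
To make the kind of markup these options target more concrete, the following Python sketch imitates two of the cleanups described above: removing conditional tags with their contents, and removing style declarations that begin with "mso". It is a simplified illustration under stated assumptions, not Dreamweaver's implementation. The function names are hypothetical, and the regular expressions handle only the common "<!--[if ...]> ... <![endif]-->" form and simple quoted style attributes.

```python
import re

# Hypothetical helpers for illustration; Dreamweaver's own logic is not documented here.
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
STYLE_ATTR = re.compile(r'\sstyle\s*=\s*(["\'])(.*?)\1', re.DOTALL | re.IGNORECASE)


def strip_mso_declarations(style_value: str) -> str:
    """Drop declarations whose property name starts with 'mso'."""
    kept = []
    # Naive split: assumes no semicolons inside values such as url(...).
    for declaration in style_value.split(";"):
        declaration = declaration.strip()
        if not declaration:
            continue
        prop = declaration.split(":", 1)[0].strip().lower()
        if not prop.startswith("mso"):
            kept.append(declaration)
    return "; ".join(kept)


def clean_word_html(html: str) -> str:
    """Remove conditional comments and mso-prefixed inline styles."""
    html = CONDITIONAL_COMMENT.sub("", html)

    def rewrite(match: re.Match) -> str:
        cleaned = strip_mso_declarations(match.group(2))
        if not cleaned:
            return ""  # nothing left, so drop the whole style attribute
        quote = match.group(1)
        return f" style={quote}{cleaned}{quote}"

    return STYLE_ATTR.sub(rewrite, html)
```

For example, passing `<p style="mso-margin-top-alt:auto; color:red">Hello</p>` to clean_word_html keeps only the `color:red` declaration. The sketch leaves class names such as MsoNormal and unused style definitions in the head untouched, which the Clean Up CSS option also addresses.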