7 Converting Microsoft Office Files to HTML

Inbound Refinery can convert native Microsoft Office files to HTML by using the native Microsoft Office applications installed on a Windows system. Content Server can be installed on either a Windows or UNIX platform, but for Microsoft Office to HTML conversions to work, Inbound Refinery must be configured on the Windows system where the Microsoft Office native applications are installed. Microsoft Office to HTML conversions require the following components to be installed and enabled on the specified server:

| Component Name | Component Description | Enabled on Server |
|---|---|---|
| HtmlConverter | Enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint, and Visio to HTML using the native Office application. | Inbound Refinery Server |
| MSOfficeHtmlConverterSupport | Enables Content Server to support HTML conversions of native Microsoft Office files converted by Inbound Refinery and returned to Content Server in a ZIP file. Requires that the ZipRenditionManagement component be installed on the Content Server. | Content Server |
| ZipRenditionManagement | Enables Content Server to access HTML renditions created and compressed into a ZIP file by Inbound Refinery. | Content Server |
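
The component requirements listed above can be expressed as a simple check. The following Python sketch is illustrative only and is not part of the product: the function name `missing_components` and the way the enabled components are passed in are hypothetical, and the sketch does not query a real server. Only the component names and server roles are taken from the list above.

```python
# Required components per server role, taken from the component list above.
REQUIRED_COMPONENTS = {
    "Inbound Refinery": {"HtmlConverter"},
    "Content Server": {"MSOfficeHtmlConverterSupport", "ZipRenditionManagement"},
}


def missing_components(server_role, enabled):
    """Return the required components that are absent from `enabled`, sorted by name.

    `server_role` and `enabled` are hypothetical inputs used for illustration;
    nothing here talks to a real Content Server or Inbound Refinery instance.
    """
    required = REQUIRED_COMPONENTS[server_role]
    return sorted(required - set(enabled))


if __name__ == "__main__":
    # A Content Server with only the support component enabled is incomplete,
    # because MSOfficeHtmlConverterSupport requires ZipRenditionManagement.
    print(missing_components("Content Server", ["MSOfficeHtmlConverterSupport"]))
```

An empty result means every component required for that server role is present in the supplied list.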

This section discusses how to work with Microsoft Office to HTML conversions and how to troubleshoot the conversion process. It contains the following subsections:

7.1 About HTML Converter

HTML conversion automates opening Microsoft Office files in their native application, saving them as HTML pages, and collecting the HTML output into a compressed ZIP file that is returned to Content Server.

HTML conversion can process the following types of files:

  • Microsoft Word 2003 through 2010

  • Microsoft Excel 2003 through 2010

  • Microsoft PowerPoint 2003 through 2010

  • Microsoft Visio 2007

Because Microsoft applications are Windows only, the Inbound Refinery used for HTML conversion must be installed on a Windows system. The Content Server connecting to the Inbound Refinery provider can be installed on either Windows or UNIX.

7.1.1 HTML Converter Process Overview

When HTML Converter is enabled to work with Inbound Refinery, native Microsoft Office files checked into Content Server are sent to Inbound Refinery for conversion. Inbound Refinery automates the process of converting the files to HTML using the native Microsoft Office applications. If a single HTML page is returned to Content Server, it is used as the web-viewable file. If conversion results in multiple HTML pages, the following files are returned to Content Server:

  • An HCSP page as the primary web-viewable rendition

  • A ZIP file that includes the HTML output from the Office application

  • Optionally, a thumbnail rendition of the native Microsoft Office file

When a user clicks the web-viewable link in Content Server for a document that Inbound Refinery converted to multiple HTML pages, the HCSP page redirects the server to the HTML rendition.
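
The packaging and return logic described in this overview can be sketched as follows. This is a rough Python illustration under stated assumptions, not the Inbound Refinery implementation: the function names `package_html_output` and `renditions_to_return`, the rendition labels, and the directory layout are all hypothetical. The overview mentions the optional thumbnail only for the multi-page case, so the sketch does the same.

```python
import tempfile
import zipfile
from pathlib import Path


def package_html_output(output_dir, zip_path):
    """Collect every file under `output_dir` into one compressed ZIP archive."""
    output_dir = Path(output_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(output_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it sits in output_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, arcname=path.relative_to(output_dir).as_posix())
    return zip_path


def renditions_to_return(html_pages, has_thumbnail=False):
    """Return hypothetical labels for what is sent back to Content Server."""
    if not html_pages:
        raise ValueError("conversion produced no HTML pages")
    if len(html_pages) == 1:
        # A single page is used directly as the web-viewable file.
        return ["HTML page (web-viewable file)"]
    renditions = ["HCSP page (primary web-viewable rendition)", "ZIP file of HTML output"]
    if has_thumbnail:
        renditions.append("thumbnail rendition")
    return renditions


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "html_output"  # hypothetical output folder
        out.mkdir()
        (out / "page1.html").write_text("<html>1</html>")
        (out / "page2.html").write_text("<html>2</html>")
        archive_path = package_html_output(out, Path(tmp) / "rendition.zip")
        with zipfile.ZipFile(archive_path) as archive:
            print(archive.namelist())
        print(renditions_to_return(sorted(p.name for p in out.iterdir())))
```

Archive entries are stored with paths relative to the output folder, so links between the HTML pages and their supporting files keep working after the archive is extracted.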

7.2 Configuring HTML Converter Settings

This section covers the following topics:

7.2.1 Configuring Content Servers to Send Jobs for HTML Conversion

File formats and conversion methods in Content Server define how Inbound Refinery handles content items and which conversion options apply. Installing and enabling the MSOfficeHtmlConverterSupport component on Content Server adds four HTML conversion options to the File Formats Wizard page:

  • Word HTML

  • PowerPoint HTML

  • Excel HTML

  • Visio HTML

For a Microsoft Office document to be processed by Inbound Refinery, its file extension must be mapped to a format name that is associated with the HTML conversion method. The added HTML conversion options are not mapped automatically; you must map them manually, using either the File Formats Wizard or the Configuration Manager applet. The Configuration Manager applet gives you greater control over which file extensions are mapped to which conversion options. Information on setting these mappings is covered in the following sections:

7.2.1.1 Using the File Formats Wizard

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. To make changes, complete the following steps:

  1. Make sure you are logged in to Content Server as an administrator.

  2. In the navigation menu, select Administration, then Refinery Administration, then File Formats Wizard. The File Formats Wizard for <server name> page is displayed.

  3. Select the Microsoft Office document file types you want to convert to HTML. The Conversion column lists the appropriate conversion option according to the file type. For example:

    • Word for doc, docx, dot, dotx

    • PowerPoint for ppt, pptx

    • Excel for xls, xlsx

    • Visio for vsd

  4. Click Update to save your changes.

  5. Make sure you are logged in to the Inbound Refinery as an administrator.

  6. In the navigation menu, select Conversion Settings and then Primary Web Rendition. The Primary Web Rendition page is displayed.

  7. Enable Convert selected MS Office formats to MS HTML.

  8. Click Update.
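
The extension mappings given as examples in step 3 can be pictured as a lookup table. The Python sketch below is for illustration only and does not read or change any Content Server configuration: the names `HTML_CONVERSION_BY_EXTENSION` and `html_conversion_for` are hypothetical, and pairing each extension with the corresponding "... HTML" option name from section 7.2.1 is an assumption.

```python
from pathlib import PurePath

# Example extensions from step 3, paired (as an assumption) with the
# HTML conversion option names listed in section 7.2.1.
HTML_CONVERSION_BY_EXTENSION = {
    "doc": "Word HTML",
    "docx": "Word HTML",
    "dot": "Word HTML",
    "dotx": "Word HTML",
    "ppt": "PowerPoint HTML",
    "pptx": "PowerPoint HTML",
    "xls": "Excel HTML",
    "xlsx": "Excel HTML",
    "vsd": "Visio HTML",
}


def html_conversion_for(filename):
    """Return the HTML conversion option for `filename`, or None if unmapped."""
    # PurePath.suffix includes the leading dot, for example ".DOCX".
    extension = PurePath(filename).suffix[1:].lower()
    return HTML_CONVERSION_BY_EXTENSION.get(extension)


if __name__ == "__main__":
    for name in ("Report.DOCX", "budget.xls", "notes.txt"):
        print(name, "->", html_conversion_for(name))
```

A file whose extension has no mapping returns `None` in this sketch, which mirrors the point that unmapped extensions are not sent for HTML conversion.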

7.2.1.2 Using the Configuration Manager

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes, complete the following steps:

  1. Make sure you are logged in to Content Server as an administrator.

  2. In the navigation menu, select Administration, then Admin Applets. The Administration Applets for <server name> page is displayed.

  3. Click Configuration Manager. The Configuration Manager applet is started.

  4. Select File Formats from the Options menu.

  5. Select the application format for the Office document type you want to convert from the Format column. For example, for Microsoft Word, you would select application/msword.

  6. Click Edit. The Edit File Format dialog box is displayed.

  7. Select the HTML conversion option from the Conversion list appropriate to the Office document format you selected. For example, for application/msword, you would select the conversion option Word HTML.

  8. Click OK.

  9. Repeat steps 5 through 8 for all Microsoft Office formats you want to convert to HTML.

  10. When finished, click Close to close the File Formats page and then close the Configuration Manager.

  11. Restart Content Server and Inbound Refinery.

7.2.2 Setting Accepted Conversions

When installed on the refinery, the HTML Converter adds the Word HTML, PowerPoint HTML, Excel HTML, and Visio HTML options to the Conversion Listing page. These conversion options must be enabled for the refinery to perform the corresponding conversions on items submitted by Content Server.
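
The effect of accepted conversions can be sketched as a filter over submitted jobs. This Python example is a simplified model, not the refinery's actual queueing logic: the function `split_jobs` and the job tuples are hypothetical, and only the four conversion option names come from this section.

```python
# Conversion options added to the Conversion Listing page by HTML Converter.
HTML_CONVERSIONS = ("Word HTML", "PowerPoint HTML", "Excel HTML", "Visio HTML")


def split_jobs(jobs, accepted_conversions):
    """Partition (filename, conversion) jobs into accepted and rejected filenames.

    `jobs` and `accepted_conversions` are hypothetical inputs; a real refinery
    reads its accepted conversions from the Conversion Listing page settings.
    """
    accepted, rejected = [], []
    for filename, conversion in jobs:
        target = accepted if conversion in accepted_conversions else rejected
        target.append(filename)
    return accepted, rejected


if __name__ == "__main__":
    jobs = [("report.docx", "Word HTML"), ("plan.vsd", "Visio HTML")]
    # Only Word HTML is enabled in this example, so the Visio job is not accepted.
    print(split_jobs(jobs, {"Word HTML"}))
```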