2 Document Handling

This chapter describes document handling.

2.1 About Document Handling

Document Handling is one of the four main menu options in the HTML Conversion Editor. This page contains tabs that allow you to set format options for the converted HTML output of six different types of files.

  • Text/Word Processing options apply to text files, word-processing documents, and emails.

  • Image options apply to graphic file types, such as JPEG, BMP, and PDF.

  • Presentation options apply to presentation file types.

  • Spreadsheet options apply to spreadsheet file types such as Excel and Quattro Pro.

  • Database options apply to database file types.

  • Archive options apply to archive file types such as ZIP and PST.

Email messages are handled as a separate document type, but are currently treated as a subset of word-processing documents. The following WP options are applied to email messages:

  • Layout

  • Ignore unnamed character styles

  • Generate list bullets without <li> tags

  • Break page when hard page break is encountered

  • Page size

Also, attachments are always converted as separate output files, with a link to each attachment displayed in the email header.

Unsupported file formats are processed using the word-processing options.

A layout can be specified for each document type, to allow the template author to customize the appearance of the output to the type of document being converted.

For multi-section documents such as graphic files, presentations, spreadsheets and databases, the template author can specify a default section label to use if the document does not specify one, as well as specifying formatting for the section label. A simple numbering scheme is used; for example, if "Sheet" is specified as the default title of a spreadsheet, unnamed sections will be titled "Sheet 1", "Sheet 2", and so on, according to the index of the section.
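
The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name section_titles and the use of None or an empty string to mark an unnamed section are assumptions, as is the 1-based section index.

```python
def section_titles(names, default_label):
    """Return one title per section, numbering unnamed sections by their index.

    names: section names in document order; None or "" marks an unnamed
    section (an assumption made for this sketch).
    """
    titles = []
    for index, name in enumerate(names, start=1):
        if name:
            titles.append(name)                        # the document supplied a name
        else:
            titles.append(f"{default_label} {index}")  # fall back to label + index
    return titles
```

For example, calling section_titles(["Summary", None, ""], "Sheet") titles the second and third sections "Sheet 2" and "Sheet 3", because the number comes from the position of the section rather than from a count of unnamed sections.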

2.2 Formatting a Text/Word Processing File

The Text/Word Processing Tab lets you set options that define the layout of HTML output from conversion of a text or word processing file. The following sections describe how you set up the layout:

2.2.1 Creating a Layout

The first section of the Text/Word Processing Tab allows you to control the overall look of a converted text/word processing file.

  1. Navigate to Output Pages > Output Page Layouts, and click Add. The Output Page Layouts page appears.
  2. In the Name field, enter a name to define this layout, for example, text/wp.
  3. If you want to include a navigation layout with the page layout, click the box in front of Include navigation layout. This option allows the user to create a "table of contents" page.
  4. Navigate back to the Text/Word Processing Tab of the Document Handling page. Click the down arrow in the Layout drop-down box, and select the name of the page layout that you just created (text/wp).
  5. If you place a check in front of Generate list bullets without <li> tags (default), bullets/numbers for each list entry will be generated instead of using HTML list tags. Generated bullets/numbers tend to have higher fidelity to the look of the original document, but do so at the expense of not placing the list in HTML list tags. The generated bullets/numbers as well as the paragraph text will be treated as being ordinary paragraph text. If this option is not checked, then HTML list tags are used, but the types of bullets and numbers seen in the browser are severely limited.
  6. If you place a check in front of Include footnotes and endnotes (default), footnotes and endnotes from the input document are included in the output. Footnotes are always placed at the end of the HTML page on which the note is referenced. Endnote placement is determined by the Separate endnotes option, shown next.
  7. If you place a check in front of Separate endnotes (default), endnotes are placed on a separate page at the end of the document. Otherwise, endnotes are treated as a continuation of the last page of output. The Break on pages option must be enabled in the Page Layout for this option to take effect.
  8. If you place a check in front of Ignore unnamed character styles, formatting from unnamed character styles in the source document is ignored and will not be present in the output. This is useful when the template author wishes to override the formatting of paragraphs in the output via format mapping. Without this option, formatting specified in the template could potentially be overridden if the source document was not well-formatted, for example by the document writer selecting a block of text and applying a character format to it rather than specifying a style.
  9. Page width and Page margins can be specified, inherited from the source document, or omitted.
  10. In the Note separator field, enter any HTML markup that you want to put between the body and the footnotes/endnotes at the end of the page (such as a rule). The default is to do nothing unless HTML markup is specified here.
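
To make the trade-off in step 5 concrete, the following Python sketch renders the same list both ways. The markup shown is hypothetical: this guide does not show the exact HTML that the conversion emits, and the function name render_list and the use of a plain bullet character are assumptions made for illustration.

```python
from html import escape

def render_list(items, use_list_tags):
    """Render bullet items either as an HTML list or as ordinary paragraphs."""
    if use_list_tags:
        # Option unchecked: real list markup, bullet style chosen by the browser.
        body = "".join(f"<li>{escape(item)}</li>" for item in items)
        return f"<ul>{body}</ul>"
    # Option checked: the generated bullet becomes part of the paragraph text,
    # so nothing in the markup identifies these paragraphs as a list.
    return "".join(f"<p>\u2022 {escape(item)}</p>" for item in items)
```

With generated bullets the output can carry whatever bullet or number text the source document used, which is where the higher fidelity comes from, at the cost of losing the list structure in the HTML.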

2.2.2 Handling Embedded Graphics

These options allow you to specify how to handle embedded graphics on HTML output from text/word processing files.

  1. In the Set maximum width (in pixels) to x field, enter the maximum width in pixels. Images wider than this value are shrunk to no more than this many pixels wide. If a change is needed, the aspect ratio of the source image is preserved. Setting this to zero (0) means there is no limit.
  2. In the Set maximum height (in pixels) to x field, enter the maximum height in pixels. Images taller than this value are shrunk to no more than this many pixels high. If a change is needed, the aspect ratio of the source image is preserved. Setting this to zero (0) means there is no limit.
  3. In the Set maximum size (in pixels) to x field, enter the maximum area in pixels. Images with a larger area are shrunk to no more than this many pixels in area. If a change is needed, the aspect ratio of the source image is preserved. Setting this to zero (0) means there is no limit.
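
The three limits can be thought of as each proposing a scale factor, with the smallest one winning. The Python sketch below illustrates that idea under stated assumptions: the function name fit_within_limits is hypothetical, rounding to the nearest pixel is a guess (so an area limit may be overshot by a few pixels), and the way the product combines the limits internally is not documented here.

```python
import math

def fit_within_limits(width, height, max_width=0, max_height=0, max_area=0):
    """Shrink (never enlarge) an image to satisfy the limits, keeping its aspect ratio.

    A limit of 0 means "no limit", matching the option descriptions above.
    """
    scale = 1.0
    if max_width > 0:
        scale = min(scale, max_width / width)
    if max_height > 0:
        scale = min(scale, max_height / height)
    if max_area > 0:
        # Area scales with the square of the linear scale factor.
        scale = min(scale, math.sqrt(max_area / (width * height)))
    if scale >= 1.0:
        return width, height              # already within every limit
    # Rounding to the nearest pixel is an assumption of this sketch.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1600 x 1200 image, a maximum width of 800 and a maximum size of 480000 both yield 800 x 600, while an image of 400 x 300 is left unchanged by either limit.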

2.2.3 Setting Pagination Options

The pagination options allow you to control how page breaks are handled in the converted HTML output.

  1. Navigate to Output Pages > Output Page Layouts > [your layout name, e.g., text/wp] > Page Layout. Place a check in front of the Break on pages option so that pagination may occur in the HTML output.
  2. Navigate back to the Text/Word Processing Tab of the Document Handling page. If you place a check in front of Split file into multiple pages, and place a check in front of Split file at hard page breaks, the output will be split into additional output files every time a hard page break is encountered in the source document. Note that tables will never be split across page boundaries.
  3. Pages can be defined by hard page breaks and/or a character count. In the x characters per page field, you may set a suggested page size in characters for the output generated. The output is then broken up into pages of approximately the requested size, measured in characters of the source document.

    This feature is particularly useful when converting documents that are poorly structured. Many documents lack the kind of style information that Dynamic Converter uses to break the document into "output markup items." By setting this option, the exported document can be presented as a set of pieces that are more manageable rather than a single large output file. It is also useful with documents that are structured but have large pieces in the structure.

    Only text inserted from the source document is counted in this size; markup and text inserted from the template are not counted. In addition, HTML markup tags generated by this product are not counted in the page size. Setting this option to zero effectively turns it off.

    When this option is turned on, the page breaking rules are as follows:

    • A page boundary will be created if the current paragraph causes the page size to be exceeded. Therefore, the normal operation of this feature may cause pages to exceed the specified page size by small amounts, i.e., less than a paragraph.

    • A page boundary will never be created in the middle of a table, list, footnote, or endnote.
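
The page breaking rules above can be sketched as follows. This is a minimal illustration, not the converter's algorithm: the function name paginate is hypothetical, each input string stands for one unsplittable unit (a paragraph, or an entire table, list, footnote, or endnote), and the sketch assumes that the boundary is placed after the paragraph that pushes the page over the limit, which is one reading of the first rule.

```python
def paginate(blocks, chars_per_page):
    """Group unsplittable blocks of source text into pages.

    blocks: strings containing only text from the source document, since
    markup and template text are not counted toward the page size.
    chars_per_page: suggested page size; 0 turns the feature off.
    """
    if chars_per_page <= 0:
        return [list(blocks)] if blocks else []
    pages, current, count = [], [], 0
    for text in blocks:
        current.append(text)          # a block is never split across pages
        count += len(text)
        if count > chars_per_page:    # this block exceeded the page size
            pages.append(current)
            current, count = [], 0
    if current:
        pages.append(current)
    return pages
```

With a page size of 100 and blocks of 60, 60, 30, and 10 characters, the first page holds the first two blocks (120 characters, slightly over the limit) and the second page holds the rest.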

2.3 Formatting an Image File

The Image Tab lets you set options that define the layout of HTML output from conversion of an image file. The following sections describe how you set up the layout:

2.3.1 Creating a Layout

The first section of the Image Tab allows you to control the overall look of a converted image file.

  1. Navigate to Output Pages > Output Page Layouts, and click Add. The Output Page Layouts page appears.
  2. In the Name field, enter a name to define this layout (for example, grafx).
  3. Navigate back to the Image Tab of the Document Handling page. Click the down arrow in the Layout drop-down box, and select the name of the page layout that you just created (grafx).
  4. If you want to define a format to use for a Section Title (instead of using the default paragraph style), first navigate to Output Pages > Output Text Formats, and click Add. The Output Text Formats page appears.
  5. In the Name field, enter a name to define this format, for example, Section Title. You can also specify a tag to use for this paragraph format, and custom markup to be inserted before or after the paragraph.
  6. Navigate back to the Image Tab of the Document Handling page. Click the down arrow in the Section title format drop-down box, and select the name of the text format that you just created (Section Title).
  7. In the Default section label text box, specify the default label to be used in section-based navigation.

2.3.2 Sizing Images

The following options enable you to control the sizing of images in the HTML output of image conversions.

  1. Place a check in front of Set exact width (in pixels) to x and specify a value. The conversion shrinks or enlarges the image so that its width exactly matches the specified value. If Set exact height (in pixels) to x is not set, the image is scaled without altering its original aspect ratio. Setting this option to zero causes it to be ignored.
  2. Place a check in front of Set exact height (in pixels) to x and specify a value. The conversion shrinks or enlarges the image so that its height exactly matches the specified value. If Set exact width (in pixels) to x is not set, the image is scaled without altering its original aspect ratio. Setting this option to zero causes it to be ignored.
  3. Place a check in front of Set maximum width (in pixels) to x and specify a value. The conversion shrinks very large images to no more than this many pixels wide. If a change is needed, the aspect ratio of the source image is preserved. The default (0) means there is no maximum width.
  4. Place a check in front of Set maximum height (in pixels) to x and specify a value. The conversion shrinks very large images to no more than this many pixels high. If a change is needed, the aspect ratio of the source image is preserved. The default (0) means there is no maximum height.
  5. Place a check in front of Set maximum size (in pixels) to x and specify a value. The conversion shrinks very large images to no more than this many pixels in area. If a change is needed, the aspect ratio of the source image is preserved. The default (0) means there is no maximum size.
  6. Under Pagination, place a check in front of Split file into multiple pages to cause each image in the file to be output on a separate page.
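
The exact width and exact height options in steps 1 and 2 can be illustrated with a short Python sketch. The function name exact_size is hypothetical and rounding to the nearest pixel is an assumption. The case where both values are set is inferred from the text above, which guarantees the aspect ratio only when one of the two is left unset. How the exact options interact with the maximum options is not stated in this guide, so the sketch leaves the maximum options out.

```python
def exact_size(width, height, exact_width=0, exact_height=0):
    """Scale an image to an exact width and/or height; 0 means the option is ignored."""
    if exact_width > 0 and exact_height > 0:
        # Both set: each dimension is forced, so the aspect ratio may change.
        return exact_width, exact_height
    if exact_width > 0:
        # Only width set: derive the height from the original aspect ratio.
        return exact_width, max(1, round(height * exact_width / width))
    if exact_height > 0:
        # Only height set: derive the width from the original aspect ratio.
        return max(1, round(width * exact_height / height)), exact_height
    return width, height
```

For a 400 x 300 image, an exact width of 800 enlarges it to 800 x 600, and an exact height of 150 shrinks it to 200 x 150.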

2.4 Formatting a Presentation File

The Presentation Tab lets you set options that define the layout of HTML output from conversion of a presentation file. The following sections describe how you set up the layout:

2.4.1 Creating a Layout

The first section of the Presentation Tab allows you to control the overall look of a converted presentation file.

  1. Navigate to Output Pages > Output Page Layouts, and click Add. The Output Page Layouts page appears.
  2. In the Name field, enter a name to define this layout (for example, slides).
  3. Navigate back to the Presentation Tab of the Document Handling page. Click the down arrow in the Layout drop-down box, and select the name of the page layout that you just created (slides).
  4. If you want to define a format to use for a Section Title (instead of using the default paragraph style), first navigate to Output Pages > Output Text Formats, and click Add. The Output Text Formats page appears.
  5. In the Name field, enter a name to define this format (for example, Section Title). You can also specify a tag to use for this paragraph format, and custom markup to be inserted before or after the paragraph.
  6. Navigate back to the Presentation Tab of the Document Handling page. Click the down arrow in the Section title format drop-down box, and select the name of the text format that you just created (Section Title).
  7. In the Default section label text box, specify the default label to be used in section-based navigation.

2.4.2 Sizing Slides

The following options enable you to control the sizing of the slides in the HTML output of presentation file conversions.

  1. Place a check in front of Set exact width (in pixels) to x and specify a value. The conversion shrinks or enlarges the slide so that its width exactly matches the specified value. If Set exact height (in pixels) to x is not set, the slide is scaled without altering its original aspect ratio. Setting this option to zero causes it to be ignored.
  2. Place a check in front of Set exact height (in pixels) to x and specify a value. The conversion shrinks or enlarges the slide so that its height exactly matches the specified value. If Set exact width (in pixels) to x is not set, the slide is scaled without altering its original aspect ratio. Setting this option to zero causes it to be ignored.
  3. Place a check in front of Set maximum width (in pixels) to x and specify a value. The conversion shrinks very large slides to no more than this many pixels wide. If a change is needed, the aspect ratio of the source slide is preserved. The default (0) means there is no maximum width.
  4. Place a check in front of Set maximum height (in pixels) to x and specify a value. The conversion shrinks very large slides to no more than this many pixels high. If a change is needed, the aspect ratio of the source slide is preserved. The default (0) means there is no maximum height.
  5. Place a check in front of Set maximum size (in pixels) to x and specify a value. The conversion shrinks very large slides to no more than this many pixels in area. If a change is needed, the aspect ratio of the source slide is preserved. The default (0) means there is no maximum size.
  6. Under Pagination, check the box in front of Split file into multiple pages to cause each slide in the document to be output on a separate page.

2.5 Formatting a Spreadsheet File

The Spreadsheet Tab lets you set options that define the layout of HTML output from conversion of a spreadsheet file. The following sections describe how you set up the layout:

2.5.1 Creating a Layout

The first section of the Spreadsheet Tab allows you to control the overall look of a converted spreadsheet file.

  1. Navigate to Output Pages > Output Page Layouts, and click Add. The Output Page Layouts page appears.
  2. In the Name field, enter a name to define this layout (for example, ss).
  3. Navigate back to the Spreadsheet Tab of the Document Handling page. Click the down arrow in the Layout drop-down box, and select the name of the page layout that you just created (ss).
  4. If you want to define a format to use for a Section Title (instead of using the default paragraph style), first navigate to Output Pages > Output Text Formats, and click Add. The Output Text Formats page appears.
  5. In the Name field, enter a name to define this format (for example, Section Title). You can also specify a tag to use for this paragraph format, and custom markup to be inserted before or after the paragraph.
  6. Navigate back to the Spreadsheet Tab of the Document Handling page. Click the down arrow in the Section title format drop-down box, and select the name of the text format that you just created (Section Title).
  7. In the Default section label text box, specify the default label to be used in section-based navigation.

2.5.2 Formatting Tables

There are two options on the Spreadsheet Tab that apply to how spreadsheet tables are formatted in HTML output.

  1. Check the box in front of Show grid lines to force the table representing the spreadsheet to be output with a "border" value of "1".
  2. Under Pagination, check the box in front of Split file into multiple pages to cause each table in the document to be output on a separate page. If the rows per page option is set to a nonzero value, the spreadsheet will be rendered as a series of tables containing up to the specified number of rows on each page.
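
The effect of the rows per page option can be sketched as simple chunking. This is an illustration only: the function name split_rows is hypothetical, and a sheet is modeled as a plain list of rows.

```python
def split_rows(rows, rows_per_page):
    """Split a sheet's rows into tables of at most rows_per_page rows each.

    A value of 0 leaves the sheet as a single table.
    """
    if rows_per_page <= 0:
        return [rows]
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]
```

A sheet with seven rows and a rows per page value of 3 would produce three tables holding three, three, and one row.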

2.5.3 Handling Embedded Graphics

These options allow you to specify how to handle embedded graphics on HTML output from spreadsheet files.

  1. In the Set maximum width (in pixels) to x field, enter the maximum width in pixels. Images wider than this value are shrunk to no more than this many pixels wide. If a change is needed, the aspect ratio of the source image is preserved. Setting this to zero (0) means there is no limit.
  2. In the Set maximum height (in pixels) to x field, enter the maximum height in pixels. Images taller than this value are shrunk to no more than this many pixels high. If a change is needed, the aspect ratio of the source image is preserved. Setting this to zero (0) means there is no limit.
  3. In the Set maximum size (in pixels) to x field, enter the maximum area in pixels. Images with a larger area are shrunk to no more than this many pixels in area. If a change is needed, the aspect ratio of the source image is preserved. Setting this to zero (0) means there is no limit.

2.6 Formatting a Database File

The Database Tab lets you set options that define the layout of HTML output from conversion of a database file. The following sections describe how you set up the layout:

2.6.1 Creating a Layout

The first section of the Database Tab allows you to control the overall look of a converted database file.

  1. Navigate to Output Pages > Output Page Layouts, and click Add. The Output Page Layouts page appears.
  2. In the Name field, enter a name to define this layout (for example, db).
  3. Navigate back to the Database Tab of the Document Handling page. Click the down arrow in the Layout drop-down box, and select the name of the page layout that you just created (db).
  4. If you want to define a format to use for a Section Title (instead of using the default paragraph style), first navigate to Output Pages > Output Text Formats, and click Add. The Output Text Formats page appears.
  5. In the Name field, enter a name to define this format, for example, Section Title. You can also specify a tag to use for this paragraph format, and custom markup to be inserted before or after the paragraph.
  6. Navigate back to the Database Tab of the Document Handling page. Click the down arrow in the Section title format drop-down box, and select the name of the text format that you just created (Section Title).
  7. In the Default section label text box, specify the default label to be used in section-based navigation.

2.6.2 Setting Pagination Options

This section allows you to set pagination options for the HTML output of a converted database file.

  1. Under Pagination, check the box in front of Split file into multiple pages to cause each record in the document to be output on a separate page.
  2. Enter a value in the x records per page text box to specify the number of records to output on each page.

2.7 Formatting an Archive File

The Archive Tab lets you set options that define the layout of HTML output from conversion of an archive file. The following sections describe how you set up the layout:

2.7.1 Creating a Layout

The first section of the Archive Tab allows you to control the overall look of a converted archive file.

  1. Navigate to Output Pages > Output Page Layouts, and click Add. The Output Page Layouts page appears.
  2. In the Name field, enter a name to define this layout (for example, archive).
  3. Navigate back to the Archive Tab of the Document Handling page. Click the down arrow in the Layout drop-down box, and select the name of the page layout that you just created (archive).

2.7.2 Setting Display Method

There are two options in the Display method drop-down box:

  • If you select file names, the names of files and folders in the archive will be output.

  • If you select decompressed files, the file names will be output as links to the exported files.