9 Output Pages

The following topics are covered in this section:

9.1 About Output Pages

The Output Pages section allows you to set options for general HTML output, as well as for Markup Items, Text Formats, and Page Layouts.

9.2 Configuring HTML Settings

The top-level Output Pages page allows you to set general options for HTML output.

  1. Place a check in front of Use DOCTYPE to insert a DOCTYPE statement at the top of each output file. If CSS is enabled, the HTML generated conforms to the XHTML 1.0 Transitional DTD. If CSS is disabled, HTML 4.0 Transitional-conformant HTML is output. This option is enabled by default.

  2. In the Language string text box, enter the value of the "lang" attribute of the <html> tag. This indicates the primary natural language of the document.

  3. In the CSS generation drop-down box, you can specify whether Cascading Style Sheet ("CSS") formatting will be used, and if so, the method of CSS presentation. CSS enables HTML Converter to create output with higher fidelity with respect to the source document. Please note that CSS support varies from browser to browser.

    This option can be set to the following values:

    • none - only HTML tags will be used for formatting.

    • embedded - CSS styles will be included in the head of each output file.

    • external - CSS styles will be output in a separate file, which is referenced by all output files generated during the conversion.

    • inline - CSS style information will be included in every paragraph. This option can drastically increase the size of the output file, but is necessary when the template is used to generate fragments.

    By default, the CSS is embedded in the HTML of each output file. This reduces the total number of files generated by the conversion, as there is no need to create a separate CSS file. Note that if a style is needed anywhere in the final output of the conversion, it will appear in the style definitions of every HTML file.

  4. In the External user stylesheet text box, enter the URI of that CSS file and the appropriate reference to it will be placed in the output HTML file(s). An example of where this would be useful is if you need to maintain a site requiring multiple templates with a common look and feel to the site. Without using this feature, a change to the site's look would require that multiple templates be updated. With a common .css file, however, you only need to change the style definition in one place and all conversions are updated at once.

    If this option is left blank, the system will generate CSS styles based on how elements are defined under Text Elements.

    This option is ignored if the CSS generation option above is set to none.
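
The four CSS generation values differ mainly in where the style information ends up. The following Python sketch is illustrative only and is not the converter's implementation: the function name render_page, the class name body, the file name styles.css, and the sample style rule are all assumptions made for the example.

```python
from html import escape

def render_page(text, css_mode="embedded"):
    """Return a simplified HTML page for one paragraph under a given CSS mode."""
    rule = "font-family: Arial; font-size: 12pt"  # sample style, assumed
    head, attrs = [], ""
    if css_mode == "embedded":
        # Style definitions go in the head of each output file.
        head.append("<style>p.body { %s }</style>" % rule)
        attrs = ' class="body"'
    elif css_mode == "external":
        # Style definitions live in one shared file (hypothetical name).
        head.append('<link rel="stylesheet" href="styles.css" />')
        attrs = ' class="body"'
    elif css_mode == "inline":
        # Style information is repeated on every paragraph.
        attrs = ' style="%s"' % rule
    elif css_mode != "none":
        raise ValueError("unknown CSS mode: %r" % css_mode)
    return "<html><head>%s</head><body><p%s>%s</p></body></html>" % (
        "".join(head), attrs, escape(text))

if __name__ == "__main__":
    for mode in ("none", "embedded", "external", "inline"):
        print(mode, "->", render_page("Hello", mode))
```

In this sketch only the inline value produces a paragraph that carries its own styling, which is why that value suits templates that generate fragments.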

  5. In the Output character set drop-down box, select which character set should be used in the output file. The HTML Conversion editor will then translate or "map" characters from the input document's character set to the output character set as needed. This character mapping is limited by the need for the character to be in both the input and the output character sets. If a character cannot be mapped, the character will show up in the output as the "unmappable character" determined by the Value for unmappable characters option described below. If the resulting output contains an excessive number of these unmapped characters, selecting a more appropriate output character set should improve the situation.

    The HTML standards currently limit documents to a single output character set. That character set is specified in an output file using the CONTENT attribute of the <meta> tag. This limits what the technology can do with documents that have multiple character sets. In general, documents that are a mix of a single Asian language and English characters will translate correctly (although with some possible loss of non-alphanumeric characters) if the appropriate DBCS, UTF-8, or Unicode output character set is selected. This is because most DBCS character sets include the standard 7-bit Latin 1 characters. Documents that contain more than one DBCS character set or a DBCS character set and a non-English character set (such as Cyrillic) may not export with all the character glyphs intact unless Unicode or UTF-8 is used.

    Source documents that contain characters from many character sets will look best when this option is set to the default, Unicode (UTF-8). This is because the Unicode and UTF-8 character sets contain almost all characters for the most common languages.

    While the W3C recommends using Unicode, there is a downside to it at this time. Not all systems have the appropriate fonts needed for using Unicode or UTF-8, and many editors do not understand these character sets. In addition, there are some differences in the way browsers interpret the byte order of 2-byte Unicode characters (which is why both big and little endian Unicode are available settings for this option).

  6. In the Value for unmapped characters text box, enter the hex value for the character used when a character cannot be found in the output character set. The default value of 2A corresponds to the asterisk ("*") character.

    Note that when the CSS generation option described above has been set, there will be no unmappable characters in the HTML. Instead, the unmapped character will be written out in &#....; notation using the decimal representation of the character's Unicode value. Newer browsers support this representation and will convert it to the appropriate character if it is available in the font being used. If the character is not available in that font, the browser's unmappable character symbol (typically a rectangular box) will be seen. Also, note that there may still be unmapped characters in text rendered to graphics. This is because the graphic file is generated at conversion time rather than being rendered by the browser.
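
The two behaviors described for unmappable characters can be sketched in Python. This is a minimal illustration under stated assumptions, not the converter's code: the function name map_characters and its parameters are hypothetical, and Python's own codec tables stand in for the product's character mapping.

```python
def map_characters(text, charset, unmapped_hex="2A", css_enabled=False):
    """Encode text to charset, substituting characters the charset lacks."""
    if css_enabled:
        # Unmappable characters become decimal references such as &#8364;
        return text.encode(charset, errors="xmlcharrefreplace")
    # Otherwise use the configured replacement (hex 2A is the asterisk).
    replacement = chr(int(unmapped_hex, 16))
    out = []
    for ch in text:
        try:
            ch.encode(charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(replacement)
    return "".join(out).encode(charset)

if __name__ == "__main__":
    sample = "5 \u20ac"  # the euro sign is not part of ASCII
    print(map_characters(sample, "ascii"))
    print(map_characters(sample, "ascii", css_enabled=True))
    print(map_characters(sample, "utf-8"))
```

With an output character set that contains every input character, such as UTF-8, neither substitution occurs.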

  7. In the Graphics format drop-down box, specify the format of the graphics produced by the system. The following values are allowed:

    • GIF

    • JPEG (default)

    • PNG

    • None

    When setting this option, remember that the JPEG file format does not support transparency. Though the GIF file format supports transparency, it is limited to using only one of its 256 available colors to represent a transparent pixel ("index transparency").

    PNG supports many types of transparency. The PNG files written are created so that various levels of transparency are possible for each pixel. This is achieved through the implementation of an 8-bit "alpha channel". However, at this time the technology will ignore transparency data when metafiles with multiple layers are converted.

  8. If you selected GIF, the Interlaced GIF checkbox becomes available. This is checked by default. Interlaced images are displayed progressively, at increasing levels of detail, as they download.

  9. In the Output DPI text box, enter a value from 0 to 2400 to specify the output graphics device's resolution in DPI. This only applies to objects whose size is specified in physical units (in/cm). For example, consider a 1-inch square, 100 DPI graphic that is to be rendered on a 50 DPI device (Output DPI option set to 50). In this case, the size of the resulting JPEG, GIF, BMP or PNG will be 50 x 50 pixels.

    While this option is used to help compute table sizes, it is primarily a graphics option. Early browsers and versions of the HTML standard limit the specification of image sizes to dimensions in pixels. For images in particular, this is somewhat natural as GIF, JPEG, BMP, and PNG are bitmap formats whose sizes are defined in pixels. However, many of the source graphics and tables converted specify their size in physical units such as inches or centimeters, and there is no way to know how big a pixel is on the target device for the converted document. In fact, a single document may ultimately be viewed on many devices, each with a different number of pixels or dots per inch (DPI). If graphics are converted too small, image detail will be lost. Conversely, if the graphics are converted too large, conversion times will suffer and files will take longer to download.

    Setting this option to 0 may result in the creation of extremely large images. Be aware that there may be limitations in the system running this technology that could result in undesirably large bandwidth consumption or an error message. Additionally, an out-of-memory error message will be generated if system memory is insufficient to handle a particularly large image.

    Also note that setting this option to 0 will force the technology to use the DPI settings already present in raster images. For any other type of input file, the current screen resolution will be used as the DPI setting.
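
The arithmetic behind the Output DPI option is a multiplication of the physical size by the device resolution. The sketch below assumes sizes given in inches and simple rounding; the function name pixel_size is hypothetical, and the special value 0 is deliberately left out because its result depends on the source file or the current screen resolution.

```python
def pixel_size(width_in, height_in, output_dpi):
    """Convert a physical size in inches to pixel dimensions at output_dpi."""
    if not 0 < output_dpi <= 2400:
        raise ValueError("this sketch covers 1 to 2400 DPI; 0 is a special case")
    return round(width_in * output_dpi), round(height_in * output_dpi)

if __name__ == "__main__":
    # The 1-inch square rendered for a 50 DPI device, as in the example above.
    print(pixel_size(1, 1, 50))
    # The same graphic rendered for a 300 DPI device.
    print(pixel_size(1, 1, 300))
```

Raising the DPI from 50 to 300 multiplies each dimension by 6 and the pixel count by 36, which is the trade-off between image detail and file size described above.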

  10. If you have selected JPEG as your graphics format, the JPEG quality text box becomes available. The default (100) delivers the highest quality and largest file size.

  11. In the Image sizing method drop-down box, you can select from the following options:

    • quick

    • smooth

    • grayscale

    Each of these options involves some degree of trade-off between the quality of the resulting image and the speed of conversion.

  12. In the Custom target attribute drop-down box, select the frame or window in which the browser should open links from the source document. The following values specify the target attribute of the links the system generates in these cases. This target value will be applied to all such links encountered in the source document.

    • No setting - This means that no target attribute will be included in links from the source document.

    • _self - This means that the document is loaded in the same frame as the element that refers to this target (essentially the same as not specifying a target at all).

    • _parent - This means that the document is loaded into the immediate FRAMESET parent of the current frame. This value is equivalent to _self if the current frame has no parent.

    • _top - This means that the document is loaded into the full, original window (thus canceling all other frames). This value is equivalent to _self if the current frame has no parent.

    • _blank - This means that links are opened in a new, unnamed window.
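
The effect of this setting on a generated link can be sketched as follows. The function name make_link is hypothetical, and the markup the converter actually writes may differ in detail.

```python
from html import escape

ALLOWED_TARGETS = {None, "_self", "_parent", "_top", "_blank"}

def make_link(href, text, target=None):
    """Build an anchor tag; target=None corresponds to No setting."""
    if target not in ALLOWED_TARGETS:
        raise ValueError("unsupported target: %r" % target)
    # With No setting, the target attribute is omitted entirely.
    attr = ' target="%s"' % target if target else ""
    return '<a href="%s"%s>%s</a>' % (escape(href, quote=True), attr, escape(text))

if __name__ == "__main__":
    print(make_link("chapter2.html", "Chapter 2"))
    print(make_link("chapter2.html", "Chapter 2", "_blank"))
```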

  13. Place a check in front of Format HTML source for readability (default) to write newlines to the output strictly to make the generated HTML more readable and visually appealing. These newlines only appear in the places where you have set Add a newline before this paragraph under Output Text Formats.

    It is important to note the things that setting this option does not do:

    • While setting this option will make it easier for a human to read the generated markup in a text editor, it does not affect the browser's rendering of the document.

    • This option does not affect the contents of the .css files, since they do not contain any text from the source document.

    • The option does not affect spaces or newlines copied from the template, as the contents of the template are already under the control of the user.

  14. Place a check in front of Show style information to include information about source document style names and how they are mapped by the template being used. The user can see what format has been mapped to a particular paragraph or text sequence by mousing over it.

9.3 Configuring Output Markup Items

Markup items are HTML fragments that may be inserted directly into the output HTML as part of a page layout. Each markup item is a name/value pair. The name is what will appear in the screens for editing page layouts. The value is a block of HTML that will be inserted into the output HTML wherever the markup item appears in a page layout. There is one default name in this section, break, whose value is defined as <br />.

  1. Click Add to display the Output Markup Items page. A new item called Output Markup Item appears in the left-hand tree view under Output Markup Items, below the default item, break.

  2. In the Name field, assign a name by which you will refer to this item, for example, rule. The display on the left-hand tree view changes to reflect this name.

  3. In the Markup text box, enter the HTML fragment to be associated with this item, for example, <hr />.

  4. To remove a markup item from the template, highlight the name of the item in the tree and click Remove.
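
Because each markup item is a name/value pair, its behavior in a page layout resembles a dictionary lookup. The sketch below is an illustration only: the function name render_layout is hypothetical, and the rule item with the value <hr /> is an assumed example alongside the default break item.

```python
# Markup items: the name shown in the editor mapped to its HTML value.
markup_items = {
    "break": "<br />",  # the default item
    "rule": "<hr />",   # assumed example of a user-defined item
}

def render_layout(layout, items):
    """Replace each item name in a layout with its HTML fragment, in order."""
    return "".join(items[name] for name in layout)

if __name__ == "__main__":
    print(render_layout(["break", "rule", "break"], markup_items))
```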

9.4 Configuring Output Text Formats

Output text formats define text and formatting attributes of output document text. These formats will define such attributes as the font family, size, and color, standard text attributes (bold, italic, underline, etc.) and border attributes. This allows the template author to standardize the look of the output despite differing formatting styles used by the various authors of the source documents. There is one default format in this section, Default Paragraph, whose tag is p. Output Text Formats created here can then be organized according to Format Mapping Rules, which pick the formatting based on checking the type of source document text.

Note:

Users should be aware that text formats are only applied to text from word processing files. They cannot be used to change the formatting of text that is rendered as part of any graphics generated by the conversion. They are also not applied to text inside spreadsheets.

  1. Click Add to display the Output Text Format - Markup Tab. A new item called Output Text Format appears in the left-hand tree view under Output Formats, below the default item, Default Paragraph. To remove a paragraph item from the template, highlight the name of the item in the tree and click Remove.

  2. In the Name field, assign a name by which you will refer to this format, for example, Bold Italic. This name then appears in the left-hand tree view.

  3. In the Tag name text box, enter the HTML paragraph-level tag to put around paragraphs using this format. Note that any "tag" name may be entered here, whether it is legal or not. Only the tag name should be entered, not the surrounding angle brackets ("<" and ">"). The paragraph tag ("p") is the default.

  4. Click Add Attribute to add a new name/value pair to the Custom Attributes table. These attributes apply to the tag whose name was specified above. To set the name and value of the new attribute, just click on them in the Custom Attributes table. By default, this table is empty. To remove an attribute, highlight the name in the table and click Remove attribute.

  5. In the Custom Markup text boxes, enter any HTML (or regular text) to be placed before and/or after the paragraph.

  6. Check the box in front of Insert a newline in the HTML before this paragraph to insert a blank line into the HTML before the paragraph to make it easier to view the HTML output of the conversion. This option does not affect how the output looks in the browser. This new line is only written if the Format HTML source for readability option is set under Output Pages. By default, this option is not set.

  7. Check the box in front of This paragraph should begin a new page to create a new output page every time this text format is used. By default, this option is off. If you check that option, by default, there is a check in the box in front of Don't begin a page for the first paragraph of this type. The purpose of this option is to avoid empty or mostly empty pages at the beginning of the output.

  8. Click on Output Text Format - Formatting Tab. If you place a check in front of Use external CSS class, then you must enter the name of a class from an external CSS file here.

  9. If you leave the box empty in front of Use external CSS class (default), the first section of the page allows you to specify Character Formatting. The types of character-level formatting are Bold, Italic, Underline, Strikeout, Superscript, Subscript, Upper Case, and Small Caps. Each type of formatting can be set to one of four values:

    • Always off - Forces the attribute to always be off when formatting the text.

    • Always on - Forces the attribute to always be on when formatting the text.

    • Inherit (default) - Takes the state of the attribute from the source document. In other words, if the source document had the text rendered with bold, then the technology will create bold text.

    • Do not specify - Leave the formatting unspecified. In certain cases, this will produce different HTML than Always off.

    There are also three font settings: Font family, Size, and Color. These are only available when you set them to Always on. The defaults for these three settings are Arial, 12pt, and 000000 (hexadecimal for black).

  10. The next section of the page deals with Paragraph Formatting. The types of paragraph-level formatting are Alignment, Line height, Background color, and Indent. They can each be set to the four values specified above for character formatting, and like the font settings, can only be changed if they are set to Always on. The defaults for these settings are left, single, FFFFFF (hexadecimal for white), and 0, respectively.

  11. The bottom section of the page deals with Borders. These options become available when you set Border Use to Always on. For each paragraph border side (Top, Right, Bottom, Left), the following attributes can be specified:

    • Border style - The default is None, and you can select one of the allowable border styles from the drop-down box: dotted, dashed, solid, double, groove, ridge, inset, outset.

    • Border color - The default is 000000 (hexadecimal for black), and you can specify a color in a valid CSS format.

    • Border width - The default is 1pt, and you can specify a border width in a valid CSS format.
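
The four values available for each type of character formatting can be read as a small decision table. The sketch below is one possible interpretation, not the converter's actual output: the names resolve and character_css are hypothetical, only Bold and Italic are modeled, and the CSS emitted for each state is an assumption.

```python
def resolve(setting, source_value):
    """Return True or False for an explicit state, or None to emit nothing."""
    if setting == "Always on":
        return True
    if setting == "Always off":
        return False
    if setting == "Inherit":
        return bool(source_value)  # take the state from the source document
    if setting == "Do not specify":
        return None
    raise ValueError("unknown setting: %r" % setting)

# Assumed CSS for the on and off states of each attribute.
CSS_FOR = {
    "bold": ("font-weight: bold", "font-weight: normal"),
    "italic": ("font-style: italic", "font-style: normal"),
}

def character_css(settings, source):
    """Build a CSS declaration list for one run of source text."""
    decls = []
    for attr, (on_css, off_css) in CSS_FOR.items():
        state = resolve(settings.get(attr, "Inherit"), source.get(attr, False))
        if state is not None:
            decls.append(on_css if state else off_css)
    return "; ".join(decls)

if __name__ == "__main__":
    source_text = {"bold": True, "italic": True}
    print(character_css({"bold": "Always off", "italic": "Do not specify"}, source_text))
```

Under these assumptions, Always off writes an explicit declaration while Do not specify writes none, which illustrates how the two values can produce different HTML.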

9.5 Setting Format Mapping Rules

Format mapping rules allow you to specify output document formatting and the sequence in which rules are checked.

  1. On the Add Format Mapping Rules page, click Add Format Mapping Rule. The Format Mapping Rules page is displayed, and a new item shows up in the left-hand navigation pane called outline level = x. Once rules have been defined, this page will allow you to arrange the rules by using the Move Up and Move Down buttons. The mapping rules are ordered so that the first rule that matches is the one, and only one, that is applied.

    For example, a rule may be created for mapping paragraphs in the My Style style. Below that rule, another rule may exist for mapping paragraphs with outline level 1 applied. An input document may have one or more paragraphs in the My Style style that also have outline level 1 applied. In this example the technology will only apply the My Style formatting to such paragraphs and ignore the outline level 1 rule for them.

  2. On the Format Mapping Rules page, click the down arrow on the Format drop-down box. Choose one of the previously defined output text formats to apply when this mapping rule is in effect. The Default Paragraph format is always available and is always the default. See Configuring Output Text Formats for details on how to create new formats to use here.

  3. Under Match on, click the down arrow on the drop-down box to select the paragraph formatting information that the rule checks.

    • Outline level - Match the outline level specified in the source document. Application-predefined "heading" styles typically have corresponding outline levels applied as part of the style definition.

    • Style name - Match the paragraph or character style name.

    • Is footnote - Match any footnote.

    • Is endnote - Match any endnote.

    • Is header - Match any document header text.

    • Is footer - Match any document footer text.

  4. For Paragraph outline level, if Match on above is set to Outline level, then this defines which outline level to match. This option cannot be set, or is ignored, for all other matching rules.

  5. For Paragraph or character style, if Match on above is set to Style name, then this defines which source document paragraph or character style name to match. When matching on style names, the template author must supply a style name here, and no default value is provided. The name must exactly match the style name from the source document. Style name matching is done in a case-sensitive manner. This option cannot be set, or is ignored, for all other matching rules.
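
The first-match behavior of format mapping rules can be sketched as an ordered list that is scanned from the top. This is an illustration under stated assumptions: the Paragraph and Rule classes and the pick_format function are hypothetical, only two match types are modeled, and falling back to Default Paragraph when no rule matches is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Paragraph:
    style_name: str = ""
    outline_level: Optional[int] = None

@dataclass
class Rule:
    output_format: str
    match_on: str   # "style_name" or "outline_level" in this sketch
    value: object

    def matches(self, para):
        if self.match_on == "style_name":
            # Exact, case-sensitive comparison.
            return para.style_name == self.value
        if self.match_on == "outline_level":
            return para.outline_level == self.value
        return False

def pick_format(para, rules, default="Default Paragraph"):
    """Return the format of the first matching rule; later rules are ignored."""
    for rule in rules:
        if rule.matches(para):
            return rule.output_format
    return default

if __name__ == "__main__":
    rules = [
        Rule("My Format", "style_name", "My Style"),
        Rule("Heading 1", "outline_level", 1),
    ]
    # A paragraph that satisfies both rules gets only the first one.
    print(pick_format(Paragraph("My Style", 1), rules))
```

Moving the outline level rule above the style name rule, as the Move Up and Move Down buttons allow, would change the result for paragraphs that satisfy both.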

Application Example: Map Source Document Styles to CSS Styles

This example shows how to map the heading styles found in the source document to CSS styles for the HTML display.

  1. To map the heading style, first you must add an Output Text Format. Navigate to Output Pages > Output Text Format, click Add and give it a name, Heading 1.

  2. In the Tag name text box, enter the HTML tag to use in the output, in this case, h1.

  3. On the Formatting tab, allow all formatting to be inherited from the source.

  4. Next, you must define a Format Mapping Rule that matches on the style name. Mapping rules can use different criteria to map document text to a format. Navigate to Format Mapping Rules in the left-hand pane, and click Add Format Mapping Rule.

  5. In the Format drop-down menu, select the name of the Output Text Format you created, Heading 1.

  6. In the Match on drop-down menu, select Style name.

  7. In the Paragraph or character style text box, enter a name for the style exactly as it appears in the source document, such as Heading 1.

  8. When you exit the editor, you will be prompted to save the template.

9.6 Configuring Output Page Layouts

Page layouts are used to organize how the various parts of the output are arranged. This includes such items as where to place a Table of Contents in the output document. A default layout has been provided for users who need output that is pleasing to the eye, but are not particular about the details of their output.

Users may create multiple page layouts, each optimized for a specific file type. The Document Formatting page allows you to specify which page layout to use.

  1. In the Add Output Page Layouts page, click Add. The newly added layout will appear on the left side navigation screen and you will be auto-navigated to the new layout for editing. The Output Page Layouts page is displayed.

  2. In the Name field, fill in the name to use to refer to this layout (required). Once this has been done, click on the icon on the left side of the editor to expand the levels underneath this one. There are three items: <title> Source, Navigation Layout, and Page Layout.

  3. Check the Include navigation layout option if you want a table of contents page preceding the document content.

  4. Click on the Title Source page to select where to get the value to use for the HTML <title> tag. By default, this is empty for new layouts, although the <title> tag will always be output in order to conform to the DTD. There are four options you can select for the source of the <title> tag:

    • Section Name - Use the title for the current document section. Section titles are not available in all document formats, such as word processing files. Two examples of where this is very useful are presentations, where this corresponds to the slide title; and spreadsheets/database files, where this corresponds to the sheet name. Using the section name works well with output layouts that place one slide/sheet in each output HTML page. In this situation, each page would have a title that matches the title of its contents. If the page layout does not break the document by sections, the name of the first section will be used as the title text.

    • Text Element - Use a text element already defined under Text Elements. Using a text element for the title makes a good fail-safe entry at the bottom of the list, just in case all other title sources are undefined/unavailable.

    • Property - Use any document property already defined under Document Property.

    • Output Text Format - Use an output paragraph format already defined under Output Text Formats. The first non-empty instance of a paragraph in this format is used.
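
The description of a text element as a fail-safe entry at the bottom of the list suggests that title sources are tried in order until one yields a value. The sketch below assumes that reading; the function name pick_title and the representation of each source as a (kind, value) pair are hypothetical.

```python
def pick_title(sources):
    """Return the first non-empty title value from an ordered list of sources."""
    for kind, value in sources:
        if value and value.strip():
            return value.strip()
    # No source produced a value; the <title> tag would be written empty.
    return ""

if __name__ == "__main__":
    sources = [
        ("Section Name", None),          # not available in this document
        ("Property", ""),                # property defined but empty
        ("Text Element", "Untitled"),    # fail-safe entry at the bottom
    ]
    print(pick_title(sources))
```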

  5. In the Navigation Layout page, you can select navigation elements previously defined for inclusion in the output page layout. The purpose of the navigation layout is to generate a separate Table of Contents HTML page based on mapping rules defined on Link Mapping Rules. This navigation page becomes the first output file of the conversion and is completely independent of other page breaking rules in the output.

    Only navigation items and markup can be inserted into the body of the navigation layout. Page navigation is not supported in the navigation layout; it is intended for use with document and section layout items.

    Use of a navigation layout is optional. If one is used, you can specify markup items to be placed in either the Head and/or the Body.

    • For the Head item, you can define the content that will be placed in the HTML <head> of all of the output files this layout applies to. The following items are placed by default in the head:

      A <meta> tag stating the character set in which the HTML file is encoded.

      A <title> tag, the contents of which are defined in the Title Source page.

      A <meta> tag stating that the HTML was generated.

      <meta> tags for all document properties defined under Document Property that specify a Meta tag name.

      If the CSS generation option was selected on Output Pages, then CSS style definitions generated by the technology are included, either in a <style> tag (for embedded CSS) or with a <link> tag to a CSS file generated by the conversion.

      If the External CSS stylesheet option was selected on Output Pages, then an HTML <link> tag to the user specified CSS file is included.

      Also, any markup items previously defined under Output Markup Items can be inserted here. The markup items specified on this screen will appear in the head after all of the auto-generated items listed above. By default, there is nothing listed here.

    • For the Body item, you can define the content that will be placed in the body of the navigation page created by this layout. The following items may be placed in the top:

      Markup Item - A text and HTML markup item defined under Output Markup Items.

      Navigation Element - A navigation element defined under Add Navigation Elements.

  6. In the Page Layout, you can enable document pagination with the following options:

    • Break on sections - If checked, enables page breaking on document sections. Only applicable for multi-section documents.

    • Break on pages - If checked, enables page breaking based on the document-specific pagination options.

  7. If you click on Head, you can define the content that will be placed in the HTML <head> of all the output files generated from this layout. The following items are placed by default in the head:

    • A <meta> tag stating the character set in which the HTML file is encoded.

    • A <title> tag, the contents of which are defined in the Title Source page.

    • A <meta> tag stating that the HTML was generated.

    • <meta> tags for all document properties defined under Document Property that specify a Meta tag name.

    • If the CSS generation option was selected on Output Pages, then CSS style definitions generated by the technology are included.

    • If the External CSS stylesheet option was selected on Output Pages, then an HTML <link> tag to the user specified CSS file is included.

    • Also, any markup items previously defined under Output Markup Items can be inserted here. The markup items specified on this screen will appear in the head after all of the auto-generated items listed above. By default, there is nothing listed here.
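
The rule that markup items follow all auto-generated head items can be sketched as below. The function name build_head, its parameters, and the generator value are hypothetical, and the order among the auto-generated items simply follows the list above as an assumption.

```python
from html import escape

def build_head(charset, title, properties=(), generated_css=None,
               user_stylesheet=None, markup_items=()):
    """Assemble head content: auto-generated items first, markup items last."""
    parts = [
        '<meta http-equiv="Content-Type" content="text/html; charset=%s" />'
        % escape(charset, quote=True),
        "<title>%s</title>" % escape(title),
        '<meta name="generator" content="example-converter" />',  # placeholder
    ]
    for name, value in properties:
        parts.append('<meta name="%s" content="%s" />'
                     % (escape(name, quote=True), escape(value, quote=True)))
    if generated_css:
        parts.append("<style>%s</style>" % generated_css)
    if user_stylesheet:
        parts.append('<link rel="stylesheet" href="%s" />'
                     % escape(user_stylesheet, quote=True))
    # Markup items defined on this screen come after everything above.
    parts.extend(markup_items)
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_head("UTF-8", "Q&A", [("author", "Ann")],
                     markup_items=["<!-- custom markup item -->"]))
```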

  8. If you click on Page Top, you can define the content that will be placed at the top of every output file to which this layout is applied. This content will appear just after the HTML <body> tag. In order to place anything in the page top, those items first need to be defined elsewhere in the template. An example of typical Page Top content might be a navigation bar with links to the first page, previous page, and next page in the output. By default, the HTML Conversion editor leaves the Page Top empty. The following items may be placed:

  9. If you click on Before Content, you can define the content to be placed before the start of output for the converted document, but not on every page. An example of such content might be a standard cover page to be created for all converted documents. By default, the HTML Conversion editor leaves Before Content empty. The following items may be placed:

  10. If you click on Before Section, you can define content to be placed before each section of a document. The following document formats support multiple sections: spreadsheets, presentations, images, and databases. Users should note that although a format may support multiple sections, the specific file being converted may still only have one section. An example of a typical use of this layout area is to insert a couple of blank lines and a horizontal rule to prevent sections from running together. By default, the HTML Conversion editor leaves Before Section empty. The following items may be placed:

    • Section Name - The name of the current section. If the name of the current section is not specified in the source document or is undefined (such as in word processing documents), then nothing will be inserted. Adding this type of item brings up a simple screen where the author selects which Output Format to use for the element.

    • Document Property - A document property defined on Document Property.

    • Markup Item - A text and HTML markup item defined on Output Markup Items.

    • Text Element - A text element defined on Text Elements.

  11. If you click on After Content, you can define the content to be placed after the end of document content. This content will come after any footnotes or endnotes if the template specifies them. By default, the HTML Conversion editor leaves After Content empty. The following items may be placed:

    • Header - The header from the input document.

    • Footer - The footer from the input document.

    • Document Property - A document property defined on Document Property.

    • Markup Item - A text and HTML markup item defined on Output Markup Items.

    • Text Element - A text element defined on Text Elements.

  12. If you click on Page Bottom, you can define the content that will be placed at the bottom of every output file to which this layout is applied. This content will appear just before the closing HTML <body> tag. An example of such content might be a copyright notice. By default, the HTML Conversion editor leaves Page Bottom empty. The following items may be placed: