Oracle® Outside In HTML Export Developer's Guide Release 8.4.0 Part Number E12884-03
Much of the power, flexibility and complexity of the Export products is realized through their use of templates to drive the export process. Templates give the developer (or the developer's customer) flexibility in the visual and navigational properties of the resulting output. Templates also isolate the HTML Export code from the ever-changing face of HTML and its associated plug-ins, components and scripting languages.
The template macros and the elements they reference are so tightly intertwined that discussing one without the other is almost impossible. Before reading either in depth, the reader should skim Section 10.2, "The Included Sample Templates," and Section 10.4, "Macro Reference."
A template is simply an HTML file that may include a special macro language. This language allows the template writer to insert, repeat through, condition on, and link to various elements in the source document.
The following is the code for a very simple template:
{## unit}{## header}
<html>
<body>
{## /header}
<p>Here is the document you requested.
{## insert element=property.title} by
{## insert element=property.author}</p>
<p>Below is the document itself</p>
{## insert element=body}
{## footer}
</body>
</html>
{## /footer}{## /unit}
The {## unit}, {## /unit}, {## header}, {## /header}, {## footer} and {## /footer} macros can be ignored for the moment. Their purpose is described in Section 10.6, "Units - Breaking Documents by Content Size."
The remainder of the file is regular HTML with the exception of three macros in the form {## insert element=xxx}. HTML Export uses this template plus the source file to create its output. To accomplish this, HTML Export reads through the template file, writing it byte for byte to the output file unless character mapping is performed on the template (for an explanation of template character mapping, see Section 10.9, "Unicode Templates"). This continues until HTML Export encounters a properly formatted macro. HTML Export reads the macro and executes the macro's command. Usually this means inserting an HTML version of some element from the source file into the output file. HTML Export then continues reading the template and executing macros until the end of the template file is reached.
In the previous example, the first {## insert} macro uses the element syntax (described in Section 10.4, "Macro Reference") to insert the title of the document. The second macro inserts the author of the document and the third macro inserts the entire body of the document. The resulting HTML might look like this (HTML that is the result of a macro is in bold):
<html>
<body>
<p>Here is the document you requested.</p>
<p>A Poem by Phil Boutros</p>
<p>Below is the document itself</p>
<p>Roses are red</p>
<p>Violets are blue</p>
<p>I'm a programmer</p>
<p>and so are you</p>
</body>
</html>
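The read-copy-substitute loop described above can be illustrated with a minimal Python sketch. This is not how HTML Export is implemented: the render function, the two regular expressions, and the elements dictionary are all hypothetical, and the sketch only understands the simple {## insert element=...} form while discarding the unit, header and footer macros.

```python
import re

# Matches only the simple form {## insert element=path}; every other macro
# is outside the scope of this sketch.
INSERT_MACRO = re.compile(r"\{##\s+insert\s+element=([A-Za-z0-9_.]+)\s*\}")
# Structural macros ({## unit}, {## /header}, ...) are simply dropped here.
STRUCTURAL_MACRO = re.compile(r"\{##\s+/?(?:unit|header|footer)\s*\}")


def render(template: str, elements: dict[str, str]) -> str:
    """Copy the template through, replacing each insert macro with its element."""
    without_structure = STRUCTURAL_MACRO.sub("", template)
    return INSERT_MACRO.sub(lambda m: elements[m.group(1)], without_structure)


template = (
    "{## unit}{## header}<html><body>{## /header}"
    "<p>{## insert element=property.title} by "
    "{## insert element=property.author}</p>"
    "{## insert element=body}"
    "{## footer}</body></html>{## /footer}{## /unit}"
)
# Hypothetical stand-in for the content extracted from a source document.
elements = {
    "property.title": "A Poem",
    "property.author": "Phil Boutros",
    "body": "<p>Roses are red</p>",
}
output = render(template, elements)
```

Everything outside a macro is copied unchanged, which is the property that lets a template author write ordinary HTML around the inserted elements.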
By default, the templates included with HTML Export convert files of type PR into images that are always 640 pixels wide. Users who wish to change this setting will need to edit the templates to remove the {## option} macro that sets this limit.
When you install HTML Export, a template directory is created that contains sample templates. These templates (with the exception of those in the tutorial directory) are tailored for publishing and indexing applications, and they are completely brandable. To brand a template, you can alter its .CSS file so that the template's color scheme matches your company's color scheme. You can also overwrite the existing logo.gif file with your company's logo. Some of the template directories contain readme.txt files that contain more information about modifying those templates.
The following is a list of templates contained in this directory:
\template\HTML Export\standard: The standard template features convenient navigation elements, including a table of contents and a preview window, to help users quickly access a document's information.
\template\HTML Export\navigation: The navigation template has many of the same features as the standard template, such as convenient navigation elements, and adds a drop-down table of contents.
\template\HTML Export\newsletter: The newsletter template supports all document types except archives. It displays the content in a style similar to a news web site. The table of contents contains each top level heading (the "Heading 1" style). When a user clicks these hyperlinks, the corresponding section's content fills the main window.
\template\HTML Export\noframes: The noframes template displays an entire document in a single frame, with table-of-contents style navigation. It is ideal for use in the most straightforward publishing applications.
\template\HTML Export\tableofcontents: The tableofcontents template is simpler than the standard or the navigation templates, and contains fewer navigation elements. It shows a table of contents on the left side of the screen, and the selected document content on the right.
\template\HTML Export\textonly: The textonly template is designed for use by developers wishing to convert documents for inclusion in an index for a search engine. It should not be used in publishing applications. All of the document's elements, including properties, headers and footers, are converted.
\template\HTML Export\tutorial: This is a directory of templates containing comment text intended to help users interested in more thoroughly understanding the HTML Export template language.
HTML Export uses the concept of a document tree to make various pieces and attributes of the source file individually addressable from within a template. The nodes of the document tree are used to generate a path to a specific element in the tree. A period is used to separate the nodes in this path. For example, the path of the author property of a document is property.author. There are two types of elements: leaf elements and repeatable elements.
Leaf elements are single identifiable pieces of the source file, like the author property (property.author) or the preface of the document (body.contents.preface). This type of element is a valid target for inserting, testing and linking using the {## insert}, {## if} and {## link} macros. The last node in this type of path must be a valid leaf node in the document tree. Valid leaf nodes are shown in italics.
Repeatable elements have multiple instances associated with them, like the footnotes in a document (sections.1.footnotes). This type of element may not be directly inserted, tested or linked to but its instances may be looped through using the {## repeat} macro. The last node in this type of path must be a valid repeatable node in the document tree. Valid repeatable nodes are shown in bold.
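As an illustration of how a period-separated path addresses a node, the following Python sketch walks a nested dictionary. The tree contents, the resolve and kind helpers, and the use of a list to stand for a repeatable element are assumptions made for this example only; they do not reflect HTML Export's internal data structures.

```python
def resolve(tree: dict, path: str):
    """Walk a period-separated element path through a nested dictionary."""
    node = tree
    for name in path.split("."):
        node = node[name]  # raises KeyError for an unknown node
    return node


def kind(node) -> str:
    """In this sketch a list models a repeatable element, anything else a leaf."""
    return "repeatable" if isinstance(node, list) else "leaf"


# Hypothetical document tree for a one-section document with two footnotes.
tree = {
    "property": {"author": "Phil Boutros", "title": "A Poem"},
    "sections": {"1": {"footnotes": ["first note", "second note"]}},
}

author = resolve(tree, "property.author")
footnotes = resolve(tree, "sections.1.footnotes")
```

In this model, author is a leaf that could be inserted directly, while footnotes would have to be looped over one instance at a time.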
Some templates use {## repeat} loops to generate one output file per repeatable element. For example, a template may render a presentation file as a group of output files, with one output file for each slide. When an input file contains an exceptionally large number of sections, it is possible for an operating system to run out of file handles. See your operating system's documentation or system administrator to find out how many open file handles are allowed. To avoid this extremely rare problem, set a value for the maxreps attribute of the {## repeat} macro or configure the operating system to allow more file handles.
The following is a list of all elements and a short description of each (for a description of valid values for x, see Section 10.5.1, "Indexes and Structure-Based Breaking"):
sheets
Type: Repeatable
Description: See sections later in this list.
slides
Type: Repeatable
Description: See sections later in this list.
sections
Type: Repeatable
Description: Sections are used to represent the highest level of abstraction within the source file. In general, word processor documents will have only one section, the document itself. Spreadsheets have one section for each sheet or chart. Presentations have one section for each slide. Archives have one section for each item in the archive. Graphics generally have one section but may have more as in a multi-page TIFF. For convenience and readability, sheets and slides are synonymous with sections.
sections.x.body
Type: Leaf
Description: This element represents the main textual area of the source file.
For word processing documents, it includes the entire document excluding footnotes, endnotes, headers, footers and annotations. (Footnote/endnote references are always included automatically in the body. If the template includes footnotes/endnotes, then these references provide a link to the note. Annotation references are not placed in the body unless the template includes annotations, in which case they provide links to the annotations.)
For emails, this is the message itself.
For spreadsheets, it includes the entire sheet.
For graphics, it includes any text that actually appears as text in the file format.
For multimedia files, the body does not exist at this time.
For archive formats, the meaning is arctype-specific. When arctype is file, this is the summation (as needed):
sections.x.path +
the directory separator character being used +
sections.x.basename
Note that sections only exist for entries in the archive file that have files associated with them. In particular, entries in the archive file that are for directories are ignored.
Also note that directory separators are OS-dependent. For example, Windows uses back slashes (\) and allows forward slashes (/), UNIX uses the forward slash, and Macs use a colon (:). The directory separator being used depends on how the directory separator is coded in the archive file.
When arctype is message, cal, task or journal, this is the subject of the file. When arctype is contact, this is the name of the contact. When arctype is note, this is the contents of the note. When arctype is attach, this is the filename of the attachment or a link to the extracted and converted attachment. When arctype is fieldsfile, this is the list of fields.
This element is empty when the input file is a multimedia file.
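For archive entries whose arctype is file, the body composition described above can be sketched in Python as follows. The function name and the default separator are hypothetical, and the sketch assumes that "as needed" means the separator is omitted when the entry has no path information.

```python
def archive_file_body(path: str, basename: str, separator: str = "/") -> str:
    """Join an archive entry's path and base name.

    Assumption: the separator is only added when the path is non-empty,
    since the path carries no trailing separator of its own.
    """
    if not path:
        return basename
    return path + separator + basename


nested = archive_file_body("docs/reports", "q1.doc")
top_level = archive_file_body("", "readme.txt")
windows_style = archive_file_body("docs", "q1.doc", separator="\\")
```

The separator is a parameter because, as noted above, it depends on how the archive file itself encodes directory separators.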
sections.x.to
Type: Leaf
Description: "To" addresses from an email or email archive.
sections.x.from
Type: Leaf
Description: "From" addresses from an email or email archive.
sections.x.cc
Type: Leaf
Description: "CC" addresses from an email or email archive.
sections.x.content
Type: Leaf
Description: Same as sections.x.decompressedfile. For archive files, the meaning is arctype-specific. When arctype is file, the file in the archive is extracted and converted. For all other arctypes, this is the contents of the item.
Note that this element may not be inserted into a document. If it is used with the {## insert} template macro, a template error will be returned.
sections.x.image
Type: Leaf
Description: This element represents a graphic image of the content of the section. It is valid only for bitmap, drawing, chart and presentation sections.
sections.x.bodyorimage
Type: Leaf
Description: This element exists to make it easy to build templates that handle a range of section types. In word processor documents, spreadsheet and database sections, and archive elements, bodyorimage is synonymous with body. In bitmap, drawing, multimedia, chart and presentation sections, bodyorimage is synonymous with image. For multimedia files, bodyorimage does not exist at this time.
sections.x.type
Type: Leaf
Description: This element is normally used only for query purposes, but it may be inserted as well. For further details on how to use this in an {## if} macro, see Section 10.4.3, "Conditional: {## if}, {## elseif}, and {## else}."
sections.x.arctype
Type: Leaf
Description: For archive formats, this describes what kind of archive. Currently defined archive types include:
file
message
contact
cal
note
task
journal
attach
fieldsfile
sections.x.fullname
Type: Leaf
Description: This is the full name (including path, if applicable) of a file in an archive if the arctype for the archive is file. For archive formats, this is synonymous with body. For all other formats, it is not defined.
sections.x.basename
Type: Leaf
Description: For archive formats where the arctype is file, this is the file name for the item in the archive without any path info. This element is undefined for all other input file types.
sections.x.title
Type: Leaf
Description: Same as sections.x.body.title. For word processor documents, this element is the text marked with the title style. This may be different than the property.title. For archive files, this is the same as sections.x.body. For all other types, this element will be the "name" of the section. For example, if the source file is a spreadsheet, this element will be the name of the sheet as it appears on the spreadsheet application's navigation tabs.
For email and email archive sections, this is the subject field of the email.
For multimedia files, this does not exist at this time.
sections.x.path
Type: Leaf
Description: For archive files where the arctype is file, this is any path information provided for the current archive item. It does not include a trailing directory separator character. This element may be the empty string (""). This element is undefined for all other input file types.
sections.x.itemnum
Type: Leaf
Description: For archive formats, this is the (unsorted) entry number of the current file in the archive. The first entry is itemnum one ("1"), not zero ("0"). All entries in archive files have an associated itemnum. However, not all entries in archive files have an associated section number. This is because archive entries for directories are skipped when sections are generated by HTML Export. Therefore, inserting this element is not functionally equivalent to {## insert number=sections.x.value}. This element is undefined for all other input file types.
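The difference between itemnum and the section number can be shown with a small Python sketch. The entry list and the number_entries helper are hypothetical; the only behavior taken from the description above is that every entry receives a 1-based itemnum while directory entries receive no section.

```python
# Hypothetical archive listing: (entry name, is_directory).
entries = [
    ("docs/", True),
    ("docs/a.doc", False),
    ("img/", True),
    ("img/b.png", False),
]


def number_entries(entries):
    """Return (name, itemnum, section number) for each entry that gets a section."""
    rows = []
    section = 0
    for itemnum, (name, is_directory) in enumerate(entries, start=1):
        if is_directory:
            continue  # directories keep their itemnum but produce no section
        section += 1
        rows.append((name, itemnum, section))
    return rows


rows = number_entries(entries)
```

Here the second file is archive entry 4 but section 2, which is why the two numbers cannot be used interchangeably.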
sections.x.reflink
Type: Leaf
Description: For archive formats, this is a URL composed of the input file name + the subdocument spec for the archive entry.
The intent of this element is to provide a string that can be passed to DAOpenDocument in a future export to open a specific entry in the archive file currently being exported. The target of the reflink is not necessarily converted into HTML. In this usage scenario:
The original export is run, producing the reflink.
The user clicks on the reflink in the output document.
The OEM's program interprets the reflink and passes it to DAOpenDocument. It then runs HTML Export and serves the output back to the user.
Users of redirected IO should also note that they must handle the IOGetInfo call for IOGETINFO_PATHNAME. It must return a path name for the archive file that HTML Export can use to build the reflink. In addition, the calling program will need to be able to correctly interpret the resultant reflink to be sure it can subsequently be passed to a future call to DAOpenDocument.
This element is undefined for all other input file types.
sections.x.decompressedfile
Type: Leaf
Description: For archive formats, this extracts the file in the archive and converts it. Note that this element may not be inserted into a document. If it is used with {## insert}, a template error will be returned.
This element is undefined for non-archive input file types.
For archive formats, this is arctype-specific. When arctype is file, the file is converted to the designated output format. When arctype is message, this is the contents of the email. When arctype is contact, this is the contents of the contact info. When arctype is cal, this is the contents of the calendar entry. When arctype is note, this is the contents of the note. When arctype is task, this is the contents of the task. When arctype is journal, this is the contents of the journal entry. When arctype is attach, this is the contents of the attachment. When arctype is fieldsfile, this is the list of fields.
sections.x.size
Type: Leaf
Description: This applies to all archive types except those of type fieldsfile.
This is the uncompressed file size of the entry in the archive.
This element is undefined for all other input file types.
sections.x.date
Type: Leaf
Description: For archive formats, this is arctype-specific. When arctype is file, this is the file modification time stamp for this entry in the archive. When arctype is message, this is the time the message was last modified. When arctype is cal, this is the start time/date of the event. When arctype is task, this is the due date for the task. When arctype is journal, this is the start time. When arctype is attach, this is the date of the attachment. This value is undefined for the contact and note arctypes.
For email sections, this is the submitted time field from the email.
This element is undefined for archives of type fieldsfile.
sections.x.mailfields
Type: Repeatable
Description: For email sections, this is used to iterate through the complete set of fields available in emails. This includes all of the named fields (like sections.x.date) as well as fields that are not explicitly named (like "bcc"). This is undefined for all other section types.
sections.x.mailfields.x.body
Type: Leaf
Description: For email sections, this element is the value of a field from the email. This is undefined for all other section types.
sections.x.mailfields.x.name
Type: Leaf
Description: For email sections, this element is the name of a field from the email. This is undefined for all other section types.
sections.x.body.title
Type: Leaf
Description: For word processor documents, this element is the text marked with the title style. This may be different than the property.title.
For archive formats, this is synonymous with body.
For multimedia formats, this does not exist at this time.
For all other document types, this element will be the "name" of the section. For example, if the source file is a spreadsheet, this element will be the name of the sheet as it appears on the spreadsheet application's navigation tabs.
sections.x.body.contents
Type: Leaf
Description: For word processor documents, this is the same as sections.x.body. This is to maintain backwards compatibility with templates written before sections.x.body.title was legal for word processor documents, a feature added in the 7.0 release.
For multimedia files, this does not exist at this time.
For all other document types, this is the same as the body minus the title, if a title exists.
sections.x.body.contents.preface
Type: Leaf
Description: Text between the top of the body and the first heading.
sections.x.body.contents.headings
Type: Repeatable
Description: Headings are labels in a word processor document inserted by the author to give a document structure (for further details of headings, see Section 10.5, "Breaking Documents by Structure"). HTML Export reads this structure and, through the use of the headings element, allows the developer to access it.
sections.x.body.contents.headings.x.body…
Type: Leaf with Leafs and Repeatables below
Description: Under each heading, the structure of a complete document from body down is repeated. For more information on how these elements map to parts of a document, see Section 10.5, "Breaking Documents by Structure."
sections.x.body.contents.headings.x.footnotes…
Type: Repeatable with Leafs below
Description: Only footnotes contained in this heading.
sections.x.body.contents.headings.x.endnotes…
Type: Repeatable with Leafs below
Description: Only endnotes contained in this heading.
sections.x.body.contents.headings.x.annotations…
Type: Repeatable with Leafs below
Description: Only annotations contained in this heading.
sections.x.grids
Type: Repeatable
Description: Only valid for spreadsheet and database formats. This permits access to the "grids" inside a section or sheet of a spreadsheet or database file.
sections.x.grids.x.body
Type: Repeatable
Description: Only valid for spreadsheet and database formats. This permits access to the "grids" inside a section or sheet of a spreadsheet or database file.
sections.x.arcfields
Type: Repeatable
Description: All of the supported fields in the archive including the named fields such as sections.x.date and sections.x.basename. Each arcfield is a name/value pair.
sections.x.arcfields.x.body
Type: Leaf
Description: Value of the data for a given field in an archive file. Not defined for non-archive files.
sections.x.arcfields.x.name
Type: Leaf
Description: Name of the data field from an archive file. Not defined for non-archive files.
sections.x.footnotes
Type: Repeatable
Description: All footnotes.
sections.x.footnotes.x.body
Type: Leaf
Description: The complete footnote reference and content text.
sections.x.footnotes.x.reference
Type: Leaf
Description: The reference number for the footnote.
sections.x.footnotes.x.content
Type: Leaf
Description: The content text for the footnote.
sections.x.endnotes…
Type: Repeatable with Leafs below
Description: Same definitions as footnotes.
sections.x.annotations
Type: Repeatable
Description: All annotations. In templates, the term "annotations" refers to annotations made inside an authoring application (for example, "comments" in a Microsoft Word document) and does not refer to the annotations created via the Export Annotation API.
sections.x.annotations.x.body
Type: Leaf
Description: The complete annotation reference and content text.
sections.x.annotations.x.reference
Type: Leaf
Description: The reference text for the annotation.
sections.x.annotations.x.content
Type: Leaf
Description: The content text for the annotation.
sections.x.slidenotes
Type: Repeatable
Description: All slide notes.
Note that exporting the slide notes will slow down the conversion process for PowerPoint files.
sections.x.slidenotes.x.body
Type: Leaf
Description: The notes for the current slide.
Developers are encouraged to write slide notes at the end of the output file for performance reasons (PowerPoint files keep slide notes at the end of the file, not next to each slide). Not doing so will slow conversion, as the technology will be forced to perform excessive seeking in the input file.
sections.x.slidenotes.x.reference
Type: Leaf
Description: The slide note text for the annotation.
sections.x.slidenotes.x.content
Type: Leaf
Description: The content text for the slide note.
sections.x.headers
Type: Repeatable
Description: All headers.
sections.x.headers.x.body
Type: Leaf
Description: Text of the header.
sections.x.footers
Type: Repeatable
Description: All footers.
sections.x.footers.x.body
Type: Leaf
Description: Text of the footer.
property.all
Type: Repeatable
Description: This permits access to all properties including those specifically accessible through property elements described in this table, and includes both the "name" and the "body" of the property. The properties supported depend on file format. See the Outside In Content Access Developer Guide for a list of possible predefined properties. Some file formats also allow for additional user-definable properties.
At this time, only properties may be extracted from multimedia files.
property.all.x.name
Type: Leaf
Description: Descriptive name for the property.
property.all.x.body
Type: Leaf
Description: Text of the property.
property.album
Type: Leaf
Description: Album property of the source file. Valid only for multimedia files.
property.artist
Type: Leaf
Description: Artist property of the source file. Valid only for multimedia files.
property.author
Type: Leaf
Description: Author property of the source file.
property.title
Type: Leaf
Description: Title property of the source file.
property.subject
Type: Leaf
Description: Subject property of the source file.
property.keywords
Type: Leaf
Description: Keywords property of the source file.
property.comment
Type: Leaf
Description: Comment property of the source file.
property.others
Type: Repeatable
Description: This permits access to all properties not specifically accessible through property elements described in this table, and includes both the "name" and the "body" of the property. The other properties supported depend on file format. See the Outside In Content Access Developer Guide for a list of possible predefined properties. Some file formats also allow for additional user-definable properties.
At this time, only properties may be extracted from multimedia files.
property.others.x.name
Type: Leaf
Description: Descriptive name for the property.
property.others.x.body
Type: Leaf
Description: Text of the property.
pragma.charset
Type: Leaf
Description: The text string associated with the character set of the characters that HTML Export is generating. In order for HTML Export to correctly code the character set into the output it generates, all templates should include a <meta> tag that uses the {## insert} macro as follows:
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset={## insert element=pragma.charset}" />
If the template does not include this line, the user will have to manually select the correct character set in their browser.
pragma.cssfile
Type: Leaf
Description: This element is used to insert the name of the Cascading Style Sheet (CSS) file into HTML documents. This name is typically used in conjunction with an HTML <link> tag to reference styles contained in the CSS file generated by HTML Export.
When used with the {## insert} macro, this pragma will generate the URL of the CSS file that is created. This macro must be used with {## insert} inside every template file that inserts contents of the source file, whenever the selected HTML flavor supports CSS. The CSS file will only be created if the selected HTML flavor supports CSS.
When used with the {## if} macro, the conditional will be true if the selected HTML flavor supports Cascading Style Sheets.
NOTE: If CSS is required for the output, the following code must be used:
{## if element=pragma.embeddedcss}
or
{## if element=pragma.cssfile}
However, HTML Export does not differentiate between the two, as the choice of using embedded CSS vs. external CSS is the template author's decision and the author may even wish to mix the two in the output.
An example of how to use this pragma that works when exporting either CSS or non-CSS flavors of HTML would be as follows:
{## if element=pragma.cssfile}
<link rel="stylesheet" href="{## insert element=pragma.cssfile}"></link>
{## /if}
pragma.embeddedcss
Type: Leaf
Description: This element is used to insert CSS style definitions in a single block in the <head> of the document.
When used with the {## insert} macro, this pragma will insert the block of CSS style definitions needed for use later in the file. This macro must be used inside every output HTML file where {## insert} is used to insert document content.
When used with the {## if} macro, the conditional will be true if the selected HTML flavor supports CSS.
NOTE: If CSS is required for the output, the following code must be used:
{## if element=pragma.embeddedcss}
or
{## if element=pragma.cssfile}
However, HTML Export does not differentiate between the two, as the choice of using embedded CSS vs. external CSS is the template author's decision and the author may even wish to mix the two in the output.
If a style is used anywhere in the input document, that style will show up in the embedded CSS generated for all the output HTML files generated for the input file. Consider a template that splits its output into multiple HTML files. In this example, the input file contains the "MyStyle" style. It does not matter if during the conversion only one output HTML file actually references the "MyStyle" style. The "MyStyle" style definition will still show up in the embedded CSS for all the output files, including those files that never reference this style.
pragma.jsfile
Type: Leaf
Description: This element is used to insert the name of the JavaScript file into HTML documents. This name is typically used in conjunction with an HTML <script> tag to reference JavaScript contained in the .js file generated by HTML Export.
When used with the {## insert} macro, this pragma will generate the URL of the JavaScript file that is created. This macro must be used with {## insert} inside every template file that inserts contents of the source file when:
The selected HTML flavor supports JavaScript.
The javaScriptTabs option has been set to true.
The JavaScript file will only be created if the selected HTML flavor supports JavaScript.
When used with the {## if} macro, the conditional will depend upon whether the selected HTML flavor supports JavaScript or not.
pragma.sourcefilename
Type: Leaf
Description: The name of the source document being exported. Note that this does not include the path name. When exporting documents inside of archive files, this is the name of the file inside the archive. For example, if the first file inside of archive.zip is myfile.doc, then exporting archive.zip?item.1 would use myfile.doc as the pragma.sourcefilename.
For convenience, certain nodes in an element path may be skipped because they represent the obvious default behavior. These nodes include the sections node (sections.current.body.title is equivalent to body.title), and the body and contents nodes (body.contents.headings.1.body is equivalent to headings.1.body). Please note that these nodes may not be skipped if they are the last node in the path (headings.1.body is not equivalent to headings.1). For further examples, see Section 10.5, "Breaking Documents by Structure."
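The two equivalences given above can be reproduced with a small Python sketch that expands a shortened path into its full form. The expand function and its lookup sets are hypothetical and deliberately limited: they only restore nodes omitted at the start of a path, and they only know about the nodes named in the examples above, so this is not a complete description of how HTML Export resolves paths.

```python
# Top-level nodes that never need a prefix in this sketch.
ROOTS = {"sections", "sheets", "slides", "property", "pragma"}
# Nodes that sit under body.contents in the element list.
UNDER_CONTENTS = {"headings", "preface"}
# Nodes that sit directly under body.
UNDER_BODY = {"contents"}


def expand(path: str) -> str:
    """Restore default nodes that were skipped at the start of a path."""
    nodes = path.split(".")
    if nodes[0] in ROOTS:
        return path  # already starts at a top-level node
    if nodes[0] in UNDER_CONTENTS:
        nodes = ["body", "contents"] + nodes
    elif nodes[0] in UNDER_BODY:
        nodes = ["body"] + nodes
    return ".".join(["sections", "current"] + nodes)


title_path = expand("body.title")
heading_path = expand("headings.1.body")
author_path = expand("property.author")
```

Note that the final body in headings.1.body is kept as written, consistent with the rule that a default node cannot be skipped when it is the last node in the path.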
Macros are commands to HTML Export within the template. Despite their casual similarity to HTML tags, they are not bound by any of the rules tags would usually follow inside an HTML file. Macros may appear anywhere in the template file, except inside another macro.
In the documentation and examples, the pieces of a macro are always shown delimited by spaces; however, semicolons may also delimit them. This option was added to accommodate certain editors in which URLs entered into dialog boxes may not contain non-quoted spaces, a restriction that would otherwise make it difficult or impossible to use the {## link} macro.
For example:
{## insert element=sections.1.body}
may also be written
{##;insert;element=sections.1.body}
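To show that the two spellings carry the same pieces, the following Python sketch splits a macro on either delimiter. The parse_macro function is hypothetical and handles only unquoted pieces; quoted attribute values that contain spaces or semicolons are outside its scope.

```python
import re


def parse_macro(macro: str) -> list[str]:
    """Split the inside of a {## ...} macro on spaces or semicolons.

    Simplification: quoted values are not supported in this sketch.
    """
    text = macro.strip()
    if not (text.startswith("{##") and text.endswith("}")):
        raise ValueError("not a template macro: " + macro)
    inner = text[3:-1]
    return [piece for piece in re.split(r"[ ;]+", inner) if piece]


spaced = parse_macro("{## insert element=sections.1.body}")
semicolons = parse_macro("{##;insert;element=sections.1.body}")
```

Both calls yield the command name followed by its single attribute, so either form identifies the same macro.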
Note that template macro string parameters and options support sprintf style escaped characters. This means that characters such as \x22, \r and %% are supported. Also note that most template attribute values may be quoted. The exception is template element strings, which may not be quoted at this time.
For example:
{## anchor aref="next" format="<a href=\"%url\">Next</a><br/>\r\n"}
If a template file is going to make use of the {## unit} macro at all, a {## unit} macro must be the first macro in the template file. It delimits the beginning and end of each unit. Unit boundaries are used when determining where to break the document when breaking based on content size.
A unit consists of a header, a footer (both of which are optional), and a body (which may be empty). To ensure that the header is the first item in the template and the footer is the last item, text between the {## unit} tag and the {## header} tag will be ignored, as will text between the {## /footer} tag and the {## /unit} tag, including whitespace. The header and footer of a unit will be output in every page containing that unit, enclosing that portion of the unit's body that is able to fit in a particular page. The entire template is a unit that may contain additional units.
An overview of using units in templates with examples is provided in Section 10.6, "Units - Breaking Documents by Content Size."
Syntax
{## unit [BREAK]} [{## header} any HTML {## /header}] any HTML [{## footer} any HTML {## /footer}] {## /unit}
Attributes
BREAK
This optional attribute of the unit macro will force page actions in HTML Export and non-page actions in other export products. It forces a break (page break in HTML Export) before inserting the unit contents unless doing so would cause the body of the first page to be empty. One situation where this attribute would be useful would be to force a page break between each section of a document, perhaps to get one presentation slide per page.
The {## unit} macro and its BREAK attribute are ignored when SCCOPT_EX_PAGESIZE or pagesize (Transformation Server) is set to zero.
It is sometimes important to make sure that a break does not occur in the midst of text that is intended to be on the same page. To prevent breaks like this from occurring, enclose the text that should be kept on the same page inside a nested {## unit}{## header} pair. For example, to prevent a page break from occurring while a link is being created, the template author might write something like the following:
{## unit}{## header} <a href="{## link element=sections.current.body}">Link</a> {## /header}{## /unit}
This macro inserts an element of the source document into the output file at the current location.
Syntax
{## insert [ELEMENT=element [WIDTH=width] [HEIGHT=height] [SUPPRESS=suppress] [TRUNCATE=truncate]] | [NUMBER=number] [URLENCODE]}
Attributes
ELEMENT
This attribute describes which part of the source document should be placed in the output file at the location of the macro. For the possible values for this attribute, see Section 10.3, "The Document Tree and Its Elements."
Note the name of the element being inserted may not be enclosed in quotes.
Example:
{## insert element=sections.1.body}
WIDTH
This optional attribute defines the width in pixels of the element being inserted. It is currently only valid for the image element. If the WIDTH attribute is not present but the HEIGHT attribute is, the width of the image will be calculated automatically based on the shape of the element. If neither the WIDTH nor the HEIGHT attribute is present, the image's original dimensions are used. If the image's original dimensions are unknown, a default HEIGHT and WIDTH of 200 are used.
Example:
{## insert element=slides.1.image width=400}
HEIGHT
This optional attribute defines the height in pixels of the element being inserted. It is currently only valid for the image element. If the HEIGHT attribute is not present, but the WIDTH attribute is, the height of the image will be calculated automatically based on the shape of the element.
Example:
{## insert element=slides.1.image height=400}
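The WIDTH and HEIGHT rules can be sketched as follows. This is a minimal Python illustration, not HTML Export code: the function name resolve_size is hypothetical, rounding to the nearest pixel is an assumption, and so is the choice to fall back to the 200 by 200 shape when only one dimension is given and the original size is unknown.

```python
DEFAULT_SIZE = (200, 200)  # documented fallback when the original size is unknown


def resolve_size(width=None, height=None, original=None):
    """Return (width, height) in pixels for an inserted image (hypothetical helper)."""
    if width is not None and height is not None:
        return (width, height)
    # Assumption: with no known original size, the 200x200 default supplies the shape.
    orig_w, orig_h = original if original is not None else DEFAULT_SIZE
    if width is None and height is None:
        return (orig_w, orig_h)
    if width is None:
        # Only HEIGHT given: derive the width from the image's shape.
        return (round(height * orig_w / orig_h), height)
    # Only WIDTH given: derive the height from the image's shape.
    return (width, round(width * orig_h / orig_w))


print(resolve_size(width=400, original=(800, 600)))  # prints (400, 300)
```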
SUPPRESS
This optional attribute allows certain things to be suppressed from the output. This is very useful if elements need to be inserted in contexts where HTML is not appropriate, such as passing information to Java applets, ActiveX controls or populating parts of a form.
Possible values are as follows:
TAGS: All HTML tags will be suppressed from the output of the element; however, the text may still contain HTML character codes like &quot; or &#123;.
For embedded graphics such as those found in word processing sections and spreadsheets, both the URL and the <img> tag will be suppressed. Because there would be no way to access the resulting converted embedded graphic, conversion of the graphic is not performed.
Example:
<form method="POST"> <input type="text" size="20" name="author" value="{## insert element=property.author suppress=tags}"> </form>
BOOKMARKS: Turns off all bookmarks in the inserted section. Bookmarks automatically precede many inserted elements so that other template elements may link to them. suppress=bookmarks is provided to prevent problems with nested <a> tags. Note that this represents a subset of the suppression behavior provided by suppress=tags.
INVALIDXMLTAGCHARS: Drops from the output all characters that are not allowed in XML tag names. This is designed to allow template authors to {## insert} custom document property names inside angle brackets ("<" and ">") to create XML tags. Most characters in Unicode and its subset character sets may be used as part of XML tag names. Illegal tag characters include "control" characters such as line feed and carriage return. Additionally, there are special rules for what characters can be the first character in a tag name. See the XML specification for a description of legal tag name characters.
Example:
{## repeat element=property.others} <{## insert element=property.others.current.name suppress=invalidxmltagchars}>{## insert element=property.others.current.body suppress=invalidxmltagchars}</{## insert element=property.others.current.name suppress=invalidxmltagchars}> {## /repeat}
produces something similar to the following:
<MyProperty>PropertyValue</MyProperty>
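To give a feel for what dropping invalid tag characters means, here is a rough Python sketch. It is an approximation under stated assumptions: the function name is hypothetical, the allowed alphabet (letters, digits, and the punctuation "_", "-", "." and ":") is a simplification of the XML specification, and the special rules for the first character of a tag name are not modeled.

```python
def drop_invalid_tag_chars(name: str) -> str:
    """Keep only characters from a simplified XML name alphabet (hypothetical helper)."""
    allowed_punctuation = set("_-.:")
    # Spaces, control characters and other punctuation are dropped, not replaced.
    return "".join(ch for ch in name if ch.isalnum() or ch in allowed_punctuation)


print(drop_invalid_tag_chars("My Property\r\n"))  # prints MyProperty
```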
TRUNCATE
When set, this attribute forces a maximum length in characters for the inserted element. This allows elements to be truncated rather than broken across pages when the page size option is in use. Truncated elements will end with the truncation identifier, which is "..." (three periods). All elements that have a truncate value will be no more than the specified number of characters in length including the length of the truncation identifier. In HTML Export, elements are inserted in their entirety if no truncation size is specified. The value of this attribute must be greater than or equal to 5 characters. In other products, elements are simply specified.
An example of a situation where element truncation is useful is to limit the size of entries when building a table of contents.
The TRUNCATE attribute implies suppression of tags for the insert. It also automatically applies the no source formatting option for the insert.
Note that the TRUNCATE attribute cannot be used with custom elements, because the custom element definition precludes the existence of any other attributes to {## insert}.
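The length rule for TRUNCATE can be illustrated with a short Python sketch. It assumes plain text with tags already suppressed, and the function name truncate is hypothetical. The three-period identifier counts toward the limit, so the result never exceeds the specified number of characters.

```python
TRUNCATION_IDENTIFIER = "..."  # three periods, counted toward the limit


def truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, identifier included (hypothetical helper)."""
    if limit < 5:
        raise ValueError("TRUNCATE must be greater than or equal to 5")
    if len(text) <= limit:
        return text  # short enough, inserted unchanged
    return text[: limit - len(TRUNCATION_IDENTIFIER)] + TRUNCATION_IDENTIFIER


print(truncate("The History of Flight", 10))  # prints The His...
```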
The TRUNCATE attribute has three special aspects to its behavior when grids are being inserted:
When truncation is in effect, the truncation size refers to the number of characters of content in each cell - not the number of characters in the grid as a whole.
While truncation normally causes all markup tags to be suppressed, when grids are in use, the table tags are retained (assuming that the output flavor supports tables).
Users are reminded that only one grid size may be selected for each spreadsheet sheet or database inserted. The size of the grid will be based in part on the TRUNCATE value if one or both the grid dimensions are not specified and the SCCOPT_EX_PAGESIZE or pageSize option (Transformation Server) is in use. In this situation, if a grid from a single sheet is inserted in more than one place in the template, and there are differing TRUNCATE values, then the grid dimensions will be based on the largest TRUNCATE value specified.
NUMBER
This attribute allows the developer to retrieve the total instance count or the current index value of any repeatable element. This can be very useful for writing JavaScript, BasicScript, etc. Four special keywords ("count", "countb0", "value" and "valueb0") don't appear in the document tree but can be used as nodes in the following special cases:
count / countb0: When appended to a repeating element and used with the NUMBER attribute, these nodes allow the developer to insert a text representation of the number of instances of the given repeatable element. count gives the count assuming the first index is 1 and countb0 gives it assuming the first index is 0. For example, if a presentation has three slides, the following template fragment:
<p>{## insert number=slides.count}</p> <p>{## insert number=slides.countb0}</p>
will produce the following text:
<p>3</p> <p>2</p>
value / valueb0: When appended to a repeating element and used with the NUMBER attribute, these nodes allow the developer to insert a text representation of the current value of the index of the given repeatable element. value gives the index assuming the first index is 1 and valueb0 gives it assuming the first index is 0. For example, if the current value of the index on slides is 2, the following template fragment:
<p>{## insert number=slides.current.value}</p> <p>{## insert number=slides.current.valueb0}</p>
will produce the following text:
<p>2</p> <p>1</p>
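The relationship between the four keywords is simple arithmetic, shown here as a Python sketch. The slide list and variable names are hypothetical stand-ins for a three-slide presentation whose current index is 2.

```python
slides = ["slide A", "slide B", "slide C"]  # hypothetical three-slide presentation
current_index = 2  # 1-based index of the slide currently being processed

count = len(slides)            # slides.count: number of instances, first index is 1
countb0 = len(slides) - 1      # slides.countb0: same count, first index is 0
value = current_index          # slides.current.value: current index, first index is 1
valueb0 = current_index - 1    # slides.current.valueb0: current index, first index is 0

print(count, countb0, value, valueb0)  # prints 3 2 2 1
```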
URLENCODE
This optional attribute causes the inserted element to be URL encoded. As such, it is ignored unless it is specified as part of an insert that contains a file name. The following elements may be URL encoded:
pragma.sourcefilename
pragma.cssfile
pragma.embeddedcss
pragma.jsfile (HTML Export only)
In addition, the following elements will be URL encoded when the section type is "Archive" or "AR":
sections.x.fullname
sections.x.basename
sections.x.body
sections.x.title
sections.x.reflink
For all other {## insert}s, this attribute is ignored. As such, OEMs should note that HTML Export does not modify any URLs coming out of the input documents being converted. These URLs continue to be passed through as is. This attribute is also ignored if the URL was created using the EX_CALLBACK_ID_CREATENEWFILE callback. Such URLs are assumed to already be URL encoded.
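For readers unfamiliar with URL encoding, the following Python sketch shows the general idea using the standard library. It is not a statement of exactly which characters HTML Export escapes; the file name is a made-up example.

```python
from urllib.parse import quote

# Characters that are not safe in a URL are replaced by %XX escapes.
encoded = quote("My Report (final).doc")
print(encoded)  # prints My%20Report%20%28final%29.doc
```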
A Note on Inserting Properties
Because of the special ways that properties are used in documents, property strings are inserted into the output files a little differently than other {## insert} macros. First, the property is always inserted as if the SCCOPT_EX_NOSOURCEFORMATTING or noSourceFormatting (Transformation Server) option were set. This prevents formatting characters such as newlines from interfering with the property strings. Second, the property is always inserted as if the template specified suppress=tags. This provides the template writer with maximum control over how property strings are presented.
These macros allow areas of the template to be used or ignored based on information about an element of the source file.
Syntax
{## if ELEMENT=element [CONDITION=Exists|NotExists] [VALUE=value]} any HTML {## /if}
or
{## if ELEMENT=element [[CONDITION=Exists|NotExists] | [VALUE=value]]} any HTML {## else} any HTML {## /if}
or
{## if ELEMENT=element [[CONDITION=Exists|NotExists] | [VALUE=value]]} any HTML {## elseif ELEMENT=element [[CONDITION=Exists|NotExists] | [VALUE=value]]} any HTML {## else} any HTML {## /if}
Note that multiple instances of {## elseif} may be used after {## if}. In addition, {## else} is not required when using {## elseif}.
Attributes
ELEMENT
This attribute describes which part of the source file should be tested. For the possible values for this attribute, see Section 10.3, "The Document Tree and Its Elements." If neither the CONDITION nor VALUE attribute exists, the element is tested for existence.
CONDITION
Defines the condition the element is tested for. Possible values are Exists and NotExists.
VALUE
Defines the values the element should be tested against. The VALUE attribute is currently valid only for the sections.x.type element for testing of the type of a section of the source file. Possible values include:
ar: Archive
bm: Bitmap
ch: Chart
db: Database
dr: Drawing
em: Email
mm: Multimedia
pr: Presentation
ss: Spreadsheet
wp: Word processor document
Example 1:
{## if element=property.comment}
<p><b>Comment property exists</b></p>
{## else}
<p><i>Comment property does not exist</i></p>
{## /if}
{## if element=sections.1.type value=wp}
<p><b>The source file is a word processor file</b></p>
{## /if}
{## if element=sections.1.type value=ss}
<p>Spreadsheet</p>
{## elseif element=sections.1.type value=ar}
<p>Archive</p>
{## elseif element=sections.1.type value=ch}
<p>Chart</p>
{## else}
<p>Not ss, ar, or ch</p>
{## /if}
Example 2:
{## if element=sections.current.type value=pr condition=notexists} <p>We can do something here for all document types other than presentations.</p> {## else} <p>This is used only for presentations.</p> {## /if}
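One way to read the two examples is sketched below in Python. The function evaluate_if is hypothetical, and the treatment of VALUE combined with CONDITION=NotExists (the test succeeds when the value does not match) is an interpretation of Example 2 rather than a documented rule. Case-insensitive comparison is likewise an assumption.

```python
def evaluate_if(actual, condition="exists", value=None):
    """Decide whether an {## if} block is used (hypothetical model, not HTML Export code).

    actual is the element's value, or None when the element does not exist.
    """
    if value is None:
        matched = actual is not None  # plain existence test
    else:
        matched = actual is not None and actual.lower() == value.lower()
    # NotExists inverts the outcome of the test.
    return matched if condition.lower() == "exists" else not matched


print(evaluate_if("ss", condition="notexists", value="pr"))  # prints True
```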
This macro allows an area of the template to be repeated, once for each occurrence of an element.
Syntax
{## repeat ELEMENT=element [MAXREPS=maxreps] [SORT=sort]} any HTML {## /repeat}
Attributes
ELEMENT
This attribute describes which part of the source file should be repeated on. It must be a repeatable element. For the possible values for this attribute, see Section 10.3, "The Document Tree and Its Elements."
When using HTML Export, any HTML may be defined between the {## repeat} macro and its closing {## /repeat} macro. This HTML will be repeated once for each instance of the element specified. In addition, the index variable current may be used in any other {##} macro as the element-index of the element being repeated. For instance, the following HTML in the template will produce a list of the footnotes in a document:
<html> <body> <p>Here are the footnotes</p> {## repeat element=footnotes} {## insert element=footnotes.current.body} {## /repeat} <p>No more footnotes</p> </body> </html>
Similarly, the following HTML in the template will insert the names of all the items in an archive:
{## repeat element=sections} {## insert element=sections.current.fullname} {## /repeat}
MAXREPS
This attribute limits the total number of loops the repeat statement may make to the value specified. It is useful for preventing exceptionally large documents from producing an unwieldy amount of output.
SORT
This optional attribute defines whether to sort the output or not. This attribute is ignored if the input file is not an archive file of arctype file. All sorts are done based on the character encoding of the values in the input file. The sorts are also case insensitive at this time. Valid values of the sort attribute are:
fullname: Sort by sections.current.fullname
basename: Sort by sections.current.basename
none: No sorting is done. This is the default.
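The combined effect of SORT and MAXREPS might be pictured as in the Python sketch below. The function name and the sample archive entries are hypothetical, and applying the sort before the MAXREPS limit is an assumption; the text above does not state the order.

```python
def select_sections(sections, sort="none", maxreps=None):
    """Return the sections a {## repeat} would visit (hypothetical model)."""
    if sort.lower() in ("fullname", "basename"):
        key = sort.lower()
        # Case-insensitive ordering on the chosen name.
        sections = sorted(sections, key=lambda s: s[key].lower())
    # Assumption: the loop limit is applied after sorting.
    return list(sections) if maxreps is None else list(sections[:maxreps])


archive = [
    {"fullname": "docs/b.txt", "basename": "b.txt"},
    {"fullname": "Docs/a.txt", "basename": "a.txt"},
    {"fullname": "c.txt", "basename": "c.txt"},
]
picked = select_sections(archive, sort="fullname", maxreps=2)
print([s["fullname"] for s in picked])  # prints ['c.txt', 'Docs/a.txt']
```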
This macro generates a relative URL to a piece of the document produced by HTML Export. Normally this URL would then be encapsulated by the template with HTML anchor tags to create a link. {## link} is particularly powerful when used within a {## repeat} loop.
Syntax
{## link ELEMENT=element [TOP]}
or
{## link TEMPLATE=template}
or
{## link ELEMENT=element TEMPLATE=template [TOP]}
Attributes
ELEMENT
Defines the element that is the target for the link. The URL that the {## link} macro generates will point to the first instance of this element in the output file. If this attribute is not present, the resulting URL will link to any output file that was produced with the specified template. If such a file does not exist, the specified template will be used to generate a file.
Remember that each element has one or more index values, some of which may be variables. An example of this type of index variable is the "current" in sections.current.body. Use of {## link} affects the value of those index variables, which may cause subtle side effects in the behavior of the linked template file. For a description of how {## link} affects the index of inserted elements, see Section 10.5.1, "Indexes and Structure-Based Breaking."
TEMPLATE
The name of a template file which must exist in the same directory as the original template file. If this attribute is not present, the current template will be used. If an element was specified in the {## link}, then the template must contain a {## insert} statement using that element.
It is important to note that while the template language is normally case insensitive, the case of the template file names specified here is important. The file name specified for the template is passed as is to the operating system. On operating systems such as UNIX, if the wrong case is given for the template file name, the template file will not be found and an error will be returned.
TOP
This attribute is only meaningful if an element is specified in the {## link} command. When this attribute exists, the generated URL will not contain a bookmark, and therefore the resulting link will always jump to the top of the HTML file (HTML Export) or file containing the specified element. This is useful if the top of the template has navigation or other information that the developer would like the user to see.
{## link} Usage Scenarios
Using the first syntax shown at the beginning of this section, a URL for the element bookmark is inserted in the document. Normally this syntax is used to create intradocument links to aid navigation. An example would be creating a link to the next section of the document.
In the second syntax, a URL is created to an output file generated by the specified template. This template is run on the same source document, but may extract different parts of the document. Normally, in this syntax, the "main" template contains a link to a second HTML file. This second file is generated using the template specified by the {## link} command and contains other document elements. As an example, the "main" template could produce a file containing the body of the document and a link to the second HTML file, which contains the footnotes and endnotes.
The third and most powerful syntax also produces the URL of a file generated by the specified template. This template is then expected to contain an insertion of the specified element. Normally this syntax is used with repeatable elements. It allows the author to generate multiple output files with sequential pieces of the document. As such it provides a way to break large documents up into smaller, more readable pieces. An example of where this syntax would be used is a template that generates a "table of contents" in one HTML file (perhaps a separate HTML frame). The entries in the table are then links to other HTML files generated by different templates.
Note that a {## link} statement which specifies a template does not always result in a new file being created. New files are only created if the target of the link does not exist yet. So if for example two {## link} statements specify the same element and template, only one HTML file is produced and the same URL will be used by both {## link} statements.
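The reuse rule can be modeled as a lookup keyed on the element and template, as in this Python sketch. The class name and the output file naming scheme are hypothetical and chosen only for illustration.

```python
class LinkResolver:
    """Tracks which output file serves each (element, template) pair (hypothetical model)."""

    def __init__(self):
        self.files = {}  # (element, template) -> URL of the generated file

    def link(self, element, template):
        key = (element, template)
        if key not in self.files:
            # A new file is created only when the link target does not exist yet.
            self.files[key] = f"output{len(self.files) + 1}.htm"  # made-up naming scheme
        return self.files[key]


resolver = LinkResolver()
first = resolver.link("sections.1.body", "section.htm")
second = resolver.link("sections.1.body", "section.htm")
print(first == second, len(resolver.files))  # prints True 1
```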
{## link} Archive File Example
The following template generates a list of links to all the extracted and converted files from the source archive file (represented by decompressedFile in the following example):
{## repeat element=sections} <p><a href="{## link element=sections.current.decompressedFile}"> {## insert Element=sections.current.fullname}</a></p> {## /repeat}
{## link} Presentation File Example
The following example (template.htm) uses the first syntax to generate a set of HTML files, one for each slide in a presentation. Each slide will include links to the previous and next slides and the first slide. Note the use of {## if} macros so the first and last slides do not have Previous and Next links respectively:
template.htm
<html>
<body>
{## insert element=slides.current.image width=300}
<hr />
{## if element=slides.previous.image}
<p><a href="{## link element=slides.previous.image}">Previous</a></p>
{## /if}
{## if element=slides.next.image}
<p><a href="{## link element=slides.next.image}">Next</a></p>
{## /if}
</body>
</html>
Due to the side effects of {## link} using the element attribute, there can be some confusion over what values "current", "previous" and "next" have when each {## link} is processed. To better illustrate how this template works, consider running it on a presentation that contains three slides:
First Output File
Because no template is specified in the {## link} statements, template.htm is (re)used as the template for all {## link} statements. For the first slide, nothing interesting happens until slides.next is encountered. Because slides.current is 1 in this case, slides.next refers to slides.2 and the {## link} is performed on slides.2.image. This {## link} fills in the anchor tag with the URL for the output file containing the second slide. Because no file containing slides.2 exists, {## link} opens a new file.
Second Output File
For the second slide the template is rerun. slides.current now refers to slides.2, slides.previous refers to slides.1 and slides.next refers to slides.3. The {## insert} statement will insert the second slide.
The {## if} statement referring to slides.previous succeeds. Because the file containing slides.1 already exists, no additional file is created. The anchor tag will be filled in with the URL for the first output file.
The {## if} statement referring to slides.next also succeeds and the anchor tag will be filled in with the URL for the output file containing the third slide. Because no file containing slides.3 exists, {## link} opens a new file.
Third Output File
For the third slide the template is rerun. slides.current now refers to slides.3 and slides.previous refers to slides.2. slides.next refers to slides.4, which does not exist. The {## insert} statement will insert the third slide.
The {## if} statement referring to slides.previous succeeds. Because the file containing slides.2 already exists, no additional file is created. The anchor tag will be filled in with the URL for the second output file.
The {## if} statement referring to slides.next fails. At this point processing is essentially complete.
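The three-slide walkthrough can be replayed with a small Python simulation. It is a simplified model, not the HTML Export algorithm: the function name and the page file names are hypothetical, and the only behavior modeled is that a link to a slide with no output file queues a new file for that slide.

```python
def simulate(slide_count):
    """Replay template.htm over a presentation and record each page's links."""
    files = {1: "page1.htm"}  # slide number -> output file (made-up names)
    pending = [1]             # slides whose output file still has to be written
    pages = {}

    def link(slide):
        if slide not in files:
            # No file contains this slide yet, so a new one is opened.
            files[slide] = f"page{len(files) + 1}.htm"
            pending.append(slide)
        return files[slide]

    while pending:
        current = pending.pop(0)
        links = {}
        if current - 1 >= 1:            # {## if element=slides.previous.image}
            links["previous"] = link(current - 1)
        if current + 1 <= slide_count:  # {## if element=slides.next.image}
            links["next"] = link(current + 1)
        pages[files[current]] = links
    return pages


for page, links in simulate(3).items():
    print(page, links)
```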
This macro generates a relative URL to a piece of the document produced by HTML Export when doing document breaking based on content size.
Syntax
{## anchor AREF=type [STEP=stepval] FORMAT="anchorfmt" [ALTLINK="element"] [ALTTEXT="text"]}
Attributes
AREF
Indicates the relation of the target of the link to the current file. Allowable values for this attribute are:
InsertStart: First page of the inserted element
InsertEnd: Last page of the inserted element
Next: Next page in the inserted element
Prev: Previous page in the inserted element
FirstFile: First page created for the entire document
LastFile: Last page created for the entire document
STEP
This attribute is used to insert a link to "fast forward/rewind" through the output pages. This attribute may only be used if AREF is "next" or "prev". It is specified as a non-zero positive integer. For example, to insert a link to skip ahead 5 pages in a document, the following statement could be used:
{## anchor aref="next" step="5" format="<p><a href=\"%url\">Next</a></p>"}
If not specified, the default value of "step" is one (1), which corresponds to the next/previous page. This attribute has no meaning when aref equals "insertstart", "insertend", "firstfile" or "lastfile".
FORMAT
This is an sprintf style format string specifying the text to output as the link. HTML Export replaces the %url format specifier in the format string with the target URL. For example:
{## anchor aref="next" format="<a href=\"%url\">Next</a><br/>\r\n"}
ALTLINK
An attribute used to specify the target of the anchor if it cannot be resolved based on the anchor type. For example, the final file of a breakable element has no "next" file, and thus would resolve to nothing. However, if the altlink attribute is specified, the anchor will be generated using a URL to the first file found containing the specified element.
Note that no EX_CALLBACK_ID_ALTLINK callback will be made if an ALTLINK attribute is specified in the {## anchor} statement.
For example:
{## anchor aref=next format="<a href=\"%url\">Next</a>" altlink=headings.next.body}
ALTTEXT
Text to be output if the anchor cannot be resolved. If this attribute is not specified, no text will be output if the anchor target does not exist. For example:
{## anchor aref=next format="<a href=\"%url\">Next</a>" alttext="Next"}
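How STEP, FORMAT and ALTTEXT interact for "next" and "prev" anchors can be sketched in Python as follows. The function name, the page list and the file names are hypothetical; ALTLINK and the other AREF values are not modeled.

```python
def render_anchor(pages, current, aref, fmt, step=1, alttext=""):
    """Resolve a next/prev anchor against a list of page URLs (hypothetical model)."""
    offset = step if aref.lower() == "next" else -step
    target = current + offset
    if 0 <= target < len(pages):
        # %url in the format string is replaced by the target URL.
        return fmt.replace("%url", pages[target])
    # The anchor cannot be resolved: emit the alternate text, or nothing.
    return alttext


pages = [f"p{n}.htm" for n in range(1, 8)]  # seven made-up output pages
fmt = '<a href="%url">Next</a>'
print(render_anchor(pages, 0, "next", fmt, step=5))  # prints <a href="p6.htm">Next</a>
```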
This macro causes {##} statements in an area of the template file to be ignored by the template parser. Any text between the {## ignore} and {## /ignore} tags will be written to the output file as-is. This macro allows {##} statements in an area of the template to be commented out for debugging purposes, or to actually write out the text of another {##} macro. However, the browser will parse any HTML tags inside the ignored block and the text will be formatted accordingly. This macro can ignore all {##} macros except for an {## /ignore} macro. No escape sequence has been implemented for this purpose. As a result, {## ignore} statements cannot be nested. If they are nested, a run time template parser error will occur.
Syntax
{## ignore} any HTML or other {##} macros {## /ignore}
To fully comment out a section of the template, surround the {## ignore} statements with HTML comments.
For example:
<!--{## ignore} everything between here and the end HTML comment will be commented out. {## /ignore}-->
The {## comment} macro allows the template writer to include comments in the template without including them in the final output files. {## comment} provides the functionality of {## ignore}, but the text inside the {## comment} block is not rendered to the output files and is not included in page size calculations. Like {## ignore}, {## comment} macros may not be nested.
Syntax
{## comment} any HTML or other {##} macros {## /comment}
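The difference between the two macros can be shown with a small Python sketch: ignored text is written out as-is, while commented text disappears. The function name is hypothetical, and the regular expressions assume the exact spellings {## ignore} and {## comment} with no nesting.

```python
import re


def apply_ignore_and_comment(template: str) -> str:
    """Model the output effect of {## ignore} and {## comment} blocks (hypothetical helper)."""
    # Comment blocks are removed entirely from the output.
    template = re.sub(r"\{## comment\}.*?\{## /comment\}", "", template, flags=re.S)
    # Ignore blocks keep their contents verbatim, including any {##} macros inside.
    template = re.sub(r"\{## ignore\}(.*?)\{## /ignore\}", r"\1", template, flags=re.S)
    return template


sample = "A{## comment}note{## /comment}B{## ignore}{## insert element=body}{## /ignore}C"
print(apply_ignore_and_comment(sample))  # prints AB{## insert element=body}C
```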
This command allows other templates to be inserted into the current template. It works in a manner similar to the C/C++ #include directive.
Syntax
{## include TEMPLATE=template}
Attributes
TEMPLATE
This attribute gives the name of the template to insert.
This macro sets an option to a given value. All {## option} statements are executed in the order in which they are encountered. Remember when using this template macro that the {## unit} tag must be the first template macro in any template.
Options set in the template have template scope. This means that, for example, if a {## link} macro references another template, options in the referenced template are not affected by the option settings from the parent template. Similarly, when the files contained in an archive file are converted, Export recursively calls itself to perform the exports of the child documents in the archive. Each child document is converted using a copy of the parent template, and that copy does not inherit the option values from the parent template.
The strings used to specify options from inside templates correspond to the option names. See the Options documentation for more details.
Options set using {## option} in the template are not inherited by the exports performed on files within archives. Each child export receives a fresh copy of all option values as originally set with DASetOption.
Remember that setting an option in the template overrides any option value set by an application within the scope of the template.
See Appendix B, "HTML Export Options" for a description of how to treat a hyperlink in a Word input document, using the {## option} in the template.
Syntax
{## option OPTION=value}
The supported OPTION attributes and their values are listed in a table in the "Attributes" section that follows.
Attributes
OPTION
Transformation Server values (SOAP) are indicated in parentheses following the capitalized text (C++).
SCCOPT_GRAPHIC_TYPE (graphicType): FI_GIF (gif), FI_JPEGFIF (jpeg), FI_PNG (png), FI_NONE (noGraphics)
SCCOPT_GIF_INTERLACED (graphicGifInterlaced): 0, 1, TRUE (true), FALSE (false)
SCCOPT_JPEG_QUALITY (graphicJpegQuality): Integer from 1 to 100
SCCOPT_GRAPHIC_SIZEMETHOD (graphicSizeMethod): SCCGRAPHIC_QUICKSIZING (quick), SCCGRAPHIC_SMOOTHSIZING (smooth), SCCGRAPHIC_SMOOTHGRAYSCALESIZING (smoothGray)
SCCOPT_GRAPHIC_OUTPUTDPI (graphicOutputDPI): Integer from 0 to 2400
SCCOPT_GRAPHIC_SIZELIMIT (graphicSizeLimit): Integer greater than or equal to zero.
SCCOPT_GRAPHIC_WIDTHLIMIT (graphicWidthLimit): Integer greater than or equal to zero.
SCCOPT_GRAPHIC_HEIGHTLIMIT (graphicHeightLimit): Integer greater than or equal to zero.
SCCOPT_EX_FONTFLAGS (fontFlags): SUPPRESS_SIZE (suppressSize), SUPPRESS_COLOR (suppressColor), SUPPRESS_SIZECOLOR, SUPPRESS_FACE (suppressFace), SUPPRESS_SIZEFACE, SUPPRESS_COLORFACE, SUPPRESS_ALL, SUPPRESS_NONE ( 0 )
SCCOPT_EX_GRIDROWS (gridRows): Integer greater than or equal to zero.
SCCOPT_EX_GRIDCOLS (gridCols): Integer greater than or equal to zero.
SCCOPT_EX_GRIDADVANCE (gridAdvance): DOWN (advanceDown), ACROSS (advanceAcross)
SCCOPT_EX_GRIDWRAP (gridWrap): TRUE (true), FALSE (false)
The {## copy} macro is used to copy extra, static files into the output directory along with the output from the converted document. For example, if a template author has added a company logo that was not in the original input document, {## copy} can be used to make it a part of the converted output document. Other examples include graphics used to mimic "buttons" for navigation, outside CSS files, or a piece of Java code to be run.
Syntax
{## copy FILE=file}
Attributes
FILE
This is the name of the file to be copied. If a relative path name is specified as part of the file, then it must be relative to the directory containing the root template file.
For example:
{## copy FILE=uparrow.gif}
The {## copy} macro may occur anywhere inside a template. If the {## copy} is inside a {## if}, then the {## copy} will only be executed if the condition is TRUE. In {## repeat} loops, the {## copy} will only be performed if the loop is executed one or more times. In addition, if the {## repeat} loops more than once, HTML Export detects this and the {## copy} is executed only once.
As its name suggests, the {## copy} macro is a straight file copy. Therefore, no conversions are performed as part of the copy. For example, graphics formats are not changed and graphics are not resized. Template authors should also remember to use {## graphic} when graphics and other files are copied so that space will be created for the external graphic in the text buffer size calculations.
Because the only action HTML Export takes is to copy the requested file, it is up to the template author to make use of the copied file at another point in the template. For example, a graphic file may be copied and then the template can use an <img> tag which references the copied graphic. The following snippet of template code would do this:
{## copy FILE=Picture.JPG} {## graphic PATH=Picture.JPG} <img src="Picture.JPG">
The OEM should also know that if the file copy fails, HTML Export will continue and no error will be reported back to the OEM.
Previous releases of HTML Export used different macro syntax where template macros were expected to start with {Inso} rather than {##}. In addition some words that had been abbreviated must now be spelled out ("insert" instead of "ins"). The old syntax will continue to be supported for the foreseeable future. However, it has been deprecated. The old Inso macros and their new equivalents are as follows:
{insoins} is now {## insert}
{insoif} ... {/insoif} is now {## if} ... {## /if}
{insoelseif} ... {/insoelseif} is now {## elseif} ... {## /elseif}
{insoelse} ... {/insoelse} is now {## else} ... {## /else}
{insoignore} ... {/insoignore} is now {## ignore} ... {## /ignore}
{insolink} is now {## link}
{insorep} ... {/insorep} is now {## repeat} ... {## /repeat}
It should be noted that templates may not mix the old style of Inso macro in with the new {##} style in the same template.
It should also be noted that no new or future Export features will support the old syntax. Thus, for example, the old syntax has not been extended to include support for the new {## unit} macros.
One of the most powerful features of the template architecture is the ability to break long word processor documents up into logical pieces and create powerful navigation aids to access them.
To understand how this is done, the developer must first understand the document tree as it relates to word processor documents. The somewhat complex graphic that follows attempts to show how the elements in the tree relate to a real-world document.
The following are some examples of elements and the data they would produce if run against the document shown in the preceding image. Note the omission of the default nodes body and contents in the last two examples:
body.contents.headings.2.body.title: would produce "Present Day."
body.contents.headings.2.body.contents.headings.1.body.title: would produce "Commercial."
body.contents.preface: would produce "The History of Flight" and the text below it, up to but not including "Introduction."
headings.2.headings.1.headings.3.title: would produce "McDonnell-Douglas."
headings.2.headings.1.headings.3.contents: would produce the text below "McDonnell-Douglas" but above "Military."
Breaking documents requires that HTML Export understand the logical divisions in the structure of a document. Currently the only formats that can give HTML Export this information in an unambiguous manner are Microsoft Word 95 and higher and WordPerfect 6.0 and higher. In these formats, the breaking information is available if the author placed Table of Contents information in the document. Refer to the appropriate software manual for information on the necessary procedure for including this information. That is not to say that the document must have a TOC, only that the information to build one must be present.
Some word processing formats, including Microsoft Word 2002 (XP), allow users to specify TOC entries in multiple ways. HTML Export supports only two of these methods. Support for each method of specifying the TOC is as follows:
Applied heading styles: Yes
Custom styles with outline levels: Yes
Outline level applied as a paragraph attribute: No
TOC entries: No
Additionally, if a heading style is applied to text inside a table in the original document, HTML Export will not break on that heading. This is because HTML Export will not break within tables.
The sample templates that ship with the HTML Export SDK use document breaking extensively and are probably the best way to understand the uses of the structure-based breaking feature.
All repeatable nodes have an associated index variable that at any given time in the export process has a current value. For elements that contain repeatable nodes as part of their path, the instance of the repeatable element must be specified by using a number or one of several index variable keywords. The possible values for this index variable (referred to as x in Section 10.3.3, "Element Definitions") are as follows:
A whole number (integer). HTML Export indexes begin counting with 1 (not 0).
current
next
previous
first
last
For numeric values, the number is simply inserted as another node in the path. For example, slides.1.image references the first slide in a presentation and footnotes.2.body references the second footnote in a document.
Elements that cannot be guaranteed to be within the document to which the template is applied should not be explicitly referenced. For example, referencing sections.4.body may result in unexpected behavior in documents that have fewer than 4 sections. Requesting a non-existent element does not cause an error in HTML Export; the insertion is simply ignored. However, if other HTML surrounding the insertion depends on the results of the insert, the output may be invalid HTML.
The current, next, previous, first and last keywords are fairly self-explanatory. For example, slides.current.image references the current slide and slides.next.image refers to the next slide. When the template is processed, the current, next, previous, first and last variables are replaced with the appropriate index value.
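How a keyword maps to an index value can be modeled in a few lines. The following Python sketch is an illustration only, not HTML Export code; the function name and its parameters are hypothetical. It returns None for a reference that falls outside the document, which corresponds to the case where the insertion is ignored.

```python
def resolve_index(keyword, current, count):
    """Return the 1-based index a keyword refers to, or None if no such element exists.

    keyword: an integer, or one of "current", "next", "previous", "first", "last".
    current: the current value of the index variable (1-based).
    count:   how many instances of the repeatable element the document has.
    """
    if isinstance(keyword, int):
        index = keyword
    else:
        index = {
            "current": current,
            "next": current + 1,
            "previous": current - 1,
            "first": 1,
            "last": count,
        }[keyword]
    # A reference outside 1..count names an element that does not exist.
    return index if 1 <= index <= count else None

# In a 3-slide presentation with the index at 3, there is no next slide.
print(resolve_index("next", 3, 3))      # None
print(resolve_index("previous", 3, 3))  # 2
```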
Unlike in versions of HTML Export prior to the 1.2 release, next and previous do not change the value of the index. As a result, the index changes only inside a {## repeat} loop and as the result of a {## link} statement. For more information, see Section 10.4.4, "Loop: {## repeat}," Section 10.4.5, "Linking with Structured Breaking: {## link}," and Section 10.5, "Breaking Documents by Structure."
{## repeat…}
The initial value of the index variable for any given repeatable element typically is 1. For {## repeat} loops, the index is incremented with each iteration. Termination of a {## repeat} loop resets the counter to its initial value. Actually, it is more accurate to say that the scope of the index is the repeat loop.
The following template fragment uses current in a repeat loop, which outputs all the footnotes in the source file:
{## repeat element=footnotes} {## insert element=footnotes.current.body} {## /repeat}
When a template containing a repeat statement is the target of a {## link} statement that specifies the element to be used as the repeat element, the initial value of the index will be determined by the {## link} processing.
{## link…}
The {## link} statement does not affect the index variable in the context of the current template. The {## link} statement can only affect index variables when both an element and a template are specified. In this case only the index variables in the target for the specified element are affected.
If the element specified in the {## link} contains a next or previous keyword, the value of current in the target file will be affected. The initial value of current in the target will be the value of (current in the source)+1 for next. Similarly, previous has the effect of decrementing the value of current.
The following example uses a single template file and the {## link} macro to create a set of HTML files, one for each slide in a presentation. The {## link} does the dual job of driving the generation of the HTML files and providing a "next" link for navigation. Notice the use of the next keyword in the {## if} macro that checks to see if there is a next slide:
{## unit}
<html>
<body>
<!-- insert the current slide -->
{## insert element=slides.current.image width=300}
<hr />
<!-- Is there a next slide? -->
{## if element=slides.next.image}
<!-- If yes, generate a URL to an HTML file containing the next slide.
     The HTML file is generated using the current template (because there
     is no template attribute). While generating the new HTML file, the
     value of the index on slides will be its current value plus 1. Once
     control returns to this template, the value of the index on slides
     is unchanged. -->
<p><a href="{## link element=slides.next.image}">Next</a></p>
{## else}
<!-- If no, create a link to the HTML containing the first slide. -->
<p><a href="{## link element=slides.1.image}">First</a></p>
{## /if}
</body>
</html>
{## /unit}
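The effect of this template can be modeled outside HTML Export. The following Python sketch is a simplified illustration with hypothetical names: it flattens the chain of {## link} statements into a loop and reports, for each generated page, which slide it shows and where its single link points. Output file naming is not modeled.

```python
def slide_pages(slide_count):
    """Model the example template: one output page per slide.

    Returns a list of (slide_shown, link_label, link_target_slide) tuples.
    """
    pages = []
    current = 1  # the index on slides starts at 1
    while current <= slide_count:
        if current + 1 <= slide_count:
            # {## if element=slides.next.image} is true: link to the next slide.
            pages.append((current, "Next", current + 1))
        else:
            # {## else}: link back to slides.1.image.
            pages.append((current, "First", 1))
        # The linked-to page is generated with the index at current + 1.
        current += 1
    return pages

for page in slide_pages(3):
    print(page)
# (1, 'Next', 2)
# (2, 'Next', 3)
# (3, 'First', 1)
```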
HTML Export has a system for breaking up documents. In addition to being able to break documents according to their structure, template writers can now break documents based on the amount of content to be placed in each output file or "page." Documents can even be broken based on both their structure and content size.
To break documents by content size, two things must be done. First, the SCCOPT_EX_PAGESIZE (pageSize with Transformation Server) option must be set (see the Options documentation for details). Second, the template must use the {## unit} construct.
The basic idea behind the unit template construct is to tell Export what things should be repeated on every "page" and what pieces should only be shown once. In other words, the unit template construct provides a mechanism for grouping template text and document elements. Unit boundaries are used when determining where to break the document when spanning pages.
Here are some examples of the kinds of things the template author might want to appear on every page:
The <meta> tag inserting the output document character set.
A company copyright message.
Navigational elements to link the previous/next pages together.
Typical examples of things that wouldn't go on every page would be:
The actual content of the document.
Structural navigational elements like the links for a table of contents.
A unit consists of a header, a footer (both of which are optional), and a body. Items that are to be repeated at the beginning or end of every unit should be placed in the header or footer respectively.
A unit is delimited by the {## unit} template macro. Similarly, the {## header} and {## footer} template macros delimit the header and footer respectively. The body is everything that is left between the header and the footer. The {## unit} macro must be the first macro in the template. The body frequently contains nested units. The body may be empty.
To ensure that the header is the first item in the template and the footer is the last item, text between the {## unit} tag and the {## header} tag will be ignored, as will text between the {## /footer} tag and the {## /unit} tag, including whitespace. The header and footer of a unit will be output in every page containing that unit, enclosing that portion of the unit's body that is able to fit in a particular page. The entire template is a unit that may contain additional units.
By way of example, let's take another look at the very simple template from Section 10.1, "What Is a Template?" To make things more interesting, let's insert the character set into the template with a <meta> tag. Let's also insert some better navigation to improve movement between the pages. The modified version of the template is as follows:
{## unit}{## header}
<html><head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset={## insert element=pragma.charset}" /></head>
<body>
{## anchor aref="prev" format="<p><a href=\"%url\">Prev</a></p>"}
{## /header}
<p>Here is the document you requested. {## insert element=property.title} by {## insert element=property.author}</p>
<p>Below is the document itself</p>
{## insert element=body}
{## footer}
{## anchor aref="next" format="<p><a href=\"%url\">Next</a></p>"}
</body>
</html>
{## /footer}{## /unit}
A very small value (about 20 characters) is used for the page size option. The resulting HTML might look like this (HTML that is the result of a macro is in bold):
file1.htm
<html><head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ASCII" /></head>
<body>
<p>Here is the document you requested.</p>
<p>A Poem by Phil Boutros</p>
<p><a href="file2.htm">Next</a></p>
</body>
</html>
file2.htm
<html><head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ASCII" /></head>
<body>
<p><a href="file1.htm">Prev</a></p>
<p>Below is the document itself</p>
<p>Roses are red</p>
<p>Violets are blue</p>
<p><a href="file3.htm">Next</a></p>
</body>
</html>
file3.htm
<html><head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ASCII" /></head>
<body>
<p><a href="file2.htm">Prev</a></p>
<p>I'm a programmer</p>
<p>and so are you</p>
</body>
</html>
There are several things to note:
The page size option value does not apply to the text from the template, only the text inserted from the source document. Each page contains roughly 20 characters of visible input document text.
The {## insert} of the character set is part of the {## header} and therefore is inserted into all the output pages.
Text from the body of the unit is inserted sequentially. Thus "as is" template text such as the line "<p>Below is the document itself</p>" is only inserted once.
The {## anchor} tags only insert links to the previous/next page if there actually is a previous/next page. Thus the first page does not have a link to the non-existent previous page.
Finally, the output of the document is split according to the page breaking rules.
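The points above can be illustrated with a small model. The following Python sketch is not HTML Export's algorithm: the actual page breaking rules are not described here, so a simple greedy rule is assumed, and the function name, file naming, and markup are hypothetical. It shows only the behavior noted above: template text does not count toward the page size, header and footer repeat on every page, body text appears once, and Prev/Next links appear only when the neighboring page exists.

```python
def paginate(body_items, page_size):
    """Split body items into pages, then wrap each page in a header and footer.

    body_items: list of (text, from_document) pairs in template order.
                Only text with from_document=True counts toward page_size.
    page_size:  rough budget of document characters per page.
    """
    # 1. Assumed greedy rule: start a new page when the next item would
    #    push the document text on the current page past the budget.
    pages, current, used = [], [], 0
    for text, from_document in body_items:
        cost = len(text) if from_document else 0
        if used > 0 and used + cost > page_size:
            pages.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        pages.append(current)

    # 2. Header and footer repeat on every page; each anchor is emitted
    #    only when the page it points to exists.
    output = []
    for number, items in enumerate(pages, start=1):
        lines = ["<html><body>"]                      # header
        if number > 1:
            lines.append(f'<a href="file{number - 1}.htm">Prev</a>')
        lines.extend(items)                           # this page's slice of the body
        if number < len(pages):
            lines.append(f'<a href="file{number + 1}.htm">Next</a>')
        lines.append("</body></html>")                # footer
        output.append(lines)
    return output

items = [("<p>Intro from the template</p>", False),
         ("A" * 10, True), ("B" * 10, True), ("C" * 15, True)]
for page in paginate(items, 20):
    print(page)
```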
The {## unit} macro is only required in templates that are designed to break pages based on size using the SCCOPT_EX_PAGESIZE (pageSize with Transformation Server) option. An example of a template that would not perform any size-based breaking is one that defines an HTML <frame>, but does not include any document content. Another example where size-based breaking might not be desired is a table of contents page, even though a table of contents page does contain document content.
A template that does not conform to the {## unit} format is not a size-based breaking template. Support for this type of template will continue for the indefinite future. A template is not considered a size-based breaking template if the first macro tag encountered is something other than {## unit}. This means that there cannot be any {## unit}, {## header} or {## footer} macros later in the template. The value of the SCCOPT_EX_PAGESIZE (pageSize with Transformation Server) option is ignored for this type of template.
All repeatable nodes have an associated index variable. For information about using index variable keywords such as next and last, see Section 10.5.1, "Indexes and Structure-Based Breaking." In addition to those index variable keywords, repeatable grid elements have four additional keywords. They are:
up
down
left
right
These keywords may only appear immediately after the grids node in the document tree. For example, grids.up.body is legal, but sections.left.grids.1.body is not. Use of these keywords is otherwise self-explanatory.
Note too that individual grids are only addressable relative to each other. In other words, while it is possible to specify the "up" grid, it is not possible to arbitrarily specify a grid directly (for example, "5, 7").
In order to support spreadsheets (and database files, though they are not as common), a new template-based navigation concept known as a "grid" has been introduced. Grids offer a way to consistently navigate a spreadsheet or database in an intuitive fashion.
Grids can be used to present the output of large spreadsheets in smaller pieces, so that less scrolling is necessary. They can also be used to help prevent the HTML versions of large spreadsheets from overwhelming browsers, potentially causing them to lock up. Grids can also be used to halt processing of large spreadsheets before they waste too much CPU time.
To use grids, the template author should use the new grid template element (see Section 10.3.3, "Element Definitions"). Grids may only be used in templates that have been enabled with the {## unit} template macro. It is also important to set the grid-related options (see the Options documentation for details).
The grid support has some important limitations:
The output file format and flavor are expected to support tables, although this is not required.
Grids are only used when converting spreadsheets and database input files. Grids are not available for word processing files at this time.
Due to size constraints, grid support works best if the contents of the cells in the input file do not make use of a lot of formatting (bold, special fonts, text color, etc.).
To further explain the grid system, consider a multi-sheet spreadsheet workbook as an example. Each sheet in the spreadsheet workbook is broken into a collection of grids. Each grid has a fixed maximum size and is a rectangular portion of the spreadsheet. The size of the grid is specified as a number of spreadsheet cells. For example, consider the following 7x10 spreadsheet:
If the OEM wanted to break it up into 3x4 grids, 9 grids would be produced as shown in the following diagrams:
Normally, all grids have the same number of cells. The exception is that grids at the right or bottom edge of the spreadsheet may be smaller than the normal size. Grids will never be larger than the requested size. For this reason, grids can easily be navigated by using "up", "down", "left" or "right". One thing that grids cannot do is address individual cells in a spreadsheet (except, of course, in the degenerate case of a grid whose size is 1 x 1).
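The grid count in the example follows from rounding up in each direction. The following Python sketch is an illustration only, with hypothetical names. The text does not say which dimension comes first in "7x10" and "3x4"; the sketch assumes both are columns by rows, and either consistent reading yields 9 grids.

```python
import math

def grid_layout(sheet_cols, sheet_rows, grid_cols, grid_rows):
    """Return the (cols, rows) size of every grid, one row of grids at a time."""
    across = math.ceil(sheet_cols / grid_cols)  # grids side by side
    down = math.ceil(sheet_rows / grid_rows)    # grids stacked vertically
    sizes = []
    for r in range(down):
        for c in range(across):
            # Edge grids get whatever cells are left, never more than requested.
            cols = min(grid_cols, sheet_cols - c * grid_cols)
            rows = min(grid_rows, sheet_rows - r * grid_rows)
            sizes.append((cols, rows))
    return sizes

sizes = grid_layout(7, 10, 3, 4)
print(len(sizes))            # 9
print(sizes[0], sizes[-1])   # (3, 4) (1, 2)
```

The top-left grid has the full requested size, while the bottom-right grid holds only the leftover column and rows.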
HTML Export does not force deck/page breaks between each grid. Therefore, if the template writer wants to limit each deck/page to only one grid, they should force the break in the template.
Not all output flavors supported by HTML Export support the creation of tables. If the output flavor does not support tables, HTML Export will still support grids. However, HTML Export's normal non-table output will be what is presented in grid form. For example, if "[A1]" represents the contents of cell A1, then we would export the following for a grid of size (2x2):
If grids.1.body is:
[A1]
[A2]
[B1]
[B2]
then grids.right.body is:
[C1]
[C2]
[D1]
[D2]
and grids.down.body is:
[A3]
[A4]
[B3]
[B4]
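The listings above can be reproduced with a short model of relative grid navigation. The following Python sketch is an illustration with hypothetical names, not HTML Export code. It assumes cells are listed column by column, as in the listings above, and handles only columns A through Z.

```python
# Relative moves between grids as (column step, row step).
MOVES = {"right": (1, 0), "left": (-1, 0), "down": (0, 1), "up": (0, -1)}

def move(position, direction):
    """Return the grid position reached by one relative move."""
    dc, dr = MOVES[direction]
    return (position[0] + dc, position[1] + dr)

def grid_cells(grid_col, grid_row, size=2):
    """List the cell labels of one grid, column by column.

    grid_col, grid_row: 0-based position of the grid; (0, 0) is the top-left grid.
    """
    cells = []
    for c in range(grid_col * size, (grid_col + 1) * size):
        letter = chr(ord("A") + c)  # only columns A..Z are handled here
        for r in range(grid_row * size, (grid_row + 1) * size):
            cells.append(f"[{letter}{r + 1}]")
    return cells

first = (0, 0)
print(grid_cells(*first))                 # ['[A1]', '[A2]', '[B1]', '[B2]']
print(grid_cells(*move(first, "right")))  # ['[C1]', '[C2]', '[D1]', '[D2]']
print(grid_cells(*move(first, "down")))   # ['[A3]', '[A4]', '[B3]', '[B4]']
```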
Through the use of templates, HTML Export users have infinite flexibility in the way they can present converted documents. Users typically use one of the following four strategies to select a template:
The simplest method is to use the internal template, which is built into HTML Export. This is the template used when the SCCOPT_EX_TEMPLATE (using Transformation Server, template) option is not set. This template produces a very basic, rudimentary presentation of the input document. The template is an external approximation of this internal document.
There are also sample templates shipped with HTML Export. These templates are designed to meet different needs for HTML Export users (polished navigation, simple HTML for document indexing engines, etc.).
With a bit more effort, the user can modify one of the sample templates shipped with HTML Export. Simple changes, such as adding graphics or static text, should be easily accomplished by someone with a willingness to experiment with these templates.
Advanced users may choose to write a template of their own design, customized specifically to their needs. Such templates can incorporate elements from a wide range of Web standards, such as Java. Needless to say, users who go this route should have strong technical skills at the outset. They should begin the process of creating templates by reading through this chapter in its entirety and looking at the template tutorial.
For non-Unicode templates, the content of the template is copied byte for byte to the output files as needed. Of particular note is the fact that no character mapping takes place on the text in the template file. However, this can create problems when the source input document overrides the requested SCCOPT_EX_OUTPUTCHARACTERSET (using Transformation Server, outputCharacterSet) option setting. To solve this problem, users may use templates written in Unicode.
In order for HTML Export to know that a template is encoded in Unicode, the template file must begin with the Unicode Byte Order Mark (BOM). All files beginning with the BOM are assumed to be encoded in Unicode. HTML Export automatically converts Unicode templates to the output character set as needed.
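Whether a template file begins with a BOM can be checked before it is handed to HTML Export. The following Python sketch is an illustration only. It checks for the two UTF-16 byte order marks, which is an assumption: this section does not state which Unicode encodings HTML Export recognizes.

```python
import codecs

# Assumption: only the two UTF-16 byte order marks are checked here.
_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

def starts_with_bom(data):
    """Return True if the bytes begin with a UTF-16 byte order mark."""
    return data.startswith(_BOMS)

# Python's "utf-16" codec writes a BOM at the start of the encoded bytes.
unicode_template = "{## unit}{## insert element=body}{## /unit}".encode("utf-16")
plain_template = b"{## unit}{## insert element=body}{## /unit}"

print(starts_with_bom(unicode_template))  # True
print(starts_with_bom(plain_template))    # False
```

For a file on disk, reading the first two bytes in binary mode and passing them to the function is enough.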