Sun Java System Portal Server 7.1 Technical Reference

extract-html-toc

The extract-html-toc function extracts the table of contents from the HTML headers and adds it to the resource description.

Parameters

The parameters used with the extract-html-toc function and their descriptions are:

truncate: The maximum number of bytes to extract.
level: Maximum HTML header level to extract. This parameter controls the depth of the table of contents.
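To illustrate how the two parameters interact, the following Python sketch approximates the behavior described above. It is not the Robot's implementation: the function name extract_html_toc, the TocExtractor class, the "; " separator between entries, and the UTF-8 byte counting are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class TocExtractor(HTMLParser):
    """Hypothetical helper: collects the text of <h1>..<hN> headers."""

    def __init__(self, level):
        super().__init__()
        self.level = level      # deepest header level to keep
        self.entries = []       # (header level, header text) pairs
        self._current = None    # level of the header being read, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            n = int(tag[1])
            if 1 <= n <= min(self.level, 6):
                self._current = n
                self._buf = []

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == f"h{self._current}":
            # Collapse runs of whitespace inside the header text.
            text = " ".join("".join(self._buf).split())
            if text:
                self.entries.append((self._current, text))
            self._current = None


def extract_html_toc(html, truncate=255, level=3):
    """Return header text up to `level`, cut to at most `truncate` bytes."""
    parser = TocExtractor(level)
    parser.feed(html)
    parser.close()
    # Assumption: entries are joined with "; " to form one string.
    toc = "; ".join(text for _, text in parser.entries)
    # Truncate by bytes, dropping any partial multibyte character.
    return toc.encode("utf-8")[:truncate].decode("utf-8", errors="ignore")
```

With level=3, a page containing h1, h2, and h4 headers would contribute only the h1 and h2 text, and a small truncate value would cut that text short.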

The Robot HTML Summarizer does not generate a description and partial text for some documents, such as text/HTML, application/x-maker, or x-frame. There are three possible causes:

For HTML or text: an unclosed JavaScript tag. This is an error that you need to fix in the HTML page itself.
Robot does not index the part of the HTML page that falls between stopindex and startindex.

For any file other than HTML or text, such as application/x-maker or x-frame, Robot uses a third-party converter to convert the file into HTML and then indexes the result. In some cases, the converter might not be able to generate the HTML, or it might generate an empty HTML body. In this case, Sun reports the problem to the third party for a fix or a patch.

Example

Generate fn=extract-html-toc truncate=255 level=3