Sun Java System Portal Server 7.1 Administration Guide

extract-html-text

The extract-html-text function extracts the first part of the text from an HTML file, excluding the HTML tags, and adds that text to the resource description (RD). The amount of text extracted is limited by the truncate property. A content type can be specified to restrict the kind of URLs that are generated.

Properties

truncate

The maximum number of bytes to extract

skip-headings

Set to true to ignore any HTML headings that occur in the document

type

Optional property. The content type, such as text/html, to which generation is restricted. If omitted, all URLs are generated

Example

Generate fn=extract-html-text truncate=255 type=text/html skip-headings=true
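
The following Python sketch illustrates the kind of processing described above: dropping the tags, optionally skipping headings, and truncating the result to a byte limit. It is not the Portal Server implementation. The function name extract_html_text, the treatment of h1 through h6 as the headings to skip, the exclusion of script and style content, and the use of UTF-8 for the byte count are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: "headings" means the h1..h6 elements.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: script and style content is never treated as document text.
IGNORED_TAGS = {"script", "style"}


class _TextCollector(HTMLParser):
    """Collects text nodes, ignoring everything inside skipped elements."""

    def __init__(self, skip_headings):
        super().__init__(convert_charrefs=True)
        self._skipped = set(IGNORED_TAGS)
        if skip_headings:
            self._skipped |= HEADING_TAGS
        self._depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._skipped:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self._skipped and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth == 0:
            self.chunks.append(data)


def extract_html_text(html, truncate=255, skip_headings=False):
    """Hypothetical helper: return at most `truncate` bytes of tag-free text."""
    parser = _TextCollector(skip_headings)
    parser.feed(html)
    parser.close()
    # Join text nodes with a space, then collapse runs of whitespace.
    text = " ".join(" ".join(parser.chunks).split())
    # Truncate by bytes (UTF-8 assumed); drop a partially cut trailing character.
    raw = text.encode("utf-8")[:truncate]
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    page = "<h1>Title</h1><p>Hello world</p>"
    print(extract_html_text(page, truncate=255, skip_headings=True))
```

Calling the helper with truncate=255 and skip_headings=True mirrors the property values in the example above. The type property has no counterpart in this sketch, because it selects which documents are processed rather than how the text is extracted.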