extract-html-text

The extract-html-text function extracts the first few characters of text from an HTML file, excluding the HTML tags, and adds that text to the resource description (RD). This lets the opening part of a document's text appear in the RD. A content type may be specified to restrict the kinds of URLs that are generated.

Parameters

The parameters used with the extract-html-text function are described below:

truncate: The maximum number of bytes to extract.
skip-headings: Set to true to ignore any HTML headings that occur in the document.
type: Optional. The content type used to restrict the URLs that are generated. If omitted, all URLs are generated.

Example

Generate fn=extract-html-text truncate=255 type=text/html skip-headings=true
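
To make the behavior of truncate and skip-headings concrete, the following Python sketch approximates what the description above implies. It is not the server's implementation: the function name extract_html_text, the helper class, the tag sets, and the treatment of h1 through h6 as "headings" are all assumptions made for illustration, and the type parameter is not modeled.

```python
from html.parser import HTMLParser

# Assumption: "headings" means the h1..h6 elements.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: script and style bodies are not document text.
IGNORED_TAGS = {"script", "style"}
# Tags treated as word boundaries so adjacent blocks do not run together.
BLOCK_TAGS = HEADING_TAGS | {"p", "div", "br", "li", "tr", "title"}


class _TextCollector(HTMLParser):
    """Collects character data while dropping tags (hypothetical helper)."""

    def __init__(self, skip_headings):
        super().__init__(convert_charrefs=True)
        self.skip_headings = skip_headings
        self.chunks = []
        self._suppress_depth = 0

    def _suppressed(self, tag):
        return tag in IGNORED_TAGS or (self.skip_headings and tag in HEADING_TAGS)

    def handle_starttag(self, tag, attrs):
        if self._suppressed(tag):
            self._suppress_depth += 1
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if self._suppressed(tag) and self._suppress_depth > 0:
            self._suppress_depth -= 1
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self._suppress_depth == 0:
            self.chunks.append(data)


def extract_html_text(html, truncate=255, skip_headings=False):
    """Return at most `truncate` bytes (UTF-8) of tag-free text from `html`."""
    parser = _TextCollector(skip_headings)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    text = " ".join("".join(parser.chunks).split())
    # Truncate by bytes, then drop any partial trailing character.
    return text.encode("utf-8")[:truncate].decode("utf-8", errors="ignore")
```

Under these assumptions, calling extract_html_text(page, truncate=255, skip_headings=True) corresponds roughly to the directive shown in the example: heading text is left out, and the remaining text is cut to at most 255 bytes.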
