The extract-html-text function extracts the first few characters of text from an HTML file, excluding the HTML tags, and adds the text to the resource description (RD). This permits the first part of a document’s text to be included in the RD. A content type may be specified to restrict the kind of URLs that are generated.
The parameters used with the extract-html-text function and their descriptions are:
truncate: The maximum number of bytes to extract.
skip-headings: Set to true to ignore any HTML headings that occur in the document.
type: Optional parameter. The content type to restrict generation to. If omitted, it will generate all URLs.
Generate fn=extract-html-text truncate=255 type=text/html skip-headings=true
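
The behavior described above can be approximated outside the robot with a short Python sketch. This is an illustration only, not the robot's actual implementation: the function name extract_html_text, its keyword arguments, and the choice of which tags count as headings or block boundaries are all assumptions made for the example. It strips tags, optionally drops text inside h1 through h6 elements, and truncates the result to a byte limit.

```python
from html.parser import HTMLParser

# Assumption: "headings" means the h1..h6 elements.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: these tags end a block, so a space is inserted after them.
BLOCK_TAGS = HEADING_TAGS | {"p", "div", "li", "tr", "title"}


class _TextCollector(HTMLParser):
    """Collects text nodes, optionally skipping text inside headings."""

    def __init__(self, skip_headings):
        super().__init__(convert_charrefs=True)
        self.skip_headings = skip_headings
        self.heading_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.heading_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.heading_depth > 0:
            self.heading_depth -= 1
        if tag in BLOCK_TAGS:
            # Keep words from adjacent blocks from running together.
            self.chunks.append(" ")

    def handle_data(self, data):
        if self.skip_headings and self.heading_depth > 0:
            return
        self.chunks.append(data)


def extract_html_text(html, truncate=255, skip_headings=False):
    """Hypothetical helper: return at most `truncate` bytes of tag-free text."""
    parser = _TextCollector(skip_headings)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    text = " ".join("".join(parser.chunks).split())
    # Truncate by bytes; drop any multi-byte character cut in half.
    return text.encode("utf-8")[:truncate].decode("utf-8", errors="ignore")
```

Truncating the UTF-8 encoding rather than the character string mirrors the byte-based limit of the truncate parameter; a character split at the boundary is discarded instead of producing invalid output.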