Sun Java System Portal Server 6 2005Q1 Technical Reference Guide
Chapter 55
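Robot Application Functions - Generation Functions
This chapter describes the following functions: extract-full-text, extract-html-meta, extract-html-text, extract-html-toc, extract-source, and harvest-summarizer.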
Introduction
The following functions are used in the Generate stage of filtering. Generation functions produce information that goes into a resource description (RD). In general, they either extract information from the body of the resource itself or copy information from the resource’s metadata.
extract-full-text
The extract-full-text function extracts the complete text of the resource and adds it to the resource description.
Parameters
The following table lists the parameters used with the extract-full-text function.
Parameter    Description
truncate     The maximum number of characters to extract from the resource.
dst          Name of the schema item that will receive the full text.
Example
Generate fn=extract-full-text
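The effect of the truncate and dst parameters can be illustrated with a minimal Python sketch. This is not the robot's implementation: the function name, the dictionary used as a resource description, and the default item name "full-text" are all assumptions made for illustration.

```python
def extract_full_text(body, rd, truncate=None, dst="full-text"):
    """Copy at most `truncate` characters of `body` into rd[dst].

    `rd` is a plain dict standing in for a resource description;
    the default `dst` name is hypothetical.
    """
    text = body if truncate is None else body[:truncate]
    rd[dst] = text
    return rd


# Keep only the first 13 characters and store them under a custom item name.
rd = extract_full_text("Portal Server robot documentation", {},
                       truncate=13, dst="partial-text")
print(rd)
```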
extract-html-meta
The extract-html-meta function extracts any <META> or <TITLE> information from an HTML file and adds it to the resource description. A content type can be specified to restrict the kinds of URLs for which this information is generated.
Parameters
The following table lists the parameters used with the extract-html-meta function.
Parameter    Description
truncate     The maximum number of bytes to extract.
type         Optional. Content type to which generation is restricted, for example text/html. If omitted, the function applies to all URLs.
Example
Generate fn=extract-html-meta truncate=255 type=text/html
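To make the kind of information this function collects more concrete, here is a rough Python sketch built on the standard html.parser module. It only approximates the behavior described above: the class and function names are hypothetical, the result is a plain dict rather than a real resource description, and truncation is applied per field in characters rather than bytes.

```python
from html.parser import HTMLParser


class MetaTitleExtractor(HTMLParser):
    """Collect <TITLE> text and <META NAME=... CONTENT=...> pairs."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
            self.fields.setdefault("title", "")
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.fields[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append.
        if self._in_title:
            self.fields["title"] += data


def extract_html_meta(html, truncate=255):
    """Return a dict of title/meta values, each cut to `truncate` characters."""
    parser = MetaTitleExtractor()
    parser.feed(html)
    parser.close()
    return {name: value.strip()[:truncate] for name, value in parser.fields.items()}


page = ('<html><head><TITLE>Robot Guide</TITLE>'
        '<META NAME="Keywords" CONTENT="robot, search"></head>'
        '<body>Hi</body></html>')
print(extract_html_meta(page))
```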
extract-html-text
The extract-html-text function extracts the first few characters of text from an HTML file, excluding the HTML tags, and adds the text to the resource description. This permits the first part of a document’s text to be included in the RD. A content type can be specified to restrict the kinds of URLs for which this text is generated.
Parameters
The following table lists the parameters used with the extract-html-text function.
Parameter        Description
truncate         The maximum number of bytes to extract.
skip-headings    Set to true to ignore any HTML headings that occur in the document.
type             Optional. Content type to which generation is restricted, for example text/html. If omitted, the function applies to all URLs.
Example
Generate fn=extract-html-text truncate=255 type=text/html skip-headings=true
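The interaction between truncate and skip-headings might look roughly like the following Python sketch. It is an approximation under stated assumptions, not the robot's code: the names are hypothetical, headings are taken to mean the H1 through H6 elements, script, style, and title content is dropped, and the limit is counted in characters rather than bytes.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
IGNORED_TAGS = {"script", "style", "title"}  # assumed to be non-body text


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping ignored elements and optionally headings."""

    def __init__(self, skip_headings=False):
        super().__init__()
        self.skipped = IGNORED_TAGS | (HEADING_TAGS if skip_headings else set())
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skipped:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skipped and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_html_text(html, truncate=255, skip_headings=False):
    """Return the leading `truncate` characters of tag-free text."""
    parser = TextExtractor(skip_headings)
    parser.feed(html)
    parser.close()
    # Join chunks with a space, then collapse runs of whitespace.
    text = " ".join(" ".join(parser.chunks).split())
    return text[:truncate]


page = "<html><body><h1>Overview</h1><p>The robot <b>crawls</b> sites.</p></body></html>"
print(extract_html_text(page, skip_headings=True))
print(extract_html_text(page, truncate=8))
```

Joining chunks with a space keeps adjacent block elements from running together, at the cost of occasionally splitting a word that is broken up by inline markup.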
extract-html-toc
The extract-html-toc function extracts a table of contents from the HTML headings and adds it to the resource description.
Parameters
The following table lists the parameters used with the extract-html-toc function.
Parameter    Description
truncate     The maximum number of bytes to extract.
level        Maximum HTML header level to extract. This parameter controls the depth of the table of contents.
Example
Generate fn=extract-html-toc truncate=255 level=3
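As an illustration of how level limits the depth of the table of contents, the following Python sketch gathers heading text up to a given level. The names, the indented one-entry-per-line output format, and the character-based truncation are assumptions for this example, not the format the robot actually stores.

```python
from html.parser import HTMLParser


class TocExtractor(HTMLParser):
    """Collect (level, text) pairs for headings h1..h<max_level>."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level
        self.entries = []
        self._level = None  # level of the heading currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            if level <= self.max_level:
                self._level = level
                self._buf = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._buf).split())
            self.entries.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)


def extract_html_toc(html, truncate=255, level=3):
    """Return an indented table of contents, cut to `truncate` characters."""
    parser = TocExtractor(level)
    parser.feed(html)
    parser.close()
    lines = ["  " * (lvl - 1) + text for lvl, text in parser.entries]
    return "\n".join(lines)[:truncate]


page = "<h1>Guide</h1><p>x</p><h2>Install</h2><h4>Notes</h4>"
print(extract_html_toc(page, level=3))  # the h4 heading is deeper than level 3
```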
extract-source
The extract-source function extracts the specified values from the given sources and adds them to the resource description.
Parameters
The following table lists the parameter used with the extract-source function.
Parameter    Description
src          Comma-separated list of the source names whose values are added to the resource description.
Example
Generate fn=extract-source src="md5,depth,rd-expires,rd-last-modified"
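Conceptually, the src value in the example names several sources whose values are copied into the resource description. A minimal Python sketch of that copying step follows. The function name, the dicts standing in for the available sources and the RD, and the sample values are all hypothetical.

```python
def extract_source(sources, rd, src):
    """Copy each source named in the comma-separated `src` into `rd`.

    Names that are not present in `sources` are skipped in this sketch.
    """
    for name in (part.strip() for part in src.split(",")):
        if name and name in sources:
            rd[name] = sources[name]
    return rd


# Hypothetical values; only the source names come from the example above.
available = {"md5": "9e107d9d372bb6826bd81d3542a419d6", "depth": 2, "type": "text/html"}
print(extract_source(available, {}, "md5,depth,rd-expires,rd-last-modified"))
```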
harvest-summarizer
The harvest-summarizer function runs a Harvest summarizer on the resource and adds the result to the resource description.
To run Harvest summarizers, you must have $HARVEST_HOME/lib/gatherer in your path before you run the robot.
Parameters
The following table lists the parameter used with the harvest-summarizer function.
Parameter     Description
summarizer    Name of the Harvest summarizer to run, for example HTML.sum.
Example
Generate fn=harvest-summarizer summarizer=HTML.sum