Support functions are used during filtering to manipulate or generate information on the resource. The robot can then process the resource by calling filtering functions. These functions can be used in enumeration and generation filters in the file filter.conf.
The assign-source function assigns a new value to a given information source. This function permits editing during the filtering process. The function can assign an explicit new value, or it can copy a value from another information source.
dst: Name of the source whose value is to be changed
value: Specifies an explicit value
src: Information source to copy to dst
You must specify either a value property or a src property, but not both.
Data fn=assign-source dst=type src=content-type
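The either/or rule for value and src can be modeled in a few lines of Python. This is only an illustrative sketch, not the robot's implementation: the resource is assumed to be a plain dictionary that maps source names to values, and assign_source is a hypothetical helper name.

```python
def assign_source(resource, dst, value=None, src=None):
    """Model of assign-source: set resource[dst] from an explicit value or another source.

    `resource` is assumed to be a dict mapping source names to values.
    """
    # Exactly one of value / src must be supplied.
    if (value is None) == (src is None):
        raise ValueError("specify either value or src, but not both")
    resource[dst] = value if value is not None else resource[src]
    return resource


# Mirrors: Data fn=assign-source dst=type src=content-type
resource = {"content-type": "text/html"}
assign_source(resource, dst="type", src="content-type")
```

After the call, the type source holds a copy of the content-type value. Passing both value and src, or neither, raises an error in this model.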
The assign-type-by-extension function uses the resource’s file name to determine its type and assigns this type to the resource for further processing.
The setup-type-by-extension function must be called during setup before assign-type-by-extension can be used.
Source of file name to compare. If you do not specify a source, the default is the resource’s path.
MetaData fn=assign-type-by-extension
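As a rough illustration of the lookup, the following Python sketch maps a file extension to a type. Several things here are assumptions rather than documented behavior: the EXTENSION_TYPES table is hypothetical (the real table is prepared by setup-type-by-extension and is not shown here), the resource is modeled as a dictionary, and the result is assumed to be stored under a source named type.

```python
import posixpath

# Hypothetical extension table; the real robot prepares its own table
# during setup-type-by-extension.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".pdf": "application/pdf",
    ".rtf": "application/rtf",
}


def assign_type_by_extension(resource, src="path"):
    """Set resource["type"] from the extension of resource[src], if it is known."""
    _, ext = posixpath.splitext(resource.get(src, ""))
    mime_type = EXTENSION_TYPES.get(ext.lower())
    if mime_type is not None:
        resource["type"] = mime_type
    return resource


resource = assign_type_by_extension({"path": "/docs/manual.PDF"})
```

In this model an unknown or missing extension leaves the resource unchanged, so a later function can still assign a type.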
The clear-source function deletes the specified data source. You typically do not need to call this function, because you can create or replace a source by using the assign-source function.
src: Name of the source to delete
The following example deletes the path source:
MetaData fn=clear-source src=path
The convert-to-html function converts the current resource into an HTML file for further processing if its type matches a specified MIME type. The conversion filter automatically detects the type of the file it is converting.
type: MIME type from which to convert
The following sequence of function calls causes the filter to convert all Adobe Acrobat PDF files, Microsoft RTF files, and FrameMaker MIF files to HTML, as well as any files whose type was not specified by the server that delivered them.
Data fn=convert-to-html type=application/pdf
Data fn=convert-to-html type=application/rtf
Data fn=convert-to-html type=application/x-mif
Data fn=convert-to-html type=unknown
The copy-attribute function copies the value from one field in the resource description into another.
src: Field in the resource description from which to copy
dst: Field in the resource description into which to copy the source
truncate: Maximum length of the source to copy
clean: Boolean property indicating whether to clean up truncated text so that no partial words remain. This property is false by default.
Generate fn=copy-attribute \
src=partial-text dst=description truncate=200 clean=true
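The interaction between truncate and clean can be sketched in Python as follows. This is a minimal model under stated assumptions, not the robot's code: the resource description is a dictionary, the length is counted in characters, and "cleaning" is assumed to mean dropping a trailing word fragment left by the cut.

```python
def copy_attribute(rd, src, dst, truncate=None, clean=False):
    """Copy rd[src] into rd[dst], optionally truncating to `truncate` characters.

    With clean=True, a word cut in half by the truncation is dropped.
    """
    text = rd[src]
    if truncate is not None and 0 <= truncate < len(text):
        cut = text[:truncate]
        # A word is split only when the characters on both sides of the cut are non-space.
        splits_word = bool(cut) and not cut[-1].isspace() and not text[truncate].isspace()
        if clean and splits_word:
            parts = cut.rsplit(None, 1)
            # Drop the trailing fragment only if a whole word remains before it.
            if len(parts) == 2:
                cut = parts[0]
        text = cut.rstrip()
    rd[dst] = text
    return rd


rd = {"partial-text": "Support functions manipulate resources"}
copy_attribute(rd, src="partial-text", dst="description", truncate=12, clean=True)
```

With truncate=12 the raw cut is "Support func". Because clean is true, the fragment "func" is dropped and the description becomes "Support".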
The generate-by-exact function generates a source with a specified value, but only if an existing source exactly matches another value.
dst: Name of the source to generate
value: Value to assign dst
src: Source against which to match
match: Value to compare to src
The following example sets the classification to Siroe if the host source exactly matches www.siroe.com:80.
Generate fn="generate-by-exact" match="www.siroe.com:80" src="host" value="Siroe" dst="classification"
The generate-by-prefix function generates a source with a specified value if the prefix of an existing source matches another value.
dst: Name of the source to generate
value: Value to assign dst
src: Source against which to match
match: Value to compare to src
The following example sets the classification to World Wide Web if the protocol begins with the prefix http:
Generate fn="generate-by-prefix" match="http" src="protocol" value="World Wide Web" dst="classification"
The generate-by-regex function generates a source with a specified value if an existing source matches a regular expression.
dst: Name of the source to generate
value: Value to assign dst
src: Source against which to match
match: Regular expression string to compare to src
The following example sets the classification to Siroe if the host name matches the regular expression *.siroe.com. For example, resources at both developer.siroe.com and home.siroe.com are classified as Siroe:
Generate fn="generate-by-regex" match="\\*.siroe.com" src="host" value="Siroe" dst="classification"
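The generate-by-exact, generate-by-prefix, and generate-by-regex functions differ only in how src is compared to match. The Python sketch below models that difference. It rests on assumptions: generate_by is a hypothetical helper, the resource description is a dictionary, and the regex mode uses Python's re syntax with a whole-string match, which may not correspond to the robot's own pattern dialect.

```python
import re


def generate_by(rd, mode, match, src, value, dst):
    """Set rd[dst] = value when rd[src] matches `match` under the given mode.

    Returns True if the source was generated, False otherwise.
    """
    candidate = rd.get(src, "")
    if mode == "exact":
        matched = candidate == match
    elif mode == "prefix":
        matched = candidate.startswith(match)
    elif mode == "regex":
        # Assumption: the whole value must match the pattern.
        matched = re.fullmatch(match, candidate) is not None
    else:
        raise ValueError(f"unknown mode: {mode}")
    if matched:
        rd[dst] = value
    return matched


rd = {"host": "developer.siroe.com", "protocol": "http"}
generate_by(rd, "regex", r".*\.siroe\.com", "host", "Siroe", "classification")
```

Here the pattern is written as a Python regular expression rather than in the filter.conf notation. In this model, a host of www.siroe.com would not satisfy an exact match against www.siroe.com:80, because exact mode compares the entire string.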
The generate-md5 function generates an MD5 checksum and adds it to the resource. You can then use the filter-by-md5 function to deny resources with duplicate MD5 checksums.
None
Data fn=generate-md5
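To show why a checksum is useful for duplicate detection, here is a small Python sketch using the standard hashlib module. The source names md5 and content, the choice to hash the raw content bytes, and the is_duplicate helper are all assumptions made for illustration. The sketch does not describe how filter-by-md5 is implemented.

```python
import hashlib


def generate_md5(resource):
    """Store the MD5 hex digest of the resource content under the 'md5' source."""
    resource["md5"] = hashlib.md5(resource["content"]).hexdigest()
    return resource


def is_duplicate(resource, seen):
    """Return True if this checksum was seen before; otherwise remember it."""
    checksum = resource["md5"]
    if checksum in seen:
        return True
    seen.add(checksum)
    return False


seen = set()
first = generate_md5({"content": b"hello"})
second = generate_md5({"content": b"hello"})
first_is_dup = is_duplicate(first, seen)
second_is_dup = is_duplicate(second, seen)
```

Two resources with identical content produce the same checksum, so the second one is reported as a duplicate and could be denied.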
The generate-rd-expires function generates an expiration date and adds it to the specified source. The function uses metadata such as the HTTP header and HTML <META> tags to obtain any expiration data from the resource. If none exists, the function generates an expiration date three months from the current date.
Name of the source. If you omit it, the source defaults to rd-expires.
Generate fn=generate-rd-expires
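The fallback logic can be sketched in Python as follows. This is a simplified model with labeled assumptions: metadata is taken to be a dictionary that already holds a parsed expires value when the resource supplies one, the resource description is a dictionary, and "three months" is approximated as 90 days because the exact calculation is not stated here.

```python
from datetime import datetime, timedelta


def generate_rd_expires(rd, metadata, now, dst="rd-expires"):
    """Use the expiration date from metadata if present, else now plus about three months."""
    expires = metadata.get("expires")
    if expires is None:
        # Assumption: "three months" is approximated as 90 days.
        expires = now + timedelta(days=90)
    rd[dst] = expires
    return rd


now = datetime(2024, 1, 1)
rd = generate_rd_expires({}, {}, now)
```

When the metadata carries its own expiration date, that value is stored unchanged. The default is used only when none exists.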
The generate-rd-last-modified function adds the current time to the specified source.
Name of the source. If you omit it, the source defaults to rd-last-modified.
Generate fn=generate-rd-last-modified
The rename-attribute function changes the name of a field in the resource description. The function is most useful in cases where, for example, the extract-html-meta function copies information from a <META> tag into a field and you want to change the name of the field.
src: String containing a mapping from one name to another
The following example renames an attribute from author to author-name:
Generate fn=rename-attribute src="author->author-name"
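The mapping string can be read as "old name, arrow, new name". The Python sketch below parses one such mapping and applies it. It is illustrative only: the resource description is assumed to be a dictionary, a single mapping per string is assumed, and a missing field is treated as a no-op, none of which is confirmed by this section.

```python
def rename_attribute(rd, mapping):
    """Rename a field in rd according to a mapping string such as 'old->new'."""
    old, sep, new = mapping.partition("->")
    if not sep or not old or not new:
        raise ValueError(f"expected 'old->new', got {mapping!r}")
    # Move the value to the new name only if the old field exists.
    if old in rd:
        rd[new] = rd.pop(old)
    return rd


rd = rename_attribute({"author": "J. Smith"}, "author->author-name")
```

After the call, the value that was stored under author is available under author-name, and the old field is gone.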