Sun Java System Portal Server 6 2005Q1 Technical Reference Guide 

Chapter 52
Robot Application Functions - Filtering Functions

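This chapter contains the following sections:

Introduction
filter-by-exact
filter-by-max
filter-by-md5
filter-by-prefix
filter-by-regex
filterrules-process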


Introduction

The functions discussed in this chapter operate at the Metadata and Data stages. Each one allows or denies resources based on criteria set by the function and its parameters.

These functions can be used in both Enumeration and Generation filters in the filter.conf file.

Each “filter-by” function performs a comparison, then either allows or denies the resource. Allowing the resource means that processing continues to the next filtering step. Denying the resource means that processing stops, because the resource does not meet the criteria for further enumeration or generation.
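The allow/deny flow described above can be sketched in Python. This is a minimal model, not the robot's implementation: the names `run_filter`, `deny_plain_text`, and `allow_small`, and the use of a dictionary to represent a resource, are assumptions made for illustration.

```python
from typing import Callable, Iterable

# Hypothetical model: a resource is a dict of metadata fields, and each
# filter step returns True (allow) or False (deny).
Resource = dict
FilterStep = Callable[[Resource], bool]


def run_filter(resource: Resource, steps: Iterable[FilterStep]) -> bool:
    """Return True only if every step allows the resource."""
    for step in steps:
        if not step(resource):
            # Deny: processing stops here and later steps never run.
            return False
    # Every step allowed the resource, so processing continues.
    return True


def deny_plain_text(resource: Resource) -> bool:
    # Hypothetical step: deny resources whose type is text/plain.
    return resource.get("type") != "text/plain"


def allow_small(resource: Resource) -> bool:
    # Hypothetical step: allow resources up to a size limit of 1024.
    return resource.get("content-length", 0) <= 1024


steps = [deny_plain_text, allow_small]
html_allowed = run_filter({"type": "text/html", "content-length": 512}, steps)
plain_allowed = run_filter({"type": "text/plain", "content-length": 512}, steps)
```

In this model the second resource is rejected by the first step, so the size check is never evaluated for it.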


filter-by-exact

The filter-by-exact function allows or denies the resource if the allow/deny string matches the source of information exactly. The keyword all matches any string.

Parameters

The following table lists the parameters used with the filter-by-exact function. The table contains two columns. The first column lists the parameter, and the second column provides a description.

Parameter     Description
src           Source of information.
allow/deny    Contains a string.

Example

The following example filters out all resources whose content-type is text/plain. It allows all other resources to proceed:

Data fn=filter-by-exact src=type deny=text/plain
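As a rough illustration of the comparison, the following Python sketch models an exact-match decision. The function name and signature are hypothetical. It also assumes that a resource which does not match an allow string is denied; the text above only confirms the converse case, where a resource that does not match a deny string is allowed.

```python
from typing import Optional


def filter_by_exact(value: str, allow: Optional[str] = None,
                    deny: Optional[str] = None) -> bool:
    """Return True to allow the resource, False to deny it.

    Assumption: with allow=..., non-matching values are denied;
    with deny=..., non-matching values are allowed.
    """
    if (allow is None) == (deny is None):
        raise ValueError("pass exactly one of allow or deny")
    pattern = allow if allow is not None else deny
    # The keyword "all" matches any string; otherwise compare exactly.
    matched = pattern == "all" or value == pattern
    return matched if allow is not None else not matched


# Mirrors the directive above: deny=text/plain
plain = filter_by_exact("text/plain", deny="text/plain")
html = filter_by_exact("text/html", deny="text/plain")
```

Here `plain` is a deny decision and `html` is an allow decision, matching the behavior described for the example directive.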


filter-by-max

The filter-by-max function allows the resource if the specified information source is less than or equal to the given value. It denies the resource if the information source is greater than the specified value.

This function can be called no more than once per filter.

Parameters

The following table lists the parameters used with the filter-by-max function. The table contains two columns. The first column lists the parameter, and the second column provides a description.

Parameter     Description
src           Source of information. It must be one of the following: hosts, objects, or depth.
value         Specifies a value for comparison.

Example

This example allows resources whose content-length is no greater than 1024 K:

MetaData fn=filter-by-max src=content-length value=1024
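The boundary behavior of the comparison can be shown with a short Python sketch. The function name and the depth limit of 3 are hypothetical; the only rule taken from the text is that a value equal to the limit is still allowed.

```python
def filter_by_max(current: int, value: int) -> bool:
    """Allow (True) when current <= value, deny (False) otherwise."""
    return current <= value


MAX_DEPTH = 3  # hypothetical limit, standing in for value=3 with src=depth
# Decisions for depths 1 through 5.
decisions = [filter_by_max(depth, MAX_DEPTH) for depth in range(1, 6)]
```

Depths 1 through 3 are allowed in this sketch, and depths 4 and 5 are denied.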


filter-by-md5

The filter-by-md5 function allows only the first resource with a given MD5 checksum value. If the current resource’s MD5 checksum has already been seen in an earlier resource by this robot, the current resource is denied. This prevents duplicates, whether they are identical resources or a single resource reachable through multiple URLs.

You can call this function only at the Data stage or later. It can be called no more than once per filter. The filter must invoke the generate-md5 function to generate an MD5 checksum before invoking the filter-by-md5 function.

Parameters

none

Example

The following example shows the typical method of handling MD5 checksums by first generating the checksum and then filtering based on it:

Data fn=generate-md5

Data fn=filter-by-md5
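The generate-then-filter sequence can be modeled in Python with the standard `hashlib` module. This is a sketch of the idea only: the `Md5Filter` class, the `generate_md5` helper, and the in-memory set of seen checksums are assumptions, and the robot's own storage of checksums is not described here.

```python
import hashlib


def generate_md5(content: bytes) -> str:
    """Compute the MD5 checksum of a resource body as a hex string."""
    return hashlib.md5(content).hexdigest()


class Md5Filter:
    """Allow only the first resource seen with a given checksum."""

    def __init__(self) -> None:
        self._seen = set()  # checksums of resources already allowed

    def allow(self, checksum: str) -> bool:
        if checksum in self._seen:
            return False  # duplicate content: deny
        self._seen.add(checksum)
        return True  # first occurrence: allow


md5_filter = Md5Filter()
page = b"<html><body>Same page</body></html>"

# The same content fetched through two different URLs.
first = md5_filter.allow(generate_md5(page))
second = md5_filter.allow(generate_md5(page))
other = md5_filter.allow(generate_md5(b"<html><body>Different</body></html>"))
```

The checksum is computed before the filter decision, in the same order as the two directives above.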


filter-by-prefix

The filter-by-prefix function allows or denies the resource if the given information source begins with the specified prefix string. The resource doesn’t have to match completely. The keyword all matches any string.

Parameters

The following table lists the parameters used with the filter-by-prefix function. The table contains two columns. The first column lists the parameter, and the second column provides a description.

Parameter     Description
src           Source of information.
allow/deny    Contains a string for prefix comparison.

Example

The following example allows resources whose content-type is any kind of text, including text/html and text/plain:

MetaData fn=filter-by-prefix src=type allow=text
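A prefix comparison of this kind might look as follows in Python. The function name is hypothetical, and the sketch assumes that a value which does not match an allow prefix is denied.

```python
from typing import Optional


def filter_by_prefix(value: str, allow: Optional[str] = None,
                     deny: Optional[str] = None) -> bool:
    """Return True to allow the resource, False to deny it.

    Assumption: with allow=..., values without the prefix are denied;
    with deny=..., values without the prefix are allowed.
    """
    if (allow is None) == (deny is None):
        raise ValueError("pass exactly one of allow or deny")
    prefix = allow if allow is not None else deny
    # The keyword "all" matches any string; otherwise test the prefix only.
    matched = prefix == "all" or value.startswith(prefix)
    return matched if allow is not None else not matched


# Mirrors the directive above: allow=text
types = ["text/html", "text/plain", "image/gif"]
results = {t: filter_by_prefix(t, allow="text") for t in types}
```

Both text types pass in this sketch because only the beginning of the string is compared, while `image/gif` does not.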


filter-by-regex

The filter-by-regex function supports regular expression pattern matching. It allows or denies resources that match the given regular expression. The supported regular expression syntax is defined by the POSIX.1 specification. The regular expression \\* matches anything.

Parameters

The following table lists the parameters used with the filter-by-regex function. The table contains two columns. The first column lists the parameter, and the second column provides a description.

Parameter     Description
src           Source of information.
allow/deny    Contains a regular expression string.

Example

The following example denies all resources from sites in the government domain:

MetaData fn=filter-by-regex src=host deny=\\*.gov
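The deny decision in this example can be approximated in Python. Several assumptions apply: Python's `re` module is a stand-in and does not implement POSIX.1 syntax, the pattern `.*\.gov` is an assumed Python equivalent of the directive's `\\*.gov`, the match is assumed to cover the whole host name, and the function name is hypothetical.

```python
import re


def filter_by_regex(value: str, pattern: str, mode: str) -> bool:
    """Return True to allow the resource, False to deny it.

    mode is "allow" or "deny". Assumption: the pattern must match
    the entire value, and non-matching values get the opposite decision.
    """
    if mode not in ("allow", "deny"):
        raise ValueError("mode must be 'allow' or 'deny'")
    matched = re.fullmatch(pattern, value) is not None
    return matched if mode == "allow" else not matched


# Assumed Python rendering of the directive's pattern for .gov hosts.
GOV_HOSTS = r".*\.gov"

gov = filter_by_regex("www.nasa.gov", GOV_HOSTS, "deny")
com = filter_by_regex("www.example.com", GOV_HOSTS, "deny")
```

Under these assumptions the `.gov` host is denied and the `.com` host is allowed to proceed.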


filterrules-process

The filterrules-process function processes the rules in the filterrules.conf file.

Parameters

none

Example

MetaData fn=filterrules-process





Part No: 817-7696.   Copyright 2005 Sun Microsystems, Inc. All rights reserved.