The Rewriter modifies the URL portions of various elements that appear on a web page. The Rewriter comes with a default set of rules that determine which elements of a web page to rewrite. A collection of rules for various categories and subcategories is called a ruleset. Rulesets are defined in XML, and their structure is governed by a DTD.
The DTD is packaged as resources/RuleSet.dtd inside /opt/SUNWportal/web-src/WEB-INF/lib/rewriter.jar. Rulesets are used to identify URLs. By default, all strings in web content that begin with strings such as “/”, “../”, “http”, and “https” are considered to be URLs and are candidates for rewriting.
To configure the Rewriter for your implementation, you create a ruleset and define rules in the Rewriter section of the Portal Server Configuration in the administration console. See Administering the Rewriter Service for details on creating and modifying rulesets. You define multiple rules based on the content type in the web pages. For example, the rule required to rewrite HTML content would be different from the rule required to rewrite JavaScript content. Rewriter rules fall into the following categories:
Rules for HTML Content
Rules for JavaScript Content
Rules for XML Content
Because Wireless Markup Language (WML) is similar to HTML, HTML rules are applied to WML content.
No rules are required for CSS content.
The ruleset is an XML document, so it must be well-formed. When defining rules in a ruleset, follow these guidelines:
All rules need to be enclosed within the <ruleset> </ruleset> tags.
Include all rules to rewrite HTML content in the <HTML> </HTML> section of the ruleset.
Include all rules to rewrite JavaScript content in the <JSRules> </JSRules> section of the ruleset.
Include all rules to rewrite XML content in the <XML> </XML> section of the ruleset.
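Putting these guidelines together, an empty ruleset has roughly the shape below. This is a sketch that uses only the section element names given in this chapter; the DTD (resources/RuleSet.dtd in rewriter.jar) is authoritative and may require additional elements or attributes, so check it before loading a ruleset through the administration console.

```xml
<!-- Sketch of a ruleset skeleton. Section names follow this chapter;
     verify element and attribute names against RuleSet.dtd. -->
<ruleset>
  <HTML>
    <!-- Attribute, JSToken, Form, and Applet rules go here -->
  </HTML>
  <JSRules>
    <!-- Variable and Function rules go here -->
  </JSRules>
  <XML>
    <!-- TagText and Attribute rules go here -->
  </XML>
</ruleset>
```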
HTML content in web pages can be classified into attributes, JavaScript tokens, forms, and applets. Accordingly, the rules for HTML content are classified as attribute rules, JavaScript token rules, form rules, and applet rules.
Attribute rules identify the HTML attributes whose URL values the Rewriter rewrites. The Rewriter modifies each occurrence of a defined attribute by expanding or prefixing the existing URL. The default ruleset rewrites the following attributes:
action
background
codebase
code
href
src
value
imagePath
lowsrc
archive
The syntax for attribute rules is:
<Attribute name="name" [tag="tag"] [valuePatterns="patterns"] />
where name specifies the attribute, tag specifies the tag to which the attribute belongs (set to * to match all tags), and patterns specifies the possible patterns to match with the attribute. The tag and valuePatterns parameters are optional.
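To make the syntax concrete, here is a hypothetical HTML section with three attribute rules: href with no tag restriction, src limited to img tags, and a custom data-url attribute on any tag (tag="*"). The data-url attribute is an illustrative assumption, not a default entry, and valuePatterns is omitted because its pattern syntax is not described here.

```xml
<HTML>
  <!-- Rewrite href; no tag restriction is given -->
  <Attribute name="href" />
  <!-- Rewrite src only when it appears on img tags -->
  <Attribute name="src" tag="img" />
  <!-- Hypothetical custom attribute, matched on all tags -->
  <Attribute name="data-url" tag="*" />
</HTML>
```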
Web pages can contain pure JavaScript code within <SCRIPT> tags, or they can contain JavaScript tokens, such as event handlers, inside HTML elements. For example, a web page can contain an onClick handler that causes a jump to a different URL. For the page to function properly, the URL in the value of the onClick handler must be translated and rewritten. In most cases, the rules provided in the default ruleset are sufficient to rewrite the URLs in JavaScript tokens. The default ruleset rewrites the following JavaScript tokens:
onAbort
onBlur
onChange
onClick
onDblClick
onError
onFocus
onKeyDown
onKeyPress
onKeyUp
onLoad
onMouseDown
onMouseMove
onMouseOut
onMouseOver
onMouseUp
onReset
onSelect
onSubmit
onUnload
The syntax for JavaScript Token rules is:
<JSToken>javascript_function_name</JSToken>
where javascript_function_name is the name of the event handler, such as onLoad or onClick.
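As a sketch, the rule below (which mirrors a default entry) would match the onClick handler in the anchor shown in the comment. The anchor markup and its path are illustrative only.

```xml
<HTML>
  <!-- Illustrative target markup:
       <a href="#" onClick="location.href='/portal/next.html'">Next</a> -->
  <JSToken>onClick</JSToken>
</HTML>
```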
Users can browse HTML pages that contain forms. Form elements, such as input, can take a URL as a value. The default ruleset does not rewrite any form elements. The syntax for form rules is:
<Form source="/source.html" name="form1" field="field1" [valuePatterns="pattern"] />
where /source.html is the URL of the HTML page containing the form, form1 is the name of the form, field1 is the field of the form to be rewritten, and pattern indicates the part of the field to be rewritten. All content that follows the pattern specified is rewritten.
The valuePatterns parameter is optional.
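For illustration only, the rule below would rewrite a hidden field named returnURL in a form named login served from /portal/login.html. The page, form, and field names are hypothetical.

```xml
<HTML>
  <!-- Hypothetical form on /portal/login.html:
       <form name="login" action="/auth">
         <input type="hidden" name="returnURL" value="/portal/home.html">
       </form> -->
  <Form source="/portal/login.html" name="login" field="returnURL" />
</HTML>
```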
A single web page can contain many applets, and each applet can contain many parameters. The Rewriter rule for URLs in applets should contain pattern matching information for the following:
source, such as filename.htm
code, such as classname.class
parameter name, such as servername
parameter value, such as some_url
The Rewriter matches the values specified in the rule against the content of the applet and modifies the URLs as required. This replacement is performed on the server, not in the browser while the user views the page. The wildcard character (*) can also be used in a rule. For example, if the parameter name is *, the Rewriter does not compare parameter names in the applet.
The default ruleset does not rewrite any applet parameters.
The syntax for applet rules is:
<Applet source="/sourcehtml.jsp" code="class" param="parameter_name" [valuePatterns="pattern"] />
where /sourcehtml.jsp is the URL of the page containing the applet, class is the name of the applet class, parameter_name is the parameter whose value needs to be rewritten, and pattern indicates the part of the parameter value to be rewritten. All content that follows the specified pattern is rewritten. The valuePatterns parameter is optional.
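The sketch below shows two applet rules for a hypothetical page /chat/index.html that loads ChatApplet.class. The first rewrites only the servername parameter; the second uses the wildcard so that the Rewriter does not compare parameter names. The page, class, and parameter names are illustrative.

```xml
<HTML>
  <!-- Hypothetical applet on /chat/index.html:
       <applet code="ChatApplet.class">
         <param name="servername" value="http://chat.example.com/">
       </applet> -->
  <Applet source="/chat/index.html" code="ChatApplet.class" param="servername" />
  <!-- Wildcard: parameter names are not compared -->
  <Applet source="/chat/index.html" code="ChatApplet.class" param="*" />
</HTML>
```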
URLs can occur in many parts of JavaScript code, and the Rewriter cannot parse JavaScript directly to determine which portions are URLs. A special set of rules is therefore needed to help the JavaScript processor translate URLs.
JavaScript elements that contain URLs are classified as variables and function parameters.
JavaScript variables are further classified into five categories:
JavaScript URL Variables
JavaScript EXPRESSION Variables
JavaScript DHTML Variables
JavaScript DJS (Dynamic JavaScript) Variables
JavaScript System Variables
URL variables have a URL string on the right-hand side of an assignment. The default ruleset rewrites the following JavaScript URL variables:
imgsrc
location.href
_fr.location
mf.location
parent.location
self.location
The syntax of URL variables in JavaScript content rules is:
<Variable type="URL">variable_name</Variable>
where variable_name is the name of the variable to be rewritten.
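For example, two of the default URL variables can be written in this syntax as follows. The comment shows the kind of assignment each rule is meant to catch; the paths are illustrative.

```xml
<JSRules>
  <!-- Illustrative assignments:
       location.href = "/portal/home.html";
       parent.location = "/portal/logout.html"; -->
  <Variable type="URL">location.href</Variable>
  <Variable type="URL">parent.location</Variable>
</JSRules>
```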
EXPRESSION variables have an expression on the right-hand side whose result is a URL. Because the Rewriter cannot evaluate such expressions on the server, it appends to the HTML page a JavaScript function that converts the expression. This function takes the expression as a parameter and evaluates it in the client browser.
The default ruleset rewrites the location JavaScript EXPRESSION variable.
The syntax of EXPRESSION variables in JavaScript content rules is:
<Variable type="EXPRESSION">variable_exp</Variable>
where variable_exp is the expression variable.
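Written in this syntax, the default EXPRESSION entry looks like the rule below. The comment shows an assignment whose right-hand side must be evaluated in the browser; base and page are hypothetical variable names.

```xml
<JSRules>
  <!-- Illustrative assignment (base and page are hypothetical):
       location = base + "/" + page;
       The resulting URL is known only after the browser evaluates it. -->
  <Variable type="EXPRESSION">location</Variable>
</JSRules>
```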
DHTML variables are JavaScript variables that hold HTML content. The default ruleset rewrites the following JavaScript DHTML variables:
document.write
document.writeln
The syntax of DHTML variables in JavaScript content is:
<Variable type="DHTML">variable</Variable>
where variable is the DHTML variable.
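The two default DHTML entries, expressed in this syntax, are shown below alongside a sample call they target. The link path in the comment is illustrative.

```xml
<JSRules>
  <!-- Illustrative HTML written from script:
       document.write('<a href="/portal/help.html">Help</a>'); -->
  <Variable type="DHTML">document.write</Variable>
  <Variable type="DHTML">document.writeln</Variable>
</JSRules>
```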
DJS (Dynamic JavaScript) variables are JavaScript variables that hold JavaScript content.
The syntax of DJS variables in JavaScript content is:
<Variable type="DJS">variable</Variable>
where variable is the DJS variable.
The JavaScript code contained in the variable needs another rule to translate it.
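Because a DJS rule only marks a variable as containing JavaScript, the URLs inside that code must be matched by a second rule. The sketch below pairs a hypothetical DJS variable named handlerCode with a URL function rule for window.open (which the default ruleset already covers).

```xml
<JSRules>
  <!-- Hypothetical variable holding JavaScript code:
       var handlerCode = "window.open('/portal/help.html')"; -->
  <Variable type="DJS">handlerCode</Variable>
  <!-- Second rule that translates the URL inside that code -->
  <Function type="URL" name="window.open" paramPatterns="y" />
</JSRules>
```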
System variables are variables that are not declared by the user, but that are available as a part of the JavaScript standard.
The default ruleset rewrites the window.location.pathname JavaScript system variable.
The syntax of system variables in JavaScript content is:
<Variable type="SYSTEM">variable</Variable>
where variable is the system variable.
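Expressed in this syntax, the default system variable entry is the single rule below. The comment shows a script reference to the variable; the variable name p is illustrative.

```xml
<JSRules>
  <!-- Illustrative reference (p is a hypothetical name):
       var p = window.location.pathname; -->
  <Variable type="SYSTEM">window.location.pathname</Variable>
</JSRules>
```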
Function parameters are classified into four categories:
JavaScript URL Parameters
JavaScript EXPRESSION Parameters
JavaScript DHTML Parameters
JavaScript DJS Parameters
URL parameters are string parameters that directly contain the URL.
The default ruleset rewrites the following JavaScript URL parameters:
openURL
openAppURL
openNewWindow
parent.openNewWindow
window.open
The syntax for URL parameters is:
<Function type="URL" name="function" [paramPatterns="y,y,"] />
where function is the name of the function to be evaluated and each y marks the position of a parameter that needs to be rewritten. Parameter positions are delimited by commas, and an empty position is not rewritten. For example, in the syntax line the first and second parameters are rewritten, but the third parameter is not.
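As a sketch, the rules below apply paramPatterns to two functions. For window.open(url, name, features), only the first argument is rewritten. The second rule assumes a hypothetical page function loadFrames(leftUrl, rightUrl, title) and rewrites its first two arguments, mirroring the syntax line. Whether trailing empty positions must be spelled out (as in "y,,") is an assumption to verify against your release.

```xml
<JSRules>
  <!-- window.open(url, name, features): rewrite only the URL -->
  <Function type="URL" name="window.open" paramPatterns="y,," />
  <!-- Hypothetical loadFrames(leftUrl, rightUrl, title):
       rewrite both URLs, leave the title alone -->
  <Function type="URL" name="loadFrames" paramPatterns="y,y," />
</JSRules>
```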
EXPRESSION parameters are function parameters that result in a URL when they are evaluated. The syntax for EXPRESSION parameters is:
<Function type="EXPRESSION" name="function" [paramPatterns="y,y,"] />
where function is the name of the function to be evaluated and each y marks the position of a parameter that needs to be rewritten. Parameter positions are delimited by commas. For example, in the syntax line the first and second parameters are rewritten, but the third parameter is not.
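The sketch below assumes a hypothetical page function showPage(url, title) that is called with a computed URL. The EXPRESSION rule marks only the first argument.

```xml
<JSRules>
  <!-- Hypothetical call whose first argument is an expression:
       showPage(base + "/reports/" + id, "Reports"); -->
  <Function type="EXPRESSION" name="showPage" paramPatterns="y," />
</JSRules>
```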
DHTML parameters are the parameters of native JavaScript methods that generate HTML dynamically, such as the document.write() method.
The default ruleset rewrites the following JavaScript DHTML parameters:
document.write
document.writeln
The syntax for DHTML parameters is:
<Function type="DHTML" name="function" [paramPatterns="y,y,"] />
where function is the name of the function to be evaluated and each y marks the position of a parameter that needs to be rewritten. Parameter positions are delimited by commas. For example, in the syntax line the first and second parameters are rewritten, but the third parameter is not.
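The default document.write entry might be expressed as the rule below. The paramPatterns value is an assumption (the HTML string is typically the first argument), and the image path in the comment is illustrative.

```xml
<JSRules>
  <!-- Illustrative call:
       document.write('<img src="/images/logo.gif">');
       paramPatterns="y" is an assumed value -->
  <Function type="DHTML" name="document.write" paramPatterns="y" />
</JSRules>
```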
Dynamic JavaScript (DJS) parameters are function parameters that contain JavaScript code. Cascading Style Sheets (CSS) embedded in HTML are also translated; no rules are defined for CSS translation because URLs appear only in the CSS url() function. The syntax for DJS parameters is:
<Function type="DJS" name="function" [paramPatterns="y,y,"] />
where function is the name of the function to be evaluated and each y marks the position of a parameter that needs to be rewritten. Parameter positions are delimited by commas. For example, in the syntax line the first and second parameters are rewritten, but the third parameter is not.
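As an illustration, a DJS rule could mark the code-string argument of setTimeout, which takes JavaScript source as its first argument and a delay as its second. This rule is hypothetical and not part of the default ruleset; presumably, as with DJS variables, URLs inside that code still need their own rules.

```xml
<JSRules>
  <!-- Hypothetical: first argument of setTimeout is a string of code:
       setTimeout("location.href='/portal/home.html'", 5000); -->
  <Function type="DJS" name="setTimeout" paramPatterns="y," />
</JSRules>
```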
Web pages can contain XML content, which can in turn contain URLs. The Rewriter can rewrite URLs in XML content.
XML content that contains URLs is classified as tag text and attributes.
The Rewriter translates tag text based on the tag name.
The default ruleset rewrites the following tags in XML:
baseroot
img
The syntax for tag text is:
<TagText tag="tagname" attributePatterns="name=src"/>
where tagname is the name of the tag and src is the name of the attribute.
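Following that syntax, a TagText rule for the default img tag might be written as below. The XML in the comment is illustrative, and the exact matching behavior of attributePatterns is an assumption to confirm against RuleSet.dtd.

```xml
<XML>
  <!-- Illustrative target XML:
       <img src="/images/banner.gif"/> -->
  <TagText tag="img" attributePatterns="name=src"/>
</XML>
```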
The rules for attributes in XML are similar to the rules for attributes in HTML. See Attribute Rules for HTML Content for additional information. Rewriter translates attribute values based on the attribute and tag names.
The default ruleset rewrites the following attributes in XML:
xmlns
href
The syntax for attributes in XML is:
<Attributes> <Attribute name="attribute" [valuePatterns="name=src"] /> </Attributes>
where attribute is the name of the tag and src is the name of the attribute.
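Written in this syntax, the two default XML attribute entries would look like the fragment below, with name holding the attribute names listed above. The sample XML in the comment is illustrative.

```xml
<XML>
  <!-- Illustrative target XML:
       <channel xmlns="http://example.com/ns">
         <link href="/portal/news.xml"/>
       </channel> -->
  <Attributes>
    <Attribute name="xmlns"/>
    <Attribute name="href"/>
  </Attributes>
</XML>
```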