Sun GlassFish Web Space Server 10.0 Secure Web Access Add-On Guide

Rules for HTML Content

HTML content in web pages can be further classified into attributes, forms, and applets. Accordingly, the rules for HTML content are classified as attribute rules, form rules, and applet rules.

Attribute Rules for HTML Content

This rule identifies the attributes of a tag whose values need to be rewritten. An attribute value can be a simple URL, JavaScript, or DHTML content.

This section describes the attribute rule syntax and provides two examples.

Attribute Rule Syntax

<Attribute name="attributeName" [tag="*" valuePatterns="" source="*" type="URL|DHTML|DJS"]/>

where,

attributeName is the name of the attribute (mandatory)

tag is the tag to which the attribute belongs (optional, default *, meaning any tag)

valuePatterns See Using Pattern Matching in Rules.

source specifies the URI of the page in which this attribute is defined (optional, default *, meaning any page)

type specifies the type of the value (optional). It can be one of the following:

URL - a simple URL (default value).

DHTML - DHTML content. This kind of content is seen in standard HTML content and is used in Microsoft’s HTC format files.

DJS - JavaScript content. All HTML event handlers such as onClick and onMouseover have JavaScript inlined with the HTML attribute.

Attribute Rule Example

Assume the base URL of the page is:

http://mymachine.intranet.com/mypage.html

Page Content:

<a href="http://mymachine.intranet.com/mypage.html">

Rules

<Attribute name="href"/>
or
<Attribute name="href" tag="a"/>

Output

<a href="gateway-URL/http://mymachine.intranet.com/mypage.html">

Description

Because the URL to be rewritten is already an absolute URL, only the Gateway URL is prefixed to the URL.
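
The prefixing step described above can be sketched in Python. This is only an illustration of the behavior, not the Rewriter implementation: the function name rewrite_url and the gateway address https://gateway.example.com are assumptions made for the example.

```python
from urllib.parse import urljoin

# Hypothetical gateway address, used only for illustration.
GATEWAY_URL = "https://gateway.example.com"


def rewrite_url(value: str, base_url: str, gateway_url: str = GATEWAY_URL) -> str:
    """Resolve value against the page's base URL, then prefix the gateway URL."""
    # urljoin returns an absolute URL unchanged and resolves relative ones.
    absolute = urljoin(base_url, value)
    return f"{gateway_url}/{absolute}"


base = "http://mymachine.intranet.com/mypage.html"
print(rewrite_url("http://mymachine.intranet.com/mypage.html", base))
# prints https://gateway.example.com/http://mymachine.intranet.com/mypage.html
```

Because the href value is already absolute, resolving it against the base URL changes nothing and only the gateway prefix is added.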

DJS Attribute Example

Assume the base URL of the page is:

http://abc.sesta.com/focus.html

Page Content:

<Form>

<input TYPE=TEXT SIZE=20 value=focus onClick="Check('/focus.html','focus');return;">

</Form>

Rules

<Attribute name="onClick" type="DJS"/>
<Function type="URL" name="Check" paramPatterns="y,"/>

Output

<Form>

<INPUT TYPE=TEXT SIZE=20 value=focus onClick="Check('gateway-URL/http://abc.sesta.com/focus.html','focus');return;">

</Form>

Description

Two rules are required to rewrite the specified page content. The first rule identifies the onClick JavaScript token. The second rule identifies the parameter of the Check function that needs to be rewritten. In this case, only the first parameter is rewritten because paramPatterns has the value y in the position of the first parameter.

The Gateway URL and the base URL of the page on which the JavaScript tokens appear are prefixed to the required parameter.
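
The effect of the two rules can be approximated with the following Python sketch. It is not the Rewriter algorithm. It assumes that the arguments are single-quoted string literals without embedded commas, that each comma-separated position in paramPatterns marked y is rewritten, and that the gateway address is the hypothetical https://gateway.example.com. The function name rewrite_call is also an assumption.

```python
import re
from urllib.parse import urljoin

# Hypothetical gateway address, used only for illustration.
GATEWAY_URL = "https://gateway.example.com"


def rewrite_call(script, func, param_patterns, base_url, gateway_url=GATEWAY_URL):
    """Rewrite the quoted arguments of func whose position is marked 'y'."""
    # "y," becomes [True, False]: rewrite the first parameter only.
    flags = [p.strip() == "y" for p in param_patterns.split(",")]

    def fix_args(match):
        args = match.group(1).split(",")  # naive split: no commas inside arguments
        for i, arg in enumerate(args):
            if i < len(flags) and flags[i] and len(arg) >= 2:
                quote, body = arg[0], arg[1:-1]
                args[i] = f"{quote}{gateway_url}/{urljoin(base_url, body)}{quote}"
        return f"{func}({','.join(args)})"

    return re.sub(rf"\b{re.escape(func)}\(([^)]*)\)", fix_args, script)


handler = "Check('/focus.html','focus');return;"
print(rewrite_call(handler, "Check", "y,", "http://abc.sesta.com/focus.html"))
```

The second argument is left untouched because its position in paramPatterns is empty.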

Form Rules for HTML Content

The HTML pages that a user browses may contain forms. Some form elements may take a URL as the value.

This section describes the form rule syntax and provides an example.

Form Rule Syntax

<Form name="form1" field="visit" [valuePatterns="" source="*"]/>

where

name is the name of the form (mandatory)

field is the field in the form whose value needs to be rewritten (mandatory)

valuePatterns See Using Pattern Matching in Rules

source is the URL of the html page where this form definition is present (optional, default *, meaning in any page)

Form Rule Example

Assume the base URL of the page is:

http://test.siroe.com/testcases/html/form.html

Page Content

Assume the page URI is form.html and is located in the root directory of the server.

<form name=form1 method=POST action="http://test.siroe.com/testcases/html/form.html">
<input type=hidden name=abc1 value="0|1234|/test.html">
</form>

To rewrite /test.html, which appears in the value of the hidden field named abc1 in form1, the following rules are needed.

Rules

<Form source="*/form.html" name="form1" 
field="abc1" valuePatterns="0|1234|"/>
<Attribute name="action"/>

Output

<FORM name="form1" method="POST" action="gateway-URL/http://test.siroe.com/testcases/html/form.html">
<input type=hidden name=abc1 value="0|1234|gateway-URL/http://test.siroe.com/test.html">
</FORM>

Description

The action attribute of the form tag is rewritten by the <Attribute name="action"/> rule.

The value attribute of the input tag is rewritten as shown in the output. Rewriter locates the specified valuePatterns in the value and rewrites all content that follows the match by prefixing the Gateway URL and the base URL of the page. See Using Pattern Matching in Rules.
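
The way a literal valuePatterns splits the field value can be sketched as follows. This is a minimal Python illustration rather than the Rewriter implementation. The function name rewrite_after_pattern and the gateway address are hypothetical, and leaving the value unchanged when the pattern is not found is an assumption.

```python
from urllib.parse import urljoin

# Hypothetical gateway address, used only for illustration.
GATEWAY_URL = "https://gateway.example.com"


def rewrite_after_pattern(value, pattern, base_url, gateway_url=GATEWAY_URL):
    """Keep everything up to and including pattern; rewrite the rest as a URL."""
    index = value.find(pattern)
    if index < 0:
        return value  # assumption: no match means the value is left as it is
    split = index + len(pattern)
    head, tail = value[:split], value[split:]
    return f"{head}{gateway_url}/{urljoin(base_url, tail)}"


base = "http://test.siroe.com/testcases/html/form.html"
print(rewrite_after_pattern("0|1234|/test.html", "0|1234|", base))
# prints 0|1234|https://gateway.example.com/http://test.siroe.com/test.html
```

With an empty pattern the split point is the start of the string, so the full value is rewritten.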

Applet Rules for HTML Content

A single web page may contain many applets, and each applet may contain many parameters. Rewriter matches the values specified in the rule with the HTML definition of the applet and modifies the URL values present as a part of the applet parameter definition. This replacement is carried out at the server and not when the user is browsing the particular web page. This rule identifies and rewrites the parameters in both the applet and object tags of the HTML content.

This section describes the applet rule syntax and provides an example.

Applet Rule Syntax

<Applet code="ApplicationClassName/ObjectID" param="parametername" [valuePatterns="" source="*"] />

where

code is the name of the applet or object class (mandatory)

param is the name of the parameter whose value needs to be rewritten (mandatory)

valuePatterns See Using Pattern Matching in Rules.

source is the URL of the page that contains the applet definition (optional, default *, meaning any page)

Applet Rule Example

Assume the base URL of the page is:

http://abc.siroe.com/casestudy/test/HTML/applet/rule1.html

Page Content:

<applet codebase="appletcode" code="RewriteURLinApplet.class" archive="/test.jar">
<param name=Test1 value="/index.html">
</applet>

Rules

<Applet source="*/rule1.html" code="RewriteURLin*.class" param="Test*"/>

Output

<APPLET codebase="gateway-URL/http://abc.siroe.com/casestudy/test/HTML/applet/appletcode" code="RewriteURLinApplet.class" archive="/test.jar">
<param name="Test1" value="gateway-URL/http://abc.siroe.com/index.html">
</APPLET>

Description

The codebase attribute is rewritten because <Attribute name="codebase"/> is a defined rule in the default_gateway_ruleset.

All parameters whose names begin with Test are rewritten. The Gateway URL and the base URL of the page on which the applet code appears are prefixed to the value attribute of the param tag.
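
The wildcard matching of the source, code, and param fields in this example can be approximated in Python with the standard fnmatch module. This is only a sketch: fnmatch also treats ? and [...] as special, which this guide does not describe for Rewriter rules, and the function name applet_rule_matches is hypothetical.

```python
from fnmatch import fnmatchcase


def applet_rule_matches(rule: dict, page_url: str, code: str, param: str) -> bool:
    """Return True when every field of the rule matches the applet parameter."""
    return (
        fnmatchcase(page_url, rule.get("source", "*"))  # source defaults to *
        and fnmatchcase(code, rule["code"])
        and fnmatchcase(param, rule["param"])
    )


rule = {"source": "*/rule1.html", "code": "RewriteURLin*.class", "param": "Test*"}
page = "http://abc.siroe.com/casestudy/test/HTML/applet/rule1.html"

print(applet_rule_matches(rule, page, "RewriteURLinApplet.class", "Test1"))  # prints True
print(applet_rule_matches(rule, page, "RewriteURLinApplet.class", "Other"))  # prints False
```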

Using Pattern Matching in Rules

You can use the valuePatterns field to achieve pattern matching and identify the specific parts of a statement that need to be rewritten.

If you have specified valuePatterns as part of a rule, all the content that follows the matched pattern is rewritten.

Consider the sample form rule below.

<Form source="*/source.html" name="form1" field="visit" [valuePatterns="0|1234|"]/>

where

source is the URL of the HTML page where the form appears.

name is the name of the form.

field is the field in the form whose value needs to be rewritten.

valuePatterns identifies the portion of the string after which content is rewritten. All content that follows the matched pattern is rewritten (optional; the default "" means that the full value is rewritten).

Specifying Specialized Characters in valuePatterns

You can specify specialized characters by escaping them with a backslash. For example:

<Form source="*/source.html" name="form1" field="visit" [valuePatterns="0|1234|\\;original text|changed text"]/>

Using Wild Cards in valuePatterns

You can use the wildcard asterisk (*) character to achieve pattern matching for rewriting.

You cannot specify only an * in the valuePatterns field. Because * matches all text, no text follows the matched pattern, and Rewriter has nothing to rewrite. Use * together with another string, such as *abc. In this case, all content that follows *abc is rewritten.


Note –

An asterisk (*) can be used as a wildcard in any of the fields of the rule. However, at least one field in the rule must contain something other than an *. If every field contains only an *, the rule is ignored and no error message is displayed.


You can use * or ** together with the separation character (a semicolon or comma) that appears in the original statement to separate multiple fields. One asterisk (*) matches any field that is not to be rewritten, and two asterisks (**) match any field that needs to be rewritten.

Table 4–1 lists some sample usages of the * wildcard.

Table 4–1 Sample Usage of * Wildcard

URL: url1, url2, url3, url4
valuePatterns: valuePatterns = "**, *, **, *"
Description: url1 and url3 are rewritten because ** indicates the portion to be rewritten.

URL: XYZABChttp://host1.sesta.com/dir1.html
valuePatterns: valuePatterns = "*ABC"
Description: Only the portion http://host1.sesta.com/dir1.html is rewritten. Everything after *ABC needs to be rewritten.

URL: 0|dir1|dir2|dir3|dir4|test|url1
valuePatterns: valuePatterns = "*|*|**|*|**|*|"
Description: dir2, dir4, and url1 are rewritten. The last field that needs to be rewritten does not have to be indicated by using **.
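
The three samples in Table 4–1 can be reproduced with the following Python sketch, which translates a valuePatterns string into a regular expression. This is one possible reading of the wildcard semantics, not the Rewriter implementation. It assumes that a wildcard followed by other text matches as little as possible, that a wildcard at the end of the pattern matches the remaining text, and that any text left after the whole pattern is rewritten. The function name segments_to_rewrite is hypothetical.

```python
import re


def segments_to_rewrite(value: str, value_patterns: str) -> list[str]:
    """Return the parts of value that the pattern marks for rewriting."""
    # Split the pattern into ** tokens, * tokens, and literal text.
    tokens = re.findall(r"\*\*|\*|[^*]+", value_patterns)
    parts = []
    for i, token in enumerate(tokens):
        last = i == len(tokens) - 1
        if token == "**":
            parts.append("(.*)" if last else "(.*?)")  # field to rewrite
        elif token == "*":
            parts.append(".*" if last else ".*?")      # field to skip
        else:
            parts.append(re.escape(token))             # literal separator or text
    parts.append("(.*)")  # anything after the whole pattern is rewritten
    match = re.fullmatch("".join(parts), value, flags=re.DOTALL)
    if match is None:
        return []
    return [segment for segment in match.groups() if segment]


print(segments_to_rewrite("url1, url2, url3, url4", "**, *, **, *"))
print(segments_to_rewrite("XYZABChttp://host1.sesta.com/dir1.html", "*ABC"))
print(segments_to_rewrite("0|dir1|dir2|dir3|dir4|test|url1", "*|*|**|*|**|*|"))
```

Under these assumptions, a pattern that consists of a single * yields no segments, which is consistent with the statement that Rewriter then has no text to rewrite.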