HTML content in web pages can be further classified into attributes, forms, and applets. Accordingly, the rules for HTML content are classified as attribute rules, form rules, and applet rules.
An attribute rule identifies the attributes of a tag whose values need to be rewritten. An attribute value can be a simple URL, JavaScript, or DHTML content. For example:
The src attribute of an "img" tag points to an image location (simple URL)
The onClick attribute of an "a" tag contains JavaScript that handles a click on the link (DJS)
<Attribute name="attributeName" [tag="*" valuePatterns="" source="*" type="URL|DHTML|DJS"]/>
where,
attributeName is the name of the attribute (mandatory)
tag is the tag to which the attribute belongs (optional, default *, meaning any tag)
valuePatterns See Using Pattern Matching in Rules.
source specifies the URI of the page in which this attribute is defined (optional, default *, meaning any page)
type specifies the type of the value (optional). It can be one of the following:
URL - a simple URL (default value).
DHTML - DHTML content. This kind of content is seen in standard HTML content and is used in Microsoft’s HTC format files.
DJS - JavaScript content. All HTML event handlers such as onClick and onMouseover have JavaScript inlined with the HTML attribute.
Assume the base URL of the page is:
http://mymachine.intranet.com/mypage.html
Page Content:
<a href="http://mymachine.intranet.com/mypage.html">
Rules
<Attribute name="href"/> or <Attribute name="href" tag="a"/>
Output
<a href=gateway-URL/http://mymachine.intranet.com/mypage.html>
Description
Because the URL to be rewritten is already an absolute URL, only the Gateway URL is prefixed to the URL.
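The prefixing step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Rewriter's implementation: the function name rewrite_url and the gateway address https://gateway.example.com are assumptions made for the example.

```python
from urllib.parse import urljoin

GATEWAY_URL = "https://gateway.example.com"  # hypothetical gateway address


def rewrite_url(url, base_url, gateway_url=GATEWAY_URL):
    """Resolve url against the page's base URL, then prefix the Gateway URL."""
    # An absolute URL passes through urljoin unchanged; a relative or
    # root-relative URL is first completed from the base URL of the page.
    absolute = urljoin(base_url, url)
    return f"{gateway_url}/{absolute}"


base = "http://mymachine.intranet.com/mypage.html"
print(rewrite_url("http://mymachine.intranet.com/mypage.html", base))
```

For an absolute URL, as in this example, only the gateway prefix is added. A root-relative value such as /focus.html would first be completed from the base URL of the page.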
Assume the base URL of the page is:
http://abc.sesta.com/focus.html
Page Content:
<Form>
<input TYPE=TEXT SIZE=20 value=focus onClick="Check('/focus.html','focus');return;">
</Form>
Rules
<Attribute name="onClick" type="DJS"/>
<Function type="URL" name="Check" paramPatterns="y,"/>
Output
<Form>
<INPUT TYPE=TEXT SIZE=20 value=focus onClick="Check('gateway-URL/http://abc.sesta.com/focus.html','focus');return;">
</Form>
Description
Two rules are required to rewrite the specified page content. The first rule identifies the onClick JavaScript token. The second rule identifies the parameter of the Check function that needs to be rewritten. In this case, only the first parameter is rewritten because paramPatterns has the value y in the position of the first parameter.
The Gateway URL and the base URL of the page on which the JavaScript tokens appear are prefixed to the required parameter.
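The way paramPatterns selects parameters can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Rewriter's actual parser: the function name rewrite_function_params and the gateway address are hypothetical, and arguments are assumed to contain no commas or parentheses.

```python
import re
from urllib.parse import urljoin

GATEWAY_URL = "https://gateway.example.com"  # hypothetical gateway address


def rewrite_function_params(script, func_name, param_patterns, base_url):
    """Rewrite the arguments of func_name that param_patterns marks with 'y'."""
    # "y," -> [True, False]: only the first parameter is rewritten.
    flags = [p.strip() == "y" for p in param_patterns.split(",")]
    call = re.compile(r"\b" + re.escape(func_name) + r"\(([^)]*)\)")

    def fix_call(match):
        # Naive split: assumes no commas or parentheses inside the arguments.
        args = match.group(1).split(",")
        for i, arg in enumerate(args):
            text = arg.strip()
            if i < len(flags) and flags[i] and text:
                quote = text[0] if text[0] in "'\"" else ""
                url = text.strip("'\"")
                args[i] = f"{quote}{GATEWAY_URL}/{urljoin(base_url, url)}{quote}"
        return f"{func_name}({','.join(args)})"

    return call.sub(fix_call, script)


handler = "Check('/focus.html','focus');return;"
print(rewrite_function_params(handler, "Check", "y,", "http://abc.sesta.com/focus.html"))
```

The second argument is left untouched because the position after the comma in paramPatterns is empty.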
The HTML pages that a user browses may contain forms. Some form elements may take a URL as the value.
<Form name="form1" field="visit" [valuePatterns="" source="*"]/>
where
name is the name of the form (mandatory)
field is the field in the form whose value needs to be rewritten (mandatory)
valuePatterns See Using Pattern Matching in Rules
source is the URL of the html page where this form definition is present (optional, default *, meaning in any page)
Assume the base URL of the page is:
http://test.siroe.com/testcases/html/form.html
Page Content
Assume the page URI is form.html and is located in the root directory of the server.
<form name=form1 method=POST action="http://test.siroe.com/testcases/html/form.html">
<input type=hidden name=abc1 value="0|1234|/test.html">
</form>
To rewrite /test.html in the value of the hidden field named abc1, which is part of form1, the following rules are needed.
Rules
<Form source="*/form.html" name="form1" field="abc1" valuePatterns="0|1234|"/>
<Attribute name="action"/>
Output
<FORM name="form1" method="POST" action="gateway-URL/http://test.siroe.com/testcases/html/form.html">
<input type=hidden name=abc1 value="0|1234|gateway-URL/http://test.siroe.com/test.html">
</FORM>
Description
The action attribute is rewritten by the HTML attribute rule <Attribute name="action"/>.
The value attribute of the input tag is rewritten as shown in the output. The specified valuePatterns is located, and all content following the matched valuePatterns is rewritten by prefixing the Gateway URL and the base URL of the page. See Using Pattern Matching in Rules.
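As a rough illustration of how a literal valuePatterns string splits a field value, consider the Python sketch below. The helper name rewrite_form_value and the gateway address are assumptions for the example, and the real matching logic in Rewriter may differ.

```python
from urllib.parse import urljoin

GATEWAY_URL = "https://gateway.example.com"  # hypothetical gateway address


def rewrite_form_value(value, value_pattern, base_url):
    """Keep everything up to the matched pattern and rewrite what follows."""
    index = value.find(value_pattern)
    if index == -1:
        return value  # pattern not found: leave the value unchanged
    split_at = index + len(value_pattern)
    kept, rest = value[:split_at], value[split_at:]
    # An empty pattern matches at position 0, so the full value is rewritten.
    return f"{kept}{GATEWAY_URL}/{urljoin(base_url, rest)}"


page = "http://test.siroe.com/testcases/html/form.html"
print(rewrite_form_value("0|1234|/test.html", "0|1234|", page))
```

The prefix 0|1234| is preserved, and only /test.html is resolved against the base URL and given the gateway prefix.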
A single web page may contain many applets, and each applet may contain many parameters. Rewriter matches the values specified in the rule with the HTML definition of the applet and modifies the URL values present as a part of the applet parameter definition. This replacement is carried out at the server and not when the user is browsing the particular web page. This rule identifies and rewrites the parameters in both the applet and object tags of the HTML content.
<Applet code="ApplicationClassName/ObjectID" param="parametername" [valuePatterns="" source="*"]/>
where
code is the name of the applet or object class (mandatory)
param is the name of the parameter whose value needs to be rewritten (mandatory)
valuePatterns See Using Pattern Matching in Rules.
source is the URL of the page that contains the applet definition (optional, default is *, meaning, in any page)
Assume the base URL of the page is:
http://abc.siroe.com/casestudy/test/HTML/applet/rule1.html
Page Content:
<applet codebase="appletcode" code="RewriteURLinApplet.class" archive="/test.jar">
<param name=Test1 value="/index.html">
</applet>
Rules
<Applet source="*/rule1.html" code="RewriteURLin*.class" param="Test*"/>
Output
<APPLET codebase="gateway-URL/http://abc.siroe.com/casestudy/test/HTML/applet/appletcode" code="RewriteURLinApplet.class" archive="/test.jar">
<param name="Test1" value="gateway-URL/http://abc.siroe.com/index.html">
</APPLET>
Description
The codebase attribute is rewritten because <Attribute name="codebase"/> is a defined rule in the default_gateway_ruleset.
All parameters whose names begin with Test are rewritten. The Gateway URL and the base URL of the page on which the applet code appears are prefixed to the value attribute of the param tag.
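The wildcard matching of the source, code, and param fields can be approximated with Python's fnmatch module. This is only a sketch: the dictionary layout and the function name applet_rule_matches are assumptions, and fnmatch also treats ? and [] as special characters, which Rewriter rules may not.

```python
from fnmatch import fnmatchcase


def applet_rule_matches(rule, page_url, code, param_name):
    """Return True when every field of the rule matches the applet parameter."""
    return (
        fnmatchcase(page_url, rule.get("source", "*"))  # source defaults to *
        and fnmatchcase(code, rule["code"])
        and fnmatchcase(param_name, rule["param"])
    )


rule = {"source": "*/rule1.html", "code": "RewriteURLin*.class", "param": "Test*"}
page = "http://abc.siroe.com/casestudy/test/HTML/applet/rule1.html"

print(applet_rule_matches(rule, page, "RewriteURLinApplet.class", "Test1"))
print(applet_rule_matches(rule, page, "RewriteURLinApplet.class", "Other"))
```

The first call matches because Test1 begins with Test. The second does not, so a parameter named Other would be left unchanged.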
You can use the valuePatterns field to achieve pattern matching and identify the specific parts of a statement that need to be rewritten.
If you have specified valuePatterns as part of a rule, all the content that follows the matched pattern is rewritten.
Consider the sample form rule below.
<Form source="*/source.html" name="form1" field="visit" [valuePatterns="0|1234|"]/>
where
source is the URL of the HTML page where the form appears.
name is the name of the form.
field is the field in the form whose value needs to be rewritten.
valuePatterns identifies the leading portion of the field value that is matched and left unchanged. All content that appears after the matched valuePatterns is rewritten (optional; the default "" means that the full value is rewritten).
You can specify special characters by escaping them with a backslash. For example:
<Form source="*/source.html" name="form1" field="visit" [valuePatterns="0|1234|\\;original text|changed text"]/>
You can use the wildcard asterisk (*) character to achieve pattern matching for rewriting.
You cannot specify just an * in the valuePatterns field. Because * matches all text, no text follows the matched valuePatterns, and Rewriter has no text to rewrite. You must use * in conjunction with another string, such as *abc. In this case, all content that follows *abc is rewritten.
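The following Python sketch illustrates why a lone * leaves nothing to rewrite. It translates the pattern into a regular expression and assumes that * matches greedily. Both the helper name split_at_pattern and the greedy behavior are assumptions for illustration, not documented Rewriter behavior.

```python
import re


def split_at_pattern(value, pattern):
    """Split value into (matched prefix, remainder to rewrite), or return None."""
    # Translate the wildcard: * becomes a greedy ".*", other characters are literal.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    match = re.match(regex, value, re.DOTALL)
    if match is None:
        return None
    return value[:match.end()], value[match.end():]


print(split_at_pattern("XYZABChttp://host1.sesta.com/dir1.html", "*ABC"))
print(split_at_pattern("XYZABChttp://host1.sesta.com/dir1.html", "*"))
```

With *ABC the remainder is the URL that follows ABC. With a lone * the match consumes the whole value and the remainder is empty.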
An asterisk (*) can be used as a wildcard in any of the fields of the rule. However, all the fields in the rule cannot contain an *. If all fields contain a *, the rule is ignored. No error message is displayed.
You can use * or ** along with the separation character (a semicolon or comma) that appears in the original statement to separate multiple fields. One asterisk (*) matches any field that is not to be rewritten, and two asterisks (**) match any field that needs to be rewritten.
Using Wild Cards in valuePatterns lists some sample usages of the * wildcard.
Table 4–1 Sample Usage of * Wildcard
| URL | valuePatterns | Description |
|---|---|---|
| url1, url2, url3, url4 | valuePatterns = "**, *, **, *" | url1 and url3 are rewritten because ** indicates the portion to be rewritten |
| XYZABChttp://host1.sesta.com/dir1.html | valuePatterns = "*ABC" | Only the portion http://host1.sesta.com/dir1.html is rewritten. Everything after *ABC needs to be rewritten. |
| 0\|dir1\|dir2\|dir3\|dir4\|test\|url1 | valuePatterns = "*\|*\|**\|*\|**\|*\|" | dir2, dir4 and url1 are rewritten. The last field that needs to be rewritten does not have to be indicated by using **. |
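The separator-based rows of the table can be reproduced with a short Python sketch that reports which fields would be selected for rewriting. The function name fields_to_rewrite is hypothetical, and the handling of the trailing empty token is a simplification inferred from the table rather than a documented algorithm.

```python
def fields_to_rewrite(value, pattern, separator):
    """Return the fields of value that the */** pattern marks for rewriting."""
    tokens = [t.strip() for t in pattern.split(separator)]
    fields = [f.strip() for f in value.split(separator)]
    selected = []
    for i, field in enumerate(fields):
        token = tokens[i] if i < len(tokens) else ""
        # "**" marks a field to rewrite. An empty token stands for content
        # that follows the pattern, which is also rewritten (simplification).
        if token in ("**", ""):
            selected.append(field)
    return selected


print(fields_to_rewrite("url1, url2, url3, url4", "**, *, **, *", ","))
print(fields_to_rewrite("0|dir1|dir2|dir3|dir4|test|url1", "*|*|**|*|**|*|", "|"))
```

In the second call the pattern ends with a separator, so url1 falls after the matched pattern and is selected without an explicit **.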