Sun GlassFish Web Space Server 10.0 Secure Web Access Add-On Guide

Introduction to Rewriter

The Rewriter component of Secure Remote Access enables end users to browse the intranet by modifying Uniform Resource Identifier (URI) references on web pages so that they point to the Gateway. A URI defines a way to encapsulate a name in any registered name space, and labels it with the name space. The most common kinds of URIs are Uniform Resource Locators (URLs). Rewriter supports only the HTTP and HTTPS protocols, regardless of how the protocol name is capitalized. Rewriter supports backslashes only when they are part of a relative URL.


Example 4–1 Rewriting URLs

http://abc.sesta.com\\index.html is rewritten.

These URLs are not rewritten: http:\\\\abc.sesta.com and http:/abc.com


Character Set Encoding

HTTP standards require that the HTTP headers or HTML meta tags specify the character set of a web page. However, this information is sometimes missing. The character set must be known so that the data is encoded correctly and displayed as its creator intended.

To detect the character sets, install the SUNWjchdt package from the Java Enterprise System Accessory CD. If this product is installed, Rewriter will detect it and use it if necessary.


Note –

Using this product can affect performance, so you should install it only when required. See the jcharset_readme.txt for details on installation, configuration and usage.


Rewriter Usage Scenarios

When a user tries to access intranet web pages through the Gateway, web pages are made available by using Rewriter. Rewriter is used by the URL Scraper and the Gateway.

URL Scraper

The URL Scraper provider gets content from configured URIs. Before sending this content to the browser, it expands all relative URIs to absolute URIs.

For example, suppose a fetched page contains the following anchor:

<a href="../mypage.html">

Rewriter translates this to:

<a href="http://yahoo.com/mypage.html">

where http://yahoo.com/test/ is the base URL of the page.
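The expansion described above follows standard relative-reference resolution. The following is a minimal sketch of that resolution using Python's standard library, not the URL Scraper's actual code; the function name expand_relative is hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin


def expand_relative(base_url, href):
    """Resolve href against the page's base URL.

    An href that is already absolute is returned unchanged.
    """
    return urljoin(base_url, href)


# The example from the text: base URL http://yahoo.com/test/
print(expand_relative("http://yahoo.com/test/", "../mypage.html"))
# prints http://yahoo.com/mypage.html
```

The ".." segment removes the last directory of the base URL (test/), which is why the result sits directly under http://yahoo.com/.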

See the Sun Java System Portal Server Administration Guide for details about the URLScraper provider.

Gateway

The Gateway obtains content from Internet portals. Before sending the content to the browser, it prefixes the Gateway URI to the existing URI so that subsequent URI requests from the browser can reach the Gateway.

For example, suppose a user is trying to access an HTML page on an intranet machine through the following anchor:

<a href="http://mymachine.intranet.com/mypage.html">

Rewriter prefixes this URL with a reference to the Gateway as follows:

<a href="https://gateway.company.com/http://mymachine.intranet.com/mypage.html">

When the user clicks a link associated with this anchor, the browser contacts the Gateway. The Gateway fetches the content of mypage.html from mymachine.intranet.com.
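The prefixing step can be pictured with a short sketch. This is an illustration under stated assumptions rather than the Gateway's implementation: the function name prefix_with_gateway is hypothetical, the Gateway address is the one used in the example above, and URLs with a scheme other than HTTP or HTTPS are simply returned unchanged because Rewriter supports only those two protocols.

```python
from urllib.parse import urlsplit

# Gateway address taken from the example in the text.
GATEWAY_URL = "https://gateway.company.com"


def prefix_with_gateway(url, gateway=GATEWAY_URL):
    """Prefix an absolute HTTP or HTTPS URL with the Gateway URI.

    Assumption: any other scheme is left untouched.
    """
    # urlsplit lowercases the scheme, so HTTP:// and http:// are treated alike.
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        return url
    return gateway.rstrip("/") + "/" + url


print(prefix_with_gateway("http://mymachine.intranet.com/mypage.html"))
# prints https://gateway.company.com/http://mymachine.intranet.com/mypage.html
```

Because the original URL is kept intact after the Gateway prefix, the Gateway can recover the intranet address from the request path when the browser follows the link.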

The Gateway uses several rules to determine the elements of a fetched web page that will be rewritten.

Writing Rulesets

For more information about defining a ruleset, see the Portal Server Administration Guide. After creating a new ruleset, you need to define the required rules.

This section covers the following topics:

Public Interface (RuleSet DTD)

RuleSet DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  CDDL HEADER START
  The contents of this file are subject to the terms
  of the Common Development and Distribution License
  (the License). You may not use this file except in
  compliance with the License.

  You can obtain a copy of the License at
  http://www.sun.com/cddl/cddl.html and legal/CDDLv1.0.txt
  See the License for the specific language governing
  permission and limitations under the License.

  When distributing Covered Code, include this CDDL
  Header Notice in each file and include the License file
  at legal/CDDLv1.0.txt.
  If applicable, add the following below the CDDL Header,
  with the fields enclosed by brackets [] replaced by
  your own identifying information:
  "Portions Copyrighted [year] [name of copyright owner]"

  Copyright 2008 Sun Microsystems Inc. All Rights Reserved
  CDDL HEADER END
-->

<!--
    The following constraints are not represented in the DTD, but are taken care of
    programmatically
    1. In a Rule, All Mandatory attributes cannot be "*".
    2. Only one instance of each of the elements below is allowed, but in any order.
	1)HTMLRules
	2)JSRules
	3)XMLRules
    3. ID should always be in lower case.
-->
<!ENTITY % gtype 'GROUPED'>
<!ENTITY % stype 'SCATTERED'>

<!ENTITY % eURL 'URL'>
<!ENTITY % eEXPRESSION 'EXPRESSION'>
<!ENTITY % eDHTML 'DHTML'>
<!ENTITY % eDJS 'DJS'>
<!ENTITY % eSYSTEM 'SYSTEM'>

<!ENTITY % ruleSetElements '(HTMLRules | JSRules | XMLRules)?'>
<!ENTITY % htmlElements '(Form | Applet | Attribute | JSToken)*'>
<!ENTITY % jsElements '(Variable | Function)*'>
<!ENTITY % xmlElements '(Attribute | TagText)*'>

<!ELEMENT RuleSet (%ruleSetElements;,%ruleSetElements;,%ruleSetElements;)>
<!ATTLIST RuleSet
	type (%gtype; | %stype;) "GROUPED"
	id ID #REQUIRED
	extends CDATA "none"
>

<!ELEMENT HTMLRules (%htmlElements;)>
<!ATTLIST HTMLRules
	type (%gtype; | %stype;) "GROUPED"
	id CDATA "html_rules"
>
<!ELEMENT Form EMPTY>
<!ATTLIST Form
	name CDATA #REQUIRED
	field CDATA #REQUIRED
	valuePatterns CDATA ""
	source CDATA "*"
>

<!ELEMENT Applet EMPTY>
<!ATTLIST Applet
	code CDATA #REQUIRED
	param CDATA "*"
	valuePatterns CDATA ""
	source CDATA "*"
>
<!ELEMENT JSToken (#PCDATA)>


<!ELEMENT JSRules (%jsElements;)>
<!ATTLIST JSRules
	type (%gtype; | %stype;) "GROUPED"
	id CDATA "js_rules"
>

<!ELEMENT Variable (#PCDATA)>
<!ATTLIST Variable
	name CDATA ""
	type (%eURL; | %eEXPRESSION; | %eDHTML; | %eDJS; | %eSYSTEM;) "EXPRESSION"
	source CDATA "*"
>

<!ELEMENT Function EMPTY>
<!ATTLIST Function
	name CDATA #REQUIRED
	paramPatterns CDATA #REQUIRED
	type (%eURL; | %eEXPRESSION; | %eDHTML; | %eDJS;) "EXPRESSION"
	source CDATA "*"
>


<!ELEMENT XMLRules (%xmlElements;)>
<!ATTLIST XMLRules
	type (%gtype; | %stype;) "GROUPED"
	id CDATA "xml_rules"
>
<!ELEMENT TagText EMPTY>
<!ATTLIST TagText
	tag CDATA #REQUIRED
	attributePatterns CDATA ""
	source CDATA "*"
>
<!ELEMENT Attribute EMPTY>
<!ATTLIST Attribute
	name CDATA #REQUIRED
	tag CDATA "*"
	valuePatterns CDATA ""
	type (%eURL; | %eDHTML; | %eDJS; ) "URL"
	source CDATA "*"
>

Note –

You can use * as part of a rule value, but a mandatory attribute value cannot consist of * alone. Rules that violate this constraint are ignored, and a message is logged in the RuleSetInfo log file. For information on this log file, see Debug File Names.
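A ruleset can be checked for this constraint before it is uploaded. The following sketch is a hypothetical pre-check, not part of Rewriter: the function name and the sample ruleset are invented for illustration, and the table of mandatory attributes is read off the #REQUIRED declarations in the DTD above. It assumes that a rule is affected as soon as any one mandatory attribute is exactly *; the DTD comment could also be read as applying only when all mandatory attributes are *.

```python
import xml.etree.ElementTree as ET

# Attributes declared #REQUIRED in the RuleSet DTD.
REQUIRED = {
    "Form": ("name", "field"),
    "Applet": ("code",),
    "Function": ("name", "paramPatterns"),
    "TagText": ("tag",),
    "Attribute": ("name",),
}


def starred_mandatory_attributes(ruleset_xml):
    """Return (element, attribute) pairs whose mandatory value is just "*"."""
    root = ET.fromstring(ruleset_xml)
    flagged = []
    for element in root.iter():
        for attr in REQUIRED.get(element.tag, ()):
            if element.get(attr, "").strip() == "*":
                flagged.append((element.tag, attr))
    return flagged


# Hypothetical ruleset used only to exercise the check.
sample = """
<RuleSet id="demo">
  <HTMLRules>
    <Attribute name="href" />
    <Attribute name="*" tag="a" />
    <Form name="form*" field="*" />
  </HTMLRules>
</RuleSet>
"""

print(starred_mandatory_attributes(sample))
# prints [('Attribute', 'name'), ('Form', 'field')]
```

Note that name="form*" is not flagged, because * is allowed as part of a value.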


Sample XML DTD

This section contains a sample ruleset. The "Case Study" section illustrates how Rewriter interprets these rules.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
Rules for integrating a mail client with the gateway.
-->
<!DOCTYPE RuleSet SYSTEM "jar://rewriter.jar/resources/RuleSet.dtd">
<RuleSet type="GROUPED" id="owa">
<HTMLRules>
<Attribute name="action" />
<Attribute name="background" />
<Attribute name="codebase" />
<Attribute name="href" />
<Attribute name="src" />
<Attribute name="lowsrc" />
<Attribute name="imagePath" />
<Attribute name="viewClass" />
<Attribute name="emptyURL" />
<Attribute name="draftsURL" />
<Attribute name="folderURL" />
<Attribute name="prevMonthImage" />
<Attribute name="nextMonthImage" />
<Attribute name="style" />
<Attribute name="content" tag="meta" />
</HTMLRules>
<JSRules>
<!-- Rules for Rewriting JavaScript variables in URLs -->
<Variable type="URL"> _fr.location </Variable>
<Variable type="URL"> g_szUserBase </Variable>
<Variable type="URL"> g_szPublicFolderUrl </Variable>
<Variable type="URL"> g_szExWebDir </Variable>
<Variable type="URL"> g_szViewClassURL </Variable>
<Variable type="URL"> g_szVirtualRoot </Variable>
<Variable type="URL"> g_szBaseURL </Variable>
<Variable type="URL"> g_szURL </Variable>
<Function type="EXPRESSION" name="NavigateTo" paramPatterns="y"/>
</JSRules>
<XMLRules>
<Attribute name="xmlns"/>
<Attribute name="href" tag="a"/>
<TagText tag="baseroot" />
<TagText tag="prop2" />
<TagText tag="prop1" />
<TagText tag="img" />
<TagText tag="xsl:attribute"
attributePatterns="name=src" />
</XMLRules>
</RuleSet>

Procedure to Write Rules

The general procedure to write rules is:

Ruleset Guidelines

When creating a ruleset, keep the following in mind:

Defining the RuleSet Root Element

The ruleset root element has two attributes:

Using the Recursive Feature

With the recursive feature, Rewriter keeps searching for the same pattern until it reaches the end of the matched string.

For example, when Rewriter parses the following string:

<a href="src=abc.jpg,src=bcd.jpg,src=xyz.jpg">

the rule

<Attribute name="href" valuePatterns="*src=**"/>

rewrites only the first occurrence of the pattern, which would look like this:

<a href="src=http://jane.sun.com/abc.jpg">

If you use the recursive option:

<Attribute name="href" valuePatterns="REC:*src=**"/>

Rewriter searches to the end of the matched string pattern for the same pattern, so the output would be:

<a href="src=http://jane.sun.com/abc.jpg,src=http://jane.sun.com/bcd.jpg,src=http://jane.sun.com/xyz.jpg">
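The difference between the two modes can be imitated with a regular expression. The sketch below is only an analogy for the REC: prefix, not Rewriter's pattern-matching engine: the regular expression is a rough stand-in for the src= part of the valuePatterns rule, the base URL http://jane.sun.com/ is inferred from the output shown above, and the function name rewrite_src is hypothetical. In non-recursive mode the sketch leaves the remaining src= entries in place unchanged, which is an assumption about how the rest of the value is treated.

```python
import re
from urllib.parse import urljoin

# Base URL of the page, inferred from the rewritten output in the text.
BASE_URL = "http://jane.sun.com/"

# Capture the text after "src=" up to the next comma or the end of the value.
SRC_PATTERN = re.compile(r"src=([^,]+)")


def rewrite_src(value, recursive):
    """Make src= targets absolute; only the first one unless recursive is True."""

    def absolutize(match):
        return "src=" + urljoin(BASE_URL, match.group(1))

    # count=0 replaces every match, count=1 replaces only the first.
    return SRC_PATTERN.sub(absolutize, value, count=0 if recursive else 1)


value = "src=abc.jpg,src=bcd.jpg,src=xyz.jpg"
print(rewrite_src(value, recursive=False))
print(rewrite_src(value, recursive=True))
```

With recursive=True every src= entry is rewritten, matching the recursive output shown above.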