The Rewriter component of Secure Remote Access enables end users to browse the intranet by modifying Uniform Resource Identifier (URI) references on web pages so that they point to the Gateway. A URI defines a way to encapsulate a name in any registered name space, and labels it with the name space. The most common kinds of URIs are Uniform Resource Locators (URLs). Rewriter supports only HTTP and HTTPS URLs, regardless of how the protocol name is capitalized. Rewriter supports backslashes only when they are part of a relative URL.
For example, http://abc.sesta.com\\index.html is rewritten.
These URLs are not rewritten: http:\\\\abc.sesta.com and http:/abc.com
HTTP standards require that HTTP headers or HTML meta tags specify a character set for web pages. However, sometimes this information is not available. The character set must be known so that encoding for the data is set and the data is displayed as intended by the creator.
Sun Microsystems provides a third-party product to detect the character sets. To enable this product, install the SUNWjchdt package from the Java™ Enterprise System Accessory CD. If the product is installed, Rewriter will detect it and use it if necessary.
Using this product can affect performance, so install it only when required. See jcharset_readme.txt for details on installation, configuration, and usage.
When a user tries to access intranet web pages through the Gateway, web pages are made available by using Rewriter. Rewriter is used by these components:
The URL Scraper provider gets content from the configured URIs and, before sending it to the browser, expands all relative URIs to absolute URIs.
For example, if a user tries to access a site with the following content:
<a href="../mypage.html">
Rewriter translates this to:
<a href="http://yahoo.com/mypage.html">
where http://yahoo.com/test/ is the base URL of the page.
See the Sun Java System Portal Server Administration Guide for details on the URLScraper provider.
The Gateway obtains content from internet portals and before sending the content to the browser, it prefixes the Gateway URI to the existing URI so that subsequent URI requests from the browser can reach the Gateway.
For example, suppose a user tries to access an HTML page on an intranet machine with the following content:
<a href="http://mymachine.intranet.com/mypage.html">
Rewriter prefixes this URL with a reference to the Gateway as follows:
<a href="https://gateway.company.com/http://mymachine.intranet.com/mypage.html">
When the user clicks a link associated with this anchor, the browser contacts the Gateway. The Gateway fetches the content of mypage.html from mymachine.intranet.com.
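The two rewriting steps described above (relative-URI expansion for the URL Scraper provider and Gateway prefixing) can be sketched in Python. This is only an illustration of the resulting URLs, not Rewriter's implementation; the function names expand_relative and prefix_gateway are hypothetical, and the URLs are the ones used in the examples above.

```python
from urllib.parse import urljoin


def expand_relative(base_url: str, reference: str) -> str:
    """Resolve a relative URI against the base URL of the page."""
    return urljoin(base_url, reference)


def prefix_gateway(gateway_url: str, absolute_url: str) -> str:
    """Prefix an absolute URL with the Gateway URL so the browser contacts the Gateway."""
    return gateway_url.rstrip("/") + "/" + absolute_url


if __name__ == "__main__":
    # URL Scraper example: "../mypage.html" on a page whose base URL is http://yahoo.com/test/
    print(expand_relative("http://yahoo.com/test/", "../mypage.html"))
    # Gateway example: the intranet URL is prefixed with the Gateway URL
    print(prefix_gateway("https://gateway.company.com",
                         "http://mymachine.intranet.com/mypage.html"))
```

Run as a script, this prints http://yahoo.com/mypage.html and https://gateway.company.com/http://mymachine.intranet.com/mypage.html, matching the two examples.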
The Gateway uses several rules to determine the elements of a fetched web page that will be rewritten.
For details on defining a ruleset, see the Portal Server Administration Guide. After creating a new ruleset, you need to define the required rules.
This section covers the following topics:
RuleSet DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!--
The following constraints are not represented in DTD, but taken care programmatically
1. In a Rule, All Mandatory attributes cannot be "*".
2. Only one instance of the below elements is allowed, but in any order.
   1)HTMLRules
   2)JSRules
   3)XMLRules
3. ID should always be in lower case.
-->
<!ENTITY % eURL 'URL'>
<!ENTITY % eEXPRESSION 'EXPRESSION'>
<!ENTITY % eDHTML 'DHTML'>
<!ENTITY % eDJS 'DJS'>
<!ENTITY % eSYSTEM 'SYSTEM'>
<!ENTITY % ruleSetElements '(HTMLRules | JSRules | XMLRules)?'>
<!ENTITY % htmlElements '(Form | Applet | Attribute)*'>
<!ENTITY % jsElements '(Variable | Function)*'>
<!ENTITY % xmlElements '(Attribute | TagText)*'>
<!ELEMENT RuleSet (%ruleSetElements;,%ruleSetElements;,%ruleSetElements;)>
<!ATTLIST RuleSet
    id ID #REQUIRED
    extends CDATA "none"
>
<!-- Rules for identifying rules in HTML content -->
<!ELEMENT HTMLRules (%htmlElements;)>
<!ELEMENT Form EMPTY>
<!ATTLIST Form
    name CDATA #REQUIRED
    field CDATA #REQUIRED
    valuePatterns CDATA ""
    source CDATA "*"
>
<!ELEMENT Applet EMPTY>
<!ATTLIST Applet
    code CDATA #REQUIRED
    param CDATA "*"
    valuePatterns CDATA ""
    source CDATA "*"
>
<!-- Rules for identifying rules in JS content -->
<!ELEMENT JSRules (%jsElements;)>
<!ELEMENT Variable EMPTY>
<!ATTLIST Variable
    name CDATA #REQUIRED
    type (%eURL; | %eEXPRESSION; | %eDHTML; | %eDJS; | %eSYSTEM;) "EXPRESSION"
    source CDATA "*"
>
<!ELEMENT Function EMPTY>
<!ATTLIST Function
    name CDATA #REQUIRED
    paramPatterns CDATA #REQUIRED
    type (%eURL; | %eEXPRESSION; | %eDHTML; | %eDJS;) "EXPRESSION"
    source CDATA "*"
>
<!-- Rules for identifying rules in XML content -->
<!ELEMENT XMLRules (%xmlElements;)>
<!ELEMENT TagText EMPTY>
<!ATTLIST TagText
    tag CDATA #REQUIRED
    attributePatterns CDATA ""
    source CDATA "*"
>
<!ELEMENT Attribute EMPTY>
<!ATTLIST Attribute
    name CDATA #REQUIRED
    tag CDATA "*"
    valuePatterns CDATA ""
    type (%eURL; | %eDHTML; | %eDJS; ) "URL"
    source CDATA "*"
>
You can use * as part of a rule value. However, the mandatory attribute values of a rule cannot all be just *. Such rules are ignored, and a message is logged in the RuleSetInfo log file. For information on this log file, see Debug File Names.
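The constraint on * can be illustrated with a short Python sketch. It is not Rewriter's own validation code: the MANDATORY table is simply read off the #REQUIRED attributes in the DTD above, and the function name is_ignored is hypothetical.

```python
# Mandatory (#REQUIRED) attributes of each rule element, taken from the RuleSet DTD.
MANDATORY = {
    "Form": ("name", "field"),
    "Applet": ("code",),
    "Variable": ("name",),
    "Function": ("name", "paramPatterns"),
    "TagText": ("tag",),
    "Attribute": ("name",),
}


def is_ignored(element: str, attrs: dict) -> bool:
    """Return True when every mandatory attribute of the rule is just "*"."""
    return all(attrs.get(name) == "*" for name in MANDATORY[element])


if __name__ == "__main__":
    # Both mandatory attributes are "*": the rule would be ignored.
    print(is_ignored("Form", {"name": "*", "field": "*"}))
    # One mandatory attribute is specific: the rule is kept.
    print(is_ignored("Form", {"name": "form1", "field": "*"}))
```

Under this reading, a rule such as a Form with name="form1" and field="*" is acceptable, because only one of its two mandatory attributes is a bare wildcard.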
This section contains a sample ruleset. The "Case Study" section illustrates how Rewriter interprets these rules.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Rules for integrating a mail client with the gateway. -->
<!DOCTYPE RuleSet SYSTEM "jar://rewriter.jar/resources/RuleSet.dtd">
<RuleSet type="GROUPED" id="owa">
  <HTMLRules>
    <Attribute name="action" />
    <Attribute name="background" />
    <Attribute name="codebase" />
    <Attribute name="href" />
    <Attribute name="src" />
    <Attribute name="lowsrc" />
    <Attribute name="imagePath" />
    <Attribute name="viewClass" />
    <Attribute name="emptyURL" />
    <Attribute name="draftsURL" />
    <Attribute name="folderURL" />
    <Attribute name="prevMonthImage" />
    <Attribute name="nextMonthImage" />
    <Attribute name="style" />
    <Attribute name="content" tag="meta" />
  </HTMLRules>
  <JSRules>
    <!-- Rules for Rewriting JavaScript variables in URLs -->
    <Variable type="URL"> _fr.location </Variable>
    <Variable type="URL"> g_szUserBase </Variable>
    <Variable type="URL"> g_szPublicFolderUrl </Variable>
    <Variable type="URL"> g_szExWebDir </Variable>
    <Variable type="URL"> g_szViewClassURL </Variable>
    <Variable type="URL"> g_szVirtualRoot </Variable>
    <Variable type="URL"> g_szBaseURL </Variable>
    <Variable type="URL"> g_szURL </Variable>
    <Function type="EXPRESSION" name="NavigateTo" paramPatterns="y"/>
  </JSRules>
  <XMLRules>
    <Attribute name="xmlns"/>
    <Attribute name="href" tag="a"/>
    <TagText tag="baseroot" />
    <TagText tag="prop2" />
    <TagText tag="prop1" />
    <TagText tag="img" />
    <TagText tag="xsl:attribute" attributePatterns="name=src" />
  </XMLRules>
</RuleSet>
Listed below is a general procedure that you can follow to write the rules.
Identify the directories that contain the HTML pages whose content needs to be rewritten.
In these directories, identify the pages that need to be rewritten.
Identify the URLs that need to be rewritten on each page. An easy way to identify most of the URLs is to search for "http" and "/".
Identify the content type of the URL: HTML, JavaScript or XML.
Write the rule required to rewrite each of these URLs by editing the required ruleset in the Rewriter service under Portal Server Configuration in the Access Manager administration console.
Combine all these rules into a ruleset for that domain.
Keep the following in mind:
The order of precedence for specific hosts is based on the longest URI match. For example, for the following rulesets:
mail1.central.abc.com|iplanet_mail_ruleset
*.sfbay.abc.com|sfbay_ruleset
*.abc.com|generic_ruleset
sfbay_ruleset is used for a host in the sfbay.abc.com domain, because *.sfbay.abc.com is the longest match.
The rules in the ruleset are applied in order to each statement in the page, until a rule matches a particular statement.
While writing the rules, keep in mind the order of the rules. Rules are applied to the statements in a page, in the order in which they occur in the ruleset. If you have specific rules, and general rules that contain a "*", define the specific rules first, then the general rules. Otherwise, the general rule is applied to all statements, even before the specific rule is encountered.
All rules need to be enclosed within the <RuleSet> </RuleSet> tags.
Include all rules that need to rewrite HTML content in the <HTMLRules> </HTMLRules> section of the ruleset.
Include all rules that need to rewrite JavaScript content in the <JSRules> </JSRules> section of the ruleset.
Include all rules that need to rewrite XML content in the <XMLRules> </XMLRules> section of the ruleset.
In your intranet pages, identify the URLs that need to be rewritten, and include the required rules in the appropriate sections (HTMLRules, JSRules, or XMLRules) of the ruleset.
Assign the ruleset to the required domain.
Restart the Gateway for the changes to take effect:
gateway-install-root/SUNWportal/bin/gateway -n gateway-profile-name start
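Two of the points above, longest-match ruleset selection and in-order rule application, can be sketched in Python. This is a rough model under stated assumptions, not Rewriter's algorithm: shell-style wildcards from fnmatch stand in for Rewriter's own pattern matching, "longest match" is taken to mean the longest matching pattern string, and the function names and the host host1.sfbay.abc.com are hypothetical.

```python
from fnmatch import fnmatchcase

# Domain-to-ruleset mappings from the example above, as (pattern, ruleset) pairs.
MAPPINGS = [
    ("mail1.central.abc.com", "iplanet_mail_ruleset"),
    ("*.sfbay.abc.com", "sfbay_ruleset"),
    ("*.abc.com", "generic_ruleset"),
]


def select_ruleset(host, mappings=MAPPINGS):
    """Return the ruleset whose matching pattern is longest, or None if none match."""
    host = host.lower()
    matches = [(pattern, ruleset) for pattern, ruleset in mappings
               if fnmatchcase(host, pattern)]
    if not matches:
        return None
    # Assumption: the longest matching pattern string takes precedence.
    return max(matches, key=lambda match: len(match[0]))[1]


def first_matching_rule(statement, rules):
    """Try rules in the order given; the first pattern that matches wins."""
    for pattern, label in rules:
        if fnmatchcase(statement, pattern):
            return label
    return None


if __name__ == "__main__":
    print(select_ruleset("host1.sfbay.abc.com"))    # sfbay_ruleset
    print(select_ruleset("mail1.central.abc.com"))  # iplanet_mail_ruleset
    # Specific rule first: the specific rule is reached.
    print(first_matching_rule("imgsrc", [("imgsrc", "specific"), ("*", "general")]))
    # General rule first: the specific rule is never reached.
    print(first_matching_rule("imgsrc", [("*", "general"), ("imgsrc", "specific")]))
```

The last two calls show why specific rules belong before general ones: with the "*" rule listed first, it matches every statement and the specific rule is never consulted.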
The ruleset root element has two attributes:
RuleSetName (the id attribute in the DTD). For example, default_ruleset. This name is referenced in the RuleSet to URI mapping.
Extends. This attribute refers to the inheritance feature of rulesets. An extends value points to the ruleset from which you would like to derive a ruleset.
Use the extends value none to indicate that the new ruleset is independent of any other ruleset, or specify the RuleSetName of an existing ruleset to indicate that the new ruleset depends on it.
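As a rough illustration of the extends attribute, the following Python sketch follows extends values from one ruleset up to an independent ruleset (one whose extends value is none). It only traces the chain of names and makes no claim about how Rewriter merges inherited rules. The function name inheritance_chain and the ruleset name mail_ruleset are hypothetical.

```python
def inheritance_chain(extends_of, ruleset_id):
    """List ruleset names from ruleset_id up to the ruleset that extends "none"."""
    chain = []
    current = ruleset_id
    while current != "none":
        if current in chain:
            raise ValueError(f"circular extends involving {current!r}")
        if current not in extends_of:
            raise KeyError(f"unknown ruleset {current!r}")
        chain.append(current)
        current = extends_of[current]
    return chain


if __name__ == "__main__":
    # Hypothetical mapping of ruleset name to its extends value.
    extends_of = {
        "default_ruleset": "none",          # independent ruleset
        "mail_ruleset": "default_ruleset",  # derived from default_ruleset
    }
    print(inheritance_chain(extends_of, "mail_ruleset"))
```

With the mapping shown, the chain for mail_ruleset is mail_ruleset followed by default_ruleset.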
Rewriter uses the recursive feature to search to the end of the matched string pattern for the same pattern.
For example, when Rewriter parses the following string:
<a href="src=abc.jpg,src=bcd.jpg,src=xyz.jpg">
the rule
<Attribute name="href" valuePatterns="*src=**"/>
rewrites only the first occurrence of the pattern and it would look like this:
<a href="src=http://jane.sun.com/abc.jpg">
If you instead use the recursive option, as in:
<Attribute name="href" valuePatterns="REC:*src=**"/>
Rewriter searches to the end of the matched string pattern for the same pattern, hence the output would be:
<a href="src=http://jane.sun.com/abc.jpg,src=http://jane.sun.com/bcd.jpg,src=http://jane.sun.com/xyz.jpg">
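The difference between a single match and the recursive option can be mimicked in Python with a regular expression. This is an analogy only: the regular expression stands in for the *src=** pattern, the base URL is the one implied by the output above, and the function name rewrite_src is hypothetical. The sketch also assumes that text after the first match is left unchanged in the non-recursive case.

```python
import re

BASE = "http://jane.sun.com/"  # base URL implied by the example output above


def rewrite_src(value, recursive=False):
    """Prefix the URL after each "src=" with BASE: first match only, or every match."""
    count = 0 if recursive else 1  # for re.sub, count=0 means "replace all matches"
    return re.sub(r"src=([^,]+)",
                  lambda match: "src=" + BASE + match.group(1),
                  value,
                  count=count)


if __name__ == "__main__":
    value = "src=abc.jpg,src=bcd.jpg,src=xyz.jpg"
    print(rewrite_src(value))                  # only the first URL is rewritten
    print(rewrite_src(value, recursive=True))  # all three URLs are rewritten
```

With recursive=True the result is the same attribute value as in the recursive example above.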