Sun Java System Portal Server 6 2005Q1 Administration Guide 

Chapter 12
Administering the Rewriter Service

This chapter describes how to administer the Rewriter service of the Sun Java System Portal Server.

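This chapter includes the following sections: Overview of the Rewriter Service, Supported URLs, Defining Rewriter Rules and Rulesets, and Administering the Rewriter Service.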


Overview of the Rewriter Service

The Sun Java System Portal Server Rewriter provides an engine for performing URL translation in markup languages and JavaScript™ code. The URLScraperProvider and the XMLProvider in the Desktop and the Sun Java™ System Portal Server: Secure Remote Access gateway service all use the Rewriter service.

Rewriter scans the content of web pages and identifies the URLs it finds on those pages. It uses a collection of rules defined in a ruleset to determine which elements of a web page to rewrite. Once Rewriter identifies a URL, it can rewrite the URL either by expanding a relative URL to an absolute URL or by prefixing the gateway URL to the existing URL.

Expanding Relative URLs to Absolute URLs

The URLScraperProvider is part of the core Portal Server product. In a non-gateway scenario, the URLScraperProvider can be used to expand relative URLs to absolute URLs. For example, suppose a scraped page contains the following link:

<a href="../mypage.html">

The Rewriter translates this to:

<a href="http://www.yahoo.com/mypage.html">

where http://www.yahoo.com/mail/ is the base URL of the page scraped.
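
The expansion described above can be approximated outside the product with standard URL resolution. The following is a minimal sketch, not the Rewriter's own implementation: it assumes the Rewriter resolves links the way Python's urllib.parse.urljoin does, and the base URL and link values are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base URL of the scraped page (not taken from the guide).
BASE_URL = "http://www.example.org/mail/index.html"


def expand(base_url: str, href: str) -> str:
    """Resolve href against base_url; absolute URLs pass through unchanged."""
    return urljoin(base_url, href)


if __name__ == "__main__":
    for href in ("mypage.html", "/help/faq.html", "http://other.example.org/a.html"):
        print(href, "->", expand(BASE_URL, href))
```

A link that is already absolute is returned as is, so only relative links change.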

URLScraperProvider Limitations

The URLScraperProvider simply tries to display a designated URL in a channel. There is no way to specify which parts of the document at that URL to display. The URLScraperProvider acts much like an HTTP client, in that it makes a request for the content of the specified URL. Just as in a browser, the target URL to scrape must be network visible, or you must have a proxy configured.

The resultant URL scraper channel, however, is neither a mini-browser nor a frame. Therefore, a link in the content affects the whole page, not just the channel. You should not browse inside the URL scraper channel: if you select a link within the channel, the browser interprets the link and replaces the currently displayed page (your Portal Server Desktop) with the contents of the link location.

The appearance of the scraped channel is controlled by whatever is producing the original content. The URLScraperProvider does not modify the content at all and only displays whatever is available through the URL. Since the channel is essentially a cell in an HTML table, it can only display HTML content that is legal to appear in table cells. That is, a frameset cannot be scraped using the URLScraperProvider because a <FRAMESET> tag cannot appear within a <BODY> tag. The URLScraperProvider will also not execute JavaScript code in <HEAD> tags. Because of this, the following scraping scenarios are inappropriate for the URLScraperProvider:

When the origin server sends cookies, they are forwarded back to it every time the web content is re-scraped. The origin server therefore receives the cookies it sent with the first scrape whenever the portal Desktop is updated or reloaded. Those cookies are not sent back, however, when the user clicks a link in the URL scraper channel.

Prefixing the Gateway URL to an Existing URL

In an implementation with a gateway such as the Sun Java System Portal Server: Secure Remote Access, the gateway acts as a proxy for the client and accesses intranet sites and returns responses to the client. The Rewriter translates URLs in downloaded pages so that they point back to the gateway rather than to the original site by prefixing the gateway URL to the existing URL.

For example, if a user tries to access an HTML page on mymachine using the following URL:

<a href="http://mymachine.intranet.com/mypage.html">

The Rewriter prefixes this URL with a reference to the gateway as follows:

<a href="https://gateway.company.com/http://mymachine.intranet.com/mypage.html">

When a user selects a link associated with this anchor, the browser contacts the gateway. The gateway fetches the content of mypage.html from mymachine.intranet.com.
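
The prefixing step can be sketched as plain string handling. This is an illustration only, assuming the gateway URL is joined to the original URL with a single slash as in the example above; the function names are hypothetical and are not part of the product.

```python
def prefix_gateway(gateway_url: str, target_url: str) -> str:
    """Return target_url prefixed with the gateway URL, avoiding a double slash."""
    return gateway_url.rstrip("/") + "/" + target_url


def strip_gateway(gateway_url: str, rewritten_url: str) -> str:
    """Recover the original URL from a gateway-prefixed URL."""
    prefix = gateway_url.rstrip("/") + "/"
    if not rewritten_url.startswith(prefix):
        raise ValueError("URL is not prefixed with the gateway")
    return rewritten_url[len(prefix):]


if __name__ == "__main__":
    rewritten = prefix_gateway(
        "https://gateway.company.com",
        "http://mymachine.intranet.com/mypage.html",
    )
    print(rewritten)
    print(strip_gateway("https://gateway.company.com", rewritten))
```

The second function shows why the original URL is kept intact inside the rewritten one: the gateway can recover the intranet address from the request it receives.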

See the Sun Java System Portal Server: Secure Remote Access 6 2005Q1 Administration Guide for more information on using the Rewriter to prefix a gateway URL to an existing URL.


Supported URLs

Rewriter supports rewriting of all standard URLs as specified by RFC 1738. These URLs are supported whether the protocol is HTTP or HTTPS and regardless of the capitalization of the protocol. For example, hTtP, HTtp, and httP are all valid. Some sample standard URLs are listed below:

http://www.my.sesta.com
http://www.example.org:8000/imaginary/test
http://www.example.edu/org/admin/people#andy
http://info.example.org/AboutUs/Index/Phonebook?dobbins
http://www.example.org/RDB/EMP?*%20where%20name%3Ddobbins
http://info.example.org/AboutUs/Phonebook
http://user:password@example.com

Rewriter supports rewriting of some basic non-standard URLs. The information needed to convert a non-standard URL to a standard format is taken from the base URL of the page where the URL appears and can include the protocol, host name, and path. The backslash (\) is supported only in the relative (path) portion of a URL, not in place of the slashes that follow the protocol. For example, http://sesta.com\index.html is rewritten, but http:\\sesta.com is not.

In addition, URLs with a single slash (/) after the protocol or scheme such as http:/sesta.com are not rewritten.
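
As a rough illustration of the cases above, the following sketch checks whether a string looks like a standard HTTP or HTTPS URL. It is an approximation built on Python's urllib.parse.urlsplit and makes no claim to match the Rewriter's actual parsing; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def looks_standard(url: str) -> bool:
    """True for http/https URLs that have a host part after '//'."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme, so hTtP and HTTP compare equal here.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in (
        "hTtP://www.example.org:8000/imaginary/test",
        "http://user:password@example.com",
        "http:/sesta.com",
    ):
        print(candidate, looks_standard(candidate))
```

A URL with a single slash after the protocol has no host part under this check, which mirrors the statement that such URLs are not rewritten.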


Defining Rewriter Rules and Rulesets

The Rewriter modifies the URL portions of various elements that appear on a web page. The Rewriter comes with a default set of rules to determine the elements of a web page to rewrite. A collection of rules for various categories and subcategories is called a ruleset. Rewriter rulesets are defined in XML, and their structure is described by a DTD.

The DTD is located in /opt/SUNWps/web-src/WEB-INF/lib/rewriter.jar (resources/RuleSet.dtd). Rulesets are used to identify URLs. By default, all strings in web content starting with characters such as "/", "../", "http", and "https" are considered to be URLs and are candidates for rewriting.

To configure the Rewriter for your implementation, you create a ruleset and define rules in the Rewriter section of the Portal Server Configuration in the administration console. See Administering the Rewriter Service for details on creating and modifying rulesets. You define multiple rules based on the content type in the web pages. For example, the rule required to rewrite HTML content is different from the rule required to rewrite JavaScript content. Rewriter rules fall into three broad categories: rules for HTML content, rules for JavaScript content, and rules for XML content.

The ruleset is an XML document and the XML within it must be properly formed. When defining rules in a ruleset, keep the following guidelines in mind:

Rules for HTML Content

HTML content in web pages can be classified into attributes, JavaScript tokens, forms, and applets. Accordingly, the rules for HTML content are classified as attribute rules, JavaScript token rules, form rules, and applet rules.

Attribute Rules for HTML Content

Attribute rules identify the basic attribute tags in HTML pages to rewrite. Rewriter modifies the various occurrences of the defined tags by expanding or prefixing the existing URL. The default ruleset rewrites the following attribute tags:

The syntax for attribute rules is:

<Attribute name="name" [tag="tag" valuePatterns="patterns"] />

where name specifies the attribute, tag specifies the tag to which the attribute belongs (set to * to match all tags), and patterns specifies the possible patterns to match with the attribute. The tag and valuePatterns parameters are optional.

JavaScript Token Rules for HTML Content

Web pages can contain pure JavaScript code within the JavaScript tags, or they can contain JavaScript tokens or functions. For example, a web page can contain an onClick() function that causes a jump to a different URL. In order for the page to function properly, the value of the onClick() function needs to be translated and rewritten. In most cases, the rules provided in the default ruleset are sufficient to rewrite the URLs in JavaScript tokens. The default ruleset rewrites the following JavaScript tokens:

The syntax for JavaScript Token rules is:

<JSToken>javascript_function_name</JSToken>

where javascript_function_name is the name of the function such as onLoad or onClick.

Form Rules for HTML Content

Users can browse HTML pages that contain forms. Form elements, such as input, can take a URL as a value. The default ruleset does not rewrite any form elements. The syntax for form rules is:

<Form source="/source.html" name="form1" field="field1" [valuePatterns="pattern"] />

where /source.html is the URL of the HTML page containing the form, form1 is the name of the form, field1 is the field of the form to be rewritten, and pattern indicates the part of the field to be rewritten. All content that follows the pattern specified is rewritten.

The valuePatterns parameter is optional.
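
The statement that all content following the pattern is rewritten can be pictured with a short sketch. This is an interpretation for illustration, not the product's algorithm: it treats the pattern as a literal string, and it assumes that an empty pattern means the whole value is a URL and that a value lacking the pattern is left alone. The function name and sample values are hypothetical.

```python
from typing import Callable


def rewrite_after_pattern(value: str, pattern: str, rewrite: Callable[[str], str]) -> str:
    """Rewrite only the part of value that follows the first occurrence of pattern."""
    if not pattern:
        return rewrite(value)          # assumption: no pattern, whole value is a URL
    head, sep, tail = value.partition(pattern)
    if not sep:
        return value                   # assumption: pattern absent, value untouched
    return head + sep + rewrite(tail)


if __name__ == "__main__":
    def to_gateway(url: str) -> str:
        # Hypothetical gateway host, used only for this demonstration.
        return "https://gw.example.com/" + url

    print(rewrite_after_pattern("0|1234|http://host.example.com/test.html", "0|1234|", to_gateway))
```

The text before and including the pattern is preserved, so a field that mixes control data with a URL keeps its leading data intact.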

Applet Rules for HTML Content

A single web page can contain many applets, and each applet can contain many parameters. The Rewriter rule for URLs in applets should contain pattern matching information for the following:

Rewriter matches the values specified in the rule with the content of the applet and modifies the URLs as required. This replacement is carried out at the server and not when the user is browsing the particular web page. A wildcard character (*) can also be used as part of the rule. For example, the parameter name could be *, in which case, the Rewriter does not compare the parameter name in the applet.

The default ruleset does not rewrite any applet parameters.

The syntax for applet rules is:

<Applet source="/sourcehtml.jsp" code="class" param="parameter_name" [valuePatterns="pattern"] />

where /sourcehtml.jsp is the URL containing the applet, class is the name of the applet class, parameter_name is the parameter whose value needs to be rewritten, and pattern indicates the part of the field to be rewritten. All content that follows the pattern specified is rewritten. The valuePatterns parameter is optional.

Rules for JavaScript Content

URLs can occur in various portions of JavaScript code. The Rewriter cannot directly parse the JavaScript code and determine the URL portion. A special set of rules needs to be written to help the JavaScript processor translate the URL.

JavaScript elements that contain URLs are classified as JavaScript variables and JavaScript function parameters.

JavaScript Variables

JavaScript variables are further classified into five categories: URL, EXPRESSION, DHTML, DJS, and system variables.

JavaScript URL Variables

URL variables have a URL string on the right hand side. The default ruleset rewrites the following JavaScript URL variables:

The syntax of URL variables in JavaScript content rules is:

<Variable type="URL">variable_name</Variable>

where variable_name is the name of the variable to be rewritten.

JavaScript EXPRESSION Variables

EXPRESSION variables have an expression on the right hand side. The result of this expression is a URL. Because the Rewriter cannot evaluate such expressions itself, it appends to the HTML page a JavaScript function that converts the expression. This function takes the expression as a parameter and evaluates it at the client browser.

The default ruleset rewrites the location JavaScript EXPRESSION variable.

The syntax of EXPRESSION variables in JavaScript content rules is:

<Variable type="EXPRESSION">variable_exp</Variable>

where variable_exp is the expression variable.

JavaScript DHTML Variables

DHTML variables are JavaScript variables that hold HTML content. The default ruleset rewrites the following JavaScript DHTML variables:

The syntax of DHTML variables in JavaScript content is:

<Variable type="DHTML">variable</Variable>

where variable is the DHTML variable.

JavaScript DJS (Dynamic JavaScript) Variables

DJS (Dynamic JavaScript) variables are JavaScript variables that hold JavaScript content.

The syntax of DJS variables in JavaScript content is:

<Variable type="DJS">variable</Variable>

where variable is the DJS variable.

The JavaScript code contained in the variable needs another rule to translate it.

JavaScript System Variables

System variables are variables that are not declared by the user, but that are available as a part of the JavaScript standard.

The default ruleset rewrites the window.location.pathname JavaScript system variable.

The syntax of system variables in JavaScript content is:

<Variable type="SYSTEM">variable</Variable>

where variable is the system variable.

JavaScript Function Parameters

Function parameters are classified into four categories: URL, EXPRESSION, DHTML, and DJS parameters.

JavaScript URL Parameters

URL parameters are string parameters that directly contain the URL.

The default ruleset rewrites the following JavaScript URL parameters:

The syntax for URL parameters is:

<Function type = "URL" name = "function" [paramPatterns="y,y,"] />

where function is the name of the function to be evaluated and y indicates the position of the parameter(s) that need to be rewritten. Parameter positions are delimited by commas. For example, in the syntax line the first and second parameters need to be rewritten, but the third parameter should not be rewritten.
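
The positional meaning of paramPatterns can be illustrated with a small sketch. It follows the description above (comma-delimited positions, y marks a parameter to rewrite) and is not taken from the product source; the function names and sample arguments are hypothetical.

```python
from typing import Callable, List


def parse_param_patterns(patterns: str) -> List[bool]:
    """Turn a paramPatterns string such as "y,y," into per-position flags."""
    return [field.strip() == "y" for field in patterns.split(",")]


def rewrite_params(patterns: str, args: List[str], rewrite: Callable[[str], str]) -> List[str]:
    """Apply rewrite to every argument whose position is marked with y."""
    flags = parse_param_patterns(patterns)
    return [
        rewrite(arg) if i < len(flags) and flags[i] else arg
        for i, arg in enumerate(args)
    ]


if __name__ == "__main__":
    print(parse_param_patterns("y,y,"))
    print(rewrite_params("y,y,", ["/a.html", "/b.html", "title"], lambda u: "REWRITTEN:" + u))
```

With "y,y," the empty field after the last comma stands for the third position, which is why the third argument is left unchanged.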

JavaScript EXPRESSION Parameters

EXPRESSION parameters are variables within a function that result in a URL when they are evaluated. The syntax for EXPRESSION parameters is:

<Function type = "EXPRESSION" name = "function" [paramPatterns="y,y,"] />

where function is the name of the function to be evaluated and y indicates the position of the parameter(s) that need to be rewritten. Parameter positions are delimited by commas. For example, in the syntax line the first and second parameters need to be rewritten, but the third parameter should not be rewritten.

JavaScript DHTML Parameters

DHTML parameters are native JavaScript methods that generate an HTML page dynamically. For example, the document.write() method falls under this category.

The default ruleset rewrites the following JavaScript DHTML parameters:

The syntax for DHTML parameters is:

<Function type = "DHTML" name = "function" [paramPatterns="y,y,"] />

where function is the name of the function to be evaluated and y indicates the position of the parameter(s) that need to be rewritten. Parameter positions are delimited by commas. For example, in the syntax line the first and second parameters need to be rewritten, but the third parameter should not be rewritten.

JavaScript DJS Parameters

Dynamic JavaScript (DJS) parameters such as Cascading Style Sheets (CSS) in HTML are also translated. There are no rules defined for this translation as the URL appears only in the url() function of the CSS. The syntax for DJS parameters is:

<Function type = "DJS" name = "function" [paramPatterns="y,y,"] />

where function is the name of the function to be evaluated and y indicates the position of the parameter(s) that need to be rewritten. Parameter positions are delimited by commas. For example, in the syntax line the first and second parameters need to be rewritten, but the third parameter should not be rewritten.

Rules for XML Content

Web pages can contain XML content, which in turn can contain URLs. Rewriter can rewrite URLs in XML content.

XML content that contains URLs is classified as tag text or attributes.

Tag Text in XML

Rewriter translates XML content based on the tag name.

The default ruleset rewrites the following tags in XML:

The syntax for tag text is:

<TagText tag ="attribute" attributePatterns="name=src"/>

where attribute is the name of the tag and src is the name of the attribute.

Attributes in XML

The rules for attributes in XML are similar to the rules for attributes in HTML. See Attribute Rules for HTML Content for additional information. Rewriter translates attribute values based on the attribute and tag names.

The default ruleset rewrites the following attributes in XML:

The syntax for attributes in XML is:

<Attributes>
  <Attribute name="attribute" [valuePatterns="name=src"] />
</Attributes>

where attribute is the name of the tag and src is the name of the attribute.


Administering the Rewriter Service

In Portal Server 6, the Rewriter service uses Sun Java System Access Manager attributes to provide persistent storage for the Rewriter rulesets. A Rewriter ruleset defines how contents in a web page should be rewritten by the Rewriter. Multiple Rewriter rulesets can be defined and stored as Sun Java System Access Manager service attribute values through the Sun Java System Access Manager administration console.

You can also administer the Rewriter using the command line. See the Sun Java System Portal Server 2005Q1 Technical Reference Guide for more information on the rwadmin command.

Because the Sun Java System Access Manager administration console does not have any concept of a Rewriter ruleset, Portal Server uses a customized service management plug-in module to manage rulesets. All Rewriter rulesets are global to the organizations in Sun Java System Access Manager. There is no provision for creating a ruleset at a particular organization level.


Note

The URLScraperProvider can only scrape content that is valid inside an HTML table cell. If the HTML to scrape contains markup that cannot be rendered in a table cell, such as <body>, <base>, and certain JavaScript procedures, the display of the Desktop page can be corrupted. When defining content to scrape, try to confirm that the content is valid HTML. See URLScraperProvider Limitations for further information.


To Configure the Rewriter URLScraperProvider for SSL

You can use the Rewriter’s URLScraperProvider to scrape SSL pages and rewrite the URLs for access over a secure session.

  1. Initialize the trust database in the web server administration console for the server on which you installed Portal Server as follows:
    1. From a browser, enter the following URL to access the Web Server admin page:

       http://servername:8088

    2. Log in as Admin and click the Security tab.
    3. Enter the Database password twice and select OK.
  2. Create a password file as follows:
    1. Change directories to /AccessManager-base/SUNWam/config.
    2. Create a hidden text file .wtpass.
    3. In the file, type the password that you gave when you initialized the trust database.
  3. Add the following line to the /AccessManager-base/SUNWam/lib/AMConfig-instance_nickname.properties file if the root CA is not installed for the certificates used by the Web servers accessed using the URLScraperProvider:

     com.sun.am.jssproxy.trustAllServerCerts=true

     This option tells JSS to trust the certificate.

  4. Restart Portal Server.

To Create a New Ruleset from the Default Template


Note

For current and complete information on the Access Manager admin console, refer to the Sun Java System Access Manager 2005Q1 Administration Guide.


  1. Log in to the Sun Java System Access Manager administration console as administrator.
  2. Select Service Configuration from the location pane.
  3. Click the properties arrow next to Rewriter in the navigation pane.

     A list of currently defined rulesets appears in the data pane.

  4. Click New.

     A ruleset template appears for you to modify.

  5. Edit the <RuleSet id="ruleset_template"> line, replacing ruleset_template with the name for the new ruleset.
  6. Add or modify the rules within the ruleset template to rewrite URLs as necessary.
  7. Click Save to create the new ruleset.

     Upon success, the initial page appears with the list of all currently defined rulesets, which should include the one you just created.
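
Because a ruleset must be well-formed XML, it can help to check an edited ruleset before pasting it into the console. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. It checks well-formedness and the root element only, does not validate against RuleSet.dtd, and uses a hypothetical function name and ruleset id.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def ruleset_id(xml_text: str) -> Optional[str]:
    """Return the id of the root RuleSet element, or None if the text is unusable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None                    # not well-formed XML
    if root.tag != "RuleSet":
        return None                    # well-formed, but not a ruleset document
    return root.get("id")


if __name__ == "__main__":
    print(ruleset_id('<RuleSet id="my_ruleset"></RuleSet>'))
    print(ruleset_id('<RuleSet id="my_ruleset">'))   # missing end tag
```

A return value of None signals either malformed XML or a document whose root element is not RuleSet.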

To Edit an Existing Ruleset


Note

For current and complete information on the Access Manager admin console, refer to the Sun Java System Access Manager 2005Q1 Administration Guide.


  1. Log in to the Sun Java System Access Manager administration console as administrator.
  2. Select Service Configuration from the location pane.
  3. Click the properties arrow next to Rewriter in the navigation pane.

     A list of currently defined rulesets appears in the data pane.

  4. Click the Edit link for the ruleset to edit.

     The XML for the ruleset appears.

  5. Add or modify the rules within the ruleset to rewrite URLs as necessary.
  6. If you would like to change the name of the ruleset, edit the <RuleSet id="name"> line, replacing name with a new name for the ruleset.
  7. Click Save.

To Download a Ruleset


Note

For current and complete information on the Access Manager admin console, refer to the Sun Java System Access Manager 2005Q1 Administration Guide.


Rulesets can be downloaded and saved to a file.

  1. Log in to the Sun Java System Access Manager administration console as administrator.
  2. Select Service Configuration from the location pane.
  3. Click the properties arrow next to Rewriter in the navigation pane.

     A list of currently defined rulesets appears in the data pane.

  4. Click the Download link for the ruleset to save to a file.
  5. Specify a name for the file and save it.

To Upload a Ruleset


Note

For current and complete information on the Access Manager admin console, refer to the Sun Java System Access Manager 2005Q1 Administration Guide.


A ruleset file can be uploaded into the system.

  1. Log in to the Sun Java System Access Manager administration console as administrator.
  2. Select Service Configuration from the location pane.
  3. Click the properties arrow next to Rewriter in the navigation pane.

     A list of currently defined rulesets appears in the data pane.

  4. Click the Upload link next to any ruleset in the list.
  5. Browse to or type the file name for the ruleset to upload.
  6. Click Upload.

     If the name defined in the <RuleSet id="name"> line within the file matches the name of a ruleset on the system, that ruleset is replaced with the contents of the file. If the name is unique, a new ruleset is created with that name and added to the list.
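
The replace-or-create behavior of an upload can be modeled with a dictionary keyed by ruleset name. This sketch only illustrates the rule described above. It does not call any Access Manager API, and the in-memory store and function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def upload_ruleset(store: Dict[str, str], xml_text: str) -> str:
    """Store xml_text under its RuleSet id; report whether it replaced or created."""
    ruleset_id = ET.fromstring(xml_text).get("id")
    if ruleset_id is None:
        raise ValueError("RuleSet element has no id attribute")
    outcome = "replaced" if ruleset_id in store else "created"
    store[ruleset_id] = xml_text
    return outcome


if __name__ == "__main__":
    # Hypothetical starting state: one ruleset already on the system.
    rulesets = {"default_ruleset": '<RuleSet id="default_ruleset"/>'}
    print(upload_ruleset(rulesets, '<RuleSet id="my_ruleset"/>'))
    print(upload_ruleset(rulesets, '<RuleSet id="default_ruleset"></RuleSet>'))
    print(sorted(rulesets))
```

The practical consequence is that uploading a file whose id matches an existing ruleset overwrites it, so download a copy first if you may need the old version.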

To Delete an Existing Ruleset


Note

For current and complete information on the Access Manager admin console, refer to the Sun Java System Access Manager 2005Q1 Administration Guide.


  1. Log in to the Sun Java System Access Manager administration console as administrator.
  2. Select Service Configuration from the location pane.
  3. Click the properties arrow next to Rewriter in the navigation pane.

     A list of currently defined rulesets appears in the data pane.

  4. Click the checkbox next to the ruleset to be deleted.

     You can select more than one ruleset.

  5. Click Delete.

     A confirmation message appears.

  6. Click Yes to delete the selected rulesets.

To Restore the Default Ruleset

If you accidentally delete the default ruleset, you can restore it with the rwadmin command as follows:

rwadmin store --runasdn "uid=amadmin, ou=people, o=sesta.com, o=isp" --password "testing123" /resources/DefaultRuleSet.xml

where "/resources/DefaultRuleSet.xml" is the location of the ruleset stored in the rewriter.jar file.


Note

The default ruleset packaged from the installation is restored. If you have customized the default ruleset, the changes are not restored.






Copyright 2005 Sun Microsystems, Inc. All rights reserved.