Sun Java System Web Proxy Server 4 Administrator's Guide 

Chapter 13
Filtering Content through the Proxy

This chapter describes how to filter URLs so that your proxy server either does not allow access to the URL or modifies the HTML and JavaScript content it returns to the client. This chapter also describes how you can restrict access through the proxy based on the web browser (user agent) that the client is using.

The proxy server lets you use a URL filter file to determine which URLs the server supports. For example, instead of manually typing in wildcard patterns of URLs to support, you can create or purchase one text file that contains URLs you want to restrict. This feature lets you create one file of URLs that you can use on many different proxy servers.

You can also filter URLs based on their MIME type. For example, you might allow the proxy to cache and send HTML and GIF files but not allow it to get binary or executable files because of the risk of computer viruses.

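This chapter contains the following sections:

    • Filtering URLs
    • Content URL Rewriting
    • Restricting Access to Specific Web Browsers
    • Blocking Requests
    • Suppressing Outgoing Headers
    • Filtering by MIME Type
    • Filtering by HTML Tags
    • Configuring the Server for Content Compression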


Filtering URLs

You can use a file of URLs to configure what content the proxy server retrieves. You can set up a list of URLs the proxy always supports and a list of URLs the proxy never supports.

For example, if you are an Internet service provider who runs a proxy server with content appropriate for children, you might set up a list of URLs that are approved for viewing by children. You can then have the proxy server retrieve only the approved URLs; if a client tries to go to an unsupported URL, either you can have the proxy return the default “Forbidden” message or you can create a custom message explaining why the client could not access that URL.

To restrict access based on URLs, you need to create a file of URLs to allow or restrict. You can do this through the Server Manager. Once you have the file, you can set up the restrictions. These processes are discussed in the following sections.

Creating a Filter File of URLs

A filter file is a file that contains a list of URLs. The filter files the proxy server uses are plain text files with lines of URLs in the following pattern:

protocol://host:port/path/filename

You can use regular expressions in each of the three sections: protocol, host:port, and path/filename. For example, to create a URL pattern for all protocols going to the example.com domain, you would add the following line to your file:

.*://.*\.example\.com/.*

This pattern matches only URLs that do not include a port number. For more information on regular expressions, see “Understanding Regular Expressions” in Managing Templates and Resources.
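As a rough illustration of how such a line behaves, the following Python sketch tests the pattern against a few URLs. It assumes that the pattern is applied to the whole URL and that Python's re module is close enough to the proxy's regular expression syntax for this simple case. Both points are assumptions, and the sample URLs are invented for the example.

```python
import re

# Pattern taken from the filter file example above.
PATTERN = re.compile(r".*://.*\.example\.com/.*")


def matches_filter(url: str) -> bool:
    """Return True if the whole URL matches the filter pattern."""
    return PATTERN.fullmatch(url) is not None


# Hypothetical URLs used only for illustration.
samples = [
    "http://www.example.com/index.html",
    "ftp://files.example.com/pub/readme.txt",
    "http://www.example.com:8080/index.html",  # explicit port number
    "http://www.example.org/index.html",       # different domain
]

for url in samples:
    print(url, "->", matches_filter(url))
```

In this sketch the URL with an explicit port does not match, because the pattern expects a slash immediately after .com.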

If you want to create your own file without using the Server Manager, you should use the Server Manager pages to create an empty file, and then add your text in that file or replace the file with one containing the regular expressions.

To create a filter file
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Restrict URL Filter Access link. The Restrict URL Filter Access page displays.
  3. Choose New Filter from the drop-down list next to the Create/Edit button.
  4. Type a name for the filter file in the text box to the right of the drop-down list and then click the Create/Edit button. The Filter Editor page displays.
  5. Use the Filter Content scrollable text box to enter URLs and regular expressions of URLs. The Reset button clears all the text in this field.
     For more information on regular expressions, see “Understanding Regular Expressions” in Managing Templates and Resources.

  6. Click OK.

The proxy server creates the file and returns you to the Restrict URL Filter Access page. The filter file is created in the proxy-serverid/conf_bk directory.

Setting Default Access for a Filter File

Once you have a filter file that contains the URLs you want to use, you can set the default access for those URLs.

To set default access for a filter file
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Restrict URL Filter Access link. The Restrict URL Filter Access page displays.
  3. Choose the template you want to use with the filters. Typically, you will want to create filter files for the entire proxy server, but you might want one set of filter files for HTTP and another for FTP.

  4. Use the URL Filter To Allow list to choose a filter file that contains the URLs you want the proxy server to support.
  5. Use the URL Filter To Deny list to choose a filter file that contains the URLs to which you want the proxy server to deny access.
  6. Choose the text you want the proxy server to return to clients who request a denied URL. You can choose one of two options:
    • You can send the default “Forbidden” response that the proxy generates.
    • You can send a text or HTML file with customized text. Type the absolute path to this file in the text box.
  7. Click OK.
  8. Click Restart Required. The Apply Changes page displays.
  9. Click the Restart Proxy Server button to apply the changes.
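The effect of an allow filter and a deny filter can be pictured with the following Python sketch. It is not the proxy's implementation. The function names are hypothetical, and the evaluation order (a deny match is refused first, then a non-empty allow list permits only matching URLs) is an assumption made for the example, since this chapter does not define how the two lists interact.

```python
import re
from typing import Iterable, List, Optional, Tuple


def load_filter(lines: Iterable[str]) -> List[re.Pattern]:
    """Compile each non-empty line of a filter file as one URL pattern."""
    return [re.compile(line.strip()) for line in lines if line.strip()]


def matches_any(url: str, patterns: List[re.Pattern]) -> bool:
    """Return True if the whole URL matches at least one pattern."""
    return any(p.fullmatch(url) for p in patterns)


def check_url(url: str,
              allow: List[re.Pattern],
              deny: List[re.Pattern],
              custom_message: Optional[str] = None) -> Tuple[int, str]:
    """Return (status, body) for a request.

    ASSUMPTION: deny is checked first, and a non-empty allow list
    permits only the URLs it matches.
    """
    forbidden = custom_message or "Forbidden"
    if deny and matches_any(url, deny):
        return 403, forbidden
    if allow and not matches_any(url, allow):
        return 403, forbidden
    return 200, ""


# Hypothetical filter file contents.
allow = load_filter([r".*://.*\.example\.com/.*", ""])
deny = load_filter([r".*://ads\.example\.com/.*"])

print(check_url("http://www.example.com/", allow, deny))
print(check_url("http://ads.example.com/banner.gif", allow, deny))
print(check_url("http://www.example.org/", allow, deny, "Blocked by policy"))
```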


Content URL Rewriting

Proxy Server 4 can inspect the content being returned to the client and replace patterns (such as URLs) with other strings. You configure two parameters: a source string and a destination string. The Proxy Server looks for text matching the source string and replaces it with the text in the destination string. This feature works only in reverse proxy mode.
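The substitution can be sketched in Python as follows. This is an illustration only: the function name and host names are hypothetical, the source string is treated as a literal, and the MIME pattern is treated as a regular expression, all of which are assumptions rather than documented behavior.

```python
import re


def rewrite_content(body: str, content_type: str,
                    source: str, destination: str,
                    mime_pattern: str = "text/.*") -> str:
    """Replace source with destination when the content type matches."""
    if re.fullmatch(mime_pattern, content_type, flags=re.IGNORECASE) is None:
        return body  # this content type is not selected for rewriting
    return body.replace(source, destination)


# Hypothetical page returned by an internal server behind a reverse proxy.
html = '<a href="http://internal.example.com/docs/">Docs</a>'

print(rewrite_content(html, "text/html",
                      "http://internal.example.com",
                      "http://www.example.com"))
```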

To create a URL rewriting pattern
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Set Content URL Rewriting link. The Set Content URL Rewriting page displays.
  3. Select a resource from the drop-down list or specify a regular expression. For more information on regular expressions, see “Understanding Regular Expressions” in Managing Templates and Resources.
  4. Specify the source string in the Source Pattern text box.
  5. Specify the destination string in the Destination Pattern text box.
  6. Specify the content type in the MIME Pattern text box.
  7. Click OK.
  8. Click Restart Required. The Apply Changes page displays.
  9. Click the Restart Proxy Server button to apply the changes.
To edit a URL rewriting pattern
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Set Content URL Rewriting link. The Set Content URL Rewriting page displays.
  3. Click the Edit link next to the URL rewriting pattern you want to edit.
  4. Click OK.
  5. Click Restart Required. The Apply Changes page displays.
  6. Click the Restart Proxy Server button to apply the changes.
To delete a URL rewriting pattern
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Set Content URL Rewriting link. The Set Content URL Rewriting page displays.
  3. Click the Remove link next to the URL rewriting pattern you want to delete. Click OK to confirm deletion.
  4. Click Restart Required. The Apply Changes page displays.
  5. Click the Restart Proxy Server button to apply the changes.


Restricting Access to Specific Web Browsers

You can restrict access to the proxy server based on the type and version of the client’s web browser. The restriction is based on the User-Agent header that web browsers send to servers when making requests.

To restrict access to the proxy based on the client’s web browser
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Set User-Agent Restriction link. The Set User-Agent Restriction page displays.
  3. Select the resource from the drop-down list or type a regular expression that matches the user-agent string for the browsers you want the Proxy Server to support. If you want to specify more than one client, enclose the regular expression in parentheses and use the | character to separate the entries. For more information on regular expressions, see “Understanding Regular Expressions” in Managing Templates and Resources.
  4. Check the Allow Only User-Agents Matching option.
  5. Click OK.
  6. Click Restart Required. The Apply Changes page displays.
  7. Click the Restart Proxy Server button to apply the changes.
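The following Python sketch shows what a multiple-entry expression of the kind described in step 3 might look like. The expression and the user-agent strings are hypothetical, and the sketch assumes that the expression must match the whole User-Agent value and that Python's re syntax is comparable to the proxy's.

```python
import re

# Hypothetical expression: the parentheses enclose the entries and the
# | character separates them.
ALLOWED_AGENTS = re.compile(r"(Mozilla/5\..*|Opera.*)")


def is_allowed(user_agent: str) -> bool:
    """Return True if the User-Agent value matches an allowed entry."""
    return ALLOWED_AGENTS.fullmatch(user_agent) is not None


for agent in ("Mozilla/5.0 (X11; Linux x86_64)",
              "Opera/9.80 (Windows NT 6.1)",
              "Mozilla/4.0 (compatible; MSIE 6.0)",
              "curl/8.0.1"):
    print(agent, "->", is_allowed(agent))
```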


Blocking Requests

You may want to block file uploads and other requests based on the upload content type.

To block requests based on MIME type
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Set Request Blocking link. The Set Request Blocking page displays.
  3. Select the resource from the drop-down list or click the Regular Expression button, enter a regular expression and click OK.
  4. Click the radio button for the type of request blocking you want. The options include the following:
    • Disabled - disables request blocking
    • Multipart MIME (File Upload) - blocks all file uploads
    • MIME Types Matching Regular Expression - blocks requests for MIME types that match the regular expression you enter. For more information on regular expressions, see “Understanding Regular Expressions” in Managing Templates and Resources.
  5. Choose whether you want to block requests for all clients or for user-agents that match a regular expression you enter.
  6. Click the radio button for the methods for which you want to block requests. The options are:
    • Any Method With Request Body - blocks all requests with a request body, regardless of the method
    • only for:
      POST - blocks file upload requests using the POST method
      PUT - blocks file upload requests using the PUT method
    • Methods Matching Regular Expression - blocks all file upload requests using the method you enter
  7. Click OK.
  8. Click Restart Required. The Apply Changes page displays.
  9. Click the Restart Proxy Server button to apply the changes.
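The way these options combine can be sketched as a single decision function. The Python below is a simplified model, not the proxy's logic: the function and parameter names are hypothetical, and mapping the Multipart MIME (File Upload) option to the multipart/form-data content type is an assumption.

```python
import re
from typing import Optional


def should_block(method: str, content_type: Optional[str], has_body: bool,
                 mime_regex: str = r"multipart/form-data.*",
                 method_regex: Optional[str] = None) -> bool:
    """Decide whether a request would be blocked in this simplified model.

    method_regex=None models "Any Method With Request Body"; otherwise the
    method must match the expression, for example "(POST|PUT)".
    """
    if method_regex is None:
        method_selected = has_body
    else:
        method_selected = re.fullmatch(method_regex, method, re.IGNORECASE) is not None
    if not method_selected or content_type is None:
        return False
    return re.fullmatch(mime_regex, content_type, re.IGNORECASE) is not None


# A browser file upload, blocked by the assumed multipart default.
print(should_block("POST", "multipart/form-data; boundary=x", True))
# A PUT of binary data, blocked by a MIME expression limited to POST and PUT.
print(should_block("PUT", "application/octet-stream", True,
                   mime_regex=r"application/.*", method_regex=r"(POST|PUT)"))
# A plain GET without a body is not affected.
print(should_block("GET", None, False))
```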


Suppressing Outgoing Headers

You can configure the proxy server to remove outgoing headers from the request, usually for security reasons. For example, you might want to prevent the From header from going out because it reveals the user’s email address, or you might want to filter out the User-Agent header so external servers cannot determine what web browsers your organization uses. You may also want to remove logging or client-related headers that are intended only for your intranet before a request is forwarded to the Internet.

This feature does not affect headers that are specially handled or generated by the proxy itself or that are necessary to make the protocol work properly (such as If-Modified-Since and Forwarded).

Although you cannot prevent a proxy from generating its own Forwarded header, this is not a security problem, because the remote server can detect the connecting proxy host from the connection itself. In a proxy chain, a Forwarded header coming from an inner proxy can be suppressed by an outer proxy. This setup is recommended when you do not want the inner proxy or client host name revealed to the remote server.

To suppress outgoing headers
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Suppress Outgoing Headers link. The Suppress Outgoing Headers page displays.
  3. Enter a comma-separated list of request headers to be suppressed in the Suppress Headers text box. For example, to suppress the From and User-Agent headers, type from,user-agent. The headers you type are not case-sensitive. For more information on regular expressions, see “Understanding Regular Expressions” in Managing Templates and Resources.
  4. Click Restart Required. The Apply Changes page displays.
  5. Click the Restart Proxy Server button to apply the changes.
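The effect of a suppress list such as from,user-agent can be pictured with a short Python sketch. The function name and the sample request headers are hypothetical. The sketch shows only the case-insensitive removal and does not model the headers that the proxy handles specially.

```python
from typing import Dict


def suppress_headers(headers: Dict[str, str], suppress_list: str) -> Dict[str, str]:
    """Return a copy of headers without the names in the comma-separated list."""
    blocked = {name.strip().lower()
               for name in suppress_list.split(",") if name.strip()}
    return {name: value for name, value in headers.items()
            if name.lower() not in blocked}


# Hypothetical outgoing request headers.
request_headers = {
    "Host": "www.example.com",
    "From": "user@example.com",
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
}

print(suppress_headers(request_headers, "from,user-agent"))
```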


Filtering by MIME Type

You can configure the proxy server to block certain files that match a MIME type. For example, you could set up your proxy server to block any executable or binary files so that any clients using your proxy server can’t download a possible computer virus.

If you want the proxy server to support a new MIME type, in the Server Manager, choose Preferences > Create/Edit MIME Types and add the type. For more information on creating a MIME type, see Creating a New MIME Type.

You can combine filtering MIME types with templates, so that only certain MIME types are blocked for specific URLs. For example, you could block executables coming from any computer in the .edu domain.

To filter by MIME type
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Set MIME Filters link. The Set MIME Filters page displays.
  3. Choose the template you want to use for filtering MIME types, or make sure you are editing the entire server.
  4. In the Current filter text box, you can type a regular expression that matches the MIME types you want to block. For example, to filter out all applications, you could type application/.* for the regular expression. This is faster than checking each MIME type for every application type. The regular expression is not case-sensitive. For more information on regular expressions, see “Understanding Regular Expressions” in Managing Templates and Resources.

  5. Check the MIME types you want to filter. When a client attempts to access a file that is blocked, the proxy server returns a “403 Forbidden” message.
  6. Click OK.
  7. Click Restart Required. The Apply Changes page displays.
  8. Click the Restart Proxy Server button to apply the changes.
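As an illustration of the application/.* expression mentioned above, the following Python sketch returns a 403 status for blocked types. The function name is hypothetical, and stripping parameters such as charset before matching is an assumption of the sketch.

```python
import re

# Expression from the example above; matching is not case-sensitive.
MIME_FILTER = re.compile(r"application/.*", re.IGNORECASE)


def filter_response(content_type: str) -> int:
    """Return 403 when the MIME type is blocked, 200 otherwise."""
    # ASSUMPTION: parameters such as "; charset=utf-8" are ignored.
    media_type = content_type.split(";", 1)[0].strip()
    if MIME_FILTER.fullmatch(media_type):
        return 403
    return 200


for ctype in ("application/octet-stream", "Application/X-MSDownload",
              "text/html; charset=utf-8", "image/gif"):
    print(ctype, "->", filter_response(ctype))
```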


Filtering by HTML Tags

The proxy server lets you specify HTML tags you want to filter out before passing the file to the client. This lets you filter out objects such as Java applets and JavaScript embedded in the HTML file. To filter HTML tags, you specify the beginning and ending HTML tags. The proxy then substitutes blanks for all text and objects in those tags before sending the file to the client.


Note

The proxy stores the original (unedited) file in the cache, if the proxy is configured to cache that resource.


To filter out HTML tags
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Set HTML Tag Filters link. The Set HTML Tag Filters page displays.
  3. Choose the template you want to modify. You might choose HTTP, or you might choose a template that specifies only certain URLs such as those from hosts in the .edu domain.
  4. Check the filter box for any of the default HTML tags you want to filter. The default tags include the following:
    • APPLET usually surrounds Java applets.
    • SCRIPT indicates the start of JavaScript code.
    • IMG specifies an inline image file.
  5. You can enter any HTML tags you want to filter. Type the beginning and ending HTML tags. For example, to filter out forms, you could type FORM in the Start Tag box (the HTML tags are not case-sensitive) and /FORM in the End Tag box. If the tag you want to filter does not have an end tag, such as OBJECT and IMG, you can leave the End Tag box empty.

  6. Click OK.
  7. Click Restart Required. The Apply Changes page displays.
  8. Click the Restart Proxy Server button to apply the changes.
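The blanking of tagged content can be approximated with the Python sketch below. It is a rough model based on regular expressions, not the proxy's parser. The function name is hypothetical, and replacing the matched text with the same number of spaces is an assumption about what substituting blanks means.

```python
import re


def blank_out(html: str, start_tag: str, end_tag: str = "") -> str:
    """Replace a tag, and everything up to its end tag if given, with blanks."""
    start = r"<\s*" + re.escape(start_tag) + r"\b[^>]*>"
    if end_tag:
        end = r"<\s*" + re.escape(end_tag) + r"\s*>"
        pattern = start + r".*?" + end
    else:
        pattern = start  # tag without an end tag
    regex = re.compile(pattern, re.IGNORECASE | re.DOTALL)
    # ASSUMPTION: each match becomes spaces of the same length.
    return regex.sub(lambda m: " " * len(m.group(0)), html)


# Hypothetical page content.
page = '<p>Hi</p><script type="text/javascript">alert(1)</script><img src="a.gif">'

print(repr(blank_out(page, "SCRIPT", "/SCRIPT")))
print(repr(blank_out(page, "IMG")))
```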


Configuring the Server for Content Compression

Proxy Server supports HTTP content compression. Content compression allows you to increase delivery speed to clients and serve higher content volumes without increasing your hardware expenses. Content compression reduces content download time, a benefit most apparent to users of dialup and high-traffic connections.

With content compression, your Proxy Server sends out compressed data and instructs the browser to decompress the data on the fly, thus reducing the amount of data sent and increasing page display speed.

Configuring the Server to Compress Content on Demand

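You can configure the Proxy Server to compress transmission data on the fly. Because a dynamically generated HTML page does not exist until a user requests it, such content cannot be compressed in advance and must be compressed at the time of the request.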

To configure your server to compress content on demand
  1. Access the Server Manager, and click the Filters tab.
  2. Click the Compress Content on Demand link. The Compress Content on Demand page displays.
  3. Select the resource from the drop-down list or specify a regular expression. For more information on regular expressions, see “Understanding Regular Expressions” in Managing Templates and Resources.
  4. Specify the following information:
    • Activate Compress Content on Demand? Choose whether the server should compress content on demand for the selected resource.
    • Vary Header. Specify whether to insert a Vary: Accept-Encoding header. Select either yes or no. If set to yes, a Vary: Accept-Encoding header is always inserted when a compressed version of a file is selected. If set to no, a Vary: Accept-Encoding header is never inserted.

      By default, the value is set to yes.

    • Fragment Size. Specifies the memory fragment size in bytes to be used by the compression library (zlib) to control how much to compress at a time. The default value is 8096.
    • Compression Level. Specifies the level of compression. Choose a value between 1 and 9. The value 1 yields the best speed; the value 9 the best compression. The default value is 6, a compromise between speed and compression.
  5. Click OK.
  6. Click Restart Required. The Apply Changes page displays.
  7. Click the Restart Proxy Server button to apply the changes.
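The settings above can be related to each other with a small Python sketch built on the standard zlib module. It is an illustration under stated assumptions, not the proxy's code: the function name is hypothetical, the fragment size is modeled as the size of each input chunk handed to the compressor, and the Accept-Encoding check is simplified to a substring test.

```python
import gzip
import zlib
from typing import Dict, Tuple


def compress_on_demand(body: bytes, accept_encoding: str,
                       level: int = 6, fragment_size: int = 8096,
                       vary: bool = True) -> Tuple[bytes, Dict[str, str]]:
    """Return (body, headers), gzip-compressed if the client accepts gzip."""
    headers: Dict[str, str] = {}
    # Simplified check: quality values such as gzip;q=0 are ignored.
    if "gzip" not in accept_encoding.lower():
        return body, headers
    # wbits = 16 + MAX_WBITS makes zlib produce gzip-format output.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    parts = []
    # ASSUMPTION: fragment_size is how much input is compressed at a time.
    for start in range(0, len(body), fragment_size):
        parts.append(compressor.compress(body[start:start + fragment_size]))
    parts.append(compressor.flush())
    headers["Content-Encoding"] = "gzip"
    if vary:
        headers["Vary"] = "Accept-Encoding"
    return b"".join(parts), headers


# Hypothetical, highly repetitive page used to compare compression levels.
page = b"<html><body>" + b"<p>Hello, proxy!</p>" * 500 + b"</body></html>"

for level in (1, 6, 9):
    data, hdrs = compress_on_demand(page, "gzip, deflate", level=level)
    print(level, len(page), "->", len(data), hdrs)
```

A client that does not list gzip in its Accept-Encoding header receives the body unchanged and without the additional headers in this sketch.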




Part No: 819-0908-10.   Copyright 2005 Sun Microsystems, Inc. All rights reserved.