Oracle iPlanet Web Proxy Server 4.0.14 Administration Guide

Chapter 12 Caching

This chapter describes how Proxy Server caches documents. It also describes how you can configure the cache by using the online pages.

How Caching Works

Caching reduces network traffic and offers faster response time for clients that are using the proxy server instead of going directly to remote servers.

When a client requests a web page or document from the proxy server, the proxy server copies the document from the remote server to its local cache directory structure while sending the document to the client.

When a client requests a document that was previously requested and copied into the proxy cache, the proxy returns the document from the cache instead of retrieving the document from the remote server again, as shown in the following figure. If the proxy determines that the file is not up to date, the proxy refreshes the document from the remote server and updates its cache before sending the document to the client.

Figure 12–1 Proxy Document Retrieval

Diagram showing a client requesting a document and the proxy server sending the document from cache

Files in the cache are automatically maintained by the Proxy Server garbage collection utility (CacheGC). The CacheGC automatically cleans the cache on a regular basis to ensure that the cache does not get cluttered with out-of-date documents.

Understanding the Cache Structure

A cache consists of one or more partitions. Conceptually, a partition is a storage area on a disk that you set aside for caching. If you want to have your cache span several disks, configure at least one cache partition for each disk. Each partition can be independently administered. In other words, you can enable, disable, and configure a partition independently of all other partitions.

Storing a large number of cached files in a single location can slow performance; therefore, create several directories, or sections, in each partition. Sections are the next level under partitions in the cache structure. You can have up to 256 sections in your cache across all partitions. The number of cache sections must be a power of 2 (for example, 1, 2, 4, 8, 16, ..., 256).

The final level in the cache structure hierarchy is the subsection. Subsections are directories within sections. Each section has 64 subsections. Cached files are stored in the subsections, which are the lowest level in your cache.

The following figure shows an example cache structure with partitions and sections. In this figure, the cache directory structure divides the total cache into three partitions. The first partition contains four cache sections, and the other two partitions each contain two sections.

Each cache section is named with “s” for section, followed by two numbers. For the section shown as s3.4, the 3 is the power of 2 that gives the number of cache sections (2³ = 8), and the 4 is the number of the section (the 8 sections are numbered 0 through 7). Therefore, s3.4 means section number 4 of 8, which is the fifth section because numbering starts at 0.
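The naming scheme can be illustrated with a short Python sketch. The function name parse_section_name is hypothetical and is not part of Proxy Server; the sketch only decodes the two numbers as the paragraph above describes them.

```python
import re


def parse_section_name(name: str) -> tuple[int, int]:
    """Decode a section name such as 's3.4'.

    Returns (total_sections, section_number), where total_sections is
    2 raised to the first number and section_number counts from 0.
    """
    match = re.fullmatch(r"s(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError(f"not a section name: {name!r}")
    exponent = int(match.group(1))
    number = int(match.group(2))
    total = 2 ** exponent
    if number >= total:
        raise ValueError(f"section {number} is out of range for {total} sections")
    return total, number


if __name__ == "__main__":
    total, number = parse_section_name("s3.4")
    print(f"section number {number} of {total}")
```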

Figure 12–2 Example of a Cache Structure

Diagram showing an example cache directory where the total cache is divided into three partitions.

Distributing Files in the Cache

The Proxy Server uses a specific algorithm to determine the directory where a document should be stored. This algorithm ensures equal distribution of documents in the directories. Equal distribution is important because directories with large numbers of documents tend to cause performance problems.

The Proxy Server uses the RSA MD5 algorithm (Message Digest 5) to reduce the URL to 16 bytes of binary data and uses 8 bytes of this data to calculate a 16-character hexadecimal file name that is used to store the document in the cache.
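The following Python sketch shows the general idea of deriving a 16-character hexadecimal name from a URL. It is an illustration only: the text does not say which 8 of the 16 digest bytes the Proxy Server uses, how the URL is encoded, or whether the name is upper or lower case, so the choice of the first 8 bytes, UTF-8 encoding, and lowercase output are all assumptions. The function name cache_file_name is hypothetical.

```python
import hashlib


def cache_file_name(url: str) -> str:
    """Derive a 16-character hexadecimal file name from a URL."""
    # MD5 reduces the URL to 16 bytes of binary data.
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Assumption: use the first 8 bytes; 8 bytes give 16 hex characters.
    return digest[:8].hex()


if __name__ == "__main__":
    print(cache_file_name("http://www.example.com/index.html"))
```

Because the digest bytes are effectively uniformly distributed, names derived this way spread evenly across directories, which is the property the preceding section relies on.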

Setting Cache Specifics

You can enable caching and control which types of protocols your Proxy Server will cache by setting the cache specifics. Cache specifics include the following items:

Whether your cache is enabled or disabled
The working directory where the cache stores its temporary files
The name of the directory in which you will record the cached URLs
The size of the cache
The capacity of the cache
What types of protocols will be cached
When to refresh a cached document
Whether the proxy should track the number of times a document is accessed and report that value back to the remote server

Note –

Setting the specifics for a large cache is time-consuming and may cause the administration interface to time out. Therefore, if you are creating a large cache, use the command-line utilities to set cache specifics. For more information on the cache command-line utilities, see Using the Cache Command-Line Interface.

To Set Cache Specifics

Access the Server Manager, and click the Caching tab.

Click the Set Cache Specifics link.

The Set Cache Specifics page is displayed.

You can enable or disable the cache by selecting the appropriate option.

The cache is enabled by default.

Provide the working directory.

By default the working directory is present under the proxy instance. This location can be changed. For more information, see Creating a Cache Working Directory.

Click the partition configuration link.

The Add/Edit Cache Partitions page is displayed. You can add a new cache partition or edit existing cache partitions. Cache size is the maximum size to which the cache is allowed to grow. The maximum cache size is 32 Gbytes. For more information, see Setting Cache Size.

Click the cache capacity configuration link.

The Set Cache Capacity page is displayed. You can set the cache capacity on the Set Cache Capacity page.

Select the Cache HTTP option to enable caching of HTTP documents.

If you want your proxy server to cache HTTP documents, determine whether it should always do an up-to-date check for the documents in the cache or check only after an interval has passed. You can also enable or disable reporting of cache hits to the remote server. For more information, see Caching HTTP Documents. The available options are:
- Select the Always Check That The Document Is Up To Date option to ensure that the HTTP document is always up-to-date.
- Select the number of hours from the Check Only If Last Check More Than drop-down list to specify the refresh interval for the proxy server. The up-to-date check is performed using one of the following options:
  - Use Last-modified Factor. The proxy server estimates freshness from the Last-modified header that the origin server sends along with the document.
  - Use Only Explicit Expiration Information. The proxy server uses the Expires header to decide if the cache entry is fresh or stale.
- Select the Never Report Accesses To Remote Server option to prevent the proxy server from reporting the number of accesses to the remote server.
- Select the Report Cache Hits To Remote Server option to track the number of times a document was accessed and report it back to the remote server.

Set the refresh interval for cached FTP documents by selecting the Yes; Reload If Older Than checkbox and also set the time interval by selecting the value from the drop-down list. For more information, see Caching FTP and Gopher Documents.

You can set the refresh interval for cached Gopher documents. Select the Yes; Reload If Older Than checkbox and also set the time interval by selecting the value from the drop-down list. For more information, see Caching FTP and Gopher Documents.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Creating a Cache Working Directory

The cache files are under cache partitions. The working directory you specify on the Set Cache Specifics page is often the parent directory for the cache. All cached files appear in an organized directory structure under the caching directory. If you change the cache directory name or move it to another location, you have to provide the proxy with the new location.

You can extend the cache directory structure to multiple file systems so that you can have a large cache structure divided on multiple smaller disks instead of keeping it all on one large disk. Each proxy server must have its own cache directory structure, that is, cache directories cannot be concurrently shared by multiple proxy servers.

Setting Cache Size

The cache size indicates the partition size. Cache size is the maximum size to which the cache is allowed to grow, and it should not exceed the cache capacity. The sum of all the partition sizes must be less than or equal to the cache size.

The amount of disk space available for the proxy cache has a considerable effect on cache performance. If the cache is too small, CacheGC must remove cached documents more often to make room on the disk, and documents must be retrieved from content servers more often. These activities slow performance.

Larger caches are more efficient: the more documents the cache holds, the lower the network traffic load and the faster the proxy's response time. Also, CacheGC removes cached documents if users no longer need them. Barring any file system limitations, cache size can never be too large. The excess space simply remains unused.

You can also have the cache split on multiple disk partitions.

Caching HTTP Documents

HTTP documents offer caching features that documents of the other protocols do not. However, by setting up and configuring the cache properly, you can ensure that your Proxy Server will cache HTTP, FTP, and Gopher documents effectively.

Note –

Proxy Server 4 does not support caching HTTPS documents.

All HTTP documents have a descriptive header section that the Proxy Server uses to compare and evaluate the document in the proxy cache and the document on the remote server. When the proxy does an up-to-date check on an HTTP document, the proxy sends one request to the server that tells the server to return the document if the version in the cache is out of date. Often, the document has not changed since the last request and therefore is not transferred. This method of checking to see if an HTTP document is up-to-date saves bandwidth and decreases latency.

To reduce transactions with remote servers, the Proxy Server enables you to set a Cache Expiration setting for HTTP documents. The Cache Expiration setting provides information to the proxy to estimate whether the HTTP document needs an up-to-date check before sending the request to the server. The proxy makes this estimate based on the HTTP document’s Last-Modified date found in the header.

With HTTP documents, you can also use a Cache Refresh setting. This option specifies whether the proxy always does an up-to-date check, which overrides an Expiration setting, or whether the proxy waits a specific period of time before doing a check. The following table shows what the proxy does if both an Expiration setting and a Refresh setting are specified. Using the Refresh setting decreases latency and saves bandwidth considerably.

Table 12–1 Using the Cache Expiration and Cache Refresh Settings With HTTP

Refresh setting	Expiration setting	Results
Always do an up-to-date check	(Not applicable)	Always do an up-to-date check
User-specified interval	Use document’s “expires” header	Do an up-to-date check if interval expired
User-specified interval	Estimate with document’s Last-Modified header	Smaller value* of the estimate and expires header

Note –

* Using the smaller value guards against getting stale data from the cache for documents that change frequently.

Setting the HTTP Cache Refresh Interval

If you decide that you want your Proxy Server to cache HTTP documents, determine whether it should always do an up-to-date check for documents in the cache or whether it should check based on a Cache Refresh setting (up-to-date check interval). For HTTP documents, a reasonable refresh interval would be four to eight hours, for example. The longer the refresh interval, the fewer the number of times the proxy connects with remote servers. Even though the proxy does not do up-to-date checking during the refresh interval, users can force a refresh by clicking the Reload button in the client. This action makes the proxy force an up-to-date check with the remote server.

You can set the refresh interval for HTTP documents on either the Set Cache Specifics page or the Set Caching Configuration page. The Set Cache Specifics page enables you to configure global caching procedures, and the Set Caching Configuration page enables you to control caching procedures for specific URLs and resources.

Setting the HTTP Cache Expiration Policy

You can also set up your server to check if the cached document is up-to-date by using a last-modified factor or explicit expiration information only.

Explicit expiration information is a header found in some HTTP documents that specifies the date and time when that file will become outdated. Not many HTTP documents use explicit Expires headers, so you should estimate based on the Last-modified header.

If you decide to have your HTTP documents cached based upon the Last-modified header, you need to select a fraction to use in the expiration estimation. This fraction, known as the LM factor, is multiplied by the interval between the last modification and the time that the last up-to-date check was performed on the document. The resulting number is compared with the time since the last up-to-date check. If the number is smaller than that time, the document is considered expired; otherwise, it is not expired. Smaller fractions make the proxy check documents more often.

For example, suppose you have a document that was last changed ten days ago. If you set the last-modified factor to 0.1, the proxy interprets the factor to mean that the document is probably going to remain unchanged for one day (10 * 0.1 = 1). The proxy would, in that case, return the document from the cache if the document was checked less than a day ago.

In this same example, if the cache refresh setting for HTTP documents is set to less than one day, the proxy does the up-to-date check more than once a day. The proxy always uses the value, cache refresh or cache expiration, that requires the more frequent update.
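The estimate can be sketched in Python as follows. This is a minimal illustration of the arithmetic described in this section, not the Proxy Server implementation; the function names, the use of seconds as the time unit, and the treatment of the boundary case (a document is checked once the elapsed time reaches the limit) are assumptions.

```python
from typing import Optional

DAY = 86400  # seconds


def estimated_lifetime(last_modified: float, last_checked: float,
                       lm_factor: float) -> float:
    """LM factor times the interval between last modification and last check."""
    return lm_factor * (last_checked - last_modified)


def needs_up_to_date_check(now: float, last_modified: float,
                           last_checked: float, lm_factor: float,
                           refresh_interval: Optional[float] = None) -> bool:
    """Return True if the proxy should check the document with the remote server."""
    limit = estimated_lifetime(last_modified, last_checked, lm_factor)
    if refresh_interval is not None:
        # Use whichever value requires the more frequent update.
        limit = min(limit, refresh_interval)
    return (now - last_checked) >= limit


if __name__ == "__main__":
    # Document modified at t = 0 and last checked ten days later, LM factor 0.1:
    # the estimated lifetime is about one day.
    print(needs_up_to_date_check(10.5 * DAY, 0, 10 * DAY, 0.1))   # half a day after the check
    print(needs_up_to_date_check(11.5 * DAY, 0, 10 * DAY, 0.1))   # a day and a half after
```

With a refresh interval shorter than the estimate, for example six hours, the same document would be checked as soon as six hours have passed since the last check.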

You can set the expiration setting for HTTP documents on either the Set Cache Specifics page or the Set Caching Configuration page. The Set Cache Specifics page enables you to configure global caching procedures, and the Set Caching Configuration page enables you to control caching procedures for specific URLs and resources.

Reporting HTTP Accesses to the Remote Server

When a document is cached by Proxy Server, it can be accessed many times before it is refreshed again. For the remote server, sending one copy to the proxy that will cache it represents only one access, or “hit.” The Proxy Server can count how many times a given document is accessed from the proxy cache between up-to-date checks and then send that hit count back to the remote server in an additional HTTP request header (Cache-Info) the next time the document is refreshed. This way, if the remote server is configured to recognize this type of header, it receives a more accurate account of how many times a document is accessed.

Caching FTP and Gopher Documents

FTP and Gopher do not include a method for checking to see whether a document is up-to-date. Therefore, the only way to optimize caching for FTP and Gopher documents is to set a Cache Refresh interval. The Cache Refresh interval is the amount of time the Proxy Server waits before retrieving the latest version of the document from the remote server. If you do not set a Cache Refresh interval, the proxy will retrieve these documents even if the versions in the cache are up to date.

If you are setting a cache refresh interval for FTP and Gopher, choose one that you consider safe for the documents the proxy gets. For example, if you store information that rarely changes, use a long interval of several days. If the data changes constantly, you will want the files to be retrieved at least every few hours. During the refresh interval, you risk sending an out-of-date file to the client. If the interval is short enough, for example, a few hours, you eliminate most of this risk while getting noticeably faster response time.

You can set the cache refresh interval for FTP and Gopher documents on either the Set Cache Specifics page or the Set Caching Configuration page. The Set Cache Specifics page enables you to configure global caching procedures, and the Set Caching Configuration page enables you to control caching procedures for specific URLs and resources. For more information about using the Set Cache Specifics page, see Setting Cache Specifics. For more information about using the Set Caching Configuration page, see Configuring the Cache.

Note –

If your FTP and Gopher documents vary widely (some change often, others rarely), use the Set Caching Configuration page to create a separate template for each kind of document (for example, create a template with resources ftp://.*.gif) and then set a refresh interval that is appropriate for that resource.

Creating and Modifying a Cache

Cache partitions are reserved parts of disks or memory that are set aside for caching purposes. If your caching capacity changes, you may want to change or add partitions.

To Add Cache Partitions

Access the Server Manager, and click the Caching tab.

Click the Add/Edit Cache Partitions link.

The Add/Edit Cache Partitions page is displayed.

Click the Add Cache Partition button.

The Cache Partition Configuration page is displayed.

Provide the appropriate values for the new partition.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

To Modify Cache Partitions

Access the Server Manager, and click the Caching tab.

Click the Add/Edit Cache Partitions link.

The Add/Edit Cache Partitions page is displayed.

Click on the name of the partition that you would like to change.

Edit the information.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Setting Cache Capacity

The cache capacity value is used to derive the cache directory structure. The number of sections in the cache directory is derived from the cache capacity, so the capacity is directly related to the cache hierarchy: the bigger the capacity, the larger the hierarchy. The cache capacity should be equal to or greater than the cache size. Setting the capacity larger than the cache size can be helpful if you plan to increase the cache size later (such as by adding an external disk). The maximum cache capacity is 32 GB, which creates 256 sections.

To Set the Cache Capacity

Access the Server Manager, and click the Caching tab.

Click the Set Cache Capacity link.

The Set Cache Capacity page is displayed.

Choose a capacity from the New Capacity Range drop-down list.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Managing Cache Sections

The proxy cache is separated into one or more cache sections. You can have up to 256 sections. The number of cache sections must be a power of two (for example, 1, 2, 4, 8, 16, ..., 256). The largest capacity is 32 Gbytes (optimum) with 256 cache sections.

If you pick a cache capacity of 500 Mbytes, the installer creates 4 cache sections (500 ÷ 125 = 4); if you choose a cache capacity of 2 Gbytes, the installer creates 16 sections (2000 ÷ 125 = 16). The number of sections is obtained by dividing the capacity by 125 Mbytes, the optimum size for each section. The more sections there are, the larger the number of URLs that can be stored and distributed across them.
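The arithmetic can be sketched in Python. The 125-Mbyte section size, the power-of-two requirement, and the 256-section limit come from this section; rounding down to the nearest power of two for capacities that do not divide evenly is an assumption, and the function name section_count is hypothetical.

```python
SECTION_SIZE_MB = 125   # optimum size of one section
MAX_SECTIONS = 256      # upper limit on the number of sections


def section_count(capacity_mb: int) -> int:
    """Estimate the number of cache sections for a capacity given in Mbytes."""
    raw = max(1, capacity_mb // SECTION_SIZE_MB)
    # Assumption: round down to the nearest power of two.
    power_of_two = 1 << (raw.bit_length() - 1)
    return min(power_of_two, MAX_SECTIONS)


if __name__ == "__main__":
    for capacity in (500, 2000, 32000):
        print(capacity, "Mbytes ->", section_count(capacity), "sections")
```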

To Manage Cache Sections

Access the Server Manager, and click the Caching tab.

Click the Manage Sections link.

The Manage Sections page is displayed.

Change the information in the table.

The sections can be moved among existing partitions.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Setting the Garbage Collection Preferences

You can use the cache garbage collector to delete files from the cache. Garbage collection can be done in either the automatic mode or the explicit mode. The explicit mode is externally scheduled by the administrator. Select one of the modes and click OK. Click Restart Required. The Apply Changes page is displayed. Click the Restart Proxy Server button to apply the changes.

Scheduling Garbage Collection

The Schedule Garbage Collection page enables you to specify the days and time when garbage collection will take place.

To Set Garbage Collection

Access the Server Manager, and click the Caching tab.

Click the Schedule Garbage Collection link.

The Schedule Garbage Collection page is displayed.

Select the time at which garbage collection will occur from the Schedule Garbage Collection At list.

Specify the day of the week on which garbage collection will occur.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Configuring the Cache

You can specify several configuration parameter values for URLs matching a regular expression pattern that you specify. This feature gives you fine control of the proxy cache based on the type of document cached. Configuring the cache can include identifying the following items:

The cache default
How to cache pages that require authentication
How to cache queries
The minimum and maximum cache file sizes
When to refresh a cached document
The cache expiration policy
The caching behavior for client interruptions
The caching behavior for failed connections to origin servers

Note –

If you set the cache default for a particular resource to either Derived configuration or Don’t cache, the cache configuration options will not appear on the Set Caching Configuration page. However, if you choose a cache default of Cache for a resource, you can specify several other configuration items.

To Configure the Cache

Access the Server Manager, and click the Caching tab.

Click the Set Caching Configuration link.

The Set Caching Configuration page is displayed.

Select the resource from the drop-down list or click the Regular Expression button, type a regular expression, and click OK.

Change the configuration information.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Caching Configuration Elements

The following sections include information that will help you to determine which configuration will best suit your needs.

Setting the Cache Default

The proxy server enables you to identify a cache default for specific resources. A resource is a type of file that matches certain criteria that you specify. For instance, to have your server automatically cache all documents from the domain company.com, you could create the following regular expression:

[a-z]*://[^/:]\\.company\\.com.*

By default, the Cache option is selected. Your server automatically caches all cacheable documents from that domain.

Note –

If you set the cache default for a particular resource to either Derived configuration or Don’t cache, it is not necessary to configure the cache for that resource. However, if you choose a cache default of Cache for a resource, you can specify several other configuration items. For a list of these items, see Configuring the Cache.

The cache default for HTTP, FTP, and Gopher can also be set.

Caching Pages That Require Authentication

You can have your server cache files that require user authentication. The Proxy Server tags the files in the cache so that it can require authentication from the remote server if a user asks for them.

Because the Proxy Server cannot determine how remote servers authenticate and it does not have a list of users’ IDs or passwords, it will simply force an up-to-date check with the remote server each time a request is made for a document that requires authentication. The user therefore must type an ID and password to gain access to the file. If the user has already accessed that server earlier in the browser session, the browser automatically sends the authentication information without prompting the user.

If you do not enable the caching of pages that require authentication, the proxy does not cache them, which is the default behavior.

Caching Queries

Cached queries only work with HTTP documents. You can limit the length of queries that are cached, or you can completely inhibit caching of queries. The longer the query, the less likely it is to be repeated, and the less useful it is to cache.

The following caching restrictions apply for queries:

The access method has to be GET, the document must not be protected (unless caching of authenticated pages is enabled), and the response must have at least a Last-modified header. This requires the query engine to indicate that the query result document can be cached.
If the Last-modified header is present, the query engine should support a conditional GET method (with an If-modified-since header) in order to make caching effective; otherwise the query engine should return an Expires header.

Setting Minimum and Maximum Cache File Sizes

You can set the minimum and maximum sizes for files cached by your Proxy Server. You may want to set a minimum size if you have a fast network connection. If your connection is fast, small files may be retrieved so quickly that having the server cache them is unnecessary. In this instance, you would want to cache only larger files. You may want to set a maximum file size to make sure that large files do not occupy too much of your proxy’s disk space.

Setting the Up-to-date Checking Policy

The up-to-date checking policy ensures that the HTTP document is always up-to-date. You can also specify the refresh interval for the Proxy Server.

Setting Expiration Policy

You can set the Expiration Policy using the last modified factor or the explicit expiration information.

Setting Cache Behavior for Client Interruptions

If a document is only partially retrieved and the client interrupts the data transfer, the proxy can finish retrieving the document for the purpose of caching it. The proxy’s default is to finish retrieving a document for caching if at least 25 percent of the document has already been retrieved. Otherwise, the proxy terminates the remote server connection and removes the partial file. You can raise or lower the client interruption percentage.
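The decision can be expressed as a short Python sketch. It assumes that the proxy knows the total document length in advance (for example, from a Content-Length header) and that exactly reaching the threshold counts as enough; neither detail is stated in the text, and the function name is hypothetical.

```python
def finish_retrieval(bytes_received: int, content_length: int,
                     threshold_percent: float = 25.0) -> bool:
    """Return True if the proxy should finish retrieving the document for caching."""
    if content_length <= 0:
        # Assumption: without a known length, the percentage cannot be computed.
        return False
    percent_received = 100.0 * bytes_received / content_length
    return percent_received >= threshold_percent


if __name__ == "__main__":
    print(finish_retrieval(300, 1000))   # 30 percent retrieved
    print(finish_retrieval(100, 1000))   # 10 percent retrieved
```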

Behavior on Failure to Connect to Server

If an up-to-date check on a stale document fails because the origin server is unreachable, you can specify whether the proxy sends the stale document from the cache.

Caching Local Hosts

If a URL requested from a local host lacks a domain name, the Proxy Server will not cache it. This behavior avoids duplicate caching. For example, if a user requests http://machine/filename.html and http://machine.example.com/filename.html from a local server, both URLs might appear in the cache. Because these files are from a local server, they may be retrieved so quickly that caching them is not necessary.

However, if your company has servers in many remote locations, you might want to cache documents from all hosts to reduce network traffic and decrease the time needed to access the files.

To Enable the Caching of Local Hosts

Access the Server Manager, and click the Caching tab.

Click the Cache Local Hosts link.

The Cache Local Hosts page is displayed.

Select the resource from the drop-down list or click the Regular Expression button, type a regular expression, and click OK.

For more information on regular expressions, see Chapter 16, Managing Templates and Resources.

Click the Enabled button.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Configuring the File Cache

The file cache is turned on by default. The file cache settings are contained in the server.xml file. You can use the Server Manager to change the file cache settings.

To Configure the File Cache

From the Server Manager, click the Caching tab.

Click the Configure File Cache link.

The Configure File Cache page is displayed.

Select Enable File Cache, if not already selected.

Choose whether to transmit files.

When you enable Transmit File, the server caches open file descriptors for files in the file cache rather than the file contents. PR_TransmitFile is used to send the file contents to a client. When Transmit File is enabled, the distinction normally made by the file cache between small, medium, and large files no longer applies, because only the open file descriptor is being cached. By default, Transmit File is enabled on Windows and disabled on UNIX. On UNIX, you should only enable Transmit File for platforms that have native OS support for PR_TransmitFile, which currently includes HP-UX. Use on other UNIX/Linux platforms is not recommended.

Type a size for the hash table.

The default size is twice the maximum number of files plus 1. For example, if your maximum number of files is set to 1024, the default hash table size is 2049.

Type a maximum age in seconds for a valid cache entry.

The default setting is 30. This setting controls how long cached information will continue to be used once a file has been cached. An entry older than MaxAge is replaced by a new entry for the same file, if the same file is referenced through the cache. Set the maximum age based on whether the content is updated on a regular schedule. For example, if content is updated four times a day at regular intervals, you could set the maximum age to 21600 seconds (6 hours). Otherwise, consider setting the maximum age to the longest time you are willing to serve the previous version of a content file after the file has been modified.

Type the Maximum Number of Files to be cached.

The default setting is 1024.

Type medium and small file size limits in bytes.

The Medium File Size Limit is set by default to 537600. The Small File Size Limit is set by default to 2048.

The cache treats small, medium, and large files differently. The contents of medium files are cached by mapping the file into virtual memory (on UNIX/Linux platforms only). The contents of small files are cached by allocating heap space and reading the file into it. Information about large files is cached, but the file contents are not cached. The advantage of distinguishing between small files and medium files is to avoid wasting part of many pages of virtual memory when there are lots of small files. For this reason, the Small File Size Limit is typically a slightly lower value than the VM page size.

Set the medium and small file space.

The medium file space is the size in bytes of the virtual memory used to map all medium sized files. The size is set by default to 10485760. The small file space is the size of heap space in bytes used for the cache, including heap space used to cache small files. The size is set by default to 1048576 for UNIX/Linux.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.
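The defaults named in the preceding procedure can be collected in a small Python sketch. The class and method names are hypothetical and are not part of the product or of the server.xml file; the sketch also assumes that each size limit is inclusive, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class FileCacheSettings:
    """Default file cache values as listed in the procedure above."""
    max_files: int = 1024
    max_age_seconds: int = 30
    small_file_size_limit: int = 2048       # bytes
    medium_file_size_limit: int = 537600    # bytes
    small_file_space: int = 1048576         # bytes of heap (UNIX/Linux default)
    medium_file_space: int = 10485760       # bytes of virtual memory

    def default_hash_size(self) -> int:
        """Twice the maximum number of files plus 1."""
        return 2 * self.max_files + 1

    def size_class(self, file_size: int) -> str:
        """Classify a file as small, medium, or large by its size in bytes."""
        # Assumption: a file exactly at a limit belongs to the smaller class.
        if file_size <= self.small_file_size_limit:
            return "small"
        if file_size <= self.medium_file_size_limit:
            return "medium"
        return "large"


if __name__ == "__main__":
    settings = FileCacheSettings()
    print(settings.default_hash_size())
    print(settings.size_class(100000))
```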

Viewing the URL Database

You can view the names and attributes of all recorded cached URLs grouped by access protocol and site name. By accessing this information, you can perform various cache management functions such as expiring and removing documents from the cache.

To View the URLs in the Database

Access the Server Manager, and click the Caching tab.

Click the View URL Database link.

The View URL Database page is displayed.

Click the Regenerate button to generate a current list of cached URLs.

(Optional) To view the information for a specific URL, type a URL or regular expression in the Search field and click the Search button.

To view cache database information grouped by domain name and host:
1. Select a domain name from the list.
  
  A list of hosts in that domain appears.
2. Click the name of a host.
  
  A list of URLs appears.
3. Click the name of a URL.
  
  Detailed information about that URL appears.

To Cause Cached URLs to Expire or Remove the Cached URLs

Access the Server Manager, and click the Caching tab.

Click the View URL Database link.

The View URL Database page is displayed.

Click the Regenerate button to generate a snapshot of the cache database.

This snapshot forms the basis for the remaining steps.

If you know the specific URL that you want to expire or remove, type that URL, or a regular expression that matches it, in the Search field and click the Search button.

If you would like to work with URLs grouped by domain name and host:
1. Select a domain name from the list.
  
  A list of hosts in that domain appears.
2. Click the name of a host and a list of URLs appears.

To expire individual files:
1. Select the Ex option next to the URLs for those files.
2. Click the Exp/Rem Marked button.

To expire all of the files in the list, click the Exp All button on the bottom of the form.

To remove individual files from the cache:
1. Select the Rm option next to the URLs for the files you want to remove.
2. Click the Exp/Rem Marked button.

To remove all of the files in the list, click the Rem All button.

Click the Regenerate button to regenerate the snapshot.

Note –
When you use the Ex or Rm option, the associated file is processed but the changes are not reflected in the snapshot. The snapshot needs to be regenerated for the changes to be visible.

Using Cache Batch Updates

You can pre-load files in a specified web site or do an up-to-date check on documents already in the cache whenever the proxy server is not busy. You can create, edit, and delete batches of URLs and enable and disable batch updating.

Creating Batch Updates

You can actively cache files by specifying files to be updated in a batch. You can perform an up-to-date check on several files currently in the cache or pre-load multiple files in a particular web site.

To Create a Batch Update

Access the Server Manager, and click the Caching tab.

Click the Set Cache Batch Updates link.

The Set Cache Batch Updates page is displayed.

Select New and Create from the drop-down lists next to Create/Select a Batch Update Configuration.

Click OK. The Set Cache Batch Updates page is displayed.

In the Name section, type a name for the new batch update entry.

In the Source section of the page, select the type of batch update that you want to create.

Click the first radio button if you want to perform an up-to-date check on all documents in the cache. Click the second radio button if you want to cache URLs recursively starting from the given source URL.

In the Source section fields, identify the documents that you want to use in the batch update.

In the Exceptions section, identify any files that you would like to exclude from the batch update.

In the Resources section, type the maximum number of simultaneous connections and the maximum number of documents to traverse.

Click OK.

Select the newly added batch name and Schedule from the drop-down lists next to Create/Select a Batch Update Configuration.

Click OK.

Note –
You can create, edit, and delete batch update configurations without having batch updates turned on. However, if you want your batch updates to be updated according to the times you set on the Set Cache Batch Updates page, you must turn updates on.

The Schedule Batch Updates page is displayed.

Select either the Update On or the Update Off option.

Select a time in the drop-down list and select the days on which you want the update to be run.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Editing or Deleting Batch Update Configurations

You can edit batch updates if you want to exclude certain files or want to update the batch more frequently. You might also want to delete a batch update configuration completely.

To Edit a Batch Update Configuration

Access the Server Manager, and click the Caching tab.

Click the Set Cache Batch Updates link.

The Set Cache Batch Updates page is displayed.

To edit a batch, select the name of that batch and select Edit from the drop-down lists next to Create/Select a Batch Update Configuration.

Click OK.

The Set Cache Batch Updates page is displayed.

Modify the information as you wish.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

To Delete a Batch Update Configuration

Access the Server Manager, and click the Caching tab.

Click the Set Cache Batch Updates link.

To delete a batch, select the name of that batch and select Delete from the drop-down lists next to Create/Select a Batch Update Configuration.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Using the Cache Command-Line Interface

The proxy server comes with several command-line utilities that enable you to configure, change, generate, and repair your cache directory structure. Most of these utilities duplicate the functionality of the Server Manager pages. You might want to use the utilities if you need to schedule maintenance, for example, as a cron job. All of the utilities are located in the extras directory.

To Run the Command-Line Utilities

From the command-line prompt, go to the server_root/proxy-serverid directory.

Type ./start -shell

The following sections describe the various utilities.

Building the Cache Directory Structure

The proxy utility called cbuild is an offline cache database manager. This utility enables you to create a new cache structure or modify an existing cache structure using the command-line interface. You can use the Server Manager pages to enable the proxy to use the newly created cache.

Note –

The utility does not update the server.xml file. cbuild cannot resize a cache that has multiple partitions. When the cache is created or modified by cbuild, the cachecapacity parameter should be manually updated in the server.xml file.

<PARTITION partitionname="part1" partitiondir="/home/build/install9
/proxy-server1/cache" maxsize="1600" minspace="5" enabled="true"/>
<CACHE enabled="true" cachecapacity="2000" cachedir="/tmp/cache">

You can invoke the cbuild utility in two modes. The first mode is:

cbuild -d conf-dir -c cache-dir -s cache size 
cbuild -d conf-dir -c cache-dir -s cache size -r

For example:

cbuild -d server_root/proxy-serverid/config \
	-c server_root/proxy-serverid/cache -s 512
cbuild -d server_root/proxy-serverid/config \
	-c server_root/proxy-serverid/cache -s 512 -r

where

conf-dir is the configuration directory of the proxy instance located in the server_root/proxy-serverid/config directory.
cache-dir is the directory for your cache structure.
cache size is the maximum size to which the cache can grow. This option cannot be used with the cache-dim parameter. The maximum size is 65135 Mbytes.
-r resizes an existing cache structure provided it has a single partition. This is not required for creating a new cache.

The second mode is:

cbuild -d conf-dir -c cache-dir -n cache-dim
cbuild -d conf-dir -c cache-dir -n cache-dim -r

For example:

cbuild -d server_root/proxy-serverid/config \
	-c server_root/proxy-serverid/cache -n 3
cbuild -d server_root/proxy-serverid/config \
	-c server_root/proxy-serverid/cache -n 3 -r

where

conf-dir is the configuration directory of the proxy instance located in the server_root/proxy-serverid/config directory.
cache-dir is the directory for your cache structure.
cache-dim determines the number of sections. For example, for the section shown as s3.4 in Figure 12–2, the 3 indicates the dimension. The default value of cache-dim is 0 and the maximum value is 8.
-r resizes an existing cache structure provided it has a single partition. This option is not required for creating a new cache.

Additionally, cbuild accepts a -R argument which specifies that the .size files of a specified partition must be updated to full accuracy. For example:

cbuild -d conf-dir -c cache-dir -R
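
If you script cbuild, a small wrapper can check the documented limits before the command is run. The following is a minimal sketch, not a supported tool: the function names are hypothetical, only the options described above are used, and the relation between cache-dim and the section count (2 raised to the power cache-dim) is an assumption inferred from the 256-section maximum and the maximum cache-dim of 8.

```python
MAX_CACHE_SIZE_MB = 65135  # maximum cache size stated in this guide
MAX_CACHE_DIM = 8          # maximum cache-dim stated in this guide


def cbuild_args(conf_dir, cache_dir, size_mb=None, dim=None, resize=False):
    """Build a cbuild argument list for size mode (-s) or dimension mode (-n)."""
    # The two modes are mutually exclusive, so exactly one value is required.
    if (size_mb is None) == (dim is None):
        raise ValueError("specify exactly one of size_mb or dim")
    args = ["cbuild", "-d", conf_dir, "-c", cache_dir]
    if size_mb is not None:
        if not 0 < size_mb <= MAX_CACHE_SIZE_MB:
            raise ValueError("cache size out of range")
        args += ["-s", str(size_mb)]
    else:
        if not 0 <= dim <= MAX_CACHE_DIM:
            raise ValueError("cache-dim out of range")
        args += ["-n", str(dim)]
    if resize:
        args.append("-r")  # resize an existing single-partition cache
    return args


def sections_for_dim(dim):
    """Assumed relation: number of sections = 2 ** cache-dim."""
    return 2 ** dim


print(" ".join(cbuild_args("server_root/proxy-serverid/config",
                           "server_root/proxy-serverid/cache", size_mb=512)))
print(sections_for_dim(3))
```

The returned list can be passed to `subprocess.run` from a maintenance script once the paths are replaced with real directories.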

Managing the Cache URL List

The proxy utility urldb manages the URL list in the cache. You can use this utility to list the URLs that are cached. You can also selectively expire and remove cached objects from the cache database.

The urldb commands can be categorised into three groups based on the -o option:

domains
sites
URLs

To list domains, type the following command at the command line:
```
urldb -o matching_domains -e reg-exp -d conf-dir
```
For example:
```
urldb -o matching_domains -e ".*phoenix.*" -d server-root/proxy-serverid/config
```
where
- matching_domains lists domains that match regular expression
- reg-exp is the regular expression used
- conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.
To list all the matching sites in a domain, type the following command at the command line:
```
urldb -o matching_sites_in_domain -e reg-exp -m domain_name -d conf-dir
```
For example:
```
urldb -o matching_sites_in_domain -e ".*atlas" -m phoenix.com \
	-d server-root/proxy-serverid/config
```
where
- matching_sites_in_domain lists all the sites in a domain that match the regular expression
- reg-exp is the regular expression used
- domain_name is the name of the domain
- conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.
To list all the matching sites, type the following command at the command line:
```
urldb -o all_matching_sites -e reg-exp -d conf-dir
```
For example:
```
urldb -o all_matching_sites -e ".*atlas.*" -d server-root/proxy-serverid/config
```
where
- all_matching_sites lists all the sites that match the regular expression
- reg-exp is the regular expression used
- conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.
To list matching URLs in a site, type the following command at the command line:
```
urldb -o matching_urls_from_site -e reg-exp -s site_name -d conf-dir
```
For example:
```
urldb -o matching_urls_from_site -e "http://.*atlas.*" -s atlas.phoenix.com \
	-d server-root/proxy-serverid/config
```
where
- matching_urls_from_site lists all URLs from site that match the regular expression
- reg-exp is the regular expression used
- site_name is the name of the site
- conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.
To expire or remove matching URLs in a site, type the following command at the command line:
```
urldb -o matching_urls_from_site -e reg-exp -s site_name -x e -d conf-dir
urldb -o matching_urls_from_site -e reg-exp -s site_name -x r -d conf-dir
```
For example:
```
urldb -o matching_urls_from_site -e "http://.*atlas.*" -s atlas.phoenix.com \
	-x e -d server-root/proxy-serverid/config
```
where
- matching_urls_from_site lists all URLs from site that match the regular expression
- reg-exp is the regular expression used
- site_name is the name of the site
- -x e is the option to expire the matching URLs from the cache database. This option cannot be used with the domain and site modes
- -x r is the option to remove the matching URLs from the cache database
- conf-dir is the configuration directory of the proxy instance. It is located in the server-root/proxy-serverid/config directory.
To list all matching URLs, type the following command at the command line:
```
urldb -o all_matching_urls -e reg-exp -d conf-dir
```
For example:
```
urldb -o all_matching_urls -e ".*cgi-bin.*" \
	-d server-root/proxy-serverid/config
```
where
- all_matching_urls lists all the URLs that match the regular expression
- reg-exp is the regular expression used
- conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.
To cause the expiry of all matching URLs, or to remove all matching URLs, type the following command at the command line:
```
urldb -o all_matching_urls -e reg-exp -x e -d conf-dir
urldb -o all_matching_urls -e reg-exp -x r -d conf-dir
```
For example:
```
urldb -o all_matching_urls -e ".*cgi-bin.*" -x e -d server-root/proxy-serverid/config
```
where
- all_matching_urls lists all the URLs that match the regular expression
- reg-exp is the regular expression used
- -x e is the option to cause the expiry of the matching URLs from the cache database
- -x r is the option to remove the matching URLs from the cache database
- conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.
To expire or remove a list of URLs, type the following command at the command line:
```
urldb -l url-list -x e -e reg-exp -d conf-dir
urldb -l url-list -x r -e reg-exp -d conf-dir
```
For example:
```
urldb -l url.lst -x e -e ".*cgi-bin.*" -d server-root/proxy-serverid/config
```
where
- url-list is the list of URLs to be expired or removed. Use the -l option to provide this list.
- -x e is the option to cause the expiry of the matching URLs from the cache database.
- -x r is the option to remove the matching URLs from the cache database.
- reg-exp is the regular expression used
- conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.
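
Because the -x e and -x r options change the cache database, it can help to preview which URLs a regular expression would select before you run urldb. The sketch below does this in Python over a hand-written list. The function name and sample URLs are hypothetical, and it assumes that the expression must match the whole URL and that Python's `re` syntax is close enough to the syntax urldb accepts; neither point is confirmed by this guide.

```python
import re


def preview_matches(urls, reg_exp):
    """Return the URLs that the regular expression matches in full."""
    pattern = re.compile(reg_exp)
    # Assumption: the expression is applied to the entire URL.
    return [url for url in urls if pattern.fullmatch(url)]


sample_urls = [
    "http://atlas.phoenix.com/cgi-bin/search",
    "http://atlas.phoenix.com/index.html",
    "http://nova.phoenix.com/cgi-bin/report",
]

for url in preview_matches(sample_urls, r".*cgi-bin.*"):
    print(url)
```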

Managing Cache Garbage Collection

The cachegc utility enables you to remove objects from the cache database that might have expired or are too old to be cached due to cache size constraints.

Note –

Ensure that the CacheGC is not running in the proxy instance when the cachegc utility is used.

The cachegc utility can be used in the following way:

cachegc -f leave-fs-full-percent -u gc-high-margin-percent -l gc-low-margin-percent -e 
	extra-margin-percent -d conf-dir

For example:

cachegc -f 50 -u 80 -l 60 -e 5 -d server-root/proxy-serverid/config

where

leave-fs-full-percent determines the percentage of the cache partition size below which garbage collection will not go
gc-high-margin-percent controls the percentage of the maximum cache size that, when reached, triggers garbage collection
gc-low-margin-percent controls the percentage of the maximum cache size that the garbage collector targets
extra-margin-percent is used by the garbage collector to determine the fraction of the cache to remove.
conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.
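
When cachegc is run from a scheduled job, the percentages can be validated before the command is built. The following sketch is illustrative only: the function name is hypothetical, it uses just the options listed above, and the rule that the low margin must be below the high margin is an assumption based on their descriptions (trigger level and target level).

```python
def cachegc_args(conf_dir, leave_fs_full, high_margin, low_margin, extra_margin):
    """Build a cachegc argument list after basic sanity checks."""
    percents = {
        "leave-fs-full-percent": leave_fs_full,
        "gc-high-margin-percent": high_margin,
        "gc-low-margin-percent": low_margin,
        "extra-margin-percent": extra_margin,
    }
    for name, value in percents.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    # Assumption: the target level must lie below the level that triggers GC.
    if low_margin >= high_margin:
        raise ValueError("gc-low-margin-percent must be below gc-high-margin-percent")
    return ["cachegc",
            "-f", str(leave_fs_full),
            "-u", str(high_margin),
            "-l", str(low_margin),
            "-e", str(extra_margin),
            "-d", conf_dir]


print(" ".join(cachegc_args("server-root/proxy-serverid/config", 50, 80, 60, 5)))
```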

Managing Batch Updates

The bu utility updates the cache and works in two modes. In the first mode, it iterates through the cache database and updates all the URLs that are present in the cache by sending an HTTP request for each. In the second mode, it starts with a given URL, does a breadth-first iteration of all the links from that URL to the depth that you specify, and fetches the pages into the cache. bu is an RFC-compliant robot.

bu -n hostname -p port -t time-lmt -f contact-address -s sleep-time -o object -r n -d conf-dir

For example:

bu -n phoenix -p 80 -t 3600 -f admin@phoenix.com -s 60 -o nova -r n \
	-d server-root/proxy-serverid/config

where

hostname is the host name of the machine on which proxy is running. The default value is the localhost.
port is the port on which proxy server is running. The default port is 8080.
time-lmt is the time limit for which the utility runs.
contact-address determines the contact address that would be sent in the HTTP requests that are sent from bu. The default value is worm@proxy-name.
sleep-time is the sleep time between two consecutive requests. The default value is 5 seconds.
object is the object specified in bu.conf that is currently being executed.
-r n determines whether the robots.txt policy is followed. The default value is y.
conf-dir is the configuration directory of the proxy instance located in the server-root/proxy-serverid/config directory.

Using the Internet Cache Protocol (ICP)

The Internet Cache Protocol (ICP) is an object location protocol that enables caches to communicate with one another. Caches can use ICP to send queries and replies about the existence of cached URLs and about the best locations from which to retrieve those URLs. In a typical ICP exchange, one cache will send an ICP query about a particular URL to all neighboring caches. Those caches will then send back ICP replies that indicate whether they contain that URL. If the caches do not contain the URL, they send back miss. If they do contain the URL, they send back hit.

Routing Through ICP Neighborhoods

ICP can be used for communication among proxies located in different administrative domains. It enables a proxy cache in one administrative domain to communicate with a proxy cache in another administrative domain. It is effective for situations in which several proxy servers want to communicate, but cannot all be configured from one master proxy as they are in a proxy array. Figure 12–3 shows an ICP exchange between proxies in different administrative domains.

The proxies that communicate with each other through ICP are called neighbors. You cannot have more than 64 neighbors in an ICP neighborhood. The two types of neighbors in an ICP neighborhood are parents and siblings. Only parents can access the remote server if no other neighbors have the requested URL. Your ICP neighborhood can have no parents or it can have more than one parent. Any neighbor in an ICP neighborhood that is not a parent is considered a sibling. Siblings cannot retrieve documents from remote servers unless the sibling is marked as the default route for ICP, and ICP uses the default.

You can use polling rounds to determine the order in which neighbors receive queries. A polling round is an ICP query cycle. For each neighbor, you must assign a polling round. If you configure all neighbors to be in polling round one, then all neighbors are queried in one cycle at the same time. If you configure some of the neighbors to be in polling round two, then all of the neighbors in polling round one are queried first, and if none of them returns a hit, all of the neighbors in polling round two are queried. The maximum number of polling rounds is two.

Since ICP parents are likely to be network bottlenecks, you can use polling rounds to lighten their load. A common setup is to configure all siblings to be in polling round one and all parents to be in polling round two. That way, when the local proxy requests a URL, the request goes to all of the siblings in the neighborhood first. If none of the siblings have the requested URL, the request goes to the parent. If the parent does not have the URL, the parent retrieves it from a remote server.
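
The query order described above can be modeled with a short simulation. This is a sketch of the polling-round logic only, not of the ICP wire protocol: the `Neighbor` class, the `icp_lookup` function, and the sample neighbor names are all hypothetical, and each neighbor's cache is represented as a simple set of URLs.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    name: str
    kind: str            # "parent" or "sibling" (informational only)
    polling_round: int   # 1 or 2
    cached_urls: frozenset = frozenset()


def icp_lookup(url, neighbors):
    """Query round one, then round two; return (round, names with a hit) or None."""
    for polling_round in (1, 2):
        hits = [n.name for n in neighbors
                if n.polling_round == polling_round and url in n.cached_urls]
        if hits:
            return polling_round, hits
    # No neighbor reported a hit; the local proxy applies its no-hit behavior.
    return None


# Common setup from the text: siblings in round one, parents in round two.
neighborhood = [
    Neighbor("sibling-a", "sibling", 1, frozenset({"http://example.com/a"})),
    Neighbor("sibling-b", "sibling", 1, frozenset({"http://example.com/b"})),
    Neighbor("parent-1", "parent", 2, frozenset({"http://example.com/p"})),
]

print(icp_lookup("http://example.com/b", neighborhood))
print(icp_lookup("http://example.com/p", neighborhood))
print(icp_lookup("http://example.com/missing", neighborhood))
```

In this model the parent is consulted only when no sibling has the URL, which is how the two-round setup lightens the load on parents.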

Each neighbor in an ICP neighborhood must have at least one ICP server running. If a neighbor does not have an ICP server running, it cannot answer the ICP requests from its neighbors. Enabling ICP on your proxy server starts the ICP server if it is not already running.

Figure 12–3 ICP Exchange

Diagram showing an ICP exchange between proxies in different
administrative domains.

Setting Up ICP

This section provides details about setting up ICP. The general steps required to set up ICP are:

(Optional) Add parents to your ICP neighborhood.

For more information, see To Add Parent or Sibling Proxies to an ICP Neighborhood.
Add siblings to your ICP neighborhood.

For more information, see To Add Parent or Sibling Proxies to an ICP Neighborhood.
Configure each neighbor in the ICP neighborhood.

For more information, see To Edit a Configuration in an ICP Neighborhood.
Enable ICP.

For more information, see To Enable ICP.
If your proxy has siblings or parents in its ICP neighborhood, enable routing through an ICP neighborhood.

For more information, see To Enable Routing Through an ICP Neighborhood.

To Add Parent or Sibling Proxies to an ICP Neighborhood

Access the Server Manager, and click the Caching tab.

Click the Configure ICP link.

The Configure ICP page is displayed.

Add a parent or sibling proxy:
- To add a parent proxy, click Add in the Parent List section of the page.
  
  The ICP Parent page is displayed.
- To add a sibling proxy, click Add in the Sibling List section of the page.
  
  The ICP Sibling page is displayed.

In the Machine Address field, type the IP address or host name of the proxy you are adding to the ICP neighborhood.

In the ICP Port field, type the port number on which the proxy will listen for ICP messages.

(Optional) In the Multicast Address field, type the multicast address to which the parent listens. A multicast address is an IP address to which multiple servers can listen.

Using a multicast address enables a proxy to send one query to the network that all neighbors who are listening to that multicast address can see. This technique eliminates the need to send a query to each neighbor separately. Using multicast is optional.

Note –
Neighbors in different polling rounds should not listen to the same multicast address.

In the TTL field, type the number of subnets that the multicast message will be forwarded to.

If the TTL is set to 1, the multicast message will only be forwarded to the local subnet. If the TTL is 2, the message will go to all subnets that are one level away, and so on.

Note –
Multicast enables two unrelated neighbors to send ICP messages to each other. Therefore, to prevent unrelated neighbors from receiving ICP messages from the proxies in your ICP neighborhood, set a low TTL value in the TTL field.

In the Proxy Port field, type the port for the proxy server on the parent.

From the Polling Round drop-down list, choose the polling round that you want the parent to be in. The default polling round is 1.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

To Edit a Configuration in an ICP Neighborhood

Access the Server Manager, and click the Caching tab.

Select the Configure ICP link. The Configure ICP page is displayed.

Select the radio button next to the proxy you want to edit.

Click the Edit button.

Modify the appropriate information.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

To Remove Proxies from an ICP Neighborhood

Access the Server Manager, and click the Caching tab.

Select the Configure ICP link. The Configure ICP page is displayed.

Select the radio button next to the proxy you want to remove.

Click the Delete button.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

To Configure the Local Proxy Server in Your ICP Neighborhood

You need to configure each neighbor, or local proxy, in your ICP neighborhood.

Access the Server Manager, and click the Caching tab.

Select the Configure ICP link.

The Configure ICP page is displayed.

In the Binding Address field, type the IP address to which the neighbor server will bind.

In the Port field, type the port number on which the neighbor server will listen for ICP.

In the Multicast Address field, type the multicast address to which the neighbor listens.

A multicast address is an IP address to which multiple servers can listen. Using a multicast address enables a proxy to send one query to the network that all neighbors who are listening to that multicast address can see. This technique eliminates the need to send a query to each neighbor separately.

If both a multicast address and a bind address are specified for the neighbor, the neighbor uses the bind address to send replies and uses multicast to listen. If neither a bind address nor a multicast address is specified, the operating system decides which address to use to send the data.

In the Default Route field, type the name or IP address of the proxy to which the neighbor should route a request when none of the neighboring proxies respond with a hit.

If you type the word “origin” into this field, or if you leave the field blank, the default route will be to the origin server.

If you choose “first responding parent” from the No Hit Behavior drop-down list, the route you type in the Default Route field will have no effect. The proxy only uses this route if you choose the default “no hit” behavior.

In the second Port field, type the port number of the default route machine that you typed into the Default Route field.

From the On No Hits, Route Through drop-down list, select the neighbor’s behavior when none of the siblings in the ICP neighborhood have the requested URL in their caches.

The available options are:
- first responding parent. The neighbor will retrieve the requested URL through the parent that first responds with a miss
- default route. The neighbor will retrieve the requested URL through the machine specified in the Default Route field

In the Server Count field, type the number of processes that will service ICP requests.

In the Timeout field, type the maximum amount of time the neighbor will wait for an ICP response in each round.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

To Enable ICP

Access the Server Manager, and click the Preferences tab.

Click the Configure System Preferences link.

The Configure System Preferences page is displayed.

Select the Yes radio button for ICP and click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

To Enable Routing Through an ICP Neighborhood

You need to enable routing through an ICP neighborhood only if your proxy has other siblings or parents in the ICP neighborhood. If your proxy is a parent to another proxy and does not have any siblings or parents of its own, then you need to enable ICP only for that proxy. You do not need to enable routing through an ICP neighborhood.

Access the Server Manager, and click the Routing tab.

Click the Set Routing Preferences link.

The Set Routing Preferences page is displayed.

Select the resource from the drop-down list or click the Regular Expression button, type a regular expression, and click OK.

Select the radio button next to the Route Through option.

Select the checkbox next to ICP.

(Optional) To enable the client to retrieve a document directly from the ICP neighbor that has the document instead of going through another neighbor to get it, select the checkbox next to the Text Redirect option.

Click OK.

Caution –
Redirect is not currently supported by any clients, so don’t use the feature at this time.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Using Proxy Arrays

Proxy arrays for distributed caching enable multiple proxies to serve as a single cache. Each proxy in the array will contain different cached URLs that can be retrieved by a browser or downstream proxy server. Proxy arrays prevent the duplication of caches that often occurs with multiple proxy servers. Through hash-based routing, proxy arrays route requests to the correct cache in the proxy array.

Proxy arrays also enable incremental scalability. If you decide to add another proxy to your proxy array, each member’s cache is not invalidated. Only 1/n of the URLs in each member’s cache, where n is the number of proxies in your array, will be reassigned to other members.

Routing Through Proxy Arrays

For each request through a proxy array, a hash function assigns each proxy in the array a score that is based on the requested URL, the proxy’s name and the proxy’s load factor. The request is then routed to the proxy with the highest score.
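
The highest-score rule can be illustrated with a small sketch. The scoring function here is a stand-in: it uses SHA-256 from Python's standard library and multiplies by an integer load factor, which is an assumption chosen for illustration and not the hash function that Proxy Server actually computes. The proxy names and URLs are hypothetical.

```python
import hashlib


def score(url, proxy_name, load_factor):
    """Stand-in score from the URL, the proxy's name, and its load factor."""
    digest = hashlib.sha256(f"{proxy_name}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") * load_factor


def route(url, members):
    """Return the member (name -> load factor) with the highest score for the URL."""
    return max(members, key=lambda name: score(url, name, members[name]))


array = {"proxy-a": 1, "proxy-b": 1, "proxy-c": 1}
grown = dict(array, **{"proxy-d": 1})  # the same array after adding one member

urls = [f"http://example.com/doc{i}" for i in range(1000)]
moved = [u for u in urls if route(u, array) != route(u, grown)]

# Every URL that changed owner now belongs to the new member; the rest stay put.
print(len(moved), "of", len(urls), "URLs reassigned")
print(all(route(u, grown) == "proxy-d" for u in moved))
```

Because each member's score for a URL does not depend on the other members, adding a proxy can only take URLs away from existing members. It never shuffles URLs among them, which is why the existing caches stay largely valid.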

Since requests for URLs can come from both clients and proxies, there are two types of routing through proxy arrays: client-to-proxy routing and proxy-to-proxy routing.

In client-to-proxy routing, the client uses the Proxy Auto Configuration (PAC) mechanism to determine which proxy to go through. However, instead of using the standard PAC file, the client uses a special PAC file that computes the hash algorithm to determine the appropriate route for the requested URL. Figure 12–4 shows client-to-proxy routing. In this figure, each member of the proxy array loads and polls the master proxy for updates to the PAT file. Once the client has a PAC file, the client only needs to download this file again if the configuration changes. Generally, clients will download the PAC file at restart.

The proxy server can automatically generate the special PAC file from the Proxy Array Membership Table (PAT) specifications you determine using the administration interface.

Figure 12–4 Client to Proxy Routing

Diagram showing client to proxy routing.

In proxy-to-proxy routing, proxies use a PAT (Proxy Array Table) file to compute the hash algorithm instead of the PAC file used by clients. The PAT file is an ASCII file that contains information about a proxy array, including the proxies’ machine names, IP addresses, ports, load factors, cache sizes, and so on. For computing the hash algorithm at the server, using a PAT file is much more efficient than using a PAC file (which is a JavaScript file that has to be interpreted at run-time). However, most clients do not recognize the PAT file format, and therefore, must use a PAC file. Figure 12–5 shows proxy-to-proxy routing.

The PAT file is created on the master proxy in the proxy array. The proxy administrator must determine which proxy will be the master proxy. The administrator can change the PAT file from this master proxy server. All other members of the proxy array can then manually or automatically poll the master proxy for these changes. You can configure each member to automatically generate a PAC file from these changes.

You can also chain proxy arrays together for hierarchical routing. If a proxy server routes an incoming request through an upstream proxy array, the upstream proxy array is then known as a parent array. In other words, if a client requests a document from Proxy X, and Proxy X does not have the document, it sends the request to Proxy Array Y instead of sending it directly to the remote server. So, Proxy Array Y is a parent array.

In Figure 12–5, Proxy Array 1 is a parent array to Proxy Array 2. A member of Proxy Array 2 loads and polls for updates to the parent array’s PAT file. Usually, the member polls the master proxy in the parent array. The hash algorithm for the requested URL is computed using the downloaded PAT file. The member in the Proxy Array 2 then retrieves the requested URL from whichever proxy in Proxy Array 1 has the highest score. In the figure, Proxy B has the highest score for the URL requested by the client.

Figure 12–5 Proxy-to-Proxy Routing

The general steps to set up a proxy array are as follows.

From the master proxy, do the following steps:

Create the proxy array.

For more information on creating the member list, see Creating a Proxy Array Member List.
Generate a PAC file from your PAT file.

You only need to generate a PAC file if you are using client to proxy routing. For more information, see Generating a PAC File From a PAT File.
Configure the master member of the array. For more information, see Configuring Proxy Array Members.
Enable routing through a proxy array. For more information, see Enabling Routing Through a Proxy Array.
Create a PAT mapping to map the URL /pat to the PAT file.
Enable your proxy array.

For more information, see Enabling or Disabling a Proxy Array.

From each of the non-master proxies, do the following steps:

Configure the non-master member of the array.

For more information, see Configuring Proxy Array Members.
Enable routing through a proxy array.

For more information, see Enabling Routing Through a Proxy Array.
Enable your proxy array.

For more information, see Enabling or Disabling a Proxy Array.

Note –
If your proxy array is going to route through a parent array, you also need to enable the parent array and configure each member to route through a parent array for desired URLs. For more information, see Routing Through Parent Arrays.

Creating a Proxy Array Member List

You should create and update the proxy array member list from the master proxy of the array only. You only need to create the proxy array member list once, but you can modify it at any time. By creating the proxy array member list, you are generating the PAT file to be distributed to all of the proxies in the array and to any downstream proxies.

Note –

You should only make changes or additions to the proxy array member list through the master proxy in the array. All other members of the array can only read the member list.

To Create a Proxy Array Member List

Access the Server Manager, and click the Caching tab.

Click the Configure Proxy Array link.

The Configure Proxy Array page is displayed.

In the Array name field, type the name of the array.

In the Reload Configuration Every field, type the number of minutes between each polling for the PAT file.

Click the Array Enabled checkbox.

Click the Create button.

The Create button changes to an OK button after the proxy array has been created.

Note –
Be sure to click OK before you begin to add members to the member list.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

For each member in the proxy array, provide the following and then click OK.

Add the master member first, before you add the other members.
- Name. The name of the proxy server you are adding to the member list.
- IP Address. The IP address of the proxy server you are adding to the member list.
- Port. The port on which the member polls for the PAT file.
- Load Factor. An integer that reflects the relative load that should be routed through the member.
- Status. The status of the member. This value can be either on or off. If you disable a proxy array member, the member’s requests will be re-routed through another member.
Note –
Be sure to click OK after you type the information for each proxy array member you are adding.
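
The five fields above can be pictured as one record per member. The following sketch is illustrative only: the MemberEntry class and its checks are hypothetical, the on-disk PAT format is not shown, and the rule that a load factor must be at least 1 is an assumption rather than a documented limit.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberEntry:
    """Hypothetical in-memory form of one member list entry."""
    name: str
    ip_address: str
    port: int
    load_factor: int
    status: str  # "on" or "off"

    def __post_init__(self):
        # ip_address() raises ValueError for a malformed address.
        ipaddress.ip_address(self.ip_address)
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")
        # Assumption: treat load factors below 1 as invalid.
        if self.load_factor < 1:
            raise ValueError(f"load factor must be positive: {self.load_factor}")
        if self.status not in ("on", "off"):
            raise ValueError(f"status must be 'on' or 'off': {self.status}")


if __name__ == "__main__":
    master = MemberEntry("proxy-a", "192.0.2.10", 8080, 100, "on")
    print(master)
```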

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Editing Proxy Array Member List Information

At any time, you can change the information for the members in the proxy array member list. You can only edit the proxy array member list from the master proxy.

Note –

You should only make changes or additions to the proxy array member list through the master proxy in the array. If you modify this list from any other member of the array, all changes will be lost.

To Edit Member List Information

Access the Server Manager, and click the Caching tab.

Click the Configure Proxy Array link.

The Configure Proxy Array page is displayed.

In the Member List, select the radio button next to the member that you want to edit.

Click the Edit button.

The Configure Proxy Array Member page is displayed.

Edit the appropriate information.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Note –
If you want your changes to take effect and to be distributed to the members of the proxy array, update the Configuration ID on the Configure Proxy Array page and click OK. To update the configuration ID, you can simply increase it by one.

Deleting Proxy Array Members

Deleting proxy array members removes them from the proxy array. You can only delete proxy array members from the master proxy.

To Delete Members of a Proxy Array

Access the Server Manager, and click the Caching tab.

Click the Configure Proxy Array link.

The Configure Proxy Array page is displayed.

In the Member List, select the radio button next to the member that you want to delete.

Click the Delete button.

Note –
If you want your changes to take effect and to be distributed to the members of the proxy array, update the Configuration ID on the Configure Proxy Array page and click OK. To update the configuration ID, you can simply increase it by one.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Configuring Proxy Array Members

You must configure each member in the proxy array once from the member itself. You cannot configure a member of the array from another member. You also need to configure the master proxy.

To Configure Each Member of the Proxy Array

Access the Server Manager, and click the Caching tab.

Click the Configure Proxy Array Member link.

The Configure Proxy Array Member page is displayed.

In the Proxy Array section, indicate whether the member needs to poll for the PAT file by selecting the appropriate radio button.
- Non-Master Member. Select this option if the member you are configuring is not the master proxy. Any proxy array member that is not a master proxy must poll for the PAT file in order to retrieve it from the master proxy.
- Master Member. Select this option if you are configuring the master proxy. If you are configuring the master proxy, the PAT file is local and does not need to be polled.

In the Poll Host field, type the name of the master proxy to be polled for the PAT file.

In the Port field, type the port at which the master proxy accepts HTTP requests.

In the URL field, type the URL of the PAT file on the master proxy. If you have created a PAT mapping on the master proxy to map the PAT file to the URL /pat, type /pat in the URL field.

(Optional) In the Headers File field, type the full path name for a file with any special headers that must be sent with the HTTP request for the PAT file, such as authentication information.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.
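
The Poll Host, Port, URL, and Headers File settings in this procedure together describe an HTTP request for the PAT file. The sketch below shows how such a request could be assembled. It is not the server's polling code: the function names and host name are hypothetical, and the assumption that the headers file holds one "Name: value" pair per line is made only for the example.

```python
import urllib.request


def parse_headers(text: str) -> dict:
    """Parse 'Name: value' lines (file format assumed for this sketch)."""
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return headers


def build_pat_request(poll_host: str, port: int, url_path: str = "/pat",
                      headers_text: str = "") -> urllib.request.Request:
    """Build (but do not send) a GET request for the PAT file."""
    full_url = f"http://{poll_host}:{port}{url_path}"
    return urllib.request.Request(
        full_url, headers=parse_headers(headers_text), method="GET"
    )


if __name__ == "__main__":
    req = build_pat_request("master.example.com", 8080, "/pat",
                            "Authorization: Basic abc123\n")
    print(req.get_method(), req.full_url)
```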

Enabling Routing Through a Proxy Array

To Enable Routing Through a Proxy Array

Access the Server Manager, and click the Routing tab.

Click the Set Routing Preferences link.

The Set Routing Preferences page is displayed.

Select the resource from the drop-down list or click the Regular Expression button, type a regular expression, and click OK.

Select the Route Through option.

Select the checkboxes for proxy array or parent array.

You can only enable proxy array routing if the proxy server you are configuring is a member of a proxy array. You can only enable parent routing if a parent array exists. Both routing options are independent of each other.

If you choose to route through a proxy array and you want to redirect requests to another URL, select the redirect checkbox.

Redirecting means that if a member of a proxy array receives a request that it should not service, it tells the client which proxy to contact for that request.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Enabling or Disabling a Proxy Array

If you are not routing through a proxy array, you should make sure that all clients use a special PAC file to route correctly before you disable the proxy array option. If you disable the parent array option, you should have valid alternative routing options set in the Set Routing Preferences page, such as explicit proxy or a direct connection.

To Enable or Disable a Proxy Array

Access the Server Manager, and click the Preferences tab.

Click the Configure System Preferences link.

The Configure System Preferences page is displayed.

Enable or disable the proxy array.
- To enable the proxy array, click the Yes option for the type of array or arrays you want to enable: a normal proxy array or a parent array.
- To disable the proxy array, click No.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Redirecting Requests in a Proxy Array

If you choose to route through a proxy array, you need to designate whether you want to redirect requests to another URL. Redirecting means that if a member of a proxy array receives a request that it should not service, it tells the client which proxy to contact for that request.
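
The redirect decision can be sketched as a small function. This is an illustration under stated assumptions, not the product's logic: owner_of and handle_request are hypothetical names, SHA-256 stands in for the real hash, and the "forward" outcome when redirection is off is an assumed behavior for a member that relays the request itself.

```python
import hashlib


def owner_of(url: str, member_names) -> str:
    """Return the member with the highest hash score for this URL.

    SHA-256 is a stand-in chosen for this sketch only.
    """
    return max(
        member_names,
        key=lambda name: hashlib.sha256(f"{name}|{url}".encode("utf-8")).digest(),
    )


def handle_request(url: str, self_name: str, member_names, redirect_enabled: bool):
    """Decide what a member does with a request it receives."""
    owner = owner_of(url, member_names)
    if owner == self_name:
        return ("serve", self_name)
    if redirect_enabled:
        # Tell the client which proxy to contact for this request.
        return ("redirect", owner)
    # Assumption: without redirection the member relays the request.
    return ("forward", owner)


if __name__ == "__main__":
    names = ["proxy-a", "proxy-b", "proxy-c"]
    print(handle_request("http://example.com/doc.html", "proxy-a", names, True))
```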

Generating a PAC File From a PAT File

Because most clients do not recognize the PAT file format, the clients in client-to-proxy routing use the Proxy Auto Configuration (PAC) mechanism to receive information about which proxy to go through. However, instead of using the standard PAC file, the client uses a special PAC file derived from the PAT file. This special PAC file computes the hash to determine the appropriate route for the requested URL.

You can generate a PAC file from the PAT file manually or automatically. If you manually generate the PAC file from a specific member of the proxy array, that member immediately regenerates the PAC file based on the information currently in the PAT file. If you configure a proxy array member to automatically generate a PAC file, the member regenerates the file each time it detects a modified version of the PAT file.
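
To give a rough idea of what a PAC file derived from a member list might look like, the sketch below renders one as text. It does not reproduce the file the server generates: render_pac and the host names are hypothetical, and the selection code it emits is a simple placeholder hash rather than the product's algorithm. It does follow two points from this section: custom logic is placed before the selection logic in FindProxyForURL, and a default route is offered as a fallback.

```python
def render_pac(members, default_route="DIRECT", custom_logic=""):
    """Return PAC file text for a list of (host, port) pairs.

    custom_logic is JavaScript source that is inserted before the
    proxy selection logic.
    """
    if not members:
        raise ValueError("at least one member is required")
    proxies = ", ".join(f'"PROXY {host}:{port}"' for host, port in members)
    lines = ["function FindProxyForURL(url, host) {"]
    if custom_logic:
        lines.append("  // Custom logic runs first.")
        lines.extend("  " + line for line in custom_logic.splitlines())
    lines += [
        "  // Placeholder selection: a simple string hash, not the product's.",
        f"  var proxies = [{proxies}];",
        "  var h = 0;",
        "  for (var i = 0; i < url.length; i++) {",
        "    h = (h * 31 + url.charCodeAt(i)) % 2147483647;",
        "  }",
        f'  return proxies[h % proxies.length] + "; {default_route}";',
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pac(
        [("proxy-a.example.com", 8080), ("proxy-b.example.com", 8080)],
        default_route="DIRECT",
        custom_logic='if (isPlainHostName(host)) return "DIRECT";',
    ))
```

The custom logic in the usage example sends plain host names directly, which matches the typical use for local requests that do not need to go through the array.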

Note –

If you are not using the proxy array feature for your proxy server, use the Create/Edit Autoconfiguration File page to generate your PAC file. For more information see Chapter 17, Using the Client Autoconfiguration File.

To Manually Generate a PAC File From a PAT File

The PAC file can be generated only from the master proxy.

Access the Server Manager of the master proxy, and click the Caching tab.

Click the Configure Proxy Array link.

The Configure Proxy Array page is displayed.

Click the Generate PAC button.

The PAC Generation page is displayed.

If you want to use custom logic in your PAC file, type the name of the file containing the customized logic you would like to include in the generation of your PAC file in the Custom logic file field.

This logic is inserted before the proxy array selection logic in the FindProxyForURL function. Custom logic is typically used for local requests that do not need to go through the proxy array.

If you have already provided the custom logic file when configuring the proxy array member, this field will be populated with that information. You may edit the custom logic file name here.

In the Default Route field, type the route a client should take if the proxies in the array are not available.

If you have already provided the default route when configuring a proxy array member, this field will be populated with that information. You may edit the default route here.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

To Automatically Generate a PAC File

Access the Server Manager, and click the Caching tab.

Click the Configure Proxy Array Member link.

The Configure Proxy Array Member page is displayed.

Select the Auto-generate PAC File checkbox.

If you want to use custom logic in your PAC file, type the name of the file containing the customized logic you would like to include in the generation of your PAC file in the Custom Logic File field.

This logic is inserted before the proxy array selection logic in the FindProxyForURL function.

If you have already provided and saved the custom logic file when configuring the proxy array, this field will be populated with that information. You may edit the custom logic file name here.

In the Default Route field, type the route a client should take if the proxies in the array are not available.

If you have already provided the default route when configuring the proxy array, this field will be populated with that information. You may edit the default route.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Routing Through Parent Arrays

You can configure your proxy or proxy array member to route through an upstream parent array instead of going directly to a remote server.

To Route Through a Parent Array

Enable the parent array.

For more information, see Enabling or Disabling a Proxy Array.

Enable routing through the parent array.

For more information, see Enabling Routing Through a Proxy Array.

Access the Server Manager, and click the Caching tab.

Click the Configure Proxy Array Member link.

The Configure Proxy Array Member page is displayed.

In the Poll Host field in the Parent Array section of the page, type the host name of the proxy in the parent array to be polled for the PAT file.

This proxy is usually the master proxy of the parent array.

In the Port field in the Parent Array section of the page, type the port number of the proxy in the parent array that you will poll for the PAT file.

In the URL field, type the URL of the PAT file on the master proxy.

If you have created a PAT mapping on your master proxy, type the mapping into this URL field.

(Optional) In the Headers File field in the Parent Array section of the page, type the full path name for a file with any special headers that must be sent with the HTTP request for the PAT file, such as authentication information.

Click OK.

Click Restart Required.

The Apply Changes page is displayed.

Click the Restart Proxy Server button to apply the changes.

Viewing Parent Array Information

If your proxy array is routing through a parent array, you need information about the members of the parent array. This information is sent from the parent array in the form of a PAT file.

To View Parent Array Information

Access the Server Manager, and click the Caching tab.

Click the View Parent Array Configuration link.

The View Parent Array Configuration page is displayed.

View the information.