Sun ONE Web Proxy Server 3.6 SP2 Administrator's Guide - UNIX Version



Chapter 9   Caching

This chapter describes how iPlanet Web Proxy Server caches documents. It also describes how you can configure the cache by using the online forms and how the cache directory structure is maintained automatically by the Cache Monitor and Cache Manager.

How Caching Works

Caching reduces network traffic and offers faster response time for clients who are using the proxy server instead of going directly to remote servers.

When a client requests a web page or document from the proxy server, the proxy server copies the document from the remote server to its local cache directory structure while sending the document to the client.

When a client requests a document that was previously requested and copied into the proxy cache, the proxy returns the document from the cache instead of retrieving the document from the remote server again (see Figure 9-1). If the proxy determines the file is not up to date, it refreshes the document from the remote server and updates its cache before sending it to the client.

Figure 9-1    Proxy document retrieval
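The retrieval flow described above can be sketched as a small in-memory model. This is an illustration only, not the proxy's implementation: the class name, the fetch callback, and the single max_age freshness rule are all assumptions made for the example, and a real proxy would do an up-to-date check rather than always re-fetching a stale copy.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CacheEntry:
    body: bytes
    stored_at: float  # time the copy was written to the cache


class ToyProxyCache:
    """In-memory stand-in for the on-disk cache (hypothetical, for illustration)."""

    def __init__(self, fetch: Callable[[str], bytes], max_age: float) -> None:
        self.fetch = fetch      # assumed callback that contacts the remote server
        self.max_age = max_age  # seconds a cached copy is treated as up to date
        self.entries: Dict[str, CacheEntry] = {}

    def get(self, url: str, now: Optional[float] = None) -> bytes:
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        # Cache hit: return the stored copy without contacting the remote server.
        if entry is not None and now - entry.stored_at < self.max_age:
            return entry.body
        # Miss or stale copy: retrieve the document, store it, then return it.
        body = self.fetch(url)
        self.entries[url] = CacheEntry(body, now)
        return body
```

With a fetch callback that counts its calls, two requests for the same URL inside max_age reach the remote server only once.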

Files in the cache are automatically maintained by the iPlanet Web Proxy Server Cache Manager. The Cache Manager automatically cleans the cache on a regular basis to ensure that the cache doesn't get cluttered with out-of-date documents.

Understanding the Cache Structure

A cache consists of one or more partitions. Conceptually, a partition is a storage area on a disk that you set aside for caching. If you wish to have your cache span several disks, you need to configure at least one cache partition for each disk. Each partition can be independently administered. In other words, you can enable, disable, and configure a partition independently of all other partitions.

Storing a large number of cached files in a single location can slow performance; therefore, it is a good idea to create several directories, or sections, in each partition. Sections are the next level under partitions in the cache structure. You can have up to 256 sections in your cache across all partitions. The number of cache sections must be a power of 2 (for example, 1, 2, 4, 8, 16, ..., 256).

The final level in the cache structure hierarchy is the subsection. Subsections are directories within sections. If you choose to have subsections, you may have up to 256 of them, and the number of subsections must be a power of two. Cached files are stored in the lowest level in your cache.

Figure 9-2 shows an example cache structure with partitions and sections. In this figure, the cache directory structure divides the total cache into three partitions. The first partition contains four cache sections, and the other two partitions each contain two sections.

For the Unix proxy server, each cache section is noted by s for section, and then a section number. For the section shown as s3.4, the 3 indicates the power of 2 for the number of cache sections (2^3 = 8), and the 4 means the number for the section (for the 8 sections labeled 0 through 7). Therefore, s3.4 means section 5 of 8.
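The naming rule can be checked with a short helper. This is a sketch based only on the s3.4 example above; the function name is hypothetical, and it assumes that every section name has the form s<power>.<index>.

```python
import re
from typing import Tuple


def describe_section(name: str) -> Tuple[int, int]:
    """Return (ordinal, total) for a section directory name such as 's3.4'."""
    match = re.fullmatch(r"s(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError(f"not a section name: {name!r}")
    power, index = int(match.group(1)), int(match.group(2))
    total = 2 ** power  # the first number is the power of 2
    if index >= total:
        raise ValueError(f"section index {index} is out of range for {total} sections")
    return index + 1, total  # sections are labeled from 0, so add 1 for the ordinal


print(describe_section("s3.4"))  # (5, 8): section 5 of 8
```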

Figure 9-2    Example of a cache structure

In summary, a cache consists of partitions. In those partitions you may have sections, and within those sections you may have subsections. Cached files are always stored in the lowest level in your cache. Therefore, if your cache has subsections within the sections, the cached files are stored in the subsections. If your cache has sections, but no subsections, the files are stored in the sections.



Note

If you are unsure about how many cache sections and subsections to create for your cache, remember that for good cache performance, it is wise to plan for approximately 100 and no more than 500 cached files in each directory.
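As a rough planning aid, the guideline in this note can be turned into arithmetic. The sketch below assumes that the subsection count applies to each section, so the number of lowest-level directories is sections multiplied by subsections; that reading, the function name, and the example figures are assumptions.

```python
def files_per_directory(expected_files: int, sections: int, subsections: int = 1) -> float:
    """Average number of cached files in each lowest-level directory."""
    for count in (sections, subsections):
        # Each count must be a power of two between 1 and 256.
        if count < 1 or count > 256 or count & (count - 1):
            raise ValueError("counts must be powers of two between 1 and 256")
    return expected_files / (sections * subsections)


average = files_per_directory(200_000, sections=16, subsections=64)
print(round(average))               # about 195 files per directory
print(average <= 500)               # True: under the 500-file ceiling
```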



Distributing Files in the Cache

The proxy server uses a specific algorithm to determine the directory where a document should be stored. This algorithm ensures equal distribution of documents in the base directories, so the directories contain a small and nearly equal number of documents. Equal distribution is important because directories with large numbers of documents tend to cause performance problems.

The Unix proxy server uses the RSA MD5 algorithm (Message Digest 5) to reduce a URL to 8 characters, which it then uses for the file name of the document it stores in the cache. The MD5 algorithm reduces the URL to 128 bits (16 bytes) of binary data. The proxy uses 48 bits (6 bytes) of this data to calculate an 8-character file name and determine the storage directory. This method allows the proxy to cache over 70 million URLs.
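The following sketch shows one way 48 bits of an MD5 digest can become an 8-character name. It is not the proxy's actual encoding: which 6 bytes are used, the character alphabet (URL-safe Base64 here), and the way the section is chosen (first byte modulo the section count) are all assumptions made for illustration.

```python
import base64
import hashlib
from typing import Tuple


def cache_location(url: str, sections: int = 8) -> Tuple[int, str]:
    """Map a URL to a (section index, 8-character file name) pair."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 128 bits (16 bytes)
    bits = digest[:6]                                   # assumed: the first 48 bits
    # 6 bytes encode to exactly 8 Base64 characters, with no padding.
    name = base64.urlsafe_b64encode(bits).decode("ascii")
    section = bits[0] % sections                        # assumed directory choice
    return section, name
```

Because MD5 output is close to uniformly distributed, a mapping of this kind spreads documents almost evenly across the directories, which is the property the text relies on.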

Setting Cache Specifics

You can enable caching and control which types of protocols your proxy server will cache by setting the cache specifics. Cache specifics include the following items:

  • Whether your cache is enabled or disabled
  • The cache manager's "working directory" where it stores its temporary files
  • The name of the directory in which you will record the cached URLs
  • The size of the cache
  • The capacity of the cache
  • What types of protocols will be cached
  • When to refresh a cached document
  • Whether the proxy should track the number of times a document is accessed and report it back to the remote server


Note

Setting the specifics for a large cache is time-consuming and may cause the administration interface to time out. Therefore, if you are creating a large cache, use the command line utilities to set cache specifics. For more information on the cache command line utilities, see Using the Cache Command Line Utilities.



To set cache specifics from the Server Manager:

  1. In the Server Manager, choose Caching|Specifics. The Cache Specifics form appears.
  2. Change the information.
  3. Click OK.

The following sections describe the items listed on the Cache Specifics form. These sections include information that will help you to determine which settings will best suit your needs.

Enabling the Cache

Caching is an effective way to reduce network traffic for users of the proxy server. Caching also offers a faster response time for clients by eliminating the need to retrieve a document from a remote server. Your proxy server will function most effectively whenever caching is enabled.

You can enable the cache on the Cache Specifics form.

Creating a Cache Working Directory

If you set up caching during installation, you specified a directory for the proxy's cache structure. This directory is also used as a working directory for the Cache Manager. The working directory is where the proxy puts the temporary files that are related to caching. The actual cache files are under cache partitions. The working directory you specify on the Cache Specifics form is often the parent directory for the cache (though it does not need to be). All cached files appear in an organized directory structure under the caching directory. If you change the cache directory name or move it to another location, you have to tell the proxy the new location.

You can extend the cache directory structure to multiple file systems so that you can have a large cache structure divided on multiple smaller disks instead of keeping it all on one large disk. Each proxy server must have its own cache directory structure—that is, cache directories can't be concurrently shared by multiple proxy servers.

You can create the working directory on the Cache Specifics form.

Recording URLs

Your proxy server allows you to record all cached URLs in a URL list. You can identify which directory will hold all cached URL information and enable URL recording on the Cache Specifics form. For information on viewing and editing the URL list, see Accessing Cache Manager Information.



Note

The proxy does not have to record URLs to function properly. This feature exists so that the proxy administrator can view which URLs are in the cache. Continually recording URLs into a list may have an impact on the proxy's performance. To avoid this negative effect on performance, you can disable URL recording on the Cache Specifics form and view or manage URLs in the cache by using the command line program: extras/proxy/urldbgen. This program generates the URL list on command and does not affect the proxy's performance. See "Repairing the Cache URL List" for more information about urldbgen.



Setting the Cache Size

Cache size is the maximum size to which the cache is allowed to grow. The maximum cache size is 64GB. The amount of disk space available for the proxy cache has a considerable effect on cache performance. If the cache is too small, the Cache Manager program must remove cached documents more often to make room on the disk, and documents must be retrieved from content servers more often; both effects slow performance.

Large cache sizes are best because the more cached documents, the less the network traffic load and the faster the response time the proxy provides. Also, the Cache Manager removes cached documents if users no longer need them. Barring any file system limitations, cache size can never be too large; the excess space simply remains unused.

Proxy caching is designed to work efficiently at any size up to 64GB. The exact cache size you choose depends on the number of people using your proxy server. For a single user cache, 20MB to 50MB is usually enough. For a proxy that caches a multitude of documents, you might need to allocate an entire 2GB to 4GB disk partition for the cache. You can also have the cache split on multiple disk partitions. See "Adding and Modifying Cache Partitions" for more information on partitions.

You can set the cache size on the Cache Specifics form.



Note

You might encounter problems with caching if the file system where the cache root resides has less disk space than the cache size you specify. Also, note that expanding the cache size requires a hard restart (shutdown and restart) for the changes to take effect.





Caution

Changing the cache structure after installation requires that you reformat the structure and relocate existing files, causing any alterations to be time-consuming. If you aren't sure what cache size to use, use 2GB as the default value in the installation forms (this default can hold more than 2GB of data and can be used with 3GB to 5GB caches).



Editing the Cache Capacity

You can edit the cache capacity through the Cache Specifics form as well as on the Cache Administration Operations form. For more information on editing the cache capacity, see Setting the Cache Capacity.

Caching HTTP Documents

Internally, caching HTTP documents differs from caching FTP and Gopher documents. HTTP documents offer caching features that documents of the other protocols do not. However, by setting up and configuring the cache properly, you can ensure that your proxy server will cache HTTP, FTP, and Gopher documents effectively.

All HTTP documents have a descriptive header section that the proxy server uses to compare and evaluate the document in the proxy cache and the document on the remote server. When the proxy does an up-to-date check on an HTTP document, the proxy sends one request to the server that tells the server to return the document if the version in the cache is out of date. Often, the document hasn't changed since the last request and therefore is not transferred. This method of checking to see if an HTTP document is up-to-date saves bandwidth and decreases latency.
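The single-request up-to-date check described above corresponds to an HTTP conditional GET. The sketch below only builds the request header and interprets the status code; it performs no network I/O, and the function names are hypothetical.

```python
from email.utils import formatdate
from typing import Dict


def conditional_headers(cached_last_modified: float) -> Dict[str, str]:
    """Header asking the server to send the body only if the document has changed."""
    return {"If-Modified-Since": formatdate(cached_last_modified, usegmt=True)}


def resolve(status: int, cached_body: bytes, response_body: bytes) -> bytes:
    """Pick the body to send to the client after an up-to-date check."""
    if status == 304:  # Not Modified: nothing was transferred, the cached copy is current
        return cached_body
    if status == 200:  # the document changed and the server returned the new version
        return response_body
    raise ValueError(f"unhandled status {status}")


print(conditional_headers(0))
# {'If-Modified-Since': 'Thu, 01 Jan 1970 00:00:00 GMT'}
```

A 304 response carries no document body, which is where the bandwidth and latency savings come from.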

To reduce transactions with remote servers, the proxy server allows you to set a Cache Expiration setting for HTTP documents. The Cache Expiration setting tells the proxy to estimate if the HTTP document needs an up-to-date check before sending the request to the server. The proxy makes this estimate based on the HTTP document's Last-Modified date found in the header.

With HTTP documents, you can also use a Cache Refresh setting. This option specifies whether the proxy always does an up-to-date check (which would override an Expiration setting) or if the proxy waits a specific period of time before doing a check. Table 9-1 shows what the proxy does if both an Expiration setting and a Refresh setting are specified. Using the Refresh setting decreases latency and saves bandwidth considerably.

Table 9-1    Using the Cache Expiration and Cache Refresh settings with HTTP

Refresh setting                Expiration setting                             Results
-----------------------------  ---------------------------------------------  -------------------------------------------------
Always do an up-to-date check  (Not applicable)                               Always do an up-to-date check
User-specified interval        Use document's "expires" header                Do an up-to-date check if interval expired
User-specified interval        Estimate with document's Last-Modified header  Smaller value* of the estimate and expires header

* Using the smaller value guards against getting stale data from the cache for documents that change frequently.

Setting the HTTP Cache Refresh Interval

If you decide that you want your proxy server to cache HTTP documents, you need to determine whether it should always do an up-to-date check for documents in the cache or if it should check based on a Cache Refresh setting (up-to-date check interval). For HTTP documents, a reasonable refresh interval would be four to eight hours, for example. The longer the refresh interval, the fewer the number of times the proxy connects with remote servers. Even though the proxy doesn't do up-to-date checking during the refresh interval, users can force a refresh by clicking the Reload button in the client (such as Netscape Navigator); this action makes the proxy force an up-to-date check with the remote server.

You can set the refresh interval for HTTP documents on either the Cache Specifics form or the Cache Configuration form. The Cache Specifics form allows you to configure global caching procedures, and the Cache Configuration form allows you to control caching procedures for specific URLs and resources. For more information on using the Cache Specifics form, see Setting Cache Specifics, and for more information on using the Cache Configuration form, see Configuring the Cache.

Setting the HTTP Cache Expiration Policy

You can also set up your server to check if the cached document is up-to-date by using a last-modified factor or explicit expiration information only.

Explicit expiration information is a header found in some HTTP documents that specifies the date and time when that file will become outdated. Not many HTTP documents use explicit Expires headers, so it's better to estimate based on the Last-modified header.

If you decide to have your HTTP documents cached based upon the Last-modified header, you need to select a fraction to use in the expiration estimation. This fraction, known as the LM factor, is multiplied by the interval between the last modification and the time that the last up-to-date check was performed on the document. The resulting number is compared with the time since the last up-to-date check. If the time since the last check is smaller than the resulting number, the document is not expired. Smaller fractions make the proxy check documents more often. For example, suppose you have a document that was last changed ten days ago. If you set the last-modified factor to 0.1, the proxy interprets the factor to mean that the document is probably going to remain unchanged for one day (10 * 0.1 = 1). The proxy would, in that case, return the document from the cache if the document was checked less than a day ago.

In this same example, if the cache refresh setting for HTTP documents is set to less than one day, the proxy does the up-to-date check more than once a day. The proxy always uses the value (cache refresh or cache expiration) that requires that it update the files more frequently.
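The estimate and its interaction with the refresh interval can be written out as a short calculation. This is a sketch of the rule as described in this section, not the proxy's code; the function names and the use of seconds as the time unit are assumptions.

```python
from typing import Optional

DAY = 24 * 60 * 60  # seconds


def estimated_lifetime(last_modified: float, last_check: float, lm_factor: float) -> float:
    """LM factor times the interval between last modification and last check."""
    return lm_factor * (last_check - last_modified)


def needs_check(now: float, last_modified: float, last_check: float,
                lm_factor: float, refresh_interval: Optional[float] = None) -> bool:
    """True if the proxy should do an up-to-date check before using the cached copy."""
    lifetime = estimated_lifetime(last_modified, last_check, lm_factor)
    if refresh_interval is not None:
        # Use whichever value forces more frequent checks.
        lifetime = min(lifetime, refresh_interval)
    return now - last_check >= lifetime


# Document changed 10 days before the last check; LM factor 0.1 gives about 1 day.
checked = 10 * DAY
print(needs_check(checked + 12 * 3600, 0, checked, 0.1))            # False
print(needs_check(checked + 12 * 3600, 0, checked, 0.1, 6 * 3600))  # True
```

In the second call the six-hour refresh interval is shorter than the one-day estimate, so the check happens sooner.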

You can set the expiration setting for HTTP documents on either the Cache Specifics form or the Cache Configuration form. The Cache Specifics form allows you to configure global caching procedures, and the Cache Configuration form allows you to control caching procedures for specific URLs and resources. For more information on using the Cache Specifics form, see Setting Cache Specifics, and for more information on using the Cache Configuration form, see Configuring the Cache.

Reporting HTTP Accesses to the Remote Server

When a document is cached by iPlanet Web Proxy Server, it can be accessed many times before it is refreshed again. For the remote server, sending one copy to the proxy that will cache it represents only one access, or "hit." iPlanet Web Proxy Server can count how many times a given document is accessed from the proxy cache between up-to-date checks and then send that hit count back to the remote server in an additional HTTP request header (Cache-Info) the next time the document is refreshed. This way, if the remote server is configured to recognize this type of header, it receives a more accurate account of how many times a document is accessed.

You can enable HTTP access reporting on the Cache Specifics form. For more information on using the Cache Specifics form, see Setting Cache Specifics.

Caching FTP and Gopher Documents

FTP and Gopher do not include a method for checking to see if a document is up-to-date. Therefore, the only way to optimize caching for FTP and Gopher documents is to set a Cache Refresh interval. The Cache Refresh interval is the amount of time the proxy server waits before retrieving the latest version of the document from the remote server. If you do not set a Cache Refresh interval, the proxy will retrieve these documents even if the versions in the cache are up-to-date.

Setting FTP and Gopher Cache Refresh Intervals

If you are setting a cache refresh interval for FTP and Gopher, choose one that you consider safe for the documents the proxy gets. For example, if you store information that rarely changes, use a high number (several days). If the data changes constantly, you'll want the files to be retrieved at least every few hours. During the refresh time, you risk sending an out-of-date file to the client. If the interval is short enough (a few hours), you eliminate most of this risk while getting noticeably faster response time.

You can set the cache refresh interval for FTP and Gopher documents on either the Cache Specifics form or the Cache Configuration form. The Cache Specifics form allows you to configure global caching procedures, and the Cache Configuration form allows you to control caching procedures for specific URLs and resources. For more information on using the Cache Specifics form, see Setting Cache Specifics, and for more information on using the Cache Configuration form, see Configuring the Cache.



Note

If your FTP and Gopher documents vary widely (some change often, others rarely), use the Cache Configuration form to create a separate template for each kind of document (for example, create a template with resources ftp://.*.gif) and then set a refresh interval that is appropriate for that resource.



Configuring the Cache

You can configure the kind of caching you want for specific resources, using the Caching Configuration form. You can specify several configuration parameter values for URLs matching the regular expression pattern that you specify. This feature gives you fine control of the proxy cache, based on the type of document cached. Configuring the cache can include identifying the following items:

  • The cache default
  • How to cache pages that require authentication
  • How to cache queries
  • The minimum and maximum cache file sizes
  • When to refresh a cached document
  • The cache expiration policy
  • The caching behavior for client interruptions
  • The caching behavior for failed connections to origin servers


Note

If you set the cache default for a particular resource to either Derived configuration or Don't cache, the cache configuration options will not appear on the Caching Configuration form. However, if you choose a cache default of Cache for a resource, you can specify several other configuration items.



To configure the cache:

  1. In the Server Manager, choose Caching|Configuration. The Caching Configuration form appears.
  2. Select the resource you are editing by either choosing it from the Editing pull-down menu or by clicking the Regular Expression button, entering a regular expression, and clicking OK. For more information on regular expressions, see "Understanding Regular Expressions".
  3. Change the configuration information.
  4. Click OK.

The following sections describe the items listed on the Caching Configuration form. These sections include information that will help you to determine which configuration will best suit your needs.

Setting the Cache Default

The proxy server allows you to identify a cache default for specific resources. A resource is a type of file that matches certain criteria that you specify. For instance, you may want your server to automatically cache all documents from the domain company.com. If so, click the Regular Expression button on the top of the Configuration form and, in the field that appears, enter

[a-z]*://[^/:]*\.company\.com.*

Then click the Cache radio button. Your server automatically caches all cacheable documents from that domain. For more information on regular expressions, see "Understanding Regular Expressions".
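To see which URLs a pattern of this kind selects, it can be tried out in Python. The sketch assumes the intended expression is [a-z]*://[^/:]*\.company\.com.* and that the whole URL must match; the proxy's regular expression dialect may differ from Python's re module, so treat the results as indicative only.

```python
import re

# Assumed form of the resource pattern: scheme, host ending in .company.com, anything after.
PATTERN = re.compile(r"[a-z]*://[^/:]*\.company\.com.*")


def matches_resource(url: str) -> bool:
    """True if the whole URL matches the resource pattern."""
    return PATTERN.fullmatch(url) is not None


for url in ("http://www.company.com/index.html",
            "ftp://ftp.company.com/pub/file.txt",
            "http://www.example.com/"):
    print(url, matches_resource(url))
```

Under this reading a bare http://company.com/ does not match, because the pattern requires a dot before "company".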



Note

If you set the cache default for a particular resource to either Derived configuration or Don't cache, it is not necessary to configure the cache for that resource. However, if you choose a cache default of Cache for a resource, you can specify several other configuration items. For a list of these items, see Configuring the Cache.



You can set the cache default for any resource on the Cache Configuration form. The cache default for HTTP, FTP, and Gopher can also be set on the Cache Specifics form.

Caching Pages Retrieved Using HTTPS

You can choose to have your server cache files that are retrieved using HTTPS. Because documents that are retrieved using HTTPS are secure, they have to be encrypted by the remote server and then decrypted by the proxy before they are viewed by the client. This process can sometimes slow document retrieval. If clients frequently request a secure document through your proxy, you may want to store it in the cache. By storing the document in the cache, you avoid the encryption and decryption process, minimizing the time it takes to retrieve the document.

If you do not enable the caching of HTTPS documents, the proxy assumes the default, which is to not cache them.

You can set the policy for caching pages retrieved using HTTPS on the Cache Configuration form.

Caching Pages that Require Authentication

You can have your server cache files that require user authentication. If you choose to have your proxy server cache these files, it tags the files in the cache so that if a user asks for them, it knows that the files require authentication from the remote server.

Because the proxy server does not know how remote servers authenticate and it does not know users' IDs or passwords, it will simply force an up-to-date check with the remote server each time a request is made for a document that requires authentication. The user therefore must enter an ID and password to gain access to the file. If the user has already accessed that server earlier in the Navigator session, Navigator automatically sends the authentication information without prompting the user for it.

If you do not enable the caching of pages that require authentication, the proxy assumes the default, which is to not cache them.

You can set the policy for caching pages that require authentication on the Cache Configuration form.

Caching Queries

Cached queries only work with HTTP documents. You can limit the length of queries that are cached, or you can completely inhibit caching of queries. The longer the query, the less likely it is to be repeated, and the less useful it is to cache.

These caching restrictions apply for queries: the access method has to be GET, the document must not be protected (unless caching of authenticated pages is enabled), and the response must have at least a Last-modified header. This requires the query engine to indicate that the query result document can be cached. If the Last-modified header is present, the query engine should support a conditional GET method (with an If-modified-since header) in order to make caching effective; otherwise it should return an Expires header.
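The restrictions can be collected into a single predicate. This is a sketch: the function and parameter names are hypothetical, and it reads "otherwise it should return an Expires header" as allowing an Expires header in place of Last-modified, which is an interpretation rather than documented behavior.

```python
from typing import Mapping, Optional


def query_is_cacheable(method: str, query: str, response_headers: Mapping[str, str],
                       requires_auth: bool, cache_authenticated: bool = False,
                       max_query_length: Optional[int] = None) -> bool:
    """Apply the query caching restrictions to one request/response pair."""
    if method.upper() != "GET":
        return False  # only GET queries are cached
    if requires_auth and not cache_authenticated:
        return False  # protected documents need authenticated-page caching enabled
    if max_query_length is not None and len(query) > max_query_length:
        return False  # administrator-imposed limit on query length
    names = {name.lower() for name in response_headers}
    # The query engine must signal cacheability through one of these headers.
    return "last-modified" in names or "expires" in names
```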

If you do not enable the caching of queries, the proxy assumes the default, which is to not cache them.

You can set the query cache policy on the Cache Configuration form.

Setting the Minimum and Maximum Cache File Sizes

You can set the minimum and maximum sizes for files cached by your proxy server. You may want to set a minimum size if you have a fast network connection. If your connection is fast, small files may be retrieved so quickly that it is not necessary for the server to cache them. In this instance, you would want to cache only larger files. You may want to set a maximum file size to make sure that large files do not occupy too much of your proxy's disk space.

You can set the minimum and maximum cache file sizes on the Cache Configuration form.

Setting the Cache Behavior for Client Interruptions

If a document is only partly retrieved and the client interrupts the data transfer, the proxy has the ability to finish retrieving the document for the purpose of caching it. The proxy's default is to finish retrieving a document for caching if at least 25 percent of it has already been retrieved. Otherwise, the proxy terminates the remote server connection and removes the partial file. You can raise or lower the client interruption percentage on the Cache Configuration form.
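The threshold rule amounts to a one-line comparison. In this sketch the function name is hypothetical, and treating a document of unknown length as "do not finish" is an assumption; the text does not say what the proxy does in that case.

```python
from typing import Optional


def finish_for_cache(bytes_received: int, content_length: Optional[int],
                     threshold_percent: float = 25.0) -> bool:
    """Decide whether to keep retrieving a document after the client disconnects."""
    if content_length is None or content_length <= 0:
        return False  # assumed: the percentage cannot be computed, so give up
    return 100.0 * bytes_received / content_length >= threshold_percent


print(finish_for_cache(300, 1000))  # True: 30 percent has already been retrieved
print(finish_for_cache(200, 1000))  # False: the partial file would be removed
```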

Setting the Cache Behavior for Failed Origin Server Connections

If an up-to-date check on a stale document fails because the origin server is unreachable, you can specify whether the proxy sends the stale document from the cache. You can specify the failure to connect to server behavior on the Cache Configuration form.

Adding and Modifying Cache Partitions

Cache partitions are reserved parts of disks or memory that are set aside for caching purposes. The largest cache capacity is 64GB with 256 cache sections. If your caching capacity changes, you may want to change or add partitions using the Cache Partition Configuration form. From this form, you can edit a partition's location, mnemonic name, and maximum and minimum sizes. You can also view the cache section table for that partition.

To add cache partitions:

  1. In the Server Manager, choose Caching|Partitions. The Cache Partition Table appears.
  2. Click the Add Cache Partition button.
  3. Enter the appropriate values for the new partition.
  4. Restart the proxy from the command line by going to the proxy directory and typing ./restart.

To modify cache partitions:

  1. In the Server Manager, choose Caching|Partitions. The Cache Partition Table appears.
  2. Click on the name of the partition that you would like to change.
  3. Edit the information.
  4. Click Change.
  5. Restart the proxy from the command line by going to the proxy directory and typing ./restart.

Adding and Modifying Cache Sections

The proxy cache is separated into one or more cache sections. You can have up to 256 sections. The number of cache sections must be a power of two (for example, 1, 2, 4, 8, 16, ..., 256).

Each cache section can hold 100MB to 250MB of data; the optimum size is around 125MB per section. This means that if you pick a cache capacity of 500MB, the installer will create 4 cache sections (500 ÷ 125 = 4); if you choose a cache capacity of 2GB, the installer creates 16 sections (2000 ÷ 125 = 16). The smallest available capacity is 125MB with a single cache section. The largest capacity is 32GB (optimum) with 256 cache sections which can hold up to 64GB of data.
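The section arithmetic can be reproduced as follows. This is a sketch that matches the examples given here (500MB gives 4 sections, 2GB gives 16); rounding up to the next power of two for capacities in between is an assumption, since the installer may only offer a fixed set of capacities.

```python
def sections_for_capacity(capacity_mb: int, optimum_mb: int = 125,
                          max_sections: int = 256) -> int:
    """Number of cache sections for a capacity, at about 125MB per section."""
    needed = max(1, -(-capacity_mb // optimum_mb))  # ceiling division
    sections = 1
    while sections < needed:  # assumed: round up to the next power of two
        sections *= 2
    return min(sections, max_sections)


for capacity in (125, 500, 2000, 32000):
    print(capacity, sections_for_capacity(capacity))
# 125 1
# 500 4
# 2000 16
# 32000 256
```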

To add or modify cache sections:

  1. In the Server Manager, choose Caching|Sections. The Cache Section Table appears.
  2. Change the information in the table.
  3. Click Make These Changes.
  4. Restart the proxy from the command line by going to the proxy directory and typing ./restart.

Setting the Cache Capacity

Cache capacity is directly related to the cache hierarchy in the cache directories. The larger the hierarchy, the bigger the capacity. The cache capacity should be equal to or greater than the cache size. Setting the capacity larger than the cache size can be helpful if you know that you plan to increase the cache size later (such as by adding an external disk).

To set the cache capacity:

  1. In the Server Manager, choose Caching|Capacity. The Cache Administrative Operations form appears.
  2. Choose a capacity from the Capacity pull-down menu.
  3. Click Change Capacity.
  4. Restart the proxy from the command line by going to the proxy directory and typing ./restart.

Or

  1. In the Server Manager, choose Caching|Specifics. The Cache Specifics form appears.
  2. Click the word edit that appears next to Cache capacity.
  3. Choose a capacity from the Capacity pull-down menu.
  4. Click Change Capacity.
  5. Restart the proxy from the command line by going to the proxy directory and typing ./restart.

Enabling the Cache Monitor and Manager

The proxy program spawns two extra copies of itself to perform cache management. These two processes are the Cache Monitor and Cache Manager. The Cache Monitor receives data from the server process pool about cache activity and maintains information about its size and other aspects. It occasionally triggers the Cache Manager to do the actual cache clean-up tasks.

If the Cache Manager process is accidentally killed, it starts again automatically. The Cache Manager daemon uses the same configuration file as the proxy server.

You can disable the Cache Manager and Monitor if you plan to perform cache maintenance with an external program. Otherwise, the Cache Manager and Monitor should be enabled.

By accessing Cache Manager information, you can view all cached URLs, control caching for specific documents, and see an estimated size of the current cache structure. You can explicitly expire documents in the cache (so that the next time they are accessed, the proxy does an up-to-date check to determine if the document in the cache needs to be refreshed) and you can remove documents from the cache. For more information on accessing Cache Manager information, see Accessing Cache Manager Information.

To enable or disable the Cache Monitor and Manager:

  1. In the Server Manager, choose Caching|Special. The Special Cache Configuration form appears.
  2. Click the appropriate button to either enable or disable the Cache Manager and Monitor.
  3. Click OK.

Accessing Cache Manager Information

You can view the names and attributes of all cached URLs through the Cache Manager information. Cache Manager information is a list of all cached documents grouped by access protocol and site name. This list is stored in the directory that you specify on the Cache Specifics form. You can limit the URLs you view in the list by typing a domain name into the Search field. By accessing this information, you can perform various cache management functions such as expiring and removing documents from the cache.

To access Cache Manager information:

  1. In the Server Manager, choose Caching|Cache Management.
  2. Enter a DNS domain name in the Search field and click the Search button, or select a domain name from the list. A list of subdomains in that domain appears.
  3. Click on the name of a subdomain. A list of the hosts in that subdomain appears.
  4. Click on the name of a host. A list of all URLs appears.
  5. Click on the name of a URL. Detailed information about that URL appears.


Note

Because continually recording URLs slows the proxy's performance, you do not have to enable URL recording to access Cache Manager information. To access this information without affecting performance, you can run the command line program: extras/proxy/urldbgen. This program generates a list of cached URLs on command. Once you have generated this list, you can use the Cache Management form to access and manage the cache.



Caching Local Hosts

If a URL requested from a local host lacks a domain name, the proxy server does not cache it, in order to avoid duplicate caching. For example, if a user requests http://machine/filename.html and http://machine.netscape.com/filename.html from a local server, both URLs might otherwise appear in the cache even though they refer to the same file. Because these files come from a local server, they can usually be retrieved so quickly that caching them is unnecessary anyway.

However, if your company has servers in many remote locations, you may want to cache documents from all hosts to reduce network traffic and decrease the time needed to access the files.

To enable the caching of local hosts:

  1. In the Server Manager, choose Caching|Cache Local Hosts.
  2. Select the resource you are editing by either choosing it from the Editing pull-down menu or by clicking the Regular Expression button and entering the name of the resource to edit. For more information on regular expressions, see "Understanding Regular Expressions".
  3. Click the enabled button.
  4. Click OK.

Using Cache Batch Updates

The Cache Batch Update feature allows you to pre-load files in a specified web site or do an up-to-date check on documents already in the cache whenever the proxy server is not busy. From the Cache Batch Updates form, you can create, edit, and delete batches of URLs and enable and disable batch updating.

Creating a Batch Update

You can actively (as opposed to on-demand) cache files by specifying files to be batch updated. The proxy server allows you to perform an up-to-date check on several files currently in the cache or pre-load multiple files in a particular web site.

To create a batch update:

  1. In the Server Manager, choose Caching|Batch Updates. The Cache Batch Updates form appears.
  2. Select New and Create from the pull-down menus next to "Select a configuration to edit".
  3. Click OK. A new Cache Batch Update form appears.
  4. In the Name section of the form, enter a name for the new batch update entry.
  5. In the Source section of the form, click the radio button for the type of batch update that you want to create. Click the first radio button if you want to perform an up-to-date check on all documents in the cache. Click the second radio button if you want to cache URLs recursively starting from the given source URL.
  6. In the Source section fields, identify the documents that you want to use in the batch update.
  7. In the Exceptions section, identify any files that you would like to exclude from the batch update.
  8. In the Resources section, enter the maximum number of simultaneous connections and the maximum number of documents to traverse.
  9. In the Timing section, enter the start and end times for the generation of the batch update. Only one batch update can be active at any time, so it is best not to overlap other batch update configurations.
  10. Click OK.

  Note

    You can create, edit, and delete batch update configurations without having batch updates turned on. However, if you want your batch updates to run at the times you set on the Cache Batch Updates form, you must turn updates on.



Editing or Deleting a Batch Update Configuration

You can edit or delete batch updates using the Cache Batch Updates form. You may want to edit a batch update if you need to exclude certain files or want to update the batch more frequently. You may also want to delete a batch update configuration completely.

To edit or delete a batch update configuration:

  1. In the Server Manager, choose Caching|Batch Updates. The Cache Batch Updates form appears.
  2. If you want to edit a batch, select the name of that batch and "Edit" from the pull-down menus next to "Select a configuration to edit." If you want to delete a batch, select the name of that batch and "Delete" from the pull-down menus.
  3. Click OK. The Cache Batch Updates form appears.
  4. Modify the information as you wish.
  5. Click OK.

Using the Cache Command Line Utilities

The proxy server comes with several command line utilities that let you configure, change, generate, and repair your cache directory structure. Most of these utilities are duplications of the Server Manager forms, but you might want to use the utilities if you need to schedule the maintenance (for example, as a cron job). All of the utilities are located in the extras/proxy directory. The following sections describe the various utilities.

Building the Cache Directory Structure

The utility cbuild creates a single directory structure for the proxy's cache. After creating the directory structure, you can use the Server Manager forms to enable the proxy to use the newly created cache.

To run the cbuild utility, enter one of the following at the command line:

cbuild -d conf-dir -s user
cbuild -c cache-dir -u urldb-dir -s user

where:

  • conf-dir is the directory where the proxy server instance is installed. For example, the proxy server directory could be /usr/ns-home/proxy-id. The utility determines the cache directory and location of the cache database based on the information in the directory you enter.
  • user is the user account that the created files and directories should be owned by if running cbuild as root. This user ID should be the same user ID that the proxy is running as.
  • cache-dir is the directory for your cache structure.
  • urldb-dir is the directory where the cache management information is located.
  • cbuild is located in the extras/proxy directory.

Upgrading the Cache Structure

If you have upgraded your existing 1.1 or 2.0 proxy server, you should upgrade the cache separately. Depending on the size of your cache, a cache upgrade can be a time-consuming process. You can upgrade a version 1.1 or 2.0 cache directory structure and all of its files. When upgrading a 1.1 structure, the cupgrade utility moves all of the files from the old directories to the new 3.5 directory structure. When upgrading a 2.0 cache, the cupgrade utility works in place and simply modifies the existing 2.0 cache so that it is in the 3.5 format.



Note

The 2.5 proxy uses the same cache structure as 3.5, so you will not need to upgrade it. These instructions only apply to upgrading a 1.1 or 2.0 cache structure.



Before you upgrade a 1.1 cache structure, you must make sure you have a 3.5 structure. If you installed the proxy server by using the upgrade utility and enabling caching, then you already have a cache structure. If you don't have a cache directory structure, use the cbuild utility before running the cupgrade utility. If you are upgrading a 2.0 cache structure, you should not have a 3.5 cache. After upgrade, you should replace the 3.5 cache with the old cache.

cupgrade is located in the extras/proxy directory.

Upgrading a 1.1 Cache Structure

If you are using the cupgrade utility to upgrade a 1.1 cache structure, enter the following at the command line:

cupgrade -d conf-dir -o 1.1-cache-root -s user

conf-dir is the directory where the proxy server is installed. For example, the directory could be /usr/ns-home/proxy-id. The utility determines the new cache directory and location of the cache database based on the configuration files found in the directory you enter.

1.1-cache-root is the directory of the version 1.1 cache structure.

user is the Unix user ID that should own the files in the cache. It is optional and should be included only if you run the cupgrade utility as root and your proxy as another user. For example, you could run cupgrade as root and your proxy as nobody. In this case you would replace user with nobody.



Note

Specifying user as nobody will not work on some systems, such as HP-UX. When using these systems, you must specify a user other than nobody for both the proxy and for cupgrade.



The cache upgrade can take anywhere from a few minutes to several hours depending on the size of the old cache structure.

Upgrading a 2.0 Cache Structure

If you are using the cupgrade utility to upgrade a 2.0 cache structure, enter the following at the command line:

cupgrade sect sect ... sect

The 2.0 upgrade should be run in the cache directory where all of the cache sections reside.

sect is a section in the cache that you want to upgrade. The number of sect calls depends upon how many sections are in the cache.

For example, if your cache directory is /usr/ns-home/cache and you have a 1GB cache, you would have 8 sections in your cache directory. You would type the following at the command line:

cd /usr/ns-home/cache
cupgrade s3.0 s3.1 s3.2 s3.3 s3.4 s3.5 s3.6 s3.7

Instead of typing each section, you could simply use s* to pass all of the section directory names. In this instance, you would type the following:

cd /usr/ns-home/cache
cupgrade s*

If you have multiple cache partitions, you need to run the upgrade utility for each partition. For example, your cache directory may be /usr/ns-home/cache and you have a 2GB cache, 16 sections, and 2 partitions (with 8 sections on each partition). The partitions are /disk1/cache-1 and /disk2/cache-2. The syntax for the cupgrade utility would then be:

cd /usr/ns-home/cache/disk1/cache-1
cupgrade s4.00 s4.01 s4.02 s4.03 s4.04 s4.05 s4.06 s4.07

cd /usr/ns-home/cache/disk2/cache-2
cupgrade s4.08 s4.09 s4.10 s4.11 s4.12 s4.13 s4.14 s4.15

You could also upgrade all sections on both partitions by typing the following at the command line:

cupgrade /disk1/cache-1/s* /disk2/cache-2/s*
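
The section names in the examples above appear to follow the pattern s<k>.<i>, where 2^k is the total number of sections and <i> is a zero-padded section index. The following Python sketch generates names under that assumption. The pattern is inferred only from the two examples in this section, and the helper name section_names is hypothetical, so compare the output with the directories that actually exist in your cache before relying on it.

```python
def section_names(total_sections):
    """Return section directory names for a cache with total_sections sections.

    Assumes the naming pattern seen in the examples above:
    s<k>.<index>, where total_sections == 2**k and the index is
    zero-padded to the width of the largest index.
    """
    if total_sections < 1 or total_sections & (total_sections - 1):
        raise ValueError("the number of sections must be a power of 2")
    k = total_sections.bit_length() - 1      # log2 of the section count
    width = len(str(total_sections - 1))     # 8 -> 1 digit, 16 -> 2 digits
    return [f"s{k}.{i:0{width}d}" for i in range(total_sections)]


names = section_names(16)
print(" ".join(names[:8]))   # sections on the first partition in the example
print(" ".join(names[8:]))   # sections on the second partition in the example
```

A generated list like this can be pasted after cupgrade when shell wildcards are not convenient, for example when only some sections should be upgraded in one run.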

The cache upgrade can take anywhere from a few minutes to several hours depending on the size of the old cache structure.

Repairing the Cache URL List

The proxy has a utility called urldbgen that goes through the entire cache directory structure and repairs the Cache Manager's URL list. Use this utility if your Cache Manager's URL list appears damaged when viewed through the Cache Management form (for example, if the URL list doesn't seem to contain all of the URLs that you know are cached or if the Cache Manager claims that the cache is empty or corrupt). You may also want to run this utility if you have disabled URL recording for the sake of performance, but want to generate a URL list on command.

The urldbgen utility is located in the extras/proxy directory.

You can invoke the urldbgen utility in one of two ways. The first way is:

urldbgen -d conf-dir -s user

conf-dir is the directory where the proxy server is installed. For example, the directory could be /usr/ns-home/proxy-id. The utility determines the cache directory and location of the cache database based on the information in the directory you enter.

user is the user account that the created files and directories should be owned by if running urldbgen as root. This user ID should be the same user ID that the proxy is running as.

The second way you can run urldbgen is:

urldbgen -c cache-dir -u urldb-dir -s user

cache-dir is the directory for your cache structure.

urldb-dir is the directory where the cache URLs are recorded.

user is the user account that the created files and directories should be owned by if running urldbgen as root. This user ID should be the same user ID that the proxy is running as.



Note

Running the URL list repair utility can take anywhere from a few seconds to a couple of hours to complete depending on the size of the cache and the speed and load of your machine and its disks.



The URL list is rarely corrupted. URL list corruption can occur only if something prevents the proxy from updating its URL list after it has finished writing a file to the cache. This could happen if the disk is full, if the proxy user's permissions prevent the proxy from writing to the list file, or if the system suddenly goes down. The URL list is located in the urldb directory under the cache root directory. This utility can recreate the entire URL list from scratch if it is accidentally deleted.

Cleaning the URL List

The proxy server has a command line utility called urldbgc that goes through the URL database and purges any old files. It is a good idea to run this utility if, for some reason, the database is out of sync with the actual files in the cache. You can run this utility as a cron job and schedule it for the time when your proxy server is least busy.

To clean the URL list using the urldbgc utility, type one of the following at the command line:

urldbgc -d conf-dir -s user
urldbgc -c cache-dir -u urldb-dir -s user

conf-dir is the directory where the proxy server is installed. For example, the directory could be /usr/ns-home/proxy-id. The utility determines the cache directory and location of the cache database based on the directory you enter.

user is the user account that the created files and directories should be owned by if running urldbgc as root. This user ID should be the same user ID that the proxy is running as.

cache-dir is the directory for your cache structure.

urldb-dir is the directory where the cache URL database is kept.



Note

If you do not want to garbage collect, but you want to fully delete all of the files in your cache, type the following at the command line:

cd proxy directory/cache
find s* -type f -exec rm {} \;

where proxy directory is the directory where your proxy is stored.
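
For scripted maintenance, the same cleanup can be sketched in Python with a dry-run option. This is only an illustration: the function name clear_cache_files is hypothetical, and it assumes, as the find command above does, that every section directory directly under the cache directory has a name beginning with s. Unlike the find command, it ignores plain files named s* that sit directly in the cache directory.

```python
from pathlib import Path


def clear_cache_files(cache_root, dry_run=True):
    """Delete every regular file under the s* section directories.

    Directories are left in place so that the cache structure survives.
    With dry_run=True the files are only listed, not removed.
    Returns the list of files that were (or would be) removed.
    """
    targets = []
    for section in sorted(Path(cache_root).glob("s*")):
        if not section.is_dir():
            continue  # only descend into section directories
        # Collect first, then delete, so the tree is not modified mid-walk.
        targets.extend(
            p for p in section.rglob("*")
            if p.is_file() and not p.is_symlink()
        )
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling the function with the default dry_run=True returns the files that would be removed, which gives a chance to review the list before running it again with dry_run=False.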



Routing through Proxy Arrays

Proxy arrays for distributed caching allow multiple proxies to serve as a single cache. In other words, each proxy in the array will contain different cached URLs that can be retrieved by a browser or downstream proxy server. Proxy arrays prevent the duplication of caches that often occurs with multiple proxy servers. Through hash-based routing, proxy arrays route requests to the correct cache in the proxy array.

Proxy arrays also allow incremental scalability. In other words, if you decide to add another proxy to your proxy array, each member's cache is not invalidated. Only 1/n of the URLs in each member's cache, where n is the number of proxies in your array, will be reassigned to other members.

For each request through a proxy array, a hash function assigns each proxy in the array a score that is based on the requested URL, the proxy's name and the proxy's load factor. The request is then routed to the proxy with the highest score.
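
The highest-score rule can be illustrated with a short Python sketch. It is not the proxy's implementation: SHA-256 and the plain multiplication by the load factor are stand-ins chosen for the example, and the member names, URLs, and the functions score and route are all hypothetical. The sketch only shows the general idea that each member's score depends on the URL, the member's name, and its load factor, and that adding a member leaves the scores of the existing members unchanged.

```python
import hashlib


def score(url, member_name, load_factor):
    """Combine the URL, the member's name, and its load factor into a score.

    SHA-256 and the multiplication by load_factor are assumptions made
    for this sketch, not the hash function the proxy uses.
    """
    digest = hashlib.sha256(f"{member_name}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") * load_factor


def route(url, members):
    """Return the name of the member with the highest score for url.

    members maps each member name to its integer load factor.
    """
    return max(members, key=lambda name: score(url, name, members[name]))


array = {"proxy1": 1, "proxy2": 1, "proxy3": 1}
urls = [f"http://www.example.com/doc{i}.html" for i in range(1000)]
before = {u: route(u, array) for u in urls}

# Add a fourth member. Scores of the existing members do not change, so a
# URL either stays where it was or moves to the new member.
array["proxy4"] = 1
after = {u: route(u, array) for u in urls}
moved = [u for u in urls if before[u] != after[u]]
print(f"{len(moved)} of {len(urls)} URLs were reassigned")
```

Because every member can compute the same scores from the same member list, any client or proxy that holds the list routes a given URL to the same member without consulting the others.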

Since requests for URLs can come from both clients and proxies, there are two types of routing through proxy arrays: client to proxy routing and proxy to proxy routing.

In client to proxy routing, the client uses the Proxy Auto Configuration (PAC) mechanism to determine which proxy to go through. However, instead of using the standard PAC file, the client uses a special PAC file which computes the hash algorithm to determine the appropriate route for the requested URL. Figure 9-3 shows client to proxy routing. For more information about the PAC file, see Chapter 11 "Using the Client Autoconfiguration File." The proxy server can automatically generate the special PAC file from the Proxy Array Membership Table (PAT) specifications made through the administration interface.

In proxy to proxy routing, proxies use a PAT (Proxy Array Table) file to compute the hash algorithm instead of the PAC file used by clients. The PAT file is an ASCII file that contains information about a proxy array, including the proxies' machine names, IP addresses, ports, load factors, cache sizes, etc. For computing the hash algorithm at the server, it is much more efficient to use a PAT file than a PAC file (which is a JavaScript file that has to be interpreted at run-time). However, most clients do not recognize the PAT file format, and therefore, must use a PAC file. Figure 9-4 shows proxy to proxy routing.

The PAT file is created on one proxy in the proxy array, the master proxy. The proxy administrator must determine which proxy will be the master proxy. The administrator can change the PAT file from this master proxy server, and all other members of the proxy array can then manually or automatically poll the master proxy for these changes. You can configure each member to automatically generate a PAC file from these changes.

You can also chain proxy arrays together for hierarchical routing. If a proxy server routes an incoming request through an upstream proxy array, the upstream proxy array is then known as a parent array. A parent array is a proxy array that a proxy server goes through. In other words, if a client requests a document from Proxy X, and Proxy X does not have the document, it sends the request to Proxy Array Y instead of sending it directly to the remote server. So, Proxy Array Y is a parent array. In Figure 9-4, Proxy Array 1 is a parent array to Proxy Array 2.

All of the proxy servers in a proxy array should be in a single administrative domain. Two proxy arrays in separate administrative domains can communicate; however, if the requesting proxy can retrieve cached URLs from more than one proxy array, ICP should be used to determine which array to go to.

Figure 9-3    Client to Proxy Routing

Figure 9-4    Proxy to Proxy Routing

To set up a proxy array:

  1. From the master proxy, create the member list. For more information on creating the member list, see Creating a Proxy Array Member List.
  2. From the master proxy, create a PAT mapping to map the URL "/pat" to the PAT file. For information on creating a PAT mapping, see "Proxying and Routing URLs".
  3. Configure each non-master member of the array. For more information on configuring non-master members, see Configuring Proxy Array Members.
  4. Enable routing through a proxy array. For more information on enabling routing through a proxy array, see Enabling Routing Through a Proxy Array.
  5. Enable your proxy array. For more information on enabling a proxy array, see Enabling a Proxy Array.
  6. Generate a PAC file from your PAT file. You only need to generate a PAC file if you are using client to proxy routing. For more information on generating a PAC file from a PAT file, see Generating a PAC File from a PAT File.


  Note

    If your proxy array is going to route through a parent array, you also need to enable the parent array and configure each member to route through a parent array for desired URLs. For more information on parent arrays, see Routing Through a Parent Array.



Creating a Proxy Array Member List

You should create and update the proxy array member list from the master proxy of the array only. You only need to create the proxy array member list once, but you can modify it at any time. By creating the proxy array member list, you are generating the PAT file to be distributed to all of the proxies in the array and to any downstream proxies.



Caution

You should only make changes or additions to the proxy array member list through the master proxy in the array. All other members of the array can only read the member list.



  1. From the Server Manager, choose Caching|Proxy Array Configuration. The Proxy Array Configuration form appears.
  2. In the Array name field, enter the name of the array.
  3. In the "Reload Configuration Every" field, enter the number of minutes between each polling for the PAT file.
  4. Click OK.


  Note

    Be sure to click OK before you begin to add members to the member list.

  5. Click the Add button. The Proxy Array Member form appears.
  6. For each member in the proxy array, enter the following and then click OK:
    • Name - the name of the proxy server you are adding to the member list
    • IP Address - the IP address of the proxy server you are adding to the member list
    • Port - the port on which the member polls for the PAT file
    • Load Factor - an integer that reflects the relative load that should be routed through the member.
    • Status - the status of the member. This value can be either on or off. If you disable a proxy array member, the member's requests will be re-routed through another member.


    Note

      Be sure to click OK after you enter the information for each proxy array member you are adding.



Deleting Proxy Array Members

Deleting proxy array members will remove them from the proxy array. You can only delete proxy array members from the master proxy.



Caution

You should only make changes or additions to the proxy array member list through the master proxy in the array. If you modify this list from any other member of the array, all changes will be lost.



To delete members of a proxy array:

  1. From the Server Manager, choose Caching|Proxy Array Configuration. The Proxy Array Configuration form appears.
  2. In the Member List, select the radio button next to the member that you want to delete.
  3. Click the Delete button.

  Note

    If you want your changes to take effect and to be distributed to the members of the proxy array, you need to update the Configuration ID on the Proxy Array Configuration form and click OK. To update the Configuration ID, you can simply increase it by one.



Editing Proxy Array Member List Information

At any time, you can change the information for the members in the proxy array member list. You can only edit the proxy array member list from the master proxy.



Caution

You should only make changes or additions to the proxy array member list through the master proxy in the array. If you modify this list from any other member of the array, all changes will be lost.



To edit member list information for any of the members in a proxy array:

  1. From the Server Manager, choose Caching|Proxy Array Configuration. The Proxy Array Configuration form appears.
  2. In the Member List, select the radio button next to the member that you want to edit.
  3. Click the Edit button. The Proxy Array Member form appears.
  4. Edit the appropriate information.
  5. Click OK.


  Note

    If you want your changes to take effect and to be distributed to the members of the proxy array, you need to update the Configuration ID on the Proxy Array Configuration form and click OK. To update the Configuration ID, you can simply increase it by one.



Configuring Proxy Array Members

You only need to configure each member in the proxy array once, and you must do so from the member itself. You cannot configure a member of the array from another member. You also need to configure the master proxy.

You should follow this process for each member of the array:

  1. From the Server Manager, choose Caching|Member Configuration. The Proxy Array Member Configuration form appears.
  2. In the Proxy Array section, indicate whether or not the member needs to poll for the PAT file by selecting the appropriate radio button. The choices are:
    • Non-master member - You should select this option if the member you are configuring is not the master proxy. Any proxy array member that is not a master proxy will need to poll for the PAT file in order to retrieve it from the master proxy.
    • Master member - You should select this option if you are configuring the master proxy. If you are configuring the master proxy, the PAT file is local and does not need to be polled.

  3. If, in Step 2, you chose the option that does not poll for the PAT file (master member), click OK; you are finished with this form. If you chose the option that polls for the PAT file (non-master member), continue with Step 4.
  4. In the Poll Host field, enter the name of the master proxy that you will be polling for the PAT file.
  5. In the Port field, enter the port at which the master proxy accepts HTTP requests.
  6. In the URL field, enter the URL of the PAT file on the master proxy. If on your master proxy, you have created a PAT mapping to map the PAT file to the URL "/pat," you should enter "/pat" into this URL field.
  7. In the Headers File field, enter the full pathname for a file with any special headers that must be sent with the HTTP request for the PAT file (such as authentication information). This field is optional.
  8. Click OK.

Enabling Routing Through a Proxy Array

To enable routing through a proxy array:

  1. From the Server Manager, choose Routing|Routing. The Routing Configuration form appears.
  2. Select the resource you want to route by either choosing it from the Editing pull-down menu or clicking the Regular Expression button, entering a regular expression, and clicking OK.
  3. Select the radio button next to the text "Route through".
  4. Select the checkboxes for proxy array and/or parent array.


  Note

    You can only enable proxy array routing if the proxy server you are configuring is a member of a proxy array. You can only enable parent routing if a parent array exists. The two routing options are independent of each other.

  5. If you choose to route through a proxy array and you want to redirect requests to another URL, select the redirect checkbox. Redirecting means that if a member of a proxy array receives a request that it should not service, it tells the client which proxy to contact for that request.

  Caution

    Redirect is not currently supported by any clients, so you should not use the feature at this time.

  6. Click OK.

Enabling a Proxy Array

To enable a proxy array:

  1. From the Server Manager, choose Server Preferences|System Specifics. The System Specifics form appears.
  2. Select the Yes radio button for the type of array or arrays you want to enable - either a normal proxy array or a parent array.


  Note

    If you are not routing through a proxy array, you should make sure that all clients use a special PAC file to route correctly before you disable the proxy array option. If you disable the parent array option, you should have valid alternative routing options set in the Routing form, such as explicit proxy or a direct connection.

  3. Click OK.

Redirecting Requests in a Proxy Array

If you choose to route through a proxy array, you need to designate whether you want to redirect requests to another URL. Redirecting means that if a member of a proxy array receives a request that it should not service, it tells the client which proxy to contact for that request.



Caution

Redirect is not currently supported by any clients, so you should not use the feature at this time.



Generating a PAC File from a PAT File

Because most clients do not recognize the PAT file format, the clients in client to proxy routing use the Proxy Auto Configuration (PAC) mechanism to receive information about which proxy to go through. However, instead of using the standard PAC file, the client uses a special PAC file derived from the PAT file. This special PAC file computes the hash algorithm to determine the appropriate route for the requested URL.

You can manually or automatically generate a PAC file from the PAT file. If you manually generate the PAC file from a specific member of the proxy array, that member will immediately re-generate the PAC file based on the information currently in the PAT file. If you configure a proxy array member to automatically generate a PAC file, the member will automatically re-generate the file after each time it detects a modified version of the PAT file.



Note

If you are not using the proxy array feature for your proxy server, then you should use the Proxy Client Autoconfiguration form to generate your PAC file. For more information see Chapter 11 "Using the Client Autoconfiguration File."



Manually Generating a PAC File from a PAT File



Note

The PAC file can be generated only from the master proxy.



To manually generate a PAC file from a PAT file:

  1. From the Server Manager of the master proxy, choose Caching|Proxy Array Configuration. The Proxy Array Configuration form appears.
  2. Click the Generate PAC button. The PAC Generation form appears.
  3. If you want to use custom logic in your PAC file, in the Custom Logic File field, enter the name of the file containing the customized logic you would like to include in the generation of your PAC file. This logic is inserted before the proxy array selection logic in the FindProxyForURL function. It is typically used for local requests that need not go through the proxy array. If you have already entered the custom logic file on the Member Configuration form, this field will be populated with that information. You may edit the custom logic filename if you wish, and the changes you make will transfer to the Member Configuration form as well.
  4. In the Default Route field, enter the route a client should take if the proxies in the array are not available. If you have already entered the default route on the Member Configuration form, this field will be populated with that information. You may edit the default route if you wish, and the changes you make will transfer to the Member Configuration form as well.
  5. Click OK.

Automatically Generating a PAC File from a PAT File

To automatically generate a PAC file from a PAT file each time a change is detected:

  1. From the Server Manager, choose Caching|Member Configuration. The Member Configuration form appears.
  2. Select the checkbox next to "Auto-generate PAC file".
  3. In the Default Route field, enter the route a client should take if the proxies in the array are not available. If you have already entered and saved the default route on the Member Configuration form, this field will be populated with that information. You may edit the default route if you wish, and the changes you make will transfer to the Member Configuration form as well.
  4. If you want to use custom logic in your PAC file, in the Custom Logic File field, enter the name of the file containing the customized logic you would like to include in the generation of your PAC file. This logic is inserted before the proxy array selection logic in the FindProxyForURL function. If you have already entered and saved the custom logic file on the Member Configuration form, this field will be populated with that information. You may edit the custom logic filename if you wish, and the changes you make will transfer to the Member Configuration form as well.
  5. Click OK.

Routing Through a Parent Array

You can configure your proxy or proxy array to route through an upstream parent array instead of going directly to a remote server. To configure a proxy or proxy array member to route through a parent array:

  1. Enable the parent array. For instructions on enabling an array, see Enabling a Proxy Array.
  2. Enable routing through the parent array. For instructions on enabling routing through an array, see Enabling Routing Through a Proxy Array.
  3. From the Server Manager, choose Caching|Member Configuration. The Proxy Array Member Configuration form appears.
  4. In the Poll Host field in the Parent Array section of the form, enter the host name of the proxy in the parent array that you will poll for the PAT file. This proxy is usually the master proxy of the parent array.
  5. In the Port field in the Parent Array section of the form, enter the Port number of the proxy in the parent array that you will poll for the PAT file.
  6. In the URL field, enter the URL of the PAT file to be polled. If a PAT mapping has been created on the master proxy, enter the mapping into this URL field.
  7. In the Headers File field in the Parent Array section of the form, enter the full pathname for a file with any special headers that must be sent with the HTTP request for the PAT file (such as authentication information). This field is optional.
  8. Click OK.

Viewing Parent Array Information

If your proxy array is routing through a parent array, you need information about the members of the parent array. This information is sent from the parent array in the form of a PAT file. The information in this PAT file is displayed on the Parent Array Configuration form.

To view parent array information:

  1. From the Server Manager, choose Caching|Parent Array Configuration. The Parent Array Configuration form appears.
  2. View the information.

Routing Through ICP Neighborhoods

The Internet Cache Protocol (ICP) is an object location protocol that enables caches to communicate with one another. Caches can use ICP to send queries and replies about the existence of cached URLs and about the best locations from which to retrieve those URLs. In a typical ICP exchange, one cache will send an ICP query about a particular URL to all neighboring caches. Those caches will then send back ICP replies that indicate whether or not they contain that URL. If they do not contain the URL, they send back a "MISS." If they do contain the URL, they send back a "HIT."
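
The exchange described above can be modeled in a few lines of Python. This is a simplified in-memory sketch, not the ICP wire protocol: the Cache class, the cache names, and the example URL are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """A neighboring cache, reduced to a name and the set of URLs it holds."""
    name: str
    urls: set = field(default_factory=set)

    def answer(self, url):
        """Reply to a query for one URL with "HIT" or "MISS"."""
        return "HIT" if url in self.urls else "MISS"


def query_neighbors(url, neighbors):
    """Send the same query to every neighbor and collect the replies by name."""
    return {neighbor.name: neighbor.answer(url) for neighbor in neighbors}


neighbors = [
    Cache("cache-a", {"http://example.com/index.html"}),
    Cache("cache-b"),
]
replies = query_neighbors("http://example.com/index.html", neighbors)
print(replies)
```

Here cache-a holds the URL and answers with a "HIT", while cache-b answers with a "MISS".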

ICP can be used for communication among proxies located in different administrative domains. It allows a proxy cache in one administrative domain to communicate with a proxy cache in another administrative domain. It is effective for situations in which several proxy servers want to communicate, but cannot all be configured from one master proxy (as they are in a proxy array). Figure 9-5 shows an ICP exchange between proxies in different administrative domains.

The proxies that communicate with each other via ICP are called neighbors. You cannot have more than 64 neighbors in an ICP neighborhood. There are two types of neighbors in an ICP neighborhood: parents and siblings. Only parents can access the remote server if no other neighbors have the requested URL. Your ICP neighborhood can have no parents, or it can have more than one parent. Any neighbor in an ICP neighborhood that is not a parent is considered a sibling. Siblings cannot retrieve documents from remote servers unless the sibling is marked as the default route for ICP and ICP uses the default.
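
The rules in the preceding paragraph can be written down as two small checks. The sketch below is illustrative only; the function names and the (name, kind) tuple layout are assumptions for the example and do not correspond to any proxy configuration file.

```python
MAX_NEIGHBORS = 64  # neighbor limit stated in the text above


def validate_neighborhood(neighbors):
    """Check a list of (name, kind) tuples, where kind is "parent" or "sibling".

    Returns a list of error strings; an empty list means the neighborhood is valid.
    """
    errors = []
    if len(neighbors) > MAX_NEIGHBORS:
        errors.append(
            f"{len(neighbors)} neighbors exceeds the limit of {MAX_NEIGHBORS}"
        )
    for name, kind in neighbors:
        if kind not in ("parent", "sibling"):
            errors.append(f"{name}: unknown neighbor type {kind!r}")
    return errors


def can_fetch_remote(kind, is_default_route=False, icp_uses_default=False):
    """Parents may go to the remote server; a sibling may do so only when it
    is the default route and ICP actually uses the default."""
    return kind == "parent" or (is_default_route and icp_uses_default)


print(validate_neighborhood([("p1", "parent"), ("s1", "sibling")]))
print(can_fetch_remote("sibling"))
```

A neighborhood with no parents at all passes validation, which matches the statement that parents are optional.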

You can use polling rounds to determine the order in which neighbors receive queries. A polling round is an ICP query cycle. For each neighbor, you must assign a polling round. If you configure all neighbors to be in polling round one, then all neighbors will be queried in one cycle. In other words, they will all be queried at the same time. If you configure some of the neighbors to be in polling round two, then all of the neighbors in polling round one will be queried first, and if none of them returns a "HIT," all round two proxies will be queried. The maximum number of polling rounds is two.

Since ICP parents are likely to be network bottlenecks, you can use polling rounds to lighten their load. A common setup is to configure all siblings to be in polling round one and all parents to be in polling round two. That way, when the local proxy requests a URL, the request goes to all of the siblings in the neighborhood first. If none of the siblings have the requested URL, the request goes to the parent. If the parent does not have the URL, it will retrieve it from a remote server.
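
The two-round behavior can be sketched as follows. This is a simplified model under stated assumptions: neighbors are plain dictionaries, each query is an in-memory lookup rather than a network message, and the names and URLs are invented for the example.

```python
def first_hit(url, neighbors):
    """Query round one, then round two only if round one produced no hit.

    neighbors: list of dicts with "name", "round" (1 or 2), and "urls" (a set).
    Returns (neighbor_name, polling_round) for the first hit,
    or (None, None) if no neighbor in either round has the URL.
    """
    for polling_round in (1, 2):
        members = [n for n in neighbors if n["round"] == polling_round]
        hits = [n["name"] for n in members if url in n["urls"]]
        if hits:
            return hits[0], polling_round
    return None, None


# The common setup from the text: siblings in round one, the parent in round two.
neighbors = [
    {"name": "sibling-1", "round": 1, "urls": set()},
    {"name": "sibling-2", "round": 1, "urls": {"http://example.com/a"}},
    {"name": "parent-1", "round": 2,
     "urls": {"http://example.com/a", "http://example.com/b"}},
]

print(first_hit("http://example.com/a", neighbors))
print(first_hit("http://example.com/b", neighbors))
print(first_hit("http://example.com/c", neighbors))
```

In this setup the parent is consulted only for URLs that no sibling holds, which is how polling rounds lighten the load on parents.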

Each neighbor in an ICP neighborhood must have at least one ICP server running. If a neighbor does not have an ICP server running, it cannot answer the ICP requests from its neighbors. Enabling ICP on your proxy server starts the ICP server if it is not already running.

Figure 9-5    An ICP exchange

To set up ICP, follow these steps:

  1. Add parent(s) to your ICP neighborhood. (This step is only necessary if you want parents in your ICP neighborhood.) For more information on adding parents to an ICP neighborhood, see Adding Parents to an ICP Neighborhood.
  2. Add sibling(s) to your ICP neighborhood. For more information on adding siblings to your ICP neighborhood, see Adding Siblings to an ICP Neighborhood.
  3. Configure each neighbor in the ICP neighborhood. For more information on configuring ICP neighbors, see Configuring Individual ICP Neighbors.
  4. Enable ICP. For information on enabling ICP, see Enabling ICP.
  5. If your proxy has siblings or parents in its ICP neighborhood, enable routing through an ICP neighborhood. For more information on enabling routing through an ICP neighborhood, see Enabling Routing Through an ICP Neighborhood.

Adding Parents to an ICP Neighborhood

To add parent proxies to an ICP neighborhood:

  1. From the Server Manager, choose Caching|ICP. The ICP Configuration form appears.
  2. In the Parent List section of the form, click the Add Parent button. The ICP Parent form appears.
  3. In the Machine Address field, enter the IP address or host name of the parent proxy you are adding to the ICP neighborhood.
  4. In the ICP Port field, enter the port number on which the parent proxy will listen for ICP messages.
  5. In the Multicast Address field, you can enter the multicast address to which the parent listens. A multicast address is an IP address to which multiple servers can listen. Using a multicast address allows a proxy to send one query to the network that all neighbors listening to that multicast address can see, eliminating the need to send a query to each neighbor separately. Using multicast is optional.

    Note

    Neighbors in different polling rounds should not listen to the same multicast address.

  6. In the TTL field, enter the number of subnets that the multicast message will be forwarded to. If the TTL is set to 1, the multicast message will only be forwarded to the local subnet. If the TTL is 2, the message will go to all subnets that are one level away, and so on.

    Note

    Multicast makes it possible for two unrelated neighbors to send ICP messages to each other. Therefore, if you want to prevent unrelated neighbors from receiving ICP messages from the proxies in your ICP neighborhood, you should set a low TTL value in the TTL field.

  7. In the Proxy Port field, enter the port for the proxy server on the parent.
  8. From the Polling Round pull-down, choose the polling round that you want the parent to be in. The default polling round is 1. For more information on polling rounds, see Routing Through ICP Neighborhoods.
  9. Click OK.
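
The Multicast Address and TTL fields in the steps above correspond to standard IP multicast concepts. As a general illustration (not the proxy's own implementation), the Python sketch below checks whether an address is a multicast address and creates a UDP socket whose outgoing multicast packets carry a chosen TTL. The addresses shown are arbitrary examples.

```python
import ipaddress
import socket


def is_multicast(address):
    """True if address is an IP multicast address (for IPv4, 224.0.0.0/4)."""
    return ipaddress.ip_address(address).is_multicast


def make_multicast_sender(ttl=1):
    """Create a UDP socket whose outgoing multicast packets use the given TTL.

    A TTL of 1 keeps multicast packets on the local subnet.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


print(is_multicast("224.0.1.20"))    # an address inside the multicast range
print(is_multicast("192.168.1.10"))  # an ordinary unicast address

sock = make_multicast_sender(ttl=1)
print(sock.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL))
sock.close()
```

Keeping the TTL low, as the note on unrelated neighbors recommends, limits how far a multicast query can travel.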

Removing Parents from an ICP Neighborhood

To remove parent proxies from an ICP neighborhood:

  1. From the Server Manager, choose Caching|ICP. The ICP Configuration form appears.
  2. Click the radio button next to the parent you want to remove.
  3. Click the Remove button.

Editing Configurations for Parents in an ICP Neighborhood

To edit the machine address, port number, multicast address, time to live value, proxy port number, or polling round value for a parent proxy:

  1. From the Server Manager, choose Caching|ICP. The ICP Configuration form appears.
  2. Click the radio button next to the parent you want to edit.
  3. Click the Edit button.
  4. Modify the appropriate information.
  5. Click OK.

Adding Siblings to an ICP Neighborhood

To add sibling proxies to an ICP neighborhood:

  1. From the Server Manager, choose Caching|ICP. The ICP Configuration form appears.
  2. In the Sibling List section of the form, click the Add Sibling button. The ICP Sibling form appears.
  3. In the Machine Address field, enter the IP address or host name of the sibling proxy you are adding to the ICP neighborhood.
  4. In the Port field, enter the port number on which the sibling proxy will listen for ICP messages.
  5. In the Multicast Address field, enter the multicast address to which the sibling listens. A multicast address is an IP address to which multiple servers can listen. Using a multicast address allows a proxy to send one query to the network that all neighbors listening to that multicast address can see, eliminating the need to send a query to each neighbor separately.

    Note

    Neighbors in different polling rounds should not listen to the same multicast address.

  6. In the TTL field, enter the number of subnets that the multicast message will be forwarded to. If the TTL is set to 1, the multicast message will only be forwarded to the local subnet. If the TTL is 2, the message will go to all subnets that are one level away.

    Note

    Multicast makes it possible for two unrelated neighbors to send ICP messages to each other. Therefore, if you want to prevent unrelated neighbors from receiving ICP messages from the proxies in your ICP neighborhood, you should set a low TTL value in the TTL field.

  7. In the Proxy Port field, enter the port for the proxy server on the sibling.
  8. From the Polling Round pull-down, choose the polling round that you want the sibling to be in. The default polling round is 1. For more information on polling rounds, see Routing Through ICP Neighborhoods.
  9. Click OK.
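
Once parents and siblings have been added, the rule that neighbors in different polling rounds should not listen to the same multicast address can be checked mechanically. The sketch below is a hypothetical helper, not part of the product; the tuple layout, neighbor names, and addresses are assumptions for the example.

```python
from collections import defaultdict


def multicast_conflicts(neighbors):
    """Find multicast addresses shared by neighbors in different polling rounds.

    neighbors: list of (name, multicast_address_or_None, polling_round) tuples.
    Returns a sorted list of the conflicting addresses.
    """
    rounds_by_address = defaultdict(set)
    for _name, address, polling_round in neighbors:
        if address:  # neighbors without a multicast address cannot conflict
            rounds_by_address[address].add(polling_round)
    return sorted(
        address
        for address, rounds in rounds_by_address.items()
        if len(rounds) > 1
    )


neighbors = [
    ("sibling-1", "224.0.1.20", 1),
    ("sibling-2", "224.0.1.20", 1),  # same round, same address: allowed
    ("parent-1", "224.0.1.20", 2),   # different round, same address: conflict
    ("parent-2", None, 2),
]
print(multicast_conflicts(neighbors))
```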

Removing Siblings from an ICP Neighborhood

To remove sibling proxies from an ICP neighborhood:

  1. From the Server Manager, choose Caching|ICP. The ICP Configuration form appears.
  2. Click the radio button next to the sibling you want to remove.
  3. Click the Remove button.

Editing Configurations for Siblings in an ICP Neighborhood

To edit the machine address, port number, multicast address, time to live value, proxy port number, or polling round value for a sibling proxy:

  1. From the Server Manager, choose Caching|ICP. The ICP Configuration form appears.
  2. Click the radio button next to the sibling you want to edit.
  3. Click the Edit button.
  4. Modify the appropriate information.
  5. Click OK.

Configuring Individual ICP Neighbors

You need to configure each neighbor, or local proxy, in your ICP neighborhood.

To configure the local proxy server in your ICP neighborhood:

  1. From the Server Manager, choose Caching|ICP. The ICP Configuration form appears.
  2. In the Binding Address field, enter the IP address to which the neighbor server will bind.
  3. In the Port field, enter the port number on which the neighbor server will listen for ICP.
  4. In the Multicast Address field, enter the multicast address to which the neighbor listens. A multicast address is an IP address to which multiple servers can listen. Using a multicast address allows a proxy to send one query to the network that all neighbors listening to that multicast address can see, eliminating the need to send a query to each neighbor separately.

    If both a multicast address and bind address are specified for the neighbor, the neighbor uses the bind address to send replies and uses multicast to listen. If neither a bind address nor a multicast address is specified, the operating system will decide which address to use to send the data.

  5. In the Default Route field, enter the name or IP address of the proxy to which the neighbor should route a request when none of the neighboring proxies respond with a "hit." If you enter the word "origin" into this field, or if you leave it blank, the default route will be to the origin server.

    Note

    If you choose "first responding parent" from the No Hit Behavior pull-down discussed in Step 7, the route you enter in the Default Route field will have no effect. The proxy only uses this route if you choose the default no hit behavior.

  6. In the second Port field, enter the port number of the default route machine that you entered into the Default Route field.
  7. From the "On no hits, route through" pull-down, choose the neighbor's behavior when none of the siblings in the ICP neighborhood have the requested URL in their caches. You can choose:
    • first responding parent - the neighbor will retrieve the requested URL through the parent that first responds with a "miss"
    • default - the neighbor will retrieve the requested URL through the machine specified in the Default Route field.

  8. In the Server Count field, enter the number of processes that will service ICP requests.
  9. In the Timeout field, enter the maximum amount of time the neighbor will wait for an ICP response in each round.
  10. Click OK.
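
The interaction between the Default Route field and the "On no hits, route through" pull-down can be summarized as a small decision function. This is a sketch of the documented behavior only; the function name and return values are invented, and the case where "first responding parent" is chosen but no parent replies is not described in this guide, so the sketch returns None there instead of guessing.

```python
def no_hit_route(behavior, default_route, parent_misses):
    """Pick where to send a request after no neighbor answered with a hit.

    behavior: "first responding parent" or "default"
    default_route: host name, IP address, "origin", or "" (field left blank)
    parent_misses: parents that replied with a miss, in order of arrival
    """
    if behavior == "first responding parent":
        # The Default Route field has no effect with this behavior.
        return parent_misses[0] if parent_misses else None
    if default_route in ("", "origin"):
        return "origin server"
    return default_route


print(no_hit_route("first responding parent", "proxy9.example.com", ["parent-2", "parent-1"]))
print(no_hit_route("default", "proxy9.example.com", ["parent-2"]))
print(no_hit_route("default", "", []))
```

The first call ignores the default route and uses the first parent to reply, the second uses the configured default route, and the third falls through to the origin server because the field was left blank.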

Enabling ICP

To enable ICP:

  1. From the Server Manager, choose Server Preferences|System Specifics. The System Specifics form appears.
  2. Select the Yes radio button for ICP.
  3. Click OK.

Enabling Routing Through an ICP Neighborhood

To enable routing through an ICP neighborhood:

  1. From the Server Manager, choose Routing|Routing. The Routing Configuration form appears.
  2. Select the resource you want to route by either choosing it from the Editing pull-down menu or clicking the Regular Expression button, entering a regular expression, and clicking OK.
  3. Select the radio button next to the text "Route through."
  4. Select the checkbox next to ICP.
  5. If you want the client to retrieve a document directly from the ICP neighbor that has the document instead of going through another neighbor to get it, select the checkbox next to the text "redirect."


    Caution

    Redirect is not currently supported by any clients, so don't use the feature at this time.



  6. Click OK.


    Note

    You need to enable routing through an ICP neighborhood only if your proxy has other siblings or parents in the ICP neighborhood. If your proxy is a parent to another proxy and does not have any siblings or parents of its own, then you need to enable ICP only for that proxy. You do not need to enable routing through an ICP neighborhood.




Copyright 2002 Sun Microsystems, Inc. All rights reserved.