iPlanet Web Proxy Server 3.6 Administrator's Guide - NT Version



Chapter 10   Caching


This chapter describes how iPlanet Web Proxy Server caches documents. It also describes how you can configure the cache by using the online forms.



How Caching Works



Caching reduces network traffic and offers faster response time for clients who are using the proxy server instead of going directly to remote servers.

When a client requests a web page or document from the proxy server, the proxy server copies the document from the remote server to its local cache directory structure while sending the document to the client.

When a client requests a document that was previously requested and copied into the proxy cache, the proxy returns the document from the cache instead of retrieving the document from the remote server again (see Figure 10-1). If the proxy determines the file is not up to date, it refreshes the document from the remote server and updates its cache before sending it to the client.

Figure 10-1    Proxy document retrieval
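
The retrieval flow described above can be sketched as a toy in-memory model. This is only an illustration, not the server's implementation: the class name ToyProxyCache and the two callbacks (fetch and is_up_to_date) are hypothetical stand-ins for the remote request and the up-to-date check.

```python
from typing import Callable, Dict, Tuple


class ToyProxyCache:
    """Toy model of proxy document retrieval (hypothetical, for illustration)."""

    def __init__(self, fetch: Callable[[str], bytes],
                 is_up_to_date: Callable[[str], bool]) -> None:
        self.fetch = fetch                  # retrieves a document from the remote server
        self.is_up_to_date = is_up_to_date  # decides whether a cached copy is current
        self.store: Dict[str, bytes] = {}   # stands in for the cache directory structure

    def get(self, url: str) -> Tuple[bytes, str]:
        # Cached and current: answer from the cache without contacting the remote server.
        if url in self.store and self.is_up_to_date(url):
            return self.store[url], "cache"
        # Not cached, or not up to date: retrieve the document and update the cache.
        status = "refreshed" if url in self.store else "fetched"
        body = self.fetch(url)
        self.store[url] = body
        return body, status
```

The second element of the returned tuple only reports which path was taken, so the three cases (first request, cache hit, refresh of a stale copy) are easy to tell apart.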


Files in the cache are maintained automatically by the iPlanet Web Proxy Server Cache Manager. The Cache Manager cleans the cache on a regular basis so that it does not fill up with out-of-date documents.



Understanding the Cache Structure



A cache consists of one or more partitions. Conceptually, a partition is a storage area on a disk that you set aside for caching. If you wish to have your cache span several disks, you need to configure at least one cache partition for each disk. Each partition can be independently administered. In other words, you can enable, disable, and configure a partition independently of all other partitions.

Storing a large number of cached files in a single location can slow performance; therefore, it is a good idea to create several directories, or sections, in each partition. Sections are the next level under partitions in the cache structure. You can have up to 256 sections in your cache across all partitions. The number of cache sections must be a power of 2 (for example, 1, 2, 4, 8, 16, ..., 256).

The final level in the cache structure hierarchy is the subsection. Subsections are directories within sections. If you choose to have subsections, you may have up to 256 of them, and the number of subsections must be a power of two. Cached files are stored in the lowest level in your cache.

Figure 10-2 shows an example cache structure with partitions and sections. In this figure, the cache directory structure divides the total cache into three partitions. The first partition contains four cache sections, and the other two partitions each contain two sections.

For the Windows NT proxy, each cache section is noted by s for section, followed by two numbers separated by a period. The first of the numbers is the index. For the section shown as s3.8, the 3 indicates the index of the section. The index varies from 0 to n-1 where n is the total number of sections in the cache. The second number is the total number of sections in the cache. Therefore, s3.8 means the fourth section in a cache that has a total of 8 sections.
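
The naming scheme can be illustrated with a short sketch that lists the section names for a given total. The function name section_names is hypothetical; the sketch only reproduces the s<index>.<total> pattern and the power-of-2 rule described in this chapter.

```python
from typing import List


def section_names(total_sections: int) -> List[str]:
    """Return the section directory names for a cache with the given number of sections."""
    # The number of sections must be a power of 2 between 1 and 256.
    is_power_of_two = total_sections >= 1 and (total_sections & (total_sections - 1)) == 0
    if not is_power_of_two or total_sections > 256:
        raise ValueError("number of sections must be a power of 2 between 1 and 256")
    # The index runs from 0 to n-1; the second number is the total n.
    return [f"s{index}.{total_sections}" for index in range(total_sections)]
```

For a cache with 8 sections, the fourth entry in the list is s3.8, matching the example above.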

Figure 10-2    Example of a cache structure


In summary, a cache consists of partitions. In those partitions you may have sections, and within those sections you may have subsections. Cached files are always stored in the lowest level in your cache. Therefore, if your cache has subsections within the sections, the cached files are stored in the subsections. If your cache has sections, but no subsections, the files are stored in the sections.



Note If you are unsure about how many cache sections and subsections to create for your cache, remember that for good cache performance, it is wise to plan for approximately 100 and no more than 500 cached files in each directory.
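
As a rough planning aid, the guideline in the note can be turned into arithmetic. The sketch below is an assumption-laden illustration, not a tool shipped with the server: the function name plan_directories is hypothetical, it aims for about 100 files per directory, it fills sections first, and it treats the limit of 256 subsections as a per-section limit.

```python
import math
from typing import Tuple


def plan_directories(expected_files: int, files_per_dir: int = 100) -> Tuple[int, int]:
    """Suggest (sections, subsections per section) for an expected number of cached files."""
    needed = max(1, math.ceil(expected_files / files_per_dir))
    # Round the directory count up to the next power of 2.
    total = 1
    while total < needed:
        total *= 2
    # Assumption: use sections first (at most 256), then subdivide each section.
    sections = min(total, 256)
    subsections = min(total // sections, 256)
    return sections, subsections
```

A result of 1 for the second value means no subsections are needed. For example, about 100,000 expected files gives 256 sections with 4 subsections each, which is 1,024 directories of roughly 100 files.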





Distributing Files in the Cache



The proxy server uses a specific algorithm to determine the directory where a document should be stored. This algorithm ensures equal distribution of documents in the base directories, so the directories contain a small and nearly equal number of documents. Equal distribution is important because directories with large numbers of documents tend to cause performance problems.

The Windows NT proxy server uses a hash function to reduce a URL to 16 characters, which it then uses for the filename of the document it stores in the cache. If two URLs hash to the same filename, the previously cached URL is replaced with the more recently accessed one.
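
The idea of reducing a URL to a fixed-length filename can be sketched as follows. The guide does not name the hash function the server uses, so MD5 is only a stand-in here, and deriving the section index from the same digest is likewise an assumption made for illustration.

```python
import hashlib


def cache_filename(url: str) -> str:
    """Reduce a URL to a 16-character name (MD5 is a stand-in, not the server's hash)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()[:16].upper()


def section_index(url: str, total_sections: int) -> int:
    """Pick a section for a URL so that documents spread evenly (assumed scheme)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % total_sections
```

Because the name depends only on the URL, the same URL always maps to the same file, and two URLs that happen to produce the same name would overwrite each other, as described above.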



Creating a New Cache



Before you can create a new cache, you need to understand the cache structure. For more information on the cache structure, see "Understanding the Cache Structure". You can then prepare for creating a new cache by answering the following questions:

  • What size cache do you need? In other words, what amount of disk space will you need to set aside for your cache?

  • How many sections and subsections will you need in your cache?

  • Where is the disk space you have set aside for your cache?

  • How much free disk space do you want to have on the disk(s) at all times?



    Note Sharing a cache between two or more proxy servers may result in a conflict regarding cache contents. Therefore, you should not use the same cache for more than one proxy.



Once you have answered these questions, you can begin creating your new cache.

To create a new cache,

  1. In the Server Manager, choose Caching|Partitions. The Cache Partition Table appears.

  2. Click the Add Partition button. The Cache Partition Configuration form appears.

  3. In the Cache Sections pull-down menu at the bottom of the table, choose the number of sections you want to have in your cache. This is the total number of sections across all partitions in your cache. Once you have set the number of cache sections, do not change it when you create other partitions unless you are restructuring your cache.

  4. In the Location field, enter the location of your partition. The location is the full path of the directory where you have set aside disk space for this particular partition.

  5. In the Name field, enter the name of your partition. The name must consist of alphanumeric characters and can be no longer than 64 characters.

  6. In the Max size field, enter the maximum amount of disk space to which this partition can grow. The largest size you should use is 4GB.

  7. In the Min avail field, enter the amount of disk space that you would like to always have free.

  8. In the Lo Section field, enter the index of the first section that will be in the partition you are configuring. If this is the only partition in your cache, this number should be zero.

  9. In the Hi Section field, enter the index of the last section in the partition you are configuring. For example, if the partition is to hold six sections and the Lo Section number is zero, the Hi Section number is five. Remember that in a cache, each section must appear in one and only one partition.

  10. If you would like to have directories, or subsections, in the sections in your cache, use the Directories per Section field to enter the number of directories in each section.

  11. In the Free Space Margin field, enter the amount of disk space over the minimum available amount that triggers garbage collection in your cache. Creating a free space margin allows garbage collection to begin before a lack of disk space requires that new caching requests be denied.

  12. In the Maximum Size Trigger field, enter the percentage of the maximum size that will trigger garbage collection. In other words, enter the percentage of the maximum cache size that your cache can occupy before garbage collection occurs.

  13. In the Garbage Collection field, enter the amount of disk space (in terms of percentage of the current size) that you would like the garbage collector to recover each time it runs.

  14. Click OK.

Repeat these steps for all cache partitions that you want to create.
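
Before entering Lo Section and Hi Section values for several partitions, it can help to check on paper that every section index is covered exactly once. The following sketch performs that check; the function name and the partition names are hypothetical, and the sketch does not read or write any server configuration.

```python
from typing import Dict, List, Tuple


def check_section_assignment(partitions: Dict[str, Tuple[int, int]],
                             total_sections: int) -> List[str]:
    """Return a list of problems; an empty list means each section is in exactly one partition."""
    problems: List[str] = []
    owners: Dict[int, str] = {}
    for name, (lo, hi) in partitions.items():
        # Lo and Hi are inclusive section indexes in the range 0 to total_sections - 1.
        if lo > hi or lo < 0 or hi >= total_sections:
            problems.append(f"{name}: range {lo}-{hi} is outside 0-{total_sections - 1}")
            continue
        for index in range(lo, hi + 1):
            if index in owners:
                problems.append(f"section {index} is in both {owners[index]} and {name}")
            else:
                owners[index] = name
    for index in range(total_sections):
        if index not in owners:
            problems.append(f"section {index} is not assigned to any partition")
    return problems
```

For an 8-section cache split as 0-3, 4-5, and 6-7 across three partitions, the function returns an empty list. A split of 0-4 and 4-7 reports that section 4 is assigned twice.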



Note If you want to disable a partition, select the Disable checkbox in the partition table for that partition.



Once you have created a new cache, you will want to configure it. There are two forms to use for cache configuration, the Cache Specifics form and the Cache Configuration form. The Cache Specifics form allows you to configure global caching procedures, and the Cache Configuration form allows you to control caching procedures for specific URLs and resources. For more information on using the Cache Specifics form, see "Setting Cache Specifics", and for more information on using the Cache Configuration form, see "Configuring the Cache".



Restructuring the Cache



Making certain changes to your cache requires that you restructure your cache. These changes include:

    • Adding partitions

    • Adding sections

    • Dramatically increasing the size of your cache



      Note Changing the size of your cache is not a cache-restructuring operation unless the size increase is very large. If the size increase is large, you should add sections to your cache so that you can keep the number of files per directory within a reasonable limit for performance.



You can restructure your cache using the Cache Partition Configuration form. The Cache Partition Configuration form consists of several items pertaining to the structure of your cache. These items include:

Location: the directory where you will be setting aside disk space for this partition

Name: the name of your partition; must consist of alphanumeric characters and can be no longer than 64 characters

Status: the status of the partition; can be enabled (open) or disabled (closed)

Max size: the amount of disk space that you have set aside for partition growth

Size: the actual amount of disk space that your partition occupies (You cannot modify this value.)

Min avail: the amount of disk space that you would like always to have free

Avail: the actual amount of disk space that is available in your partition (You cannot modify this value.)

Lo Section: the index of the first section in a partition

Hi Section: the index of the last section in a partition - section numbers cannot overlap across partitions

Directories per Section: the number of directories, or subsections, in each section

Free Space Margin: the amount of disk space over the minimum available amount that triggers garbage collection in your cache. Creating a free space margin allows garbage collection to begin before a lack of disk space requires that new caching requests be denied.

Maximum Size Trigger: the percentage of the maximum size that will trigger garbage collection. In other words, it is the percentage of the maximum cache size that your cache can occupy before garbage collection occurs.

Garbage Collection: the amount of disk space that you would like the garbage collector to recover each time it runs

Cache Sections: the number of sections in your cache.
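
The way the Min avail, Free Space Margin, Maximum Size Trigger, and Garbage Collection values interact can be sketched as below. This is one reading of the field descriptions above, not the Cache Manager's actual logic; the Partition class, the function names, and the use of megabytes as the unit are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    max_size_mb: float           # Max size
    size_mb: float               # Size (current)
    min_avail_mb: float          # Min avail
    avail_mb: float              # Avail (current)
    free_space_margin_mb: float  # Free Space Margin
    max_size_trigger_pct: float  # Maximum Size Trigger
    gc_recover_pct: float        # Garbage Collection


def gc_needed(p: Partition) -> bool:
    """True if either trigger described above is reached (assumed interpretation)."""
    # Free space has dropped into the margin above the minimum available amount.
    low_space = p.avail_mb < p.min_avail_mb + p.free_space_margin_mb
    # The partition occupies the trigger percentage of its maximum size.
    too_big = p.size_mb >= p.max_size_mb * p.max_size_trigger_pct / 100
    return low_space or too_big


def gc_target_mb(p: Partition) -> float:
    """Amount to recover in one run, as a percentage of the current size."""
    return p.size_mb * p.gc_recover_pct / 100
```

With a 1000 MB maximum and a 90 percent trigger, for example, a partition that has grown to 950 MB would start garbage collection, and a 10 percent Garbage Collection value would aim to recover 95 MB.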

To restructure your cache,

  1. In the Server Manager, choose Caching|Partitions. The Cache Partition Table appears.

  2. Click the name of the partition that you would like to restructure. The Cache Partition Configuration form appears.

  3. Change the values in the table accordingly.

  4. Click OK.

  5. Restart the proxy.


Warning!
Changing the cache structure after installation requires reformatting the structure and relocating existing files, so restructuring can be time consuming. While restructuring, make sure that every section in your cache is assigned to one and only one partition.



Setting Cache Specifics



You can enable caching and control which types of protocols your proxy server will cache by setting the cache specifics. Cache specifics include the following items:

  • Whether your cache is enabled or disabled

  • What types of protocols will be cached

  • When to refresh a cached document

  • Whether the proxy should track the number of times a document is accessed and report it back to the remote server

To set cache specifics,

  1. In the Server Manager, choose Caching|Specifics. The Cache Specifics form appears.

  2. Change the information.

  3. Click OK.

The following sections describe the items listed on the Cache Specifics form. These sections include information that will help you to determine which settings will best suit your needs.


Enabling the Cache

Caching is an effective way to reduce network traffic for users of the proxy server. Caching also offers a faster response time for clients by eliminating the need to retrieve a document from a remote server. Your proxy server will function most effectively whenever caching is enabled.

You can enable the cache on the Cache Specifics form.


Caching HTTP Documents

Internally, caching HTTP documents differs from caching FTP documents. HTTP documents offer caching features that documents of the other protocols do not. All HTTP documents have a descriptive header section that the proxy server uses to compare and evaluate the document in the proxy cache and the document on the remote server. When the proxy does an up-to-date check on an HTTP document, the proxy sends one request to the server that tells the server to return the document if the version in the cache is out of date. Often, the document hasn't changed since the last request and therefore is not transferred. This method of checking to see if an HTTP document is up-to-date saves bandwidth and decreases latency.
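
In HTTP terms, the single-request up-to-date check is a conditional GET. The sketch below only builds such a request with the Python standard library so the header can be inspected; it does not send anything, and the function name build_up_to_date_check is hypothetical. It assumes the proxy remembers a modification timestamp for the cached copy.

```python
from email.utils import formatdate
from urllib.request import Request


def build_up_to_date_check(url: str, cached_last_modified: float) -> Request:
    """Build a conditional GET that asks for the document only if it has changed."""
    # HTTP dates are expressed in GMT, for example "Thu, 01 Jan 1970 00:00:00 GMT".
    http_date = formatdate(cached_last_modified, usegmt=True)
    return Request(url, headers={"If-Modified-Since": http_date}, method="GET")
```

A server that supports conditional requests answers such a request with status 304 (Not Modified) and no body when the document is unchanged, which is why the check costs little bandwidth.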

To reduce transactions with remote servers, the proxy server allows you to set a Cache Expiration setting for HTTP documents. The Cache Expiration setting tells the proxy to estimate if the HTTP document needs an up-to-date check before sending the request to the server. The proxy makes this estimate based on the HTTP document's Last-Modified date found in the header.

With HTTP documents, you can also use a Cache Refresh setting. This option specifies whether the proxy always does an up-to-date check (which would override an Expiration setting) or if the proxy waits a specific period of time before doing a check. Table 10-1 shows what the proxy does if both an Expiration setting and a Refresh setting are specified. Using the Refresh setting decreases latency and saves bandwidth considerably.


Table 10-1    Using the Cache Expiration and Cache Refresh settings with HTTP

Refresh setting                 Expiration setting                              Results

Always do an up-to-date check   (Not applicable)                                Always do an up-to-date check

User-specified interval         Use document's "expires" header                 Do an up-to-date check if interval expired

User-specified interval         Estimate with document's Last-Modified header   Smaller value [1] of the estimate and expires header

[1] Using the smaller value guards against getting stale data from the cache for documents that change frequently.


Setting the HTTP Cache Refresh Interval

If you decide that you want your proxy server to cache HTTP documents, you need to determine whether it should always do an up-to-date check for documents in the cache or if it should check based on a Cache Refresh setting (up-to-date check interval). For HTTP documents, a reasonable refresh interval would be four to eight hours, for example. The longer the refresh interval, the fewer the number of times the proxy connects with remote servers. Even though the proxy doesn't do up-to-date checking during the refresh interval, users can force a refresh by clicking the Reload button in the client (such as Netscape Navigator); this action makes the proxy force an up-to-date check with the remote server.

You can set the refresh interval for HTTP documents on either the Cache Specifics form or the Cache Configuration form. The Cache Specifics form allows you to configure global caching procedures, and the Cache Configuration form allows you to control caching procedures for specific URLs and resources. For more information on using the Cache Specifics form, see Setting Cache Specifics, and for more information on using the Cache Configuration form, see Configuring the Cache.


Setting the HTTP Cache Expiration Policy

You can also set up your server to check if the cached document is up-to-date by using a last-modified factor or explicit expiration information only.

Explicit expiration information is a header found in some HTTP documents that specifies the date and time when that file will become outdated. Not many HTTP documents use explicit Expires headers, so it's better to estimate based on the Last-modified header.

If you decide to have your HTTP documents refresh or expire based upon the Last-modified header, you need to select a fraction to use in the expiration estimation. This fraction, known as the LM factor, is multiplied by the interval between the last modification and the time that the last up-to-date check was performed on the document. The resulting number estimates how long the document stays fresh, and it is compared with the time since the last up-to-date check. If the time since the last check is smaller than the resulting number, the document is not expired. Smaller fractions make the proxy check documents more often. For example, suppose you have a document that was last changed ten days ago. If you set the last-modified factor to 0.1, the proxy interprets the factor to mean that the document is probably going to remain unchanged for one day (10 * 0.1 = 1). The proxy would, in that case, return the document from the cache if the document was checked less than a day ago.

In this same example, if the cache refresh setting for HTTP documents is set to less than one day, the proxy does the up-to-date check more than once a day. The proxy always uses the value (cache refresh or cache expiration) that requires that it update the files more frequently.
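
The arithmetic in this example can be written out as a short sketch. It follows the description in this section (LM factor times the interval between last modification and last check, combined with the refresh interval by taking the smaller value), but the function name and parameters are hypothetical and the proxy's real decision logic may consider more than this, such as an Expires header.

```python
from typing import Optional

DAY = 86400.0  # seconds in one day


def needs_up_to_date_check(now: float, last_modified: float, last_checked: float,
                           lm_factor: float,
                           refresh_interval: Optional[float] = None) -> bool:
    """Return True if the cached copy should be checked against the remote server."""
    # Estimated freshness period: LM factor times (last check - last modification).
    estimate = lm_factor * (last_checked - last_modified)
    # The setting that requires more frequent updates wins, so take the smaller value.
    limit = estimate if refresh_interval is None else min(estimate, refresh_interval)
    return (now - last_checked) >= limit
```

For a document modified ten days before its last check and an LM factor of 0.1, the estimate is one day: a copy checked 12 hours ago is served from the cache, while one checked 36 hours ago triggers a check. Adding a 6-hour refresh interval makes the 12-hour-old copy due for a check as well.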

You can set the expiration setting for HTTP documents on either the Cache Specifics form or the Cache Configuration form. The Cache Specifics form allows you to configure global caching procedures, and the Cache Configuration form allows you to control caching procedures for specific URLs and resources. For more information on using the Cache Specifics form, see Setting Cache Specifics, and for more information on using the Cache Configuration form, see Configuring the Cache.


Caching FTP and Gopher Documents

FTP and Gopher do not include a method for checking to see if a document is up-to-date. Therefore, the only way to optimize caching for FTP and Gopher documents is to set a Cache Refresh interval. The Cache Refresh interval is the amount of time the proxy server waits before retrieving the latest version of the document from the remote server. If you do not set a Cache Refresh interval, the proxy will retrieve these documents even if the versions in the cache are up-to-date.


Setting FTP and Gopher Cache Refresh Intervals

If you are setting a cache refresh interval for FTP and Gopher, choose one that you consider safe for the documents the proxy gets. For example, if you store information that rarely changes, use a high number (several days). If the data changes constantly, you'll want the files to be retrieved at least every few hours. During the refresh time, you risk sending an out-of-date file to the client. If the interval is short enough (a few hours), you eliminate most of this risk while getting noticeably faster response time.

You can set the cache refresh interval for FTP and Gopher documents on either the Cache Specifics form or the Cache Configuration form. The Cache Specifics form allows you to configure global caching procedures, and the Cache Configuration form allows you to control caching procedures for specific URLs and resources. For more information on using the Cache Specifics form, see Setting Cache Specifics, and for more information on using the Cache Configuration form, see Configuring the Cache.



Note If your FTP and Gopher documents vary widely (some change often, others rarely), use the Cache Configuration form to create a separate template for each kind of document (for example, create a template with resources ftp://.*.gif) and then set a refresh interval that is appropriate for that resource.





Configuring the Cache



You can configure the kind of caching you want for specific resources, using the Caching Configuration form. You can specify several configuration parameter values for URLs matching the regular expression pattern that you specify. This feature gives you fine control of the proxy cache, based on the type of document cached.

Configuring the cache can include identifying the following items:

  • The cache default

  • How to cache pages that require authentication

  • How to cache queries

  • The minimum and maximum cache file sizes

  • When to refresh a cached document

  • The cache expiration policy

  • The caching behavior for client interruptions



    Note If you set the cache default for a particular resource to either Derived configuration or Don't cache, the cache configuration options will not appear on the Caching Configuration form. However, if you choose a cache default of Cache for a resource, you can specify several other configuration items.



To configure the cache,

  1. In the Server Manager, choose Caching|Configuration. The Caching Configuration form appears.

  2. Select the resource you are editing by either choosing it from the Editing pull-down menu or by clicking the Regular Expression button, entering a regular expression, and clicking OK. For more information on regular expressions, see Understanding Regular Expressions.

  3. Change the configuration information.

  4. Click OK.

The following sections describe the items listed on the Caching Configuration form. These sections include information that will help you to determine which configuration will best suit your needs.


Setting the Cache Default

The proxy server allows you to identify a cache default for specific resources. A resource is a type of file that matches certain criteria that you specify. For instance, you may want your server to automatically cache all documents from the domain company.com. If so, click the Regular Expression button on the top of the Configuration form and, in the field that appears, enter

[a-z]*://[^/:]*\.company\.com.*

Then click the Cache radio button. Your server automatically caches all cacheable documents from that domain. For more information on regular expressions, see Understanding Regular Expressions.
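
To see which URLs such a pattern selects, it can be tried out in a small script. The sketch below uses Python's re module with the pattern written as [a-z]*://[^/:]*\.company\.com.* and assumes that the proxy matches the pattern against the whole URL; the proxy's regular expression dialect may differ from Python's in details, so treat the results as indicative only.

```python
import re

# Protocol, "://", a host name ending in .company.com, then anything (port, path).
PATTERN = re.compile(r"[a-z]*://[^/:]*\.company\.com.*")


def matches_resource(url: str) -> bool:
    """True if the whole URL matches the company.com resource pattern."""
    return PATTERN.fullmatch(url) is not None
```

Note that the pattern requires a period before "company", so http://www.company.com/ matches but a bare http://company.com/ does not.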



Note If you set the cache default for a particular resource to either Derived configuration or Don't cache, it is not necessary to configure the cache for that resource. However, if you choose a cache default of Cache for a resource, you can specify several other configuration items. For a list of these items, see Configuring the Cache.



You can set the cache default for any resource on the Cache Configuration form. The cache default for HTTP and FTP can also be set on the Cache Specifics form.


Caching Pages that Require Authentication

You can have your server cache files that require user authentication. If you choose to have your proxy server cache these files, it tags the files in the cache so that if a user asks for them, it knows that the files require authentication from the remote server.

Because the proxy server does not know how remote servers authenticate and it does not know users' IDs or passwords, it simply forces an up-to-date check with the remote server each time a request is made for a document that requires authentication. The user therefore must enter an ID and password to gain access to the file. If the user has already accessed that server earlier in the Navigator session, Navigator automatically sends the authentication information without prompting the user for it.

If you do not enable the caching of pages that require authentication, the proxy assumes the default, which is to not cache them.

You can set the policy for caching pages that require authentication on the Cache Configuration form.


Caching Queries


Queries can be cached only for HTTP documents. You can limit the length of queries that are cached, or you can completely inhibit caching of queries. The longer the query, the less likely it is to be repeated, and the less useful it is to cache.

These caching restrictions apply for queries: the access method has to be GET and the response must have a Last-modified header. This requires the query engine to indicate that the query result document can be cached. If the Last-modified header is present, the query engine should support a conditional GET method (with an If-modified-since header) in order to make caching effective; otherwise it should return an Expires header.
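
The restrictions above can be summarized as a single check. This sketch is a simplified reading of the rules, not the proxy's code: the function name is hypothetical, a maximum query length of None stands for "query caching not enabled", and the Expires alternative mentioned above is not modeled.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def query_is_cacheable(method: str, url: str,
                       response_headers: Mapping[str, str],
                       max_query_length: Optional[int]) -> bool:
    """Apply the query caching restrictions to one request/response pair."""
    if max_query_length is None:
        return False  # caching of queries is not enabled (the default)
    if method.upper() != "GET":
        return False  # the access method has to be GET
    if len(urlsplit(url).query) > max_query_length:
        return False  # query is longer than the configured limit
    # Header names are case-insensitive in HTTP.
    names = {name.lower() for name in response_headers}
    return "last-modified" in names
```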

If you do not enable the caching of queries, the proxy assumes the default, which is to not cache them.

You can set the query cache policy on the Cache Configuration form.


Setting the Minimum and Maximum Cache File Sizes

You can set the minimum and maximum sizes for files that will be cached by your proxy server. You may want to set a minimum size if you have a fast network connection. If your connection is fast, small files may be retrieved so quickly that it is not necessary for the server to cache them. In this instance, you would want to cache only larger files. You may want to set a maximum file size to make sure that large files do not occupy too much of your proxy's disk space.

You can set the minimum and maximum cache file sizes on the Cache Configuration form.


Setting the Cache Behavior for Client Aborts

If a document is only partly retrieved and the client aborts the data transfer, the proxy has the ability to finish retrieving the document for the purpose of caching it. The proxy's default is to finish retrieving a document for caching if at least 25 percent of it has already been retrieved. Otherwise, the proxy terminates the remote server connection and removes the partial file. You can raise or lower the client interruption percentage on the Cache Configuration form.
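
The client interruption rule amounts to a percentage comparison, sketched below. The function name is hypothetical, and treating a document of unknown length as "do not finish" is an assumption; the guide does not say how that case is handled.

```python
def finish_after_client_abort(bytes_received: int, content_length: int,
                              threshold_pct: float = 25.0) -> bool:
    """Decide whether to keep retrieving a document after the client aborts."""
    if content_length <= 0:
        # Assumption: without a known length, the percentage cannot be computed.
        return False
    return 100.0 * bytes_received / content_length >= threshold_pct
```

With the default of 25 percent, a transfer that stops after 250 of 1000 bytes is completed for the cache, while one that stops after 249 bytes is dropped.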



Caching Local Hosts



If a URL requested from a local host lacks a domain name, the proxy server will not cache it in order to avoid duplicate caching. For example, if a user requests http://machine/filename.html and http://machine.iplanet.com/filename.html from a local server, both URLs might appear in the cache. Because these files are from a local server, they may be retrieved so quickly that it is not necessary to cache them anyway.

However, if your company has servers in many remote locations, you may want to cache documents from all hosts to reduce network traffic and decrease the time needed to access the files.
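
Whether a URL falls under this rule depends only on its host part. As an illustration, the check could look like the sketch below, which treats any host name without a period as lacking a domain name; the function name is hypothetical and the proxy's own test may be more elaborate.

```python
from urllib.parse import urlsplit


def lacks_domain_name(url: str) -> bool:
    """True if the host in the URL is a bare machine name without a domain."""
    host = urlsplit(url).hostname or ""
    return "." not in host
```

Applied to the example above, http://machine/filename.html is reported as lacking a domain name, while http://machine.iplanet.com/filename.html is not.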

To enable the caching of local hosts,

  1. In the Server Manager, choose Caching|Cache Local Hosts.

  2. Select the resource you are editing by either choosing it from the Editing pull-down menu or by clicking the Regular Expression button and entering the name of the resource to edit. For more information on regular expressions, see Understanding Regular Expressions.

  3. Click the enabled button.

  4. Click OK.



Using Cache Batch Updates

The Cache Batch Update feature allows you to pre-load files in a specified web site or do an up-to-date check on documents already in the cache whenever the proxy server is not busy. From the Cache Batch Updates form, you can create, edit, and delete batches of URLs and enable and disable batch updating.


Creating a Batch Update

You can actively (as opposed to on-demand) cache files by specifying files to be batch updated. The proxy server allows you to perform an up-to-date check on several files currently in the cache or pre-load multiple files in a particular web site.

To create a batch update,

  1. In the Server Manager, choose Caching|Batch Updates.

    The Cache Batch Updates form appears.

  2. Select New and Create from the pull-down menus next to Select a configuration to edit.

  3. Click OK.

    A new Cache Batch Update form appears.

  4. In the Name section of the form, enter a name for the new batch update entry.

  5. In the Source section of the form, click the radio button for the type of batch update that you want to create. Click the first radio button if you want to perform an up-to-date check on all documents in the cache. Click the second radio button if you want to cache URLs recursively starting from the given source URL.

  6. In the Source section fields, identify the documents that you want to use in the batch update.

  7. In the Exceptions section, identify any files that you would like to exclude from the batch update.

  8. In the Resources section, enter the maximum number of simultaneous connections and the maximum number of documents to traverse.

  9. In the Timing section, enter the start and end times for the generation of the batch update. Only one batch update can be active at any time, so it is best to not overlap other batch update configurations.

  10. Click OK.



    Note You can create, edit, and delete batch update configurations without having batch updates turned on. However, if you want your batch updates to run at the times you set on the Cache Batch Updates form, you must turn updates on.




Editing or Deleting a Batch Update Configuration

You can edit or delete batch updates using the Cache Batch Updates form. You may want to edit a batch update if you need to exclude certain files or want to update the batch more frequently. You may also want to delete a batch update configuration completely.

To edit or delete a batch update configuration,

  1. In the Server Manager, choose Caching|Batch Updates. The Cache Batch Updates form appears.

  2. If you want to edit a batch, select the name of that batch and "Edit" from the pull-down menus next to Select a configuration to edit. If you want to delete a batch, select the name of that batch and "Delete" from the pull-down menus.

  3. Click OK. The Cache Batch Updates form appears.

  4. Modify the information as you wish.

  5. Click OK.



Accessing Cache Manager Information

You can view the names and attributes of all recorded cached URLs through the Cache Manager information. Cache Manager information is a list of cached documents grouped by access protocol and site name. You can limit the URLs you view in the list by typing a domain name into the Search field. By accessing this information, you can perform various cache management functions such as expiring and removing documents from the cache.

To access Cache Manager information,

  1. In the Server Manager, choose Caching|Cache Management.

  2. Click the Regenerate button to generate a current list of cached URLs.

  3. If you would like to view Cache Manager information for a specific URL, enter a URL or regular expression in the Search field and click the Search button.

  4. If you would like to view Cache Manager information grouped by domain name and host, select a domain name from the list. A list of hosts in that domain appears. Click on the name of a host and a list of URLs appears.

  5. Click on the name of a URL. Detailed information about that URL appears.


Expiring and Removing Files from the Cache

From the Cache Management form you can expire and remove documents from the cache.

To expire or remove files,

  1. In the Server Manager, choose Caching|Cache Management.

  2. Click the Regenerate button to generate a current list of cached URLs.

  3. If you know of a specific URL that you would like to expire or remove, enter that URL or a regular expression that matches that URL in the Search field and click the Search button.

    If you would like to work with URLs grouped by domain name and host, select a domain name from the list. A list of hosts in that domain appears. Click on the name of a host and a list of URLs appears.

  4. To expire individual files, select the Ex radio button next to the URLs for those files and click the Exp/Rem Marked button on the bottom of the form. To expire all of the files in the list, click the Exp All button on the bottom of the form.

    To remove individual files from the cache, select the Rm radio button next to the URLs for those files and click the Exp/Rem Marked button on the bottom of the form. To remove all of the files in the list, click the Rem All button on the bottom of the form.

  5. Click the Regenerate button at the top of the form to regenerate the URL list.


Warning!
Generating a URL database may take a long time. It is possible that Navigator will time out while this utility is running. While the proxy server is generating a URL database, any attempt to run a second instance of the database utility will fail.

You can configure the URL database generation utility to run at regular intervals when the system load is low. To do so, add this line to the urldbgen file in the iplanet\server\bin\proxy directory:

[-p param-list] proxy-id

Unless otherwise noted in the param-list, the output files contain all parameters:

  • content type

  • content length

  • times accessed

  • last accessed time

  • last modified time

  • expiration time

  • last checked time

  • transfer duration

The database generation utility outputs four files and places them in the iplanet\server\proxy-id\urldb directory. All fields in the output files are separated by tabs with one entry per line. The four output files are:

domainlist - A list of domains and the number of sites in each domain

sitelist - A list of sites, the number of URLs in each site, and the total amount of space in the cache for each site.

urllist - A list of URLs and specified parameters.

urldbinfo - A list of meta information about the urldb.
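
Because the output files are tab-separated with one entry per line, they can be post-processed with a short script. The following is a minimal sketch in Python, assuming that each domainlist line holds the domain name followed by the number of sites. The column order, the sample data, and the function name parse_domainlist are illustrative assumptions, not documented behavior of the utility.

```python
import csv
import io


def parse_domainlist(text):
    """Parse tab-separated domainlist text into {domain: site_count}.

    Assumes two columns per line: domain name, then number of sites.
    The column order is an assumption; check it against your own files.
    """
    counts = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        counts[row[0]] = int(row[1])
    return counts


# Hypothetical sample data, not real utility output.
sample = "example.com\t3\nexample.org\t1\n"
print(parse_domainlist(sample))
```

The same approach should carry over to sitelist and urllist once you have confirmed which columns your param-list produces.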



Routing through Proxy Arrays



iPlanet proxy arrays for distributed caching allow multiple proxies to serve as a single cache. In other words, each proxy in the array will contain different cached URLs that can be retrieved by a browser or downstream proxy server. Proxy arrays prevent the duplication of caches that often occurs with multiple proxy servers. Through hash-based routing, proxy arrays route requests to the correct cache in the proxy array.

Proxy arrays also allow incremental scalability. In other words, if you decide to add another proxy to your proxy array, each member's cache is not invalidated. Only 1/n of the URLs in each member's cache, where n is the number of proxies in your array, will be reassigned to other members.

For each request through a proxy array, a hash function assigns each proxy in the array a score that is based on the requested URL, the proxy's name and the proxy's load factor. The request is then routed to the proxy with the highest score.
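
As a rough illustration of this scoring scheme, the sketch below scores every member for a URL and routes to the highest score. It is not the hash function that iPlanet Web Proxy Server actually uses. The SHA-256 digest, the simple multiplication by load factor, the member names, and the function names score and route are all assumptions chosen for illustration.

```python
import hashlib


def score(url, name, load_factor):
    """Combine the URL and the member name into a score, scaled by load factor.

    Illustrative only: the real product uses its own hash function.
    """
    digest = hashlib.sha256(f"{name}|{url}".encode("utf-8")).digest()
    base = int.from_bytes(digest[:8], "big")
    return base * load_factor


def route(url, members):
    """Return the name of the member with the highest score for this URL.

    members maps each (hypothetical) proxy name to its load factor.
    """
    return max(members, key=lambda name: score(url, name, members[name]))


members = {"proxy1": 1.0, "proxy2": 1.0, "proxy3": 1.0}
urls = [f"http://example.com/doc{i}" for i in range(200)]
before = {u: route(u, members) for u in urls}

# Add a fourth member and see which URLs change owner.
grown = dict(members, proxy4=1.0)
after = {u: route(u, grown) for u in urls}
moved = [u for u in urls if before[u] != after[u]]
print(len(moved), "of", len(urls), "URLs moved, all to the new member:",
      all(after[u] == "proxy4" for u in moved))
```

Because an existing member's score for a URL does not depend on which other members are in the array, adding a member only moves the URLs for which the new member scores highest. Every other URL stays with its current member, which is why the caches of existing members remain largely valid.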

Since requests for URLs can come from both clients and proxies, there are two types of routing through proxy arrays: client to proxy routing and proxy to proxy routing.

In client to proxy routing, the client uses the Proxy Auto Configuration (PAC) mechanism to determine which proxy to go through. However, instead of using the standard PAC file, the client uses a special PAC file which computes the hash algorithm to determine the appropriate route for the requested URL. Figure 10-3 shows client to proxy routing. For more information about the PAC file, see Chapter 12, "Using the Client Autoconfiguration File." The proxy server can automatically generate the special PAC file from the Proxy Array Membership Table (PAT) specifications made through the administration interface.

In proxy to proxy routing, proxies use a PAT (Proxy Array Membership Table) file to compute the hash algorithm instead of the PAC file used by clients. The PAT file is an ASCII file that contains information about a proxy array, including the proxies' machine names, IP addresses, ports, load factors, and cache sizes. For computing the hash algorithm at the server, it is much more efficient to use a PAT file than a PAC file (which is a JavaScript file that has to be interpreted at run-time). However, most clients do not recognize the PAT file format and therefore must use a PAC file. Figure 10-4 shows proxy to proxy routing.

The PAT file will be created on one proxy in the proxy array - the master proxy. The proxy administrator must determine which proxy will be the master proxy. The administrator can change the PAT file from this master proxy server and all other members of the proxy array can then manually or automatically poll the master proxy for these changes. You can configure each member to automatically generate a PAC file from these changes.

You can also chain proxy arrays together for hierarchical routing. If a proxy server routes an incoming request through an upstream proxy array, the upstream proxy array is then known as a parent array. A parent array is a proxy array that a proxy server goes through. In other words, if a client requests a document from Proxy X, and Proxy X does not have the document, it sends the request to Proxy Array Y instead of sending it directly to the remote server. So, Proxy Array Y is a parent array. In Figure 10-4, Proxy Array 1 is a parent array to Proxy Array 2.

All of the proxy servers in a proxy array should be in a single administrative domain. Two proxy arrays in separate administrative domains can communicate; however, if the requesting proxy can retrieve cached URLs from more than one proxy array, ICP should be used to determine which array to go to.

Figure 10-3    Client to Proxy Routing


Figure 10-4    Proxy to Proxy Routing

To set up a proxy array,

  1. From the master proxy, create the member list. For more information on creating the member list, see Creating a Proxy Array Member List.

  2. From the master proxy, create a PAT mapping to map the URL "/pat" to the PAT file. For information on creating a PAT mapping, see Creating a URL Mapping.

  3. Configure each non-master member of the array. For more information on configuring non-master members, see Configuring Proxy Array Members.

  4. Enable routing through a proxy array. For more information on enabling routing through a proxy array, see Enabling Routing through a Proxy Array.

  5. Enable your proxy array. For more information on enabling a proxy array, see Enabling a Proxy Array.

  6. Generate a PAC file from your PAT file. You only need to generate a PAC file if you are using client to proxy routing. For more information on generating a PAC file from a PAT file, see Generating a PAC File from a PAT File.



    Note If your proxy array is going to route through a parent array, you also need to enable the parent array and configure each member to route through a parent array for desired URLs. For more information on parent arrays, see Routing Through a Parent Array.




Creating a Proxy Array Member List

You should create and update the proxy array member list from the master proxy of the array only. You only need to create the proxy array member list once, but you can modify it at any time. By creating the proxy array member list, you are generating the PAT file to be distributed to all of the proxies in the array and to any downstream proxies.


Warning!
You should only make changes or additions to the proxy array member list through the master proxy in the array. All other members of the array can only read the member list.

  1. From the Server Manager, choose Caching|Proxy Array Configuration. The Proxy Array Configuration form appears.

  2. In the Array name field, enter the name of the array.

  3. In the "Reload Configuration Every" field, enter the number of minutes between each polling for the PAT file.

  4. Click OK.



    Note Be sure to click OK before you begin to add members to the member list.



  5. Click the Add button. The Proxy Array Member form appears.

  6. For each member in the proxy array, enter the following and then click OK:

    • Name - the name of the proxy server you are adding to the member list

    • IP Address - the IP address of the proxy server you are adding to the member list

    • Port - This is the port on which the member polls for the PAT file.

    • Load Factor - an integer that reflects the relative load that should be routed through the member.

    • Status - the status of the member. This value can be either on or off. If you disable a proxy array member, the member's requests will be re-routed through another member.



      Note Be sure to click OK after you enter the information for each proxy array member you are adding.




Deleting Proxy Array Members

Deleting proxy array members will remove them from the proxy array. You can only delete proxy array members from the master proxy.


Warning!
You should only make changes or additions to the proxy array member list through the master proxy in the array. If you modify this list from any other member of the array, all changes will be lost.

To delete members of a proxy array,

  1. From the Server Manager, choose Caching|Proxy Array Configuration. The Proxy Array Configuration form appears.

  2. In the Member List, select the radio button next to the member that you want to delete.

  3. Click the Delete Button.



    Note If you want your changes to take effect and to be distributed to the members of the proxy array, you need to update the Configuration ID on the Proxy Array Configuration form and click OK. iPlanet suggests that to update the configuration ID, you increase it by one.




Editing Proxy Array Member List Information

At any time, you can change the information for the members in the proxy array member list. You can only edit the proxy array member list from the master proxy.


Warning!
You should only make changes or additions to the proxy array member list through the master proxy in the array. If you modify this list from any other member of the array, all changes will be lost.

To edit member list information for any of the members in a proxy array,

  1. From the Server Manager, choose Caching|Proxy Array Configuration. The Proxy Array Configuration form appears.

  2. In the Member List, select the radio button next to the member that you want to edit.

  3. Click the Edit Button. The Proxy Array Member form appears.

  4. Edit the appropriate information.

  5. Click OK.



    Note If you want your changes to take effect and to be distributed to the members of the proxy array, you need to update the Configuration ID on the Proxy Array Configuration form and click OK. iPlanet suggests that to update the configuration ID, you increase it by one.




Configuring Proxy Array Members

You only need to configure each member in the proxy array once, and you must do so from the member itself. You cannot configure a member of the array from another member. You also need to configure the master proxy.



Note You should follow this process for each member of the array.



  1. From the Server Manager, choose Caching|Member Configuration. The Proxy Array Member Configuration form appears.

  2. In the Proxy Array section, indicate whether or not the member needs to poll for the PAT file by selecting the appropriate radio button. The choices are:

    • Non-master member - You should select this option if the member you are configuring is not the master proxy. Any proxy array member that is not a master proxy will need to poll for the PAT file in order to retrieve it from the master proxy.

    • Master member - You should select this option if you are configuring the master proxy. If you are configuring the master proxy, the PAT file is local and does not need to be polled.

  3. If, in Step 2, you chose Master member (the PAT file is local and is not polled), click OK; you are finished with this form. If you chose Non-master member (the member polls for the PAT file), continue with Step 4.

  4. In the Poll Host field, enter the name of the master proxy that you will be polling for the PAT file.

  5. In the Port field, enter the port at which the master proxy accepts HTTP requests.

  6. In the URL field, enter the URL of the PAT file on the master proxy. If on your master proxy, you have created a PAT mapping to map the PAT file to the URL "/pat", you should enter "/pat" into this URL field.

  7. In the Headers File field, enter the full pathname for a file with any special headers that must be sent with the HTTP request for the PAT file (such as authentication information). This field is optional.

  8. Click OK.
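
The Poll Host, Port, URL, and Headers File fields together describe an ordinary HTTP request for the PAT file. The sketch below shows how those values might combine into such a request. It only builds the request and does not send it. The host name, port 8080, the placeholder credential, and the assumption that the headers file contains one "Name: value" pair per line are all hypothetical; the guide does not document the headers file format.

```python
import urllib.request


def build_pat_request(poll_host, port, url_path, header_lines=()):
    """Build (but do not send) an HTTP request for the PAT file.

    header_lines stands in for the contents of the headers file and is
    assumed to hold one "Name: value" pair per line.
    """
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    url = f"http://{poll_host}:{port}{url_path}"
    return urllib.request.Request(url, headers=headers)


# Hypothetical values for the Poll Host, Port, and URL fields.
req = build_pat_request(
    "master.example.com", 8080, "/pat",
    ["Proxy-Authorization: Basic PLACEHOLDER"],
)
print(req.full_url)
```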


Enabling Routing through a Proxy Array

To enable routing through a proxy array,

  1. From the Server Manager, choose Routing|Routing. The Routing Configuration form appears.

  2. Select the resource you want to route by either choosing it from the Editing pull-down menu or clicking the Regular Expression button, entering a regular expression, and clicking OK.

  3. Select the radio button next to the text "Route through".

  4. Select the checkboxes for proxy array and/or parent array.



    Note You can only enable proxy array routing if the proxy server you are configuring is a member of a proxy array. You can only enable parent routing if a parent array exists. Both routing options are independent of each other.



  5. If you choose to route through a proxy array and you want to redirect requests to another URL, select the redirect checkbox. Redirecting means that if a member of a proxy array receives a request that it should not service, it tells the client which proxy to contact for that request.


Warning
Redirect is not currently supported by any clients, so you should not use the feature at this time.

  6. Click OK.


Enabling a Proxy Array

To enable a proxy array,

  1. From the Server Manager, choose Server Preferences|System Specifics. The System Specifics form appears.

  2. Select the Yes radio button for the type of array(s) you want to enable - either a normal proxy array or a parent array.



    Note If you are not routing through a proxy array, you should make sure that all clients use a special PAC file to route correctly before you disable the proxy array option. If you disable the parent array option, you should have valid alternative routing options set in the Routing form, such as explicit proxy or a direct connection.



  3. Click OK.


Redirecting Requests in a Proxy Array

If you choose to route through a proxy array, you need to designate whether you want to redirect requests to another URL. Redirecting means that if a member of a proxy array receives a request that it should not service, it tells the client which proxy to contact for that request.


Warning
Redirect is not currently supported by any clients, so you should not use the feature at this time.


Generating a PAC File from a PAT File

Because most clients do not recognize the PAT file format, the clients in client to proxy routing use the Proxy Auto Configuration (PAC) mechanism to receive information about which proxy to go through. However, instead of using the standard PAC file, the client uses a special PAC file derived from the PAT file. This special PAC file computes the hash algorithm to determine the appropriate route for the requested URL.

You can manually or automatically generate a PAC file from the PAT file. If you manually generate the PAC file from a specific member of the proxy array, that member will immediately re-generate the PAC file based on the information currently in the PAT file. If you configure a proxy array member to automatically generate a PAC file, the member will automatically re-generate the file after each time it detects a modified version of the PAT file.



Note If you are not using the proxy array feature for your proxy server, then you should use the Proxy Client Autoconfiguration form to generate your PAC file. For more information see Chapter 12, "Using the Client Autoconfiguration File."




Manually Generating a PAC File from a PAT File

To manually generate a PAC file from a PAT file,



Note The PAC file can only be generated from the master proxy.



  1. From the Server Manager of the master proxy, choose Caching|Proxy Array Configuration.

    The Proxy Array Configuration form appears.

  2. Click the Generate PAC button. The PAC Generation form appears.

  3. If you want to use custom logic in your PAC file, in the Custom Logic File field, enter the name of the file containing the customized logic you would like to include in the generation of your PAC file. This logic is inserted before the proxy array selection logic in the FindProxyForURL function. Custom logic is typically used for local requests that need not go through the proxy array.

    If you have already entered the custom logic file on the Member Configuration form, this field will be populated with that information. You may edit the custom logic filename if you wish, and the changes you make will transfer to the Member Configuration form as well.

  4. In the Default Route field, enter the route a client should take if the proxies in the array are not available.

    If you have already entered the default route on the Member Configuration form, this field will be populated with that information. You may edit the default route if you wish, and the changes you make will transfer to the Member Configuration form as well.

  5. Click OK.


Automatically Generating a PAC File from a PAT File

To automatically generate a PAC file from a PAT file each time a change is detected,

  1. From the Server Manager, choose Caching|Member Configuration. The Member Configuration form appears.

  2. Select the checkbox next to "Auto-generate PAC file".

  3. In the Default Route field, enter the route a client should take if the proxies in the array are not available.

    If you have already entered and saved the default route on the Member Configuration form, this field will be populated with that information. You may edit the default route if you wish, and the changes you make will transfer to the Member Configuration form as well.

  4. If you want to use custom logic in your PAC file, in the Custom Logic File field, enter the name of the file containing the customized logic you would like to include in the generation of your PAC file. This logic is inserted before the proxy array selection logic in the FindProxyForURL function.

    If you have already entered and saved the custom logic file on the Member Configuration form, this field will be populated with that information. You may edit the custom logic filename if you wish, and the changes you make will transfer to the Member Configuration form as well.

  5. Click OK.


Routing Through a Parent Array

You can configure your proxy or proxy array to route through an upstream parent array instead of going directly to a remote server. To configure a proxy or proxy array member to route through a parent array,

  1. Enable the parent array. For instructions on enabling an array, see Enabling a Proxy Array.

  2. Enable routing through the parent array. For instructions on enabling routing through an array, see Enabling Routing through a Proxy Array.

  3. From the Server Manager, choose Caching|Member Configuration. The Proxy Array Member Configuration form appears.

  4. In the Poll Host field in the Parent Array section of the form, enter the host name of the proxy in the parent array that you will poll for the PAT file. This proxy is usually the master proxy of the parent array.

  5. In the Port field in the Parent Array section of the form, enter the Port number of the proxy in the parent array that you will poll for the PAT file.

  6. In the URL field, enter the URL of the PAT file to be polled on the master proxy. If you have created a PAT mapping on the master proxy, enter the mapping into this URL field.

  7. In the Headers File field in the Parent Array section of the form, enter the full pathname for a file with any special headers that must be sent with the HTTP request for the PAT file (such as authentication information). This field is optional.

  8. Click OK.


Viewing Parent Array Information

If your proxy array is routing through a parent array, you will need information about the members of the parent array. This information is sent from the parent array in the form of a PAT file. The information in this PAT file is displayed on the Parent Array Configuration form.

To view parent array information,

  1. From the Server Manager, choose Caching|Parent Array Configuration. The Parent Array Configuration form appears.

  2. View the information.



Routing Through ICP Neighborhoods

The Internet Cache Protocol (ICP) is an object location protocol that enables caches to communicate with one another. Caches can use ICP to send queries and replies about the existence of cached URLs and about the best locations from which to retrieve those URLs. In a typical ICP exchange, one cache will send an ICP query about a particular URL to all neighboring caches. Those caches will then send back ICP replies that indicate whether or not they contain that URL. If they do not contain the URL, they send back a "MISS". If they do contain the URL, they send back a "HIT".

ICP can be used for communication among proxies located in different administrative domains. It allows a proxy cache in one administrative domain to communicate with a proxy cache in another administrative domain. It is effective for situations in which several proxy servers want to communicate but cannot all be configured from one master proxy (as they are in a proxy array). Figure 10-5 shows an ICP exchange between proxies in different administrative domains.

The proxies that communicate with each other via ICP are called neighbors. You cannot have more than 64 neighbors in an ICP neighborhood. There are two types of neighbors in an ICP neighborhood, parents and siblings. Only parents can access the remote server if no other neighbors have the requested URL. Your ICP neighborhood can have no parents or it can have more than one parent. Any neighbor in an ICP neighborhood that is not a parent is considered a sibling. Siblings cannot retrieve documents from remote servers unless the sibling is marked as the default route for ICP, and ICP uses the default.

You can use polling rounds to determine the order in which neighbors receive queries. A polling round is an ICP query cycle. For each neighbor, you must assign a polling round. If you configure all neighbors to be in polling round one, then all neighbors will be queried in one cycle. In other words, they will all be queried at the same time. If you configure some of the neighbors to be in polling round 2, then all of the neighbors in polling round one will be queried first and if none of them return a "HIT", all round two proxies will be queried. The maximum number of polling rounds is two.

Since ICP parents are likely to be network bottlenecks, you can use polling rounds to lighten their load. A common setup is to configure all siblings to be in polling round one and all parents to be in polling round two. That way, when the local proxy requests a URL, the request goes to all of the siblings in the neighborhood first. If none of the siblings have the requested URL, the request goes to the parent. If the parent does not have the URL, it will retrieve it from a remote server.
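
The polling-round behavior described above can be sketched as a small simulation. This is not the proxy server's implementation and it sends no ICP messages. Each neighbor's cache is modeled as a set of URLs, and the neighbor names and the function name query_neighbors are hypothetical.

```python
def query_neighbors(neighbors, url):
    """Return (name, round) of the first neighbor reporting a HIT, or None.

    neighbors: list of dicts with keys 'name', 'round' (1 or 2), and
    'cache' (a set of URLs standing in for that neighbor's cache).
    """
    for polling_round in (1, 2):
        # All neighbors assigned to a round are queried in the same cycle.
        replies = {
            n["name"]: ("HIT" if url in n["cache"] else "MISS")
            for n in neighbors
            if n["round"] == polling_round
        }
        hits = [name for name, reply in replies.items() if reply == "HIT"]
        if hits:
            # A real proxy would use whichever HIT arrives first; this
            # sketch simply takes the first one in list order.
            return hits[0], polling_round
    return None  # no HIT in either round


# Common setup: siblings in round one, the parent in round two.
neighbors = [
    {"name": "sibling-a", "round": 1, "cache": {"http://example.com/a"}},
    {"name": "sibling-b", "round": 1, "cache": set()},
    {"name": "parent-p", "round": 2, "cache": {"http://example.com/b"}},
]
print(query_neighbors(neighbors, "http://example.com/a"))
print(query_neighbors(neighbors, "http://example.com/b"))
print(query_neighbors(neighbors, "http://example.com/c"))
```

In this setup the parent is only queried when every sibling has answered with a "MISS", which is how polling rounds lighten the load on parents.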

Each neighbor in an ICP neighborhood must have at least one ICP server running. If a neighbor does not have an ICP server running, it cannot answer the ICP requests from its neighbors. Enabling ICP on your proxy server starts the ICP server if it is not already running.

Figure 10-5    An ICP exchange


To set up ICP, follow these steps:

  1. Add parent(s) to your ICP neighborhood. (This step is only necessary if you want parents in your ICP neighborhood.) For more information on adding parents to an ICP neighborhood, see Adding Parents to an ICP Neighborhood.

  2. Add sibling(s) to your ICP neighborhood. For more information on adding siblings to your ICP neighborhood, see Adding Siblings to an ICP Neighborhood.

  3. Configure each neighbor in the ICP neighborhood. For more information on configuring ICP neighbors, see Configuring Individual ICP Neighbors.

  4. Enable ICP. For information on enabling ICP, see Enabling ICP.

  5. If your proxy has siblings or parents in its ICP neighborhood, enable routing through an ICP neighborhood. For more information on enabling routing through an ICP neighborhood, see Enabling Routing Through an ICP Neighborhood.


Adding Parents to an ICP Neighborhood

To add parent proxies to an ICP neighborhood,

  1. From the Server Manager, choose Caching|ICP.

    The ICP Configuration form appears.

  2. In the Parent List section of the form, click the Add Parent button.

    The ICP Parent form appears.

  3. In the Machine Address field, enter the IP address or host name of the parent proxy you are adding to the ICP neighborhood.

  4. In the ICP Port field, enter the port number on which the parent proxy will listen for ICP messages.

  5. In the Multicast Address field, you can enter the multicast address to which the parent listens. A multicast address is an IP address to which multiple servers can listen. Using a multicast address allows a proxy to send one query to the network that all neighbors who are listening to that multicast address can see, thereby eliminating the need to send a query to each neighbor separately. Using multicast is optional.



    Note Neighbors in different polling rounds should not listen to the same multicast address.



  6. In the TTL field, enter the number of subnets that the multicast message will be forwarded to. If the TTL is set to 1, the multicast message will only be forwarded to the local subnet. If the TTL is 2, the message will go to all subnets that are one level away, and so on.



    Note Multicast makes it possible for two unrelated neighbors to send ICP messages to each other. Therefore, if you want to prevent unrelated neighbors from receiving ICP messages from the proxies in your ICP neighborhood, you should set a low TTL value in the TTL field.



  7. In the Proxy Port field, enter the port for the proxy server on the parent.

  8. From the Polling Round pull-down, choose the polling round that you want the parent to be in. The default polling round is 1. For more information on polling rounds, see Routing Through ICP Neighborhoods.

  9. Click OK.
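
The TTL value in Step 6 corresponds to the time-to-live carried by outgoing multicast datagrams. As a hedged illustration of the general mechanism, the sketch below sets that value on a UDP socket using the standard IP_MULTICAST_TTL socket option. How the proxy server applies the TTL field internally is not documented here, and the function name make_multicast_sender is hypothetical.

```python
import socket


def make_multicast_sender(ttl):
    """Create a UDP socket whose multicast datagrams carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


# A TTL of 1 keeps multicast queries on the local subnet.
sock = make_multicast_sender(1)
print(sock.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL))
sock.close()
```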


Removing Parents from an ICP Neighborhood

To remove parent proxies from an ICP neighborhood,

  1. From the Server Manager, choose Caching|ICP. The ICP Configuration form appears.

  2. Click the radio button next to the parent you want to remove.

  3. Click the Remove button.


Editing Configurations for Parents in an ICP neighborhood

To edit the machine address, port number, multicast address, time to live value, proxy port number, or polling round value for a parent proxy,

  1. From the Server Manager, choose Caching|ICP.

    The ICP Configuration form appears.

  2. Click the radio button next to the parent you want to edit.

  3. Click the Edit button.

  4. Modify the appropriate information.

  5. Click OK.


Adding Siblings to an ICP Neighborhood

To add sibling proxies to an ICP neighborhood,

  1. From the Server Manager, choose Caching|ICP.

    The ICP Configuration form appears.

  2. In the Sibling List section of the form, click the Add Sibling button.

    The ICP Sibling form appears.

  3. In the Machine Address field, enter the IP address or host name of the sibling proxy you are adding to the ICP neighborhood.

  4. In the Port field, enter the port number on which the sibling proxy will listen for ICP messages.

  5. In the Multicast Address field, enter the multicast address to which the sibling listens. A multicast address is an IP address to which multiple servers can listen. Using a multicast address allows a proxy to send one query to the network that all neighbors who are listening to that multicast address can see, thereby eliminating the need to send a query to each neighbor separately.



    Note Neighbors in different polling rounds should not listen to the same multicast address.



  6. In the TTL field, enter the number of subnets that the multicast message will be forwarded to. If the TTL is set to 1, the multicast message will only be forwarded to the local subnet. If the TTL is 2, the message will go to all subnets that are one level away.



    Note Multicast makes it possible for two unrelated neighbors to send ICP messages to each other. Therefore, if you want to prevent unrelated neighbors from receiving ICP messages from the proxies in your ICP neighborhood, you should set a low TTL value in the TTL field.



  7. In the Proxy Port field, enter the port for the proxy server on the sibling.

  8. From the Polling Round pull-down, choose the polling round that you want the sibling to be in. The default polling round is 1.

    For more information on polling rounds, see Routing Through ICP Neighborhoods.

  9. Click OK.


Removing Siblings from an ICP Neighborhood

To remove sibling proxies from an ICP neighborhood,

  1. From the Server Manager, choose Caching|ICP.

    The ICP Configuration form appears.

  2. Click the radio button next to the sibling you want to remove.

  3. Click the Remove button.


Editing Configurations for Siblings in an ICP Neighborhood

To edit the machine address, port number, multicast address, time to live value, proxy port number, or polling round value for a sibling proxy,

  1. From the Server Manager, choose Caching|ICP.

    The ICP Configuration form appears.

  2. Click the radio button next to the sibling you want to edit.

  3. Click the Edit button.

  4. Modify the appropriate information.

  5. Click OK.


Configuring Individual ICP Neighbors

You will need to configure each neighbor, or local proxy, in your ICP neighborhood. To configure the local proxy server in your ICP neighborhood,

  1. From the Server Manager, choose Caching|ICP.

    The ICP Configuration form appears.

  2. In the Binding Address field, enter the IP address to which the neighbor server will bind.

  3. In the Port field, enter the port number on which the neighbor server will listen for ICP messages.

  4. In the Multicast Address field, enter the multicast address to which the neighbor listens. A multicast address is an IP address to which multiple servers can listen. Using a multicast address allows a proxy to send one query to the network that all neighbors who are listening to that multicast address can see, thereby eliminating the need to send a query to each neighbor separately.

    If both a multicast address and bind address are specified for the neighbor, the neighbor uses the bind address to send replies and uses multicast to listen. If neither a bind address nor a multicast address is specified, the operating system will decide which address to use to send the data.

  5. In the Default Route field, enter the name or IP address of the proxy to which the neighbor should route a request when none of the neighboring proxies respond with a "hit". If you enter the word "origin" into this field, or if you leave it blank, the default route will be to the origin server.



    Note If you choose "first responding parent" from the No Hit Behavior pull-down discussed in Step 7, the route you enter in the Default Route field will have no effect. The proxy only uses this route if you choose the default no hit behavior.



  6. In the second Port field, enter the port number of the default route machine that you entered into the Default Route field.

  7. From the "On no hits, route through:" pull-down, choose the neighbor's behavior when none of the siblings in the ICP neighborhood have the requested URL in their caches. You can choose:

    • first responding parent - the neighbor will retrieve the requested URL through the parent that first responds with a "miss"

    • default - the neighbor will retrieve the requested URL through the machine specified in the Default Route field.

  8. In the Server Count field, enter the number of threads that will service ICP requests.

  9. In the Timeout field, enter the maximum amount of time the neighbor will wait for an ICP response in each round.

  10. Click OK.
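
The interaction between the Default Route field and the "On no hits, route through:" setting can be summarized in a short sketch. This is an illustration of the rules stated in the steps above, not the server's code. The function name choose_no_hit_route and the host names are hypothetical, and the fallback used when no parent replies at all is an assumption, since the guide does not describe that case.

```python
def choose_no_hit_route(behavior, parent_replies, default_route=""):
    """Pick the upstream route when no neighbor returned a "hit".

    behavior: 'first responding parent' or 'default'.
    parent_replies: parent names in the order their "miss" replies arrived.
    default_route: value of the Default Route field; blank or 'origin'
    means the origin server.
    """
    if behavior == "first responding parent":
        if parent_replies:
            return parent_replies[0]
        # Not covered by the guide: assume the origin server when no
        # parent replied. The Default Route field has no effect here.
        return "origin"
    # Default no-hit behavior: use the Default Route field.
    if default_route in ("", "origin"):
        return "origin"
    return default_route


print(choose_no_hit_route("first responding parent",
                          ["parent-2.example.com", "parent-1.example.com"],
                          "fallback.example.com"))
print(choose_no_hit_route("default", [], "fallback.example.com"))
print(choose_no_hit_route("default", [], ""))
```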


Enabling ICP

To enable ICP,

  1. From the Server Manager, choose Server Preferences|System Specifics. The System Specifics form appears.

  2. Select the Yes radio button for ICP.

  3. Click OK.


Enabling Routing Through an ICP Neighborhood

To enable routing through an ICP neighborhood,

  1. From the Server Manager, choose Routing|Routing. The Routing Configuration form appears.

  2. Select the resource you want to route by either choosing it from the Editing pull-down menu or clicking the Regular Expression button, entering a regular expression, and clicking OK.

  3. Select the radio button next to the text "Route through".

  4. Select the checkbox next to ICP.

  5. If you want the client to retrieve a document directly from the ICP neighbor that has the document instead of going through another neighbor to get it, select the checkbox next to the text "redirect".


Warning
Redirect is not currently supported by any clients, so you should not use the feature at this time.

  6. Click OK.



    Note You only need to enable routing through an ICP neighborhood if your proxy has other siblings or parents in the ICP neighborhood. If your proxy is a parent to another proxy and does not have any siblings or parents of its own, then you only need to enable ICP for that proxy. You do not need to enable routing through an ICP neighborhood.




Copyright © 2001 Sun Microsystems, Inc. Some preexisting portions Copyright © 2001 Netscape Communications Corp. All rights reserved.

Last Updated March 28, 2001