THE IMPORTANCE OF CACHING CANNOT BE OVERSTATED. If every portlet had to be freshly generated for each request, portal performance could become unacceptably slow. Portlets should not require heavy or frequent processing. Caching is the functionality that allows the Portal Server to request portlet content, save this content, and return the saved content to users when appropriate. Caching is indexed on the settings sent by the portlet.
Caching on the Portal Server can be set in two ways: programmatically through HTTP headers and/or using the administrative settings in the Portlet Web Service Editor. You should always implement caching programmatically, although the administrator can still choose to override caching through administrative settings.
This page explains how caching should be implemented in portlet development. For a full explanation of HTTP caching, see RFC 2616 (http://www.w3.org/Protocols/rfc2616/rfc2616.html).
Caching is the temporary storage of Web objects (such as HTML documents) for later retrieval. There are three significant advantages to caching:
Reduced bandwidth consumption (fewer requests and responses across the network)
Reduced server load (fewer requests for a server to handle)
Reduced latency (responses to cached requests are available immediately, and are easily accessible to the client being served)
Efficient caching makes every Web application faster and less expensive. The only portlets that should not be cached are those in which data must be continuously updated.
The Portal Server relies on caching to improve performance. Portlet content is cached by the Portal Server and returned when later requests match the cache’s existing settings.
Note: By default, the page ID is not factored in when content is added to the Portal Server cache, so you could get a cached copy of a portlet from page A while on page B. You can choose to add the page ID to the cache key by sending it to the portlet. For details, see Portlet Cache Key below.
When the Portal Server processes a request for a portal page, it looks individually at each portlet on the page and checks it against the cache:
The Portal Server assembles a cache key used to uniquely identify each portlet in the cache.
The Portal Server checks the cache for a matching cache key entry:
If the Portal Server finds a match that is not expired, it returns the content in the cache and does not make a request to the remote server.
If there is no matching cache key for the portlet or if the matching cache entry has expired, the Portal Server makes a request to the remote server. If the matching cache entry uses ETag or Last-Modified caching, the Portal Server also sends the appropriate caching header to the remote server in the request.
The response comes back from the remote server; the Portal Server checks for caching headers:
If the headers include an Expires header, the Portal Server stores the new portlet content (along with a new expiration date) in its cache.
If the headers use Etag or Last-Modified caching, the existing cache entry might be revalidated (in the case of ‘304-Not Modified’) or new portlet content might be stored in the cache.
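The lookup steps above can be sketched in a few lines of Java. This is only an illustrative model, not the Portal Server's implementation: the class, the Entry type, and the RemoteServer interface are hypothetical, and a null return from the remote server stands in for a 304 Not Modified response.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the portlet cache lookup; every name here is illustrative. */
class PortletCacheFlowSketch {
    /** One cached portlet response. */
    static final class Entry {
        final String content;
        final long expiresAtMillis;  // derived from the Expires header
        final String etag;           // validator sent back on revalidation, may be null
        Entry(String content, long expiresAtMillis, String etag) {
            this.content = content;
            this.expiresAtMillis = expiresAtMillis;
            this.etag = etag;
        }
    }

    /** Hypothetical stand-in for the remote server; returns null to signal 304 Not Modified. */
    interface RemoteServer {
        Entry fetch(String ifNoneMatch, long nowMillis);
    }

    private final Map<String, Entry> cache = new HashMap<>();
    int remoteRequests = 0;  // counts how often the remote server was contacted

    String getContent(String cacheKey, RemoteServer remote, long nowMillis, long ttlMillis) {
        Entry cached = cache.get(cacheKey);
        // An unexpired match is returned without a request to the remote server.
        if (cached != null && nowMillis < cached.expiresAtMillis) {
            return cached.content;
        }
        // No match or expired: call the remote server, passing the validator if one exists.
        remoteRequests++;
        Entry fresh = remote.fetch(cached == null ? null : cached.etag, nowMillis);
        if (fresh == null) {
            if (cached == null) {
                throw new IllegalStateException("304 received without a cached entry");
            }
            // 304 Not Modified: keep the old content and give it a new expiration.
            fresh = new Entry(cached.content, nowMillis + ttlMillis, cached.etag);
        }
        cache.put(cacheKey, fresh);
        return fresh.content;
    }
}
```

In this model, two requests inside the expiration window trigger one remote call, and a request after expiration triggers a second call that may only revalidate the existing entry.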
The portal caches gatewayed content to complement, not replace, browser caching. Public content is accessible to multiple users without any user-specific information (based on HTTP headers). The gateway calculates the cache headers sent to the browser to ensure that the content is properly cached on the client side.
In version 5.0 and above, the portlet cache contains sections of finished markup and sections of markup that require further transformation. Post-cache processing means content can be more timely and personalized. Adaptive tags enable certain portlets (e.g., Community banners) to be cached publicly for extended periods of time and yet contain user-specific and page-specific information, as well as the current date and time.
The cache key for a Portlet entry consists of the following:
All seven types of settings stored in the ALI database: Portlet settings, User settings, Community settings, CommunityPortlet settings, Administrative settings, Session preferences, and User Information
All three values for the CanSet header: CanSetPersonal, CanSetCommunity, CanSetAdmin
User ID (only if private caching is used)
URI to the page on the remote server
Portlet last-modified date
The data below can be added to the cache key by setting options in the Web Service editor (on the Advanced Settings page).
Experience Definition ID
Activity Rights (only the Activity Rights configured in the Web Service editor are included in the cache key)
The data below is deliberately not included in the cache key:
StyleSheetURI: Portal stylesheets are applied at runtime, depending on the user preference. Portlet content does not depend on the particular stylesheet that the user has selected.
HostpageURI: All parts of the Hostpage URI value are covered separately. The cache key includes Community ID, so it already distinguishes between My Pages and Community pages. The User ID is added if private caching is used.
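To make the effect of the cache key concrete, the sketch below composes a key from a subset of the elements listed above. The string format, the method name, and the parameters are assumptions made for illustration; the Portal Server's actual key format is not documented here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative cache key builder; the key layout is hypothetical. */
class PortletCacheKeySketch {
    static String build(Map<String, String> settings, String canSet, String remotePageUri,
                        long portletLastModified, boolean privateCaching, int userId) {
        StringBuilder key = new StringBuilder();
        // Sort the settings so the key does not depend on map iteration order.
        for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
            key.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        key.append("canSet=").append(canSet).append(';');
        key.append("uri=").append(remotePageUri).append(';');
        key.append("modified=").append(portletLastModified).append(';');
        // The User ID is part of the key only when private caching is used.
        if (privateCaching) {
            key.append("user=").append(userId).append(';');
        }
        return key.toString();
    }
}
```

With public caching, two users who have identical settings produce the same key and share one cache entry. With private caching, or after any setting changes, the keys differ and separate entries are stored.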
The Portal Server caches all text (i.e., nonbinary) content returned by GET requests. Even if gateway caching is disabled (via PTSpy), portlet caching still takes place.
Gatewayed content can be cached by a proxy server or by the user’s browser. Beware browser caching of gatewayed content; it is a good idea to clear your browser cache often during development. (In Internet Explorer, select Tools | Internet Options and click the Delete Files button on the General tab under Temporary Internet Files.)
An incorrectly set Expires header can cause browsers to cache gatewayed content. See HTTP Headers below for details on using the Expires header.
Portlet caching is controlled both by the programmer and by the administrator who registers the Portlet object using the Portlet editor. A portlet’s caching strategy should take all possibilities into account and use the most efficient combination for its specific functionality.
Each and every portlet needs a tailored caching strategy to fit its specific functionality. A portlet that takes too long to generate can degrade the performance of every My Page that displays it.
These questions can help you determine the appropriate caching strategy:
Will the content accessed by the portlet change? How often?
How time-critical is the content?
What processes are involved in producing portlet content? How expensive are they in terms of server time and impact?
Is the portlet the only client with access to the back-end application?
Is the content different for specific users?
Can users share cached content?
Determine how often portlet content must be updated, dependent on data update frequency and business needs. Find the longest time interval between data refreshes that will not negatively affect the validity of the content or the business goals of the portlet.
Since caching is indexed on the settings used by a portlet, new content is always requested when settings change (assuming that no cached content exists for that combination of settings).
There are two common situations in which a developer might mistakenly decide that a portlet cannot be cached:
In-place refresh: You might think that caching would "break" a portlet that uses in-place refresh, because the portlet would be redirected to the original (cached) content. This can be avoided if a unique setting is updated on every action that causes a redraw, effectively "flushing" the cache. (In-place refresh renews the portlet display by causing the browser to refresh the portal page at a set interval.)
Invisible preferences: If the content of the portlet is dependent on something other than preferences (e.g., the portlet keys off the User ID to display a name or uses ALI security to filter a list), caching can still be implemented with “invisible preferences” (in this case, User ID). As with in-place refresh, invisible preferences are set solely for the purpose of creating a different cache entry. They are set programmatically, without the user’s knowledge.
There are many ways to control caching on the Portal Server. The sections that follow explain basic HTTP cache headers and portal settings. For sample code that shows how to implement caching within the structure of an application, see the sample code on the ALUI Developer Center on dev2dev. Note: In ALI 6.0, portlet caching cannot exceed 15 minutes.
RFC 2616 provides thorough guidelines and requirements for HTTP caching. There are two major types of HTTP caching headers: Expiration and ETag/Last-Modified.
Expiration caching says to the Portal Server, “Do not send a request for X seconds.” The Portal Server abides by these headers only if the cached portlet content was generated between the minimum and maximum times set by the administrator in the Portlet editor. The Expires and Cache-Control headers, described below, allow you to set expiration caching.
Last-Modified caching says to the Portal Server, “Send a request each time, but I might respond that nothing has changed and prompt you to use cached content.” The Last-Modified and ETag headers, described below, allow you to set this type of caching.
The Expires header specifies when content will expire, or how long content is “fresh.” After this time, the Portal server will always check back with the remote server to see if the content has changed.
Most Web servers allow setting an absolute time to expire, a time based on the last time that the client saw the object (last access time), or a time based on the last time the document changed on your server (last modification time).
In JSP, setting caching to forever using the Expires header is as simple as using the code that follows:
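One possible sketch of a far-future expiration is shown here in plain Java so that it runs without a servlet container. The class and method names are hypothetical, and the one-year lifetime is an assumption chosen because RFC 2616 advises servers not to send Expires dates more than one year in the future. In a JSP or servlet, the millisecond value would be passed to response.setDateHeader("Expires", millis).

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Computes Expires values; names are illustrative. */
class ExpiresHeaderSketch {
    // HTTP-date layout used by the Expires header (always expressed in GMT).
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Milliseconds since the epoch at which content generated "now" expires. */
    static long expiresMillis(Instant now, Duration lifetime) {
        return now.plus(lifetime).toEpochMilli();
    }

    /** The same instant rendered as an Expires header value. */
    static String expiresHeaderValue(Instant now, Duration lifetime) {
        return HTTP_DATE.format(now.plus(lifetime));
    }

    public static void main(String[] args) {
        // Assumed "forever": one year from the current time.
        Duration oneYear = Duration.ofDays(365);
        System.out.println("Expires: " + expiresHeaderValue(Instant.now(), oneYear));
    }
}
```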
The .NET Framework’s System.Web.HttpCachePolicy class provides a range of methods to handle caching, but it can also be used to set HTTP headers explicitly (see MSDN for API documentation: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemwebhttpcachepolicyclasssetexpirestopic.asp).
The Response.Cache.SetExpires method allows you to set the Expires header in a number of ways. The following code snippet sets it to forever:
Note: In .NET, the Web Form page (.aspx) can also use standard ASP methods to set HTTP headers.
Don’t use Expires = 0 to prevent caching. The Expires header is sent by the remote server and passed through to the browser by the Portal Server. Unless the time on all three machines is synchronized, an Expires=0 header can mistakenly return cached content. For example, imagine the following situation:
Assume the remote server is three minutes ahead of the user's machine (i.e., it's 12:03 on remote server when it is 12:00 for the user). The user views a portlet page at 12:00 (client time) and gets an Expires = 0 header from the remote server, telling content to expire immediately, at 12:03 (remote server time). However, the browser on the user's machine caches the page, since from the browser's standpoint, the expires date is in the future. When the user looks at the same page again, at 12:01, the browser shows the cached copy, not the new page.
This situation commonly happens with gatewayed preference pages. If the time is always synchronized between the Portal Server, remote server, and the user's machine, none of this will happen. However, that cannot be realistically achieved; even if time on Portal Server and remote server is synchronized, time on the user's machine might still not be accurate.
To solve this problem, instead of using Response.Expires = 0, set the Expires header to a fixed date that is definitely in the past. For example, set the date to January 1, 1970. In JSP, the response.setDateHeader method takes a header name and a long value that specifies the date in milliseconds since midnight GMT on January 1, 1970. The following code expires the page on January 1, 2003.
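The clock-skew scenario described above can be worked through with a small model. This is a sketch under stated assumptions: the class and method names are hypothetical, the browser is modeled as serving from cache whenever its own clock is before the Expires time, and the dates are arbitrary. In a JSP, the fixed past date would be sent with response.setDateHeader("Expires", 1041379200000L), which is midnight GMT on January 1, 2003.

```java
import java.time.Duration;
import java.time.Instant;

/** Models how a browser judges freshness from the Expires header; names are illustrative. */
class ExpiresSkewSketch {
    /** The browser reuses its cached copy while its own clock is before Expires. */
    static boolean browserServesFromCache(Instant expires, Instant browserClock) {
        return browserClock.isBefore(expires);
    }

    public static void main(String[] args) {
        Instant browserNoon = Instant.parse("2003-06-01T12:00:00Z");
        // The remote server clock runs three minutes ahead of the browser.
        Instant serverNow = browserNoon.plus(Duration.ofMinutes(3));
        // The user returns to the page one minute later, by the browser's clock.
        Instant browserRevisit = browserNoon.plus(Duration.ofMinutes(1));

        // "Expire immediately" by the server's clock is 12:03, still in the browser's future.
        System.out.println(browserServesFromCache(serverNow, browserRevisit));  // true

        // A fixed date in the past is stale no matter how the clocks differ.
        Instant fixedPast = Instant.parse("2003-01-01T00:00:00Z");
        System.out.println(browserServesFromCache(fixedPast, browserRevisit));  // false
    }
}
```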
The Cache-Control header can be used to expire content immediately or disable caching altogether. The value of this header determines whether cached portlet content can be shared among different users.
public allows any cached content to be shared across users with identical sets of preferences using the same Portal Server. This value should be used whenever possible.
private tells the Portal Server not to share cached content. The User ID is added to the cache key so that a separate copy is retained in the cache for each individual user. This value should only be used to protect sensitive information, e.g., an e-mail inbox portlet. (User settings can also make public content effectively private.)
max-age=[seconds] specifies the maximum amount of time that an object is considered fresh. This directive is similar to the Expires header but allows more flexibility. [seconds] is the number of seconds from the time of the request that the object should remain fresh.
must-revalidate tells the cache that it must obey any freshness information it receives about an object. HTTP allows caches to take liberties with the freshness of objects; by specifying this header, you're telling the cache to strictly follow your rules.
no-cache disables caching completely. Neither the client nor the Portal Server responds to subsequent requests with a cached version.
In JSP, use the setHeader method to configure the Cache-Control header:
The example below expires the content immediately using the maximum age header.
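As a rough illustration, the helper below assembles Cache-Control values from the directives described above. The class and method names are hypothetical, and the code is plain Java so that it runs without a servlet container. In a JSP, the resulting string would be passed to response.setHeader("Cache-Control", value).

```java
import java.util.ArrayList;
import java.util.List;

/** Builds Cache-Control header values; names are illustrative. */
class CacheControlSketch {
    /** Returns a value such as "public, max-age=600". */
    static String cacheControl(boolean shared, long maxAgeSeconds, boolean mustRevalidate) {
        List<String> directives = new ArrayList<>();
        // public shares cached content across users; private keeps one copy per user.
        directives.add(shared ? "public" : "private");
        directives.add("max-age=" + maxAgeSeconds);
        if (mustRevalidate) {
            directives.add("must-revalidate");
        }
        return String.join(", ", directives);
    }

    public static void main(String[] args) {
        // Shared content that stays fresh for ten minutes.
        System.out.println(cacheControl(true, 600, false));  // public, max-age=600
        // Per-user content that expires immediately.
        System.out.println(cacheControl(false, 0, true));    // private, max-age=0, must-revalidate
    }
}
```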
In the .NET Framework, the Cache-Control header is accessed through the System.Web.HttpCachePolicy class. To set the header to public, private or no-cache, use the Response.Cache.SetCacheability method:
To set a maximum age for content in .NET, use the Response.Cache.SetMaxAge method. The example below expires the content immediately.
// A zero-length TimeSpan means the content expires immediately.
TimeSpan ts = new TimeSpan(0,0,0);
Response.Cache.SetMaxAge(ts);
To set the header to must-revalidate in .NET, use the Response.Cache.SetRevalidation method.
A value of no-cache in the Pragma header prevents caching only when used over a secure connection. However, a Pragma=no-cache tag is treated identically to Expires=-1 if used in a nonsecure page; the page is cached, but immediately expired.
Note: There are some known issues in IE with disabling caching using the Pragma tag, due to browser buffering. See the Microsoft Knowledge Base for details.
The Last-Modified response header specifies the last time a change was made in the returned content, in the form of a time stamp. When an object stored in the cache includes a Last-Modified header, the Portal Server can use this value to ask the remote server if the object has changed since the last time it was seen. The Portal Server sends the value from the Last-Modified header to the remote server in the If-Modified-Since Request header. The portlet code on the remote server uses this header to determine if the content being requested has changed since this date, and responds with either fresh content or a 304 Not Modified Response. If the Portal Server receives the latter, it displays the cached content.
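Java portlets can support the If-Modified-Since request header by overriding the getLastModified(HttpServletRequest req) method provided by the Java class HttpServlet; the servlet container compares the returned time with the header value and responds with 304 Not Modified when the content has not changed. A JSP can also read the header value directly with request.getDateHeader("If-Modified-Since").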
In .NET, the Response.Cache.SetLastModified method allows you to set the Last-Modified header to the date of your choice. Alternately, the SetLastModifiedFromFileDependencies method sets the header based on the time stamps of the handler’s file dependencies.
Note: In order for validators such as the Last-Modified header to work, the server must generate valid Last-Modified headers and the server clock must be reliable.
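The decision the portlet makes in the Last-Modified exchange can be sketched as a single comparison. The class and method names below are hypothetical. The -1 convention mirrors what HttpServletRequest.getDateHeader returns when the header is absent, and truncating to whole seconds reflects the one-second resolution of HTTP dates.

```java
/** Decides between fresh content and 304 Not Modified; names are illustrative. */
class LastModifiedSketch {
    /**
     * Returns true when a 304 Not Modified response is appropriate.
     * ifModifiedSinceMillis is -1 when the request carried no If-Modified-Since header.
     */
    static boolean notModified(long ifModifiedSinceMillis, long lastModifiedMillis) {
        if (ifModifiedSinceMillis < 0) {
            return false;  // no validator was sent, so fresh content is required
        }
        // HTTP dates carry whole seconds only, so compare at that resolution.
        return lastModifiedMillis / 1000 <= ifModifiedSinceMillis / 1000;
    }

    public static void main(String[] args) {
        long cachedAt = 5000L;
        System.out.println(notModified(cachedAt, 4000L));  // true: content is older than the cached copy
        System.out.println(notModified(cachedAt, 9000L));  // false: content changed after it was cached
    }
}
```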
The ETag header is new to HTTP 1.1; it is very similar to Last-Modified. The main difference is that ETag does not have to be a time stamp. ETag values are unique identifiers that are generated by the server and changed every time the object is modified.
The remote server sends the ETag header to the Portal Server with portlet content. When another request is made for the same content, the Portal Server sends the value in the ETag header back to the remote server in the If-None-Match header. The portlet then determines, based on the value received, whether to send back fresh content or a 304 Not Modified response.
Some methods in the JSP IDK use the ETag header to implement caching. The methods use an entity key based on the Portlet ID and Community ID or allow the developer to pass in a unique key.
In .NET, use the Response.Cache.SetETag method to pass in the string to be used as the ETag. The SetETagFromFileDependencies method creates an ETag by combining the file names and last modified timestamps for all files on which the handler is dependent.
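The ETag exchange can be illustrated with a content hash. This is a minimal sketch, assuming the portlet derives its ETag from a SHA-256 digest of the generated markup; it is not how the IDK methods mentioned above compute their entity keys, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Content-hash ETags; names are illustrative. */
class ETagSketch {
    /** Derives a quoted ETag from the content (SHA-256, hex encoded). */
    static String etagFor(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : hash) {
                tag.append(String.format("%02x", b));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True when the If-None-Match value still matches, so 304 Not Modified can be returned. */
    static boolean notModified(String ifNoneMatch, String currentContent) {
        return ifNoneMatch != null && ifNoneMatch.equals(etagFor(currentContent));
    }
}
```

Hashing the output means the portlet still has to generate its content before it can answer with a 304, so this approach saves bandwidth rather than processing time. A key based on cheaper inputs, such as a data version number, avoids that cost.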
On the HTTP Configuration page of the Portlet Web Service editor, the portal administrator can set minimum and maximum amounts of time for validation of cached portlet content. The default is a minimum of 0 seconds and a maximum of 20 days. (For details on the Portlet Web Service editor, see ALI Portlet Configuration.)
The minimum and maximum caching settings in the Portlet editor affect caching as follows.
The Portal Server never makes a request to the remote server before the Minimum Cache Time if there is content in the cache. (In version 6.0, the portlet cache is limited to 15 minutes, so a request will always be made after 15 minutes.) Multiple requests made for the same portlet with identical cache keys within this minimum time always receive cached content. As noted earlier, setting the Cache-Control header to “no-cache” overrides editor caching settings; content will not be cached.
The Portal Server always makes a request to the remote server after the Maximum Cache Time. Cached content might or might not be returned, based on other information (e.g., the Last-Modified header).
The Portal Server might or might not make a request to the remote server if content has been cached in between the Minimum and Maximum Cache Time. The Portal Server observes programmatic caching (e.g., the Expires header) in the window between the minimum and maximum times.
Note: Setting the Cache-Control header to “no-cache” overrides editor settings; content will never be cached.
For example, the minimum caching time for a particular portlet is set to ten minutes, and the maximum caching time is set to one hour. Client A requests the portlet content. Five minutes later, Client B, with an identical set of preferences, requests the same content. Five minutes is under the minimum caching time set in the Portlet editor, so cached content is returned, no matter what type of programmatic caching has been implemented by the portlet. (Remember, the Portal Server only abides by headers if cached content was generated between the minimum and maximum caching times set in the editor. An Expires header set to two minutes does not refresh the cache in this example.) If no copies of the content existed for Client B’s particular collection of settings or no content was cached, the remote server would be called to generate content that matched that group of settings.
To continue the example, Client A requests the portlet content again, and there is a matching copy of the content in the cache that is 15 minutes old. This is over the minimum caching time and under the maximum. In this case, whether or not new content is generated depends on the HTTP headers sent by the portlet. If the portlet has not specified any caching programmatically, the Portal Server asks the remote server for fresh content. If the portlet set the Expires header to 30 minutes, new content is not generated. If ETag or Last-Modified caching was implemented, new content is only returned if content has changed.
Finally, Client A requests the same content two hours later, and the matching copy was generated more than an hour before. Since this is over the maximum caching time set in the Portlet editor, the Portal Server requests new content from the remote server, regardless of the caching specified programmatically by the portlet. Of course, if the portlet has implemented ETag or Last-Modified caching, new content is only returned if content has changed.
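The three cases in this example reduce to a short decision function. The sketch below is a model of the rules as described on this page, not the Portal Server's code: the names are hypothetical, and the behavior when the age exactly equals a boundary is an assumption.

```java
import java.time.Duration;

/** Models the minimum/maximum cache window; names are illustrative. */
class CacheWindowSketch {
    enum Decision { SERVE_CACHED, REQUEST_REMOTE }

    /**
     * age: age of the cached copy.
     * min, max: Minimum and Maximum Cache Time from the Portlet editor.
     * expiresTtl: lifetime the portlet requested through the Expires header, or null if none.
     */
    static Decision decide(Duration age, Duration min, Duration max, Duration expiresTtl) {
        if (age.compareTo(min) < 0) {
            return Decision.SERVE_CACHED;    // under the minimum: headers are ignored
        }
        if (age.compareTo(max) > 0) {
            return Decision.REQUEST_REMOTE;  // over the maximum: always ask the remote server
        }
        // Inside the window, programmatic caching decides.
        if (expiresTtl != null && age.compareTo(expiresTtl) < 0) {
            return Decision.SERVE_CACHED;
        }
        return Decision.REQUEST_REMOTE;
    }

    public static void main(String[] args) {
        Duration min = Duration.ofMinutes(10);
        Duration max = Duration.ofMinutes(60);
        // Client B, 5 minutes later, portlet sent Expires of 2 minutes.
        System.out.println(decide(Duration.ofMinutes(5), min, max, Duration.ofMinutes(2)));    // SERVE_CACHED
        // Client A, copy is 15 minutes old, no programmatic caching.
        System.out.println(decide(Duration.ofMinutes(15), min, max, null));                    // REQUEST_REMOTE
        // Client A, copy is 15 minutes old, Expires of 30 minutes.
        System.out.println(decide(Duration.ofMinutes(15), min, max, Duration.ofMinutes(30)));  // SERVE_CACHED
        // Client A, two hours later.
        System.out.println(decide(Duration.ofMinutes(120), min, max, Duration.ofMinutes(30))); // REQUEST_REMOTE
    }
}
```

REQUEST_REMOTE does not necessarily mean new content is generated. With ETag or Last-Modified caching, the remote server can still answer 304 Not Modified and the cached copy is reused.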
If your portlet requires specific editor settings for its caching strategy, you must include this information in your Installation Guide.
Always use programmatic caching. Using HTTP headers to control caching is always preferable. Administrators can override some programmatic caching, but they cannot be relied upon to set caching correctly. For example, instead of requiring the portal administrator to set a minimum caching setting, send an Expires header.
While caching is an integral and necessary part of portlet design, it is helpful to disable it while developing and debugging. Otherwise, it can be very difficult to view the results of any modifications you have made. To disable the caching implemented by the Portal Server, go to the HTTP Configuration page of the Portlet Web Service editor (shown under Portlet Settings above) and set the minimum and maximum caching times to 0. Clear the checkbox marked “Suppress errors where possible (use cached content instead).”
Note: After the code has been developed and debugged, make sure to turn caching on and test the performance of your portlet. For details on troubleshooting portlets, see Portlet Debugging. If you are using the ALI Logging Utilities to debug caching, turn on all types of tracing for the OpenKernel.OpenHttp.Cache component.