About the BI Server Query Cache

You can configure the BI Server to maintain a local, disk-based cache of query result sets (query cache).

The query cache enables the BI Server to satisfy many subsequent query requests without accessing back-end data sources, thereby increasing query performance.

As updates occur on the back-end databases, the query cache entries can become stale. Therefore, you must periodically remove entries from the query cache using one of the following methods:

  • Manually. In the Model Administration Tool, in the Manage menu, select Cache to open the Cache Manager. The Cache Manager provides the most flexibility in choosing which cache entries to purge and when to purge them, but it requires manual intervention. See Use the Cache Manager.

  • Automatically. In the Model Administration Tool, you can disable caching for the system, set caching attributes for a specific physical table, and use BI Server event tables to purge the cache automatically. See Monitor and Manage the Cache.

  • Programmatically. The BI Server provides ODBC-extension functions for purging cache entries programmatically. These functions combine the selectivity and timing flexibility of the Cache Manager with the automation of event tables. You can write your own scripts to call these functions at times that fit your needs, as shown in the example that follows this list. See Purge and Maintain Cache Using ODBC Procedures.
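
For example, targeted purge calls can be placed in a script file and run with the nqcmd command-line utility against the BI Server DSN. The sketch below is illustrative only: the DSN name, credentials, file name, and object names are placeholders for your environment, and the procedure signatures should be checked against Purge and Maintain Cache Using ODBC Procedures.

    purge_cache.sql (placeholder object names):

        Call SAPurgeCacheByTable( 'MyDatabase', 'MyCatalog', 'MySchema', 'SALES_FACT' );
        Call SAPurgeCacheByQuery( 'SELECT "Region", "Revenue" FROM "Sales"' );

    Command line (DSN and credentials are placeholders):

        nqcmd -d AnalyticsWeb -u weblogic -p password -s purge_cache.sql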

The parameters that control query caching are located in Fusion Middleware Control and in the NQSConfig.INI file, as described in Configuration File Settings. For information about seeding the cache with agents, see Use Agents to Seed the Oracle BI Server Query Cache.
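
The following NQSConfig.INI fragment sketches the kinds of cache parameters involved. The values shown are illustrative only, and in recent releases several of these settings are centrally managed through Fusion Middleware Control rather than edited directly; see Configuration File Settings for the authoritative list.

    [CACHE]
    ENABLE = YES;                                       # Centrally managed in Fusion Middleware Control
    DATA_STORAGE_PATHS = "C:\OracleBI\cache" 500 MB;    # Illustrative path and size
    MAX_ROWS_PER_CACHE_ENTRY = 100000;                  # Limit on rows per cached result set
    MAX_CACHE_ENTRY_SIZE = 20 MB;
    MAX_CACHE_ENTRIES = 1000;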

This section contains the following topics:

  • Query Cache Architecture

  • Advantages of Caching

  • Costs of Caching

  • Cache Sharing Across Users

  • About the Refresh Interval for XML Data Sources

  • About the Global Cache

Query Cache Architecture

The query cache consists of cache storage space, cache metadata, and cache detection in query compilation.

The process of the BI Server accessing the cache metadata is very fast. If the metadata shows a cache hit, then the bulk of the query processing is eliminated, and the results are immediately returned to the user. The process of adding the new results to the cache is independent of the results being returned to the user; the only effect on the running query is the resources that are consumed in the process of writing the cached results.

Query cache entries are portable across different operating systems, such as Windows and 64-bit Linux. Incompatible cache entries are automatically removed.

Note that query cache entries are not portable across different updates of Oracle Analytics Server.

Caching occurs by default at the subrequest level, which results in multiple cache entries for some SQL statements. Caching subrequests improves performance and the cache hit ratio, especially for queries that combine real-time and historical data. To disable subrequest caching, set the NQSConfig.INI file parameter DISABLE_SUBREQUEST_CACHING to YES. See Configuration File Settings.
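
For example, subrequest caching can be turned off with a single entry in the [CACHE] section of NQSConfig.INI (shown here as a sketch; the default is NO):

    [CACHE]
    DISABLE_SUBREQUEST_CACHING = YES;   # YES turns off subrequest-level caching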

Advantages of Caching

The fastest way to process a query is to skip the bulk of the processing and use a precomputed answer.

With query caching, Oracle BI Server stores the precomputed results of queries in a local cache. If another query can use those results, then all database processing for that query is eliminated. This can result in dramatic improvements in the average query response time.

In addition to improving performance, being able to answer a query from a local cache conserves network resources and processing time on the database server. Network resources are conserved because intermediate results aren't returned to Oracle BI Server. Not running the query on the database frees the database server to do other work. If the database uses a chargeback system, then running fewer queries might also reduce costs.

Another benefit of using the cache to answer a query is savings in processing time on Oracle BI Server, especially if the query results are retrieved from multiple databases. Depending on the query, there might be considerable join and sort processing in the server. If the result is already computed and cached, then this processing is avoided, freeing server resources for other tasks.

To summarize, query caching can dramatically improve query performance and reduce network traffic, database processing, and processing overhead.

Costs of Caching

Query caching has many obvious benefits, but it also has certain costs:

  • Disk space for the cache

  • Administrative costs of managing the cache

  • Potential for cached results being stale

  • CPU usage and disk I/O on the server computer

With cache management, the benefits typically far outweigh the costs.

Disk Space

The query cache requires dedicated disk space.

How much space depends on the query volume, the size of the query result sets, and how much disk space you choose to allocate to the cache. For best performance, dedicate a disk exclusively to caching, and use a high-performance, high-reliability disk system.

Administrative Tasks Associated with Caching

Some administrative tasks are associated with caching. You must set the cache persistence time for each physical table appropriately, based on how often the data in that table is updated.

When the update frequency varies, you must keep track of when changes occur and purge the cache manually when necessary.

You can also create a cache event polling table and modify applications to update the polling table when changes to the databases occur, making the system event-driven.
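
As a sketch of the event-driven approach, an ETL job might insert a row into the event polling table after loading a physical table. The table and column names below follow the sample polling table layout described in the cache event documentation and can vary by release; treat them as placeholders and verify against your installation.

    -- Hypothetical example: notify the BI Server that SALES_FACT changed
    INSERT INTO S_NQ_EPT
      (UPDATE_TYPE, UPDATE_TS, DATABASE_NAME, CATALOG_NAME, SCHEMA_NAME, TABLE_NAME, OTHER_RESERVED)
    VALUES
      (1, CURRENT_TIMESTAMP, 'MyDatabase', NULL, 'MY_SCHEMA', 'SALES_FACT', NULL);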

The Oracle BI Server also provides ODBC-extension functions for purging cache entries programmatically. You can write your own scripts to call these functions at the appropriate times.

Keep the Cache Up To Date

If the cache entries aren't purged when the data in the underlying databases changes, then queries can potentially return results that are out of date.

You must evaluate whether this is acceptable. It might be acceptable to allow the cache to contain some stale data. You must decide what level of stale data is acceptable and then configure (and follow) a set of rules to reflect those levels.

For example, suppose an application analyzes corporate data from a large conglomerate, and you're performing yearly summaries of the different divisions in the company. New data doesn't materially affect the queries because the new data affects only next year's summaries. In this case, the trade-offs for deciding whether to purge the cache might favor leaving the entries in the cache.

Suppose, however, that the databases are updated three times a day and you're performing queries on the current day's activities. In this case, you must purge the cache much more often, or perhaps consider not using the cache at all.

Another scenario is that you rebuild the dataset from the beginning at periodic intervals (for example, once per week). In this example, you can purge the entire cache as part of the process of rebuilding the dataset, ensuring that you never have stale data in the cache.
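
For example, the last step of a weekly rebuild job could run a one-line script through nqcmd (as in the earlier purge example) that clears the entire cache:

    Call SAPurgeAllCache();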

Whatever your situation, you must evaluate what is acceptable for noncurrent information returned to the users.

CPU Usage and Disk I/O

Query caching requires a small amount of CPU time and adds to disk I/O, although the overhead is usually very minor.

In most cases, the CPU usage and disk I/O are insignificant. The disk I/O might be noticeable only when queries return large datasets.

Cache Sharing Across Users

If shared logon is enabled for a particular connection pool, then the cache can be shared across users and doesn't need to be seeded for each user.

If shared logon isn't enabled and a user-specific database login is used, then each user generates their own cache entry.

See Managing Metadata Repositories for Oracle Analytics Server for information about enabling shared logon for connection pools.

About the Refresh Interval for XML Data Sources

Typically, XML data sources are updated frequently and in real time. Setting a refresh interval for XML data sources is analogous to setting cache persistence for database tables.

The refresh interval is a time interval after which the XML data sources are to be queried again directly, rather than using results in cache. This refresh interval is specified on the XML tab of the Connection Pool dialog.

The default interval setting is Infinite, meaning that the XML data source isn't automatically refreshed.

The refresh interval setting determines the time interval after which the Oracle BI Server XML Gateway connection is refreshed, as follows:

  • For URLs that begin with http:// or https://, the gateway is refreshed when it detects that the interval has expired.

  • For URLs that reside on a local or network drive, the gateway is refreshed when the interval has expired and the system detects that the URLs have been modified.

For more information about XML data sources, see Managing Metadata Repositories for Oracle Analytics Server.

About the Global Cache

In a clustered environment, Presentation Services can be configured to access a shared cache called the global cache.

This global cache resides on a shared file system storage device and stores purging events, seeding events (often generated by agents), and result sets that are associated with seeding events. The seeding and purging events are sorted by time and stored on the shared storage as a logical event queue. Individual Presentation Services nodes push to and pull from the logical event queue. Each Presentation Services node still maintains its own local query cache for regular queries.

The figure below depicts global caching in a clustered environment. It shows three Presentation Services nodes sharing a global cache. The global cache stores seeding or purging events held in a logical event queue. The arrows from Node 2 and Node 3 to the shared cache show Presentation Services Node 2 pushing a seeding event to the queue and Presentation Services Node 3 pushing a purging event to the queue. The arrows from the shared storage to each Presentation Services node show each node pulling from the common location. This occurs on a periodic basis and enables participating Presentation Services nodes to obtain updates to the logical event queue made by other Presentation Services.

The Presentation Services node processes a seeding or purging event locally first in its caching system. It then pushes the event to the global cache on the shared storage. During the push event, the active Presentation Services node locks the logical event queue on the shared storage and then pushes in the seeding or purging event. If there's a conflict between seeding and purging (for example, one node wants to seed a query and another node wants to purge the same query), then the event that comes in last wins.

The logical event queue in the global cache on the shared storage is composed of seeding and purging events from individual Presentation Services nodes. The queue is sorted according to the timestamp of the events. Hence, the clocks on all Presentation Services nodes participating in the cluster must be synchronized.

Each Presentation Services node polls the global cache on a periodic basis for new cache entries. This polling frequency is configurable. A snapshot of the queued logical events on the shared storage is pulled back to the node and a local logical event queue is constructed and then processed.

Note:

The process of populating or purging seeded caches across all Presentation Services nodes that participate in the cluster doesn't occur in real time, and the elapsed time is affected by multiple factors, such as the predefined polling interval, network bandwidth, and CPU load.

Because query cache result sets tend to be large, network bandwidth might pose a constraint. Therefore, the following must be chosen carefully:

  • The set of cache entries that qualify for seeding

  • The time interval for Oracle Analytics nodes to pick up seeded caches from shared storage (to avoid network congestion)

The primary global cache parameters are configured in Fusion Middleware Control. Additional, optional parameters are configured in the NQSConfig.INI file for each Presentation Services node that participates in the cluster. See Use Fusion Middleware Control to Set Global Cache Parameters and Manually Edit Additional Global Cache Parameters.
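
The additional, optional NQSConfig.INI entries look roughly like the following sketch. The values are placeholders (the shared storage path and size are the primary parameters set through Fusion Middleware Control); see Manually Edit Additional Global Cache Parameters for the supported settings.

    [CACHE]
    MAX_GLOBAL_CACHE_ENTRIES = 1000;    # Maximum number of entries in the global cache
    CACHE_POLL_SECONDS = 300;           # How often each node polls the logical event queue
    CLUSTER_AWARE_CACHE_LOGGING = NO;   # Extra logging for troubleshooting the global cache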

A seeding or purging procedure is submitted to a specific Presentation Services node. If that Presentation Services instance is a node in a BI cluster and the global cache parameters have been defined in its configuration files, then the seeding or purging events are propagated across all Presentation Services nodes that participate in the same clustered environment.