Siebel Business Analytics Server Administration Guide > Query Caching in the Analytics Server >
Strategies for Using the Cache
One of the main advantages of query caching is to improve apparent query performance. It may be valuable to seed the cache during off hours by running queries and caching their results. A good seeding strategy requires that you know when cache hits occur.
If you want to seed the cache for all users, you might seed the cache with the following query:
Select User, SRs
After seeding the cache using
Select User, SRs, the following queries will all be cache hits:
Select User, SRs where user = valueof(nq_SESSION.USER) (and the user was USER1)
Select User, SRs where user = valueof(nq_SESSION.USER) (and the user was USER2)
Select User, SRs where user = valueof(nq_SESSION.USER) (and the user was USER3)
When caching is enabled, each query is evaluated to see if it qualifies for a cache hit. A cache hit means that the server was able to use cache to answer the query and did not go to the database at all.
NOTE: The Analytics Server can use query cache to answer queries at the same or higher level of aggregation.
A cache hit occurs only if all of the conditions described in this section are met.
- WHERE clause semantically the same or a logical subset. For the query to qualify as a cache hit, the WHERE clause constraints need to be either equivalent to the cached results, or a subset of the cached results.
A WHERE clause that is a logical subset of a cached query qualifies for a cache hit if the subset meets one of the following criterion:
select employeename, region
from employee, geography
where region in ('EAST', 'WEST')
qualifies as a hit on the following cached query:
select employeename, region
from employee, geography
where region in ('NORTH', 'SOUTH', 'EAST', 'WEST')
where revenue < 1000
qualifies as a cache hit on a comparable query with the predicate:
where revenue < 5000
- A subset of columns in the SELECT list. All of the columns in the SELECT list of a new query have to exist in the cached query in order to qualify for a cache hit, or they must be able to be calculated from the columns in the query.
- Columns in the SELECT list can be composed of expressions on the columns of the cached queries. The Analytics Server can calculate expressions on cached results to answer the new query, but all the columns have to be in the cached result.
For example, the query:
select product, month, averageprice from sales where year = 2000
will hit cache on the query:
select product, month, dollars, unitsales from sales where year = 2000
because averageprice can be computed from dollars and unitsales (averageprice = dollars/unitsales).
- Equivalent join conditions. The resultant joined logical table of a new query request has to be the same as (or a subset of) the cached results to qualify for a cache hit.
- DISTINCT attribute the same. If a cached query eliminates duplicate records with DISTINCT processing (for example, SELECT DISTINCT...), requests for the cached columns have to also include the DISTINCT processing; a request for the same column without the DISTINCT processing will be a cache miss.
- Compatible aggregation levels. Queries that request an aggregated level of information can use cached results at a lower level of aggregation.
For example, the following query:
select supplier, region, city, qtysold
requests the quantity sold at the supplier and region and city level, while the following query:
select city, qtysold
requests the quantity sold at the city level. The second query would result in a cache hit on the first query.
- Limited additional aggregation. For example, if a query with the column qtysold is cached, a request for RANK(qtysold) results in a cache miss. Additionally, a query requesting qtysold at the country level can get a cache hit from a query requesting qtysold at the country, region level.
- ORDER BY clause made up of columns in the select list. Queries that order by columns not contained in the select list result in cache misses.
Running a Suite of Queries to Populate the Cache
To maximize potential cache hits, one strategy is to run a suite of queries just for the purpose of populating the cache. The following are some recommendations for the types of queries to use when creating a suite of queries with which to seed the cache.
- Common prebuilt queries.
Queries that are commonly run, particularly ones that are expensive to process, are excellent cache seeding queries. Queries whose results are embedded in Intelligence Dashboards would be good examples of common queries.
- SELECT lists with no expressions.
Eliminating expressions on SELECT list columns expands the possibility for cache hits. A cached column with an expression can only answer a new query with the same expression; a cached column with no expressions can answer a request for that column with any expression. For example, a cached request such as:
SELECT QUANTITY, REVENUE...
can answer a new query such as:
but not the reverse.
- No WHERE clause.
If there is no WHERE clause in a cached result, it can be used to answer queries satisfying the cache hit rules for the select list with any WHERE clause that includes columns in the projection list.
In general, the best queries to seed cache with are queries that heavily consume database processing resources and that are likely to be reissued. Be careful not to seed the cache with simple queries that return many rows. These queries (for example, SELECT * FROM PRODUCTS, where PRODUCTS maps directly to a single database table) require very little database processing. Their expense is network and disk overhead—factors that caching will not alleviate.
NOTE: When the Analytics Server refreshes repository variables, it will examine business models to determine if they reference those repository variables. If they do, the Analytics Server purges all cache for those business models.