9 Tracking Content Access

This chapter describes how to obtain information about the activity of content items in a Content Server instance. This activity can be tracked using the Oracle WebCenter Content Server Content Tracker component.

This chapter covers the following topics:

9.1 About Content Tracker

Content Tracker is an optional component automatically installed with Oracle WebCenter Content Server. When enabled, the component provides information about system usage, such as which content items are accessed most frequently and which content is most valuable to users or specific groups. Knowing the consumption patterns of an organization's content enables more effective delivery of appropriate, user-centric information.

For detailed information about customizing Content Tracker, see Developing with Oracle WebCenter Content.

9.2 Understanding the Content Tracker Functionality

Content Tracker monitors activity on a Content Server instance and records selected details of those activities. It then generates reports that illustrate the way the system is being used. This section includes an overview about Content Tracker and Content Tracker Reports functionality.

Content Tracker incorporates several optimization functions which are controlled by configuration variables. The default values for the variables set Content Tracker to function as efficiently as possible for use in high volume production environments. For more information about Content Tracker configuration variables, see Configuration Reference for Oracle WebCenter Content.

This section covers the following topics:

9.2.1 Content Tracker

Content Tracker monitors a system and records information collected from different sources about activities. The information is merged and written to a set of tables in the Content Server database. Content Tracker can monitor activity based on:

  • Content item usage: Data is obtained from Web filter log files, the Content Server database, and other external applications such as portals and websites. Content item access data includes dates, times, content IDs, and current metadata.

  • Services: Services that return content, and services that handle search requests, are tracked. By default, Content Tracker logs only the services that have content access event types, but by changing the configuration you can have Content Tracker monitor any service, including custom services.

  • User accesses: Information is gathered about non-content access events, such as the collection and synthesis of user profile summaries. This data includes user names and user profile information.

9.2.2 Data Recording and Reduction

Content Tracker records data from the following sources:

  • Web server filter plug-in: When content is requested with a static URL, the Web server filter plug-in records request details, saving the information in event log files. The event log files are used as input by the Content Tracker data reduction process.

  • Service handler filter: Content Tracker monitors services that return content. When one of these services is called, details of the service are copied and saved in the SctAccessLog table.

  • Logging service: Content Tracker supports a single-service call to log an event. You can call it directly with a URL, as an action in a service script, or from Idoc Script.

  • Database tables: When configured to collect and process user profile information, a data reduction process queries selected database tables to obtain information about active users during the reporting period.

  • Application API: An interface is available to register other components and applications for tracking. This interface allows cooperating applications, such as Site Studio, to log event information in real time. The application API is designed as a code-to-code call which does not involve a service. The API is not meant for general use. If you are building an application and are interested in using this interface, contact Consulting Services.

The data reduction process gathers and merges the data obtained from the data recording sources. Until this reduction process has finished, the data in the Content Tracker tables is incomplete. The reduction is run one time for each day's data gathered. You can run the reduction manually, or schedule it to run automatically, usually during an off-peak period when the system load is light.

9.2.3 Content Tracker Terminology

The following terminology is used with Content Tracker:

  • Data collection: Gathering content access information and writing the information to event log files.

  • Data reduction: Processing the information from data collection and merging it into a database table.

  • Data Engine Control Center: The interface that provides access to the user-controlled functions of the Data Engine. It has the following tabs:

    • Collection: Used to enable data collection.

    • Reduction: Used to stop and start data reduction (merging data into database tables).

    • Schedule: Used to enable automatic data reduction.

    • Snapshot: Used to enable activity metrics. The term snapshot also denotes an information set representing the world at a particular time.

    • Services: Used to add, configure, and edit service calls to be logged. It is also used to define the specific event details logged for a given service.

  • Service definitions: The ResultSet structure in the service call configuration file (SctServiceFilter.hda) that contains entries to define each service call to be logged. The service definition ResultSet is named ServiceExtraInfo.

  • Service entry: The entry in the service definition ResultSet (ServiceExtraInfo) that defines a service call to be logged. The ServiceExtraInfo ResultSet contains one service entry for each service to be logged.

  • Field map: A secondary ResultSet in the service call configuration file (SctServiceFilter.hda) that defines the service call data and the specific location where data is to be logged.

  • Top Content Items: Most frequently accessed content items in the system.

  • Content Dashboard: A page that provides overview information about the access of a specific content item.

9.2.4 Installation Considerations

Content Tracker is supported on most hardware and network configurations, but some hardware and software combinations require special consideration:

Set the SctUseGMT configuration variable to true to use Greenwich Mean Time (GMT). It is set to false by default, to use local time. For more information about configuration variables see Configuration Reference for Oracle WebCenter Content.
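For example, to record access times in GMT, the variable can be set in the Content Server configuration file (a config.cfg fragment; the setting name comes from the text above, but verify the file location for your installation):

```
SctUseGMT=true
```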

When upgrading from an earlier version of Content Tracker, there is a one-time retreat (or advance, depending on location) in recorded access times. To accommodate the biannual daylight saving time changes, discontinuities in recorded user access times occur (contingent on the use of local time and the location).

9.3 Operational Details

Depending on its configuration, Content Tracker can perform data collection of event information such as dynamic and static content accesses and service calls. Both types of data are recorded in the Combined Output Table (SctAccessLog). Service calls are inserted into the log in real time, but the static URL information must first undergo the data reduction process (either manual or scheduled).

After activity data is collected, Content Tracker combines, analyzes and synthesizes the event information and loads the summarized activity into database tables.

9.3.1 Data Collection

Data collection is the initial step in any tracking function. Content Tracker data collection includes collecting information from static URL references and service call events. Data is collected using several different methods:

9.3.1.1 Service Handler Filter

Using the service handler filter, Content Tracker obtains information about dynamic content requests that come through the Web server, and also about other types of activity, such as calls from applications. The service request details are obtained from the DataBinder that accompanies the service call, and the information is stored in the Combined Output Table (SctAccessLog) in real time.

The SctServiceFilter.hda configuration file is used to determine which service calls are logged. It uses a ResultSet structure that includes one service definition entry for each service to be logged. When using the extended service logging function, the file also contains field maps corresponding to service definition entries.

The ServiceExtraInfo ResultSet is included in the SctServiceFilter.hda file. This ResultSet contains one or more service entries defining the services to be logged. Additional field map ResultSets are used to support the extended service logging function. Each service that has additional data values tracked must have a field map ResultSet in the SctServiceFilter.hda file to define the data fields, locations, and database destination columns for the related service.
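As an illustrative sketch only (the ResultSet column names and the field map layout shown here are assumptions; verify them against the SctServiceFilter.hda file shipped with your installation), a service entry and a corresponding field map might look like:

```
@ResultSet ServiceExtraInfo
4
sctServiceName
sctEventType
sctReference
sctCallingProduct
GET_FILE
content_access
native
Core Server
@end

@ResultSet GET_FILE
3
sctFieldName
sctFieldLocation
sctFieldDestination
dDocTitle
DataBinder.LocalData
extField_1
@end
```

Here the hypothetical field map logs the dDocTitle value from the service DataBinder into the general purpose extField_1 column of the SctAccessLog table.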

9.3.1.2 Web Server Filter Plug-in

Managed content retrieved with a static URL does not usually invoke a service. The Content Tracker Web server filter plug-in collects the access event details (static URL references) and records them in raw Content Tracker Event Logs (sctlog files). The information in these files requires an explicit reduction (either interactive or scheduled) before it is included in the Combined Output Table with the service call data.

9.3.1.3 Logging Service

The logging service is a single-service call that can be called directly with a URL or as an action in a service script. It can also be called from Idoc Script using the executeService function. The calling application is responsible for setting the fields to be recorded in the service DataBinder, including the descriptive fields listed in the Content Tracker service filter configuration file (SctServiceFilter.hda).
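For illustration only, the two invocation styles might look like the following. The service name SCT_LOG_EVENT and the parameters shown are placeholders; check the actual logging service name in Developing with Oracle WebCenter Content.

```
# Hypothetical examples; SCT_LOG_EVENT and the parameters are placeholders.
http://myhost/cs/idcplg?IdcService=SCT_LOG_EVENT&sctEventType=view&dDocName=MY_DOC

<$executeService("SCT_LOG_EVENT")$>
```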

There should be no duplication or conflict between services logged with the service handler filter and those logged with the Content Tracker logging service. Services named in the Content Tracker service handler filter file are logged automatically, so there is no need for the logging service to log them as well. However, Content Tracker does not attempt to prevent such duplication.

9.3.1.4 Enabling or Disabling Data Collection

To enable or disable data collection:

  1. Choose Administration then Content Tracker Administration from the main menu. Choose Data Engine Control Center.
  2. On the Data Engine Control Center: Collection tab, select (to enable collection) or clear (to disable collection) the Enable Data Collection check box.
  3. Click OK.

    Do not exit the applet. Wait until a confirmation message displays. If you exit before the confirmation message, the requested change(s) may not occur.

  4. After the confirmation message is displayed, click OK.
  5. Restart the Web server and Content Server, in that order.

9.3.2 Data Reduction

During data reduction, the static URL information captured by the Web server filter plug-in is merged and written into the output table with the service call data. Depending on configuration, the Content Tracker user metadata database tables are also updated at reduction time with information collected from the static URL accesses and from the service call event records:

9.3.2.1 Standard Data Reduction Process

During the data reduction process, the static URL information is extracted from the raw data files and combined with the service information stored in the SctAccessLog table. By default, Content Tracker collects and records data only for the SctAccessLog table. Although the user data output tables exist, Content Tracker does not populate them.

Depending on how Content Tracker is configured, this reduction process can:

  • Combine access information for static URL content access with service details.

  • Summarize information about user accounts that were active during the reporting period. This information is written to the Content Tracker's user metadata database tables.

9.3.2.2 Data Reduction Process with Activity Metrics

Content Tracker provides the option to selectively generate search relevancy data and store it in custom metadata fields. You can use the snapshot function to choose which activity metrics to activate. The logged data provides content item usage information that indicates the popularity of content items.

By default, Content Tracker collects and records data only for the SctAccessLog table. Although the user data output tables exist, Content Tracker does not populate them unless the Snapshot function is activated. However, using the snapshot function affects Content Tracker's performance.

If the snapshot function and activity metrics are activated, the values in the custom metadata fields are updated following the reduction processing phase. When users access content items, the values of the applicable search relevance metadata fields change. During the later post-reduction step, Content Tracker uses SQL queries to determine which content items were accessed during the reporting period. Content Tracker updates the database table metadata fields with the new values and initiates a re-indexing cycle. However, only the content items whose access count metadata values have changed are re-indexed.

The post-reduction step is necessary to process and tabulate the activity metrics for each affected content item and to load the data into the assigned custom metadata fields. It also initiates a re-indexing cycle on the content items with changed activity metrics values to ensure that the data is part of the search index and is accessible to select and order search results.

Figure 9-1 Data Reduction Process with Activity Metrics

9.3.2.3 Data Reduction Cycles

Reduced table data is moved from the primary tables to the corresponding archive tables when the associated raw data is moved from recent to archive status. The primary tables contain the output for reduction data in the new and recent cycles and the archive tables contain output for reduction data in the archive cycle.

Raw data is demoted from new to recent when the data is reduced and it is older than one day. Thus, the 'new' cycle indicates that the data is for the current day or is unreduced data from previous dates. The 'recent' cycle indicates that the data is from yesterday or earlier and has been reduced.

Raw data is demoted to archive (and the corresponding rows in the SctAccessLog table are moved to the SctAccessLogArchive table) when the number of recent sets reaches a configured threshold number and a reduction process is run, either manually or through the scheduler. For more information about configuring the threshold number for recent sets, see the SctMaxRecentCount configuration variable information in Configuration Reference for Oracle WebCenter Content. If a reduction process is never run, the raw data remains in the recent cycle indefinitely.
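For example, to change the threshold at which recent data sets are demoted to archive status, SctMaxRecentCount can be set in the Content Server configuration file (the value shown is illustrative; see Configuration Reference for Oracle WebCenter Content for the default):

```
SctMaxRecentCount=30
```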

9.3.2.4 Access Modes and Data Reduction

The access mode used to access content items determines how those accesses are recorded in the SctAccessLog table. If content items are accessed through a service (that is, viewing the actual native file), the events are recorded in the SctAccessLog table in real time. In this case, the activity is recorded immediately and is not dependent on the reduction process.

If content items are accessed using static URLs (that is, viewing the Web location file), the Web server filter plug-in records the events in a static log file. During the data reduction process, the static log files for a specified date are gathered and the data is moved into the SctAccessLog table. In this case, if data reduction is not performed for a given date, there are no static URL records in the SctAccessLog and no evidence that these accesses ever occurred.

The difference in the way static and service accesses are processed has implications for interval counts. For example, a user might access a content item twice on Saturday, one time through the Web location file (static access) and one time through the native file (service access). The service access is recorded in the SctAccessLog table but the Web location access is not. If Sunday's data is reduced, only the service access (not the static access) is included in the summaries of the short and long access count intervals. However, if Saturday's data is also reduced, both the service and static accesses are recorded in the SctAccessLog table and included in both intervals.

9.3.2.5 Reduction Sequence for Event Logs

Data sets are usually reduced in chronological (calendar) order to ensure that the information included in reports is current. The order in which the raw data log files are reduced determines what specific user access data is logged and counted. During reduction, the SctAccessLog and user metadata database tables are modified with data from the raw data files.

When using the snapshot function to gather search relevance information, the metadata fields associated with the activated activity metrics are also updated during data reduction. The activity metrics use custom metadata fields included in the DocMeta database table.

Content Tracker changes the activity metrics values according to the applicable data in the reduction data set. To ensure that data values are complete and current, perform data reduction on a daily basis. If the data sets are reduced out of order, re-reducing the current or most recent data set corrects the counts. However, it is always preferable to consistently reduce data in calendar order.

The following scenarios show how the reduction sequence affects the stored data.

Scenario 1:

Depending on how content items are accessed, if activity on certain days (such as Saturdays and Sundays) is never reduced, then accesses that occur on those days might never be logged or counted. For more information, see Access Modes and Data Reduction. Similarly, if a content item is accessed on Tuesday and reductions are done for Monday and Wednesday, the Tuesday access might not be counted toward the last access of that content item.

Scenario 2:

If there was a significant increase in accesses in the last few days, and you reduce data from two weeks earlier, the long and short access metrics for content items do not reflect the recent activity. Instead, the interval values from two weeks earlier override today's values. Reducing the current or most recent data set corrects the counts.

The reduction order does not adversely affect the Last Access date. The reduction process only changes the Last Access date if the most recent access in the reduction data set is more recent than the current Last Access value in Content Server's DocMeta database table.

If you have reduced a recent data set and a particular content item had been accessed, the Last Access field is updated with the most recent access date in the reduction data set. If you then re-reduce an older data set, the older access date for this content item does not overwrite the current value.

Scenario 3:

Reducing the data sets in an arbitrary order interferes with the demotion of "recent" data files to "archive" data files. The movement of the associated table records is based on age; the archive tables are intended to store the oldest data. If the data sets are reduced in random order, it is not apparent which data is the oldest.

For more information about recent and archive data files, see User Metadata Tables and Data Reduction Cycles.

9.3.2.6 Reduction Schedules

You can configure reductions to run on a schedule so that the raw data is reduced periodically. A steady flow of raw data then moves into the recent and archive repositories, and a similarly steady flow of reduced data moves from the primary tables to the archive tables.

If the Content Tracker Data Engine is disabled the day before a scheduled reduction run, no data is collected. Even if it is enabled on the day of the scheduled reduction run, the scheduled reduction does not run because no data is available.

Data reductions scheduled for a given day are performed on data collected during the previous day. The previous day is defined as the 24-hour period beginning and ending at midnight (system time). To conserve CPU resources, you can schedule reduction runs for early morning hours when the system load is generally the lowest.

An error can be issued if the scheduled reduction is set to run within a few minutes after midnight. If this occurs, reschedule the reduction to run later.

9.3.2.7 Running Data Reduction Manually

To manually reduce data:

  1. Choose Administration then Content Tracker Administration from the main menu. Choose Data Engine Control Center.
  2. On the Data Engine Control Center: Reduction tab, click (to highlight) the set of input data to reduce. Information on this page includes the following:
    • Cycle: The status of the input data. Values include new (the input data has not been reduced), recent (data has been reduced but is not archived), and archive (data has been reduced and remains in archive cycle until deleted).

    • Available date: Date when the data was collected.

    • Status: The status of the reduction. Values include ready (input data is available to be reduced), running (data is being reduced), and archiving (data is being moved from recent to archive cycle).

    • Percent Done: The progress of the reduction cycle.

    • Completion Date: Date and time the reduction completed.

  3. Click Reduce Data.
  4. Click Yes to reduce the data.

    Note:

    If the current date's data is reduced, the status in the Cycle column stays as 'new' even though the data is reduced.

9.3.2.8 Setting Data Reduction to Run Automatically

To set data reduction to run automatically:

  1. Choose Administration then Content Tracker Administration from the main menu. Choose Data Engine Control Center.
  2. On the Data Engine Control Center: Schedule tab, select the Scheduling Enabled check box.
  3. Select check boxes for the days when data reduction occurs.
  4. Select the hour and minute when data reduction occurs.
  5. Click OK.

    Do not exit the applet. Wait until a confirmation message displays. If you exit before the confirmation message, the requested change(s) may not occur.

  6. Click OK when the confirmation message is displayed.

9.3.2.9 Deleting Data Files

To delete data files:

  1. Choose Administration then Content Tracker Administration from the main menu. Choose Data Engine Control Center.
  2. On the Data Engine Control Center: Reduction tab, click (to highlight) the set of input data to delete.
  3. Click Delete or Delete Archive.
  4. Click OK to delete the data.

9.3.3 Content Tracker Event Logs

Content Tracker supports multiple input files for different event log types and for configurations with multiple Web servers. Each Web server filter plug-in instance uses a unique tag as a file name suffix for the event logs. The unique suffix contains the Web server host name plus the server port number.

The reduction process searches for and merges multiple raw event logs named sctLog-yyyymmdd-myhostmyport.txt. The raw event logs are processed individually.
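As a sketch of the naming convention described above (the sample file name and the helper function are invented for illustration), the date and host/port suffix can be recovered from a raw event log name:

```python
import re

# Minimal sketch: parse the date and the host+port suffix out of a raw
# Content Tracker event log name of the form sctLog-yyyymmdd-<host><port>.txt.
# The sample name below is illustrative.
LOG_NAME = re.compile(r"sctLog-(\d{8})-(.+)\.txt$")

def parse_log_name(name):
    m = LOG_NAME.match(name)
    if not m:
        return None
    return {"date": m.group(1), "server": m.group(2)}

info = parse_log_name("sctLog-20240310-myhost8080.txt")
print(info)  # {'date': '20240310', 'server': 'myhost8080'}
```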

Content Tracker may not always capture a user name for a content access event, even if the user is logged into Content Server. In this case, the item was accessed with a static URL request and, in general, the browser does not provide a user name unless the Web server asks it to send the user's credentials. If the item is public content, the Web server does not ask the browser to send user credentials, and the user accessing the URL is unknown.

To record the user name for every document access, make sure the content is not accessible to the guest role. If the content is not public, the user's credentials are required to access the items and a user name is recorded in the raw event log entry.

Depending on Content Tracker configuration, when raw data log files in the new cycle are reduced, the Data Engine moves the data files into the following subdirectories:

  • The recent/ directory holds, by default, 60 sets (dates) of input data log files. When this number is exceeded, the oldest sets are moved to the archive/ directory.

    cs_root/data/contenttracker/data/recent/yyyymmdd/

  • By default, Content Tracker does not archive data. Instead, the expired rows are discarded to ensure optimal performance. If appropriately configured, Content Tracker uses the archive/ directory to hold all input data log files that were moved out of the "recent" cycle.

    cs_root/data/contenttracker/data/archive/yyyymmdd/

When raw data files are reduced, another file (reduction_ts-yyyymmdd.txt) is generated as a time stamp file.

9.3.4 Combined Output Table

The SctAccessLog table contains entries for all static and dynamic content access event records. The SctAccessLog table is organized using one line per event in the reporting period. The rows in the table are tagged according to type:

  • S indicates the records logged for service calls.

  • W identifies the records logged for static URL requests.

By default, Content Tracker does not log accesses to GIF, JPG, JS, CSS, CAB, and CLASS file types. Therefore, entries for these file types are not included in the combined output table after data reduction.

The Content Tracker Web server filter plug-in cannot distinguish between URLs for user content and those used by the user interface. References to UI objects, such as client.cab, can appear in the static access logs. To eliminate these false positives, use the SctIgnoreDirectories configuration variable to define a list of directory roots to be ignored by the Content Tracker filter. To log any of these file types, remove them from the default setting of the SctIgnoreFileTypes configuration variable (gif,jpg,js,css). For more information about using configuration variables, see Configuration Reference for Oracle WebCenter Content.
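For example (the directory roots shown are placeholders for your installation's UI paths), a config.cfg fragment might suppress UI directories while re-enabling logging of JS files by omitting js from the ignore list:

```
SctIgnoreDirectories=/cs/common/,/cs/images/
SctIgnoreFileTypes=gif,jpg,css
```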

The following list describes the information collected for each record in the SctAccessLog table, giving the column name, the type and size, and the column definition. By default, Content Tracker does not collect data to populate certain columns for bulky and rarely used items.

  • SctDateStamp (datetime): Local date when the data was collected, in YYYYMMDD format, depending on customer location and the time of day the event occurs. This may differ from the date recorded for eventDate. The time is set to 00:00:00. Data source: Internal.

  • SctSequence (int /8): Sequence unique to the entry type. Data source: Internal.

  • SctEntryType (char /1): Entry type. Values are W or S. Data source: Internal.

  • eventDate (datetime): GMT time and date when the request completed. The date depends on customer location and the time of day the event occurs. This may differ from the date recorded for SctDateStamp.

  • SctParentSequence (integer): Sequence of the outermost service event in the tree, if any.

  • c_ip (varchar /15): IP address of the client.

  • cs_username (varchar /255)

  • cs_method (varchar /10): HTTP method. For example, GET.

  • cs_uriStem (varchar /255): Stem of the URI.

  • cs_uriQuery (varchar /[maxUrlLen]): Query portion of the URI. For example, IdcService=GET_FILE&dID=42...

  • cs_host (varchar /255): Content Server server name.

  • cs_userAgent (varchar /255): Client user agent identifier. By default, this column contains either browser or the suffix of any string beginning with java:. This simplification ensures optimal performance for Content Tracker.

  • cs_cookie (varchar /[maxUrlLen]): Current cookie.

  • cs_referer (varchar /[maxUrlLen]): URL leading to this request.

  • sc_scs_dID (int /8): dID. Data source: from the query, or derived from the URL (reverse lookup).

  • sc_scs_dUser (varchar /50): dUser. Data source: Service DataBinder dUser.

  • sc_scs_idcService (varchar /255): Name of the IdcService. For example, GET_FILE. Data source: from the query or Service DataBinder IdcService.

  • sc_scs_dDocName (varchar /30): dDocName. Data source: from the query or Service DataBinder dDocName.

  • sc_scs_callingProduct (varchar /255): Arbitrary identifier. Data source: SctServiceFilter configuration file or Service DataBinder sctCallingProduct.

  • sc_scs_eventType (varchar /255): Arbitrary identifier. Data source: SctServiceFilter configuration file or Service DataBinder sctEventType.

  • sc_scs_status (varchar /10): Service execution status. Data source: Service DataBinder StatusCode.

  • sc_scs_reference (varchar /255): web, native, or sdc_url. The values indicate the rendition of the accessed file: web is a converted file (PDF), native is the original file, and sdc_url is HTML. Data source: determined algorithmically from the query parameters or the SctServiceFilter configuration file.

  • comp_username (varchar /50): Computed user name. If a service, obtained from the UserData Service Object, HTTP_INTERNETUSER, REMOTE_USER, or dUser. If a static URL, obtained from auth-user or internetuser.

  • comp_validRef (char /1): Indicates whether the referenced object exists and is available to the requesting user. Set to 1 if the access was a Web reference (W), ispromptlogin and isaccessdenied are both NULL, and the static URL exists at reduction time; or if the access was a service call (S) and the sc_scs_status field is NULL. Set to NULL if the static URL did not exist at reduction time, the user logon failed, or the logon succeeded but the user was not authorized to view the object.

  • sc_scs_isPrompt (char /1): 1 if true. Data source: plug-in immediateResponseEvent field ispromptlogin.

  • sc_scs_isAccessDenied (char /1): 1 if true. Data source: plug-in immediateResponseEvent field isaccessdenied.

  • sc_scs_inetUser (varchar /50): Internet user name (if a security problem occurred). Data source: plug-in immediateResponseEvent field internetuser.

  • sc_scs_authUser (varchar /50): Authorization user name (if a security problem occurred). Data source: plug-in immediateResponseEvent field auth-user.

  • sc_scs_inetPassword (varchar /8): Internet password (if a security problem occurred). Data source: plug-in immediateResponseEvent field internetpassword.

  • sc_scs_serviceMsg (varchar /255): Content Server service completion status. Data source: Service DataBinder StatusMessage.

  • extField_1 through extField_10 (varchar /255): General purpose columns used with the extended service tracking function. In the field map ResultSets, DataBinder fields are mapped to these columns.
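As a sketch of how this table might be queried (using an in-memory SQLite stand-in for the Content Server database; the sample rows and document names are invented for illustration), the most frequently accessed items can be counted from the valid references:

```python
import sqlite3

# Illustrative stand-in for the SctAccessLog table; only three of the
# columns described above are modeled, and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SctAccessLog (
        SctEntryType    CHAR(1),      -- 'S' (service) or 'W' (static URL)
        sc_scs_dDocName VARCHAR(30),  -- content item name
        comp_validRef   CHAR(1)       -- '1' if the reference was valid
    )""")
conn.executemany(
    "INSERT INTO SctAccessLog VALUES (?, ?, ?)",
    [("S", "DOC_A", "1"), ("W", "DOC_A", "1"),
     ("S", "DOC_B", "1"), ("W", "DOC_C", None)])

# Count only valid references, most frequently accessed items first.
rows = conn.execute("""
    SELECT sc_scs_dDocName, COUNT(*) AS accesses
    FROM SctAccessLog
    WHERE comp_validRef = '1'
    GROUP BY sc_scs_dDocName
    ORDER BY accesses DESC, sc_scs_dDocName""").fetchall()
print(rows)  # [('DOC_A', 2), ('DOC_B', 1)]
```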

9.3.5 Data Output

When Content Tracker is appropriately configured, the static and dynamic content access request information and all metadata fields are accessible. The logged metadata includes content item and user metadata:

9.3.5.1 Content Item Metadata

Content Tracker uses standard Content Server metadata tables for content item metadata.

9.3.5.2 User Metadata Tables

Content Tracker user metadata database tables are updated with information collected about active users during the reporting time period. These tables retain data about user profiles at the time the data reduction runs. The names of the user metadata tables are formed from a root which indicates the class of information contained, and an Sct prefix to distinguish the table from native Content Server tables.

By default, Content Tracker does not archive data so expired rows are not moved from the Primary tables to the Archive tables. Instead, the expired rows are discarded, ensuring optimal performance. Two complete sets of user metadata database tables are created:

  • Primary: Named SctUserInfo, and so on, which contain the output for reduction data in the new and recent cycles.

  • Archive: Named SctUserInfoArchive, and so on, which contain output for reduction data in the archive cycle.

If Content Tracker is configured to run archives, reduction data files are moved from recent to archive status and the associated table records are moved from the Primary table to the Archive table. This prevents excessive buildup of rows in the Primary tables, and ensures that queries performed against recent data complete quickly. Rows in the Archive table are not deleted. They can be moved or deleted using any SQL query tool. To delete all the rows in the Archive tables, delete the tables themselves. They are re-created during the next Content Server restart. Reports are not run against archive data.

The following tables are created:

  • The SctAccounts table contains a list of all accounts. It is organized using one line for each account.

  • The SctGroups table contains a list of all user groups current at the time of reduction. It is organized using one line per group.

  • The SctUserAccounts table contains entries for all users who are listed in the SctUserInfo table and are assigned accounts defined in the current instance. A separate entry exists for each user-account combination.

    In configurations with multiple proxy instances, Content Tracker may not be able to determine a user's group and account information. When the current instance is a proxy, the group information for an active user defined in a different proxy is replaced by a placeholder line in SctUserGroups for that user. The line contains the user name and a hyphen (-) as a placeholder for the group. If at least one account is defined in the current instance, a similar entry is created in SctUserAccounts for any user who is defined in a different proxy.

  • The SctUserGroups table is organized using one line for each user's group for each user active during the reporting period. It references those users who logged on during the data collection period. If Content Tracker is running in a proxied Content Server configuration, only groups defined in the current instance are listed. For example, a user named "joe" is defined in the master instance and has access to groups "Public" and "Plastics" in the master instance. If "joe" logs on to a proxy instance where the group "Plastics" is not defined, only the association between "joe" and "Public" appears in SctUserGroups.

  • The SctUserInfo table is organized using one line per user. It includes all users known to the current instance and additional users from a different instance who logged on to the current instance during the data collection period. In a proxied configuration, users local to one instance are usually visible from the UserAdmin application to other instances. If a user is defined locally with the same name in two instances, only the local user is visible in each of these instances.

    For example, the "sysadmin" defined in the master is not the "sysadmin" appearing in the UserAdmin application for a proxy. These two different users could both log in during the same data collection period. The user from the master logs on as "sysadmin" and the proxy user logs on as "cs_2/sysadmin" (for example). The SctUserInfo file generated for this period has separate entries for "sysadmin" and "cs_2/sysadmin".

9.3.5.3 Reduction Log Files

When data reduction is run, the Content Tracker Data Engine generates a summary results log file named reduction-yyyymmdd.log. The reduction logs can help diagnose data reduction errors.
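Because the log name encodes the reduction date, the files for a given period can be located programmatically. A minimal sketch, assuming the logs sit together in a single directory (the temporary directory below stands in for the actual log location):

```python
import tempfile
from datetime import date, datetime
from pathlib import Path

def reduction_logs_in_range(log_dir, start, end):
    """Return reduction log paths whose embedded yyyymmdd date is in [start, end]."""
    matches = []
    for path in Path(log_dir).glob("reduction-*.log"):
        stamp = path.stem.split("-", 1)[1]  # the yyyymmdd portion
        try:
            logged = datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            continue  # skip files that do not follow the naming convention
        if start <= logged <= end:
            matches.append(path)
    return sorted(matches)

# Demo with placeholder files standing in for real reduction logs.
tmp = tempfile.mkdtemp()
for name in ("reduction-20010501.log", "reduction-20020110.log", "notes.txt"):
    Path(tmp, name).touch()
found = reduction_logs_in_range(tmp, date(2001, 1, 1), date(2001, 12, 31))
print([p.name for p in found])  # ['reduction-20010501.log']
```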

9.3.6 Tracking Limitations

In some cases, Content Tracker has limitations in tracking data. This section provides an overview of those limitations.

9.3.6.1 Tracking Limitations in Single-Box Clusters

Currently, Content Tracker does not support multi-node clusters installed on a single server, even if multiple network cards are installed and each cluster node has its own IP address. In this case, the Content Server instance for each cluster node can successfully bind its IntradocServerPort to its specific IP address.

Unfortunately, only one cluster node is able to bind its Incoming Provider ServerPort to its specified IP address. Consequently, all of the cluster nodes share and alternately use the same Incoming Provider ServerPort. As a result, the SctLock provider for Content Tracker can only track document accesses on one cluster node at a time.

9.3.6.2 Static URLs and WebDAV

The access counts determined by Content Tracker are generally correct, but in some circumstances the software cannot determine if the content was actually delivered to the requesting user, or if it was, which revision of the content was delivered:

  • Repeated requests through WebDAV: If a user accesses a document with a WebDAV client and then re-accesses the same document later, only the first WebDAV request is recorded. Access counts reported for such content are usually lower than the actual number.

  • Static URLs: A user saves a URL for a content file, but the content is later revised in such a way that the saved URL is no longer valid. If the user attempts to access the content with the saved URL, an error occurs. Content Tracker records this as a successful access even though content was not delivered. Access count reports for such content are usually higher than the actual number.

  • Static URLs and wrong dID: If a user accesses content using a URL and the content is revised or the security group is changed before the Content Tracker data reduction operation is performed, the user is reported as seeing the latest revision. Access counts reported for such content are usually attributed to a newer revision than the one actually delivered. To minimize this effect, schedule or run data reductions on a regular basis.

This section covers the following topics:

9.3.6.2.1 Wrong dID Reported for Access by Saved Static URL

Scenario: User accesses content via the "Web Location" (URL). The content is then revised before the Content Tracker data reduction operation is performed. The user is reported as seeing the latest revision, not the revision that the user actually saw. Access counts reported for such content tend to be attributed to a newer revision than actual. Minimize this effect by scheduling or running Content Tracker data reductions on a regular basis.

Details: This is related to False Positive for Access by Saved (stale) Static URL, described in the next section. That is, the web server uses the entire web location (for example, DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.doc) to locate and deliver the content, while Content Tracker uses only the Content ID portion to determine the dID and dDocName values. Moreover, Content Tracker makes this determination during data reduction, not at the time the access actually occurs.

Some implications of this are not immediately obvious, such as when the group and/or security of the revision changes from the original. For example, if a user accesses "Public" Revision 1 of a document through a static URL, and the document is subsequently revised to Revision 2 and changed to "Secure" before the Content Tracker data reduction takes place, Tracker reports that the user saw the Secure version. This may also occur when the content file type changes. If the user accesses an original .xml version, which is then superseded by an entirely different .doc before the data reduction is performed, Tracker reports that the user saw the .doc revision, not the actual .xml version.

9.3.6.2.2 False Positive for Access by Saved (stale) Static URL

Scenario: User saves a "Web Location" (URL) for a content file. The content is subsequently revised in such a way that the saved URL is no longer valid. The user then attempts to access the content through the (now stale) URL, and gets a "Page Cannot be Found" error (HTTP 404). Content Tracker may record this as a successful access even though the content was not actually delivered to the user. Access counts reported for such content tend to be higher than actual.

Details: The "Web Location" of a content file is the means by which a user can access content via a "static URL". The specific file path in the URL is used in two slightly different contexts: it is used by the web server to locate the content file in the Content Server repository, and it is also used by Content Tracker to determine the dID and dDocName of the content file during the data reduction process. The problem occurs when the content is revised in such a way that the web location for a given Content ID changes between the time the URL is saved and the time the access is attempted.

For example, if a Word document is checked in and then revised to an XML equivalent, the web location for the latest revision of the content changes from the first path shown below to the second, where "xyzzy" is the assigned Content ID.

DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.doc

DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.xml

The original revision is renamed as:

DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy~1.doc

This means the original Web Location no longer works as a static URL. The Content ID obtained from the original URL, however, matches the latest revision.
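The mapping from a saved static URL back to a Content ID can be sketched as follows. This is a simplified illustration of the behavior described above, not Content Tracker's actual parsing code: the directory, the file extension, and any ~n rename suffix are dropped, so every variant of the path resolves to the same Content ID.

```python
import re
from pathlib import PurePosixPath

def content_id_from_static_url(url):
    """Derive the Content ID from a "Web Location" style static URL.

    Simplified sketch: drop the directory, the file extension, and any
    ~n rename suffix that Content Server appends to superseded revisions.
    """
    stem = PurePosixPath(url).stem      # e.g. 'xyzzy~1'
    return re.sub(r"~\d+$", "", stem)   # e.g. 'xyzzy'

# All three web locations resolve to the same Content ID, which is why
# Content Tracker attributes the access to the latest revision.
for url in (
    "DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.doc",
    "DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.xml",
    "DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy~1.doc",
):
    print(content_id_from_static_url(url))  # xyzzy each time
```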

9.3.6.2.3 Missed Accesses for Content Repeatedly Requested via WebDAV

Scenario: User accesses a document via a WebDAV client, then accesses the same document in the same manner later. Only the first WebDAV request for the document is recorded. Access counts reported for such content tend to be lower than actual.

Details: WebDAV clients typically use some form of object 'caching' to reduce the amount of network traffic. If a user requests a particular object, the client first determines if it already has a copy of the object in a local store. If it does not, the client contacts the server and negotiates a transfer. This transfer is recorded as a COLLECTION_GET_FILE service request.

If the client already has a copy of the object, it contacts the server to determine whether the object has changed since the local copy was obtained. If it has changed, a new copy is transferred and the COLLECTION_GET_FILE service request is recorded.

If the client copy of the object is still current, then no transfer takes place, and the client presents the saved copy of the object to the user. In this case, the content access is not counted even though the user appears to get a "new" copy of the original content.
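The caching behavior described above can be modeled with a toy client cache. Only transfers that actually reach the server would show up as COLLECTION_GET_FILE requests; cache hits are invisible to Content Tracker, which is why reported counts run low:

```python
class WebDAVClientCache:
    """Toy model of a WebDAV client-side cache (not a real WebDAV client)."""

    def __init__(self, server):
        self.server = server        # maps document name -> current revision label
        self.local = {}             # document name -> locally cached revision
        self.recorded_accesses = 0  # what Content Tracker would see

    def request(self, doc):
        current = self.server[doc]
        if self.local.get(doc) == current:
            return self.local[doc]  # served from cache: no access recorded
        self.local[doc] = current   # transfer: counted as COLLECTION_GET_FILE
        self.recorded_accesses += 1
        return current

server = {"report.doc": "rev1"}
cache = WebDAVClientCache(server)
cache.request("report.doc")        # first request: transfer, counted
cache.request("report.doc")        # repeat: cache hit, NOT counted
server["report.doc"] = "rev2"
cache.request("report.doc")        # changed on server: transfer, counted
print(cache.recorded_accesses)     # 2, although the user opened the file 3 times
```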

9.3.6.3 Data Directory Protections

Content Tracker's Web server filter plug-in runs in the authorization context of the user whose access request is being processed. In some cases, the owner of the request processing thread is a system account. In others, it is a requesting user or another type of non-system account used by the application.

The filter records the information in raw event logs. If the log file does not exist, a new one is created using the default protection and authorization credentials of the user who owns the event thread. If that user account has write permission to the data directory, the content access data is recorded. Otherwise, the logging request fails and the access event details are not recorded.

To ensure that Content Tracker can properly record user access requests, the data directory must be configured to accept the account authorization credentials for all users. Granting world write permission (or the equivalent) is one method. Allowing unlimited write access is recommended unless security concerns prohibit this level of unrestricted access.
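On a POSIX system, granting world write permission might look like the following sketch; the temporary directory stands in for Content Tracker's actual data directory, whose location depends on your installation:

```python
import os
import stat
import tempfile

# Stand-in for Content Tracker's data directory.
data_dir = tempfile.mkdtemp()

# Grant world write (plus the usual owner/group bits) so any request
# thread, regardless of the owning account, can append to the event logs.
os.chmod(data_dir, 0o777)

mode = stat.S_IMODE(os.stat(data_dir).st_mode)
print(oct(mode))  # 0o777
```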

9.3.6.4 ExtranetLook Component

The ExtranetLook component (if enabled) allows customizations of cookie-based login forms and pages for anonymous-type users. The component uses a built-in Web server plug-in that monitors requests and determines if a request is authenticated based on cookie settings. When a user requests access to a content item, Content Tracker must function within the authorization context of the user's account.

After collecting the access information, Content Tracker tries to record the event data in the log file. If the user's account permissions allow access to Content Tracker's data directory, then the request activity is logged. However, if the account does not have write authorization, the logging request fails and the request activity is not recorded.

9.4 Data Tracking Functions

This section describes the different data tracking functions available with Content Tracker:

9.4.1 Activity Snapshots

The activity snapshots feature captures user metadata that is relevant for each recorded content item access:

9.4.1.1 Search Relevance Metrics

When activated, the activity metrics and corresponding metadata fields provide search relevance information about user accesses of content items. An optional automatic load function allows users to update the last access activity metric to ensure that checked-in content items are appropriately time-stamped.

Content Tracker optionally fills the search relevance custom metadata fields with content item usage information that indicates the popularity of particular content items. This information includes the date of the most recent access and the number of accesses in two distinct time intervals.

Information generated from these activity metrics functions is used in various ways. For example, you can order search results according to which content items have been recently viewed or the most viewed in the last week.

If the snapshot function is activated, the values in the search relevance metadata fields are updated during a post-reduction step. During this processing step, Content Tracker uses SQL queries to determine which content items have changed activity metrics values. Content Tracker updates the applicable database tables with the new values and initiates a re-indexing cycle. However, only the content items that have changed metadata values are re-indexed.
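The post-reduction step can be pictured as a compare-and-update pass over DocMeta. The sketch below uses an in-memory SQLite database and a hypothetical xShortAccess column; the real queries and schema are internal to Content Tracker:

```python
import sqlite3

# Stand-in for the DocMeta table with one search relevance metadata field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocMeta (dID INTEGER PRIMARY KEY, xShortAccess INTEGER)")
conn.executemany("INSERT INTO DocMeta VALUES (?, ?)", [(1, 5), (2, 0), (3, 7)])

# Newly tabulated short-interval access counts from a reduction run
# (illustrative values): only dID 2 has actually changed.
new_counts = {1: 5, 2: 3, 3: 7}

# Find items whose stored metric differs from the new value...
changed = [
    dID for dID, count in new_counts.items()
    if conn.execute("SELECT xShortAccess FROM DocMeta WHERE dID = ?",
                    (dID,)).fetchone()[0] != count
]
# ...and update only those rows; only these items would be re-indexed.
for dID in changed:
    conn.execute("UPDATE DocMeta SET xShortAccess = ? WHERE dID = ?",
                 (new_counts[dID], dID))
conn.commit()

print(changed)  # [2]
```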

9.4.1.2 Enabling the Snapshot Function

To use these optional features, first enable the snapshot post-processing function which activates the activity metrics choices. Then selectively enable the activity metrics and assign their preselected custom metadata fields.

To enable the snapshot function and activate the activity metrics:

  1. Choose Administration then Content Tracker Administration from the main menu. Choose Data Engine Control Center.
  2. On the Data Engine Control Center: Snapshot tab, select Enable Snapshot.
  3. Click OK.
  4. In the confirmation window, click OK.
9.4.1.3 Creating the Search Relevance Metadata Fields

Before implementing the snapshot function, decide which custom metadata fields to associate with each of the enabled activity metrics. The custom metadata fields must already exist and must be of the correct type. Depending on which activity metrics are to be enabled, create one or more custom metadata fields using the applicable procedure.

Add the following specific information for the activity metrics:

  • Last Access Metric

    • Field Type: Date

    • Default Value: Optional. If not specified, the field is not populated until a content item is checked in and a data reduction is run. Some applications require a default value; in those cases, enter a value in the Default Value field that ensures the Last Access field is populated with the date and time of the content check-in. For more information, see Setting a Check-in Time Value for the Last Access Field.

    • Enable for Search Interface: Optional. Check to make the field available for searching.

  • Short and Long Access Metric

    • Field Type: Integer

    • Enable for Search Interface: Optional. Check to make the field available for searching.

Indexing a custom metadata field is optional, although indexing makes searches on this field more efficient. Indexing also allows users to query the accumulated search relevance statistics and generate useful data. For example, you can create a list of content items ordered by their popularity, and so on.

9.4.1.4 Setting a Check-in Time Value for the Last Access Field

The Last Access Date field is normally updated by Content Tracker when a managed object is requested by a user and a data reduction is run. The field can remain empty (NULL) until the next data reduction is run. Some applications require that the date and time of content check-in be recorded immediately in the Last Access field.

Use any of the following methods to populate the Last Access field:

  • Using the Configuration Manager: When adding the metadata field, enter an expression that populates the field with the date and time of content check in (for example, a default value of <$dateCurrent()$> populates the field with the current check-in date and time). After setting the value, fill the field for existing content using the Autoload option.

  • Using the Autoload option: This option allows retroactive replacement of NULL values in the Last Access field with the current date and time. The only records affected are those where the Last Access metadata field is empty (NULL).

    1. Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.

    2. Click the Data Engine Control Center: Snapshot tab.

    3. Select one or more of the activity metric check boxes to enable them. Enter the name of the custom metadata field to be linked to the activity metric (for example, xLastAccess, xShortAccess, or xLongAccess).

    4. Select the Autoload check box.

    5. Click OK.

      A confirmation dialog box opens and the current date and time are inserted into the applicable Last Access fields (those with NULL values) in the DocMeta database table.

      Please note:

      • Autoload is primarily intended for use with applications that count check-in operations as an access activity.

      • Autoload backfills the current date and time for all existing content that does not have a date value in the Last Access field. Any content checked in after the Last Access field is defined should have the field automatically populated with the check-in date and time as a default value.

      • Running Autoload can affect every record in the DocMeta database table. Use this option sparingly.

      • The only DocMeta records affected are those where the Last Access metadata field is empty (NULL).

      • Autoload is persistent. The state of the Autoload check box is saved with all the other Snapshot settings. To prevent inadvertent use of this option, clear the Autoload check box and re-save activity metrics field settings immediately after performing the autoload function.

      • Content Server's indexer is not automatically run after Autoload completes the update. You must decide when to rebuild the collection.

      • By default, the Autoload query sets the Last Access metadata field to the current date and time. You can customize the query as needed.
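The default Autoload behavior amounts to a single conditional UPDATE: backfill the current date and time only where the field is NULL. A sketch against an in-memory SQLite stand-in for DocMeta, using the example field name xLastAccess from above (the real Autoload query is internal to Content Tracker):

```python
import sqlite3
from datetime import datetime

# Stand-in for the DocMeta table: one row already has a Last Access date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocMeta (dID INTEGER PRIMARY KEY, xLastAccess TEXT)")
conn.executemany("INSERT INTO DocMeta VALUES (?, ?)",
                 [(1, "2001-08-15 09:00:00"), (2, None), (3, None)])

# Backfill only the NULL Last Access values, as Autoload does by default;
# existing dates are left untouched.
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
cur = conn.execute("UPDATE DocMeta SET xLastAccess = ? WHERE xLastAccess IS NULL",
                   (now,))
conn.commit()

print(cur.rowcount)  # 2 rows backfilled
```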

9.4.1.5 Populating the Last Access Field for Batch Loads and Archives

To ensure proper retention of archived and batch loaded content, set the Last Access field date for the import/insert. Otherwise the access date for these content items is NULL, and retention based on this field fails. Also consider how the date can affect retention management. For example, an import of 1998 data is probably better tagged with that date than the date when the import was performed to accurately reflect the retention quality of the content.

The name of the Last Access field is based on the name specified when the field was created. For example, if the name Last Access is used, xLastAccess would be used in the import/insert.

For more information about using the Batch Loader utility, see Administering Oracle WebCenter Content.

The following steps provide a general outline of the procedure to populate the Last Access field using Batch Loader:

  1. Access the Batch Loader.
  2. Create a record that establishes an appropriate Last Access date. For example:
    # This is a comment
    Action=insert
    dDocName=Sample1
    dDocType=ADACCT
    xLastAccess=5/1/1998
    dDocTitle=Batch Load record insert example
    dDocAuthor=sysadmin
    dSecurityGroup=Public
    primaryFile=links.doc
    dInDate=8/15/2001
    <<EOD>>
    
  3. Run the Batch Loader to process the file record.
9.4.1.6 Linking Activity Metrics to Metadata Fields

After the activity metrics options have been activated, they must be individually selected to enable them. Enabling the activity metrics also activates their corresponding custom metadata fields.

To enable the activity metrics and activate their corresponding custom metadata fields:

  1. Choose Administration then Content Tracker Administration from the main menu. Choose Data Engine Control Center.
  2. Click the Data Engine Control Center: Snapshot tab.
  3. Select one or more of the activity metric check boxes to enable them. Enter the name of the custom metadata field to be linked to the activity metric (for example, xLastAccess, xShortAccess, or xLongAccess).
  4. For the Short and Long Access Counts, enter the applicable interval amounts in days. For example, 7 days for the Short Access Count and 28 days for the Long Access Count.

    The two Access Count metrics differ only in the accounting period (for example, last 30 days versus last 90 days, last week versus last year, and so on). The time intervals specified in the activity metrics are independent of each other. For example, you can set the number of days in the first interval period (Short Access) to more than those in the second interval period (Long Access).

    Access counts are only tabulated for reduced dates. If data is not reduced for one or more days, the accesses on those days are not logged or counted. Do not reduce data in random order because the Access Count metrics are affected by the reduction date order.

  5. Click OK when done.
  6. In the confirmation window, click OK.
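The interval accounting above can be sketched as a sum over per-day totals from reduced dates only; days that were never reduced simply have no entry to count, which is why unreduced days lower the reported totals:

```python
from datetime import date, timedelta

def access_counts(daily_accesses, today, short_days, long_days):
    """Tabulate short- and long-interval access counts from per-day totals.

    daily_accesses maps a reduced date to its access total; accesses on
    unreduced days never appear here and so are never counted.
    """
    short_start = today - timedelta(days=short_days)
    long_start = today - timedelta(days=long_days)
    short = sum(n for d, n in daily_accesses.items() if short_start <= d <= today)
    long_ = sum(n for d, n in daily_accesses.items() if long_start <= d <= today)
    return short, long_

# Illustrative reduced-day totals: 7-day and 28-day windows as in step 4.
reduced = {date(2024, 5, 1): 4, date(2024, 5, 20): 2, date(2024, 5, 26): 1}
print(access_counts(reduced, date(2024, 5, 27), 7, 28))  # (3, 7)
```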

Note that the fields are case-sensitive. Make sure all field values are spelled and capitalized correctly.

Content Tracker uses the following error checks to validate each enabled activity metric field value:

  • Checks the DocMeta database table to ensure that the custom metadata field actually exists.

  • Ensures that the custom metadata field is of the correct type (for example, that the Last Access metadata field is of type Date, and so on).

  • Checks to explicitly exclude the dID metadata field.
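These checks can be pictured as follows; the field definitions here are a hypothetical stand-in for the real DocMeta schema lookup, and the error strings are illustrative only:

```python
# Hypothetical stand-in for the DocMeta schema (field name -> field type).
# Field names are case-sensitive, matching the note above.
docmeta_fields = {
    "xLastAccess": "Date",
    "xShortAccess": "Integer",
    "xLongAccess": "Integer",
}

def validate_metric_field(name, required_type):
    """Mirror the three error checks: dID exclusion, existence, and type."""
    if name == "dID":
        return "dID is explicitly excluded"
    if name not in docmeta_fields:
        return f"{name} does not exist in DocMeta"
    if docmeta_fields[name] != required_type:
        return f"{name} is {docmeta_fields[name]}, expected {required_type}"
    return "ok"

print(validate_metric_field("xLastAccess", "Date"))  # ok
print(validate_metric_field("dID", "Integer"))       # dID is explicitly excluded
print(validate_metric_field("xMissing", "Date"))     # xMissing does not exist in DocMeta
```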

9.4.1.7 Editing the Snapshot Configuration

To modify the snapshot activity metrics settings:

  1. Choose Administration then Content Tracker Administration from the main menu. Choose Data Engine Control Center.
  2. Click the Data Engine Control Center: Snapshot tab.
  3. Make the necessary changes in the activity metrics fields.
  4. Click OK.
  5. In the confirmation window, click OK.

9.4.2 Service Calls

Content Tracker enables the logging of service calls with data values relevant to the associated services. Every service to be logged must have a service entry in the service call configuration file (SctServiceFilter.hda). In addition to the logged services, you can include the corresponding field map ResultSets in the SctServiceFilter.hda.

For more information about managing service calls, see Developing with Oracle WebCenter Content.

9.4.3 Web Beacon Functionality

Important:

The implementation requirements for the Web beacon feature depend on the system configurations involved; not all factors can be addressed in this documentation. The access records collected and processed by Content Tracker are an indication of general user activity, not exact counts.

A Web beacon is a managed object that facilitates specialized tracking support for indirect user accesses to Web pages or other managed content. In earlier releases, Content Tracker was unable to gather data from cached pages and pages generated from cached services. When users access cached Web pages and content items, Content Server and Content Tracker are unaware that these requests ever happened. Without using Web beacon referencing, Content Tracker does not record and count such requests.

Web beacon functionality uses client-side embedded references, which are invisible references to the managed beacon objects within Content Server. With these references, Content Tracker can record and count user access requests for managed content items that have been copied by an external entity for redistribution, without the content being obtained directly from Content Server.

Two situations in particular merit the use of Web beacon functionality: reverse proxy activity and when using Site Studio.

In a reverse proxy scenario, the reverse proxy server is positioned between the users and Content Server. The reverse proxy server caches managed content items by making a copy of requested objects. The next time a user asks for the same document, the proxy serves its copy from the private cache. If the reverse proxy server does not have the object in its cache, it requests a copy from Content Server.

Because it is delivering cached content, the reverse proxy server does not directly interact with Content Server. Therefore, Content Tracker cannot detect these requests and does not track this type of user access activity.

A reverse proxy server is often used to improve Web performance by caching or by providing controlled Web access to applications and sites behind a firewall. Such a configuration provides load balancing by moving copies of frequently accessed content to a Web server where it is updated on a scheduled basis.

For the Web beacon feature to work, each user access includes an additional request to the managed beacon object in Content Server. The additional request adds overhead, but the Web beacon object is very small and does not significantly interfere with the reverse proxy server's performance. Note that it is only necessary to embed the Web beacon references in objects you specifically want to track.

If your website is intended for an external audience, you may decide to create a copy of the site and transfer it to another server. This arrangement allows the site to be viewed publicly while keeping site development separate from the production site. In this arrangement, however, implement the Web beacon feature to ensure that Content Tracker can collect and process user activity.

For more information about managing Web beacon objects, see Developing with Oracle WebCenter Content.