10 Tracking Content Access

Content Tracker and Content Tracker Reports are optional components that are automatically installed with Oracle WebCenter Content Server. They are separate modules, but when enabled they provide information about system usage, such as which content items are accessed most frequently and which content is most valuable to users or specific groups. Knowing the consumption patterns of an organization's content enables more effective delivery of appropriate, user-centric information.

For detailed information about customizing Content Tracker, see Oracle Fusion Middleware Developing with Oracle WebCenter Content.

This chapter covers the following topics:

Section 10.1, "Understanding the Content Tracker Functionality"
Section 10.2, "Operational Details"
Section 10.3, "Data Tracking Functions"
Section 10.4, "Content Tracker Reports"

10.1 Understanding the Content Tracker Functionality

Content Tracker monitors activity on a Content Server instance and records selected details of those activities. It then generates reports that illustrate the way the system is being used. This section provides an overview of Content Tracker and Content Tracker Reports functionality.

Content Tracker incorporates several optimization functions which are controlled by configuration variables. The default values for the variables set Content Tracker to function as efficiently as possible for use in high volume production environments. For more information about Content Tracker configuration variables, see Oracle Fusion Middleware Configuration Reference for Oracle WebCenter Content.

This section covers the following topics:

Section 10.1.1, "Content Tracker and Content Tracker Reports"
Section 10.1.2, "Data Recording, Reduction, and Reporting"
Section 10.1.3, "Content Tracker Terminology"
Section 10.1.4, "Installation Considerations"

10.1.1 Content Tracker and Content Tracker Reports

Content Tracker monitors a system and records information about its activities, collected from different sources. The information is merged and written to a set of tables in the Content Server database. Content Tracker can monitor activity based on:

Content item usage: Data is obtained from Web filter log files, the Content Server database, and other external applications such as portals and websites. Content item access data includes dates, times, content IDs, and current metadata.
Services: Services that return content, and services that handle search requests, are tracked. By default, Content Tracker logs only the services that have content access event types, but by changing the configuration you can have Content Tracker monitor any service, including custom services.
User accesses: Information is gathered about non-content access events, such as the collection and synthesis of user profile summaries. This data includes user names and user profile information.

After Content Tracker extracts data and populates database tables, you can use Content Tracker Reports to:

Generate reports: Content Tracker Reports queries the tables and generates summary reports of activities and usage history of content items. The reports help analyze specific groups of content or users based on metadata, file extensions, or user profiles. Use the pre-defined reports, customize them, or use a compatible third-party reporting package.
Optimize content management practices: You can use the reported data for content retention management. Depending on the access frequency, some items could be archived or deleted. Applications can also use the data to provide portlets with the top content for particular types of users.

10.1.2 Data Recording, Reduction, and Reporting

Content Tracker records data from the following sources:

Web server filter plug-in: When content is requested with a static URL, the Web server filter plug-in records request details, saving the information in event log files. The event log files are used as input by the Content Tracker data reduction process.
Service handler filter: Content Tracker monitors services that return content. When one of these services is called, details of the service are copied and saved in the SctAccessLog table.
Logging service: Content Tracker supports a single-service call to log an event. You can call it directly with a URL, as an action in a service script, or from Idoc Script.
Database tables: When configured to collect and process user profile information, a data reduction process queries selected database tables to obtain information about active users during the reporting period.
Application API: An interface is available to register other components and applications for tracking. This interface allows cooperating applications, such as Site Studio, to log event information in real time. The application API is designed as a code-to-code call which does not involve a service. The API is not meant for general use. If you are building an application and are interested in using this interface, contact Consulting Services.

The data reduction process gathers and merges the data obtained from the data recording sources. Until this reduction process has finished, the data in the Content Tracker tables is incomplete. The reduction is run once for each day's data. You can run the reduction manually, or schedule it to run automatically, usually during an off-peak period when the system load is light.

Content Tracker Reports provide reports that answer commonly asked questions about activity and usage. Depending on how Content Tracker is configured, these reports can also indicate which searches are used most often, and which users have been most active. The reports are available directly through an interface or as an action on the Content Information page. The available categories of pre-defined report options depend on Content Tracker configuration. The reports, the underlying queries, and the output formatting can be customized.

10.1.3 Content Tracker Terminology

The following terminology is used with Content Tracker:

Data collection: Gathering content access information and writing the information to event log files.
Data reduction: Processing the information from data collection and merging it into a database table.
Data Engine Control Center: The interface that provides access to the user-controlled functions of the Data Engine. It has the following tabs:
- Collection: Used to enable data collection.
- Reduction: Used to stop and start data reduction (merging data into database tables).
- Schedule: Used to enable automatic data reduction.
- Snapshot: Used to enable activity metrics. The term snapshot also denotes an information set representing the world at a particular time.
- Services: Used to add, configure, and edit service calls to be logged. It is also used to define the specific event details logged for a given service.
Service definitions: The ResultSet structure in the service call configuration file (SctServiceFilter.hda) that contains entries to define each service call to be logged. The service definition ResultSet is named ServiceExtraInfo.
Service entry: The entry in the service definition ResultSet (ServiceExtraInfo) that defines a service call to be logged. The ServiceExtraInfo ResultSet contains one service entry for each service to be logged.
Field map: A secondary ResultSet in the service call configuration file (SctServiceFilter.hda) that defines the service call data and the specific location where data is to be logged.
Top Content Items: Most frequently accessed content items in the system.
Content Dashboard: A page that provides overview information about the access of a specific content item.

10.1.4 Installation Considerations

Content Tracker is supported on most hardware and networked configurations but some hardware and software combinations require special consideration:

Section 10.2.6.1, "Tracking Limitations in Single-Box Clusters"
Section 10.2.6.2, "Static URLs and WebDAV"
Section 10.2.6.4, "ExtranetLook Component"
Section 10.3.1.1, "Search Relevance Metrics"

Set the SctUseGMT configuration variable to true to use Greenwich Mean Time (GMT). It is set to false by default, to use local time. For more information about configuration variables see Oracle Fusion Middleware Configuration Reference for Oracle WebCenter Content.
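The choice between local time and GMT matters because a single access can fall on different calendar dates depending on which clock records it. The following minimal Python sketch assumes a hypothetical site whose local time is a fixed UTC-7 offset; it ignores daylight saving time and does not read SctUseGMT. It only shows how the local date and the GMT date of one event can differ:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical site clock: a fixed UTC-7 offset (daylight saving time not modeled).
LOCAL_TZ = timezone(timedelta(hours=-7))


def date_stamps(local_event: datetime) -> tuple:
    """Return (local YYYYMMDD, GMT YYYYMMDD) for one access event."""
    gmt_event = local_event.astimezone(timezone.utc)
    return local_event.strftime("%Y%m%d"), gmt_event.strftime("%Y%m%d")


# A late-evening local access is already on the next day in GMT.
late = datetime(2024, 3, 10, 23, 30, tzinfo=LOCAL_TZ)
print(date_stamps(late))  # ('20240310', '20240311')
```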

When you upgrade from an earlier version of Content Tracker, there is a one-time retreat (or advance, depending on location) in access times. Because of the twice-yearly daylight saving time changes, recorded user access times also contain discontinuities (depending on whether local time is used and on the location).

10.2 Operational Details

Depending on Content Tracker configuration, it can perform Data Collection of event information such as dynamic and static content accesses and service calls. Both types of data are recorded in a Combined Output Table (SctAccessLog). Service calls are inserted into the log in real time but the static URL information must first undergo the Data Reduction process (either manual or scheduled).

After activity data is collected, Content Tracker combines, analyzes, and synthesizes the event information and loads the summarized activity into database tables. After reduction, this data becomes available to Content Tracker Reports. This section covers the following topics:

Section 10.2.1, "Data Collection"
Section 10.2.2, "Data Reduction"
Section 10.2.3, "Content Tracker Event Logs"
Section 10.2.4, "Combined Output Table"
Section 10.2.5, "Data Output"
Section 10.2.6, "Tracking Limitations"

10.2.1 Data Collection

Data collection is the initial step in any tracking function. Content Tracker data collection includes collecting information from static URL references and service call events. Data is collected using several different methods:

Section 10.2.1.1, "Service Handler Filter"
Section 10.2.1.2, "Web Server Filter Plug-in"
Section 10.2.1.3, "Logging Service"

10.2.1.1 Service Handler Filter

Using the service handler filter, Content Tracker obtains information about dynamic content requests that come through the Web server, and also about other types of activity, such as calls from applications. The service request details are obtained from the DataBinder that accompanies the service call, and the information is stored in the Combined Output Table (SctAccessLog) in real time.

The SctServiceFilter.hda configuration file is used to determine which service calls are logged. It uses a ResultSet structure that includes one service definition entry for each service to be logged. When using the extended service logging function, the file also contains field maps corresponding to service definition entries.

The ServiceExtraInfo ResultSet is included in the SctServiceFilter.hda file. This ResultSet contains one or more service entries defining the services to be logged. Additional field map ResultSets are used to support the extended service logging function. Each service that has additional data values tracked must have a field map ResultSet in the SctServiceFilter.hda file to define the data fields, locations, and database destination columns for the related service.
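To make the relationship between service entries and field maps concrete, here is a minimal Python sketch of how a service handler filter could decide whether to log a service call and which SctAccessLog columns to fill. The ServiceEntry class, its attributes, the dictionary-based field map, and the example values are hypothetical stand-ins for the ServiceExtraInfo and field map ResultSets; they do not reproduce the actual SctServiceFilter.hda layout or Content Tracker's code:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServiceEntry:
    """Hypothetical model of one service entry in ServiceExtraInfo."""
    service_name: str
    calling_product: str
    event_type: str
    reference: str
    # Hypothetical field map: DataBinder field name -> extField_N column.
    field_map: Dict[str, str] = field(default_factory=dict)


def build_log_row(entries: Dict[str, ServiceEntry],
                  binder: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return SctAccessLog column values, or None if the service is not logged."""
    entry = entries.get(binder.get("IdcService", ""))
    if entry is None:
        return None  # Service not named in the filter configuration.
    row = {
        "SctEntryType": "S",  # Service call record.
        "sc_scs_idcService": entry.service_name,
        # In this sketch, descriptive fields come only from the service entry.
        "sc_scs_callingProduct": entry.calling_product,
        "sc_scs_eventType": entry.event_type,
        "sc_scs_reference": entry.reference,
        "sc_scs_dID": binder.get("dID", ""),
        "sc_scs_dDocName": binder.get("dDocName", ""),
    }
    # Extended service logging: copy mapped DataBinder values into extField columns.
    for binder_field, column in entry.field_map.items():
        if binder_field in binder:
            row[column] = binder[binder_field]
    return row


# Hypothetical configuration: log GET_FILE and record the security group in extField_1.
entries = {"GET_FILE": ServiceEntry("GET_FILE", "Core", "ContentAccess", "native",
                                    {"dSecurityGroup": "extField_1"})}
binder = {"IdcService": "GET_FILE", "dID": "42", "dDocName": "DOC_001",
          "dSecurityGroup": "Public"}
print(build_log_row(entries, binder)["extField_1"])  # Public
```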

10.2.1.2 Web Server Filter Plug-in

Managed content retrieved with a static URL does not usually invoke a service. The Content Tracker Web server filter plug-in collects the access event details (static URL references) and records them in raw Content Tracker Event Logs (sctlog files). The information in these files requires an explicit reduction (either interactive or scheduled) before it is included in the Combined Output Table with the service call data.

10.2.1.3 Logging Service

The logging service is a single-service call that can be called directly with a URL or as an action in a service script. It can also be called from Idoc Script using the executeService function. The calling application is responsible for setting the fields to be recorded in the service DataBinder, including the descriptive fields listed in the Content Tracker service filter configuration file (SctServiceFilter.hda).

There should be no duplication or conflicts between services logged with the service handler filter and those logged with the Content Tracker logging service. Services named in the Content Tracker service handler filter file are logged automatically, so there is no need to log them again with the logging service. However, Content Tracker does not attempt to prevent such duplication.

10.2.1.4 Enabling or Disabling Data Collection

To enable or disable data collection:

Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.
On the Data Engine Control Center: Collection tab, select (to enable collection) or clear (to disable collection) the Enable Data Collection check box.
Click OK.

Do not exit the applet. Wait until a confirmation message displays. If you exit before the confirmation message, the requested change(s) may not occur.
After the confirmation message is displayed, click OK.
Restart the Web server and Content Server, in that order.

10.2.2 Data Reduction

During data reduction, the static URL information captured by the Web server filter plug-in is merged with the service call data and written into the output table. Depending on configuration, the Content Tracker user metadata database tables are also updated at reduction time with information collected from the static URL accesses and from the service call event records:

Section 10.2.2.1, "Standard Data Reduction Process"
Section 10.2.2.2, "Data Reduction Process with Activity Metrics"
Section 10.2.2.3, "Data Reduction Cycles"
Section 10.2.2.4, "Access Modes and Data Reduction"
Section 10.2.2.5, "Reduction Sequence for Event Logs"
Section 10.2.2.6, "Reduction Schedules"
Section 10.2.2.7, "Running Data Reduction Manually"
Section 10.2.2.8, "Setting Data Reduction to Run Automatically"
Section 10.2.2.9, "Deleting Data Files"

10.2.2.1 Standard Data Reduction Process

During the data reduction process, the static URL information is extracted from the raw data files and combined with the service information stored in the SctAccessLog table. By default, Content Tracker collects and records data only for the SctAccessLog table. Although the user data output tables exist, Content Tracker does not populate them.

Depending on how Content Tracker is configured, this reduction process can:

Combine access information for static URL content access with service details.
Summarize information about user accounts that were active during the reporting period. This information is written to the Content Tracker's user metadata database tables.

Figure 10-1 Standard Data Reduction Process


10.2.2.2 Data Reduction Process with Activity Metrics

Content Tracker provides the option to selectively generate search relevancy data and store it in custom metadata fields. You can use the snapshot function to choose which activity metrics to activate. The logged data provides content item usage information that indicates the popularity of content items.

By default, Content Tracker collects and records data only for the SctAccessLog table. Although the user data output tables exist, Content Tracker does not populate them unless the Snapshot function is activated. However, using the snapshot function affects Content Tracker's performance.

If the snapshot function and activity metrics are activated, the values in the custom metadata fields are updated following the reduction processing phase. When users access content items, the values of the applicable search relevance metadata fields change. During the later post-reduction step, Content Tracker uses SQL queries to determine which content items were accessed during the reporting period. Content Tracker updates the database table metadata fields with the new values and initiates a re-indexing cycle. However, only the content items whose access count metadata values have changed are re-indexed.

The post-reduction step is necessary to process and tabulate the activity metrics for each affected content item and to load the data into the assigned custom metadata fields. It also initiates a re-indexing cycle on the content items with changed activity metrics values to ensure that the data is part of the search index and is accessible to select and order search results.

Figure 10-2 Data Reduction Process with Activity Metrics

10.2.2.3 Data Reduction Cycles

Reduced table data is moved from the primary tables to the corresponding archive tables when the associated raw data is moved from recent to archive status. The primary tables contain the output for reduction data in the new and recent cycles and the archive tables contain output for reduction data in the archive cycle.

Raw data is demoted from new to recent when the data has been reduced and is more than one day old. Thus, the 'new' cycle indicates that the data is for the current day or is unreduced data from previous dates. The 'recent' cycle indicates that the data is from yesterday or earlier and has been reduced.

Raw data is demoted to archive (and the corresponding rows in the SctAccessLog table are moved to the SctAccessLogArchive table) when the number of recent sets reaches a configured threshold number and a reduction process is run, either manually or through the scheduler. For more information about configuring the threshold number for recent sets, see the SctMaxRecentCount configuration variable information in Oracle Fusion Middleware Configuration Reference for Oracle WebCenter Content. If a reduction process is never run, the raw data remains in the recent cycle indefinitely.
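The cycle rules above can be summarized in a short Python sketch. The function names are hypothetical, and the sketch only classifies data sets and names the table that holds their reduced rows; it does not move files or rows, and it leaves archive demotion (governed by SctMaxRecentCount) out of scope:

```python
from datetime import date


def cycle_of(data_date: date, reduced: bool, today: date) -> str:
    """Classify a raw data set as 'new' or 'recent' (archive not modeled)."""
    if reduced and data_date < today:
        return "recent"  # Reduced data from yesterday or earlier.
    return "new"  # Today's data, or unreduced data from earlier dates.


def table_for(cycle: str) -> str:
    """Primary table for the new and recent cycles, archive table otherwise."""
    return "SctAccessLogArchive" if cycle == "archive" else "SctAccessLog"


today = date(2024, 6, 5)
print(cycle_of(date(2024, 6, 5), True, today))  # new (today's data stays new)
print(cycle_of(date(2024, 6, 4), True, today))  # recent
```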

10.2.2.4 Access Modes and Data Reduction

The access mode used to access content items determines how those accesses are recorded in the SctAccessLog table. If content items are accessed through a service (that is, viewing the actual native file), the events are recorded in the SctAccessLog table in real time. In this case, the activity is recorded immediately and is not dependent on the reduction process.

If content items are accessed using static URLs (that is, viewing the Web location file), the Web server filter plug-in records the events in a static log file. During the data reduction process, the static log files for a specified date are gathered and the data is moved into the SctAccessLog table. In this case, if data reduction is not performed for a given date, there are no static URL records in the SctAccessLog and no evidence that these accesses ever occurred.

The difference in the way static and service accesses are processed has implications for interval counts. For example, a user might access a content item twice on Saturday, once through the Web location file (static access) and once through the native file (service access). The service access is recorded in the SctAccessLog table immediately, but the Web location access is not recorded until Saturday's data is reduced. If only Sunday's data is reduced, only the service access (not the static access) is included in the summaries of the short and long access count intervals. However, if Saturday's data is also reduced, both the service and static accesses are recorded in the SctAccessLog table and included in both intervals.
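The Saturday example can be checked with a minimal Python sketch. It models only the rule stated above (service accesses are logged immediately; static accesses are logged only once their day has been reduced). The function name and the tuple representation of accesses are hypothetical:

```python
from typing import List, Set, Tuple

Access = Tuple[str, str]  # (day, kind); kind is "S" (service) or "W" (static URL)


def logged_accesses(accesses: List[Access], reduced_days: Set[str]) -> List[Access]:
    """Return the accesses that appear in SctAccessLog."""
    return [(day, kind) for day, kind in accesses
            if kind == "S" or day in reduced_days]


saturday = [("Sat", "W"), ("Sat", "S")]
print(len(logged_accesses(saturday, {"Sun"})))         # 1
print(len(logged_accesses(saturday, {"Sat", "Sun"})))  # 2
```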

10.2.2.5 Reduction Sequence for Event Logs

Data sets are usually reduced in chronological (calendar) order to ensure that the information included in reports is current. The order in which the raw data log files are reduced determines what specific user access data is logged and counted. During reduction, the SctAccessLog and user metadata database tables are modified with data from the raw data files.

When using the snapshot function to gather search relevance information, the metadata fields associated with the activated activity metrics are also updated during data reduction. The activity metrics use custom metadata fields included in the DocMeta database table.

Content Tracker changes the activity metrics values according to the applicable data in the reduction data set. To ensure that data values are complete and current, perform data reduction on a daily basis. If the data sets are reduced out of order, re-reducing the current or most recent data set corrects the counts. However, it is always preferable to consistently reduce data in calendar order.

The following scenarios show how the reduction sequence affects the stored data.

Scenario 1:

Depending on how content items are accessed, if activity on certain days (such as Saturdays and Sundays) is never reduced, then accesses that occur on those days might never be logged or counted. For more information, see Section 10.2.2.4. Similarly, if a content item is accessed on Tuesday and reductions are done only for Monday and Wednesday, the Tuesday access might not be reflected in the Last Access date of that content item.

Scenario 2:

If there was a significant increase in accesses in the last few days, and you reduce data from two weeks earlier, the long and short access metrics for content items do not reflect the recent activity. Instead, the interval values from two weeks earlier override today's values. Reducing the current or most recent data set corrects the counts.

The reduction order does not adversely affect the Last Access date. The reduction process only changes the Last Access date if the most recent access in the reduction data set is more recent than the current Last Access value in Content Server's DocMeta database table.

If you have reduced a recent data set and a particular content item had been accessed, the Last Access field is updated with the most recent access date in the reduction data set. If you then re-reduce an older data set, the older access date for this content item does not overwrite the current value.
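The Last Access rule amounts to keeping the later of two dates. The following minimal Python sketch uses a hypothetical function name and plain dates in place of the DocMeta field:

```python
from datetime import date
from typing import List, Optional


def updated_last_access(current: Optional[date],
                        accesses_in_set: List[date]) -> Optional[date]:
    """Return the Last Access value after reducing one data set."""
    if not accesses_in_set:
        return current  # Item not accessed in this data set: no change.
    newest_in_set = max(accesses_in_set)
    if current is None or newest_in_set > current:
        return newest_in_set
    return current  # An older data set never overwrites a newer value.


value = updated_last_access(None, [date(2024, 6, 3), date(2024, 6, 4)])
value = updated_last_access(value, [date(2024, 5, 20)])  # Re-reduce an older set.
print(value)  # 2024-06-04
```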

Scenario 3:

Reducing the data sets in an arbitrary order interferes with the demotion of "recent" data files to "archive" data files. The associated table records are moved based on age, because the archive tables are intended to store the "oldest" data. If the data sets are reduced in random order, it is not apparent which data is the oldest.

For more information about recent and archive data files, see Section 10.2.5.2 and Section 10.2.2.3.

10.2.2.6 Reduction Schedules

You can schedule data reduction to run periodically. Scheduled runs produce a steady flow of raw data into the recent and archive repositories, and a similarly steady flow of reduced data from the primary tables to the archive tables.

Note that if the Content Tracker Data Engine is disabled the day before a scheduled reduction run, no data is collected. If it is enabled on the day of the scheduled reduction run, the scheduler does not run because no data is available.

Data reductions scheduled for a given day are performed on data collected during the previous day. The previous day is defined as the 24-hour period beginning and ending at midnight (system time). To conserve CPU resources, you can schedule reduction runs for early morning hours when the system load is generally the lowest.
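The period covered by a scheduled run can be computed as follows. This is a minimal Python sketch that assumes naive datetimes in system time; the function name is hypothetical:

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def reduction_window(run_at: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) period that a scheduled run on this day reduces."""
    end = datetime.combine(run_at.date(), time.min)  # Midnight starting the run day.
    start = end - timedelta(days=1)                 # Midnight starting the previous day.
    return start, end


start, end = reduction_window(datetime(2024, 6, 5, 3, 15))
print(start, end)  # 2024-06-04 00:00:00 2024-06-05 00:00:00
```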

An error can be issued if the scheduled reduction is set to run within a few minutes after midnight. If this occurs, reschedule the reduction to run later.

10.2.2.7 Running Data Reduction Manually

To manually reduce data:

Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.
On the Data Engine Control Center: Reduction tab, click (to highlight) the set of input data to reduce. Information on this page includes the following:
- Cycle: The status of the input data. Values include new (the input data has not been reduced), recent (data has been reduced but is not archived), and archive (data has been reduced and remains in archive cycle until deleted).
- Available date: Date when the data was collected.
- Status: The status of the reduction. Values include ready (input data is available to be reduced), running (data is being reduced), and archiving (data is being moved from recent to archive cycle).
- Percent Done: The progress of the reduction.
- Completion Date: Date and time the reduction completed.
Click Reduce Data.
Click Yes to reduce the data.

Note:

If the current date's data is reduced, the status in the Cycle column stays as 'new' even though the data is reduced.

10.2.2.8 Setting Data Reduction to Run Automatically

To set data reduction to run automatically:

Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.
On the Data Engine Control Center: Schedule tab, select the Scheduling Enabled check box.
Select check boxes for the days when data reduction occurs.
Select the hour and minute when data reduction occurs.
Click OK.

Do not exit the applet. Wait until a confirmation message displays. If you exit before the confirmation message, the requested change(s) may not occur.
Click OK when the confirmation message is displayed.

10.2.2.9 Deleting Data Files

To delete data files:

Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.
On the Data Engine Control Center: Reduction tab, click (to highlight) the set of input data to delete.
Click Delete or Delete Archive.
Click OK to delete the data.

10.2.3 Content Tracker Event Logs

Content Tracker supports multiple input files for different event log types and for configurations with multiple Web servers. Each Web server filter plug-in instance uses a unique tag as a file name suffix for the event logs. The unique suffix contains the Web server host name plus the server port number.

The reduction process searches for and merges multiple raw event logs named sctLog-yyyymmdd-myhostmyport.txt. The raw event logs are processed individually.
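A reduction step has to find every Web server's log for the date being reduced. The following Python sketch shows one way to select those files by name, assuming only the sctLog-yyyymmdd-suffix.txt pattern described above. The function names and example file names are hypothetical, and the sketch does not split the suffix into host and port because the two are concatenated:

```python
import re
from datetime import date, datetime
from typing import Iterable, List, Optional, Tuple

# Raw event log names, for example sctLog-20240605-webhost80.txt.
LOG_NAME = re.compile(r"^sctLog-(\d{8})-(.+)\.txt$")


def parse_log_name(name: str) -> Optional[Tuple[date, str]]:
    """Return (log date, host-plus-port suffix), or None for other files."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    log_date = datetime.strptime(match.group(1), "%Y%m%d").date()
    return log_date, match.group(2)


def logs_for_date(names: Iterable[str], wanted: date) -> List[str]:
    """Select the logs of all Web servers for one date, in name order."""
    selected = []
    for name in names:
        parsed = parse_log_name(name)
        if parsed is not None and parsed[0] == wanted:
            selected.append(name)
    return sorted(selected)


files = ["sctLog-20240605-webhost80.txt", "sctLog-20240605-webhost8080.txt",
         "sctLog-20240604-webhost80.txt", "reduction_ts-20240605.txt"]
print(logs_for_date(files, date(2024, 6, 5)))
```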

Content Tracker may not always capture a user name for a content access event, even if the user is logged into Content Server. In this case, the item was accessed with a static URL request and, in general, the browser does not provide a user name unless the Web server asks it to send the user's credentials. If the item is public content, the Web server does not ask the browser to send user credentials, and the user accessing the URL is unknown.

To record the user name for every document access, make sure the content is not accessible to the guest role. If the content is not public, the user's credentials are required to access the items and a user name is recorded in the raw event log entry.

Depending on Content Tracker configuration, when raw data log files in the "new" cycle are reduced, the Data Engine moves the data files into the following subdirectories:

cs_root/data/contenttracker/data/recent/yyyymmdd/
By default, the recent/ directory can hold 60 sets (dates) of input data log files. When this number is exceeded, the oldest sets are moved to the archive/ directory.

cs_root/data/contenttracker/data/archive/yyyymmdd/
By default, Content Tracker does not archive data. Instead, the expired rows are discarded to ensure optimal performance. If appropriately configured, Content Tracker uses the archive/ directory to hold all input data log files that were moved out of the "recent" cycle.
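When archiving is configured, moving the oldest sets out of recent/ is a simple rotation. The following minimal Python sketch assumes the directory layout shown above and the default limit of 60 sets; cs_root is a placeholder, and the function names are hypothetical. It computes paths and decisions only and does not touch the file system:

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def data_dir(cs_root: str, cycle: str, yyyymmdd: str) -> PurePosixPath:
    """Build cs_root/data/contenttracker/data/<cycle>/<yyyymmdd>/."""
    return PurePosixPath(cs_root, "data", "contenttracker", "data", cycle, yyyymmdd)


def rotate(recent: List[str], max_recent: int = 60) -> Tuple[List[str], List[str]]:
    """Return (sets kept in recent/, sets moved to archive/)."""
    ordered = sorted(recent)  # yyyymmdd names sort chronologically.
    overflow = max(0, len(ordered) - max_recent)
    return ordered[overflow:], ordered[:overflow]


kept, moved = rotate(["20240603", "20240601", "20240602"], max_recent=2)
print(moved)  # ['20240601']
print(data_dir("/u01/cs", "archive", moved[0]))  # /u01/cs/data/contenttracker/data/archive/20240601
```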

When raw data files are reduced, another file (reduction_ts-yyyymmdd.txt) is generated as a time stamp file.

10.2.4 Combined Output Table

The SctAccessLog table contains entries for all static and dynamic content access event records. The table contains one row per event in the reporting period. Each row is tagged according to type:

S indicates the records logged for service calls.
W identifies the records logged for static URL requests.

By default, Content Tracker does not log accesses to GIF, JPG, JS, CSS, CAB, and CLASS file types. Therefore, entries for these file types are not included in the combined output table after data reduction.

The Content Tracker Web server filter plug-in cannot distinguish between URLs for user content and those used by the user interface. References to UI objects, such as client.cab, can appear in the static access logs. To eliminate these false positives, use the SctIgnoreDirectories configuration variable to define a list of directory roots to be ignored by the Content Tracker filter. To log ignored file types, remove them from the list of file types in the SctIgnoreFileTypes configuration variable (for example, gif,jpg,js,css). For more information about using configuration variables, see Oracle Fusion Middleware Configuration Reference for Oracle WebCenter Content.
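The filtering described above can be sketched in Python as follows. The parameters stand in for the SctIgnoreFileTypes and SctIgnoreDirectories configuration variables, but the function, its default values, and the directory-prefix matching are assumptions for illustration, not the plug-in's actual logic:

```python
from pathlib import PurePosixPath
from typing import Iterable
from urllib.parse import urlsplit


def should_log(url: str,
               ignore_file_types: Iterable[str] = ("gif", "jpg", "js", "css"),
               ignore_dirs: Iterable[str] = ()) -> bool:
    """Return True if a static URL request would be recorded."""
    path = urlsplit(url).path
    extension = PurePosixPath(path).suffix.lstrip(".").lower()
    if extension in {t.lower() for t in ignore_file_types}:
        return False  # File type is on the ignore list.
    for root in ignore_dirs:
        root = root.rstrip("/")
        if path == root or path.startswith(root + "/"):
            return False  # Request falls under an ignored directory root.
    return True


# A UI object such as client.cab is skipped only if its directory root is ignored.
print(should_log("http://example.com/cs/common/client.cab", ignore_dirs=["/cs/common"]))  # False
```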

The following table describes the information collected for each record in the SctAccessLog table. By default, Content Tracker does not collect data to populate certain columns for bulky and rarely used items.

Column Name	Type / Size	Column Definition
SctDateStamp	datetime	Local date when the data was collected, in the format YYYYMMDD, depending on customer location and the time of day the event occurs. This may differ from the date recorded for eventDate. Time is set to 00:00:00. Data source: Internal
SctSequence	int / 8	Sequence unique to entry type. Data source: Internal
SctEntryType	char / 1	Entry type. Values are W or S. Data source: Internal
eventDate	datetime	GMT time and date when the request completed. The date depends on customer location and the time of day the event occurs. This may differ from the date recorded for SctDateStamp.
SctParentSequence	integer	Sequence of the outermost service event in the tree, if any.
c_ip	varchar / 15	IP address of the client
cs_username	varchar / 255
cs_method	varchar / 10	GET
cs_uriStem	varchar / 255	Stem of URI
cs_uriQuery	varchar / [maxUrlLen]	Query portion. For example, `IdcService=GET_FILE&dID=42...`
cs_host	varchar / 255	Content Server server name
cs_userAgent	varchar / 255	Client User Agent Ident. By default, this column contains either `browser` or the suffix of any string beginning with `java:`. This simplification ensures optimal performance for Content Tracker.
cs_cookie	varchar / [maxUrlLen]	Current cookie
cs_referer	varchar / [maxUrlLen]	URL leading to this request
sc_scs_dID	int / 8	dID. Data source: from query or derived from URL (reverse lookup)
sc_scs_dUser	varchar / 50	dUser. Data source: Service DataBinder `dUser`
sc_scs_idcService	varchar / 255	Name of IdcService. For example, `GET_FILE`. Data source: from query or Service DataBinder `IdcService`
sc_scs_dDocName	varchar / 30	dDocName. Data source: from query or Service DataBinder `dDocName`
sc_scs_callingProduct	varchar / 255	Arbitrary identifier. Data source: SctServiceFilter config file or Service DataBinder `sctCallingProduct`
sc_scs_eventType	varchar / 255	Arbitrary identifier. Data source: SctServiceFilter config file or Service DataBinder `sctEventType`
sc_scs_status	varchar / 10	Service execution status. Data source: Service DataBinder `StatusCode`
sc_scs_reference	varchar / 255	`web`, `native`, `sdc_url`. Values indicate the rendition of the accessed file: `web` is a converted file (PDF), `native` is the original file, and `sdc_url` is HTML. Data source: algorithmically from query parameters or ServiceFilter config file
comp_username	varchar / 50	Computed user name. If a service, obtained from the UserData service object, HTTP_INTERNETUSER, REMOTE_USER, or dUser. If a static URL, obtained from auth-user or internetuser.
comp_validRef	char / 1	Indicates whether the referenced object exists and is available to the requesting user. `1` if the access was a Web reference (W), ispromptlogin and isaccessdenied are both NULL, and the static URL exists at reduction time; or if the access was a service call (S) and the sc_scs_status field is NULL. `NULL` if the static URL did not exist at reduction time, the user logon failed, or the logon succeeded but the user was not authorized to view the object.
sc_scs_isPrompt	char / 1	`1` if true. Data source: Plug-in immediateResponseEvent field `ispromptlogin`
sc_scs_isAccessDenied	char / 1	`1` if true. Data source: Plug-in immediateResponseEvent field `isaccessdenied`
sc_scs_inetUser	varchar / 50	Internet user name (if security problem). Data source: Plug-in immediateResponseEvent field `internetuser`
sc_scs_authUser	varchar / 50	Authorization user name (if security problem). Data source: Plug-in immediateResponseEvent field `auth-user`
sc_scs_inetPassword	varchar / 8	Internet password (if security problem). Data source: Plug-in immediateResponseEvent field `internetpassword`
sc_scs_serviceMsg	varchar / 255	Content Server service completion status. Data source: Service DataBinder `StatusMessage`
extField_1 through extField_10	varchar / 255	General purpose columns to use with the extended service tracking function. In the field map ResultSets, the DataBinder fields are mapped to these columns.

10.2.5 Data Output

When Content Tracker is appropriately configured, the static and dynamic content access request information and all metadata fields are accessible for use in reports generated by the Content Tracker Reports component. The logged metadata includes content item and user metadata:

Section 10.2.5.1, "Content Item Metadata"
Section 10.2.5.2, "User Metadata Tables"
Section 10.2.5.3, "Reduction Log Files"

10.2.5.1 Content Item Metadata

Content Tracker uses standard Content Server metadata tables for content item metadata. Thus Content Tracker reports reflect current content item metadata. If content item metadata has changed since a content item was accessed, any generated reports reflect the changed metadata.

10.2.5.2 User Metadata Tables

Content Tracker user metadata database tables are updated with information collected about active users during the reporting time period. These tables retain data about user profiles at the time the data reduction runs. The names of the user metadata tables are formed from a root which indicates the class of information contained, and an Sct prefix to distinguish the table from native Content Server tables.

By default, Content Tracker does not archive data, so expired rows are not moved from the Primary tables to the Archive tables. Instead, the expired rows are discarded, ensuring optimal performance. Two complete sets of user metadata database tables are created:

Primary: Named SctUserInfo, and so on, which contain the output for reduction data in the new and recent cycles.
Archive: Named SctUserInfoArchive, and so on, which contain output for reduction data in the archive cycle.

If Content Tracker is configured to run archives, reduction data files are moved from recent to archive status and the associated table records are moved from the Primary table to the Archive table. This prevents excessive buildup of rows in the Primary tables and ensures that queries performed against recent data complete quickly. Content Tracker does not delete rows from the Archive tables; they can be moved or deleted using any SQL query tool. To delete all the rows in the Archive tables, delete the tables themselves. They are re-created during the next Content Server restart. Reports are not run against archive data.

The following tables are created:

The SctAccounts table contains a list of all accounts. It is organized using one line for each account.
The SctGroups table contains a list of all user groups current at the time of reduction. It is organized using one line per content item group.
The SctUserAccounts table contains entries for all users who are listed in the SctUserInfo table and are assigned accounts defined in the current instance. A separate entry exists for each user-account combination.

In a configuration with multiple proxied instances, Content Tracker cannot always determine a user's group and account information. When the current instance is a proxy, the group information for an active user defined in a different proxy is replaced by a placeholder line in SctUserGroups for that user. The line contains the user name and a hyphen (-) placeholder for the group. If at least one account is defined in the current instance, a similar entry is created in SctUserAccounts for any user who is defined in a different proxy.
The SctUserGroups table is organized using one line for each user's group for each user active during the reporting period. It references those users who logged on during the data collection period. If Content Tracker is running in a proxied Content Server configuration, only groups defined in the current instance are listed. For example, a user named "joe" is defined in the master instance and has access to groups "Public" and "Plastics" in the master instance. If "joe" logs on to a proxy instance and the group "Plastics" is not defined in the proxy, only the association between "joe" and "Public" appears in SctUserGroups.
The SctUserInfo table is organized using one line per user. It includes all users known to the current instance and additional users from a different instance who logged on to the current instance during the data collection period. In a proxied configuration, users local to one instance are usually visible from the UserAdmin application to other instances. If a user is defined locally with the same name in two instances, only the local user is visible in each of these instances.

For example, the "sysadmin" defined in the master is not the "sysadmin" appearing in the UserAdmin application for a proxy. These two different users could both log in during the same data collection period. The user from the master logs on as "sysadmin" and the proxy user logs on as "cs_2/sysadmin" (for example). The SctUserInfo file generated for this period has separate entries for "sysadmin" and "cs_2/sysadmin".

10.2.5.3 Reduction Log Files

When data reduction is run, the Content Tracker Data Engine generates a summary results log file, named reduction-yyyymmdd.log. The reduction logs can be useful to help diagnose data reduction errors.
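When diagnosing a reduction error for a particular day, it helps to derive the expected log file name from the reduction date. The following is a minimal sketch based only on the reduction-yyyymmdd.log naming pattern above. The directory passed to `find_reduction_log` is a hypothetical placeholder, because this section does not state where the logs are written; check your instance for the actual location.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def reduction_log_name(reduction_date: date) -> str:
    # Format the date as yyyymmdd, for example 2014-03-07 -> 20140307.
    return f"reduction-{reduction_date:%Y%m%d}.log"


def find_reduction_log(log_dir: Path, reduction_date: date) -> Optional[Path]:
    # log_dir is a hypothetical location; substitute the directory your
    # Content Tracker Data Engine actually writes reduction logs to.
    candidate = log_dir / reduction_log_name(reduction_date)
    return candidate if candidate.is_file() else None


print(reduction_log_name(date(2014, 3, 7)))  # reduction-20140307.log
```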

10.2.6 Tracking Limitations

In some cases, Content Tracker has limitations in tracking data. This section provides an overview of those limitations.

Section 10.2.6.1, "Tracking Limitations in Single-Box Clusters"
Section 10.2.6.2, "Static URLs and WebDAV"
Section 10.2.6.3, "Data Directory Protections"
Section 10.2.6.4, "ExtranetLook Component"

10.2.6.1 Tracking Limitations in Single-Box Clusters

Currently, Content Tracker and Content Tracker Reports do not support multi-node clusters that are installed on a single server. This is true even if multiple network cards are installed and each cluster node has its own IP address. In this case, the Content Server instance for each cluster node can successfully bind its IntradocServerPort to its specific IP address.

Unfortunately, only one cluster node is able to bind its Incoming Provider ServerPort to its specified IP address. Consequently, all of the cluster nodes share and alternately use the same Incoming Provider ServerPort. As a result, the SctLock provider for Content Tracker can only track document accesses on one cluster node at a time.

10.2.6.2 Static URLs and WebDAV

The access counts determined by Content Tracker are generally correct, but in some circumstances the software cannot determine if the content was actually delivered to the requesting user, or if it was, which revision of the content was delivered:

Repeated requests through WebDAV: If a user accesses a document with a WebDAV client and then accesses the same document again later, only the first WebDAV request is recorded. Access count reports for such content are usually lower than the actual number.
Static URLs: A user saves a URL for a content file, but the content is later revised in such a way that the saved URL is no longer valid. If the user attempts to access the content with the saved URL, an error occurs. Content Tracker records this as a successful access even though content was not delivered. Access count reports for such content are usually higher than the actual number.
Static URLs and wrong dID: If a user accesses content using a URL and the content is revised or its security group is changed before the Content Tracker data reduction operation is performed, the user is reported as seeing the latest revision. If the user accesses an original version which is then superseded by a different version, Content Tracker reports that the user got the newer revision, not the document actually delivered. Access count reports for such content are usually attributed to a newer revision than the actual one. To minimize this effect, schedule or run data reductions on a regular basis.

This section covers the following topics:

Section 10.2.6.2.1, "Wrong dID Reported for Access by Saved Static URL"
Section 10.2.6.2.2, "False Positive for Access by Saved (stale) Static URL"
Section 10.2.6.2.3, "Missed Accesses for Content Repeatedly Requested via WebDAV"

10.2.6.2.1 Wrong dID Reported for Access by Saved Static URL

Scenario: User accesses content via the "Web Location" (URL). The content is then revised before the Content Tracker data reduction operation is performed. The user is reported as seeing the latest revision, not the revision that the user actually saw. Access counts reported for such content tend to be attributed to a newer revision than actual. Minimize this effect by scheduling or running Content Tracker data reductions on a regular basis.

Details: This is related to the problem described in Section 10.2.6.2.2, "False Positive for Access by Saved (stale) Static URL." That is, the web server uses the entire web location (for example, DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.doc) to locate and deliver the content, while Content Tracker uses only the Content ID portion to determine the dID and dDocName values. Moreover, Content Tracker makes this determination during data reduction, not at the time the access actually occurs. Content Tracker therefore reports the user as having seen the revision current at the time of the reduction, not the one that was current at the time of the access.

Some implications of this are not immediately obvious, such as when the group or security of the revision is changed from the original. For example, if a user accesses "Public" Revision 1 of a document through a static URL, and the document is subsequently revised to Revision 2 and changed to "Secure" before the Content Tracker data reduction takes place, Tracker reports that the user saw the Secure version. This can also occur when the content file type changes. If the user accesses an original .xml version, which is then superseded by an entirely different .doc revision before the data reduction is performed, Tracker reports that the user saw the .doc revision, not the actual .xml version.

10.2.6.2.2 False Positive for Access by Saved (stale) Static URL

Scenario: User saves a "Web Location" (URL) for a content file. The content is subsequently revised in such a way that the saved URL is no longer valid. The user then attempts to access the content via the (now stale) URL, and gets a "Page Cannot be Found" error (HTTP 404). Content Tracker may record this as a successful access even though the content was not actually delivered to the user. Access counts reported for such content tend to be higher than actual.

Details: The "Web Location" of a content file is the means by which a user can access content via a "static URL". The specific file path in the URL is used in two slightly different contexts: the web server uses it to locate the content file in the Content Server repository, and Content Tracker uses it to determine the dID and dDocName of the content file during the data reduction process. The problem occurs when the content is revised in such a way that the web location for a given Content ID changes between the time the URL is saved and the time the access is attempted.

For example, if a Word document is checked in and then revised to an XML equivalent, the web location for the latest revision of the content changes from:

DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.doc

to:

DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.xml

where: "xyzzy" is the assigned Content ID.

The original revision is renamed as:

DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy~1.doc

This means the original Web Location no longer works as a static URL. The Content ID obtained from the original URL, however, matches the latest revision. Content Tracker reports this as an access to Content ID "xyzzy", even though the Web server was unable to deliver the requested file to the user.
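The mismatch can be illustrated by reducing each web location to its Content ID. The following sketch is illustrative only: it assumes the Content ID is the file name up to the first period (as in xyzzy.doc), and the function name is hypothetical. Content Tracker's actual reverse-lookup logic is internal and may differ in detail.

```python
from pathlib import PurePosixPath


def content_id_from_web_location(web_location: str) -> str:
    # Assumption: the Content ID is the file name up to the first '.',
    # for example .../adacct/xyzzy.doc -> xyzzy.
    file_name = PurePosixPath(web_location).name
    return file_name.split(".", 1)[0]


stale = "DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.doc"
current = "DomainHome/ucm/cs/groups/public/documents/adacct/xyzzy.xml"

# Both paths yield the same Content ID, so an access through the stale URL
# (which the web server answers with HTTP 404) is still attributed to the
# latest revision of "xyzzy" during data reduction.
print(content_id_from_web_location(stale), content_id_from_web_location(current))
```

The web server, by contrast, needs the full path to exist, which is why the two components disagree about whether the access succeeded.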

10.2.6.2.3 Missed Accesses for Content Repeatedly Requested via WebDAV

Scenario: User accesses a document via a WebDAV client, then accesses the same document in the same manner later. Only the first WebDAV request for the document is recorded. Access counts reported for such content tend to be lower than actual.

Details: WebDAV clients typically use some form of object caching to reduce the amount of network traffic. When a user requests a particular object, the client first determines whether it already has a copy of the object in a local store. If it does not, the client contacts the server and negotiates a transfer. This transfer is recorded as a COLLECTION_GET_FILE service request.

If the client already has a copy of the object, it contacts the server to determine whether the object has changed since the client's local copy was obtained. If it has changed, a new copy is transferred and the COLLECTION_GET_FILE service request is recorded.

If the client copy of the object is still current, then no transfer takes place, and the client presents the saved copy of the object to the user. In this case, the content access is not counted even though the user appears to get a "new" copy of the original content.
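The caching behavior described above can be modeled with a small simulation. This is a hypothetical sketch, not WebDAV client or Content Server code: `Server`, `WebDavClient`, and the integer version numbers are stand-ins. It shows that only actual transfers are recorded, so repeated opens of an unchanged document produce a single recorded access.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    # Current version number of each document (hypothetical stand-in).
    versions: Dict[str, int]
    # Requests that Content Tracker would record.
    recorded: List[str] = field(default_factory=list)

    def current_version(self, doc: str) -> int:
        # Freshness check only: nothing is transferred or recorded.
        return self.versions[doc]

    def collection_get_file(self, doc: str) -> int:
        # An actual transfer, recorded as a COLLECTION_GET_FILE request.
        self.recorded.append(f"COLLECTION_GET_FILE {doc}")
        return self.versions[doc]


@dataclass
class WebDavClient:
    server: Server
    cache: Dict[str, int] = field(default_factory=dict)

    def open(self, doc: str) -> int:
        cached = self.cache.get(doc)
        if cached is not None and cached == self.server.current_version(doc):
            return cached  # served from the local cache; no access recorded
        self.cache[doc] = self.server.collection_get_file(doc)
        return self.cache[doc]


server = Server(versions={"xyzzy": 1})
client = WebDavClient(server)
for _ in range(3):
    client.open("xyzzy")
print(len(server.recorded))  # 1: three opens, one recorded transfer
server.versions["xyzzy"] = 2  # the document is revised on the server
client.open("xyzzy")
print(len(server.recorded))  # 2: the changed document is transferred again
```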

10.2.6.3 Data Directory Protections

Content Tracker's Web server filter plug-in runs in the authorization context of the user whose access request is being processed. In some cases, the owner of the request processing thread is a system account. In others, it is a requesting user or another type of non-system account used by the application.

The filter records the information in raw event logs. If the log file does not exist, a new one is created using the default protection and authorization credentials of the user who owns the event thread. If the user account has write permission to the data directory, the content access data is recorded. Otherwise, the logging request fails and the access event details are not recorded.

To ensure that Content Tracker can properly record user access requests, the data directory must be configured to accept the account authorization credentials for all users. Granting world write permission (or the equivalent) is one method. Allowing unlimited write access is recommended unless security concerns prohibit this level of unrestricted access.
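Before relying on the logged data, you may want to confirm that a given account can create files in the data directory. The following is a minimal sketch using Python's standard `os.access`; the data directory path is a hypothetical placeholder. It only reflects the account that runs the check, and because the filter plug-in can run under several different accounts, the check would need to be run as each relevant account.

```python
import os
from pathlib import Path


def data_dir_is_writable(data_dir: Path) -> bool:
    # Creating a file in a directory needs write and search (execute)
    # permission on that directory for the account running this check.
    return data_dir.is_dir() and os.access(data_dir, os.W_OK | os.X_OK)


# Hypothetical path; substitute your Content Tracker data directory.
print(data_dir_is_writable(Path("/path/to/contenttracker/data")))
```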

10.2.6.4 ExtranetLook Component

The ExtranetLook component (if enabled) allows customizations of cookie-based login forms and pages for anonymous-type users. The component uses a built-in Web server plug-in that monitors requests and determines if a request is authenticated based on cookie settings. When a user requests access to a content item, Content Tracker must function within the authorization context of the user's account.

After collecting the access information, Content Tracker tries to record the event data in the log file. If the user's account permissions allow access to Content Tracker's data directory, then the request activity is logged. However, if the account does not have write authorization, the logging request fails and the request activity is not recorded.

10.3 Data Tracking Functions

This section describes the different data tracking functions available with Content Tracker:

Section 10.3.1, "Activity Snapshots"
Section 10.3.2, "Service Calls"
Section 10.3.3, "Web Beacon Functionality"

10.3.1 Activity Snapshots

The activity snapshots feature captures user metadata that is relevant for each recorded content item access:

Section 10.3.1.1, "Search Relevance Metrics"
Section 10.3.1.2, "Enabling the Snapshot Function"
Section 10.3.1.3, "Creating the Search Relevance Metadata Fields"
Section 10.3.1.4, "Setting a Check-in Time Value for the Last Access Field"
Section 10.3.1.5, "Populating the Last Access Field for Batch Loads and Archives"
Section 10.3.1.6, "Linking Activity Metrics to Metadata Fields"
Section 10.3.1.7, "Editing the Snapshot Configuration"

10.3.1.1 Search Relevance Metrics

When activated, the activity metrics and corresponding metadata fields provide search relevance information about user accesses of content items. An optional automatic load function allows users to update the last access activity metric to ensure that checked-in content items are appropriately timestamped.

Content Tracker optionally fills the search relevance custom metadata fields with content item usage information that indicates the popularity of particular content items. This information includes the date of the most recent access and the number of accesses in two distinct time intervals.
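The three metrics can be sketched as a small calculation over a content item's access dates. This is an illustration under stated assumptions, not Content Tracker's internal query: it assumes each interval is the N days ending on the reduction date (inclusive) and that only accesses from reduced dates are passed in. The 7- and 28-day defaults mirror the example values in Section 10.3.1.6, and the two intervals are independent.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


def activity_metrics(
    access_dates: Iterable[date],
    as_of: date,
    short_days: int = 7,
    long_days: int = 28,
) -> Tuple[Optional[date], int, int]:
    """Return (last access, short-interval count, long-interval count)."""
    dates = [d for d in access_dates if d <= as_of]
    last_access = max(dates, default=None)

    def count_within(days: int) -> int:
        # Assumed window: the `days` days ending on as_of, inclusive.
        start = as_of - timedelta(days=days - 1)
        return sum(1 for d in dates if d >= start)

    return last_access, count_within(short_days), count_within(long_days)


accesses = [date(2014, 3, 27), date(2014, 3, 20), date(2014, 3, 5), date(2014, 2, 1)]
print(activity_metrics(accesses, as_of=date(2014, 3, 28)))
```

With these inputs, the 7-day window starts on 2014-03-22 and counts one access, while the 28-day window starts on 2014-03-01 and counts three.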

Information generated from these activity metrics functions is used in various ways. For example, you can order search results according to which content items have been recently viewed or the most viewed in the last week.

If the snapshot function is activated, the values in the search relevance metadata fields are updated during a post-reduction step. During this processing step, Content Tracker uses SQL queries to determine which content items have changed activity metrics values. Content Tracker updates the applicable database tables with the new values and initiates a re-indexing cycle. However, only the content items that have changed metadata values are re-indexed.
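The "update only what changed" step can be pictured as a comparison between the stored metric values and the freshly computed ones. The sketch below uses Python dictionaries as a stand-in for the SQL queries Content Tracker actually runs; the integer keys and tuple layout are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# (last access date as text, short-interval count, long-interval count)
Metrics = Tuple[Optional[str], int, int]


def items_to_reindex(stored: Dict[int, Metrics], computed: Dict[int, Metrics]) -> Set[int]:
    # Items absent from `stored` count as changed. Items whose values are
    # unchanged are skipped, so they are neither updated nor re-indexed.
    return {item_id for item_id, new in computed.items() if stored.get(item_id) != new}


stored = {101: ("2014-03-20", 2, 5), 102: ("2014-03-01", 0, 3)}
computed = {
    101: ("2014-03-20", 2, 5),  # unchanged
    102: ("2014-03-27", 1, 4),  # new access since the last reduction
    103: ("2014-03-27", 1, 1),  # first recorded access
}
print(sorted(items_to_reindex(stored, computed)))  # [102, 103]
```

Limiting re-indexing to changed items keeps the post-reduction step proportional to actual activity rather than to the size of the repository.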

10.3.1.2 Enabling the Snapshot Function

To use these optional features, first enable the snapshot post-processing function which activates the activity metrics choices. Then selectively enable the activity metrics and assign their preselected custom metadata fields.

To enable the snapshot function and activate the activity metrics:

Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.
On the Data Engine Control Center: Snapshot tab, select Enable Snapshot.
Click OK.
In the confirmation window, click OK.

10.3.1.3 Creating the Search Relevance Metadata Fields

Before implementing the snapshot function, decide which custom metadata fields to associate with each of the enabled activity metrics. The custom metadata fields must exist and must be of the correct type. Depending on which activity metrics you plan to enable, create one or more custom metadata fields using the applicable procedure.

Add the following specific information for the activity metrics:

Last Access Metric
- Field Type: Date
- Default Value: Optional. If not specified, the field is not populated until a content item is checked in and a data reduction is run. Some applications require a default value; in those cases, enter a value in the Default Value field that populates the Last Access field with the date and time of the content check-in. For more information, see Section 10.3.1.4.
- Enable for Search Interface: Optional. Check to make the field available for searching.
Short and Long Access Metric
- Field Type: Integer
- Enable for Search Interface: Optional. Check to make the field available for searching.

Indexing a custom metadata field is optional, although indexing makes searches on this field more efficient. Indexing also allows users to query the accumulated search relevance statistics and generate useful data. For example, you can create a list of content items ordered by their popularity, and so on.

10.3.1.4 Setting a Check-in Time Value for the Last Access Field

The Last Access Date field is normally updated by Content Tracker when a managed object is requested by a user and a data reduction is run. The field remains empty (NULL) until the next data reduction is run. Some applications require that the date and time of content check-in be recorded immediately in the Last Access field.

Use any of the following methods to populate the Last Access field:

Using the Configuration Manager: When adding the metadata field, enter an expression that populates the field with the date and time of content check in (for example, a default value of <$dateCurrent()$> populates the field with the current check-in date and time). After setting the value, fill the field for existing content using the Autoload option.
Using the Autoload option: This option allows retroactive replacement of NULL values in the Last Access field with the current date and time. The only records affected are those where the Last Access metadata field is empty (NULL).
1. Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.
2. Click the Data Engine Control Center: Snapshot tab.
3. Select one or more of the activity metric check boxes to enable them. Enter the name of the custom metadata field to be linked to the activity metric (for example, xLastAccess, xShortAccess, or xLongAccess).
4. Select the Autoload check box.
5. Click OK.
  
  A confirmation dialog box opens and the current date and time are inserted into the applicable Last Access fields (those with NULL values) in the DocMeta database table.
  
  Please note:
  - Autoload is primarily intended for use with applications that count check-in operations as an access activity.
  - Autoload backfills the current date and time for all existing content that does not have a date value in the Last Access field. Any content checked in after the Last Access field is defined should have the field automatically populated with the check-in date and time as a default value.
  - Running Autoload can affect every record in the DocMeta database table. Use this option sparingly.
  - The only DocMeta records affected are those where the Last Access metadata field is empty (NULL).
  - Autoload is persistent. The state of the Autoload check box is saved with all the other Snapshot settings. To prevent inadvertent use of this option, clear the Autoload check box and re-save activity metrics field settings immediately after performing the autoload function.
  - Content Server's indexer is not automatically run after Autoload completes the update. You must decide when to rebuild the collection.
  - By default, the Autoload query sets the Last Access metadata field to the current date and time. You can customize the query as needed.
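The effect of the default Autoload behavior can be sketched as a single UPDATE that touches only NULL values. The actual Autoload query is not shown in this document, so the following is an approximation: it uses an in-memory SQLite table with a hypothetical two-column DocMeta layout, and the field name xLastAccess matches the example used above.

```python
import sqlite3
from datetime import datetime


def autoload_last_access(conn: sqlite3.Connection, field: str, now: datetime) -> int:
    """Backfill NULL Last Access values with `now`; return the rows changed."""
    # Column names cannot be bound as parameters, so reject anything
    # that is not a plain identifier before building the SQL text.
    if not field.isidentifier():
        raise ValueError(f"unexpected field name: {field!r}")
    cur = conn.execute(
        f"UPDATE DocMeta SET {field} = ? WHERE {field} IS NULL",
        (now.isoformat(sep=" "),),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical, minimal DocMeta table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocMeta (dID INTEGER PRIMARY KEY, xLastAccess TEXT)")
conn.executemany(
    "INSERT INTO DocMeta VALUES (?, ?)",
    [(1, None), (2, "2014-01-15 09:30:00"), (3, None)],
)
print(autoload_last_access(conn, "xLastAccess", datetime(2014, 3, 28, 12, 0)))  # 2
```

Running it a second time changes nothing, because no NULL values remain. As noted above, the index is not rebuilt by this step.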

10.3.1.5 Populating the Last Access Field for Batch Loads and Archives

To ensure proper retention of archived and batch loaded content, set the Last Access field date for the import/insert. Otherwise the access date for these content items is NULL, and retention based on this field fails. Also consider how the date can affect retention management. For example, to accurately reflect the retention quality of the content, an import of 1998 data is probably better tagged with that date than with the date when the import was performed.

The name of the Last Access field is based on the name specified when the field was created. For example, if the name Last Access is used, xLastAccess would be used in the import/insert.

For more information about using the Batch Loader utility, see Oracle Fusion Middleware Administering Oracle WebCenter Content.

The following steps provide a general outline of the procedure to populate the Last Access field using Batch Loader:

Access the Batch Loader.

Create a record that establishes an appropriate Last Access date. For example:

# This is a comment
Action=insert
dDocName=Sample1
dDocType=ADACCT
xLastAccess=5/1/1998
dDocTitle=Batch Load record insert example
dDocAuthor=sysadmin
dSecurityGroup=Public
primaryFile=links.doc
dInDate=8/15/2001
<<EOD>>

Run the Batch Loader to process the file record.
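For a large set of legacy items, the batch load file can be generated rather than written by hand. The following sketch reproduces the name=value layout of the example record, starting each record with Action=insert and ending it with <<EOD>>. The `legacy_items` data is hypothetical apart from the example values above; the sketch does not handle values that contain line breaks, and the date formats must be ones your Content Server instance accepts.

```python
from typing import Dict, Iterable, List


def batch_load_file(records: Iterable[Dict[str, str]]) -> str:
    """Render insert records in the name=value format shown above."""
    lines: List[str] = []
    for record in records:
        lines.append("Action=insert")
        lines.extend(f"{name}={value}" for name, value in record.items())
        lines.append("<<EOD>>")
    return "\n".join(lines) + "\n"


legacy_items = [
    {
        "dDocName": "Sample1",
        "dDocType": "ADACCT",
        "xLastAccess": "5/1/1998",  # the original date, not the import date
        "dDocTitle": "Batch Load record insert example",
        "dDocAuthor": "sysadmin",
        "dSecurityGroup": "Public",
        "primaryFile": "links.doc",
        "dInDate": "8/15/2001",
    },
]
print(batch_load_file(legacy_items), end="")
```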

10.3.1.6 Linking Activity Metrics to Metadata Fields

After the activity metrics options have been activated, they must be individually selected to enable them. Enabling the activity metrics also activates their corresponding custom metadata fields.

To enable the activity metrics and activate their corresponding custom metadata fields:

Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.
Click the Data Engine Control Center: Snapshot tab.
Select one or more of the activity metric check boxes to enable them. Enter the name of the custom metadata field to be linked to the activity metric (for example, xLastAccess, xShortAccess, or xLongAccess).
For the Short and Long Access Counts, enter the applicable interval amounts in days. For example, 7 days for the Short Access Count and 28 days for the Long Access Count.

The two Access Count metrics differ only in the accounting period (for example, last 30 days versus last 90 days, last week versus last year, and so on). The time intervals specified in the activity metrics are independent of each other. For example, you can set the number of days in the first interval period (Short Access) to more than those in the second interval period (Long Access).

Access counts are only tabulated for reduced dates. If data is not reduced for one or more days, the accesses on those days are not logged or counted. Do not reduce data in random order because the Access Count metrics are affected by the reduction date order.
Click OK when done.
In the confirmation window, click OK.

Note that the fields are case-sensitive. Make sure all field values are spelled and capitalized correctly.

Content Tracker uses the following error checks to validate each enabled activity metric field value:

Checks the DocMeta database table to ensure that the custom metadata field actually exists.
Ensures that the custom metadata field is of the correct type (for example, that the Last Access metadata field is of type Date, and so on).
Checks to explicitly exclude the dID metadata field.
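The three checks can be expressed as a small validation function, which is also useful for reviewing a planned configuration before entering it on the Snapshot tab. This is a hypothetical sketch: `docmeta_types` stands in for inspecting the real DocMeta table, and the type names follow the field definitions in Section 10.3.1.3. The dictionary lookup is case-sensitive, matching the note that field names must be capitalized exactly.

```python
from typing import Dict, List

# Expected field type for each activity metric.
EXPECTED_TYPES = {
    "Last Access": "Date",
    "Short Access Count": "Integer",
    "Long Access Count": "Integer",
}


def validate_metric_fields(config: Dict[str, str], docmeta_types: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every metric passed."""
    errors: List[str] = []
    for metric, field_name in config.items():
        if field_name == "dID":
            errors.append(f"{metric}: dID cannot be linked to an activity metric")
            continue
        actual = docmeta_types.get(field_name)  # exact, case-sensitive match
        if actual is None:
            errors.append(f"{metric}: {field_name} does not exist in DocMeta")
        elif actual != EXPECTED_TYPES[metric]:
            errors.append(f"{metric}: {field_name} is {actual}, expected {EXPECTED_TYPES[metric]}")
    return errors


docmeta = {"dID": "Integer", "xLastAccess": "Date", "xShortAccess": "Integer", "xLongAccess": "Integer"}
print(validate_metric_fields({"Last Access": "xlastaccess", "Short Access Count": "xLastAccess"}, docmeta))
```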

10.3.1.7 Editing the Snapshot Configuration

To modify the snapshot activity metrics settings:

Choose Administration then Content Tracker Administration from the Main menu. Choose Data Engine Control Center.
Click the Data Engine Control Center: Snapshot tab.
Make the necessary changes in the activity metrics fields.
Click OK.
In the confirmation window, click OK.

10.3.2 Service Calls

Content Tracker enables the logging of service calls with data values relevant to the associated services. Every service to be logged must have a service entry in the service call configuration file (SctServiceFilter.hda). In addition to the logged services, you can include the corresponding field map ResultSets in the SctServiceFilter.hda.

For more information about managing service calls, see Oracle Fusion Middleware Developing with Oracle WebCenter Content.

10.3.3 Web Beacon Functionality

Important:

The implementation requirements for the Web beacon feature are contingent on the system configurations involved. All of the factors cannot be addressed in this documentation. Information about the access records collected and processed by Content Tracker is an indication of general user activity and not an exact count.

A Web beacon is a managed object that facilitates specialized tracking support for indirect user accesses to Web pages or other managed content. In earlier releases, Content Tracker was unable to gather data from cached pages and pages generated from cached services. When users access cached Web pages and content items, Content Server and Content Tracker are unaware that these requests ever happened. Without using Web beacon referencing, Content Tracker does not record and count such requests.

The Web beacon feature uses client-side embedded references, which are invisible references to the managed beacon objects within Content Server. With these references, Content Tracker can record and count user access requests for managed content items that an external entity has copied and redistributed without obtaining the content directly from Content Server.

Two situations in particular merit the use of the Web beacon functionality: reverse proxy activity and when using Site Studio.

In a reverse proxy scenario, the reverse proxy server is positioned between the users and Content Server. The reverse proxy server caches managed content items by making a copy of requested objects. The next time another user asks for the document, the proxy serves the copy from its private cache. If the reverse proxy server does not have the object in its cache, it requests a copy.

Because it is delivering cached content, the reverse proxy server does not directly interact with Content Server. Therefore, Content Tracker cannot detect these requests and does not track this type of user access activity.

A reverse proxy server is often used to improve Web performance by caching or by providing controlled Web access to applications and sites behind a firewall. Such a configuration provides load balancing by moving copies of frequently accessed content to a Web server where it is updated on a scheduled basis.

For the Web beacon feature to work, each user access includes an additional request to the managed beacon object in Content Server. The additional request adds overhead, but the Web beacon object is very small and does not significantly interfere with the reverse proxy server's performance. Note that it is only necessary to embed the Web beacon references in objects you specifically want to track.
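As an illustration of what an embedded beacon reference can look like, the following sketch builds an invisible 1x1 image tag whose source is a GET_FILE request for a beacon object. Everything specific here is an assumption: the host and /cs/idcplg path, the beacon Content ID bcn_1, and the trackedItem parameter are hypothetical placeholders, and the parameters your beacon setup actually requires are described in the Developing guide referenced in this section.

```python
import html
from typing import Dict
from urllib.parse import urlencode


def beacon_img_tag(idcplg_url: str, beacon_doc_name: str, extra: Dict[str, str]) -> str:
    """Build an invisible <img> reference to a managed beacon object."""
    query = {
        "IdcService": "GET_FILE",
        "dDocName": beacon_doc_name,
        "RevisionSelectionMethod": "LatestReleased",
        **extra,  # hypothetical extra parameters identifying the tracked page
    }
    src = f"{idcplg_url}?{urlencode(query)}"
    # Escape '&' and quotes so the URL is valid inside an HTML attribute.
    return f'<img src="{html.escape(src)}" width="1" height="1" alt="">'


print(beacon_img_tag("https://cs.example.com/cs/idcplg", "bcn_1", {"trackedItem": "xyzzy"}))
```

Because the image is tiny and only requested once per page view, the extra request adds little overhead, and only pages that embed the tag generate beacon requests.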

Another usage scenario involves Site Studio, a product that is used to create websites which are stored and managed in Content Server. When Site Studio and Content Server are located on the same server, Content Tracker is configured to automatically track the applicable user accesses. The gathered Site Studio activity data is then used in pre-defined reports, as described in Section 10.4.1.4.

If your website is intended for an external audience, you may decide to create a copy of the site and transfer it to another server. This arrangement makes the site available for public viewing and keeps site development separate from the production site. In this arrangement, however, implement the Web beacon feature to ensure that Content Tracker can collect and process user activity.

For more information about managing Web beacon objects, see Oracle Fusion Middleware Developing with Oracle WebCenter Content.

10.4 Content Tracker Reports

Content Tracker Reports uses the captured and reduced data to generate reports that outline the usage history of particular pieces of content. Use the pre-defined reports or create custom queries for the information to be tracked. Any external commercial reporting tool can be used. For more information, see Section 10.4.2.5.

By default, Content Tracker collects and records only content access event data. It does not gather information on non-content access events such as searches, and it does not collect and synthesize user profile summaries. Because of this exclusion, some pre-defined report options are not displayed on the Content Tracker Report Generator main page. To make all pre-defined report options available, modify the config.cfg file to change the settings of the optimization functions, or use the Component Manager Update function.

You can derive reports from a variety of criteria, including specific users, groups of users, and any set of content that you can define with a query or a group of metadata values. Based on the variables in the system (such as number of users, amount of content, metadata count, and so on), Content Tracker Reports enables hundreds of key metrics to be included in reports. Specialized reports enable you to understand and disclose which content is most relevant to users.

This section covers the following topics:

Section 10.4.1, "Report Features and Considerations"
Section 10.4.2, "Report Creation Types"
Section 10.4.3, "Security Checks and Query Results"
Section 10.4.4, "Using Content Tracker Reports"

10.4.1 Report Features and Considerations

Consider the following issues when using report functionality:

Section 10.4.1.1, "Oracle and DB2 Case Sensitivity"
Section 10.4.1.2, "Access Control Lists and Content Tracker Reports Secure Mode"
Section 10.4.1.3, "User Authentication/Authorization and Auditing"
Section 10.4.1.4, "Site Studio Website Activity Reporting"

10.4.1.1 Oracle and DB2 Case Sensitivity

If Oracle database or DB2 is used as the Content Server database, metadata values are case sensitive. You must enter values in the query report criteria with the correct capitalization or Content Tracker Reports may not return all possible matching files.

For example, if the content type in the Oracle or DB2 Content Server database is AdAcc but the user enters it in the query field as adacc, ADACC, or Adacc, Content Tracker Reports does not return any results. In this case, and for all of the metadata fields in each of the pre-defined query reports, the entered value must match the capitalization of the metadata value.
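To see why capitalization matters, the following sketch reproduces the effect with Python's built-in sqlite3 module. SQLite is only a stand-in for Oracle Database or DB2 here: its default BINARY collation compares text case-sensitively, which mirrors the behavior described above. The items table and doc_type column are hypothetical and do not correspond to Content Server tables.

```python
import sqlite3


def count_matches(stored_value: str, entered_value: str) -> int:
    """Count rows whose doc_type exactly equals entered_value."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table standing in for a metadata column such as the content type.
        conn.execute("CREATE TABLE items (doc_type TEXT)")
        conn.execute("INSERT INTO items (doc_type) VALUES (?)", (stored_value,))
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM items WHERE doc_type = ?", (entered_value,)
        ).fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    for entered in ["AdAcc", "adacc", "ADACC", "Adacc"]:
        # Only the exact capitalization "AdAcc" finds the stored row.
        print(entered, count_matches("AdAcc", entered))
```

Running it prints a count of 1 only for AdAcc and 0 for the other three spellings, which is the same pattern a report user sees when the entered value does not match the stored capitalization.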

10.4.1.2 Access Control Lists and Content Tracker Reports Secure Mode

The SctrEnableSecurityChecks configuration variable is set when the Content Tracker Reports component is installed, and it enables one of two security modes: secure or non-secure. Changing this variable lets you use individual user role and account information to restrict the visibility of content item information in report results.

You can control which content items and metadata users can see in their generated reports. Ideally, users should not see anything through Content Tracker Reports that they could not find through a Content Server search. In secure mode, the information in any generated report is filtered based on the user's role and account privileges.

However, if Access Control Lists (ACLs) are enabled on the instance, the secure mode option in Content Tracker Reports does not work, so secure mode must be disabled on an ACL-based system. During installation, leave the security checks preference check box cleared. In this case, users other than a system administrator may be able to see information about content items that they would not otherwise be authorized to access and view.

Note:

For more information about the security checks installation preference and how it affects the report queries and report results, see Section 10.4.3.

10.4.1.3 User Authentication/Authorization and Auditing

An auditing feature monitors unsuccessful attempts to access the system or permission-protected content items. Two reports help analyze attempted security breaches, such as failed user logins and unsuccessful attempts to access secure content items. Information about failed access attempts is essential for safeguarding system and content security and for maintaining proper audit trails and records.

The available auditing reports include:

Authorization Failures by User: This report provides access authorization denial information that includes user names and their IP addresses. Although these users have system access privileges, their role/account memberships can restrict them from accessing particular content items (such as access to payroll content).
Login Failures: This report provides login/authentication failure information that includes user names and their IP addresses. The logged data does not distinguish between external, internal, and global users because, without a successful login, it is impossible to differentiate user types.

10.4.1.4 Site Studio Website Activity Reporting

When using Site Studio, Content Tracker is automatically configured to track Site Studio activity. Content Tracker Reports uses the logged data to generate the pre-defined reports that summarize the website access results. The Site Studio-specific Web access reports are included on the Content Tracker Report Generator main page when Site Studio is installed.

The Site Studio pre-defined reports use the default Content Tracker Reports formatting and provide drill-down report capabilities. The top level reports for both are summary reports that use Site ID and Accesses as their general criteria. The drill-down reports provide the relevant statistics:

Website Content Accesses: This report is summarized by Site ID at the top level; in the drill-down reports, the results are listed by Content ID and Relative URL. The information shows which URLs are being used to access a website. Because many different URLs can display the same page, the report also provides the total number of visits to each node, regardless of how the user got there.

Website Accesses by URL: This report provides summaries of the website relative URLs and the relevant activity sums.

10.4.2 Report Creation Types

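There are three ways to produce Content Tracker reports: pre-defined reports, custom reports, and external report generators. This section covers the following topics: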

Section 10.4.2.1, "Pre-Defined Reports"
Section 10.4.2.2, "Custom Reports"
Section 10.4.2.3, "Generating Reports"
Section 10.4.2.4, "Accessing Reports from the Information Page"
Section 10.4.2.5, "External Report Generators"

10.4.2.1 Pre-Defined Reports

Content Tracker Reports provides several pre-defined report options used to generate reports for the most commonly requested topics.

Reports produced with the Content Tracker Report Generator have the same general format and visual layout. The information in the reports is extracted from the reduced data in the SctAccessLog database table and other Content Server database tables, if needed.

Users who request and open content items are included in the compiled results. The opened content item can be the Web location file (the absolute path to the content item), an HTML version (by using Dynamic Converter), or the actual native file. Users who open only the Content Information page are not included in the tracked data.

Content Tracker must first accumulate the information, and the data must then undergo a data reduction cycle. Manually reducing the data immediately updates the database tables, so generated query reports also display the updated information. Otherwise, there is a one-day delay from the time a user accesses a content item until that access is included in the access history results.

When a generated query report contains an active link to a specific content item, click the link to open the corresponding Content Dashboard. The Content Dashboard shows the versions of the content item and their access times. Click Versions Separated to show each version individually, or click All Versions Together to combine the access results for all versions.

You can generate various levels of report results for each pre-defined report. Depending on the search criteria entered on the Content Tracker Report Generator main page, the results are filtered accordingly. The top level reports are summary reports and provide very general information. Use the links on the top level reports to drill down to more specific information.

10.4.2.2 Custom Reports

In addition to the sample reports provided with Content Tracker Reports, you can create custom queries to track information. Consider the following when creating custom reports:

When using Oracle Database and aliases to display the column names in the generated report, add the aliases to the /shared/config/resources/upper_clmns_map.htm file. For example, if the Name and Access_Date_GMT column headers are used, enter the following lines in the upper_clmns_map.htm file:
```
<tr>
<td>NAME</td>
<td>Name</td>
</tr>
<tr>
<td>ACCESS_DATE_GMT</td>
<td>Access_Date_GMT</td>
</tr>
```
When using the extended service tracking function, be aware that the name of the service is always logged to the sc_scs_idcService column. When you design queries that reference the contents of the extended fields, include the service as a qualifier in the query.
After successfully adding the custom report query to the report query file, you can use it to view the results.
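Because the mapping rows follow a fixed pattern, you could generate them from a list of aliases instead of typing each row. The sketch below is hypothetical helper code, not part of Content Tracker Reports. It assumes, as the example above suggests, that the first cell holds the alias in uppercase (Oracle Database returns unquoted column aliases in uppercase) and the second cell holds the display form.

```python
from html import escape


def column_map_rows(aliases):
    """Build upper_clmns_map.htm rows mapping an uppercase name to its display alias."""
    rows = []
    for alias in aliases:
        rows.append(
            "<tr>\n"
            f"<td>{escape(alias.upper())}</td>\n"
            f"<td>{escape(alias)}</td>\n"
            "</tr>"
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Produces the same eight lines as the Name / Access_Date_GMT example above.
    print(column_map_rows(["Name", "Access_Date_GMT"]))
```

Paste the printed rows into the upper_clmns_map.htm file; the helper only formats text and does not edit the file for you.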

10.4.2.3 Generating Reports

To generate a pre-defined or custom report:

Choose Administration then Content Tracker Reports from the Main menu.
Select the report type.
Enter any search and filtering criteria in the applicable fields.
Click Submit.

10.4.2.4 Accessing Reports from the Information Page

You can generate the Access History Report for any content item from the Information page of that content item.

Search for a content item and click the associated Info icon.
On the Content Information page, select View Access History Report from the Global Actions list.
On the Content Access Report, click Accesses.
On the Content Access Report, click Users.

10.4.2.5 External Report Generators

You can use commercial report generation tools to produce basic text reports or more sophisticated graphics such as bar graphs or pie charts from the data collected by Content Tracker.

This guide assumes that users have a working knowledge of the external reporting tool they are using to create custom reports. For this reason, only basic guidelines that are applicable to most commercially available reporting products are discussed here.

10.4.2.5.1 Using an External Report Generator

To generate custom reports from an external reporting tool:

Open the external reporting tool application.
Set up an ODBC connection (if appropriate) to the database.
Select the database tables to use in the report.
Link the selected tables based on key IDs or fields common within the files. Ideally, each selected table could be linked using the same key ID or field if it is common to each table.
Choose and integrate the fields from each table into the report form. In most cases, the fields can be selected, dragged, and dropped onto the form.

In this step, design the customized report. The fields you select are displayed as columns in the final, basic text report that the external reporting application generates.
Create custom parameters, criteria or both if the external reporting application supports these options.

For example, one type of custom parameter lets you either hard-code queried information into the final report or use a prompt to obtain input directly from the end user. Additionally, creating specific filter criteria can restrict and optimize the aggregate data included in the final report.
Specify the sorting order of the selected fields and format the final report output.
Preview the final report (optional).
Check the report into a delivery mechanism.

Generally, you can format and deliver the final report as Web-viewable pages or as a printable file. The external reporting application can also use the data results to create attractive graphics such as bar graphs or pie charts.

Additionally, you can import the saved file into other products such as Microsoft Excel or Microsoft Word.
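The steps above are tool-specific, but the core of most such reports is a join on a shared key followed by field selection, aggregation, and sorting. The following sketch shows that pattern with Python's sqlite3 and csv modules. The access_log and revisions tables, their columns, and the sample rows are hypothetical stand-ins, not the actual Content Tracker schema, and a real tool would connect through ODBC or a native driver instead of an in-memory SQLite database.

```python
import csv
import io
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE access_log (doc_name TEXT, user_name TEXT);
        CREATE TABLE revisions (doc_name TEXT, doc_title TEXT);
        INSERT INTO access_log VALUES ('DOC1', 'alice'), ('DOC1', 'bob'), ('DOC2', 'alice');
        INSERT INTO revisions VALUES ('DOC1', 'Payroll Policy'), ('DOC2', 'Travel Guide');
        """
    )
    return conn


def top_content(conn: sqlite3.Connection):
    # Link the tables on their common key, choose the report fields, then sort.
    return conn.execute(
        """
        SELECT r.doc_title, COUNT(*) AS accesses
        FROM access_log AS a
        JOIN revisions AS r ON a.doc_name = r.doc_name
        GROUP BY r.doc_title
        ORDER BY accesses DESC, r.doc_title
        """
    ).fetchall()


def to_csv(rows) -> str:
    """Format the result as CSV text, a format spreadsheet tools can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Title", "Accesses"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(top_content(build_demo_db())))
```

Saving the CSV text to a file is one way to hand the results to Microsoft Excel; a commercial reporting tool would typically offer its own export formats and charting.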

10.4.3 Security Checks and Query Results

During the installation process for Content Tracker Reports, you can choose to use individual user role and account information to restrict the visibility of content item information in report results, which controls which content items and metadata users see in their generated reports. Ideally, users should not see anything through Content Tracker Reports that they could not find through a Content Server search.

Caution:

If Access Control Lists (ACLs) are enabled on the instance, the secure mode option in Content Tracker Reports does not work. For more information, see Section 10.4.1.2.

The SctrEnableSecurityChecks configuration variable is set when the Content Tracker Reports component is installed. This variable enables the use of either secure or non-secure mode. During installation, select a mode by selecting or clearing the security check box. After installation, change the setting using the Component Manager.

If SctrEnableSecurityChecks=True, Content Tracker Reports operates in secure mode. The same security criteria (role and account qualifications) used to limit search results are also applied to the Content Tracker Reports queries and the generated reports. Two different users running the Top Content Items report could see different results. If set to false (the default), a user other than a system administrator can see information about content items that they would not be authorized to access and view.

The contenttrackerreports_query.htm file contains all the Content Tracker Report Generator's queries that produce the pre-defined and custom reports. To support non-secure and secure modes, this file contains two sets of queries.

Note:

For localization support, the word "document" was changed to "content item" in the pre-defined report names. However, the corresponding report queries still include an abbreviation for the word document (doc). The report query names have not been changed in the contenttrackerreports_query.htm file.

For example, the "Top Content Items" report is a pre-defined report listed on the Content Tracker Report Generator main page. The corresponding report queries in the contenttrackerreports_query.htm file use the pre-existing naming conventions:

qSctrTopDocs (non-secure version)

qSctrTopDocs_SEC (secure version)

This section covers the following topics:

Section 10.4.3.1, "Security Mode Examples"
Section 10.4.3.2, "Pre-Defined Reports and Security Modes"
Section 10.4.3.3, "Custom Reports and Security Modes"
Section 10.4.3.4, "Changing the Security Configuration"
Section 10.4.3.5, "Security Mode Selection"
Section 10.4.3.6, "Enabling or Disabling Security Checks for Report Queries"
Section 10.4.3.7, "Customization for Report Query Security"
Section 10.4.3.8, "Creating Secure Report Queries"

10.4.3.1 Security Mode Examples

For example, a user might have admin, contributor, guest, and sysmanager privileges (a semi-admin user) but not have the proper role/account membership to see a particular content item (such as the payroll report). The assigned privileges allow this user to access the Content Server Admin page and the Content Tracker Report Generator main page. But when this user performs a standard search in Content Server, the results page does not reveal that the payroll report exists.

If the security checks preference variable is enabled, Content Tracker Reports enforces the same role/account membership checks. Then, depending on the user requesting a specific report, the role/account matching activity determines what content item usage data is included.

As the following examples demonstrate, the report results generated for a specific user (the semi-admin user described above) depend on whether the preference variable is enabled.

Secure mode example:

When the security checks preference is enabled, Content Tracker Reports is running in secure mode and checks for role/account matches. In this case, the semi-admin user is not entitled to retrieve and view confidential data. Due to the restrictions associated with this user's role/account privileges, the payroll content item remains completely invisible. The data is not included in report results and the user is unaware of its existence.
Non-secure mode example:

When the security checks preference is disabled, Content Tracker Reports is running in non-secure mode and does not check for role/account matches. In this case, although the semi-admin user is not entitled to access or view the payroll report, some confidential information associated with the payroll content item can nevertheless be retrieved.

At the very least, the user can discover the payroll report's existence and view some of its metadata. The danger in this situation depends on what kind of information the metadata contains. In some cases, even knowing that the content item exists could be a serious breach of security.

Note:

This kind of security breach is not limited to semi-admin users. For example, a non-privileged user (that is, someone not ordinarily authorized to view a particular content item on a search results page) might gain access to the Content Tracker Report Generator main page. This could occur either by reaching the Admin page or by guessing a URL. In this case, the user would see a report containing some of the metadata describing the prohibited content item.

10.4.3.2 Pre-Defined Reports and Security Modes

Most pre-defined report queries have both secure and non-secure forms included in the contenttrackerreports_query.htm file. If the search results of a query can be affected by user role and account privileges, then secure variants of the non-secure queries are included. If the security checks variable is enabled, then the secure forms of queries take precedence and are executed instead of the corresponding non-secure queries.

It is not possible to selectively enable or disable the security checks preference variable for individual report queries. However, you can manage secure and non-secure queries by customizing the contenttrackerreports_query.htm file. For example, you can disable security checks (account matching) for a particular query by deleting or renaming the secure form of the query.

10.4.3.3 Custom Reports and Security Modes

In addition to the pre-defined reports, you can create custom reports based on search queries tailored to specific needs. You can also implement selective security checks for these reports by including both the non-secure and secure forms of each query in the contenttrackerreports_query.htm file.

For example, you can add a custom report with both query forms. If the non-secure query name is qMyTopTwenty, then the secure query name would be qMyTopTwenty_SEC. If the security checks preference variable is enabled, the report is generated using the secure query (qMyTopTwenty_SEC). If the security checks preference variable is not enabled, the report is generated using the non-secure query (qMyTopTwenty).

Note:

The secure form of a custom query should follow the specific pattern of the existing secure queries in the contenttrackerreports_query.htm file. For more information, see Section 10.4.3.8.

10.4.3.4 Changing the Security Configuration

To manually enable or disable the SctrEnableSecurityChecks setting:

Choose Administration then Admin Server from the Main menu.
On the Content Admin Server page, click the name of the instance on which to change the security checks setting.
On the Content Admin Server instance_name page, click Component Manager.
On the Component Manager page in the Update Component configuration field, select Content Tracker Reports from the list.
Click Update.
On the Update Component Configuration page, select or clear the security checks preference (SctrEnableSecurityChecks).
Click Update.
Restart Content Server to apply the changes.

10.4.3.5 Security Mode Selection

To generate a requested report, Content Tracker Reports must select and execute the applicable non-secure or secure query:

Section 10.4.3.5.1, "Query Type Selection Process"
Section 10.4.3.5.2, "Report Query Selection Example"

10.4.3.5.1 Query Type Selection Process

Content Tracker Reports chooses a report query based on the following process:

When a user submits a report request, the name of that report query is fed to a dedicated Content Tracker Reports service.
The Content Tracker Reports service enforces the security checks setting as follows:
- If the security checks preference is disabled:
  
  Content Tracker Reports is running in non-secure mode and does not perform role/account matching (user role and account privilege verification). The Content Tracker Reports service searches for the non-secure version of the query and uses it to generate the requested report. It is irrelevant if there is a secure version of the report query.
  
  In non-secure mode, only non-secure queries are used to generate reports. As a result, all users see the same report results regardless of their individual role and account memberships.
- If the security checks preference is enabled:
  
  Content Tracker Reports is running in secure mode and performs role/account matching (user role and account privilege verification).
  
  To begin processing:
  
  The Content Tracker Reports service appends the "_SEC" suffix to the submitted query name and searches the contenttrackerreports_query.htm file for this variant of the requested query.
  
  During the search:
  - If the secure form of the query is found, then it is used to generate the requested report.
    
    This means that the security checks to enforce role/account matching are performed and the query results are limited by the role and account privileges of the user requesting the report. Accordingly, different users may see different data results.
  - If the secure form of the query is not found, then the non-secure variant is used.
    
    This produces the same result as if the security checks preference were disabled: role/account permissions are not checked, and the content item data is not filtered. The results included in reports are therefore identical for all users, and users without proper permissions may be able to view confidential information.
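The selection rules above can be summarized as a small function. This is a sketch of the described behavior, not the actual Content Tracker Reports service code; the function name and the idea of passing in the set of query names defined in contenttrackerreports_query.htm are assumptions made for illustration.

```python
SECURE_SUFFIX = "_SEC"  # suffix appended to the query name in secure mode


def select_query_name(requested: str, available: set, security_checks: bool) -> str:
    """Return the name of the query that would generate the requested report.

    requested: non-secure query name submitted with the report request
    available: names of all queries defined in the query file (hypothetical input)
    security_checks: value of SctrEnableSecurityChecks
    """
    if security_checks:
        secure_name = requested + SECURE_SUFFIX
        if secure_name in available:
            # Secure mode and a secure variant exists: role/account filtering applies.
            return secure_name
    # Non-secure mode, or no secure variant found: fall back to the non-secure query.
    if requested not in available:
        raise KeyError(f"query {requested!r} is not defined")
    return requested


if __name__ == "__main__":
    queries = {"qSctrUsersByType", "qSctrUsersByType_SEC", "qMyTopTwenty"}
    print(select_query_name("qSctrUsersByType", queries, True))   # qSctrUsersByType_SEC
    print(select_query_name("qSctrUsersByType", queries, False))  # qSctrUsersByType
    print(select_query_name("qMyTopTwenty", queries, True))       # qMyTopTwenty
```

The last call shows the fallback case: because only the existence of the _SEC name is checked, a report without a secure variant runs unfiltered even in secure mode.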

10.4.3.5.2 Report Query Selection Example

When a user requests the User Type report:

The report query name (qSctrUsersByType) is passed to the Content Tracker Reports service.
The Content Tracker Reports service evaluates the request based on the security checks preference variable:
1. If security checks are disabled (set to false), then the service finds the qSctrUsersByType query in the contenttrackerreports_query.htm file.
2. If security checks are enabled (set to true), then the service adds a security suffix to the query name (qSctrUsersByType_SEC) and searches for this variant in the contenttrackerreports_query.htm file.
Depending on the security checks status, Content Tracker Reports uses the applicable query to generate the Users by User Type report.

Figure 10-3 Report Query Selection Process

10.4.3.6 Enabling or Disabling Security Checks for Report Queries

To disable or enable security checks (account matching) for particular report queries:

In a text editor, open the necessary file:

IntradocDir/custom/ContentTrackerReports/resources/contenttrackerreports_query.htm
Locate the secure version of the query to disable.
Rename the query. For example, to disable the qSctrUsersByType_SEC query, you can add the suffix "_disabled" to the query name:
```
qSctrUsersByType_SEC_disabled
```
Renaming the query ensures that the Content Tracker Reports service cannot find the secure query in the contenttrackerreports_query.htm file. Instead, the non-secure version (qSctrUsersByType) is used.

Note:

Renaming a secure query is a temporary disabling solution. To use the secure version of a query later, re-enable it by restoring its original name. If you delete the secure version of the query, you must re-create the entire secure version of the query to use it again.
Save and close the contenttrackerreports_query.htm file.
Restart the Content Server to apply the changes.
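If you disable several secure queries, the rename step can be scripted. The sketch below is a hypothetical helper, not a Content Server tool. It assumes each query name sits alone in a plain `<td>` cell, as in the query file excerpt in Section 10.4.4.6, and it matches the whole cell so that renaming qSctrUsersByType_SEC never touches qSctrUsersByType.

```python
import re


def disable_query(html_text: str, query_name: str, suffix: str = "_disabled") -> str:
    """Append suffix to the <td> cell that contains exactly query_name."""
    # The name must fill the whole cell (optionally padded with whitespace),
    # so a prefix such as qSctrUsersByType does not match qSctrUsersByType_SEC.
    pattern = re.compile(r"(<td>\s*)" + re.escape(query_name) + r"(\s*</td>)")
    new_text, count = pattern.subn(
        lambda m: m.group(1) + query_name + suffix + m.group(2), html_text, count=1
    )
    if count == 0:
        raise ValueError(f"query {query_name!r} not found")
    return new_text


if __name__ == "__main__":
    sample = (
        "<tr><td>qSctrUsersByType</td><td>SELECT ...</td></tr>\n"
        "<tr><td>qSctrUsersByType_SEC</td><td>SELECT ...</td></tr>"
    )
    print(disable_query(sample, "qSctrUsersByType_SEC"))
```

To apply it, read the contenttrackerreports_query.htm file into a string, keep a backup copy, write the returned text back, and restart Content Server. Re-enabling a query is the same operation in reverse: rename qSctrUsersByType_SEC_disabled back to its original name.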

10.4.3.7 Customization for Report Query Security

In secure mode, Content Tracker Reports always gives priority to the secure forms of queries. If a secure form of a query is found in the contenttrackerreports_query.htm file, it is used to generate the report instead of the corresponding non-secure query.

It is not possible to selectively enable or disable the security checks preference variable for individual report queries. However, it is possible to manage secure and non-secure queries by customizing the contenttrackerreports_query.htm file.

Customizing the report query file involves:

Selectively enabling or disabling security checks (account matching) for specific report queries.
Creating one or more non-secure custom report queries and, depending on the security requirements of the information, selectively including the corresponding secure version.

10.4.3.8 Creating Secure Report Queries

To create a secure version of a non-secure report query:

In a text editor, open the contenttrackerreports_query.htm file:

IntradocDir/custom/ContentTrackerReports/resources/contenttrackerreports_query.htm
Locate the query for which to create a secure version. For consistency, add your secure query immediately following the corresponding non-secure version.
Design your secure SQL report query. It might be helpful to review Step 2 of the procedure in Section 10.4.4.6, "Creating Custom Report Queries."
Adjust your query to ensure that it follows the pattern of the existing secure queries:
1. In the FROM clause, include the Revisions table.
2. In the WHERE clause, include the %SCTR_SECURITY_CLAUSE% token. This token acts as a placeholder for the WHERE clause that the Content Tracker Reports service inserts.
3. Complete the query following the established pattern in the existing secure queries.
Save and close the contenttrackerreports_query.htm file.
Restart the Content Server to apply the changes.
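Before restarting, you might want a quick sanity check that a new secure query follows the two pattern rules above. The following sketch is a hypothetical lint, not part of Content Tracker Reports. It only checks that Revisions appears between the first FROM and the first WHERE and that the %SCTR_SECURITY_CLAUSE% token appears after WHERE; queries with subqueries may confuse this naive keyword search.

```python
import re

SECURITY_TOKEN = "%SCTR_SECURITY_CLAUSE%"


def check_secure_query(sql: str) -> list:
    """Return a list of problems; an empty list means both pattern checks passed."""
    problems = []
    from_match = re.search(r"\bFROM\b", sql, re.IGNORECASE)
    where_match = re.search(r"\bWHERE\b", sql, re.IGNORECASE)
    if from_match is None:
        problems.append("no FROM clause")
    else:
        # The FROM clause runs until WHERE, or to the end of the query if there is none.
        if where_match and where_match.start() > from_match.end():
            end = where_match.start()
        else:
            end = len(sql)
        if not re.search(r"\bRevisions\b", sql[from_match.end():end], re.IGNORECASE):
            problems.append("Revisions table is missing from the FROM clause")
    if where_match is None or SECURITY_TOKEN not in sql[where_match.end():]:
        problems.append(SECURITY_TOKEN + " is missing from the WHERE clause")
    return problems


if __name__ == "__main__":
    # Hypothetical secure query; the join condition is illustrative only.
    secure = ("SELECT Revisions.dDocName FROM SctAccessLog, Revisions "
              "WHERE SctAccessLog.dID = Revisions.dID AND %SCTR_SECURITY_CLAUSE%")
    print(check_secure_query(secure))                 # []
    print(check_secure_query("SELECT * FROM Users"))  # two problems reported
```

An empty list does not prove the query is correct; it only confirms that the query contains the two elements that existing secure queries share.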

10.4.4 Using Content Tracker Reports

This section provides information and task procedures about Content Tracker Reports functions:

Section 10.4.4.1, "Generating Reports"
Section 10.4.4.2, "Accessing Drill Down Reports"
Section 10.4.4.3, "Accessing Reports from the Information Page"
Section 10.4.4.4, "Viewing Access Results by Revision"
Section 10.4.4.5, "Viewing Access Results for All Versions Combined"
Section 10.4.4.6, "Creating Custom Report Queries"

10.4.4.1 Generating Reports

To generate a pre-defined or custom report:

Choose Administration then Content Tracker Reports from the Main menu.
Click the report type.
Enter any search and filtering criteria in the applicable fields.
Click Submit.

The selected report type opens.

10.4.4.2 Accessing Drill Down Reports

To access one or more drill down reports:

Generate a pre-defined or custom report. For more information, see Section 10.4.4.1.
After generating a pre-defined report, certain line item results contain an active drill down report link. Click the link.

The selected drill-down report opens.

Note:

Some reports contain multiple levels of drill down reports. For example, the Top Content Items report contains a DocName drill down report link. Click this link to generate another report that displays the applicable content access details for the selected content item. In this report, two additional drill down reports are available: one for Accesses and another for Users.

10.4.4.3 Accessing Reports from the Information Page

To generate an Access History Report for a content item from the Information page of that content item:

Search for a content item and click the associated Info icon.
On the Content Information page, select View Access History Report from the Global Actions list.
On the Content Access Report, click Accesses.
On the Content Access Report, click Users.

The most current Accesses by User report for the content item opens.

10.4.4.4 Viewing Access Results by Revision

By default, the access results for multiple versions of a single content item are displayed individually on the Content Dashboard. To see the separated access results view of the Content Dashboard report:

Generate a content item-based query report from the Content Tracker Report Generator main page. For more information, see Section 10.4.4.1. For example, select the Top Content option on the Content Tracker Report Generator Main page to generate the applicable report.
Select a content item from the results report and click the content identification number listed in the DocName column.

The Content Dashboard for the selected content item opens. By default, this view shows the access results for each revision of the selected content item that was accessed.

10.4.4.5 Viewing Access Results for All Versions Combined

To see the combined access results view of the content dashboard report:

Generate a content item-based query report from the Content Tracker Report Generator main page. For more information, see Section 10.4.4.1. For example, select the Top Content option on the Content Tracker Report Generator Main page to generate the applicable report.
Select a content item from the results report and click the content identification number listed in the DocName column.
On the Content Dashboard for the selected content item, click All Versions Together.

The resulting Content Dashboard view shows the combined access results for all versions of the content item.
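The difference between the two Content Dashboard views amounts to the grouping key: Versions Separated counts accesses per content item and revision, while All Versions Together counts per content item only. The sketch below illustrates that idea with hypothetical in-memory events; Content Tracker computes its counts from its own database tables, not from code like this.

```python
from collections import Counter

# Hypothetical access events: one (content ID, revision number) tuple per access.
EVENTS = [("DOC1", 1), ("DOC1", 2), ("DOC1", 2), ("DOC2", 1)]


def accesses_by_revision(events):
    """Versions Separated view: one count per (content ID, revision) pair."""
    return Counter(events)


def accesses_all_versions(events):
    """All Versions Together view: revisions of the same content item are summed."""
    return Counter(doc for doc, _revision in events)


if __name__ == "__main__":
    print(accesses_by_revision(EVENTS))
    print(accesses_all_versions(EVENTS))
```

With these sample events, DOC1 shows one access for revision 1 and two for revision 2 in the separated view, and three accesses in the combined view.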

10.4.4.6 Creating Custom Report Queries

This section provides an example that demonstrates how to create a non-secure custom report query. This particular query generates a report that lists users and their personal attributes. The data is derived from the Content Server's Users database table.

Note:

The example in this section uses a non-secure query, so any user can view the generated report results regardless of their role and account privileges. All reports are generated using either non-secure or secure queries; the query selection depends on the security mode. For more information about the optional security checks preference variable, see Section 10.4.3. To create a secure report query, see Section 10.4.3.8.

To create the custom users report:

Design the SQL report query.
Enter the custom report query into the query file of Content Tracker Reports:
1. Navigate to the IntradocDir/custom/ContentTrackerReports/resources directory. Open the contenttrackerreports_query.htm file in a text editor.
2. Enter the custom report name, number of columns, and the source database table.
  
  For example, the following excerpt from the query file illustrates that the custom query report extracts the information from all columns in the Users database table.
```
<tr>
    <td>qCustomUsers</td>
    <td>
    SELECT *
    FROM Users
    </td>
</tr>
```
Enter a link to the custom report in the Content Tracker Report Generator main page file:
1. Navigate to the IntradocDir/custom/ContentTrackerReports/templates directory. Open the contenttrackerreports_main_page.htm file in a text editor.
2. Enter the attributes to display the link on the Content Tracker Report Generator main page.
  
  For example, the following excerpt from the main page file illustrates that the custom report link is presented as a selectable button and is listed as the Custom Users Report link on the page.
```
<h4 class=xuiSubheading>Custom Reports</h4>
<table width=80% border=0>
    <tr>
    <td> <span class="tableEntry"><input type="radio" name="radiobutton" value="qCustomUsers">
    Custom Users Report </span></td>
    </tr>
</table>
```
Enter the formatting requirements in the template resource file of Content Tracker Reports.
1. Navigate to the IntradocDir/custom/ContentTrackerReports/resources directory.
2. In a text editor, open the contenttrackerreports_template_resource.htm file.
3. Enter the display features to use for the generated custom report and any drill-down reports.
  
  For example, the following excerpt from the template resource file specifies that the report title is "Deanna's First Report" and that the drill-down report for the first column is the content items seen by user report (qSctrDocsSeenByUser_Drill).
```
<@dynamichtml qCustomUsers_vars@>
    <$reportWidth = "100%"$>
    <$title = "<i>Content Access Report</i>"$>
    <$reportTitle="Deanna's First Report"$>
    <$column1Width="35%"$>
    <$column0Drill="qSctrDocsSeenByUser_Drill"$>
<@end@>
```
Restart the Content Server to apply the changes.
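If a custom report does not appear to work after the restart, one quick check is whether the query row you added is well formed. The sketch below is a hypothetical checking aid built on Python's html.parser module. It assumes, as in the excerpt in Step 2, that each query is a `<tr>` whose first two cells hold the query name and its SQL; any header row in the table is picked up as well, and SQL containing raw < characters may not parse as expected.

```python
from html.parser import HTMLParser


class QueryTableParser(HTMLParser):
    """Collect {query name: SQL text} from <tr><td>name</td><td>sql</td>... rows."""

    def __init__(self):
        super().__init__()
        self.queries = {}
        self._cells = []  # cell texts of the current <tr>
        self._buf = []  # text fragments of the current <td>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td":
            self._in_td = True
            self._buf = []

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self._cells.append("".join(self._buf).strip())
            self._in_td = False
        elif tag == "tr" and len(self._cells) >= 2:
            name, sql = self._cells[0], self._cells[1]
            # Collapse whitespace so multi-line SQL reads as a single line.
            self.queries[name] = " ".join(sql.split())


def parse_queries(text: str) -> dict:
    parser = QueryTableParser()
    parser.feed(text)
    parser.close()
    return parser.queries


if __name__ == "__main__":
    snippet = """<tr>
    <td>qCustomUsers</td>
    <td>
    SELECT *
    FROM Users
    </td>
</tr>"""
    print(parse_queries(snippet))  # {'qCustomUsers': 'SELECT * FROM Users'}
```

If qCustomUsers is missing from the parsed result, or its SQL looks truncated, the row in contenttrackerreports_query.htm is likely malformed.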