Analytics Data Capture Application

Also referred to as the "sensor."

Asset Registration

Enabling report generation for assets. Because WebCenter Sites assets are specific to a WebCenter Sites installation, you must register their asset types with Analytics by assigning them to reports through the Analytics Administration interface. This enables Analytics to:

  • Recognize WebCenter Sites asset types

  • Configure report menu options in the "General Information" and "Content Information" report groups

  • Generate reports on assets of the registered asset types

Data Capture

The process of recording each visitor's clicks and the associated information: the date and time of each click, the assets that are clicked, the IP address from which the clicks are issued, the site being visited, and so on. The information is captured in real time by the sensor servlet and recorded in a data.txt.tmp file on the local file system (local to the Analytics data capture application). The sensor rotates data.txt.tmp to data.txt when either the threshold interval is reached (see the sensor.thresholdtime property) or the application server is restarted.
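The rotation rule described above can be illustrated with a minimal sketch. It is not the sensor's actual implementation: the class name, the constructor parameters, and the injected clock are hypothetical, and only the file names data.txt.tmp and data.txt come from the description. The sketch also overwrites any earlier data.txt, which is a simplification.

```python
import os
import time
from pathlib import Path


class RotatingCapture:
    """Hypothetical stand-in for the sensor's capture-file handling."""

    def __init__(self, directory, threshold_seconds, clock=time.monotonic):
        self.tmp_path = Path(directory) / "data.txt.tmp"
        self.final_path = Path(directory) / "data.txt"
        self.threshold_seconds = threshold_seconds  # assumed to be in seconds
        self.clock = clock
        self.opened_at = clock()

    def record(self, line):
        """Append one captured click, rotating first if the interval elapsed."""
        if self.clock() - self.opened_at >= self.threshold_seconds:
            self.rotate()
        with self.tmp_path.open("a", encoding="utf-8") as handle:
            handle.write(line + "\n")

    def rotate(self):
        """Rename data.txt.tmp to data.txt and start a new interval."""
        if self.tmp_path.exists():
            os.replace(self.tmp_path, self.final_path)
        self.opened_at = self.clock()
```

Calling rotate() when the process starts would approximate the second trigger, rotation on application server restart.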

Analytics can capture data on the usage of WebCenter Sites assets and on their visitors only if published pages are tagged for data capture. In the case of Engage assets, the assets themselves must be tagged for data capture.

Hadoop Jobs

Runs jobs in a parallel and distributed fashion in order to efficiently compute statistics on the raw data that is stored in the Hadoop Distributed File System.

Hadoop implements a computational paradigm named Map/Reduce, which divides a large computation into smaller fragments of work, each of which may be executed on any node in the cluster. Map/Reduce requires a combination of jar files and classes, all of which are collected into a single jar file that is usually referred to as a "job" file. To execute a job, you submit it to a JobTracker. Hadoop Jobs then responds with the following actions:

  • Schedules and submits the jobs to JobTracker.

  • Processes raw data captured by the data capture application into statistical data and injects the statistics into the Analytics database.

(Hadoop provides a web interface to browse HDFS and to determine the status of the jobs.)
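The Map/Reduce paradigm described above can be sketched in a single process. This is only an illustration of the map, group, and reduce steps. It does not use the Hadoop API, and the record fields "site" and "asset" are assumed names, not the actual raw data format.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (key, 1) pair per raw record, keyed by site and asset."""
    yield (record["site"], record["asset"]), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single request count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over an iterable of raw records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster, each map and reduce call is an independent fragment of work and may run on a different node. Here they run one after another.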

Hadoop jobs precalculate commonly requested site usage statistics (such as the average number of requests for a piece of content per unit time) to shorten report generation time. Because statistical computation is typically resource-intensive and time-consuming, it is performed in advance rather than on the fly each time a report is generated, so precalculated statistics are immediately available for retrieval into reports. Statistics include, for example:

  • Current information, such as today's total hits to each site, visiting countries, total number of visits from a given country, types of browsers, and average session duration.

  • Historical results, such as:

    • Daily, weekly, and monthly statistics: for example, the total number of requests for a given asset on a given site during a certain month in the reporting period.

    • Yearly statistics: a histogram in the performance indicator indicating the frequency with which certain assets were accessed during each week of the past year.

How long a Hadoop job runs depends on a number of factors, including site activity within the latest data capture time frame, the cumulative volume of captured data, and the configuration of the Analytics application. When data analysis is complete, the resulting statistics are available, at any time, for report generation.
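As a rough illustration of why precalculation shortens report generation, the sketch below aggregates hits once into daily and monthly totals, so that a report needs only a lookup. The function name and the (day, asset) input shape are assumptions made for this example. They do not reflect the Analytics database schema.

```python
from collections import Counter
from datetime import date


def precalculate(hits):
    """Aggregate (day, asset) hits into daily and monthly request totals."""
    daily = Counter()
    monthly = Counter()
    for day, asset in hits:
        daily[(asset, day)] += 1
        monthly[(asset, day.year, day.month)] += 1
    return daily, monthly


# Hypothetical sample data: four requests for one asset.
sample_hits = [
    (date(2024, 1, 5), "article-1"),
    (date(2024, 1, 5), "article-1"),
    (date(2024, 1, 20), "article-1"),
    (date(2024, 2, 1), "article-1"),
]
daily_totals, monthly_totals = precalculate(sample_hits)
```

After this single pass, a lookup such as monthly_totals[("article-1", 2024, 1)] returns the stored total without rescanning the raw hits.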


Integration

Integrating Analytics with your WebCenter Sites system means enabling report generation for asset types and users on your online site. Integration involves registering content management sites, WebCenter Sites users, and asset types with Analytics, configuring the Pageview Object (through the "Page Views" report), and granting users permissions to access reports through membership in the appropriate user groups. The steps necessary to accomplish these tasks are described in Chapter 1, "Integrating Oracle WebCenter Sites: Analytics with Oracle WebCenter Sites."

Internal Search

A search performed by a visitor using the site's built-in search engine. This search returns results from within the site's content.


Object

An Analytics construct. The subject of a report.

When storing and processing information, Analytics uses objects, whereas WebCenter Sites uses assets and asset types. To allow Analytics to recognize a WebCenter Sites asset type and track assets of that type, administrators define an Analytics object in terms of a WebCenter Sites asset type. They do so by configuring an Analytics report for the object and assigning the desired asset type to that object. The process of configuring a report defines the underlying asset.

Note: A special instance of an object is the Pageview Object, which administrators must configure (by configuring the "Page Views" Report) in order for reports in the "General Information" group to work.

The "Page Views" report supports multiple asset types.

Object Impression

A single invocation of the sensor servlet. For more information, see Section 3.4, "Object Impressions and Work Packages."

Page View

An Analytics construct. A group of one or more assets, whose asset types are enabled for tracking by the Analytics data capture application.

Asset types are enabled for tracking when they are defined in the Pageview Object and when published pages displaying those asset types are tagged with AddAnalyticsImgTag (data capture tag). For more information about tracking, see Data Capture.

Pageview Object

A default Analytics object that you configure through the "Page Views" report to specify the type (or types) of assets Analytics will track. Configuring the Pageview Object enables the default reports that are based on it.

The Pageview Object is the basis for the "Page Views," "Site Information," and "Clickstream" reports, so assign it the asset types whose assets make the most sense, from a marketing standpoint, to include in these reports.

A Pageview Object can be assigned multiple asset types. The "Page Views" report then contains statistics on the usage of those asset types.

"Page Views" Report

A report based on the Pageview Object. The "Page Views" report displays statistics on Page View activity on your site.

Processed Data

Visitor activity data that has been processed by Hadoop Jobs into statistical data. When processing is complete, the data is injected into the Analytics database, where it is immediately available for the reports that users request from the Analytics reporting interface.

Raw Data

Unprocessed data describing visitor activity on the site, recorded during the Data Capture process and stored in the local file system for future processing. This is the data on which statistics are calculated by the Hadoop Jobs for display in reports. (This data cannot be directly used for report generation.)


Sensor

Also referred to as the "Analytics data capture application."

Site Registration

Identifying a WebCenter Sites content management site to Analytics in order to enable Analytics to track visitor activity on assets published from that site.

Statistical Data

See Processed Data.

Work Package

A collection of object impressions. For more information, see Section 3.4, "Object Impressions and Work Packages."