In order for Analytics to process raw data for a custom report, you must develop a new Analytics job to process that data. The processed data will be inserted into the database.
Note:
This exercise walks you through the process of developing a new Analytics job for the parameter you added (in Exercise 1) to your Analytics installation. The Analytics job you will be developing is a duplicate of the default Analytics job in your Analytics installation. For the purposes of this tutorial, all bean and mapper class names used to develop the Analytics job in this exercise have been renamed to avoid overwriting the default Analytics job.
The following is a brief description of the steps that are required for developing an Analytics job.
Select the most appropriate location as the input:
Select a location from the folders within the analytics directory for the Analytics job to use as input data for the report. The choice of input location depends primarily on the data that the report requires. In most instances, the following location satisfies the data requirements for your report:
sesdata: This location stores the data for all sessions, along with the details of the visitor, object impressions, and other relevant information associated with a session.
Note:
When creating a custom report, do not use the injected folders (such as oiinjected, sesinjected, or visinjected) as your input location(s), because the database injection processor uses the data stored in these locations to populate the database.
For more information about locations and processors, and which type of data is stored in each location, see the Oracle Fusion Middleware WebCenter Sites: Analytics Administrator's Guide.
Extend the schema:
To store the processed data, you will add a new L3 table to the Analytics database.
Create a new bean class:
The main purpose of the bean class is to store the output data. The framework uses the bean class to store the final output of the job. The database injection processors take the data stored within each bean class and insert it into the L3 table (created in the previous step).
The new bean class extends the pre-defined classes provided by the framework. Implementation details are explained in the following sections: Section 4.1.1, "Example for Developing a New Analytics Job" and Section 4.1.2, "Developing an Analytics Job for the 'New Browsers' Report."
Create a new Mapper class:
The Mapper class encapsulates the business logic to process the input data. For every input bean, the Mapper class creates a new instance of the bean class (created in step 3), and then stores the processed data in the newly created instance. The output of the Mapper class is collected by the Analytics framework and further processed before it is finally written to the designated output location.
The new Mapper class extends the pre-defined classes provided by the framework. The implementation details are explained in the next section.
Configure the processor:
To integrate your newly coded beans and Mapper classes with the Analytics framework, you need to add them to the existing processor configuration files (.xml files).
Figure 4-1 depicts the job execution flow with the custom mapper integrated into the Analytics-Hadoop job framework.
Figure 4-1 Execution Flow Chart with Custom Mapper Integrated with the Analytics Hadoop Job Framework
In the Map Phase, the processor reads the data from the input location and passes it to the custom mapper. The custom mapper transforms every input bean into a custom bean.
In the Reduce Phase, the data collected from the custom mapper is further processed before it is written to the output location. The output location then contains the aggregated custom beans. The database injection processor stores this aggregated data in the database.
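The two phases can be pictured with a small, framework-free sketch. It is plain Java, not the Analytics or Hadoop API: `MapReduceSketch`, `InputBean`, and `CustomBean` are hypothetical stand-ins, and the reduce step is assumed to merge beans that share a key by summing their counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins; these are not classes of the Analytics framework.
final class MapReduceSketch {

    /** Stand-in for an input bean read from the input location. */
    static final class InputBean {
        final long dateid;
        final long browserid;

        InputBean(long dateid, long browserid) {
            this.dateid = dateid;
            this.browserid = browserid;
        }
    }

    /** Stand-in for the custom bean produced by the custom mapper. */
    static final class CustomBean {
        final String key;  // aggregation key
        final long count;

        CustomBean(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    /** Map phase: every input bean is transformed into one custom bean. */
    static List<CustomBean> mapPhase(List<InputBean> inputs) {
        List<CustomBean> mapped = new ArrayList<CustomBean>();
        for (InputBean in : inputs) {
            mapped.add(new CustomBean(in.dateid + "/" + in.browserid, 1L));
        }
        return mapped;
    }

    /** Reduce phase: beans with the same key are merged by summing their counts. */
    static Map<String, Long> reducePhase(List<CustomBean> mapped) {
        Map<String, Long> reduced = new TreeMap<String, Long>();
        for (CustomBean bean : mapped) {
            Long previous = reduced.get(bean.key);
            reduced.put(bean.key, previous == null ? bean.count : previous + bean.count);
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<InputBean> inputs = Arrays.asList(
                new InputBean(20240101L, 1L),
                new InputBean(20240101L, 1L),
                new InputBean(20240101L, 2L));
        System.out.println(reducePhase(mapPhase(inputs)));
    }
}
```

In the real job the framework drives both phases; the custom code supplies only the per-bean transformation.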
This section contains the following topics:
Section 4.1.2, "Developing an Analytics Job for the 'New Browsers' Report"
Section 4.1.4, "Integrating the New Analytics Job with the Existing Hadoop-Jobs Component"
The "NewBrowsers" report identifies the browsers that visitors used to view a given site's pages within the reported time period.
Counting browsers amounts to counting the sessions for each browser, so the report sums the aggregated session values. Aggregating sums for every time range that users can select would produce too many aggregated values, so the job concentrates on daily sums.
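Because only daily sums are stored, the total for any reporting period can be derived by adding the daily values that fall inside it. The sketch below illustrates that arithmetic in plain Java. The class name, the `yyyymmdd`-style date identifiers, and the sample figures are assumptions made for the example, not values taken from an Analytics installation.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper; not part of the Analytics API.
final class DailySumSketch {

    /**
     * Adds up the stored daily session counts whose dateid lies in [from, to].
     * Assumes dateids sort chronologically and that from <= to.
     */
    static long sessionsInRange(NavigableMap<Long, Long> dailyCounts, long from, long to) {
        long total = 0L;
        for (long count : dailyCounts.subMap(from, true, to, true).values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up daily session counts for one browser on one site.
        NavigableMap<Long, Long> daily = new TreeMap<Long, Long>();
        daily.put(20240101L, 40L);
        daily.put(20240102L, 25L);
        daily.put(20240103L, 35L);
        System.out.println(sessionsInRange(daily, 20240101L, 20240102L));
        System.out.println(sessionsInRange(daily, 20240101L, 20240103L));
    }
}
```

Storing one value per day keeps the table small, at the cost of a short summation whenever a longer period is requested.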
Data aggregation is done in a single Map-Reduce phase. (1) The starting point is the session data (SesData location). You will use this data to generate intermediate/uncompressed raw data stored as L3NewBrowserBean objects in the SesProcessed location. (2) The SessionBean objects are then used by the SessionProcessor processor to create L3NewBrowserBean objects, which store aggregated data that can be inserted into the database. The implementation is illustrated in Figure 4-2.
Figure 4-2 "NewBrowsers" Report Implementation
The SesData location has all data on all sessions. You will use this data as the starting point for generating your report data. There is no need to add to or modify that data; you can use the existing SessionBean objects.
Extend the SessionProcessor processor to create a new L3NewBrowserBean object for each siteid/dateid/browserid combination. The SessionProcessor processor stores this object in the sesinjected location.
The SessionInjection processor inserts the data into the database. A new bean is not required, but for proper insertion into the database, make sure that the fields of the L3 bean (created in the SessionProcessor processor) are properly annotated with the Aggregator.
Note:
The Aggregator is a Java annotation used to tag the fields of a bean that can be aggregated. If you want to use an Aggregator other than SumAggregator, you can use any of the following Aggregators:
AvgAggregator
CountAggregator
DistinctCountAggregator
MaxAggregator
NullAggregator
MinAggregator
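As a rough guide to what these names suggest, the following plain-Java sketch applies each kind of aggregation to a list of values. It is an illustration only: the semantics are assumed from the Aggregator names, `AggregatorSketch` and its `Kind` enum are hypothetical, and NullAggregator is left out because its behavior is not described here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration; the real Aggregator classes belong to the Analytics framework.
final class AggregatorSketch {

    enum Kind { SUM, AVG, COUNT, DISTINCT_COUNT, MAX, MIN }

    /** Combines the values of one field across all beans that share a key. */
    static double aggregate(Kind kind, List<Long> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values to aggregate");
        }
        long sum = 0L;
        for (long value : values) {
            sum += value;
        }
        switch (kind) {
            case SUM:
                return sum;
            case AVG:
                return (double) sum / values.size();
            case COUNT:
                return values.size();
            case DISTINCT_COUNT:
                return new HashSet<Long>(values).size();
            case MAX:
                return Collections.max(values).longValue();
            case MIN:
                return Collections.min(values).longValue();
            default:
                throw new IllegalStateException("unknown kind " + kind);
        }
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(3L, 5L, 5L);
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " -> " + aggregate(kind, values));
        }
    }
}
```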
Follow the general steps, as described in Section 4.1, "Overview for Creating an Analytics Job," for developing an Analytics Job.
Section 4.1.2.1, "Step 1: Select the Input Location"
Section 4.1.2.2, "Step 2: Extend the Schema"
Section 4.1.2.3, "Step 3: Create the Beans"
Section 4.1.2.4, "Step 4: Create the Mapper Classes"
Section 4.1.2.5, "Step 5: Adding Beans and Mappers to the Processor Definitions"
In the "NewBrowsers" report you are aggregating data on a session, so the input location should be the sesdata location.
To store the data, you will need to add a new table to store the pre-aggregated L3 data. Execute the following SQL statement as the analytics user:
CREATE TABLE L3_DATEXSITEXNEWBROWSERXCOUNT (
  DATEID NUMBER NOT NULL,
  SITEID NUMBER(6),
  BROWSERID NUMBER,
  COUNT NUMBER
);
commit;
For this report, one bean is required:
L3NewBrowserBean - This class is mapped to the L3 table in the database. The L3NewBrowserBean is used to store the aggregated count of browser visits. The content of this bean will be injected into the L3_DATEXSITEXNEWBROWSERXCOUNT table.
Create the L3NewBrowserBean bean class. For sample code, see Example 4-1, "L3NewBrowserBean.java."
Example 4-1 L3NewBrowserBean.java
package com.fatwire.analytics.domain.l3;

import javax.persistence.AttributeOverride;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Transient;

import com.fatwire.analytics.domain.AbstractL3Bean;
import com.fatwire.analytics.domain.annotation.Aggregate;
import com.fatwire.analytics.domain.annotation.SumAggregator;
// Also import DateSiteEntityKey from its package in your Analytics installation.

/**
 * this class represents entries in the L3_dateXsiteXnewbrowserXcount table
 */
@Entity(name = "L3_DATEXSITEXNEWBROWSERXCOUNT")
@AttributeOverride(name = "key.firstEntity", column = @Column(name = "BROWSERID", nullable = false, insertable = false, updatable = false))
public class L3NewBrowserBean extends AbstractL3Bean<DateSiteEntityKey> {
    private static final long serialVersionUID = 1L;

    /** the primary key definition of the object/table */
    @Id
    private DateSiteEntityKey key = new DateSiteEntityKey(); // initializer assumed, so that the setters below have a key to write to

    /** the count value for the entity */
    @Aggregate(SumAggregator.class) // annotation form assumed from the imports and the description of the count field
    private Long count;

    /** overwrite constructor to set the hadoop keys */
    public L3NewBrowserBean() {
        keys = new String[]{"dateid", "siteid", "browserid", "type"};
    }

    public Long getBrowserid() {
        return key.getFirstEntity();
    }

    public void setBrowserid(Long browserid) {
        key.setFirstEntity(browserid);
    }

    @Transient
    public DateSiteEntityKey getKey() {
        return key;
    }

    public void setKey(DateSiteEntityKey key) {
        this.key = key;
    }

    public Long getSiteid() {
        return key.getSiteid();
    }

    public void setSiteid(Long siteid) {
        key.setSiteid(siteid);
    }

    public Long getDateid() {
        return key.getDateid();
    }

    public void setDateid(Long dateid) {
        key.setDateid(dateid);
    }

    public Long getCount() {
        return count;
    }

    public void setCount(Long count) {
        this.count = count;
    }
}
The L3NewBrowserBean code is analyzed as follows:
The class extends the com.fatwire.analytics.domain.AbstractL3Bean class.
Declare private member variables where:
key: primary key
count: number of sessions
The @Entity annotation designates this class as a persistent entity, thereby making it eligible for use by the JPA services. The value of the name attribute is the name of the database table to which the entity should be mapped.
The @AttributeOverride annotation maps the key.firstEntity attribute of the persistent entity to the BROWSERID column of the L3_DATEXSITEXNEWBROWSERXCOUNT table.
Use the DateSiteEntityKey class, which is an implementation of an L3 multi-column primary key.
Note:
If you do not want to use the DateSiteEntityKey implementation, you can use any of the following multi-column primary key implementations:
DateSiteEntityEntityKey
DateSiteEntityStringKey
DateSiteStringKey
where Entity represents a numeric entity.
The choice of implementation depends solely on the primary key of the table used for storing the contents of the L3 bean.
Use the @Id annotation to designate the DateSiteEntityKey member variable as the entity's primary key.
Use the Aggregate annotation to annotate the count field with SumAggregator. The SumAggregator is used to sum the session count for each browser.
In the constructor, specify the keys on the basis of which multiple instances of the bean class will be aggregated. The type key signifies the name of the bean class.
Implement getter/setter methods to expose the private member fields.
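To see why the keys array matters, consider how a grouping key could be derived from it: two bean instances whose key properties all match yield the same grouping key and can therefore be aggregated into one row. The sketch below is a hypothetical illustration in plain Java. `GroupingKeySketch`, the `name=value|...` key format, and the sample values are assumptions; the framework's actual key handling is not shown in this tutorial.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the framework's key implementation.
final class GroupingKeySketch {

    /** Builds one grouping key from the named properties, in the order given by keys. */
    static String groupingKey(String[] keys, Map<String, String> propertyValues) {
        StringBuilder sb = new StringBuilder();
        for (String name : keys) {
            if (!propertyValues.containsKey(name)) {
                throw new IllegalArgumentException("missing value for key '" + name + "'");
            }
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(name).append('=').append(propertyValues.get(name));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] keys = {"dateid", "siteid", "browserid", "type"};
        Map<String, String> bean = new LinkedHashMap<String, String>();
        bean.put("dateid", "20240101");
        bean.put("siteid", "1");
        bean.put("browserid", "7");
        bean.put("type", "L3NewBrowserBean");
        bean.put("count", "1"); // not part of the key, so it does not affect grouping
        System.out.println(groupingKey(keys, bean));
    }
}
```

Including the bean type in the key keeps beans of different classes apart even when their date, site, and entity values coincide.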
For this report, one Mapper class, L3NewBrowserMapper (L3NewBrowserMapper.java), is required:
Example 4-2 L3NewBrowserMapper.java
package com.fatwire.analytics.report.mapper;

import java.io.IOException;

import org.apache.log4j.Logger;

import com.fatwire.analytics.domain.SessionBean;
import com.fatwire.analytics.domain.l3.L3NewBrowserBean;
import com.fatwire.analytics.mapreduce.AbstractAnalyticsMapper;
import com.fatwire.analytics.mapreduce.AnalyticsOutputCollector;

/**
 * L3 mapper on New Browser
 */
public class L3NewBrowserMapper extends AbstractAnalyticsMapper<SessionBean, L3NewBrowserBean> {
    /** initialize logging */
    private static final Logger logger = Logger.getLogger(L3NewBrowserMapper.class);

    public void map(SessionBean input, AnalyticsOutputCollector<L3NewBrowserBean> outputCollector) throws IOException {
        if (logger.isTraceEnabled()) {
            logger.trace("mapping input bean '" + input + "' to L3NewBrowserBean");
        }
        L3NewBrowserBean output = new L3NewBrowserBean();
        output.setDateid(input.getDateid());
        output.setSiteid(input.getSiteid());
        // assumes SessionBean exposes the browser through getBrowserid()
        output.setBrowserid(input.getBrowserid());
        // every session counts once
        output.setCount(1L);
        // collect the output bean (assumes the collector exposes collect(bean))
        outputCollector.collect(output);
    }
}
The L3NewBrowserMapper class code is analyzed as follows:
The L3NewBrowserMapper class extends the AbstractAnalyticsMapper class and overrides the map method.
In the map method, every input SessionBean is transformed into an L3NewBrowserBean by setting the values of the L3NewBrowserBean from the SessionBean.
The count property of the L3NewBrowserBean is set to 1L for every input bean.
Every L3NewBrowserBean created is collected by the output collector (AnalyticsOutputCollector).
Add debugging statements, such as the trace logging at the start of the map method.
To enable your newly coded beans and mapper classes, add them to the existing processor definitions. To add a mapper, add it to the spring-mapper.xml file in the corresponding processor folder.
In this exercise you will be configuring:
L3NewBrowserMapper
L3NewBrowserBean
To add beans and mappers to the processor definitions
Configure L3NewBrowserMapper:
Open the processors/sesprocessor/spring-mapper.xml file in a text editor.
Add the newBrowserMapper bean line to the spring-mapper.xml file, as shown below:
<bean id="AnalyticsMapperConfigBean" class="java.util.ArrayList">
  <constructor-arg>
    <list>
      <bean id="clickstreamMapper" class="com.fatwire.analytics.report.mapper.L3ClickstreamMapper"/>
      <bean id="newBrowserMapper" class="com.fatwire.analytics.report.mapper.L3NewBrowserMapper"/>
      <bean id="osMapper" class="com.fatwire.analytics.report.mapper.L3OperatingsystemMapper"/>
      <bean id="sessionEntryidMapper" class="com.fatwire.analytics.report.mapper.L3SessionEntryMapper"/>
      <bean id="sessionExitidMapper" class="com.fatwire.analytics.report.mapper.L3SessionExitMapper"/>
      <bean id="ipMapper" class="com.fatwire.analytics.report.mapper.L3IpMapper"/>
      <bean id="hostnameMapper" class="com.fatwire.analytics.report.mapper.L3HostnameMapper"/>
      <bean id="jsMapper" class="com.fatwire.analytics.report.mapper.L3JsMapper"/>
      <bean id="searchengineMapper" class="com.fatwire.analytics.report.mapper.L3SearchengineMapper"/>
      <bean id="refererMapper" class="com.fatwire.analytics.report.mapper.L3RefererMapper"/>
      <bean id="screenresMapper" class="com.fatwire.analytics.report.mapper.L3ScreenresMapper"/>
      <bean id="sessionQuantilMapper" class="com.fatwire.analytics.report.mapper.L3SessionQuantilMapper"/>
      <bean id="objectDurationMapper" class="com.fatwire.analytics.report.mapper.L3ObjectDurationMapper"/>
      <bean id="engageMapper" class="com.fatwire.analytics.report.mapper.L3EngageMapper"/>
    </list>
  </constructor-arg>
</bean>
Configure L3NewBrowserBean:
Open the processors/sesprocessor/spring-combiner_reducer.xml file in a text editor.
Add the entry key line for L3NewBrowserBean to the spring-combiner_reducer.xml file, as shown below:
<util:map id="AnalyticsCombinerReducerConfigBean" map-class="java.util.HashMap">
<entry key="com.fatwire.analytics.domain.l3.L3ClickstreamBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3OperatingsystemBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3NewBrowserBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionEntryBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionExitBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3IpBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3HostnameBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3JsBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SearchengineBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3RefererBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3ScreenresBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionQuantilBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3ObjectDurationBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegObjBean" value-ref="analyticsBeanReducer"/>
</util:map>
Follow these steps to configure the database injection:
Open the processors/sesinjection/spring-combiner.xml file in a text editor.
Add the entry key line for L3NewBrowserBean to the spring-combiner.xml file, as shown in the following snippet:
<util:map id="AnalyticsCombinerConfigBean" map-class="java.util.HashMap">
<entry key="com.fatwire.analytics.domain.l3.L3ClickstreamBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3NewBrowserBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3OperatingsystemBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SearchengineBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionEntryBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionExitBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3IpBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3HostnameBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3JsBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SearchengineBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3RefererBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3ScreenresBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionQuantilBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3ObjectDurationBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegBean" value-ref="analyticsBeanReducer"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegObjBean" value-ref="analyticsBeanReducer"/>
</util:map>
Open the processors/sesinjection/spring-reducer.xml file in a text editor.
Add the entry key line for L3NewBrowserBean to the spring-reducer.xml file, as shown in the following snippet:
<util:map id="AnalyticsReducerConfigBean" map-class="java.util.HashMap">
<entry key="com.fatwire.analytics.domain.l3.L3ClickstreamBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3NewBrowserBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3OperatingsystemBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3SearchengineBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionEntryBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionExitBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3IpBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3HostnameBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3JsBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3SearchengineBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3RefererBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3ScreenresBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3SessionQuantilBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3ObjectDurationBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegBean" value-ref="databaseInjection"/>
<entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegObjBean" value-ref="databaseInjection"/>
</util:map>
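After editing these files, it can be useful to confirm that each one really contains an entry for the new bean before you rebuild the jar. The following sketch does this with the JDK's built-in XML parser. It is an optional check rather than part of the product: the `EntryKeyCheck` class and its method names are hypothetical, and the XML text is assumed to have been read from the file beforehand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for checking an edited Spring configuration file.
final class EntryKeyCheck {

    /** Returns the key attribute of every entry element, in document order. */
    static List<String> entryKeys(String xml) {
        try {
            // Namespace awareness stays off (the factory default), so a fragment that
            // uses the util: prefix parses even without its namespace declaration.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("entry");
            List<String> keys = new ArrayList<String>();
            for (int i = 0; i < entries.getLength(); i++) {
                keys.add(((Element) entries.item(i)).getAttribute("key"));
            }
            return keys;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration", e);
        }
    }

    /** True if the configuration contains an entry whose key is the given bean class. */
    static boolean registers(String xml, String beanClass) {
        return entryKeys(xml).contains(beanClass);
    }

    public static void main(String[] args) {
        String xml = "<util:map id=\"AnalyticsReducerConfigBean\" map-class=\"java.util.HashMap\">"
                + "<entry key=\"com.fatwire.analytics.domain.l3.L3NewBrowserBean\" value-ref=\"databaseInjection\"/>"
                + "</util:map>";
        System.out.println(registers(xml, "com.fatwire.analytics.domain.l3.L3NewBrowserBean"));
    }
}
```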
Once you have developed the new Analytics job, integrate it with the existing hadoop-jobs component by recreating the jar file (hadoop-jobs.jar) and copying the new hadoop-jobs.jar file to the hadoop-jobs directory. Recreating the jar file enables the hadoop-jobs component to process the data captured by the new Analytics job you developed in this exercise.
To integrate the new Analytics job with the existing hadoop-jobs component
Create the hadoop-jobs.jar file.
Replace the existing hadoop-jobs.jar file, located in the hadoop-jobs installation directory, with the jar file you just created. Then remove the ._tmp_hadoop-jobs.jar file, which is the actual jar used by the run command. Once this file is deleted, the run command rebuilds it from the new hadoop-jobs.jar.
Run the hadoop-jobs component in order to process the data captured by the parameter (added in Chapter 2, "Exercise 1: Adding a New Parameter for Data Capture").
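The replacement in step 2 can also be scripted. The sketch below uses only the standard `java.nio.file` API; the `JarReplaceSketch` class and the example paths are hypothetical and must be adapted to your own build output and hadoop-jobs installation directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the paths passed in are placeholders for your own environment.
final class JarReplaceSketch {

    /** Copies the rebuilt jar over the installed one and removes the temporary jar. */
    static void installJar(Path newJar, Path hadoopJobsDir) {
        try {
            // Overwrite the installed hadoop-jobs.jar with the newly built file.
            Files.copy(newJar, hadoopJobsDir.resolve("hadoop-jobs.jar"),
                    StandardCopyOption.REPLACE_EXISTING);
            // Remove the temporary jar so that the run command rebuilds it.
            Files.deleteIfExists(hadoopJobsDir.resolve("._tmp_hadoop-jobs.jar"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: JarReplaceSketch <new-jar> <hadoop-jobs-dir>");
            return;
        }
        installJar(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

`Files.deleteIfExists` is used so that the sketch also works when the temporary jar has not been created yet.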
Chapter 5, "Exercise 4: Creating and Preparing a Report for Viewing" of this tutorial walks you through the process of creating a new report in the reporting interface. As an example, you will create the "NewBrowsers" report, which displays the number of visitors for each browser.
Note:
The "NewBrowsers" report you will be creating in this tutorial is a duplicate of the default "Browsers" report in your Analytics installation. For the purposes of this tutorial, the XML file and report name of the "Browsers" report you will be configuring, along with the bean and mapper class names, have been renamed to avoid overwriting the default "Browsers" report.