4 Exercise 3: Developing a New Analytics Job

For Analytics to process raw data for a custom report, you must develop a new Analytics job. The job processes the raw data and inserts the results into the database.

Note:

This exercise walks you through developing a new Analytics job for the parameter you added to your Analytics installation in Exercise 1. The Analytics job you will develop is a duplicate of the default Analytics job in your Analytics installation. For the purposes of this tutorial, all bean and mapper classes used in this exercise have been renamed so that they do not overwrite the default Analytics job.

This exercise contains the following sections:

Section 4.1, "Overview for Creating an Analytics Job"

Section 4.2, "Next Steps"

4.1 Overview for Creating an Analytics Job

The following is a brief description of the steps that are required for developing an Analytics job.

  1. Select the most appropriate location as the input:

    From the folders within the analytics directory, select the location that the Analytics job will read as input data for the report. Which location to choose depends primarily on the data the report requires. In most cases, the following location satisfies the data requirements of your report:

    Sesdata: This location stores the data for all sessions, along with the details of the visitor, object impressions, and other relevant information associated with a session.

    Note:

    When creating a custom report, do not use the injected folders (such as oiinjected, sesinjected, or visinjected) as input locations. The database injection processors read the data in these locations and store it in the database.

    For more information about locations and processors, and which type of data is stored in each location, see the Oracle Fusion Middleware WebCenter Sites: Analytics Administrator's Guide.

  2. Extend the schema:

    To store the processed data, you will add a new L3 table to the Analytics database.

  3. Create a new bean class:

    The bean class stores the output data: the framework uses it to hold the final output of the job, and the database injection processors read the data stored in each bean instance and insert it into the L3 table (created in the previous step).

    The new bean class extends the pre-defined classes provided by the framework. Implementation details are explained in the following sections: Section 4.1.1, "Example for Developing a New Analytics Job" and Section 4.1.2, "Developing an Analytics Job for the 'New Browsers' Report."

  4. Create a new Mapper class:

    The Mapper class will encapsulate the business logic to process the input data. For every input bean, the Mapper class will create a new instance of the bean class (created in step 3), and then store the processed data in the newly created instance. The output of the Mapper class will be collected by the Analytics framework and further processed before it is finally written to the designated output location.

    The new Mapper class extends the predefined classes provided by the framework. Implementation details are explained in Section 4.1.2.4, "Step 4: Create the Mapper Classes."

  5. Configure the processor:

    To integrate your new bean and Mapper classes with the Analytics framework, add them to the existing processor configuration files (XML files).

Figure 4-1 depicts the job execution flow with the custom mapper integrated into the Analytics-Hadoop job framework.

Figure 4-1 Execution flow chart with custom mapper integrated with the analytics hadoop job framework


In the Map Phase, the processor reads the data from the input location and passes it to the custom mapper, which transforms every input bean into a custom bean.

In the Reduce Phase, the custom beans collected from the custom mapper are further processed (aggregated) before they are written to the output location, which then contains the aggregated custom beans. The database injection processor stores this aggregated data in the database.
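The two phases can be simulated in plain Java to make their inputs and outputs concrete. This is a minimal sketch, not the Analytics-Hadoop API: the Session and BrowserDayCount records are hypothetical stand-ins for SessionBean and a custom bean, and the in-memory grouping map stands in for the Hadoop shuffle between the phases (records require Java 16 or later).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an input SessionBean (only the fields this report needs).
record Session(long dateid, long siteid, long browserid) {}

// Hypothetical stand-in for the custom output bean.
record BrowserDayCount(long dateid, long siteid, long browserid, long count) {
    // Grouping key: beans with equal keys are aggregated in the Reduce Phase.
    List<Long> key() {
        return List.of(dateid, siteid, browserid);
    }
}

class MapReduceFlowSketch {
    // Map Phase: transform every input bean into one custom bean with count 1.
    static List<BrowserDayCount> map(List<Session> input) {
        List<BrowserDayCount> out = new ArrayList<>();
        for (Session s : input) {
            out.add(new BrowserDayCount(s.dateid(), s.siteid(), s.browserid(), 1L));
        }
        return out;
    }

    // Reduce Phase: group the custom beans by key and sum their counts.
    static List<BrowserDayCount> reduce(List<BrowserDayCount> mapped) {
        Map<List<Long>, BrowserDayCount> byKey = new LinkedHashMap<>();
        for (BrowserDayCount b : mapped) {
            byKey.merge(b.key(), b, (acc, next) -> new BrowserDayCount(
                    acc.dateid(), acc.siteid(), acc.browserid(), acc.count() + next.count()));
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Because the Map Phase always emits a count of 1, the reduced count for each key equals the number of sessions with that date, site, and browser.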

4.1.1 Example for Developing a New Analytics Job

This section contains the following topic:

Section 4.1.1.1, "Designing the 'NewBrowsers' Report"

4.1.1.1 Designing the 'NewBrowsers' Report

The "NewBrowsers" report identifies the browsers that visitors used to view a given site's pages within the reported time period.

Counting browsers amounts to counting the sessions for each browser, so the report sums per-session values. Pre-aggregating sums for every time range that users can select would produce too many aggregated values, so the job stores daily sums instead; these can be added up over any selected range of days.
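The following sketch shows why daily sums are sufficient: given the stored daily counts for one site and browser, the total for a selected range is the sum of the daily rows inside it. It assumes, hypothetically, that dateid values sort in calendar order (for example, yyyymmdd numbers); the actual DATEID encoding may differ.

```java
import java.util.NavigableMap;

class DailyRollupSketch {
    // Sums the stored daily counts whose dateid lies in [fromDateid, toDateid], inclusive.
    // Assumes dateids sort in calendar order (hypothetical yyyymmdd encoding).
    static long rangeTotal(NavigableMap<Long, Long> dailyCounts, long fromDateid, long toDateid) {
        long total = 0;
        for (long count : dailyCounts.subMap(fromDateid, true, toDateid, true).values()) {
            total += count;
        }
        return total;
    }
}
```

This works for additive values such as session counts. A distinct count (for example, unique visitors) cannot be rebuilt by adding daily values, because the same visitor may appear on several days.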

Data aggregation is done in a single Map-Reduce phase. (1) The starting point is the session data, stored as SessionBean objects in the SesData location. (2) The SessionProcessor processor uses these SessionBean objects to create L3NewBrowserBean objects, which hold aggregated data that can be inserted into the database. The implementation is illustrated in Figure 4-2.

Figure 4-2 "NewBrowsers" Report Implementation


  • The SesData location contains the data for all sessions. You will use this data as the starting point for generating your report data. You do not need to add to or modify that data; you can use the existing SessionBean objects.

  • Extend the SessionProcessor processor to create a new L3NewBrowserBean object for each siteid/dateid/browserid combination. The SessionProcessor processor stores these objects in the sesinjected location.

  • The SessionInjection processor inserts the data into the database. No new bean is required for this step, but for the data to be inserted correctly, make sure that you have annotated the aggregatable fields of the L3 bean (created by the SessionProcessor processor) with the @Aggregate annotation.

    Note:

    @Aggregate is a Java annotation used to tag the fields of a bean that can be aggregated; its aggregatorClass attribute names the aggregator to apply. If you want to use an aggregator other than SumAggregator, you can use any of the following:

    • AvgAggregator

    • CountAggregator

    • DistinctCountAggregator

    • MaxAggregator

    • NullAggregator

    • MinAggregator
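The framework's aggregator classes are not shown in this tutorial, so the following is only an analogy: a plain-Java sketch of how annotation-driven aggregation can work. The @Agg annotation, the Combiner interface, and the BrowserStats class are hypothetical stand-ins for @Aggregate, the aggregator classes, and an L3 bean; the sum and max behaviors are assumed from the aggregator names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical contract: combines two partial values of one aggregatable field.
interface Combiner {
    Long combine(Long a, Long b);
}

class SumCombiner implements Combiner {
    public Long combine(Long a, Long b) { return a + b; }
}

class MaxCombiner implements Combiner {
    public Long combine(Long a, Long b) { return Math.max(a, b); }
}

// Hypothetical stand-in for @Aggregate(aggregatorClass = ...).
// RUNTIME retention is required so the annotation can be read reflectively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Agg {
    Class<? extends Combiner> value();
}

class BrowserStats {
    long dateid, siteid, browserid;            // key fields: not annotated, never combined
    @Agg(SumCombiner.class) Long count;        // summed, like count in L3NewBrowserBean
    @Agg(MaxCombiner.class) Long longestView;  // hypothetical field using a max aggregator

    BrowserStats(long dateid, long siteid, long browserid, Long count, Long longestView) {
        this.dateid = dateid;
        this.siteid = siteid;
        this.browserid = browserid;
        this.count = count;
        this.longestView = longestView;
    }
}

class AnnotationMergeSketch {
    // Folds 'other' into 'target' (two beans with the same key), field by field,
    // using the combiner named in each field's @Agg annotation.
    static void merge(Object target, Object other) {
        try {
            for (Field field : target.getClass().getDeclaredFields()) {
                Agg agg = field.getAnnotation(Agg.class);
                if (agg == null) {
                    continue; // not aggregatable: leave the value unchanged
                }
                field.setAccessible(true);
                Combiner combiner = agg.value().getDeclaredConstructor().newInstance();
                field.set(target, combiner.combine((Long) field.get(target), (Long) field.get(other)));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot merge " + target.getClass().getName(), e);
        }
    }
}
```

The sketch instantiates a combiner on every merge to stay short; a production design would more likely cache one combiner instance per field.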

4.1.2 Developing an Analytics Job for the 'New Browsers' Report

To develop the Analytics job for the "NewBrowsers" report, follow the general steps described in Section 4.1, "Overview for Creating an Analytics Job":

Section 4.1.2.1, "Step 1: Select the Input Location"

Section 4.1.2.2, "Step 2: Extend the Schema"

Section 4.1.2.3, "Step 3: Create the Beans"

Section 4.1.2.4, "Step 4: Create the Mapper Classes"

Section 4.1.2.5, "Step 5: Adding Beans and Mappers to the Processor Definitions"

4.1.2.1 Step 1: Select the Input Location

The "NewBrowsers" report aggregates session data, so the input location is the sesdata location.

4.1.2.2 Step 2: Extend the Schema

Add a new table to store the pre-aggregated L3 data. Execute the following SQL statements as the analytics user:

    CREATE TABLE L3_DATEXSITEXNEWBROWSERXCOUNT (
        DATEID    NUMBER NOT NULL,
        SITEID    NUMBER(6),
        BROWSERID NUMBER,
        COUNT     NUMBER
    );
    COMMIT;
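Before wiring up the job, you may want to confirm that the new table accepts the kind of row the injection will write. A minimal JDBC sketch, assuming you already have a java.sql.Connection to the Analytics database as the analytics user (obtaining it is not covered here). The class and method names are hypothetical; the framework itself persists L3 beans through its own injection processors, not through this code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class NewBrowserRowSketch {
    // Column order matches the CREATE TABLE statement for the new L3 table.
    static final String INSERT_SQL =
            "INSERT INTO L3_DATEXSITEXNEWBROWSERXCOUNT (DATEID, SITEID, BROWSERID, COUNT) VALUES (?, ?, ?, ?)";

    // Inserts one hand-made test row and returns the number of rows inserted (1 on success).
    // The caller owns the connection, including commit or rollback.
    static int insertTestRow(Connection conn, long dateid, long siteid, long browserid, long count)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, dateid);
            ps.setLong(2, siteid);
            ps.setLong(3, browserid);
            ps.setLong(4, count);
            return ps.executeUpdate();
        }
    }
}
```

Depending on the connection's auto-commit setting, you may need to commit explicitly; delete the test row afterward so it does not appear in the report.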

4.1.2.3 Step 3: Create the Beans

For this report, one bean is required:

L3NewBrowserBean - This class is mapped to the L3 table in the database. The L3NewBrowserBean is used to store the aggregated count of browser visits. The content of this bean will be injected into the L3_DATEXSITEXNEWBROWSERXCOUNT table.

Create the L3NewBrowserBean bean class. For sample code, see Example 4-1.

Example 4-1 L3NewBrowserBean.java

     1  package com.fatwire.analytics.domain.l3;
     2  import javax.persistence.AttributeOverride;
     3  import javax.persistence.Column;
     4  import javax.persistence.Entity;
     5  import javax.persistence.Id;
     6  import javax.persistence.Transient;
     7  import com.fatwire.analytics.domain.AbstractL3Bean;
     8  import com.fatwire.analytics.domain.annotation.Aggregate;
     9  import com.fatwire.analytics.domain.annotation.SumAggregator;
    10  import com.fatwire.analytics.domain.key.DateSiteEntityKey;
    11  /**
    12   * This class represents entries in the L3_dateXsiteXnewbrowserXcount table.
    13   */
    14  @Entity(name="L3_dateXsiteXnewbrowserXcount")
    15  @AttributeOverride(name = "key.firstEntity", column = @Column(name = "BROWSERID", nullable = false, insertable = false, updatable = false))
    16  public class L3NewBrowserBean extends AbstractL3Bean<DateSiteEntityKey> {
    17      private static final long serialVersionUID = 1L;
    18      /** the primary key definition of the object/table */
    19      @Id
    20      private DateSiteEntityKey key = new DateSiteEntityKey();
    21      /** the count value for the entity */
    22      @Aggregate(aggregatorClass = SumAggregator.class)
    23      private Long count;
    24      /** constructor: sets the Hadoop keys used to aggregate bean instances */
    25      public L3NewBrowserBean() {
    26          keys = new String[]{"dateid", "siteid", "browserid", "type"};
    27      }
    28      public Long getBrowserid() {
    29          return key.getFirstEntity();
    30      }
    31      public void setBrowserid(Long browserid) {
    32          key.setFirstEntity(browserid);
    33      }
    34      @Transient
    35      public DateSiteEntityKey getKey() {
    36          return key;
    37      }
    38      public void setKey(DateSiteEntityKey key) {
    39          this.key = key;
    40      }
    41      public Long getSiteid() {
    42          return key.getSiteid();
    43      }
    44      public void setSiteid(Long siteid) {
    45          key.setSiteid(siteid);
    46      }
    47      public Long getDateid() {
    48          return key.getDateid();
    49      }
    50      public void setDateid(Long dateid) {
    51          key.setDateid(dateid);
    52      }
    53      public Long getCount() {
    54          return count;
    55      }
    56      public void setCount(Long count) {
    57          this.count = count;
    58      }
    59  }

The L3NewBrowserBean code is analyzed as follows:

  • Lines 2-10: Import the required classes.

  • Line 16: Extend the com.fatwire.analytics.domain.AbstractL3Bean class, parameterized with the DateSiteEntityKey primary key type.

  • Lines 20-23:

    Declare private member variables where:

    • key: primary key

    • count: number of sessions

  • The @Entity annotation (Line 14) designates this class as a persistent entity, making it eligible for use by the JPA services. The value of the name attribute is the name of the database table to which the entity is mapped.

  • The @AttributeOverride annotation (Line 15) maps the key.firstEntity attribute of the persistent entity to the BROWSERID column of the L3_DATEXSITEXNEWBROWSERXCOUNT table.

  • Use DateSiteEntityKey (Line 20), an implementation of a multi-column primary key for L3 tables.

    Note:

    If you do not want to use the DateSiteEntityKey implementation, you can use any of the following multi-column primary key implementations:

    • DateSiteEntityEntityKey

    • DateSiteEntityStringKey

    • DateSiteStringKey

    where Entity represents a numeric entity.

    The choice of implementation depends solely on the primary key of the table used for storing the contents of the L3 bean.

  • Use the @Id annotation (Line 19) to designate the DateSiteEntityKey member variable (key) as the entity's primary key.

  • Use the @Aggregate annotation with SumAggregator to annotate the count field (Line 22). SumAggregator sums the session counts for each browser.

  • In the constructor (Lines 25-26), set the Hadoop keys: the fields on the basis of which multiple instances of the bean class are aggregated. The type key signifies the name of the bean class.

  • Implement getter and setter methods to expose the private member fields (Lines 28-58).
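The tutorial does not show how the framework reads the keys array, so the following is a hypothetical sketch of one way such an array can be turned into a grouping key: each name is resolved through the matching getter (dateid to getDateid()), and "type" is resolved to the bean's class name. Whether the framework uses the simple or fully qualified class name is an assumption; the sketch uses the simple name. FakeNewBrowserBean is a stand-in with the same key getters as L3NewBrowserBean.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

// Hypothetical stand-in exposing the same key getters as L3NewBrowserBean.
class FakeNewBrowserBean {
    private final Long dateid;
    private final Long siteid;
    private final Long browserid;

    FakeNewBrowserBean(Long dateid, Long siteid, Long browserid) {
        this.dateid = dateid;
        this.siteid = siteid;
        this.browserid = browserid;
    }

    public Long getDateid() { return dateid; }
    public Long getSiteid() { return siteid; }
    public Long getBrowserid() { return browserid; }
}

class GroupingKeySketch {
    // Joins the values named in 'keys' into one string; beans with equal strings
    // would be aggregated together.
    static String groupingKey(Object bean, String[] keys) {
        StringJoiner joiner = new StringJoiner("|");
        for (String name : keys) {
            if (name.equals("type")) {
                // Keeps different bean classes apart even when their other key values match.
                joiner.add(bean.getClass().getSimpleName());
                continue;
            }
            String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                Method method = bean.getClass().getMethod(getter);
                joiner.add(String.valueOf(method.invoke(bean)));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("No readable property: " + name, e);
            }
        }
        return joiner.toString();
    }
}
```

Including type in the key means that beans of different classes never merge, even if their date, site, and entity values match.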

4.1.2.4 Step 4: Create the Mapper Classes

For this report, one Mapper class L3NewBrowserMapper (L3NewBrowserMapper.java) is required:

Example 4-2 L3NewBrowserMapper.java

     1  package com.fatwire.analytics.report.mapper;
     2  import java.io.IOException;
     3  import org.apache.log4j.Logger;
     4  import com.fatwire.analytics.domain.SessionBean;
     5  import com.fatwire.analytics.domain.l3.L3NewBrowserBean;
     6  import com.fatwire.analytics.mapreduce.AbstractAnalyticsMapper;
     7  import com.fatwire.analytics.mapreduce.AnalyticsOutputCollector;
     8  /**
     9   * L3 mapper on New Browser
    10   */
    11  public class L3NewBrowserMapper extends AbstractAnalyticsMapper<SessionBean, L3NewBrowserBean> {
    12      /** initialize logging */
    13      private static final Logger logger = Logger.getLogger(L3NewBrowserMapper.class);
    14      @Override
    15      public void map(SessionBean input, AnalyticsOutputCollector<L3NewBrowserBean> outputCollector) throws IOException {
    16          if (logger.isTraceEnabled()) {
    17              logger.trace("mapping input bean '" + input + "' to L3NewBrowserBean");
    18          }
    19          L3NewBrowserBean output = new L3NewBrowserBean();
    20          output.setDateid(input.getDateid());
    21          output.setSiteid(input.getSiteid());
    22          output.setBrowserid(input.getBrowserid());
    23          output.setCount(1L);
    24          // collect the output bean
    25          outputCollector.collect(output);
    26      }
    27  }

The L3NewBrowserMapper class code is analyzed as follows:

  • The L3NewBrowserMapper class extends the AbstractAnalyticsMapper class (Line 11) and overrides the map method (Lines 14-26).

  • In the map method, every input SessionBean is transformed into an L3NewBrowserBean by copying the date, site, and browser IDs from the SessionBean (Lines 19-22).

  • The count property of the L3NewBrowserBean is set to 1L for every input bean (Line 23).

  • Every L3NewBrowserBean created is collected by the output collector (AnalyticsOutputCollector) (Line 25).

  • Add debugging statements (Lines 16-18).
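Because the map logic is a plain transformation, it can be checked without a Hadoop cluster by feeding it beans and capturing what it collects. The sketch below mirrors the body of L3NewBrowserMapper.map against hypothetical stand-ins (OutputSink for AnalyticsOutputCollector, SessionInput for SessionBean, BrowserOutput for L3NewBrowserBean); it does not use the framework classes, whose full interfaces are not shown in this tutorial.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AnalyticsOutputCollector.
interface OutputSink<T> {
    void collect(T item);
}

// Hypothetical stand-ins for SessionBean and L3NewBrowserBean.
record SessionInput(long dateid, long siteid, long browserid) {}
record BrowserOutput(long dateid, long siteid, long browserid, long count) {}

class NewBrowserMapSketch {
    // Same steps as L3NewBrowserMapper.map: copy the key fields, set count to 1, collect.
    static void map(SessionInput input, OutputSink<BrowserOutput> out) {
        out.collect(new BrowserOutput(input.dateid(), input.siteid(), input.browserid(), 1L));
    }
}

// Test double that records everything collected, in order.
class ListSink<T> implements OutputSink<T> {
    final List<T> items = new ArrayList<>();

    public void collect(T item) {
        items.add(item);
    }
}
```

If AnalyticsOutputCollector can be implemented or mocked in your build, the same list-backed pattern could be applied to L3NewBrowserMapper itself; that is an assumption, since the collector's interface is not documented here.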

4.1.2.5 Step 5: Adding Beans and Mappers to the Processor Definitions

To enable your new bean and mapper classes, add them to the existing processor definitions. To add a mapper, add it to the spring-mapper.xml file in the corresponding processor folder; to add a bean, add it to the spring-combiner_reducer.xml file in the same folder.

In this exercise you will be configuring:

  • L3NewBrowserMapper

  • L3NewBrowserBean

To add beans and mappers to the processor definitions

  1. Configure L3NewBrowserMapper:

    1. Open the processors/sesprocessor/spring-mapper.xml file in a text editor.

    2. Add the newBrowserMapper bean line shown below to the spring-mapper.xml file:

      <bean id="AnalyticsMapperConfigBean" class="java.util.ArrayList">
        <constructor-arg>
          <list>
            <bean id="clickstreamMapper" class="com.fatwire.analytics.report.mapper.L3ClickstreamMapper"/>
            <bean id="newBrowserMapper" class="com.fatwire.analytics.report.mapper.L3NewBrowserMapper"/>
            <bean id="osMapper" class="com.fatwire.analytics.report.mapper.L3OperatingsystemMapper"/>
            <bean id="sessionEntryidMapper" class="com.fatwire.analytics.report.mapper.L3SessionEntryMapper"/>
            <bean id="sessionExitidMapper" class="com.fatwire.analytics.report.mapper.L3SessionExitMapper"/>
            <bean id="ipMapper" class="com.fatwire.analytics.report.mapper.L3IpMapper"/>
            <bean id="hostnameMapper" class="com.fatwire.analytics.report.mapper.L3HostnameMapper"/>
            <bean id="jsMapper" class="com.fatwire.analytics.report.mapper.L3JsMapper"/>
            <bean id="searchengineMapper" class="com.fatwire.analytics.report.mapper.L3SearchengineMapper"/>
            <bean id="refererMapper" class="com.fatwire.analytics.report.mapper.L3RefererMapper"/>
            <bean id="screenresMapper" class="com.fatwire.analytics.report.mapper.L3ScreenresMapper"/>
            <bean id="sessionQuantilMapper" class="com.fatwire.analytics.report.mapper.L3SessionQuantilMapper"/>
            <bean id="objectDurationMapper" class="com.fatwire.analytics.report.mapper.L3ObjectDurationMapper"/>
            <bean id="engageMapper" class="com.fatwire.analytics.report.mapper.L3EngageMapper"/>
          </list>
        </constructor-arg>
      </bean>
      
  2. Configure L3NewBrowserBean:

    1. Open the processors/sesprocessor/spring-combiner_reducer.xml file in a text editor.

    2. Add the L3NewBrowserBean entry line shown below to the spring-combiner_reducer.xml file:

      <util:map id="AnalyticsCombinerReducerConfigBean" map-class="java.util.HashMap">
        <entry key="com.fatwire.analytics.domain.l3.L3ClickstreamBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3OperatingsystemBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3NewBrowserBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3SessionEntryBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3SessionExitBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3IpBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3HostnameBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3JsBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3SearchengineBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3RefererBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3ScreenresBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3SessionQuantilBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3ObjectDurationBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3EngageRecBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegBean" value-ref="analyticsBeanReducer"/>
        <entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegObjBean" value-ref="analyticsBeanReducer"/>
      </util:map>
      

4.1.3 Configuring Database Injection

Follow these steps to configure the database injection:

  1. Open the processors/sesinjection/spring-combiner.xml file in a text editor.

  2. Add the L3NewBrowserBean entry line shown below to the spring-combiner.xml file:

    <util:map id="AnalyticsCombinerConfigBean" map-class="java.util.HashMap">
      <entry key="com.fatwire.analytics.domain.l3.L3ClickstreamBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3NewBrowserBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3OperatingsystemBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3SearchengineBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3SessionEntryBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3SessionExitBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3IpBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3HostnameBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3JsBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3RefererBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3ScreenresBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3SessionQuantilBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3ObjectDurationBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3EngageRecBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegBean" value-ref="analyticsBeanReducer"/>
      <entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegObjBean" value-ref="analyticsBeanReducer"/>
    </util:map>
    
  3. Open the processors/sesinjection/spring-reducer.xml file in a text editor.

  4. Add the L3NewBrowserBean entry line shown below to the spring-reducer.xml file:

    <util:map id="AnalyticsReducerConfigBean" map-class="java.util.HashMap">
      <entry key="com.fatwire.analytics.domain.l3.L3ClickstreamBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3NewBrowserBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3OperatingsystemBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3SearchengineBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3SessionEntryBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3SessionExitBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3IpBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3HostnameBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3JsBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3RefererBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3ScreenresBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3SessionQuantilBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3ObjectDurationBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3EngageRecBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegBean" value-ref="databaseInjection"/>
      <entry key="com.fatwire.analytics.domain.l3.L3EngageRecSegObjBean" value-ref="databaseInjection"/>
    </util:map>
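After editing the configuration files, a quick check that each one names the new mapper or bean exactly once can catch a typo before you rebuild the jar. A minimal sketch using the JDK's DOM parser; the class and method names are hypothetical, and the sketch reads XML from a string (loading the actual files, for example with java.nio.file.Files.readString, is left to you). The parser is left namespace-unaware, its default, so that plain element names such as entry and bean are matched regardless of the util: prefix on the enclosing map.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class RegistrationCheckSketch {
    // Returns the value of 'attribute' on every element named 'tag'
    // (for example tag "bean" with attribute "class", or tag "entry" with attribute "key").
    static List<String> attributeValues(String xml, String tag, String attribute) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // namespace-unaware by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(((Element) nodes.item(i)).getAttribute(attribute));
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
    }

    // True only when className appears exactly once, so a missing line and a duplicate are both caught.
    static boolean registeredOnce(String xml, String tag, String attribute, String className) {
        return attributeValues(xml, tag, attribute).stream().filter(className::equals).count() == 1;
    }
}
```

Run the check against processors/sesprocessor/spring-mapper.xml with tag bean and attribute class, and against the three combiner and reducer files with tag entry and attribute key.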
    

4.1.4 Integrating the New Analytics Job with the Existing Hadoop-Jobs Component

Once you have developed the new Analytics job, integrate it with the existing hadoop-jobs component by rebuilding the hadoop-jobs.jar file and copying it to the hadoop-jobs directory. The rebuilt jar file enables the hadoop-jobs component to run your new job on the data captured by the parameter you added in Exercise 1.

To integrate the new Analytics job with the existing hadoop-jobs component

  1. Create the hadoop-jobs.jar file.

  2. Replace the existing hadoop-jobs.jar file, located in the hadoop-jobs installation directory, with the jar file you just created. Then remove the ._tmp_hadoop-jobs.jar file, which is the jar actually used by the run command. When this file is missing, the run command rebuilds it from the new hadoop-jobs.jar.

  3. Run the hadoop-jobs component to process the data captured by the parameter (added in Chapter 2, "Exercise 1: Adding a New Parameter for Data Capture").

4.2 Next Steps

Chapter 5, "Exercise 4: Creating and Preparing a Report for Viewing" of this tutorial walks you through the process of creating a new report in the reporting interface. As an example, you will create the "NewBrowsers" report, which displays the number of visitors for each browser.

Note:

The "NewBrowsers" report you will create in this tutorial is a duplicate of the default "Browsers" report in your Analytics installation. For the purposes of this tutorial, the XML file and report name of the "Browsers" report you will be configuring, along with the bean and mapper class names, have been renamed to avoid overwriting the default "Browsers" report.