4
Loading the Clickstream Database

This chapter describes how to execute, monitor, and manage operations related to the Clickstream database. All functionality discussed in this chapter is contained within the Manage tab located on the right side of the Horizontal Navigation bar.

The items listed below represent the links that appear and become active on the Horizontal Navigation bar when you select the Manage tab. In this chapter, you'll learn about:

Database Processes: perform tasks related to the Clickstream database (such as data loading), monitor currently running processes, and view details for specific database processes.
Data Sources: view all data sources in the system, and view details about specific data packets associated with those data sources.

Database Processes

The Database Processes section of this chapter describes how to start and manage the database processes you execute, and how to interpret the details they report.

The Process Status Page

The Process Status page is displayed by default when you click the Processes link on the Horizontal Navigation bar. It provides information related to the activity of your database, and is the page from which you can start a database process.

The Process Status page contains the following two headings:

Currently Running: indicates whether a database process is in progress. If a process is running, descriptive information about it is displayed; if no process is running, you can start one (see "Starting and Stopping a Process").
Previous Runs: lists all completed and failed database processes. From this table, you can view details about previously run database processes and view other types of information from the Process Navigation menu (see "The Process Navigation Menu").

The status of the Execution Engine is also displayed on the Process Status page. The Execution Engine should run continuously in the background; it must be running to execute any Clickstream process. If the Execution Engine is not running, a Warning message is displayed at the top of the Process Status page. (No message appears when the Engine is running.)

Controlling the Clickstream Daemons

The Execution Engine is started and stopped with the same clkctl commands that control the Collector Server. At the command prompt, change to the directory for your platform:

(UNIX)  CLICK_HOME/click/bin

(Windows)  CLICK_HOME\click\bin

In these paths, CLICK_HOME is the directory in which Oracle9iAS Clickstream Intelligence is installed. Then use the start and stop commands below:

clkctl start DATABASE_LOGIN

clkctl stop DATABASE_LOGIN

In these commands, DATABASE_LOGIN is the username of the Runtime Administrator schema, which must be entered as clkrt.

To connect to a Clickstream database instance other than the one specified in the click-app.xml file, use the same commands, but supply a full connect string as DATABASE_LOGIN:

clkctl start DATABASE_LOGIN

clkctl stop DATABASE_LOGIN

Here, DATABASE_LOGIN takes the form username/password@HOST:PORT:SID. The username must be clkrt. The password is the one defined for the clkrt schema. HOST is the name of the machine on which the database is installed, PORT is the port number of the database TNS listener, and SID is the Oracle SID.
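The hypothetical Python helper below illustrates the connect-string format only. It assembles a DATABASE_LOGIN value and the matching clkctl command line. It is not part of Clickstream Intelligence. The password, host name, port, and SID are placeholder values.

```python
def database_login(password: str, host: str, port: int, sid: str,
                   username: str = "clkrt") -> str:
    """Return a DATABASE_LOGIN value of the form username/password@HOST:PORT:SID."""
    # The username must be the Runtime Administrator schema user, clkrt.
    if username != "clkrt":
        raise ValueError("DATABASE_LOGIN username must be clkrt")
    # PORT is the TNS listener port, so it must be a valid TCP port number.
    if not 0 < port < 65536:
        raise ValueError(f"invalid listener port: {port}")
    return f"{username}/{password}@{host}:{port}:{sid}"


# Placeholder values: substitute your clkrt password, database host, listener port, and SID.
login = database_login("secret", "dbhost.example.com", 1521, "ORCL")
print(f"clkctl start {login}")
# prints: clkctl start clkrt/secret@dbhost.example.com:1521:ORCL
```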

To abort the Execution Engine (and Collector Server), enter the following command:

clkctl abort DATABASE_LOGIN

where DATABASE_LOGIN takes either of the forms described above. Use the abort command to stop the daemons when a process is hanging or taking too long to complete. After using abort, you must run the execln.sql script from the directory for your platform:

(UNIX)  CLICK_HOME/admin

(Windows)  CLICK_HOME\admin

To view the status of the Clickstream daemons, use the command below:

clkctl status DATABASE_LOGIN

where DATABASE_LOGIN takes either of the forms described above.

Starting and Stopping a Process

The Start Process page enables you to select the database process you want to run. It lists the five Clickstream process types. Each type has a pull-down menu that lists all existing definitions for that type. If no definition appears for a process type, you must first create one (see "Create a Process Definition" in Chapter 3, "Configuring Clickstream Intelligence").

Only one database process can run at a time. If a process is already running, you must either wait until it finishes, or stop it and then undo it (see "Stopping a Database Process"). To determine whether a process is in progress, check the "Currently Running" section of the Process Status page. If no process is underway, you can start a new one.
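The rule can be expressed as a small check. The Python sketch below is illustrative, not the Execution Engine's actual logic. It allows a new process only when no process has run yet, or when the most recent process has reached Completed or Failed. Every other state described in "Process Execution States" still occupies the single execution slot.

```python
from typing import Optional

# States in which a process has finished executing (see "Process Execution States").
FINISHED_STATES = frozenset({"Completed", "Failed"})


def can_start_new_process(latest_status: Optional[str]) -> bool:
    """Return True if a new database process may be started.

    latest_status is the status of the most recent process, or None if no
    process has run. Starting, Running, Stopping, Stopped, Error, Undoing,
    and Unrecoverable all block a new process until that process completes
    or is undone.
    """
    return latest_status is None or latest_status in FINISHED_STATES


print(can_start_new_process("Running"))  # False
print(can_start_new_process("Failed"))   # True
```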

How to Start a Database Process

To start a database process, follow the steps below:

Click the Start Process button located on the Process Status page.

The Start Process page appears.
Select the radio button for the Process Type you would like to start. (For more information about process types, see the "Process Types" section that follows.)
Use the pull-down menu to select a Process Definition for your chosen process type.
Click Start.

The process is started by the Execution Engine. The Process Status page displays the process status as "Running" under the Currently Running heading.

Stopping a Database Process

To stop a currently running process, click the Stop button on the Process Status page. The process stops only after all currently running jobs finish, so it may remain in the Stopping state for an extended period of time.

Even after the process reaches the Stopped state, it is not entirely finished. You must click the Undo button to terminate it completely, or click Resume to continue it (see "Stop a Process and Resume").

Show Details for a Previous Run

The Previous Runs section of the Process Status page displays all completed and failed database processes. To view the details for a previously run process, follow the steps below:

Select the Manage tab.

The Process Status page appears (by default).
Go to the Previous Runs heading and select the radio button beside the process for which you want to view details.
Click Show Details.

The Process Details page displays information about the process, such as process type, definition, start date, status, and warehouse version.
To return to the Process Status page, click the Processes link on the Horizontal Navigation bar.

Process Types

You can create definitions for five types of database processes (see "Create a Process Definition" in Chapter 3). To start a process, use the pull-down menu to select a process definition for one of the five process types. The sections that follow describe each process type.

Load Clickstream

The Load Clickstream process enables the transfer and storage of Web log data into the Clickstream database. When you start a Load Clickstream process, the Clickstream Loader begins processing and transforming Web log data and then loads it into the database.

Note:

When a Load Clickstream process is executed, the interface tables are first truncated (emptied), and then Web log data is loaded into the Clickstream Intelligence database. If you have loaded external (non-Web) data into the interface tables, you must therefore run a Load Dimensions process immediately, before any Load Clickstream process. This ensures that the interface table data is loaded into the database dimensions.

If external data is loaded into the interface tables and a Load Clickstream process (instead of a Load Dimensions process) is then executed, all of that data is lost when the interface tables are emptied.
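The Python sketch below illustrates the ordering rule. It scans a planned sequence of steps and flags any Load Clickstream step that would truncate external data that Load Dimensions has not yet moved. The step label "External Load" is hypothetical. It stands for loading non-Web data into the interface tables yourself. The other labels are the process types from this chapter.

```python
from typing import List


def find_data_loss_risks(steps: List[str]) -> List[int]:
    """Return the indexes of Load Clickstream steps that would discard external data."""
    risky = []
    pending_external = False  # external rows sitting in the interface tables, not yet loaded
    for i, step in enumerate(steps):
        if step == "External Load":        # hypothetical: user loads non-Web data
            pending_external = True
        elif step == "Load Dimensions":    # interface data moved into the dimensions
            pending_external = False
        elif step == "Load Clickstream":   # interface tables are truncated first
            if pending_external:
                risky.append(i)
            pending_external = False
    return risky


print(find_data_loss_risks(["External Load", "Load Clickstream"]))                     # [1]
print(find_data_loss_risks(["External Load", "Load Dimensions", "Load Clickstream"]))  # []
```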

Load Dimensions

The Load Dimensions process transfers existing interface table data into the database levels, which are used to populate the dimensions. This process does not load data into the interface tables; you must load external (non-Web) data into the interface tables before running the Load Dimensions process.

Refresh Summaries

The Refresh Summaries process updates the Summary Layer with the most current data in the database. You can also refresh summaries automatically as part of the Load Clickstream process by enabling the "Refresh Summaries" option. For more information, see the "Refresh Summaries" section of Chapter 3, "Configuring Clickstream Intelligence".

Resolve Unknown IP Addresses

The Resolve Unknown IP Addresses process operates only on the Client Host dimension. It is typically run when the Web log contains the IP address of the host but not the resolved hostname. To resolve each address, the process performs a reverse DNS lookup. It queries the DNS server for the user-friendly hostname that corresponds to the client's numeric IP address.
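Python's standard socket.gethostbyaddr function can illustrate a reverse lookup of this kind. This is a sketch of the technique only, not the Clickstream implementation. The lookup function is a parameter, so the sketch can be tested without network access. Mapping unresolved addresses to None is an assumption of this sketch. Note that gethostbyaddr goes through the system resolver, which may consult a local hosts file as well as DNS.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def resolve_unknown_ips(ips: Iterable[str],
                        lookup: Lookup = socket.gethostbyaddr) -> Dict[str, Optional[str]]:
    """Map each numeric IP address to its hostname via reverse lookup, or None if unresolved."""
    resolved: Dict[str, Optional[str]] = {}
    for ip in ips:
        try:
            hostname, _aliases, _addresses = lookup(ip)
            resolved[ip] = hostname
        except OSError:  # socket.herror / socket.gaierror: no PTR record or invalid address
            resolved[ip] = None
    return resolved
```

Calling resolve_unknown_ips(["192.0.2.10"]) without a lookup argument performs real reverse lookups through the system resolver.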

Restore a Previous Version

The Restore a Previous Version process enables you to undo all changes to data made since a given point in time. You specify the database version to roll back to in the "Version after Rollback" field when you create a process definition of this type. For example, running a definition that specifies "2" as the Version after Rollback returns the database to the version labeled "2".

Process Execution States

Status information about currently running processes is useful from both a tracking and a troubleshooting standpoint. For a process that is "Currently Running" on the Process Status page, you can click the Show Details button to view and track general process information via the Process Details page.

When a process displays a status that indicates a problem (such as Error or Failed), you can troubleshoot it by viewing Process Details and the other information available from the Process Navigation menu. Troubleshooting typically involves viewing Process Messages and analyzing the Subprocesses and the jobs that make them up.

Consider the following scenario in which a Load Clickstream process is running. A problem arises that causes the process to stop. The status of this process changes from Running to Error on the Process Status page, and three buttons are displayed:

Show Details: Displays the Process Details page.

This page is the first option on the Process Navigation menu and provides general information about the process, such as its name and definition. You can use the other options on the Process Navigation menu to investigate the process error further (as described below).
Undo: Cancels the process and returns the database to its original state (just before the process started).

The process is listed under the "Previous Runs" heading on the Process Status page with the status Failed. To view process information and access the Process Navigation menu, click the Show Details button.
Resume: After the problem causing the process error is fixed or removed, click this button to continue the process from the point at which it originally stopped.

Depending on the option selected, one of several execution states may ultimately result. Completed or Failed status indicates that execution of the process has finished. An intermediate state, such as Error, indicates a process that has stopped temporarily. Such a process must eventually be resumed or cancelled with the Undo button.

Note:

A process may also reach the Unrecoverable execution state if a user restarts the Execution Engine while a process is running, or if a job loses its database connection during execution. When a process displays Unrecoverable status, you must click the Undo button before any new database process can be run.

The following sections outline typical process execution states and scenarios that may be encountered when running a database process.

A Successful Process

Click the Start button to begin a process - status is Starting.
The Execution Engine begins process execution - status is Running.
The process finishes successfully - status is Completed.

Stop a Process and Resume

Click the Start button to begin a process - status is Starting.
The Execution Engine begins process execution - status is Running.
Click the Stop button - status is Stopping.
The Execution Engine stops process execution - status is Stopped.
Click the Resume button - status is Starting once again.

Stop a Process and Undo

Click the Start button to begin a process - status is Starting.
The Execution Engine begins process execution - status is Running.
Click the Stop button - status is Stopping.
The Execution Engine stops process execution - status is Stopped.
Click the Undo button - status is Undoing.
Process status changes to Failed - you can now start a new process.

Resume a Process with an Error

Click the Start button to begin a process - status is Starting.
The Execution Engine begins process execution - status is Running.
An error occurs during execution and execution is temporarily stopped - status is Error.
Click the Resume button - status is Starting once again.
If previous errors were fixed - process reaches Completed status.

If errors were not fixed - process reaches Error status again.

Undo a Process with an Error

Click the Start button to begin a process - status is Starting.
The Execution Engine begins process execution - status is Running.
An error occurs during execution and execution is temporarily stopped - status is Error.
Click the Undo button - status is Undoing.
Process status changes to Failed - you can now start a process once again (see "Starting and Stopping a Process").
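The scenarios above can be summarized as a transition table. The Python sketch below replays a sequence of events and rejects any transition that the scenarios do not describe. The event names are hypothetical labels for the button clicks and Execution Engine actions listed above. The Undo transition out of Unrecoverable is an assumption, based on the note that such a process must be undone.

```python
# (current state, event) -> next state, taken from the scenarios in this section.
# Event names are illustrative labels, not product identifiers.
TRANSITIONS = {
    ("Starting", "engine begins"): "Running",
    ("Running", "finish"): "Completed",
    ("Running", "Stop"): "Stopping",
    ("Stopping", "engine stops"): "Stopped",
    ("Stopped", "Resume"): "Starting",
    ("Stopped", "Undo"): "Undoing",
    ("Running", "error"): "Error",
    ("Error", "Resume"): "Starting",
    ("Error", "Undo"): "Undoing",
    ("Undoing", "undo finishes"): "Failed",
    # Execution Engine restarted, or a job lost its database connection.
    ("Running", "engine restart or lost connection"): "Unrecoverable",
    # Assumption: Undo on an Unrecoverable process passes through Undoing like the other cases.
    ("Unrecoverable", "Undo"): "Undoing",
}


def replay(events):
    """Return the list of states a process passes through for the given events."""
    state = "Starting"  # clicking Start puts every process in the Starting state
    history = [state]
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} is not valid in state {state}") from None
        history.append(state)
    return history


# "Stop a Process and Undo"
print(replay(["engine begins", "Stop", "engine stops", "Undo", "undo finishes"]))
# ['Starting', 'Running', 'Stopping', 'Stopped', 'Undoing', 'Failed']
```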

The Process Navigation Menu

The Process Navigation menu is displayed whenever you view specific information about a process. It is used to view information about a particular process: its details, definition, messages, subprocesses and jobs, and data packet details.

The following options are accessible from the Process Navigation menu:

Details
Messages
Definition
Subprocesses
Data Packets

The following sections describe each item displayed on the Process Navigation menu.

Process Details

The Process Details page provides additional information about a process. For a given process, the following information is displayed:

Type: the name of the process type
Process Definition: the name of the specific definition for the process type indicated above
Start Date: date and time that the process began
End Date: the date and time at which the process reached completion
Elapsed Time: amount of time needed to execute the process from start to finish, equal to (End Date - Start Date)
Status: the execution state of the process, such as Completed, Failed, or Error
Warehouse Version: the database version label to which this process belongs
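The Python sketch below is a worked example of the Elapsed Time calculation. It subtracts Start Date from End Date for a run that crosses midnight. The timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def elapsed_time(start_date: datetime, end_date: datetime) -> timedelta:
    """Elapsed Time as shown on the Process Details page: End Date - Start Date."""
    if end_date < start_date:
        raise ValueError("End Date precedes Start Date")
    return end_date - start_date


# Hypothetical run that starts at 22:15:00 and ends at 01:45:30 the next day.
print(elapsed_time(datetime(2002, 3, 1, 22, 15), datetime(2002, 3, 2, 1, 45, 30)))
# prints: 3:30:30
```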

Process Messages

The Messages option on the Process Navigation menu provides information about the jobs that have been run for a process. The messages displayed typically indicate job errors, and can be used for troubleshooting a process that has failed or has an error.

The Process Messages page displays:

Process Messages: Displays the messages (typically errors) generated while the process was running.
Failed Job Messages: Lists the jobs that failed, accompanied by a date, time, and error message. To view job details, click the link for any job on the list.

Process Definition

The Definition menu option displays the parameters, and their values, that define the process. These parameters can be modified from the Process Definitions page, located under the Configure tab.

Subprocesses

The Subprocesses page lists all subprocesses that were executed for a particular process. To view the jobs that make up a subprocess, select the radio button beside the subprocess and click Show Details.

Subprocess Details

This page lists all jobs that comprise a given subprocess. For each job, additional statistics are displayed, such as the Job Type, Start Date, End Date, and Status. To drill down into the details of any given job, click the Show Details button.

Job Details

The Job Details page provides the most granular data about a database process: details about a specific job that was executed in the database. The page displays the job name, the command that was used to start the job, and any job messages that were generated. This information is typically used for detailed analysis of a database process, for example when the process fails or has an error.

Process Data Packets

Data packets consist of Web log files that the Collector Agent (installed on the Web server) groups together and compresses. Packaging log files this way facilitates their transmission from the Web server to the Clickstream Collector Server.

The Process Data Packets Details page displays parameters for the data packets associated with the process. Information such as File Name, Status, Date Created, Lines, Lines Loaded, Lines Rejected, and Lines Discarded may be displayed for each data packet.

Delete Data Packets

To delete data packets from the Process Data Packets page, select the radio button for the packets you want to remove and click Delete. Deletion is permanent. Periodic deletion of data packets is recommended because it frees disk space on the Collector Server host.

Data Sources

The Data Sources link on the Horizontal Navigation bar provides access to the Data Sources main page. This page displays data sources for all sites in your system. To view all data packets associated with any source listed on the Data Sources page, select the appropriate radio button and click Show Data Packets.

Data Packets

The Data Packets main page lists, in tabular format, all data packets that have been downloaded by the Collector Server.

Note:

All data packets are listed for a data source, even if the packets have not been loaded into the database.

From the Data Packets page, you can:

Show Details - displays the Data Packets Details page, which provides information about a particular data packet file.
Delete Data Packets - a delete confirmation appears. If you are sure you want to delete the data packets, click Delete.

The Data Packets Details Page

When you select a data source from the Data Sources main page and click the Show Data Packets button, the Data Packets page appears with a listing of all data packets that have been downloaded for that specific data source.

To view details about an item listed on the Data Packets page, locate the data packets file name and click the corresponding Show Details link. The Data Packets Details page appears with the following information:

Date Created: indicates when the data packet was downloaded from the Web server.
File Name: the name of the data packet, as it appears on the Collector Server.
Status: describes the current state of the data packet, as outlined below.
- Loading or Loaded: this status may appear while a load process is running.
- Completed: the packet was successfully loaded into the database, and the rest of the "Load Clickstream" process completed successfully.
- Load Failed: the packet was not loaded into the database due to format problems or other errors. Consult the Process Messages page for details about the cause of the packet load failure.
- Transfer Failed: indicates a problem downloading the data packet to the Collector Server.
- Transferred: the packet was successfully downloaded from the Web server to the Collector Server.
Lines Loaded: the number of lines loaded from the Collector Server into the Clickstream database.
Lines Rejected: the number of lines not loaded from the Collector Server because the lines were corrupt, poorly formatted, or in the wrong data format. (The number of lines rejected does not include lines that were discarded due to filtering.)
Lines Discarded: the number of lines thrown out due to filtering only.
Details: click the Show Details link to view all information about the data packet from the Data Packets Details page.
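The Python sketch below illustrates the distinction between rejected and discarded lines. The parser and filter are hypothetical stand-ins for the Clickstream Loader's format checks and filtering rules. The sketch assumes that a line that cannot be parsed is rejected before any filter is applied. Under that assumption, every line is counted exactly once, as loaded, rejected, or discarded.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class LineCounts:
    lines: int = 0
    loaded: int = 0
    rejected: int = 0
    discarded: int = 0


def count_lines(lines: Iterable[str],
                parse: Callable[[str], Optional[List[str]]],
                keep: Callable[[List[str]], bool]) -> LineCounts:
    """Classify each log line as loaded, rejected (malformed), or discarded (filtered)."""
    counts = LineCounts()
    for line in lines:
        counts.lines += 1
        record = parse(line)
        if record is None:
            counts.rejected += 1   # corrupt or wrongly formatted: never reaches the filter
        elif not keep(record):
            counts.discarded += 1  # well formed but removed by filtering
        else:
            counts.loaded += 1
    return counts


# Hypothetical log format "client method path" and a hypothetical filter that drops image requests.
def parse(line):
    fields = line.split()
    return fields if len(fields) == 3 else None


def keep(record):
    return not record[2].endswith(".gif")


sample = ["10.0.0.1 GET /index.html", "garbage", "10.0.0.2 GET /logo.gif"]
print(count_lines(sample, parse, keep))
# LineCounts(lines=3, loaded=1, rejected=1, discarded=1)
```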
