4 Configuring Enterprise Data Quality with Oracle WebLogic Server

This chapter describes how to create the EDQ database repository, create an Oracle WebLogic Server domain for EDQ, and start Oracle WebLogic Server.

Note:

These instructions apply to Oracle WebLogic Server environments only. If you are using Apache Tomcat, you must follow the directions in Chapter 5, "Configuring Enterprise Data Quality with Apache Tomcat."

This chapter includes the following sections:

4.1 Prerequisites for these Procedures

Before performing the procedures in this section, you must first read and satisfy the steps in:

4.2 Creating an EDQ Database Repository

EDQ uses three database schemas: the configuration schema (EDQCONFIG), the results schema (EDQRESULTS), and the staging schema (EDQSTAGING). You create them with the Oracle Repository Creation Utility (RCU).

The person who runs RCU must be able to log into the database with DBA privileges. If you cannot run with DBA privileges, RCU can create a script for a DBA to run later.

Note:

Do not use RCU to upgrade EDQ; use the instructions in Section 8, "Upgrading Enterprise Data Quality."

To run RCU:

  1. Make certain the repository database is running.

  2. Run the command shell or console of the operating system.

  3. Start RCU from the FMW_HOME/oracle_common/bin directory, where FMW_HOME is the Oracle Fusion Middleware installation directory.

    On Linux:

    ./rcu

    On Windows:

    rcu.bat
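As a minimal sketch of step 3, a script can locate the launcher before starting it. The function name rcu_launcher and the installation path in the usage note are hypothetical; only the oracle_common/bin location and the rcu and rcu.bat file names come from the steps above.

```shell
# Hypothetical helper: print the path of the RCU launcher under a given FMW_HOME.
# It only checks that the file exists; it does not start RCU.
rcu_launcher() {
    fmw_home=$1
    bin_dir="$fmw_home/oracle_common/bin"
    if [ -f "$bin_dir/rcu" ]; then
        # Linux launcher
        printf '%s\n' "$bin_dir/rcu"
    elif [ -f "$bin_dir/rcu.bat" ]; then
        # Windows launcher
        printf '%s\n' "$bin_dir/rcu.bat"
    else
        echo "RCU launcher not found under $bin_dir" >&2
        return 1
    fi
}
```

For example, with an assumed installation directory of /u01/oracle/middleware, running launcher=$(rcu_launcher /u01/oracle/middleware) && "$launcher" would start RCU only if the launcher was found.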

Complete the RCU configuration screens by following the instructions in Table 4-1.

Table 4-1 Running the RCU Program

Screen Action to Perform

Welcome

Click Next to proceed with the installation.

You may cancel the installation at any time by clicking Exit.

Create Repository

Click Next to continue.

This uses the default Create Repository and System Load and Product Load options. This requires the person running RCU to have DBA privileges.

Database Connection Details

Select Oracle Database from the Database Type list.

Specify the host name where your Oracle database is running.

Enter the port number for your database. The default port number for Oracle Database is 1521.

Specify the service name for the database. Typically, the service name is the same as the global database name, for example, orcl.example.com.

If you are using release 12c of Oracle Database, you must enter the connection details of a pluggable database, for example, pdborcl.example.com.

Enter the user name for your database. The user name could either be SYS or that of the user with DBA privileges.

Enter the password for your database user.

Select SYSDBA from the Role: list. This is automatically selected when the user is SYS. Select Normal from the Role: list if you are a user with DBA privileges.

Click Next to continue.

Checking Global Prerequisites

When the prerequisites checking progress has reached completion, click OK to continue.

Select Components

Select Create new prefix and enter a unique prefix name for all the database schemas you are creating in this session, for example, edqprod. The default prefix is DEV.

Select the Oracle AS Repository Components check box. The Oracle EDQ check boxes that create the EDQ configuration, results, and staging schemas in the database repository are then automatically selected.

When you select the Oracle EDQ component, all the necessary schemas are installed. In addition to the EDQ schemas, the common schemas needed to support audit and OPSS, with which EDQ is integrated, are also selected. This ensures that EDQ functions correctly with these integrated components.

The three EDQ schemas that are installed are Config, Results, and Staging. The Staging schema is also used by CDS, with or without the Fusion Connector. However, installing the Staging schema when it is not necessary does not interfere with operation and consumes minimal database resources.

Click Next to continue.

Checking Component Prerequisites

When the prerequisites checking progress has reached completion, click OK to continue.

Schema Passwords

Ensure that Use same passwords for all schemas is selected.

Enter the password that you want to use for all of the EDQ database schemas in all password fields, then click Next to continue.

Map Tablespaces

The default EDQ tablespaces that will be created by RCU are displayed by component.

You can change the tablespaces by clicking Manage Tablespaces and then modifying the information. Oracle recommends using one user tablespace for the EDQCONFIG schema, and a different user tablespace for the EDQRESULTS schema. The recommended minimum sizes (can be adjusted later) are:

  • EDQRESULTS: 120GB tablespace (4 x 30GB files)

  • EDQCONFIG: 30GB tablespace (1 x 30GB file)

  • EDQSTAGING: 30GB tablespace (1 x 30GB file)

Click Next to continue.

Validating and Creating Tablespaces

Click OK to create any non-existent tablespaces in your schema, then click OK when the operation completes.

Summary

Review the database details, then click Create to continue.

A status screen is displayed that shows the progress of creating the repository components.

Completion Summary

Click Close to exit the RCU program.
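The recommended minimum sizes in the Map Tablespaces row can be added up to estimate how much storage to request before running RCU. The following is only a worked sum of the figures in Table 4-1; the variable names are illustrative, and the total does not include any space for the common audit and OPSS schemas.

```shell
# Recommended minimums from the Map Tablespaces row, built from 30 GB data files.
file_gb=30
results_gb=$((4 * file_gb))   # EDQRESULTS: 4 x 30 GB files
config_gb=$((1 * file_gb))    # EDQCONFIG:  1 x 30 GB file
staging_gb=$((1 * file_gb))   # EDQSTAGING: 1 x 30 GB file
total_gb=$((results_gb + config_gb + staging_gb))
echo "Minimum storage for EDQ tablespaces: ${total_gb} GB"   # prints 180 GB
```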


4.3 Creating the WebLogic Server EDQ Domain

These instructions use the Configuration Wizard to create a Basic WebLogic Server domain for EDQ, with the following:

  • One Administration Server and one managed server (no additional managed servers or clusters).

    Note:

    If there is the possibility you may add more managed servers in the future, you should choose a cluster deployment, even if it starts with a single server. Otherwise it will be necessary to manually re-target the Data Sources, Applications and Deployments in WebLogic from a single server to the cluster.

  • One (non-RAC) data source for the EDQ configuration schema, one data source for the results schema, and one data source for the staging schema. You can convert the data sources to RAC data sources with the Configuration Wizard, or you can do so later through the WebLogic Server Administration Console.

  • A Node Manager configuration that is predefined within the EDQ domain as edq/nodemanager. You cannot edit the Node Manager home in this configuration. You can change this configuration during this procedure, if desired.

Note:

Oracle recommends the use of managed servers that are administered by Oracle WebLogic Node Manager. You can configure Managed Servers, Clusters, and other advanced features through the Configuration Wizard, but it may be more practical to do so by using the WebLogic Server Administration Console after the initial configuration process. For more information, see Section 4.5, "Running Multiple EDQ Servers in the Same Domain."

4.3.1 Starting the WebLogic Server Domain Configuration Wizard

To start the Domain Configuration wizard, follow these steps. You will run the configuration wizard in graphical mode.

  1. Log in to the system as the EDQ installation user that you created in Section 1.4.5, "Operating System User."

  2. Go to the FMW_HOME/oracle_common/common/bin directory, where FMW_HOME is the Fusion Middleware installation directory.

  3. Start the wizard by entering the following command:

    On Linux or UNIX operating systems:

    ./config.sh

    On Microsoft Windows operating systems:

    config.cmd

    The WebLogic Server Configuration Wizard is displayed.

4.3.2 Navigating the Domain Configuration Wizard Screens

Table 4-2 describes the screens in the configuration wizard. Certain screens are displayed only in certain situations depending on your selections. For help with any screen, click the Help button.

Table 4-2 Configuration Screens for Creating a New EDQ WebLogic Server Domain

Screen Action to Perform

Create Domain

Select Create a new domain.

In the Domain Location box, enter the path to the new domain (for example, FMW_HOME/user_projects/domains/edq_domain) or click Browse to create the domain directory.

Click Next to continue.

Templates

Select Oracle Enterprise Data Quality – 12.2.1.0 [edq]. The Oracle JRF and WebLogic Coherence Cluster Extension are automatically selected. Keep these selections.

Click Next to continue.

Administrator Account

Specify the user name and password for the EDQ domain's administrator account. This account is used to administer the domain and to log into the EDQ application.

Click Next to continue.

Application Location

Specify the directory in which the applications of the EDQ domain are to be stored.

Click Next to continue.

Domain Mode and JDK

Domain Mode: Select the startup operation mode for your domain from the following options:

  • Development Mode—In this mode, the security configuration is relatively relaxed and you utilize boot.properties for the username and password, allowing you to auto-deploy applications.

  • Production Mode—In this mode, the security configuration is relatively stringent, requiring a username and password to deploy applications. Before putting a domain into production, familiarize yourself with procedures for securing a production environment. For more information, see Securing a Production Environment for Oracle WebLogic Server.

JDK: From the Available JDKs list, select the JDK that you installed in Section 2.2, "Installing a Java Development Kit to Support EDQ."

Click Next to continue.

Database Configuration Type

Ensure that RCU Data is selected. This populates the connection information you supplied when you ran the Repository Creation Utility (see Section 4.2, "Creating an EDQ Database Repository").

If you must change any of these fields, ensure that you use the schema prefix (DEV by default) and password that you specified when you ran RCU.

When done, click Get RCU Configuration to connect to the Oracle Database and bind the EDQ schemas.

Click Next to continue.

Component Datasources

Accept the defaults and then click Next.

JDBC Test

All schemas are selected and automatically tested.

Return to the previous screen to alter the connection configuration if necessary.

Click Next to continue.

Advanced Configuration

Select the Administration Server, Node Manager, Managed Servers, Clusters and Coherence Clusters option.

Administration Server

On the Administration Server screen, change the listen address from "All Local Addresses" to the IP address of the host where the Administration Server will reside.

Do not use All Local Addresses.

Do not specify any server groups for the Administration Server.

Node Manager

Let the Per Domain Default Location option remain selected, and enter a Username and Password for the Node Manager.

Managed Servers

Clone the EDQ server to create a copy of the server.

If you do not want to clone the EDQ server, select Add to add additional EDQ servers.

Select the IP address of the host on which the Managed Server will reside.

For configuration procedures for a clustered mode installation, see Section 4.6, "Configuring EDQ for High-Availability in a WebLogic Server Cluster."

Clusters

Select Add to add a cluster. If both servers are being deployed on the same machine, enter the machine name as the cluster address.

For additional configuration procedures for a clustered mode installation and a description of those configuration screens, see Section 4.6, "Configuring EDQ for High-Availability in a WebLogic Server Cluster."

To deploy EDQ in a non-clustered mode, select the appropriate components as needed.

Assign Servers to Clusters

Assign the managed servers that you created to the cluster that you created.

For configuration procedures for a clustered mode installation, see Section 4.6, "Configuring EDQ for High-Availability in a WebLogic Server Cluster."

Coherence Clusters

Configure the Coherence cluster that is automatically added to the domain.

Leave the default port number 0 as the Coherence cluster listen port.

Machines

Create a new machine in the domain. A machine is required so that the Node Manager can start and stop servers.

Click Add to create a new machine.

Specify edq_machine_1 in the Name field.

In the Node Manager Listen Address field, select the IP address of the machine in which the Managed Servers are being configured. You must select a specific interface and not "localhost." This allows Coherence cluster addresses to be dynamically calculated.

Verify the port in the Node Manager Listen Port field. The port number 5556, shown in this example, may be referenced by other examples in the documentation. Replace this port number with your own port number as needed.

Assign Servers to Machines

Assign the Administration Server and Managed Servers to the new machine you just created.

In the Machines pane, select the machine you want to assign the servers to; in this case, edq_machine_1.

In the Servers pane, assign AdminServer to edq_machine_1 by doing one of the following:

  • Click once on AdminServer to select it, then click on the right arrow to move it beneath the selected machine (edq_machine_1) in the Machines pane.

  • Double-click on AdminServer to move it beneath the selected machine (edq_machine_1) in the Machines pane.

Repeat to assign both edq_server_1 and edq_server_2 to edq_machine_1.

Configuration Summary

Review the configuration for your domain by selecting a view and then selecting individual items in the list for that view.

If the domain is configured as you want it, click Create to create the domain.

If you need to make changes to the configuration, click Back to return to the appropriate screen for the settings you want to change, or click on the links on the left to go to that screen.

Configuration Progress

Shows the progress of the domain creation.

When the process completes, click Next.

Configuration Success

Review the domain creation results.

Click Finish to exit the Configuration Wizard.


4.3.3 Configuring Launchpad to Show the Managed Server

To configure the Launchpad to show the managed server that it is connected to, add this line to the director.properties file in the local home directory:

[expr]adf.headerextra = ': ' || weblogic.Name
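One way to add the line without duplicating it on repeated runs is a small helper like the following. This is a sketch under assumptions: the function name ensure_line and the EDQ_LOCAL_HOME variable are hypothetical stand-ins, the latter for the local home directory of your installation.

```shell
# Append a line to a properties file unless an identical line is already present.
ensure_line() {
    file=$1
    line=$2
    touch "$file" || return 1
    # -F matches literally (the line contains [ ] and |), -x matches the
    # whole line, and -q suppresses output.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example call (EDQ_LOCAL_HOME is an assumed variable, not an EDQ setting):
# ensure_line "$EDQ_LOCAL_HOME/director.properties" \
#     "[expr]adf.headerextra = ': ' || weblogic.Name"
```

The helper assumes the existing file ends with a newline; if it does not, the appended text would be joined to the last line.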

4.4 Start Oracle WebLogic Server

You must start your Administration Server, Managed Servers, and clusters to complete the installation. For information about starting managed servers using Node Manager and Administration Servers, see "Starting and Stopping Oracle WebLogic Server Instances" in Oracle Fusion Middleware Administering Server Startup and Shutdown for Oracle WebLogic Server.

See also Section 6, "Setting Server Parameters to Support Enterprise Data Quality" for important information about setting server parameters for startup.

4.5 Running Multiple EDQ Servers in the Same Domain

To support high availability scenarios, Oracle recommends that you configure a cluster of multiple EDQ servers to share the incoming load (for example, from a large number of simultaneous web service requests), and to provide continuous service in the event of failure of an individual server. This section provides some basic guidance about how to configure EDQ to support such a model using Oracle WebLogic Server.

Multiple EDQ managed servers can be configured to run in the same WebLogic Server domain either in a cluster or not. If all the servers are on the same machine, each server must listen on a different port.
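When several servers are planned for one machine, the listen ports can be checked for clashes before they are entered in the Configuration Wizard or the Administration Console. The sketch below is a generic check, not an EDQ or WebLogic tool; the function name and the port numbers in the usage note are hypothetical.

```shell
# Print every port number that appears more than once in the arguments.
# Prints nothing when all ports are distinct.
duplicate_ports() {
    printf '%s\n' "$@" | sort -n | uniq -d
}
```

For example, duplicate_ports 8001 8002 8001 prints 8001, while duplicate_ports 8001 8002 8003 prints nothing.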

The Java Required Files (JRF) Template must be applied to any managed servers that were created using the WebLogic Server Administration Console. This is equivalent to the library targeting performed automatically by the WebLogic Server Configuration Wizard.

The final step is:

  • Use the WebLogic Server Administration Console to modify the managed server settings for the additional EDQ servers.

Once multiple EDQ servers have been configured, you can leave them un-clustered and accessed directly using their respective Launchpad URLs to the relevant port, or you can configure them as part of a cluster using standard WebLogic Server practices. You can configure a separate front-end load balancer to handle incoming web service requests through a single cluster URL.

4.6 Configuring EDQ for High-Availability in a WebLogic Server Cluster

You can install and configure Oracle Enterprise Data Quality for high availability in an Oracle WebLogic Server cluster environment. The high availability features in Oracle Enterprise Data Quality have been enhanced to make the system function in a clustered environment, and tolerate individual RAC node failures and reconnect after complete database failures.

For more information on the high-availability features in Enterprise Data Quality, please see Oracle Fusion Middleware Understanding Oracle Enterprise Data Quality.

4.6.1 Configuring EDQ for Clustered Mode Deployment

To install and configure Oracle Enterprise Data Quality for high availability in an Oracle WebLogic Server cluster environment:

4.6.1.1 Running the Domain Configuration Wizard for a Clustered Deployment

To deploy EDQ in a clustered mode using the schema you created in the RCU:

  1. Follow the steps in Section 4.3.1, "Starting the WebLogic Server Domain Configuration Wizard" and Section 4.3.2, "Navigating the Domain Configuration Wizard Screens" to launch and configure the basic configuration screens in the Domain Configuration wizard.

  2. On the Advanced Configuration screen, select the Managed Servers, Clusters and Coherence option and then select Next.

  3. On the Managed Servers screen, select the EDQ server, for example "edq_server1", and select the Clone button to create a copy of the server. If you do not want to clone the server, select the Add button to add a new server and target it to the cluster.

    Name this server appropriately, for example "edq_server2".

    The listen port should not be the same port as in Server 1 and the Server Group should be EDQ Managed Server. The listen address should be the same as server 1, assuming they are both running on the same machine.

    Click Next and move to the Clusters screen.

  4. Select the Add button to add a cluster.

    If servers are being deployed on the same machine, enter the machine name as the cluster address. If not, the cluster address should be a comma-separated list of the machine names.

    Leave the frontend host blank if you are not deploying a load balancer. If you are deploying with a load balancer, enter the URL of the load balancer.

    Click Next.

  5. On the Assign Servers to Clusters screen, assign both the servers that have been created to the cluster that you have just created.

  6. Step through the Coherence Clusters screen.

  7. On the Machines screen you can add machines in the cluster, and allocate managed servers to them. The AdminServer does not need to be allocated to a machine since it is started manually via the startWebLogic.sh script and is not managed by the Node Manager.

  8. Step through the remaining screens and select Create to create the cluster.

4.6.2 Starting EDQ in a Cluster

Once the domain has been created, start the Admin Server by running the following command:

FMW_HOME/user_projects/domains/edq_domain/bin/startWebLogic.sh

Then start the Node Manager with the command:

FMW_HOME/user_projects/domains/edq_domain/bin/startNodeManager.sh

You can then log onto the Admin Console through a browser and start both of the Managed servers that were created during the configuration steps.
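The two start commands can be wrapped in a small script so that both processes keep running after the shell is closed. This is a sketch, not an Oracle-supplied script: the function name start_domain and the log file names are hypothetical, and it assumes the Administration Server can boot without prompting for credentials on the console.

```shell
# Start the Administration Server and then the Node Manager in the background.
# $1 is the domain directory (for example FMW_HOME/user_projects/domains/edq_domain);
# $2 is an optional directory for the console logs (default: current directory).
start_domain() {
    domain_home=$1
    log_dir=${2:-.}
    # Fail early if either script from the domain's bin directory is missing.
    for script in startWebLogic.sh startNodeManager.sh; do
        if [ ! -x "$domain_home/bin/$script" ]; then
            echo "missing or not executable: $domain_home/bin/$script" >&2
            return 1
        fi
    done
    # nohup keeps each process alive after logout; output goes to a log file.
    nohup "$domain_home/bin/startWebLogic.sh" > "$log_dir/adminserver.out" 2>&1 &
    nohup "$domain_home/bin/startNodeManager.sh" > "$log_dir/nodemanager.out" 2>&1 &
}
```

The managed servers are still started from the Admin Console, as described above.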

You can access the servers as follows:

  • You can access any of the servers' launchpads: connect to one of the servers and start the Director application.

  • Alternatively, you can connect through a load balancer in front of the cluster, in which case you are connected to whichever managed server the load balancer picks.

To display a dialog indicating which server the GUI is connected to, right-click the server in the project browser and select Server Information.

4.6.3 Enabling JMX API and Command Line for HA Clusters

The EDQ high availability deployment templates come with the internal JMX server disabled. This prevents port clashes between multiple managed servers that run on the same host. Because the internal JMX server is not running, the command line tools that use this API cannot access the server. In particular, the command line tools for starting and cancelling jobs do not work.

To enable these tools the internal JMX server must be re-enabled. This is achieved by editing the director.properties file in the oedq.local.home directory. Add the following line to the director.properties file:

management.localserver = "true"

This line enables the JMX server. By default the JMX server runs on port 8090. If multiple EDQ managed servers will run on the same host then you must configure each server to use a different port number.

Provided the EDQ managed servers are named with a number at the end, such as 'edq_server1', 'edq_server2', etc. (this is the default naming scheme), adding the line:

management.port = 8090 + servernum - 1

to the same director.properties file causes the first EDQ managed server to run the JMX server on port 8090, the second server to run it on port 8091, and so on.
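The arithmetic of the expression can be reproduced outside EDQ to see which port each server will use. The sketch below assumes that servernum is the number at the end of the managed server name, as the naming scheme above suggests; the function name jmx_port is hypothetical and the function is not part of EDQ.

```shell
# Compute the JMX port for a managed server name using 8090 + servernum - 1.
# servernum is taken to be the trailing number of the name (an assumption).
jmx_port() {
    name=$1
    # Extract the trailing digits, dropping leading zeros so that the shell
    # does not read a value such as 08 as octal.
    num=$(printf '%s\n' "$name" | sed -n 's/.*[^0-9]0*\([0-9][0-9]*\)$/\1/p')
    if [ -z "$num" ]; then
        echo "no trailing number in server name: $name" >&2
        return 1
    fi
    echo $((8090 + num - 1))
}
```

For example, jmx_port edq_server1 prints 8090 and jmx_port edq_server2 prints 8091.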

4.6.4 Landing Area

The landing area is a feature that allows EDQ to read and write data on the server file system. If the landing area is to be used in a cluster, consider how it is shared among the managed servers. By default, the landing area is located in the oedq.local.home area. If this area is shared among the hosts supporting EDQ managed servers, the landing area continues to work as in a non-clustered system.

If the landing area file system is not shared amongst the managed servers, but is required for use then a number of options are available:

  • The location of the landing area can be changed using the landingarea property in the director.properties file to a location that is shared amongst the hosts running EDQ managed servers.

  • Since an EDQ job runs all its tasks and processes on the same managed server, any files consumed or generated by the job are written to its local landing area. External tasks can be added to the job to transfer any incoming or outgoing files to an appropriate shared location.

  • If an EDQ job is consuming external files then these could be copied to all managed servers before the job is started.

  • If an EDQ job generates files for consumption by further EDQ jobs, the landing area can be synchronized across the managed servers between job runs by using an external tool such as rsync.

If files are generated and consumed within the same job, a shared landing area may not be necessary, because the entire job runs on the same managed server and therefore accesses the local landing area.
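The rsync option in the list above could be scripted along the following lines. The sketch only prints one rsync command per remote host so that the plan can be reviewed before anything is copied. The function name, the host names, and the landing area path in the usage note are hypothetical, and the sketch assumes that the landing area sits at the same path on every host and that the path contains no spaces.

```shell
# Print one rsync command per remote host for a landing area directory.
# Nothing is copied; run the printed commands once the plan looks right.
landing_sync_plan() {
    landing_dir=$1
    shift
    for host in "$@"; do
        # -a preserves permissions and timestamps; the trailing slashes copy
        # the directory contents rather than nesting the directory itself.
        printf 'rsync -a %s/ %s:%s/\n' "$landing_dir" "$host" "$landing_dir"
    done
}
```

For example, landing_sync_plan /opt/edq/landingarea edqhost2 edqhost3 prints two commands, one for each of the two hosts.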