1 Planning an Enterprise Data Quality Installation

This chapter prepares you for your Enterprise Data Quality (EDQ) installation. Review the topics thoroughly to avoid problems during or after product installation and domain configuration.

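This chapter includes the following sections:

  • Overview of EDQ

  • Overview of the Installation and Configuration Tasks

  • Choosing EDQ Components and Versions

  • Satisfying EDQ System Requirements

  • Downloading EDQ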

Overview of EDQ

EDQ provides a comprehensive data quality management environment that is used to understand, improve, protect and govern data quality. The software facilitates best practice master data management, data integration, business intelligence, and data migration initiatives. It provides integrated data quality in customer relationship management (CRM) and other applications.

This documentation guides you through the selection, installation and configuration of the components that are needed to support EDQ.

Overview of the Installation and Configuration Tasks

Table 1-1 lists the EDQ installation and configuration tasks in the order that you should perform them.

Table 1-1 EDQ Product Installation Procedure Tasks

| Task | Action to Perform |
|---|---|
| Understand and select the external software components that support EDQ. | See Choosing EDQ Components and Versions |
| Satisfy EDQ system requirements. | See Satisfying EDQ System Requirements |
| Obtain an EDQ installation file from Oracle Software Delivery Cloud. | See Downloading EDQ |
| Install the JDK and your chosen application server and database components. | See Installing the Required External Software Components |
| Install the EDQ software. | See Installing Enterprise Data Quality |
| Configure EDQ. | For instructions, see Configuring Enterprise Data Quality with Oracle WebLogic Server or Configuring Enterprise Data Quality with Apache Tomcat |
| Set system parameters. | See Setting Server Parameters to Support Enterprise Data Quality |
| Next Steps | For login and basic use information, see Next Steps After Configuring Enterprise Data Quality |
| (Optional) Upgrade from a previous release of EDQ. | If using Oracle WebLogic Server or Apache Tomcat, see Upgrading Enterprise Data Quality |

Choosing EDQ Components and Versions

The following sections describe the components that are required to support EDQ and the supported versions of those components.

Choosing the Correct Combination of EDQ Required Components

EDQ is a Java Web Application that uses a Java Servlet Engine, a Java Web Start graphical user interface, and a data repository within a database. As such, it requires access to the following components:

  • a Java Development Kit (JDK)

  • a Java Application Server to supply web services. Oracle WebLogic Server and Apache Tomcat are supported.

  • a structured query language (SQL) relational database management system (RDBMS) to store configuration data, working data, and the results of work performed by the processes. Oracle Database is supported.

The following application servers and databases are supported for use with EDQ:

  • Oracle WebLogic 12.2.1 (from the Fusion Middleware Infrastructure 12.2.1 package), or Apache Tomcat 8.

  • Oracle Database 11.2.0.4+ or 12.1.0.1+.

Instructions for installing these components are in Installing the Required External Software Components. See Supported Platforms and Component Versions for supported versions of each of these components.
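As a rough illustration of the database minimums listed above, the following Python sketch checks a dotted version string against them. The function names are hypothetical, and the reading of "11.2.0.4+" (later 11.x levels only) and "12.1.0.1+" (12.1.0.1 or anything later) is an assumption of this sketch; the Certification Matrix remains the authoritative source.

```python
# Minimum Oracle Database versions quoted in this chapter.
MIN_11G = (11, 2, 0, 4)
MIN_12C = (12, 1, 0, 1)


def parse_version(text):
    """Turn a dotted version string such as '12.1.0.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported_database(version_text):
    """Return True if the version meets one of the two documented minimums.

    Assumption (not stated in the guide): '11.2.0.4+' covers later 11.x
    levels only, and '12.1.0.1+' covers 12.1.0.1 and anything later.
    """
    version = parse_version(version_text)
    if version[0] == 11:
        return version >= MIN_11G
    return version >= MIN_12C
```

For example, is_supported_database("11.2.0.3") is False under these assumptions, while is_supported_database("12.1.0.2.0") is True.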

Supported Platforms and Component Versions

Review the list of certified platforms, JDKs, databases, application servers, and releases for EDQ prior to installation. This information is listed in Enterprise Data Quality Certification Matrix at

http://www.oracle.com/technetwork/middleware/ias/downloads/fusion-certification-100350.html

Satisfying EDQ System Requirements

This section describes the hardware and software requirements of EDQ. These requirements represent the server configurations that are certified and supported by Oracle for the EDQ product.

Disk and Memory Requirements

Depending on the tasks that EDQ is required to perform, it can place heavy demands on the hardware used to run it. A recommended minimum hardware specification for an EDQ server is:

  • 8GB physical memory, with 4GB allocated to the EDQ Java Virtual Machine (JVM)

  • At least 4 logical CPUs

  • At least 500GB of hard disk space on the database server. The EDQ results schema, which contains the working data that EDQ generates, must have enough space to contain at least 20 times the volume of source data that you expect to process through EDQ. This size may increase if there are many EDQ users working on the same projects at the same time. The configuration schema remains small, normally less than 5GB, unless there is a large amount of user-modified reference data and Case Management data.

Note:

These recommendations do not represent sizing advice for any specific deployment, but rather a starting point for testing size requirements in your environment. It may be appropriate to deploy a larger machine or many machines, depending on the processing load placed on EDQ.
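The disk guidance above can be turned into a first-pass estimate. The following Python sketch is only a starting point for testing, in the spirit of the note: the function name is hypothetical, and taking the larger of the 500GB minimum and the 20x results-schema figure is an assumption of this sketch rather than a documented formula.

```python
MIN_DB_DISK_GB = 500      # minimum database server disk space quoted above
RESULTS_MULTIPLIER = 20   # results schema: at least 20x the source data volume


def estimate_database_disk_gb(source_data_gb):
    """Return a starting-point disk estimate (GB) for the database server.

    Assumption: use whichever is larger, the 500GB minimum or 20 times
    the expected source data volume. Concurrent users may push this higher.
    """
    results_schema_gb = RESULTS_MULTIPLIER * source_data_gb
    return max(MIN_DB_DISK_GB, results_schema_gb)
```

With 100GB of expected source data, the sketch returns 2000, because the 20x rule exceeds the 500GB minimum.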

EDQ Directory Requirements

EDQ uses an installation directory and two configuration directories. You should record the location of these directories in case you need to apply manual updates to any of their contents.

EDQ Installation Directory

During the installation process, you must specify an installation directory to contain the EDQ installation files. This directory is known as the EDQ Home (EDQ_HOME) directory, and its location depends on your deployment:

  • If you are installing EDQ as part of the Oracle Fusion Middleware product stack, you must install the EDQ Home directory as a subdirectory of the Oracle Fusion Middleware home (installation) directory. A typical default Fusion Middleware home directory is as follows, depending on the platform:

    Linux and UNIX:

    /opt/Oracle/Middleware/FMW_HOME

    Windows:

    C:\Oracle\Middleware\FMW_HOME

    Note:

    The Middleware home directory is referenced as FMW_HOME in this guide.

  • If you use the WebLogic Server or Apache Tomcat application server, you can install the EDQ Home directory in any directory.

The EDQ Home directory requires approximately 1GB of hard disk space.

EDQ Configuration Directories

EDQ requires two configuration directories, which are separate from the EDQ Home (installation) directory that contains the program files. The configuration directories are:

  • The base configuration directory: This directory contains default configuration data. Once EDQ is installed, the files in the base configuration directory must not be altered, renamed, or moved.

  • The local configuration directory: This directory contains overrides to the default configuration. EDQ looks for overrides in this directory first, before looking in the base configuration directory. Files in the local configuration directory can be modified to customize or extend EDQ.
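The override order described above (local first, then base) can be pictured with a short Python sketch. This is an illustration of the lookup order only, not EDQ's actual implementation; the function name and any file names passed to it are hypothetical.

```python
from pathlib import Path


def resolve_config_file(relative_name, local_home, base_home):
    """Return the first matching file, checking the local directory first.

    Mirrors the documented order: overrides in the local configuration
    directory win, and the base configuration directory is the fallback.
    Returns None if neither directory contains the file.
    """
    for directory in (Path(local_home), Path(base_home)):
        candidate = directory / relative_name
        if candidate.is_file():
            return candidate
    return None
```

Under this model, a file placed in the local configuration directory shadows a file of the same name in the base directory, which is why customizations never require editing the base files.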

The names and locations of the configuration directories are as follows:

  • If you are using Oracle WebLogic Server, the Configuration Wizard automatically creates and populates the configuration directories in the EDQ domain, named oedq.home (base configuration directory) and oedq.local.home (local configuration directory). Example paths are:

    WLS_HOME/user_projects/domains/edq/config/fmwconfig/edq/oedq.home
    WLS_HOME/user_projects/domains/edq/config/fmwconfig/edq/oedq.local.home
    
  • If you are using Apache Tomcat, you create the configuration directories manually in any location, with any names, and the configuration utility populates them. The installation instructions prompt you to create the directories.

On a default EDQ installation, the configuration directories occupy approximately 1MB in total. When an EDQ instance uses the landingarea, this total can increase as files are loaded for processing and written out.

UNIX System Resource Requirements (ulimit)

Depending on how the UNIX system that hosts the application server is configured, you may find that the application server cannot create files larger than 1 GB. This restricts your ability to work with large data sets if you intend to use files to transfer data to EDQ for processing. The most common limit that has to be changed is the number of processes that the application server user (for example, 'weblogic') can create. On large servers, this often needs to be increased, for example to 65536.

System resource limits are controlled by the ulimit command. To view the current values, use the ulimit -a command. Look at the settings for file size, process limit, and file handle limits. The hard ulimit on file size may need to be adjusted upward or removed for your application server account. Consult your System Administrator for assistance, if needed.
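As a complement to ulimit -a, the following Python sketch reads the same three limits (file size, file handles, processes) through the standard resource module, which is available on UNIX platforms only. The helper names are hypothetical, and the script reports the limits of the account that runs it, so run it as the application server user.

```python
import resource

# Limits the text above suggests reviewing, mapped to resource constants.
# RLIMIT_NPROC is not defined on every UNIX platform, hence getattr.
LIMITS = {
    "file size (bytes)": resource.RLIMIT_FSIZE,
    "open file handles": resource.RLIMIT_NOFILE,
    "processes": getattr(resource, "RLIMIT_NPROC", None),
}


def current_limits():
    """Return {label: (soft, hard)} for each limit available on this platform."""
    return {
        label: resource.getrlimit(const)
        for label, const in LIMITS.items()
        if const is not None
    }


def is_below(limit_value, wanted):
    """True if a limit is finite and smaller than the wanted value."""
    if limit_value == resource.RLIM_INFINITY:
        return False
    return limit_value < wanted


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
    soft_nproc = current_limits().get("processes", (resource.RLIM_INFINITY,))[0]
    if is_below(soft_nproc, 65536):
        print("process limit is below the 65536 example quoted above")
```

A value equal to resource.RLIM_INFINITY means the limit is unlimited, so the sketch never flags it as too low.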

Virtual Hardware

You can install EDQ on virtualized systems using a virtualization tool, such as Oracle VM Server. Both the virtual system and the physical system must fulfill the minimum hardware requirements listed in this documentation.

If you use load balancing software to deploy multiple virtual systems onto a single physical system, tune it carefully. In general, EDQ requires considerable compute power and memory in the Middle Tier (where it performs most of its processing), and considerable tablespace in the Database (where it writes data and results). Between batches, very little load is imposed on the system. When processing a batch of data, EDQ rapidly drives hardware to be CPU or I/O bound. If the virtualized load balancing is not correctly configured, performance is suboptimal.

Operating System User

An operating system user account is used to install and upgrade EDQ on your servers and to install the application server. This user is required on all platforms. This user must have full permissions (read, write, and execute) to the EDQ installation (EDQ_HOME) directory, configuration directories, and all database directories. This user account is referred to as the EDQ installation user in this documentation. For more information about the EDQ directories, see EDQ Directory Requirements.

Note:

When installing on UNIX or Linux operating systems, do not use the root user as your EDQ installation user account.
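The two requirements above (a non-root account with read, write, and execute access to the EDQ directories) can be checked before installation with a short Python sketch. The function names are hypothetical, the directory list is whatever you pass in, and os.geteuid is available on UNIX and Linux only.

```python
import os


def is_root_user():
    """True when the current effective user is root (UNIX and Linux only)."""
    return os.geteuid() == 0


def directories_lacking_access(directories):
    """Return the directories where the current user lacks read, write,
    or execute permission. Directories that do not exist are also returned."""
    needed = os.R_OK | os.W_OK | os.X_OK
    return [d for d in directories if not os.access(d, needed)]
```

Run the checks as the intended EDQ installation user, passing the EDQ_HOME and configuration directory paths you recorded earlier; an empty list from directories_lacking_access and False from is_root_user match the requirements in this section.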

Downloading EDQ

To download the EDQ installation and configuration files, obtain the generic package installer from the Oracle Technology Network website as follows:

  1. Enter the following URL into a web browser:

    http://www.oracle.com/technetwork/middleware/oedq/overview/index.html

  2. Click Sign-in/Register.

  3. Locate and select the Oracle Enterprise Data Quality Media Pack that you want to download.

  4. Click the Download button.

  5. Browse to the directory where you want to save the file. Click Save to start the file download. A ZIP file is downloaded.

  6. Extract the ZIP file to a temporary directory.
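The final extraction step can also be scripted. The following Python sketch extracts a ZIP file into a newly created temporary directory by using the standard zipfile and tempfile modules; the function name and the directory prefix are hypothetical, and the path of the downloaded file is whatever you chose in step 5.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_to_temp(zip_path):
    """Extract a ZIP archive into a new temporary directory and return its path.

    The directory is not deleted automatically; remove it after installation.
    """
    target = Path(tempfile.mkdtemp(prefix="edq_install_"))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

Only extract archives that you downloaded from a trusted source, because extractall writes every entry in the archive to disk.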