Oracle® Enterprise Data Quality for Product Data Governance Studio Reference Guide
Release 5.6.2

Part Number E23609-02

2 Configuration and Administration

This chapter explains how to configure and administer the Governance Studio.

Configuring a DSA for Governance Studio

In order to use the Governance Studio, you must use a DSA that has been properly configured. The configuration requires selection of appropriate output types. The functions of the Governance Studio options for output types are described in this section.

Additionally, this section is a summary of the information required to create a DSA template for a Governance Studio project. For more information on DSA design, see Oracle Enterprise Data Quality for Product Data Application Studio Reference Guide or consult with your Enterprise DQ for Product Administrator.

Accessing the Output Step in the DSA

In the Application Studio, open a DSA, and then double-click any Text output node.

Note:

Governance Studio projects report on text-based output only, so the DSA must have at least one text output step that returns results. The Governance Studio ignores other types of output, such as database outputs, provided that you set their output type to Not shown (will not appear in application) as described in this section.

Click the Oracle DataLens Governance Studio tab to view the Governance Studio output configuration options.

Surrounding text describes govstutab.png.

Governance Studio Output Type Options

As shown in the previous figure, the Output Step dialog in the Application Studio lets you choose one of the following output types to determine how data is displayed in the Governance Studio:

Not shown (will not appear in application)

Results will not appear in the Governance Studio console Output tabs. Use this output type for results that do not conform to Governance Studio requirements, such as database output steps.

Review, Approve and Route

The records that exactly meet the specified output criteria are displayed. Use this output type to display the results of a DSA process that requires a person to continue processing.

Review, Approve and Route with AutoSuggest

The records that exactly meet the specified output criteria are displayed. Use this output type to display the results of a DSA process where the records fall below the quality threshold and require enrichment. A person can then invoke the AutoSuggest feature to obtain a set of suggestions that enrich the records with missing attribute values.

Match Results (Single Selection Review, Approve, and Route)

Possible data records meeting match or duplicate criteria are displayed below a split-screen in the output spreadsheet. Only a single data record from the list of matched items can be selected; the record is available for downstream processing. Use this output type to display the results of a DSA template that runs a match process.

Match Results (Multiple Selection Review, Approve, and Route)

Records meeting match or duplicate criteria are displayed below a split-screen in the output spreadsheet. Multiple records from the list of matched items can be selected; the records are available for downstream processing. Use this type for displaying the results of a DSA template that runs a match process.

Summary Data for Graphing

Summary information is displayed in the graph; no selection of records is available with this option. The bars in the graph operate like buttons: clicking a bar jumps to the Output tab for the data that the bar represents.

Summary Data for Graphing - No Tab, only appears in Graph Summary

An individual tab is not created; the data appears in the Graph Summary tab only.

Other (Display Only)

Result information is displayed in a spreadsheet form that does not allow user interaction.

Show Message Dialog

Informational messages, including errors and warnings, are displayed to the user.

Governance Studio Graph Options

This section of the Output Step dialog in the Application Studio is active only when one of the Summary Data for Graphing options is selected, as described in the previous section.

There are four graphing options available for the output types. You can mix and match between two sets of options for the graphs and create a label for the X and Y axes as follows:

  • Bar direction

    • Horizontal bars

    • Vertical bars

  • Bar color

    • Multi-color bars

    • Single-color bars

As shown in the following figure, you may choose from the following graphing options to configure how graphs appear in the Governance Studio:

Surrounding text describes govstutab1.png.

The data values across the defined Output tabs can be compared and displayed in the following types of charts:

Bar Chart

A 3-D, colorized visual effect that shows the data in a bar format. The orientation of the bars is determined by selecting the Horizontal Graph Bars or Vertical Graph Bars option. The use of color in this type of chart is defined using the Multi-Color Graph Bars option (a different color for each Output tab) or the Single-Color Graph Bars option (one color for all Output tabs).

Line Chart

A line connecting the various output data points in a simplified manner.

Scatter Chart

A series of connected markers that show the data relationship.

Pie Chart

A 3-D, colorized representation where each tab is depicted by its percentage of contribution to the total.

Stacked Chart

A 3-D, colorized representation where each tab is depicted by its percentage of contribution to the total.

Accumulate Graphing Data

The original data is retained when new data is added and all are rendered in the graphs.

Replace Graphing Data

All data is replaced by new data prior to rendering in a graph. The DSA is responsible for recalculating the results each time to ensure that the graph has the correct values.
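The difference between the Accumulate Graphing Data and Replace Graphing Data options can be sketched as follows. This is a minimal illustration, not the product's implementation: the `GraphData` shape (a mapping from Output tab name to the values plotted for it) and both function names are assumptions made for the example, and the way the Governance Studio actually combines accumulated data is not specified in this guide.

```python
from typing import Dict, List

# Hypothetical model: Output tab name -> values plotted for that tab.
GraphData = Dict[str, List[int]]


def accumulate(existing: GraphData, new: GraphData) -> GraphData:
    """Keep the original values and append the new ones (Accumulate Graphing Data)."""
    merged = {tab: list(values) for tab, values in existing.items()}
    for tab, values in new.items():
        merged.setdefault(tab, []).extend(values)
    return merged


def replace(existing: GraphData, new: GraphData) -> GraphData:
    """Discard the original values and keep only the new ones (Replace Graphing Data)."""
    # 'existing' is intentionally ignored: the DSA must supply complete results.
    return {tab: list(values) for tab, values in new.items()}
```

With replacement, anything not present in the new data disappears from the graph, which is why the DSA has to recalculate the full results on every run.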

X and Y-Axis Labels

You can add meaningful labels to indicate the data that is graphed on the X and Y axes, or you can leave the labels blank to use the default labeling.

Help Button

Use the Help button to review information about the various graphing options.

Note:

If you modify an output step in the DSA that underlies your Governance Studio project, you must check in your DSA changes and then refresh the Governance Studio project.

Starting the Governance Studio

Once you have added an output step in your DSA and configured it for Governance Studio output, you are ready to create a project in the Governance Studio. This section describes the steps to follow to create and run a Governance Studio project from a DSA created with the Application Studio.

If this is the first time you have started the Governance Studio, the client workspace appears blank as in the following figure; otherwise the results from the last job run are displayed.

Surrounding text describes startnew.png.

Setting the Transform Server

On first use of the Governance Studio GUI, a message is also displayed, requesting that you set the Transform Server to the location of the Oracle DataLens Server that will be running your Governance Studio projects.

Surrounding text describes set_server.jpg.

  1. Click OK.

    Surrounding text describes settranserv.png.

    You can choose to select a specific server or a group of servers to use for data transformation.

  2. Select either the Server Group or Specific Server option.

  3. Click the down arrow to view your server options, and select the appropriate server.

    If selecting a group of servers, you can choose from the Administrator, Development, Production, or Quality Assurance server groups. The Administrator server group is the default.

  4. Click OK.

    The transform server is contacted and this server is used for all data transformations. You can use the Set Transform Server option from the Tools menu to change your Transform Server and host port at any time.

    The Governance Studio application start-screen is displayed.

Creating and Running a Governance Studio Project

From the File menu, select New Project to create a new Governance Studio project. The Create Project dialog box is displayed as shown in the following figure.

Surrounding text describes creatproj.png.

The Create Project dialog creates a new Governance Studio project. Following are the Governance Studio project parameters that you can configure:

Project Name

Name of the Governance Studio project.

Description

A description of the project. You must specify a description.

Data Service Application

Identifies the DSA that is run for the Governance Studio project and how the input data is received. Because Governance Studio requires at least one text output step that returns results, DSAs with no text output are not listed and cannot be used to create a new project.

Input File

Specifies the type of input data.

Retrieve the Input Data

After you specify the project name and supply a description, you must select a DSA from the list of DSAs that have been checked in to your Oracle DataLens Server (for more information on creating DSAs, see Oracle Enterprise Data Quality for Product Data Application Studio Reference Guide). The input data that you specify must be compatible with the specifications of the DSA you created earlier. For more information, see "Configuring a DSA for Governance Studio".

To load the input data into the project, you must first select one of the data modes listed, and then select the corresponding input data file type using one of the colorized options. As you hover over an option, its color changes; when you click it, the selected option turns green.

The options for receiving input data for the DSA are as follows:

  • Load from File - All data is loaded.

  • Run from Local File Path - Data is streamed from disk.

  • Manual (Test) Input - Data is supplied with manual entries.

  • Input from Database - Data is obtained from a database.

The input data type options in the Create Project dialog box change based on your data mode selection. For example, if the input data is from a database, only the Input from Database selection is active and it is selected by default.

The input data types supported are:

  • Tab Separator

  • Excel File

  • CSV File

  • Custom Separator

Note:

Count data must be in integer format to avoid generating an error.
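Because non-integer count values cause an error, it can help to check an input file before loading it. The following is a minimal sketch under stated assumptions: it assumes the count values arrive as text cells in rows that have already been read into dictionaries, and the function names and the `column` parameter are hypothetical, not part of the Governance Studio.

```python
import re
from typing import Dict, List, Tuple

# Optional sign followed by ASCII digits only; "3.0" or "1,000" do not match.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_integer_count(value: str) -> bool:
    """Return True if the cell text is a plain integer."""
    return _INTEGER.fullmatch(value.strip()) is not None


def non_integer_counts(rows: List[Dict[str, str]], column: str) -> List[Tuple[int, str]]:
    """Return (row number, offending value) pairs, numbering data rows from 1."""
    return [
        (number, row[column])
        for number, row in enumerate(rows, start=1)
        if not is_integer_count(row[column])
    ]
```

An empty result means that every value in the chosen column is in integer format.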

The following section details an example of opening an input data file to retrieve and load the data.

Opening an Excel Input File

  1. Select the Load from File option.

  2. Select the Excel File option to specify an Excel file.

  3. Click Open….

    The MS Excel File dialog box is displayed.

    Surrounding text describes loadexcel.png.

  4. Click Browse, locate the data file, and then double-click it.

  5. If the worksheet containing your data is not selected, select it from the Worksheet Name list.

  6. Select the fields from the input file shown in the left pane that match the fields in the DSA's transform map shown in the right pane, and move them using the arrow buttons.

    The single arrow buttons move one field and the double arrow buttons move all fields to the left or to the right. You can reorder how the fields are processed by the DSA using the up and down arrows. The arrows with a line above or below them move the selected field to the top or bottom position, respectively.

    The -BLANK DATA- selection may be used if there are no output fields associated with the input data in the DSA.

  7. Click OK to accept your changes.

    You are returned to the Create Project dialog box.

  8. Click OK.

  9. Click the Transform Data button to process the data, which populates the Summary and Output tabs upon a successful run.

    Your new Governance Studio project is created and the Source tab of the Governance Studio console displays your input data as shown in the following figure.

    Surrounding text describes srctabdisp.png.

    Note:

    The Governance Studio input data row limit is 64K rows for Excel 2003 and 1M rows for Excel 2007. Attempting to load an input data file that exceeds this limit results in a file input error. If your data exceeds the limit, split it into several smaller input data files and process them separately.
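Splitting a large data set into files that stay under the row limit can be sketched as follows. This is an illustration only: the function name is hypothetical, the constants are the worksheet row limits of Excel 2003 (65,536) and Excel 2007 (1,048,576) that correspond to the "64K" and "1M" figures above, and the sketch assumes that the header row counts toward the limit, which this guide does not state.

```python
from typing import Iterator, List, Sequence

EXCEL_2003_MAX_ROWS = 65_536     # "64K" rows per worksheet
EXCEL_2007_MAX_ROWS = 1_048_576  # "1M" rows per worksheet


def split_rows(header: List[str], rows: Sequence[List[str]], max_rows: int) -> Iterator[List[List[str]]]:
    """Yield chunks that each start with the header and hold at most max_rows rows in total."""
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header and one data row")
    data_per_chunk = max_rows - 1  # assumption: the header row counts toward the limit
    for start in range(0, len(rows), data_per_chunk):
        yield [header] + list(rows[start:start + data_per_chunk])
```

Each chunk can then be written to its own input file and run as a separate job, for example with `split_rows(header, rows, EXCEL_2003_MAX_ROWS)`.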

The data file loading information, including the path to the file, is displayed in the status bar. Below the file information is the name of the DSA that will be used to process the input data.

Project Templates

The ability to create and subsequently open a project from a particular template can be very useful in ensuring consistent use of DSAs. Additionally, it allows privileged users to process multiple batches reusing the same template.

Creating a Project Template

  1. To create a new project from an existing DSA template, from the File menu, click New Project From…. You are prompted for a project ID as shown in the following figure.

  2. Enter a unique ID text string to identify this project and click OK.

    This unique text string is appended to the current DSA name so that it is clear which DSA was used to create the template.

    For example, if the DSA name is 'SCS_MAIN_PROCESSING' and you enter '1000999', then the template name would be 'SCS_MAIN_PROCESSING_1000999'.
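The naming rule in the example above can be expressed as a one-line function. This is a sketch, assuming that the underscore separator seen in the example is always used; the function name is hypothetical.

```python
def template_name(dsa_name: str, project_id: str) -> str:
    """Append the project ID to the DSA name, separated by an underscore."""
    if not project_id:
        raise ValueError("a project ID is required")
    return f"{dsa_name}_{project_id}"
```

Calling `template_name("SCS_MAIN_PROCESSING", "1000999")` reproduces the template name given in the example.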

Opening a Project Template

You can retrieve any template created using the New Project From… feature.

  1. From the File menu, click Open Project From….

  2. Locate the template you want to use, select it, and then click OK.

    The project is opened, the data transformed, and the project is ready for use.

Opening Projects Without Results

You can open an existing project file without processing and displaying data results in any Output tabs. To open a project without data results, from the File menu, click Open Project - No Results, select the project you want to open, and then click OK.

Note:

If you created your project with version 5.5 or later, the Data Summary tab is populated as this data is saved with the project.

This feature is useful for opening a project that previously could not be opened because it exceeded the maximum memory allocation. (You can monitor memory usage in the Memory Cache, located in the client workspace frame, which is updated dynamically.) In that case, use this feature to open the project as previously described, and then set the Conserve Memory and Limits on Rows Retrieved project options as appropriate so that you can process your data. For more information, see "Memory Tab".

Note:

To avoid exceeding the allocated memory, you may need to adjust your client's EDQP Java Web Start memory settings. For more information, see Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Administration Guide.

This option cannot be used if the project was last run asynchronously, though it can be used after you retrieve results using the Open Project command on the File menu.

Configuring Options

You can configure options, including how graphs and tabs are displayed, using the two options commands on the Tools menu.

To set the options for the open project only, from the Tools menu, select Project Options.

To set the global options for all projects in the Governance Studio, from the Tools menu, select Options.

Surrounding text describes optioncombo.png.

Graphs Tab

This tab is available for both Governance Studio Options and Project Options. It allows you to specify how graphs are displayed on the Graph Summary and Data Summary tabs.

The Governance Studio Options graphing settings are global and are the default settings when a new project is created. However, the Project Options graphing settings that you define for a project are used for that project only and are exported with the project so that they remain the same when imported by another user.

The following options can be configured:

Data Summary Tab Section

Select the Show Data Summary Tab check box to show the Data Summary tab.

Graph Summary Tab Section

Set the maximum number of graphs that will be displayed on the Graph Summary tab using the list.

Set the placement of the tab summary by selecting one of the following options:

  • it appears on all tabs

  • it appears only on the first tab

  • no graph summaries appear on any tabs

Maximum in Graphs Section

Set the maximum number of horizontal and/or vertical bars using the lists.

Set which records are shown if the total number of records exceeds the maximum number of bars allowed.

Tabs Tab

This is an Options tab. It allows you to select either or both of the following options to change the way that tabs are displayed:

Show all Tabs

All of the tabs are displayed rather than being hidden as described in "Task Panes".

Show Tab Icons

The various icons for the types of tabs are shown on the tab adjacent to the name so that you can easily identify the function of the tab. For example, all Output tabs will display a green check mark or yellow question mark icon while all Data Summary tabs display a bar graph icon.

General Tab

This is an Options tab. Use the general options as follows:

Finish Job before Retrieval

When set, the time required to complete a job and then to retrieve and display the data in each Output tab is shown in the Status Field for your review, as in the following example:

Results Retrieved for job 212. Run=57 seconds. Retrieval=5 seconds.

The normal Governance Studio behavior of retrieving and displaying results before the job has completed is suspended when this option is selected.
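If you copy such status messages into a log for later review, the timings can be extracted with a regular expression. The sketch below is based solely on the single example message shown above; the exact wording for other cases (for instance, a singular "second") is an assumption, and the `JobTiming` type and `parse_status` function are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class JobTiming(NamedTuple):
    job_id: int
    run_seconds: int
    retrieval_seconds: int


# Pattern modeled on the example message; "seconds?" tolerates a singular form.
_STATUS = re.compile(
    r"Results Retrieved for job (\d+)\. Run=(\d+) seconds?\. Retrieval=(\d+) seconds?\."
)


def parse_status(line: str) -> Optional[JobTiming]:
    """Return the job ID and timings from a status line, or None if it does not match."""
    match = _STATUS.search(line)
    if match is None:
        return None
    return JobTiming(*(int(group) for group in match.groups()))
```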

Validate local Data Service Application

Allows you to validate the DSA that resides on a local drive rather than the same DSA that resides on the Oracle DataLens Server. The local DSA is the one that you have checked out to modify, so this option lets you confirm that it is valid before you check it in to the Oracle DataLens Server.

Additionally, the Governance Studio ensures that the local DSA mirrors the functionality of the Oracle DataLens Server DSA that is used to process the data.

Default Row Warning Count

Allows you to set a default number of source data rows beyond which the Governance Studio will display a warning message to allow the user to choose to retrieve all rows or the default number of rows specified. A warning message is displayed for each tab that has a resulting number of transformed rows (result count) higher than the specified warning count.

The default number of rows is set to 5000. Setting it to zero ensures that no warning messages are displayed. Setting a reasonable default can avoid potential memory issues when your project returns large sets of data.

This overall Governance Studio warning count option can be overridden for each Output tab by setting individual limits using the Memory tab of the Project Options as described in the next section.
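The Default Row Warning Count behavior described above can be summarized in a short sketch. It is not the product's code: the function names and the `retrieve_all` flag (standing in for the user's answer to the warning message) are assumptions made for illustration.

```python
DEFAULT_ROW_WARNING_COUNT = 5000  # default stated in this section


def should_warn(result_count: int, warning_count: int = DEFAULT_ROW_WARNING_COUNT) -> bool:
    """Warn only when the tab's result count is higher than the warning count."""
    if warning_count == 0:  # zero disables warning messages
        return False
    return result_count > warning_count


def rows_to_retrieve(result_count: int, warning_count: int, retrieve_all: bool) -> int:
    """Rows retrieved for a tab once the user has answered the warning, if one was shown."""
    if not should_warn(result_count, warning_count):
        return result_count
    return result_count if retrieve_all else warning_count
```

For a tab with 12,000 result rows and the default count, the user is warned and receives either all 12,000 rows or the first 5000, depending on the choice made.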

Memory Tab

This is a Project Options tab. Use the options to set how the Governance Studio will use the available memory as follows:

Conserve Memory

Set this option to populate Output tabs only when the tab is selected thereby saving memory. When you run a job (or load it from disk), the data is retrieved and saved for each Output tab though it is not displayed until it is selected. With this option selected, your project will open and no results are displayed until you click an Output tab.

When using this option, the Keep Lines Cutoff option becomes active so that you can specify if data is to be retained in the selected Output tab when another one is selected.

Note:

You cannot use the Re-run button (as described in "Re-Running a Secondary DSA") to reprocess data when this option is set, because all Output tabs must be fully populated so that the re-run operation can add data to tabs as necessary.

Keep Lines Cutoff

Set the amount of data to be retained in an Output tab when another tab is selected; this option is active after the Conserve Memory check box is selected.

Setting this value to:

  • zero (the default) - ensures that all of the data is retained in the tab when another one is selected.

  • > zero - removes the data from the tab if the number of rows exceeds the number specified. For example, if you set the value to 1000 rows and 1200 rows exist, then the data is removed when you select a different tab. Conversely, if only 900 rows exist, then the data remains because it did not exceed the 1000-row limit.
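The Keep Lines Cutoff rule above reduces to a single comparison. The following sketch uses a hypothetical function name and only restates the two cases listed.

```python
def retain_on_tab_switch(row_count: int, keep_lines_cutoff: int = 0) -> bool:
    """Return True if a tab keeps its data when another tab is selected."""
    if keep_lines_cutoff == 0:  # zero (the default) always retains the data
        return True
    # Data is removed only when the row count exceeds the cutoff.
    return row_count <= keep_lines_cutoff
```

With a cutoff of 1000, a tab holding 1200 rows is cleared on a tab switch, while a tab holding 900 rows keeps its data.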

Limits on Rows Retrieved

Set an individual value for the maximum number of rows that will be retrieved for each of the Output tabs. For example, 50,000 rows may be generated for an Output tab and you can choose to retrieve a maximum of 5000 to decrease the amount of time required to process the job.

Surrounding text describes projopt1.png.

Use the table as follows:

  • Tab column - Each tab is listed.

  • Limit column - Enter the maximum number of rows you want to retrieve for one or more tabs. Setting this value overrides the Warn column, locks it from editing, and no warning messages are displayed when the maximum is reached.

  • Warn column - Use the default warning message value displayed or enter the number of source data rows beyond which the Governance Studio will display a warning message to allow the user to choose to retrieve all rows or the default number of rows specified. If you set this value and then enter a Limit value for the tab, the Warn value is reset to the Governance Studio default because the overriding Limit value causes it to be ignored. You can reset this value by entering the Governance Studio default warning count surrounded by less than and greater than symbols. For example, if the default warning count was not changed, you would enter '<5000>'.

These Limit and Warn values specified override any Default Row Warning Count value set as described in "General Tab".
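How the Limit and Warn columns interact for a single Output tab can be sketched as follows. This is an interpretation of the table description above, not the product's logic: the `TabRetrieval` type and `plan_retrieval` function are hypothetical, and treating a Warn value of zero as "no warning" at the tab level is an assumption carried over from the Default Row Warning Count description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TabRetrieval:
    capped_rows: int   # rows retrieved unless the user chooses to retrieve all rows
    prompt_user: bool  # whether a warning message is shown first


def plan_retrieval(generated_rows: int, limit: Optional[int], warn: int) -> TabRetrieval:
    """Resolve the Limit and Warn settings for one Output tab."""
    if limit is not None:
        # A Limit overrides Warn: rows are capped and no warning is shown.
        return TabRetrieval(min(generated_rows, limit), prompt_user=False)
    if warn > 0 and generated_rows > warn:
        # No Limit: the user chooses between all rows and the Warn count.
        return TabRetrieval(warn, prompt_user=True)
    return TabRetrieval(generated_rows, prompt_user=False)
```

For the example above, `plan_retrieval(50000, 5000, 5000)` caps the tab at 5000 rows without showing a warning.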

Tip:

You can find the total number of generated rows for a given tab by viewing its tooltip, without selecting the tab.

Note:

To avoid exceeding the allocated memory, you may need to adjust your client's EDQP Java Web Start memory settings. For more information, see Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Administration Guide.