Before You Begin
This 10-minute tutorial shows you how to use data quality insights to implement changes in datasets using Oracle Analytics.
Background
When you create a dataset from a file or connection, Oracle Analytics profiles the data and provides data quality insights for a representative sample of the data. In this tutorial, you create a dataset from two spreadsheet files and use a common column to create a join between the dataset tables.
Data quality insights provide histograms and frequency tiles as visual overviews of your data columns. Quality bars indicate the validity of the data in the columns, and can help you resolve hidden issues in your data.
In this tutorial, you learn how to fix hidden issues such as spelling errors, missing values, and non-standard values in your data.
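To make the idea of a frequency tile and a histogram concrete, here is a minimal Python sketch. It is not Oracle Analytics' profiling logic. The helper names `frequency_tile` and `histogram` and the sample values are invented for illustration.

```python
from collections import Counter

def frequency_tile(values):
    """Count how often each distinct text value occurs, most frequent first."""
    return Counter(values).most_common()

def histogram(numbers, bin_width):
    """Group numbers into fixed-width bins keyed by each bin's lower bound."""
    bins = Counter((n // bin_width) * bin_width for n in numbers)
    return dict(sorted(bins.items()))

# Hypothetical sample values, not taken from the tutorial spreadsheets.
priorities = ["High", "Low", "High", "Medium", "High"]
amounts = [12, 48, 55, 61, 130]

print(frequency_tile(priorities))
print(histogram(amounts, bin_width=50))
```

A text column is summarized by counts per distinct value, while a numeric column is summarized by counts per range.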
What Do You Need?
- Access to Oracle Analytics
- Download web_customers.xlsx and web_orders_data.xlsx to your computer
Create a Dataset Table
In this section, you create a dataset from the web_orders_data and web_customers spreadsheet files.
- Sign in to Oracle Analytics.
- On the Home page, click Create, and then click Dataset.
- In Create Dataset, click Drop data file here or click to browse, select the web_orders_data.xlsx file, and then click Open.
- In Create Dataset Table from web_orders_data.xlsx, click OK.
- In the Connections panel, click Add, and then click Add File.
- In File Upload, select web_customers.xlsx, and then click Open.
- In Create Dataset Table from web_customers.xlsx, click OK.
- In the Join Diagram, drag the web_customers table on top of the web_orders_data table.
- In Join web_orders_data - web_customers under the web_orders_data column, click Select a column, and then select CUSTID.
- In Join web_orders_data - web_customers under the web_customers column, click Select a column, select Custid, and then click outside of the dialog.
- Click Save. In Save Dataset As, enter web_orders in Name, and then click OK.
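Conceptually, the join created above pairs each order row with the customer row that has the same customer ID. The following sketch illustrates that idea in plain Python, assuming an inner join. The `inner_join` helper, the sample rows, and the Customer Name column are hypothetical. Only the CUSTID, Custid, and Order ID column names come from the tutorial.

```python
def inner_join(orders, customers, left_key, right_key):
    """Pair each order with the customer whose key matches; unmatched orders are dropped."""
    by_key = {c[right_key]: c for c in customers}
    joined = []
    for order in orders:
        customer = by_key.get(order[left_key])
        if customer is not None:
            joined.append({**order, **customer})
    return joined

# Hypothetical rows for illustration only.
orders = [
    {"Order ID": 1, "CUSTID": 101},
    {"Order ID": 2, "CUSTID": 102},
    {"Order ID": 3, "CUSTID": 999},  # no matching customer
]
customers = [
    {"Custid": 101, "Customer Name": "A. Example"},
    {"Custid": 102, "Customer Name": "B. Example"},
]

rows = inner_join(orders, customers, "CUSTID", "Custid")
print(len(rows))  # 2
```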
Update the Treat As Property in Columns
In this section, you change the Treat As property of the Order Line ID and Order ID columns from measure to attribute, so that their numeric values are handled as identifiers rather than as countable numbers.
- Click the web_orders_data tab, select the Order Line ID column, click Measure, and then select Attribute.
- Select the Order ID column, click Measure, and then select Attribute.
- Click Save.
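The difference between a measure and an attribute can be sketched outside the product. In this illustrative Python example, the rows and the Sales column are invented. Summing an identifier gives a meaningless total, while grouping by it gives a useful per-order figure.

```python
from collections import defaultdict

# Hypothetical rows; the Order ID column name comes from the tutorial, Sales is invented.
rows = [
    {"Order ID": 5001, "Sales": 20.0},
    {"Order ID": 5001, "Sales": 15.0},
    {"Order ID": 5002, "Sales": 40.0},
]

# Treated as a measure, an identifier gets aggregated into a meaningless total.
order_id_as_measure = sum(r["Order ID"] for r in rows)

# Treated as an attribute, it groups rows so that real measures can be aggregated per order.
sales_by_order = defaultdict(float)
for r in rows:
    sales_by_order[r["Order ID"]] += r["Sales"]

print(order_id_as_measure)   # 15004
print(dict(sales_by_order))  # {5001: 35.0, 5002: 40.0}
```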
Standardize Data Values
In this section, you standardize the values in the Order Priority column to use High, Medium, and Low.
- In the web_orders_data tab, hover over the Order Priority column, right-click and select Replace Values List.
- In Replace List under the Original Value column, go to the Critical row. Enter High in the row under the Replace Value column.
- Under the Original Value column in the Not Specified row, enter Low in the Replace Value column.
- Click Add Step.
- Click Save.
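The replacement list in the steps above behaves like a lookup table: values with an entry are swapped, and all other values pass through unchanged. A minimal Python sketch of that behavior follows. The `replace_values` helper and the sample column contents are hypothetical.

```python
# Mapping taken from the steps above: Critical -> High, Not Specified -> Low.
replacements = {"Critical": "High", "Not Specified": "Low"}

def replace_values(values, mapping):
    """Return a new list where mapped values are replaced and all others are kept."""
    return [mapping.get(v, v) for v in values]

# Hypothetical column contents.
order_priority = ["Critical", "High", "Not Specified", "Medium", "Low"]
print(replace_values(order_priority, replacements))
# ['High', 'High', 'Low', 'Medium', 'Low']
```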
Correct Spelling to Improve Quality
In this section, you can see that Phoenix is spelled incorrectly in some instances as Pheonix. Correcting the misspelled city name increases the percentage of valid entries in the column.
- In web_orders_data, hover over the Ship to City quality bar to view the percentage of valid and invalid data values.
- Click the Ship to City filter and select Filter by Invalid or Missing.
The misspelling of Phoenix accounts for all of the invalid values in the Ship to City list.
- In the Ship to City column, right-click and select Replace Value List.
- In the Pheonix row, enter Phoenix in Replace Value, and then click Add Step.
- Click Save.
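The effect of the correction on the quality bar can be approximated as the share of values that match a list of known-good values. This is an assumption made for illustration and does not describe how Oracle Analytics determines validity. The `percent_valid` helper, the reference list, and the sample values are all hypothetical.

```python
def percent_valid(values, known_values):
    """Share of values found in the reference set, as a percentage."""
    valid = sum(1 for v in values if v in known_values)
    return 100.0 * valid / len(values)

# Hypothetical reference list and column contents.
known_cities = {"Phoenix", "Seattle", "Denver"}
ship_to_city = ["Phoenix", "Pheonix", "Seattle", "Denver"]

before = percent_valid(ship_to_city, known_cities)
corrected = ["Phoenix" if c == "Pheonix" else c for c in ship_to_city]
after = percent_valid(corrected, known_cities)
print(before, after)  # 75.0 100.0
```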
Replace Missing Values
In this section, you add a value to replace the missing values in the Payment Type Name column.
- Click the web_customers tab.
- In the Payment Type Name column, hover over the quality bar to view the percentage of populated text values.
- Click the Payment Type Name filter and select Filter by Invalid or Missing.
- In Replace Value, go to the Missing or Null row, enter Gift Card in the Replace Value row, and then click Add Step.
- Click Save.
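Replacing missing values amounts to substituting a default wherever a cell is null or blank. The sketch below shows that idea in Python. The `fill_missing` helper and the sample column contents are hypothetical, and treating blank strings as missing is an assumption.

```python
def fill_missing(values, default):
    """Replace None and blank strings with a default value."""
    return [default if v is None or str(v).strip() == "" else v for v in values]

# Hypothetical column contents; "Gift Card" is the replacement used in the steps above.
payment_type_name = ["Credit Card", None, "", "Debit Card"]
print(fill_missing(payment_type_name, "Gift Card"))
# ['Credit Card', 'Gift Card', 'Gift Card', 'Debit Card']
```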
Learn More
Assess and Improve Data Quality in Oracle Analytics
F42213-04
January 2024
Copyright © 2024, Oracle and/or its affiliates.