Before You Begin
This 10-minute tutorial shows you how to perform a set of manual and recommended data preparation actions on your dataset.
Background
Preparing and cleansing your data is an important step before visualizing a dataset. For example, the dataset might contain sensitive data, such as customers' social security numbers, that you don't want to expose. You can hide or transform all characters of the social security number column, remove columns from a dataset, or extract portions of a data column to create a new column that contains the extracted data.
You can use the recommendations and available data preparation options in Oracle Analytics to improve data quality.
In this tutorial, you use a spreadsheet as the data source. You can add spreadsheet files that have the XLSX extension and that are no larger than 100 MB. You can also use comma-separated value (CSV) and text (TXT) files to create datasets. You can perform data preparation actions on supported data sources.
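The file constraints above can be sketched as a small pre-upload check. This is only an illustration and not part of Oracle Analytics: the function name `can_upload` is hypothetical, the tutorial states the 100 MB limit only for XLSX files, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# File types named in the tutorial text.
SUPPORTED_EXTENSIONS = {".xlsx", ".csv", ".txt"}

# Assumption: 1 MB = 1024 * 1024 bytes. The tutorial states the
# 100 MB limit only for XLSX files, so it is applied only to them here.
MAX_XLSX_BYTES = 100 * 1024 * 1024


def can_upload(filename: str, size_bytes: int) -> bool:
    """Return True if the file looks acceptable as a dataset source."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return False
    if extension == ".xlsx" and size_bytes > MAX_XLSX_BYTES:
        return False
    return True
```

For example, `can_upload("accountinfo_sales.xlsx", 5_000_000)` returns `True`, while a 200 MB XLSX file or a PDF file is rejected.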
What Do You Need?
- Access to Oracle Analytics Cloud or Oracle Analytics Desktop
- The accountinfo_sales.xlsx file, downloaded to your computer
Create a Data Source
Oracle Analytics displays recommendations for the data, by column, in the dataset. In this tutorial, you accept some of the recommendations that are relevant for your analysis. You can also implement transformation changes for data in columns that don't have specific recommendations.
- Sign in to Oracle Analytics.
- On the Home page, click Create, and then click Dataset.
- In Create Dataset, click Drop data file here or click to browse, select the accountinfo_sales.xlsx file, and then click Open.
- In Create Dataset Table from accountinfo_sales.xlsx, click OK.
- In the Join Diagram, click the accountinfo_sales tab.
- In the Transform Editor, select the id column. In Properties, click Measure in the Treat As row, and then select Attribute.
- Select the Sales column. In Properties, click Number Format. In the Number Format row, click Number, and then select Currency.
- Click Save. In Save Dataset As, enter accountinfo_sales, and then click OK.
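To make the two property changes in the steps above more concrete, here is a minimal Python sketch of the idea behind them: an identifier is handled as a label rather than a number to aggregate, and a sales value is displayed as currency. The helper names `as_attribute` and `as_currency` and the dollar symbol are assumptions for illustration. They do not reflect how Oracle Analytics implements these settings.

```python
from decimal import Decimal


def as_attribute(value) -> str:
    """Treat an identifier as a label, not as a number to sum or average."""
    return str(value)


def as_currency(value, symbol: str = "$") -> str:
    """Format a numeric value with thousands separators and two decimals."""
    # Converting through str avoids binary floating-point artifacts.
    amount = Decimal(str(value))
    return f"{symbol}{amount:,.2f}"
```

With these helpers, `as_attribute(1001)` yields `"1001"` and `as_currency(1234.5)` yields `"$1,234.50"`.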
Extract Data from a Column
When you extract data from a column, a new column is created that contains the extracted data. In this section, you extract the area code from phone numbers that use the North American Numbering Plan.
- In the accountinfo_sales dataset, click Toggle Quality Tiles to close the insights over each column.
- Select the phone column. In the Recommendations list, click Extract area code from phone.
Oracle Analytics adds an area code column to the dataset.
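The extraction can be approximated outside the product with a regular expression. The sketch below is an assumption about what such a rule could look like, not the logic Oracle Analytics uses. It accepts common North American Numbering Plan layouts (optional +1 country code, a three-digit area code starting with 2-9, a three-digit exchange starting with 2-9, and a four-digit line number) and returns the area code, or `None` if the value does not match. The name `extract_area_code` is hypothetical.

```python
import re
from typing import Optional

# Optional "+1"/"1" prefix, area code (captured), exchange, line number.
# Separators may be a space, a dot, or a hyphen; parentheses are optional.
NANP_PATTERN = re.compile(
    r"^(?:\+?1[\s.-]?)?"      # optional country code
    r"\(?([2-9]\d{2})\)?"     # area code, first digit 2-9
    r"[\s.-]?[2-9]\d{2}"      # exchange, first digit 2-9
    r"[\s.-]?\d{4}$"          # line number
)


def extract_area_code(phone: str) -> Optional[str]:
    """Return the three-digit area code, or None if the format is not recognized."""
    match = NANP_PATTERN.match(phone.strip())
    return match.group(1) if match else None
```

For example, `extract_area_code("(415) 555-0132")` returns `"415"`, and a seven-digit value such as `"555-0132"` returns `None` because it has no area code.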
Conceal Sensitive Customer Data
To comply with security policies for sensitive data, you can obfuscate all or a portion of the data in a column. If some users need to see the sensitive data, you can create a duplicate dataset containing the sensitive data.
- Select the ccnumber column.
- In the Recommendations list, click Obfuscate First 12 Digits of ccnumber.
- Select the ssn column. In the Recommendations list, click Obfuscate First 5 digits of ssn.
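The two obfuscation recommendations above both mask a fixed number of leading digits and leave the rest visible. A minimal sketch of that idea follows. The function name, the choice of `X` as the mask character, and the decision to keep separators such as hyphens are assumptions, since the tutorial does not show the resulting values.

```python
def obfuscate_leading_digits(value: str, count: int, mask: str = "X") -> str:
    """Replace the first `count` digits with `mask`, keeping separators as-is."""
    remaining = count
    masked = []
    for character in value:
        if character.isdigit() and remaining > 0:
            masked.append(mask)
            remaining -= 1
        else:
            masked.append(character)
    return "".join(masked)
```

Using made-up sample values, `obfuscate_leading_digits("4111-1111-1111-1234", 12)` returns `"XXXX-XXXX-XXXX-1234"` and `obfuscate_leading_digits("123-45-6789", 5)` returns `"XXX-XX-6789"`.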
Enrich Data with Geographic Coordinates
- Select the zip column. In Recommendations, click Enrich zip with Lat (latitude).
- In Recommendations, click Enrich zip with Lon (longitude).
- Click Save.
Your changes are listed in the Preparation Script pane, and then applied when you save the dataset.
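One way to picture this behavior is as an ordered list of described steps that is only run against the data at save time. The sketch below models that idea together with the zip code enrichment. Everything in it is hypothetical: the `PreparationScript` class, the `zip_lat` and `zip_lon` column names, and the one-entry lookup table, whose coordinates are approximate placeholders rather than reference data.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]

# Hypothetical lookup table; the coordinates are approximate placeholders.
ZIP_COORDINATES: Dict[str, Tuple[float, float]] = {"10001": (40.75, -73.99)}


def _lookup(row: Row, index: int) -> Optional[float]:
    coordinates = ZIP_COORDINATES.get(str(row.get("zip")))
    return coordinates[index] if coordinates else None


def enrich_lat(row: Row) -> Row:
    """Add a latitude column derived from the zip column."""
    return {**row, "zip_lat": _lookup(row, 0)}


def enrich_lon(row: Row) -> Row:
    """Add a longitude column derived from the zip column."""
    return {**row, "zip_lon": _lookup(row, 1)}


class PreparationScript:
    """Collects steps in order and applies them only when asked to."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Callable[[Row], Row]]] = []

    def add(self, description: str, transform: Callable[[Row], Row]) -> None:
        self.steps.append((description, transform))

    def describe(self) -> List[str]:
        return [description for description, _ in self.steps]

    def apply(self, rows: List[Row]) -> List[Row]:
        result = []
        for row in rows:
            for _, transform in self.steps:
                row = transform(row)
            result.append(row)
        return result
```

Adding `enrich_lat` and `enrich_lon` to a `PreparationScript` changes nothing until `apply` is called, which corresponds to saving the dataset. Zip codes missing from the lookup table get `None` in both new columns.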
Inspect the Dataset
In this section, you review the changes implemented in the dataset.
- Click Go back.
- On the Home page, select the accountinfo_sales dataset, click the Actions menu, and then select Inspect.
- In the dataset page, click Data Elements to view the columns added to the dataset.
Learn More
Transform and Enrich Data in Oracle Analytics
E97621-10
December 2023
Copyright © 2023, Oracle and/or its affiliates.
Learn how to transform data and enrich datasets to use in visualizations in Oracle Analytics.