1 Introduction to Oracle Data Masking and Subsetting

The Data Masking and Subsetting features of the Enterprise Manager for Oracle Database Plug-in help you to securely manage test data.

When performing real-world testing, there is a risk of exposing sensitive data to non-production users in a test environment. The test data management features of Oracle Database helps to minimize this risk by enabling you to perform data masking and data subsetting on the test data.

When production data is copied into a testing environment, there is the risk of breaching sensitive information to non-production users, such as application developers or external consultants. In order to perform real-world testing, these non-production users need to access some of the original data, but not all the data, especially when the information is deemed confidential.

Oracle Database offers test data management features that help reduce this risk by enabling you to:

  • Store the list of applications, tables, and relationships between table columns using Application Data Modeling, and sensitive columns. For more information, see Application Data Modeling.

  • Replace sensitive data from your production system with fictitious data that can be used during testing using data masking. For more information, see Data Masking.
  • Replicate information that pertains only to a particular site using data subsetting. For more information, see Data Subsetting.

Note:

You must have the Oracle Data Masking and Subsetting Pack license to use these security features.

The Need to Mask and Subset Data

The reasons to mask and subset data include the following:

  • Limit sensitive data proliferation: The growing security threats have increased the need to limit exposure of sensitive information. At the same time, copying production data for non-production purposes such as test and development is proliferating sensitive data, expanding the security and compliance boundary, and increasing the likelihood of data breaches.

  • Share what is necessary: Often, companies have to share a production data set with internal and external parties for various reasons. For example, a Cloud application provider may have to extract and share information specific to their individual subscribers on demand. Another example is a company serving a court order must extract and share a subset of production data with the court. In several cases, it is efficient to extract and share a portion or subset of information instead of sharing the entire production dataset.

  • Comply with data privacy laws and standards: Data privacy standards such as PCI-DSS and European Union (EU) General Data Protection Regulation (GDPR) emphasize on protecting sensitive information in non-production environments because these environments are typically not as protected or monitored as production systems. EU GDPR also mandates an individual’s right to be forgotten, erasure, portability or rectification which requires identifying and processing a subset of information.

  • Minimize storage costs: Using the entire production data for test, development, and QA purposes will incur additional storage costs and prolong the test and development cycles, increasing the overall storage and operational cost.

Masking and subsetting data addresses the above use cases. Data Masking is the process of replacing sensitive data with fictitious yet realistic looking data. Data Subsetting is the process of downsizing either by discarding or extracting data. Masking limits sensitive data proliferation by anonymizing sensitive production data. Subsetting helps to minimize storage costs by deleting data or extracting a subset of data for sharing or archival. Data Masking is also known as Static Data Masking, and Data Subsetting is also known as Test Data Management.

Components

This section discusses the components of Oracle Data Masking and Subsetting:

Application Data Modeling

Application Data Modeling (ADM) simplifies the effort of sensitive data discovery through automated discovery procedures and sensitive column types from Oracle database tables. The automated discovery procedures not only discover columns holding sensitive information, but also identify the referential or parent-child relationships between these columns defined in the database.

A sensitive column type creates forensics of sensitive data elements such as national insurance numbers using combination of data patterns in column names, column data, and column comments. Automated discovery procedures leverage sensitive column types to sample data in the database table columns while scanning the database for sensitive information.

Application Data Modeling provides various out-of-the-box sensitive columns types such as credit card numbers, social security numbers, phone number, etc. Custom–sensitive column types can be easily created using regular expressions. With Oracle Data Masking Pack, and the Data Discovery and Modeling capability in Oracle Enterprise Manager, enterprises can define data pattern search criteria allowing database administrators to locate these sensitive elements. For example, data pattern such as 15 or 16 digits for credit card numbers or 9 digit formatted US social security numbers.

Oracle Data Masking and Subsetting also allows you to import and export Application Data Models.

For more information, see the Application Data Modeling chapter.

Data Masking Format Library

Oracle Data Masking and Subsetting provides security administrators the ability to maintain a centralized definition for common masking formats. This centralized definition ensures that database administrators apply the same masking rules across the entire enterprise regardless of the database and where the sensitive data resides. Thus, organizations and enterprises can ensure that the sensitive data is masked consistently and in compliance with the established standards for data sharing.

The data masking format library contains a centralized set of data masking format for commonly used sensitive data. For example, there are mask formats for phone numbers, social security numbers, credit card numbers, and fictitious first and last names. Security administrators can define these mask formats and maintain them in the format library to be used by the DBAs across the entire enterprise.

The Data Masking format library provides:

  • several built-in out-of-the-box masking formats and primitives that can be used by the Data Masking definitions

  • Built-in mask primitives that can be combined to create additional formats

  • User-defined SQL or PL/SQL functions for handling complex formats

The Data Masking Library also includes the following primitives:

  • Fixed Numbers

  • Fixed Strings

  • Random Strings

  • Random Digits

  • Random Numbers

  • Random Dates

Data Masking Transformations

Oracle Data Masking and Subsetting provides sophisticated masking transformations. If masking formats are considered as building blocks of a data masking definition, then masking transformations align these masking formats according to the varied business requirements. Oracle Data Masking and Subsetting provides the following data masking transformations:

  • Conditional Masking: provides the ability to arrange masking formats according to different conditions. For example, consider masking a column containing unique person identifiers. Person identifiers that belong to United States of America are masked using Social Security Number format and the person identifiers that belong to United Kingdom are masked using the National Insurance Number format.

  • Compound Masking (also known as grouping option): masks related columns as a group to ensure that the data being masked across the related columns retain the same relationship. For example, consider masking the address fields such as city, state, and postal codes. The values in these address fields will be consistent even after the masking operation is completed.

  • Deterministic/Consistent Masking: generates consistent outputs for a given input across various databases. This transformation is helpful to maintain data integrity across multiple applications and also to preserve system integrity in a single sign-on environment. For example, consider three applications: a human capital management application, a customer relationship management application, and a sales data warehouse. These applications might have a common field such as EMPLOYEE ID that must be masked consistently. The Substitute and Encrypt masking formats provide deterministic masking transformation.

  • Shuffle: allows the fields within a column to be shuffled in a random manner. This transformation is helpful in breaking the one-to-one mapping between sensitive data elements. For example, columns containing personal health records can be shuffled while masking health care information.

  • Key-based reversible masking (also known as Encrypt Format): encrypts and decrypts the original data using a secure key string, while preserving the format of the input data. This transformation uses the 3DES algorithm, and is helpful when businesses need to mask and send their data to a third-party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third–party, the original data can be recovered using the same key string that was used to encrypt the data.

  • Format Preserving Randomization (also known as Auto Mask Format): randomizes the data, preserving the input length, position of the characters and numbers, case of the character (upper or lower), and special characters in the input.

Data Subsetting Techniques

Oracle Data Masking and Subsetting simplifies subsetting through its easy-to-define goal-based and condition-based subsetting techniques.

  • Goal-based subsetting: Data is subsetted based on goals. A goal can be a relative table size. For example, extracting 1% subset of a table containing 1 million rows.

  • Condition-based subsetting: Data is subsetted based on conditions. The conditions are specified using “SQL where clause”. The “SQL where clause” supports bind variables.

    A condition can be based on time. For example, discarding all user records created prior to a particular year.

    A condition can be based on region. For example, extracting information pertaining to Asia Pacific for a new application development.

Application Templates

Oracle Data Masking Application Templates deliver pre-identified sensitive columns, their relationships, and industry-standard best practice masking techniques out-of-the box for packaged applications such as Oracle E-Business Suite and Oracle Fusion Applications. Use the Self Update feature to get the latest masking and subsetting templates available from Oracle.

Architecture

Oracle Data Masking and Subsetting architecture consists of a two-tier framework. The Cloud Control UI provides a graphical user interface for administrators to handle all management tasks. These client components can be installed locally or brought up with a Web browser. The Oracle Enterprise Manager framework is comprised of Oracle Management Server and a database repository.

The Management Server is the core of the Enterprise Manager framework. It provides administrative user accounts, processes management functions such as jobs and events, and manages the flow of information between the Cloud Control UI and the nodes with Enterprise Management Agents. The Oracle Enterprise Manager Agent communicates with the Oracle Management Server and performs tasks sent by Cloud Control UI and other client applications.

The Enterprise Management Agent is installed on each monitored host and is responsible for monitoring all of the targets running on those hosts, communicating that information to the Oracle Management Server, and managing and maintaining the hosts and its targets.

Figure 1-1 Architecture of the Oracle Data Masking and Subsetting

DMS_architecture.png diagram provides an overview of the architecture used in Oracle Data Masking and Subsetting.

Execution Methods

Oracle Data Masking and Subsetting provides two modes for masking and subsetting data:

  • In-Database

  • In-Export

In-Database

In-Database mode directly masks and subsets the data within a non-production database with minimal or no impact on production environments. Since the In-Database masking and subsetting mode permanently changes the data in a database, it is strongly recommended to use this mode for non-production environments such as staging, test, and development databases rather than using it on production databases.

In-Export

In-Export mode masks and subsets the data in near real-time while extracting the data from a database. The extracted masked and subsetted data is written to data pump export files that are further imported into test, development, or QA databases. In general, In-Export mode is used for production databases.

Heterogeneous

Oracle Data Masking and Subsetting also supports data masking of data from any non-Oracle database, such as IBM DB2, Microsoft SQL Server, and Sybase.

Oracle Data Masking and Subsetting uses the DB links and the gateways to read data from non-Oracle production databases, copy the data to an Oracle-based staging environment, mask in the Oracle staging environment, and then write to the non-Oracle test databases.

It uses the same masking techniques that are used in the Oracle databases for masking the data.

Note:

Oracle Data Masking and Subsetting does not support automatic data discovery for non-Oracle databases, and also does not perform referential integrity for non-Oracle databases.

Methodology

Oracle Data Masking and Subsetting uses the following methodology to secure non-production database and replaces sensitive data with fictitious, but yet relevant data that meets compliance requirements.

  • Creating an Application Data Model— Discover sensitive data and data relationships, and then create or assign an Application Data Model

  • Selecting Masking Formats and Criteria— Create data masking definition and masking format types and templates based on the sensitive data that is discovered.

  • Previewing and Validating — Secure sensitive data by previewing the masking algorithm results and the subset reduction results

  • Executing Masking Transformations— Execute In-Database or In-Export masking and subsetting transformations and validate the data that is masked.

The following figure describes the methodology used in Oracle Data Masking and Subsetting.

Figure 1-2 Methodology

The DMS_methodology.png explains the 4–step methodology that Oracle Data Masking and Subsetting uses.

Workflow

The following diagram explains the Oracle Data Masking and Subsetting workflow.

Figure 1-3 Oracle Data Masking and Subsetting Workflow

The DMS_workflow.png explains the Oracle Data Masking and Subsetting workflow in a flow chart format. The three important steps of the Data Masking and Subsetting workflow are a) creating an Application Data Model b) creating a data masking definition and c) creating a data subsetting definition.

The following steps describe the Oracle Data Masking and Subsetting Workflow:

  1. Create an Application Data Model — To begin using Oracle Data Masking and Subsetting, you must create an Application Data Model (ADM). ADMs capture application metadata, referential relationships, and discover sensitive data from the source database.

  2. Create a Data Masking Definition — After an ADM is created, the next step is to create a data masking definition. A masking definition includes information regarding the table columns and the masking format for each of these columns. The mask can be created by writing the masked data to the export file.

  3. Create a Data Subsetting Definition — Create a data subsetting definition to define the table rules and rule parameters. The subset can be created by writing the subset data to the export file.