6 Frequently Asked Questions

This FAQ provides information on reoccurring customer queries/ solutions and related resources links.

Product Overview

Growing security threats and ever-expanding privacy regulations have made it necessary to limit the exposure of sensitive data. Copying production data for non-production purposes such as development and data analytics proliferates sensitive data, expanding the security and compliance boundary and increasing the likelihood of data breaches. Oracle Data Masking and Subsetting provides a flexible solution that discovers, masks, and subsets sensitive data, allowing organizations to safely share data across their non-production environments.

What is Data Masking and Subsetting?

Data masking (sometimes called static data masking) replaces sensitive data such as credit card numbers with fictitious yet realistic-looking data. Data subsetting is the process of retaining or extracting a selected portion of a data set from a larger database.

Why do I need to mask and subset data?

Copying production data to non-production, outsourced, partner, and cloud environments for test, development, and other purposes proliferates sensitive information such as credit card numbers and social security numbers, increasing the risk of a data breach as non-production environments are generally not as protected or monitored as diligently as production environments. For this reason, data privacy standards such as PCI-DSS recommend rendering sensitive production data unreadable when used for test and development.

Subsetting extracts only the necessary information from a large database for sharing with internal and external teams, reducing the resources required to store and manage that data in test and development. Masking and subsetting sensitive data in non-production environments helps improve security and minimize compliance and infrastructure costs.

How does masking improve security and minimize compliance costs?

Masking sensitive data in test and development environments reduces the overall compliance boundary, restricting it to only production environments. Rendering data unreadable in these environments limits the risk of a data breach and helps minimize compliance costs.

Why are data masking and subsetting important for cloud computing?

Organizations understand the advantage of leveraging a cloud platform for test and development. However, they are concerned about uploading sensitive on-premises production data to the cloud because of data privacy and compliance reasons. Other concerns are the storage cost associated with the cloud platform and the network cost due to data transfers.

Oracle Data Masking and Subsetting addresses these concerns by enabling cloud users to mask sensitive data on-premises before uploading it to the cloud. The product helps reduce storage and network costs by extracting a subset of production data for upload to the cloud.

Is data masking reversible?

Data masking usually means permanently replacing the data and ensuring that no one can retrieve the original data. But, sometimes you might want to see the original data. Reversible masking is helpful when businesses need to mask and send their data to a third party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third party, the original data can be recovered. The Encrypt masking format supports reversible masking, enabling the masked data to be reverted to its original form when necessary.

Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during display time, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted or masked version.

Components and Features

What are the main components of Oracle Data Masking and Subsetting?

The main components are:
  • Application Data Modelingprovides automated procedures to discover sensitive columns and parent-child relationships. The discovery results are stored as an application data model, which is reusable across multiple databases.
  • Masking Format Library provides a comprehensive set of predefined masking formats to mask sensitive data such as credit card numbers, national identifiers, and phone It also allows creating new masking formats to meet domain-specific requirements.
  • Data Masking assists in mapping masking formats to discovered sensitive columns, creating reusable masking scripts. It also provides a workflow to mask data.
  • Data Subsetting helps create reusable goal/condition-based subsetting rules on a database. It also provides a workflow to generate subsets.

How does the product preserve the relational integrity of the data in an application?

Masking parent table that has child relationships also automatically masks the children to preserve referential integrity. For example - if there is a foreign key dependency, on a column that is being masked, we want to make sure that after masking, the foreign key constraint is not violated and appropriately mask the child table as necessary. Oracle Data Masking and Subsetting does the following to minimize the disruption of applications post-masking and subsetting:
  • The product uses automated discovery procedures to gather referential integrity or parent-child relationships between the columns before the masking and subsetting process.
  • During the masking and subsetting process, parent and child columns are processed consistently to preserve the integrity between these columns.
  • When masking a parent table that has child relationships, the tool automatically masks the child tables to preserve referential integrity. For example, if there is a foreign key dependency on a column that is being masked, the process ensures that after masking, the foreign key constraint is not violated by appropriately masking the child table as necessary.

Can masking support multi-byte or international characters?

Several masking options support multi-byte or international characters, such as UTF-8. The suitable masking formats include Array List, Shuffle, Substitute, Table Column, and User Defined Function.

Does the product include predefined masking formats?

Yes, Oracle Data Masking and Subsetting provides out-of-the-box masking formats covering a broad range of sensitive data, such as national identifiers of multiple countries, credit card numbers of various vendors, phone numbers, and more.

Which masking techniques are supported by the product?

Some options include generating fixed/random characters or numbers, replacing them with null values, substituting data from a random list or table column, and SQL or regular expression-based masking. You also have several advanced options to meet complex business requirements, such as:
  • Shuffle Masking randomly shuffles data within a table/ view. For example, columns containing salaries can be shuffled to break the employee-salary mapping.
  • Encryption encrypts sensitive data using a cryptographic key while preserving the data's format. It's a reversible masking option, and you can decrypt your data using the same key. This feature is useful when masked data sent to a third party has to be merged with further updates.
  • Conditional Masking masks column data using different masking formats based on user-defined conditions. For example, in a column, the US identifiers can be masked using the Social Security Number format and the UK identifiers using the National Insurance Number format.
  • Compound Masking masks related columns as a group, ensuring the masked data across the related columns retain the same For example, address fields such as city, state,
  • Deterministic Masking generates consistent masked output for a given input across application schemas and databases.
  • User-defined PL/SQL Masking enables you to define custom masking logic or migrate your existing masking scripts.

Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during display time, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted or masked version.

What assurances does the product provide for the integrity of the masked and subsetted data?

Each masking format has built-in logic to validate the generated masked output. For example, all credit card number masking formats perform the Luhn check on the masked credit card numbers. Also, the product provides a preview option to validate the defined masking and subsetting criteria before running masking and subsetting scripts.

Can I migrate in-house masking scripts to Oracle Data Masking and Subsetting?

Yes, the product supports the migration of existing masking scripts for generating custom data types with user-defined PL/SQL masking functions.

Which subsetting techniques are supported by the product?

Oracle Data Masking and Subsetting simplifies the task of subsetting through its goal or condition-based subsetting techniques. A goal can be a relative table size, such as extracting a 1% subset of a table containing 10 billion rows. Condition-based subsetting is useful for creating a subset using the data itself. For example, you can use a time-based condition, such as discarding all user records created before a particular year. Another example is a region-based condition, where you might extract only Asia Pacific information to support new application development.

Does Oracle Data Masking and Subsetting work with packaged applications like Oracle E-Business Suite and Oracle Fusion Applications?

As Oracle Data Masking and Subsetting is a database-centric solution, it works for all supported databases regardless of the application. However, care is required when setting up data models and masking and subsetting definitions to avoid misconfigurations that could break complex applications. Oracle E-Business Suite and Oracle Fusion Applications provide prepackaged application data models and masking definitions for use with Oracle Data Masking and Subsetting.

Can I mask and subset databases running in Oracle Cloud?

Yes, you can mask and subset databases in the Oracle Cloud. Oracle Data Masking and Subsetting for cloud databases works much like on-premises databases. Oracle Data Safe, an OCI-native cloud service that supports sensitive data discovery and masking, is also available for Oracle Cloud users.

What all masking formats support deterministic masking?

Currently, Substitute and Encrypt masking formats provide deterministic masking transformation.

Encrypt Masking Format: This transformation encrypts and decrypts the original data using a secure key string. The input data format is preserved during encryption and decryption. This transformation uses powerful industry-standard 3DES algorithm. This transformation is helpful when businesses need to mask and send their data to a third-party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third-party, the original data can be recovered using the same key string that was used to encrypt the data.

Substitute Masking Format: The Substitute format uses a table of values from which to substitute the original value with a mask value. As long as this table of values does not change, the mask is deterministic or consistent across the three databases.

How does reversibility work with encrypt masking format?

Reversibility can be performed using Encrypt Masking format which is Key Based Reversible Masking. It uses 3DES algorithm where a seed value is provided to encrypt and decrypt the data.

If a customer uses Oracle Data Masking and Subsetting on Enterprise Manager to generate the masking script, can thecustomer run these scripts directly on the DB Server for masking?

Yes, customer run these scripts directly on the DB Server for masking provided that everything is the same between the two environments and as long as the same seed is given.

If a customer masks data on one server (which is on-premises) and then the customer moves this data to another Oracle Database (e.g., on OCI cloud or any other third-party cloud), can Oracle Data Masking and Subsetting perform the decryption on this new server?

Yes, the customer can mask and subset databases in Oracle Database Cloud Service (DBCS). Oracle Data Masking and Subsetting for cloud databases works much like it does for on-premises databases. Oracle Data Masking and Subsetting license is included in DBCS High Performance, Extreme Performance, and Exadata Service.

Deployment and Administration

How do I download and install Oracle Data Masking and Subsetting?

Oracle Data Masking and Subsetting is pre-installed with Oracle Enterprise Manager. To use Oracle Data Masking and Subsetting, you must have a valid license for the pack.

What are the different ways to mask and subset data?

The product provides two modes to mask and subset data:
  • In-Database Masking and Subsetting: The target data is first copied (cloned) to a separate location. Oracle Data Masking and Subsetting operates on the cloned data. After processing is complete, the resulting masked data can be cloned and distributed for non-production.

    Note:

    In-Database masking and subsetting directly operates on the underlying data, on the original copy of the data and is not recommended to be performed on production databases.
  • In-Export Masking and Subsetting: The masking and subsetting rules are applied while the data is extracted from the target database, and the resulting data is written to Oracle Data Pump dump files. In this mode of operation, Oracle Data Masking and Subsetting can run directly on the production system and unmasked data does not leave After processing is complete, the dump file containing masked data can be imported into non-production databases.

How does Oracle Data Masking and Subsetting optimize performance?

Oracle Data Masking and Subsetting implements several database optimizations to deliver high-speed masking and subsetting during in-database operations. It leverages Oracle Data Pump for in-export operations to achieve high performance during masking and subsetting.

How does Oracle Data Masking and Subsetting compare to other similar Oracle offerings?

Can Data Masking and Subsetting run without Enterprise Manager?

All Data Masking and Subsetting objects are centrally located in the Oracle Enterprise Manager repository, which facilitates centralized creation and administration of Application Data Models, Data Masking and Subsetting rules or definitions. So, we recommend using it through Enterprise Manager.

As Oracle Data Masking and Subsetting is a database centric solution, it works for all supported databases regardless of the application. However, care must be taken when setting up data models, masking and subsetting definitions to avoid misconfigurations that could break complex applications. Certain applications provide pre-packaged data models and masking definitions to avoid the possibility of application breakage and to reduce the customer effort required. Oracle E-Business Suite and Oracle Fusion Applications provide prepackaged masking definitions for use with Oracle Data Masking and Subsetting.

Reoccurring Customer Queries

Reoccurring Issues Alternate Questions Response
Masking Reversibility

Can masked data be restored to its original state?

Is it possible to reverse data masking to retrieve the original data?

Does data masking allow for the original data to be recovered?

Can masked data be decrypted back to its original form?

Is there a way to revert masked data to its initial state?

Data masking usually means permanently replacing the data to ensure that no one can retrieve the original data. We recommend that masking should only be performed directly on non-production databases or in-export to ensure critical data isn't lost. If the original data is required even in a non-production environment, a backup should be taken before masking.

However, sometimes you might need to view the original data. The Encrypt masking format offered by DMS is the only format that supports masking reversibility, allowing the masked data to be reverted to its original form when necessary.

Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during display time, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted or masked version.

Privileges

What all privileges are required to use Data Masking and Subsetting?

What are minimum privileges required to use DMS features?

The following privileges are required for a database user to perform data masking operation in a normal database where Database Vault is not enabled:

EM Privileges

  • Other Resource Privileges:
    • Application Data Model —> Use Application Data Models in EM (DB_ADM_ADMIN)
    • Data Masking Definition —> Use Data Masking Definitions in EM (DB_MASK_ADMIN)
    • Job System —> Create (CREATE_JOB)
    • Named Credential —> Create new Named Credential (CREATE_CREDENTIAL)
  • Target Privileges:
    • Execute Command Anywhere (PERFORM_OPERATION_ANYWHERE)
    • View any Target AND Connect to any viewable target (VIEW_ANY_TARGET AND CONNECT_ANY_VIEW_TARGET) OR grant "Connect" privileges under Target Instance Privileges to specific instances

DB Privileges

  • CREATE SESSION
  • SELECT_CATALOG_ROLE
  • CREATE TABLE
  • CREATE PROCEDURE
  • CREATE SEQUENCE
  • CREATE TYPE
  • CREATE TABLESPACE
  • CREATE ANY DIRECTORY
  • DROP ANY DIRECRANT UNLIMITED TABLESPACE
  • DATAPUMP_EXP_FULL_DATABASE
  • DATAPUMP_IMP_FULL_DATABASE
  • EXECUTE ANY PROCEDURE
  • ALTER SYSTEM
  • EXECUTE ON SYS.DBMS_CRYPTO
  • EXECUTE ON SYS.DBMS_RANDOM
  • EXECUTE ON SYS.UTL_RECOMP
  • ANALYZE ANY
  • ADMINISTER ANY SQL TUNING SET
  • EXECUTE ON DBMS_AQADM

Ensure the database user performing the masking has object/system privileges to create, drop, alter, select, insert and compile the objects being masked (in case the objects are owned by another schema)

For Basic set privileges to perform Data masking operation in a Database Vault enabled environment, please refer the following Oracle documentation URL:

https://docs.oracle.com/cd/E11882_01/server.112/e23090/dba.htm#DVADM71124

A) Application Data Model (ADM):

1) Need to grant authorization in "Oracle Enterprise Manager" realm to the <user> for ADM package deployment.

Ex: If you want to grant this privilege to "SCOTT" user then Login as user with dv_owner role and execute below:

SQL>exec dbms_macadm.add_auth_to_realm('Oracle Enterprise Manager','SCOTT', NULL, dbms_macutl.g_realm_auth_owner);

B) In-Place : Masking and Subsetting:

1) Grant Execute on UTL_FILE to <user>;

2) Authorize <user> in realm protecting object: (we need it even if object owner is performing masking/subsetting)

Ex: SQL>exec dbms_macadm.add_auth_to_realm('<Realm Name>','<user>', NULL, dbms_macutl.g_realm_auth_owner);

C) Export : Masking and Subsetting:

1) Grant Execute on UTL_FILE to <user>;

2) Realm Authorization:

2.1) Object owner:

If we want to perform export masking or subsetting on TEST.EMPLOYEE and we execute the job as test,then we don't need to authorize TEST in any realm protecting TEST.EMPLOYEE.Without granting authorization, TEST can export its own tables.

2.2) Admin User :

i) Grant authorization to Admin user in realm protecting test.employee.

ii) Authorize Admin user to export tables under TEST schema

SQL>exec dbms_macadm.authorize_datapump_user('<Admin User>','TEST');

Masking Performance
  1. Are there ways to optimize data masking performance?/ How can I make the data masking process more efficient?
  2. What are some of the ways to calculate/ analyze the masking performance?

1.Improving Masking Performance in Oracle Data Masking and Subsetting

Based on the best practices and known issues from Oracle Data Masking and Subsetting Performance Tuning, here are key steps to enhance masking performance:

SQL Query Modification
  • Action: Review and remove unnecessary "ORDER BY" clauses from SQL queries.
  • Outcome: Reduced query execution time and improved overall masking performance.
Updating Statistics
  • Action: Regularly update database statistics using DBMS_STATS.GATHER_TABLE_STATS
  • Outcome: Improved query execution plans and overall masking efficiency.
Indexing
  • Action: Create and rebuild indexes on frequently queried columns.
  • Outcome: Faster data retrieval and improved masking performance.

2. Analyzing Masking Performance in Oracle Data Masking and Subsetting

Performance Calculation:

The total time for masking can be calculated as:

"Time taken for cloning tables to be masked"+"Time taken for CTAS of each table"+"Time taken to disable and enable existing indexes and constraints"+"Time taken to apply masking logic (X)"

The variable 'X' depends on the masking format. For example, fixed string masking is faster, while encryption takes longer.

Hardware Impact:

Masking performance is directly proportional to the underlying hardware capabilities.

Parallelism:

Adjust the PARALLEL_MAX_SERVERS parameter based on the formula:

\text{PARALLEL_MAX_SERVERS} = \text{CPU_COUNT} \times \text{PARALLEL_THREADS_PER_CPU} \times (2 \text{ if PGA_AGGREGATE_TARGET > 0 else 1}) \times 5

Performance Testing:

Use masking definitions to analyze the performance impact on a test system. This analysis can provide insights into the time taken to mask data and any changes in the execution plan.

Masking Methods:

At-Export Masking: Involves additional time for exporting the table to a dump file.

In-Place Masking: Involves additional cloning time from the source to the staging environment.

The total time for At- Export masking is the sum of the export time and the time taken to apply the masking logic.

New DMS Customer
  1. What resources are available for Data Masking and Subsetting (DMS)?
  2. Do we need a separate license for Data Masking and Subsetting (DMS) on Enterprise Manager (EM)?
  3. Is there a scheduling mechanism to schedule jobs on a weekly/ monthly basis instead of running them manually?
  4. How can the new rows getting inserted into columns be automatically masked?
  5. Currently, masked values are being updated for all rows in a column. How can we configure it to mask only when the column is not null?
  1. Resources: Documentation, LiveLabs
  2. In order to use the DMS feature, the Data Masking and Subsetting License Pack must be purchased
  3. No; currently all jobs must be scheduled manually but multiple jobs can be manually scheduled in the future to achieve the desired weekly/monthly cadence
  4. Currently, DMS does not support masking at the time of insertion
  5. Add a new masking rule with the condition "column_name is not null" and specify the desired masking format under this new masking rule. In total, there would be 2 rules, first rule with the "column_name is not null" with the desired masking format and there would be the second default condition rule which could just be "Preserve Original Data"
Masking Formats What are some of the examples for Personal Information (PI) Columns and their recommended Masking formats?

Here is the list of available Custom Masking Formats:

Array List

Delete

Encrypt

Fixed Number

Fixed String

Null Value

Preserve Original Data

Random Decimal Numbers

Random Digits

Random Dates

Random Strings

Random Numbers

SQL Expression

Regular Expression

Post-Processing Function

Shuffle

Subsitute

Substring

Table Column

Truncate

User Defined Function

Regular Expression Basics

What are some of the examples of Regex?

How can we implement Regex for a sensitive column?

Refer to tables 1.1 and 1.2

Table 6-1 Basic Building Blocks of Regex

Basic Characters Examples Definition
. (dot)

Pattern: b.t

Matches: "bat", "bet", "bit", "bot", "but", etc.

Matches any single character except newline (\n).
^

Pattern: ^start

Matches: "start" at the beginning of a line.

Matches the start of the string or line.
$

Pattern: end$

Matches: "end" at the end of a line.

Matches the end of the string or line.
|

Pattern: cat|dog

Matches: "cat" or "dog"

Alternation, matches either the expression before or after the |.
[]

Pattern: [aeiou]

Matches: Any single vowel character ("a", "e", "i", "o", "u")

Character class, matches any one of the characters inside the brackets.
()

Pattern: (ab)

Matches: "ab"

Grouping, groups multiple tokens together and captures matched text.
\

Pattern: \d Matches: The pattern \d matches any digit (0-9)

Escape character, used to escape a special character or indicate a special sequence.

Table 6-2 Regular Expression Example

Regular Expression Explanation Example
[A-Za-z]{3} Matches exactly three alphabetic characters (uppercase or lowercase). Fgy
[0-9]{3} Matches exactly three digits. 674
[A-Za-z]{3}+[0-9]{3} Matches exactly three alphabetic characters (uppercase or lowercase) and exactly three digits. Fgy674
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} This pattern matches a typical email address format. It includes character classes ([a-zA-Z0-9._%+-]), the + quantifier for one or more occurrences, the @ literal character, and the . (dot) for domain separation. f@8.co
(?:https?://)?(?:www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

(?:https?://)?: Non-capturing group that matches "http://" or "https://", optionally. optionally.\.

(?:www\.)?: Non-capturing group that matches "www."

[a-zA-Z0-9.-]+: Matches one or more alphanumeric characters, dots, or hyphens (for the domain name).

[a-zA-Z]{2,}: Matches a dot followed by at least two alphabetic characters (for the top-level domain).

https://www.example.com
#([a-fA-F0-9]{6}|[a-fA-F0-9]{3}) This pattern matches a hexadecimal color code starting with #, followed by either six ({6}) or three ({3}) characters from the set [a-fA-F0-9]. It uses () for grouping and {} for specifying repetitions. #FfF011
\d{4}-\d{2}-\d{2} This pattern matches a date in the format YYYY-MM-DD. It uses \d for digits and {} for specifying exact repetitions. 7-16-2024

Masking Format Examples for Common Sensitive Types

Column Name Masking Format Masking Format Entries Original Data Masked Data
CustomerID Fixed StringRandom Strings Fixed String: FRandom Strings: Start Length: 7 End Length: 7Random Digits: Start Length 3 End Length 3 Fjg860 Faq623
Age Shuffle Shuffle 26 34
FirstName Random Strings Random Strings: Start Length: 1 End Length: 6 Kate aaabgd
LastName Substring Substring: Start Position 1 Length: 3 Wilson Wil
DateOfBirth Random Dates Random Dates: Start Date 07/18/50 End Date 07/18/20 7/3/1998 8/5/1963
SocialSecurityNumber (SSN) Random NumberPost-Processing Function Random Number: Start Integer 20000000 End Integer 738999999Post-Processing Function: DM_FMTLIB. MGMT_DM_GEN_SSN_FH 049-66-6786 136-76-4899
EmailAddress SQL Expression SQL Expression: %FirstName% || '.' || %LastName% || '@company.com ' ktewilson@gmail.com Kate.Wilson@company.com
PhoneNumber Random DigitsRandom NumberPost-Processing Function Random Digits: Start Length: 7 End Length: 7Random Number: Start Integer 1 End Integer 279?Post-Processing Function: DM_FMTLIB. MGMT_DM_GEN_PH_USA_FH 6033550232 619-989-9213
AddressLine1 Group Masking Shuffle Columns 303 Madison 26 West
AddressLine2 San Street Wheelock St
City Redwood City Hanover
PostalCode 94002 3755
Country United States United States
Gender Array List Array List: Female, Male Female Male
MaritalStatus Null Value Null Value Single  
Nationality Truncate Truncate American  
PassportNumber Substitute Regular Expression: [A-Z]{1}[0-9]{7} P2378283 V4697997
DriverLicenseNumber Encrypt Regular Expression: [D][0-9]{7} D7482464 D8208541
CreditCardNumber Random DigitsPost-Processing Function Random Digits: Start Length: 10 End Length: 10Post-Processing Function: DM_FMTLIB. MGMT_DM_GEN_VC 4111111111111110 4556771680150140
BankAccountNumber Substring Substring: Start Position 3 Length: 3 123456789 789
AccountHolderName Table Column Schema: HR Table: New_table Column: LastName Kate Wilson Wilson
TaxIdentificationNumber (TIN) Fixed StringRandom Digits Fixed String: TINRandom Digits: Start Length: 1 End Length: 6 TIN123456 TIN232163
EmploymentStatus Preserve Original Data Employed Employed
MedicalRecordNumber Fixed String MRNxxxxxx MRN123456 MRNxxxxxx
UserName SQL Expression SQL Expression: %FirstName% || '_' || %LastName% katewilson1980 Kate_Wilson
Password delete   xYdkI34b  

Where can I find more information on Oracle Data Masking and Subsetting?

For more information, such as product data sheets, tutorials, documentation, customer references, and blogs, please visit the Oracle Data Masking and Subsetting page on Oracle.com.

https://www.oracle.com/security/database-security/data-masking/

Explore the Data Masking and Subsetting workshop on Oracle LiveLabs to try the product at no cost.

https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=704