6 Frequently Asked Questions
This FAQ provides information on reoccurring customer queries/ solutions and related resources links.
Product Overview
Growing security threats and ever-expanding privacy regulations have made it necessary to limit the exposure of sensitive data. Copying production data for non-production purposes such as development and data analytics proliferates sensitive data, expanding the security and compliance boundary and increasing the likelihood of data breaches. Oracle Data Masking and Subsetting provides a flexible solution that discovers, masks, and subsets sensitive data, allowing organizations to safely share data across their non-production environments.
What is Data Masking and Subsetting?
Data masking (sometimes called static data masking) replaces sensitive data such as credit card numbers with fictitious yet realistic-looking data. Data subsetting is the process of retaining or extracting a selected portion of a data set from a larger database.
Why do I need to mask and subset data?
Copying production data to non-production, outsourced, partner, and cloud environments for test, development, and other purposes proliferates sensitive information such as credit card numbers and social security numbers, increasing the risk of a data breach as non-production environments are generally not as protected or monitored as diligently as production environments. For this reason, data privacy standards such as PCI-DSS recommend rendering sensitive production data unreadable when used for test and development.
Subsetting extracts only the necessary information from a large database for sharing with internal and external teams, reducing the resources required to store and manage that data in test and development. Masking and subsetting sensitive data in non-production environments helps improve security and minimize compliance and infrastructure costs.
How does masking improve security and minimize compliance costs?
Masking sensitive data in test and development environments reduces the overall compliance boundary, restricting it to only production environments. Rendering data unreadable in these environments limits the risk of a data breach and helps minimize compliance costs.
Why are data masking and subsetting important for cloud computing?
Organizations understand the advantage of leveraging a cloud platform for test and development. However, they are concerned about uploading sensitive on-premises production data to the cloud because of data privacy and compliance reasons. Other concerns are the storage cost associated with the cloud platform and the network cost due to data transfers.
Oracle Data Masking and Subsetting addresses these concerns by enabling cloud users to mask sensitive data on-premises before uploading it to the cloud. The product helps reduce storage and network costs by extracting a subset of production data for upload to the cloud.
Is data masking reversible?
Data masking usually means permanently replacing the data and ensuring that no one can retrieve the original data. But, sometimes you might want to see the original data. Reversible masking is helpful when businesses need to mask and send their data to a third party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third party, the original data can be recovered. The Encrypt masking format supports reversible masking, enabling the masked data to be reverted to its original form when necessary.
Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during display time, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted or masked version.
Components and Features
What are the main components of Oracle Data Masking and Subsetting?
- Application Data Modelingprovides automated procedures to discover sensitive columns and parent-child relationships. The discovery results are stored as an application data model, which is reusable across multiple databases.
- Masking Format Library provides a comprehensive set of predefined masking formats to mask sensitive data such as credit card numbers, national identifiers, and phone It also allows creating new masking formats to meet domain-specific requirements.
- Data Masking assists in mapping masking formats to discovered sensitive columns, creating reusable masking scripts. It also provides a workflow to mask data.
- Data Subsetting helps create reusable goal/condition-based subsetting rules on a database. It also provides a workflow to generate subsets.
How does the product preserve the relational integrity of the data in an application?
- The product uses automated discovery procedures to gather referential integrity or parent-child relationships between the columns before the masking and subsetting process.
- During the masking and subsetting process, parent and child columns are processed consistently to preserve the integrity between these columns.
- When masking a parent table that has child relationships, the tool automatically masks the child tables to preserve referential integrity. For example, if there is a foreign key dependency on a column that is being masked, the process ensures that after masking, the foreign key constraint is not violated by appropriately masking the child table as necessary.
Can masking support multi-byte or international characters?
Several masking options support multi-byte or international characters, such as UTF-8. The suitable masking formats include Array List, Shuffle, Substitute, Table Column, and User Defined Function.
Does the product include predefined masking formats?
Yes, Oracle Data Masking and Subsetting provides out-of-the-box masking formats covering a broad range of sensitive data, such as national identifiers of multiple countries, credit card numbers of various vendors, phone numbers, and more.
Which masking techniques are supported by the product?
- Shuffle Masking randomly shuffles data within a table/ view. For example, columns containing salaries can be shuffled to break the employee-salary mapping.
- Encryption encrypts sensitive data using a cryptographic key while preserving the data's format. It's a reversible masking option, and you can decrypt your data using the same key. This feature is useful when masked data sent to a third party has to be merged with further updates.
- Conditional Masking masks column data using different masking formats based on user-defined conditions. For example, in a column, the US identifiers can be masked using the Social Security Number format and the UK identifiers using the National Insurance Number format.
- Compound Masking masks related columns as a group, ensuring the masked data across the related columns retain the same For example, address fields such as city, state,
- Deterministic Masking generates consistent masked output for a given input across application schemas and databases.
- User-defined PL/SQL Masking enables you to define custom masking logic or migrate your existing masking scripts.
Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during display time, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted or masked version.
What assurances does the product provide for the integrity of the masked and subsetted data?
Each masking format has built-in logic to validate the generated masked output. For example, all credit card number masking formats perform the Luhn check on the masked credit card numbers. Also, the product provides a preview option to validate the defined masking and subsetting criteria before running masking and subsetting scripts.
Can I migrate in-house masking scripts to Oracle Data Masking and Subsetting?
Yes, the product supports the migration of existing masking scripts for generating custom data types with user-defined PL/SQL masking functions.
Which subsetting techniques are supported by the product?
Oracle Data Masking and Subsetting simplifies the task of subsetting through its goal or condition-based subsetting techniques. A goal can be a relative table size, such as extracting a 1% subset of a table containing 10 billion rows. Condition-based subsetting is useful for creating a subset using the data itself. For example, you can use a time-based condition, such as discarding all user records created before a particular year. Another example is a region-based condition, where you might extract only Asia Pacific information to support new application development.
Does Oracle Data Masking and Subsetting work with packaged applications like Oracle E-Business Suite and Oracle Fusion Applications?
As Oracle Data Masking and Subsetting is a database-centric solution, it works for all supported databases regardless of the application. However, care is required when setting up data models and masking and subsetting definitions to avoid misconfigurations that could break complex applications. Oracle E-Business Suite and Oracle Fusion Applications provide prepackaged application data models and masking definitions for use with Oracle Data Masking and Subsetting.
Can I mask and subset databases running in Oracle Cloud?
Yes, you can mask and subset databases in the Oracle Cloud. Oracle Data Masking and Subsetting for cloud databases works much like on-premises databases. Oracle Data Safe, an OCI-native cloud service that supports sensitive data discovery and masking, is also available for Oracle Cloud users.
What all masking formats support deterministic masking?
Currently, Substitute and Encrypt masking formats provide deterministic masking transformation.
Encrypt Masking Format: This transformation encrypts and decrypts the original data using a secure key string. The input data format is preserved during encryption and decryption. This transformation uses powerful industry-standard 3DES algorithm. This transformation is helpful when businesses need to mask and send their data to a third-party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third-party, the original data can be recovered using the same key string that was used to encrypt the data.
Substitute Masking Format: The Substitute format uses a table of values from which to substitute the original value with a mask value. As long as this table of values does not change, the mask is deterministic or consistent across the three databases.
How does reversibility work with encrypt masking format?
Reversibility can be performed using Encrypt Masking format which is Key Based Reversible Masking. It uses 3DES algorithm where a seed value is provided to encrypt and decrypt the data.
If a customer uses Oracle Data Masking and Subsetting on Enterprise Manager to generate the masking script, can thecustomer run these scripts directly on the DB Server for masking?
Yes, customer run these scripts directly on the DB Server for masking provided that everything is the same between the two environments and as long as the same seed is given.
If a customer masks data on one server (which is on-premises) and then the customer moves this data to another Oracle Database (e.g., on OCI cloud or any other third-party cloud), can Oracle Data Masking and Subsetting perform the decryption on this new server?
Yes, the customer can mask and subset databases in Oracle Database Cloud Service (DBCS). Oracle Data Masking and Subsetting for cloud databases works much like it does for on-premises databases. Oracle Data Masking and Subsetting license is included in DBCS High Performance, Extreme Performance, and Exadata Service.
Deployment and Administration
How do I download and install Oracle Data Masking and Subsetting?
Oracle Data Masking and Subsetting is pre-installed with Oracle Enterprise Manager. To use Oracle Data Masking and Subsetting, you must have a valid license for the pack.
What are the different ways to mask and subset data?
- In-Database Masking and Subsetting: The target data is first copied
(cloned) to a separate location. Oracle Data Masking and Subsetting operates
on the cloned data. After processing is complete, the resulting masked data
can be cloned and distributed for non-production.
Note:
In-Database masking and subsetting directly operates on the underlying data, on the original copy of the data and is not recommended to be performed on production databases. - In-Export Masking and Subsetting: The masking and subsetting rules are applied while the data is extracted from the target database, and the resulting data is written to Oracle Data Pump dump files. In this mode of operation, Oracle Data Masking and Subsetting can run directly on the production system and unmasked data does not leave After processing is complete, the dump file containing masked data can be imported into non-production databases.
How does Oracle Data Masking and Subsetting optimize performance?
Oracle Data Masking and Subsetting implements several database optimizations to deliver high-speed masking and subsetting during in-database operations. It leverages Oracle Data Pump for in-export operations to achieve high performance during masking and subsetting.
How does Oracle Data Masking and Subsetting compare to other similar Oracle offerings?
Oracle Solutions |
Description |
Oracle Data Masking and Subsetting |
Software for creating masked and subsetted copies of production data for use in non-production environments such as testing and development databases. |
Oracle Data Safe (Data Discovery/ Data Masking) |
Discover and mask sensitive data with a cloud service that supports Oracle Databases everywhere: in the Oracle Cloud, on-premises, and third-party clouds. |
Oracle Data Redaction |
Redacts sensitive data from query results before display through client applications. Enforces redaction at runtime, with low overhead, and according to conditions set in policies. |
Oracle Label Security |
Implements Multi-Level Security (MLS), enabling rows with differing sensitivity to reside in the same table. Explicitly labels rows with group, compartment, and sensitivity levels then matches them with user labels. |
Can Data Masking and Subsetting run without Enterprise Manager?
All Data Masking and Subsetting objects are centrally located in the Oracle Enterprise Manager repository, which facilitates centralized creation and administration of Application Data Models, Data Masking and Subsetting rules or definitions. So, we recommend using it through Enterprise Manager.
As Oracle Data Masking and Subsetting is a database centric solution, it works for all supported databases regardless of the application. However, care must be taken when setting up data models, masking and subsetting definitions to avoid misconfigurations that could break complex applications. Certain applications provide pre-packaged data models and masking definitions to avoid the possibility of application breakage and to reduce the customer effort required. Oracle E-Business Suite and Oracle Fusion Applications provide prepackaged masking definitions for use with Oracle Data Masking and Subsetting.
Reoccurring Customer Queries
Reoccurring Issues | Alternate Questions | Response |
---|---|---|
Masking Reversibility |
Can masked data be restored to its original state? Is it possible to reverse data masking to retrieve the original data? Does data masking allow for the original data to be recovered? Can masked data be decrypted back to its original form? Is there a way to revert masked data to its initial state? |
Data masking usually means permanently replacing the data to ensure that no one can retrieve the original data. We recommend that masking should only be performed directly on non-production databases or in-export to ensure critical data isn't lost. If the original data is required even in a non-production environment, a backup should be taken before masking. However, sometimes you might need to view the original data. The Encrypt masking format offered by DMS is the only format that supports masking reversibility, allowing the masked data to be reverted to its original form when necessary. Oracle also offers a product called Data Redaction. Unlike data masking, Data Redaction does not alter the actual data. Instead, it modifies the output during display time, keeping the original data intact. This feature allows you to control who can see the original data and who sees the redacted or masked version. |
Privileges |
What all privileges are required to use Data Masking and Subsetting? What are minimum privileges required to use DMS features? |
The following privileges are required for a database user to perform data masking operation in a normal database where Database Vault is not enabled: EM Privileges
DB Privileges
Ensure the database user performing the masking has object/system privileges to create, drop, alter, select, insert and compile the objects being masked (in case the objects are owned by another schema) For Basic set privileges to perform Data masking operation in a Database Vault enabled environment, please refer the following Oracle documentation URL: https://docs.oracle.com/cd/E11882_01/server.112/e23090/dba.htm#DVADM71124 A) Application Data Model (ADM): 1) Need to grant authorization in "Oracle Enterprise Manager" realm to the <user> for ADM package deployment. Ex: If you want to grant this privilege to "SCOTT" user then Login as user with dv_owner role and execute below: SQL>exec dbms_macadm.add_auth_to_realm('Oracle Enterprise Manager','SCOTT', NULL, dbms_macutl.g_realm_auth_owner); B) In-Place : Masking and Subsetting: 1) Grant Execute on UTL_FILE to <user>; 2) Authorize <user> in realm protecting object: (we need it even if object owner is performing masking/subsetting) Ex: SQL>exec dbms_macadm.add_auth_to_realm('<Realm Name>','<user>', NULL, dbms_macutl.g_realm_auth_owner); C) Export : Masking and Subsetting: 1) Grant Execute on UTL_FILE to <user>; 2) Realm Authorization: 2.1) Object owner: If we want to perform export masking or subsetting on TEST.EMPLOYEE and we execute the job as test,then we don't need to authorize TEST in any realm protecting TEST.EMPLOYEE.Without granting authorization, TEST can export its own tables. 2.2) Admin User : i) Grant authorization to Admin user in realm protecting test.employee. ii) Authorize Admin user to export tables under TEST schema SQL>exec dbms_macadm.authorize_datapump_user('<Admin User>','TEST'); |
Masking Performance |
|
1.Improving Masking Performance in Oracle Data Masking and Subsetting Based on the best practices and known issues from Oracle Data Masking and Subsetting Performance Tuning, here are key steps to enhance masking performance: SQL Query Modification
2. Analyzing Masking Performance in Oracle Data Masking and Subsetting Performance Calculation: The total time for masking can be calculated as: "Time taken for cloning tables to be masked"+"Time taken for CTAS of each table"+"Time taken to disable and enable existing indexes and constraints"+"Time taken to apply masking logic (X)" The variable 'X' depends on the masking format. For example, fixed string masking is faster, while encryption takes longer. Hardware Impact: Masking performance is directly proportional to the underlying hardware capabilities. Parallelism: Adjust the PARALLEL_MAX_SERVERS parameter based on the formula: \text{PARALLEL_MAX_SERVERS} = \text{CPU_COUNT} \times \text{PARALLEL_THREADS_PER_CPU} \times (2 \text{ if PGA_AGGREGATE_TARGET > 0 else 1}) \times 5 Performance Testing: Use masking definitions to analyze the performance impact on a test system. This analysis can provide insights into the time taken to mask data and any changes in the execution plan. Masking Methods: At-Export Masking: Involves additional time for exporting the table to a dump file. In-Place Masking: Involves additional cloning time from the source to the staging environment. The total time for At- Export masking is the sum of the export time and the time taken to apply the masking logic. |
New DMS Customer |
|
|
Masking Formats | What are some of the examples for Personal Information (PI) Columns and their recommended Masking formats? |
Here is the list of available Custom Masking Formats: Array List Delete Encrypt Fixed Number Fixed String Null Value Preserve Original Data Random Decimal Numbers Random Digits Random Dates Random Strings Random Numbers SQL Expression Regular Expression Post-Processing Function Shuffle Subsitute Substring Table Column Truncate User Defined Function |
Regular Expression Basics |
What are some of the examples of Regex? How can we implement Regex for a sensitive column? |
Refer to tables 1.1 and 1.2 |
Table 6-1 Basic Building Blocks of Regex
Basic Characters | Examples | Definition |
---|---|---|
. (dot) |
Pattern: b.t Matches: "bat", "bet", "bit", "bot", "but", etc. |
Matches any single character except newline (\n). |
^ |
Pattern: ^start Matches: "start" at the beginning of a line. |
Matches the start of the string or line. |
$ |
Pattern: end$ Matches: "end" at the end of a line. |
Matches the end of the string or line. |
| |
Pattern: cat|dog Matches: "cat" or "dog" |
Alternation, matches either the expression before or after the |. |
[] |
Pattern: [aeiou] Matches: Any single vowel character ("a", "e", "i", "o", "u") |
Character class, matches any one of the characters inside the brackets. |
() |
Pattern: (ab) Matches: "ab" |
Grouping, groups multiple tokens together and captures matched text. |
\ |
Pattern: \d Matches: The pattern \d matches any digit (0-9) |
Escape character, used to escape a special character or indicate a special sequence. |
Table 6-2 Regular Expression Example
Regular Expression | Explanation | Example |
---|---|---|
[A-Za-z]{3} | Matches exactly three alphabetic characters (uppercase or lowercase). | Fgy |
[0-9]{3} | Matches exactly three digits. | 674 |
[A-Za-z]{3}+[0-9]{3} | Matches exactly three alphabetic characters (uppercase or lowercase) and exactly three digits. | Fgy674 |
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} | This pattern matches a typical email address format. It includes character classes ([a-zA-Z0-9._%+-]), the + quantifier for one or more occurrences, the @ literal character, and the . (dot) for domain separation. | f@8.co |
(?:https?://)?(?:www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} |
(?:https?://)?: Non-capturing group that matches "http://" or "https://", optionally. optionally.\. (?:www\.)?: Non-capturing group that matches "www." [a-zA-Z0-9.-]+: Matches one or more alphanumeric characters, dots, or hyphens (for the domain name). [a-zA-Z]{2,}: Matches a dot followed by at least two alphabetic characters (for the top-level domain). |
https://www.example.com |
#([a-fA-F0-9]{6}|[a-fA-F0-9]{3}) | This pattern matches a hexadecimal color code starting with #, followed by either six ({6}) or three ({3}) characters from the set [a-fA-F0-9]. It uses () for grouping and {} for specifying repetitions. | #FfF011 |
\d{4}-\d{2}-\d{2} | This pattern matches a date in the format YYYY-MM-DD. It uses \d for digits and {} for specifying exact repetitions. | 7-16-2024 |
Masking Format Examples for Common Sensitive Types
Column Name | Masking Format | Masking Format Entries | Original Data | Masked Data |
CustomerID | Fixed StringRandom Strings | Fixed String: FRandom Strings: Start Length: 7 End Length: 7Random Digits: Start Length 3 End Length 3 | Fjg860 | Faq623 |
Age | Shuffle | Shuffle | 26 | 34 |
FirstName | Random Strings | Random Strings: Start Length: 1 End Length: 6 | Kate | aaabgd |
LastName | Substring | Substring: Start Position 1 Length: 3 | Wilson | Wil |
DateOfBirth | Random Dates | Random Dates: Start Date 07/18/50 End Date 07/18/20 | 7/3/1998 | 8/5/1963 |
SocialSecurityNumber (SSN) | Random NumberPost-Processing Function | Random Number: Start Integer 20000000 End Integer 738999999Post-Processing Function: DM_FMTLIB. MGMT_DM_GEN_SSN_FH | 049-66-6786 | 136-76-4899 |
EmailAddress | SQL Expression | SQL Expression: %FirstName% || '.' || %LastName% || '@company.com ' | ktewilson@gmail.com | Kate.Wilson@company.com |
PhoneNumber | Random DigitsRandom NumberPost-Processing Function | Random Digits: Start Length: 7 End Length: 7Random Number: Start Integer 1 End Integer 279?Post-Processing Function: DM_FMTLIB. MGMT_DM_GEN_PH_USA_FH | 6033550232 | 619-989-9213 |
AddressLine1 | Group Masking | Shuffle Columns | 303 Madison | 26 West |
AddressLine2 | San Street | Wheelock St | ||
City | Redwood City | Hanover | ||
PostalCode | 94002 | 3755 | ||
Country | United States | United States | ||
Gender | Array List | Array List: Female, Male | Female | Male |
MaritalStatus | Null Value | Null Value | Single | |
Nationality | Truncate | Truncate | American | |
PassportNumber | Substitute | Regular Expression: [A-Z]{1}[0-9]{7} | P2378283 | V4697997 |
DriverLicenseNumber | Encrypt | Regular Expression: [D][0-9]{7} | D7482464 | D8208541 |
CreditCardNumber | Random DigitsPost-Processing Function | Random Digits: Start Length: 10 End Length: 10Post-Processing Function: DM_FMTLIB. MGMT_DM_GEN_VC | 4111111111111110 | 4556771680150140 |
BankAccountNumber | Substring | Substring: Start Position 3 Length: 3 | 123456789 | 789 |
AccountHolderName | Table Column | Schema: HR Table: New_table Column: LastName | Kate Wilson | Wilson |
TaxIdentificationNumber (TIN) | Fixed StringRandom Digits | Fixed String: TINRandom Digits: Start Length: 1 End Length: 6 | TIN123456 | TIN232163 |
EmploymentStatus | Preserve Original Data | Employed | Employed | |
MedicalRecordNumber | Fixed String | MRNxxxxxx | MRN123456 | MRNxxxxxx |
UserName | SQL Expression | SQL Expression: %FirstName% || '_' || %LastName% | katewilson1980 | Kate_Wilson |
Password | delete | xYdkI34b |
Where can I find more information on Oracle Data Masking and Subsetting?
For more information, such as product data sheets, tutorials, documentation, customer references, and blogs, please visit the Oracle Data Masking and Subsetting page on Oracle.com.
https://www.oracle.com/security/database-security/data-masking/
Explore the Data Masking and Subsetting workshop on Oracle LiveLabs to try the product at no cost.
https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=704