Sensitive Types

A sensitive type drives sensitive data discovery. It defines regular expressions that Data Discovery uses to find sensitive columns based on column names, column data, and column comments. Data Discovery searches for sensitive columns in your Oracle databases using the predefined and user-defined sensitive types that you choose.

Predefined Sensitive Types

In Data Discovery, you can choose from a wide variety of predefined sensitive types and create your own sensitive types. For example, the predefined US Social Security Number (SSN) sensitive type helps you discover columns containing Social Security numbers. You cannot modify or delete predefined sensitive types.

Sensitive types are grouped into sensitive categories. The top-level categories for predefined sensitive types are as follows:

  • Identification Information: Includes sensitive types for national, personal, and public identifiers. Examples are US Social Security Number (SSN), Visa Number, and Full Name.
  • Biographic Information: Includes sensitive types for address, family data, extended PII, and restricted processing data. Examples are Full Address, Mother's Maiden Name, Date of Birth, and Religion.
  • IT Information: Includes sensitive types for user IT data and device data. Examples are User ID, Password, and IP Address.
  • Financial Information: Includes sensitive types for payment card data and bank account data. Examples are Card Number, Card Security PIN, and Bank Account Number.
  • Healthcare Information: Includes sensitive types for health insurance data, healthcare provider data, and medical data. Examples are Health Insurance Number, Healthcare Provider, and Blood Type.
  • Employment Information: Includes sensitive types for employee basic data, organization data, and compensation data. Examples are Job Title, Termination Date, Income, and Stock.
  • Academic Information: Includes sensitive types for student basic data, institution data, and performance data. Examples are Financial Aid, College Name, Grade, and Disciplinary Record.

Note: Data Discovery does not discover sensitive columns that are object data types.

User-Defined Sensitive Types

Although Oracle Data Safe provides an extensive set of predefined sensitive types, you might want to create sensitive types to meet your specific requirements. You can also create new sensitive categories and arrange your sensitive types under them. You cannot place a user-defined sensitive type under a predefined sensitive category.

For a user-defined sensitive type, you can assign a default masking format, which is used to mask the columns discovered using this sensitive type. When creating a user-defined sensitive type, you must assign it to a compartment.

When creating a sensitive type, you can provide one or more column patterns (regular expressions) that should be used to discover sensitive columns. You can also provide a column comment pattern, column data pattern, and a search pattern. Data Discovery performs case-insensitive pattern matching.

Column Name Pattern

A column name pattern is a regular expression that is used to match column names during data discovery. For example, to search for columns containing Social Security numbers, you could define the following column name pattern:

(^|[_-])SSN($|[_-])|(SSN|SOC.*SEC.*).?(ID|NO|NUMBERS?|NUM|NBR|#)

The regular expression checks for specific keywords in column names. It matches column names, such as PATIENT_SSN, SSN#, SOCIAL_SECURITY_NUMBER, and EMPLOYEE_SOC_SEC_NO.
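
As a rough check of the pattern above, the following sketch runs it against the example column names using Python's re module. Python is used here only for illustration; the regular expression engine that Data Discovery uses may differ in details, and the helper name matches_column_name and the FIRST_NAME counterexample are assumptions added for this example.

```python
import re

# Column name pattern from the text. IGNORECASE mirrors the
# case-insensitive matching that Data Discovery performs.
SSN_NAME_PATTERN = re.compile(
    r"(^|[_-])SSN($|[_-])|(SSN|SOC.*SEC.*).?(ID|NO|NUMBERS?|NUM|NBR|#)",
    re.IGNORECASE,
)


def matches_column_name(name: str) -> bool:
    """Return True if the pattern is found anywhere in the column name."""
    return SSN_NAME_PATTERN.search(name) is not None


columns = [
    "PATIENT_SSN",
    "SSN#",
    "SOCIAL_SECURITY_NUMBER",
    "EMPLOYEE_SOC_SEC_NO",
    "FIRST_NAME",  # hypothetical non-sensitive column for contrast
]
for column in columns:
    print(f"{column}: {matches_column_name(column)}")
```

The four column names from the text match, and FIRST_NAME does not.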

Tips for creating column name patterns:

  • Consider when to use .? and .*. Use .? if you want to allow zero or one character, and use .* to allow any number of characters. For example, you could use SOCIAL.?SECURITY.?NUMBER or SOC.*SEC.*NUMBER depending upon how strict you want the regular expression to be.
  • To match a word exactly or as a delimited part of a column name, use (^|[_-])<WORD>($|[_-]). The pattern matches <WORD> when it stands alone or is separated from the rest of the column name by an underscore (_) or hyphen (-).
  • Whenever searching for columns containing numbers, you could use keywords like (ID|NO|NUMBERS?|NUM|NBR|#).
  • To match singular and plural words, if applicable, use S?. For example, use CODES? to match CODE and CODES.
  • To match dates, use (DT|DATE) and the reverse pattern. For example, you could use the following pattern to match BIRTH_DATE and DATE_OF_BIRTH:
    BIRTH.?(DT|DATE)|(DT|DATE).*BIRTH
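
The tips above can be tried out with a short sketch such as the following. It uses Python's re module purely for illustration, and the helper found and the sample column names are assumptions made for this example, not part of Data Discovery.

```python
import re


def found(pattern: str, column_name: str) -> bool:
    """Case-insensitive search for pattern anywhere in column_name."""
    return re.search(pattern, column_name, re.IGNORECASE) is not None


# .? allows zero or one character between keywords; .* allows any number.
strict = r"SOCIAL.?SECURITY.?NUMBER"
loose = r"SOC.*SEC.*NUMBER"
print(found(strict, "SOCIAL_SECURITY_NUMBER"))  # True
print(found(strict, "SOC_SEC_NUMBER"))          # False
print(found(loose, "SOC_SEC_NUMBER"))           # True

# (^|[_-])<WORD>($|[_-]) matches the word alone or delimited by _ or -.
word = r"(^|[_-])SSN($|[_-])"
print(found(word, "SSN"))      # True
print(found(word, "EMP_SSN"))  # True
print(found(word, "ASSN"))     # False

# S? covers singular and plural forms.
codes = r"(^|[_-])CODES?($|[_-])"
print(found(codes, "ZIP_CODE"))   # True
print(found(codes, "ZIP_CODES"))  # True

# Date keyword before or after the main keyword.
birth = r"BIRTH.?(DT|DATE)|(DT|DATE).*BIRTH"
print(found(birth, "BIRTH_DATE"))     # True
print(found(birth, "DATE_OF_BIRTH"))  # True
print(found(birth, "BIRTH_PLACE"))    # False
```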

Column Comment Pattern

A column comment pattern is a regular expression that is used to match column comments during data discovery. Column names are sometimes obscure, so descriptive metadata is entered as a comment on the database column. Data Discovery can search these comments and potentially find more sensitive data. For example, to search for columns containing Social Security numbers, you could define the following column comment pattern:

\bSSN#?\b|SOCIAL SECURITY (ID|NUM|\bNO\b|NBR)

The regular expression checks for specific keywords in column comments. For example, it matches the column comment Contains social security numbers of employees.
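
The following sketch applies the comment pattern above to a few comments using Python's re module. It is an illustration only: the engine used by Data Discovery may treat constructs such as \b slightly differently, and every comment other than the one quoted in the text is a made-up example.

```python
import re

# Column comment pattern from the text, matched case-insensitively.
SSN_COMMENT_PATTERN = re.compile(
    r"\bSSN#?\b|SOCIAL SECURITY (ID|NUM|\bNO\b|NBR)",
    re.IGNORECASE,
)


def matches_comment(comment: str) -> bool:
    """Return True if the pattern is found anywhere in the comment."""
    return SSN_COMMENT_PATTERN.search(comment) is not None


comments = [
    "Contains social security numbers of employees",  # from the text
    "Employee SSN",                                   # hypothetical
    "Social security notification flag",              # hypothetical
    "Customer first name",                            # hypothetical
]
for comment in comments:
    print(f"{comment!r}: {matches_comment(comment)}")
```

The first two comments match. The third does not, because \bNO\b requires NO to be a whole word.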

Tips for creating column comment patterns:

  • Avoid using .* in column comment patterns to reduce false positives.
  • Use \b<word>\b to search for a specific word. It avoids matching words that contain <word>. For example, the regular expression \bNO\b matches social security no but not social security notification. Similarly, the regular expression \bSECT\b does not match the word SECTOR, and \bCULT\b does not match the word CULTURE.
  • Whenever searching for columns containing numbers, you can use keywords like (ID|\bNO\b|NUM|NBR|#).
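
To see why these tips matter, the sketch below compares a loose pattern with word-boundary patterns. It uses Python's re module for illustration; the helper found and the sample comments are assumptions made for this example.

```python
import re


def found(pattern: str, comment: str) -> bool:
    """Case-insensitive search for pattern anywhere in comment."""
    return re.search(pattern, comment, re.IGNORECASE) is not None


# .* lets unrelated words satisfy the pattern, which is a false positive here:
# "NO" is found inside "not" after an arbitrary run of characters.
print(found(r"SOCIAL.*(NUM|NO)", "Social media handle, not verified"))  # True

# A literal phrase with \bNO\b avoids that match.
print(found(r"SOCIAL SECURITY (NUM|\bNO\b)", "Social media handle, not verified"))  # False

# \b<word>\b matches the whole word only.
print(found(r"\bNO\b", "social security no"))            # True
print(found(r"\bNO\b", "social security notification"))  # False
print(found(r"\bSECT\b", "Business sector code"))        # False
print(found(r"\bCULT\b", "Culture preference"))          # False
```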

Column Data Pattern

A column data pattern is a regular expression that is used to match the actual column data during data discovery. For example, to search for columns containing Social Security numbers, you could define the following column data pattern:

^[0-9]{3}[ -]?[0-9]{2}[ -]?[0-9]{4}$

The regular expression checks for 9-digit numbers. A value can be nine consecutive digits or three groups of three, two, and four digits separated by hyphens or spaces. It matches numbers like 383368610 and 383-36-8610.
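
A minimal sketch of how this data pattern behaves, using Python's re module for illustration. The helper looks_like_ssn and all sample values other than the two from the text are assumptions made for this example.

```python
import re

# Column data pattern from the text. The ^ and $ anchors mean the
# whole value must fit the 3-2-4 digit layout.
SSN_DATA_PATTERN = re.compile(r"^[0-9]{3}[ -]?[0-9]{2}[ -]?[0-9]{4}$")


def looks_like_ssn(value: str) -> bool:
    """Return True if the entire value matches the data pattern."""
    return SSN_DATA_PATTERN.search(value) is not None


print(looks_like_ssn("383368610"))    # True
print(looks_like_ssn("383-36-8610"))  # True
print(looks_like_ssn("383 36 8610"))  # True
print(looks_like_ssn("38-336-8610"))  # False: groups are 2-3-4, not 3-2-4
print(looks_like_ssn("3833686100"))   # False: ten digits
```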

Tips for creating column data patterns:

  • Ensure that the data pattern is as specific as possible to avoid false positives.
  • Consider whether a data pattern makes sense for the sensitive type. If the data pattern is too broad, it can result in false positives. If it does not add any value, you could decide not to add a data pattern to the sensitive type.
  • If you want to use a broad data pattern, you could use the AND search option to reduce false positives.

Search Pattern

The search pattern indicates how the column name, comment, and data patterns of a sensitive type should be used to discover sensitive columns. There are two search options: AND and OR.

The AND search option requires all the patterns provided for a sensitive type to match before a column is identified as sensitive. For example, if a sensitive type has name, comment, and data patterns, they must match a column's name, comment, and data, respectively, for that column to be identified as sensitive. The following table lists the possible combinations of patterns provided for a sensitive type and the corresponding AND search behavior.

Patterns Present in a Sensitive Type    Search Behavior
Name, Comment, and Data                 Name AND Comment AND Data
Name and Data                           Name AND Data
Name and Comment                        Name AND Comment
Comment and Data                        Comment AND Data
Name                                    Name
Comment                                 Comment
Data                                    Data

The OR search option provides some flexibility by identifying a column as sensitive even if only some of the patterns of a sensitive type match. For example, if a sensitive type has name and comment patterns, a column is identified as sensitive even if only the name pattern (or comment pattern) matches the column's name (or comment). If a sensitive type has all three patterns, the data pattern must match along with either the name pattern or the comment pattern (or both). The following table lists the possible combinations of patterns provided for a sensitive type and the corresponding OR search behavior.

Patterns Present in a Sensitive Type    Search Behavior
Name, Comment, and Data                 (Name AND Data) OR (Comment AND Data)
Name and Data                           Name AND Data
Name and Comment                        Name OR Comment
Comment and Data                        Comment AND Data
Name                                    Name
Comment                                 Comment
Data                                    Data
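
The two search options can be sketched as a small decision function. This is one reading of the behavior described in this section, not the actual Data Discovery implementation. In particular, it assumes that under OR, whenever a data pattern is present together with a name or comment pattern, the data pattern must match along with at least one of them, as the text describes for the three-pattern case. The function name is_sensitive and its arguments are hypothetical.

```python
from typing import Optional


def is_sensitive(
    search: str,
    name: Optional[bool],
    comment: Optional[bool],
    data: Optional[bool],
) -> bool:
    """Decide whether a column is sensitive.

    Each of name, comment, and data is None when the sensitive type has
    no such pattern; otherwise it is True or False depending on whether
    that pattern matched the column.
    """
    present = [result for result in (name, comment, data) if result is not None]
    if not present:
        return False  # no patterns defined, nothing can match

    if search == "AND":
        # Every pattern that the sensitive type provides must match.
        return all(present)

    if search == "OR":
        metadata = [result for result in (name, comment) if result is not None]
        if data is None:
            # Only name and/or comment patterns: any match is enough.
            return any(metadata)
        if not metadata:
            # Only a data pattern is provided.
            return data
        # Assumption: data must match along with the name or the comment.
        return data and any(metadata)

    raise ValueError(f"unknown search option: {search!r}")


# Name and comment patterns only, name matched, OR search.
print(is_sensitive("OR", name=True, comment=False, data=None))   # True
# All three patterns, data did not match, OR search.
print(is_sensitive("OR", name=True, comment=True, data=False))   # False
# All three patterns, comment did not match, AND search.
print(is_sensitive("AND", name=True, comment=False, data=True))  # False
```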

Related Content

For help on writing regular expressions for user-defined sensitive types, see the following resource: