Create Sensitive Types and Categories
In Oracle Data Safe, you can create your own sensitive types and sensitive categories.
Create a Sensitive Type
When creating a sensitive type, you can provide one or more patterns (regular expressions) that should be used to discover sensitive columns. You can provide a column name pattern, column comment pattern, column data pattern, and a search type (AND/OR). Data Discovery performs case-insensitive pattern matching.
For a user-defined sensitive type, you can assign a default masking format, which is used to mask the columns discovered using this sensitive type. When creating a user-defined sensitive type, you must assign it to a compartment.
Tips for Creating Sensitive Types
The following topics help you to write patterns for sensitive types. For more information about regular expressions, see Regular Expressions.
Column Name Pattern
A column name pattern is a regular expression that is used to match column names during data discovery. For example, to search for columns containing Social Security numbers, you could define the following column name pattern:
(^|[_-])SSN($|[_-])|(SSN|SOC.*SEC.*).?(ID|NO|NUMBERS?|NUM|NBR|#)
The regular expression checks for specific keywords in column names. It matches column names such as `PATIENT_SSN`, `SSN#`, `SOCIAL_SECURITY_NUMBER`, and `EMPLOYEE_SOC_SEC_NO`.
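As a quick way to try the pattern before saving it in a sensitive type, the following minimal sketch checks it against the example column names. It uses Python's `re` module as a stand-in, so it assumes the pattern behaves the same way there as in Data Discovery; the helper `name_matches` and the non-matching column `ORDER_ID` are hypothetical names added for illustration.

```python
import re

# Column name pattern copied from the text above.
SSN_NAME_PATTERN = re.compile(
    r"(^|[_-])SSN($|[_-])|(SSN|SOC.*SEC.*).?(ID|NO|NUMBERS?|NUM|NBR|#)",
    re.IGNORECASE,  # mirrors the case-insensitive matching described earlier
)


def name_matches(column_name: str) -> bool:
    """Return True if the pattern is found anywhere in the column name.

    Assumption: the pattern is searched within the name rather than
    required to cover the whole name (the pattern carries its own ^ and $).
    """
    return SSN_NAME_PATTERN.search(column_name) is not None


for column in ["PATIENT_SSN", "SSN#", "SOCIAL_SECURITY_NUMBER",
               "EMPLOYEE_SOC_SEC_NO", "ORDER_ID"]:
    print(f"{column}: {name_matches(column)}")
```

The four names from the text match, while `ORDER_ID` does not because it contains neither `SSN` nor a `SOC...SEC` sequence.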
Tips for creating column name patterns:
- Consider when to use `.?` and `.*`. Use `.?` if you want to allow zero or one character, and use `.*` to allow any number of characters. For example, you could use `SOCIAL.?SECURITY.?NUMBER` or `SOC.*SEC.*NUMBER` depending upon how strict you want the regular expression to be.
- To get an exact match of a word or a match if the word is part of a column name, use `(^|[_-])<WORD>($|[_-])`. The pattern finds an exact match and variations of `<WORD>` plus the character `_` or `-` before or after the word.
- Whenever searching for columns containing numbers, you could use keywords like `(ID|NO|NUMBERS?|NUM|NBR|#)`.
- To match singular and plural words, if applicable, use `S?`. For example, use `CODES?` to match `CODE` and `CODES`.
- To match dates, use `(DT|DATE)` and the reverse pattern. For example, you could use the following pattern to match `BIRTH_DATE` and `DATE_OF_BIRTH`: `BIRTH.?(DT|DATE)|(DT|DATE).*BIRTH`
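The effect of these tips is easier to see side by side. The sketch below exercises the strict and loose variants, the word-delimiter pattern, and the date pattern, again using Python's `re` module as an assumed stand-in for Data Discovery's matching. The helper `found` and the sample column names that do not appear in the text (`SOC_SEC_NUMBER`, `EMP_SSN`, `ASSN`, `BIRTH_PLACE`) are hypothetical.

```python
import re


def found(pattern: str, column_name: str) -> bool:
    """Case-insensitive search of a pattern within a column name."""
    return re.search(pattern, column_name, re.IGNORECASE) is not None


# .? allows at most one character between keywords; .* allows any number.
STRICT = r"SOCIAL.?SECURITY.?NUMBER"
LOOSE = r"SOC.*SEC.*NUMBER"
print(found(STRICT, "SOCIAL_SECURITY_NUMBER"))  # True
print(found(STRICT, "SOC_SEC_NUMBER"))          # False: needs full keywords
print(found(LOOSE, "SOC_SEC_NUMBER"))           # True

# (^|[_-])<WORD>($|[_-]) matches the word alone or delimited by _ or -.
WORD = r"(^|[_-])SSN($|[_-])"
print(found(WORD, "SSN"))      # True
print(found(WORD, "EMP_SSN"))  # True
print(found(WORD, "ASSN"))     # False: SSN is embedded in a longer word

# S? covers singular and plural forms.
print(found(r"CODES?", "CODE"), found(r"CODES?", "CODES"))  # True True

# A date pattern combined with its reverse order.
DATES = r"BIRTH.?(DT|DATE)|(DT|DATE).*BIRTH"
print(found(DATES, "BIRTH_DATE"))     # True
print(found(DATES, "DATE_OF_BIRTH"))  # True
print(found(DATES, "BIRTH_PLACE"))    # False
```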
Column Comment Pattern
A column comment pattern is a regular expression that is used to match column comments during data discovery. Sometimes column names are obscure and therefore, metadata is entered as a comment for a database column. Data Discovery can search these comments and potentially find more sensitive data. For example, to search for columns containing Social Security numbers, you could define the following column comment pattern:
\bSSN#?\b|SOCIAL SECURITY (ID|NUM|\bNO\b|NBR)
The regular expression checks for specific keywords in column comments. For example, it matches the column comment `Contains social security numbers of employees`.
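The comment pattern can be checked the same way. This is a minimal sketch that assumes Python's `re` module treats `\b` and the alternation as Data Discovery does; the helper `comment_matches` and every sample comment other than the one quoted above are hypothetical.

```python
import re

# Column comment pattern copied from the text above.
SSN_COMMENT_PATTERN = re.compile(
    r"\bSSN#?\b|SOCIAL SECURITY (ID|NUM|\bNO\b|NBR)",
    re.IGNORECASE,
)


def comment_matches(comment: str) -> bool:
    """Return True if the pattern is found anywhere in the comment."""
    return SSN_COMMENT_PATTERN.search(comment) is not None


samples = [
    "Contains social security numbers of employees",  # matches via NUM
    "Employee SSN",                                   # matches via \bSSN#?\b
    "Social security notification date",              # no match
    "Mailing address",                                # no match
]
for comment in samples:
    print(f"{comment!r}: {comment_matches(comment)}")
```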
Tips for creating column comment patterns:
- Avoid using `.*` in column comments to reduce false positives.
- Use `\b<word>\b` to search for a specific word. It avoids matching words that contain `<word>`. For example, the regular expression `\bNO\b` matches `social security no` but not `social security notification`. Similarly, the regular expression `\bSECT\b` does not match the word `SECTOR`, and `\bCULT\b` does not match the word `CULTURE`.
- Whenever searching for columns containing numbers, you can use keywords like `(ID|\bNO\b|NUM|NBR|#)`.
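To illustrate why the `\b` word boundaries matter, the following sketch compares a bare keyword search with a boundary-delimited one. It assumes Python's `re` semantics for `\b`; the helper `has_word` is a hypothetical name.

```python
import re


def has_word(word: str, text: str) -> bool:
    """Search for `word` as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None


# Without boundaries, NO is found inside "notification".
print(re.search("NO", "social security notification", re.IGNORECASE) is not None)  # True

# With boundaries, only the standalone word matches.
print(has_word("NO", "social security no"))            # True
print(has_word("NO", "social security notification"))  # False
print(has_word("SECT", "SECTOR"))                      # False
print(has_word("CULT", "CULTURE"))                     # False
```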
Column Data Pattern
A column data pattern is a regular expression that is used to match the actual column data during data discovery. For example, to search for columns containing Social Security numbers, you could define the following column data pattern:
^[0-9]{3}[ -]?[0-9]{2}[ -]?[0-9]{4}$
The regular expression checks for 9-digit numbers. A number can be written either as one continuous run of digits or as three parts separated by hyphens or spaces. It matches numbers like `383368610` and `383-36-8610`.
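A short sketch of the data pattern applied to sample values follows. Python's `re` module is an assumed stand-in for Data Discovery's matching, the helper `data_matches` is hypothetical, and all values other than the two quoted above are made up for illustration.

```python
import re

# Column data pattern copied from the text above.
SSN_DATA_PATTERN = re.compile(r"^[0-9]{3}[ -]?[0-9]{2}[ -]?[0-9]{4}$")


def data_matches(value: str) -> bool:
    """Return True if the whole value has the 3-2-4 digit layout."""
    return SSN_DATA_PATTERN.search(value) is not None


for value in ["383368610", "383-36-8610", "383 36 8610",
              "38336861", "383-36-86100", "AB3-36-8610"]:
    print(f"{value!r}: {data_matches(value)}")
```

The first three values match. The last three fail because they have too few digits, too many digits, or non-digit characters. Any 9-digit value passes, which is why a pattern like this is usually worth pairing with a name or comment pattern.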
Tips for creating column data patterns:
- Ensure that the data pattern is as specific as possible to avoid false positives.
- See whether it is logical to have a data pattern. If the data pattern is too broad, it can result in false positives. If it does not add any value, you could decide not to add the data pattern for a sensitive type.
- If you want to use a broad data pattern, you could use the And search operator to reduce false positives.
Search Pattern
The search pattern indicates how the column name, comment and data patterns of a sensitive type should be used to discover sensitive columns. There are two search options: AND and OR.
The AND search option requires all the provided patterns of a sensitive type to match before a column is identified as sensitive. For example, if a sensitive type has name, comment, and data patterns, they must match a column's name, comment, and data, respectively, for that column to be identified as sensitive. The following table covers the possible combinations of patterns provided for a sensitive type and the corresponding AND search behavior.
Patterns Present in a Sensitive Type | Search Behavior |
---|---|
Name, Comment, and Data | Name AND Comment AND Data |
Name and Data | Name AND Data |
Name and Comment | Name AND Comment |
Comment and Data | Comment AND Data |
Name | Name |
Comment | Comment |
Data | Data |
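The AND behavior in the table can be summarized as "every pattern that is present must match." The sketch below expresses that rule in Python. It is an illustration of the table, not Data Safe's implementation: the function `and_search` and its parameters are hypothetical, each argument stands for the already-computed match result of one pattern, and `None` means the sensitive type does not define that pattern.

```python
from typing import Optional


def and_search(name: Optional[bool] = None,
               comment: Optional[bool] = None,
               data: Optional[bool] = None) -> bool:
    """Combine per-pattern match results under the AND search option.

    Each argument is True/False for a pattern that the sensitive type
    defines, or None when that pattern is not provided.
    """
    results = [r for r in (name, comment, data) if r is not None]
    # Assumption: a sensitive type with no patterns never flags a column.
    return bool(results) and all(results)


print(and_search(name=True, comment=True, data=True))  # True
print(and_search(name=True, data=False))               # False
print(and_search(comment=True))                        # True
```

Patterns that are absent are ignored, so a sensitive type with only a name pattern reduces to a plain name match, as in the last rows of the table.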
The OR search option provides some flexibility to identify a column as sensitive even if only some of the patterns of a sensitive type match. For example, if a sensitive type has name and comment patterns, a column is identified as sensitive even if only the name pattern (or comment pattern) matches the column's name (or comment). If a sensitive type has all three patterns, the data pattern must match along with either the name pattern or the comment pattern (or both). The following table covers the possible combinations of patterns provided for a sensitive type and the corresponding OR search behavior.
Patterns Present in a Sensitive Type | Search Behavior |
---|---|
Name, Comment, and Data | Data OR (Name AND Data) OR (Comment AND Data) |
Name and Data | Data OR (Name AND Data) |
Name and Comment | Name OR Comment |
Comment and Data | Data OR (Comment AND Data) |
Name | Name |
Comment | Comment |
Data | Data |