Data Discovery Overview
Data Discovery helps you find sensitive data in your Oracle databases.
How Data Discovery Searches for Sensitive Data
Protecting sensitive data begins with knowing what sensitive data you have, and where it is located. Data Discovery's primary means of discovering sensitive data in your target databases is by using sensitive types. Data Discovery also searches for dictionary-based referential relationships to find parent-child relationships. You can also choose to have Data Discovery search for non-dictionary referential relationships (application-level relationships).
Data Discovery searches for sensitive columns in your Oracle databases using the Oracle predefined and user-defined sensitive types that you choose. You tell Data Discovery what to look for, and it finds the sensitive columns that meet your criteria.
To help you validate the discovered sensitive columns, you can choose to collect sample data from your target databases during data discovery. Please be careful when using this feature because the sample data is sensitive data. Only authorized people should be able to collect and view the sample data.
Discovery through Sensitive Types
A sensitive type defines regular expressions that help search for sensitive columns based on column names, data, and comments. Oracle Data Safe provides over 170 predefined sensitive types that you can use to search for sensitive data. The predefined sensitive types are organized into categories, making it easy to find and use relevant sensitive types. You cannot modify or delete predefined sensitive types. You can, however, create your own sensitive types and sensitive categories. Data Discovery does not discover sensitive columns that are object data types.
The top-level categories for predefined sensitive types are as follows:
- Identification Information: Includes sensitive types for national, personal, and public identifiers. Examples are US Social Security Number (SSN), Canadian Social Insurance Number (SIN) and other national IDs, Visa Number, and Full Name.
- Biographic Information: Includes sensitive types for address, family data, extended PII, and restricted processing data. Examples are Full Address, Mother's Maiden Name, Date of Birth, and Religion.
- IT Information: Includes sensitive types for user IT data and device data. Examples are User ID, password, and IP Address.
- Financial Information: Includes sensitive types for payment card data and bank account data. Examples are Card Number, Card Security PIN, and Bank Account Number.
- Healthcare Information: Includes sensitive types for health insurance data, healthcare provider data, and medical data. Examples include Health Insurance Number, Healthcare Provider, and Blood Type.
- Employment Information: Includes sensitive types for employee basic data, organization data, and compensation data. Examples are Job Title, Termination Date, Income, and Stock.
- Academic Information: Includes sensitive types for student basic data, institution data, and performance data. Examples are Financial Aid, College Name, Grade, and Disciplinary Record.
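To make the idea of a sensitive type concrete, the following is a minimal Python sketch of matching a column against regular expressions for its name, comment, and sample data. The `SensitiveType` class, the `column_matches` function, and the patterns shown are hypothetical illustrations; they are not Oracle's predefined expressions or the way Data Discovery is implemented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensitiveType:
    """Hypothetical model of a sensitive type: up to three regular expressions."""
    name: str
    column_name_pattern: Optional[str] = None
    column_comment_pattern: Optional[str] = None
    column_data_pattern: Optional[str] = None


def column_matches(stype, column_name, comment="", samples=()):
    """Return True if the column name, comment, or sampled values match the type."""
    if stype.column_name_pattern and re.search(
            stype.column_name_pattern, column_name, re.IGNORECASE):
        return True
    if stype.column_comment_pattern and comment and re.search(
            stype.column_comment_pattern, comment, re.IGNORECASE):
        return True
    if stype.column_data_pattern and samples:
        # Require every sampled value to match, to limit false positives.
        return all(re.fullmatch(stype.column_data_pattern, v) for v in samples)
    return False


# Illustrative patterns only (assumption, not Oracle's predefined expressions).
US_SSN = SensitiveType(
    name="US Social Security Number (SSN)",
    column_name_pattern=r"SSN|SOCIAL.*SEC",
    column_comment_pattern=r"social\s+security",
    column_data_pattern=r"\d{3}-\d{2}-\d{4}",
)

print(column_matches(US_SSN, "EMP_SSN"))                            # True
print(column_matches(US_SSN, "TAX_ID", samples=["123-45-6789"]))    # True
print(column_matches(US_SSN, "HIRE_DATE", samples=["2020-01-15"]))  # False
```

Checking names, comments, and data separately means a column with an uninformative name can still be caught through its contents.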
Discovery through Dictionary-Based Referential Relationships
Data Discovery also searches the Oracle data dictionary to find relationships between primary key columns and foreign key columns. It then flags those related columns as sensitive. For example, suppose that you have two tables. The first is called CUSTOMERS, and it stores information like the customer’s first name, last name, and start date. The second table is called LOCATIONS, and it stores information about all of your sales locations. The LOCATION_ID in the CUSTOMERS table is configured as a foreign key and references the primary key, which is LOCATION_ID in the LOCATIONS table. Data Discovery automatically finds this type of referential relationship. In this example, if there is a sensitive type for location, LOCATION_ID in both tables would be captured as sensitive.
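The CUSTOMERS and LOCATIONS example can be sketched in Python as follows. This is only an illustration under stated assumptions: the `ForeignKey` tuple and `propagate_sensitivity` function are hypothetical names, and the foreign key list is hard-coded here rather than read from the data dictionary. It does not describe how Data Discovery is implemented internally.

```python
from typing import NamedTuple


class ForeignKey(NamedTuple):
    """One dictionary-defined link from a child column to a parent column."""
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str


def propagate_sensitivity(sensitive, foreign_keys):
    """Extend a set of (table, column) pairs across foreign key links.

    If either end of a foreign key is sensitive, the other end is flagged
    too. The loop repeats until no new columns are added, so chains of
    relationships are followed as well.
    """
    result = set(sensitive)
    changed = True
    while changed:
        changed = False
        for fk in foreign_keys:
            child = (fk.child_table, fk.child_column)
            parent = (fk.parent_table, fk.parent_column)
            if (child in result) != (parent in result):
                result.update((child, parent))
                changed = True
    return result


# Hard-coded stand-in for relationships found in the data dictionary.
fks = [ForeignKey("CUSTOMERS", "LOCATION_ID", "LOCATIONS", "LOCATION_ID")]

# Suppose a sensitive type for location matched only the parent column.
flagged = propagate_sensitivity({("LOCATIONS", "LOCATION_ID")}, fks)
print(sorted(flagged))
# [('CUSTOMERS', 'LOCATION_ID'), ('LOCATIONS', 'LOCATION_ID')]
```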
Discovery through Non-Dictionary Referential Relationships
In Oracle Data Safe, you have the option to also use non-dictionary referential relationships to find sensitive columns. These are relationships between database columns that are defined in applications, but not in the Oracle data dictionary. Data Discovery uses column name patterns and column data patterns from your selected sensitive types to discover potential relationships between columns.
For example, suppose that a parent table is called CUSTOMER and a related table is called PAYMENT_METHOD. The sensitive column is CUST_NAME in the parent table and CUST_NM in the related table. If the related table was created without showing a link in the data dictionary to the parent table (that is, no foreign key information was entered into the data dictionary), the relationship between the parent and related table is a “non-dictionary referential relationship.”
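As a rough illustration of how a column name pattern could surface such a relationship, the Python sketch below pairs columns from different tables that match the same pattern and are not already linked by a declared foreign key. The function name `candidate_relationships` and the pattern are assumptions made for this example; the sketch covers name patterns only (not data patterns) and does not decide which table is the parent.

```python
import re
from itertools import combinations


def candidate_relationships(columns, name_pattern, declared_fks=frozenset()):
    """Return pairs of (table, column) entries that may be related.

    columns      -- iterable of (table, column) tuples
    name_pattern -- regular expression applied to column names
    declared_fks -- pairs already linked in the data dictionary, to skip
    """
    regex = re.compile(name_pattern, re.IGNORECASE)
    hits = [col for col in columns if regex.search(col[1])]
    pairs = []
    for a, b in combinations(hits, 2):
        if a[0] == b[0]:
            continue  # same table: not a cross-table relationship
        if (a, b) in declared_fks or (b, a) in declared_fks:
            continue  # already a dictionary-based relationship
        pairs.append((a, b))
    return pairs


columns = [
    ("CUSTOMER", "CUST_NAME"),
    ("CUSTOMER", "START_DATE"),
    ("PAYMENT_METHOD", "CUST_NM"),
    ("PAYMENT_METHOD", "CARD_TYPE"),
]

# Hypothetical name pattern for a "customer name" sensitive type.
print(candidate_relationships(columns, r"CUST.*(NAME|NM)"))
# [(('CUSTOMER', 'CUST_NAME'), ('PAYMENT_METHOD', 'CUST_NM'))]
```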
Sensitive Data Models
Data Discovery saves the discovery results as a sensitive data model to a specified compartment in Oracle Cloud Infrastructure. You can find sensitive data models to which you have access on the Sensitive Data Models page in Oracle Data Safe. The results consist of sensitive columns and referential relationships. When changes occur on a target database, you can perform incremental updates to a sensitive data model, add and remove sensitive columns from the sensitive data model, and manage the referential relationships between the sensitive columns. You can download a sensitive data model, modify it offline, and then upload it into the same or other Oracle Data Safe regions. A sensitive data model is associated with one target database at a time, although you can change that target database if needed.
You can create an empty sensitive data model directly, allowing for a tailored approach to tracking and masking sensitive objects. Instead of running data discovery and removing unwanted columns, you can create a new sensitive data model with no predefined columns and subsequently add only columns of interest.
To help you understand your sensitive data and for record keeping, Data Discovery provides downloadable reports for sensitive data models and incremental discoveries. Both types of reports provide totals of sensitive tables, columns, and values, as well as details about the sensitive columns. The sensitive columns are categorized based on their sensitive types.
You can optionally store metadata in a sensitive data model, including sample data and estimated row counts. This information gives you a perspective on the quantity of the different types of sensitive data in your target databases.
You can use a sensitive data model to implement other security controls, such as data masking. For example, you can define a masking policy using a sensitive data model and use it to mask the sensitive data on target databases. You can reuse a sensitive data model for multiple masking policies.
Data Discovery Dashboard in Oracle Cloud Infrastructure
The Data Discovery dashboard provides a high-level view of your sensitive data across the target databases in your selected compartment(s). You can explore key features and workflows with the guided tour option by clicking the "Take the tour" button in the Data Discovery dashboard.
Common sensitive types tab
The Common sensitive types tab on the Data Discovery dashboard provides you with an overview of how frequently the 21 common sensitive types are used across your target database fleet. The 21 common sensitive types have been identified by Oracle as the sensitive types that are most likely to be present within a database.
The Common sensitive types chart helps you to identify which sensitive types are most common within your target databases, by showing you a percentage breakdown of the 21 common sensitive types across your target database fleet.
The Discovery run summary tables help you identify whether Data Discovery is being well utilized across your target database fleet by showing how many databases have and have not had a sensitive data model created.
Figure 6-1 Data discovery common sensitive types tab
Target databases tab
The charts at the top of the dashboard focus on your top five target databases. The Top 5 sensitive types (by sensitive columns) chart helps you to identify the five sensitive types that are most common within your target databases and how many columns have these sensitive types. The Sensitive columns chart helps you to identify which target databases have the most sensitive columns, by showing you a percentage breakdown of sensitive columns across the top five targets. The Sensitive values chart helps you to identify which target databases contain the most sensitive values by showing you a percentage breakdown of sensitive values across the five targets.
Figure 6-2 Data discovery target databases tab
The charts are followed by the sensitive data summary for the target databases in the selected compartment(s). The summary lets you compare statistics across the target databases, including the number of sensitive data models created for each target database and the number of sensitive types, sensitive schemas, sensitive tables, sensitive columns, and sensitive values on each target database.
From the sensitive data summary, click on the Target databases tab, then click on a target database name to view the Sensitive Data Models table, which lists sensitive data models associated with the selected target database. For each sensitive data model, this table shows you the target name, and the quantity of each of the following within the model: sensitive types, sensitive schemas, sensitive tables, sensitive columns, and sensitive values.
You can click on a sensitive data model name to go deeper and view a graph that shows the percentage and distribution of sensitive types within the sensitive data model. This page also provides a Sensitive Columns table that lists each sensitive type, its data type, and row count, as well as the schema, table, and column where the type is stored.
Notifications tab
The Notifications tab shows you what event notifications and subscriptions you have created for Data Discovery. More specifically, it displays the event, rule name, topic name, and when the event notification was created. This table shows only events that you have created directly within Data Safe. In addition to displaying existing event notifications, you can also create new notifications by using the Create notification button. See Create and Modify Event Notifications in Data Discovery for more information.
Data Discovery Workflow
Before you create a sensitive data model, you need to do the following:
- Obtain the appropriate permissions in Oracle Cloud Infrastructure Identity and Access Management, and then register your target database.
- (Optional) If the schema level statistics are not up-to-date, then gather schema statistics on your target database to ensure accurate results. To do this, run the dbms_stats.gather_schema_stats procedure. It is recommended that you run this procedure only when needed, because it is a resource-intensive operation. See GATHER_SCHEMA_STATS Procedures for information about the parameters that you can include. The following example gathers statistics on the HCM1 schema:
  exec dbms_stats.gather_schema_stats(ownname => 'HCM1');
- Provide sensitive data model information and select a target database.
- (Optional) If the schemas on the target database have been updated since the stated time and date, click Refresh Database Schemas.
- Select the schemas in which you want to find sensitive data. You can also select all schemas. Only non-Oracle maintained schemas are displayed and are selectable.
- Select the sensitive types to search for on your target database. You can also select all sensitive types.
- Select optional discovery options, including whether to retrieve sample data and to search for application-level referential relationships.
After your sensitive data model is initially populated with sensitive columns, your next step is to do the following:
- Review the resulting sensitive columns.
- Modify the sensitive data model, as needed, so that it accurately reflects the sensitive data in the target database.
- Set up event notifications.
  For example, you can subscribe to the Sensitive Data Model Create Begin event to be automatically informed if a sensitive data model is created.
Over time, you may want to do these tasks:
- Use the sensitive data model with other target databases. To do this, you can download and upload the sensitive data model into a different Oracle Data Safe region. You can also associate a sensitive data model with a different target database.
- Move your sensitive data model to a different compartment.
- Delete your sensitive data model.
- Create a sensitive data model manually, allowing for a tailored approach to tracking and masking sensitive objects.
  - Instead of running data discovery and removing unwanted columns, you can create a new sensitive data model with no predefined columns and subsequently add only columns of interest.
  - To do this, click the Create sensitive data model manually button on the Dashboard or select the Create sensitive data model manually check box within the Discover sensitive data panel.
  - After the new sensitive data model is created, click Add columns to manually add columns of interest to the sensitive data model.
Prerequisites for Using Data Discovery
These are the prerequisites for using Data Discovery:
- Register the target databases that you want to use with Data Discovery.
  - If a target database is already registered with Oracle Data Safe by someone else, you need to obtain READ permission on the target database resource in Oracle Cloud Infrastructure Identity and Access Management (IAM) to run discovery jobs.
- Grant the Data Discovery role on the target database. A Database Administrator can grant this role to the Oracle Data Safe Service Account on the target database.
- Obtain permission in IAM to use the Data Discovery feature in Oracle Data Safe. A tenancy administrator can grant these permissions. These resources require permissions:
  - data-safe-discovery-jobs: Requires manage permission in order to run discovery jobs.
  - data-safe-sensitive-data-models: Requires manage for running discovery jobs and for modifying sensitive data models.
  - data-safe-sensitive-types: Requires manage for creating sensitive types.
  - data-safe-work-requests: Requires read permission to view work requests.
  As an alternative to selectively granting permissions, you can grant permissions on data-safe-discovery-family in the relevant compartments, which would include permissions on all of the resources above. See data-safe-discovery-family Resource in the Administering Oracle Data Safe guide for more information.
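The permissions listed above could be expressed as IAM policy statements along the following lines. This Python sketch only assembles statement text using the general Oracle Cloud Infrastructure policy form (Allow group ... to ... in compartment ...). The group name `DiscoveryUsers`, the compartment name `Sales`, and the choice of the manage verb for the family resource are assumptions for illustration; consult the Administering Oracle Data Safe guide for the statements your tenancy actually needs.

```python
# Resource types and verbs taken from the prerequisites list above.
REQUIRED_PERMISSIONS = {
    "data-safe-discovery-jobs": "manage",
    "data-safe-sensitive-data-models": "manage",
    "data-safe-sensitive-types": "manage",
    "data-safe-work-requests": "read",
}


def policy_statements(group, compartment, use_family=False):
    """Build policy statement strings for a group in one compartment.

    With use_family=True, a single statement on data-safe-discovery-family
    replaces the per-resource statements (the manage verb is an assumption).
    """
    if use_family:
        return [
            f"Allow group {group} to manage data-safe-discovery-family "
            f"in compartment {compartment}"
        ]
    return [
        f"Allow group {group} to {verb} {resource} in compartment {compartment}"
        for resource, verb in REQUIRED_PERMISSIONS.items()
    ]


# Hypothetical group and compartment names.
for statement in policy_statements("DiscoveryUsers", "Sales"):
    print(statement)
```

Granting per-resource permissions keeps access narrow, while the family resource is more convenient when a group needs the whole Data Discovery feature.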
Note:
Because Data Discovery has moved from the Oracle Data Safe Console to Security Center in Oracle Cloud Infrastructure, an administrator must migrate existing Data Discovery privileges to IAM. After this migration is completed, additional user groups can be granted privileges in IAM to use the Data Discovery feature.
See Also:
The Administering Oracle Data Safe guide provides these sections to help with establishing the prerequisites:
- Migrate to Oracle Cloud Infrastructure. You can follow the one-time migration procedure described in the guide or you can do the migration manually.
- Grant Roles to the Oracle Data Safe Service Account on Your Target Database describes the roles required for Data Discovery and for other Oracle Data Safe features.
- Create IAM Policies for Oracle Data Safe describes the privileges required for each feature in Oracle Data Safe.
Unsupported Data Types, Objects, and Database Features for Data Discovery
Data Discovery does not support the following data types, objects, and database features:
- LONG
- RAW
- BFILE
- BLOB
- JSON
- XMLTYPE
- HTTPURITYPE
- XDBURITYPE
- DBURITYPE
- ADT
- External tables
- Temporary tables
- Views
- Indexes
- Nested tables
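If you keep your own inventory of columns, a simple pre-check against this list can show which ones fall outside the scope of Data Discovery. The Python sketch below is an assumption-laden illustration: the function `is_discoverable`, the way object kinds are spelled, and the `is_adt` flag (used because abstract data types cannot be recognized by a fixed type name) are all hypothetical.

```python
# Data types from the unsupported list above.
UNSUPPORTED_DATA_TYPES = {
    "LONG", "RAW", "BFILE", "BLOB", "JSON",
    "XMLTYPE", "HTTPURITYPE", "XDBURITYPE", "DBURITYPE",
}

# Objects from the unsupported list above (spelling is an assumption).
UNSUPPORTED_OBJECT_KINDS = {
    "EXTERNAL TABLE", "TEMPORARY TABLE", "VIEW", "INDEX", "NESTED TABLE",
}


def is_discoverable(object_kind, data_type, is_adt=False):
    """Return True if a column is not excluded by the unsupported list.

    object_kind -- kind of object that holds the column, e.g. "TABLE"
    data_type   -- declared data type of the column, e.g. "VARCHAR2"
    is_adt      -- True when the column uses an abstract data type (ADT)
    """
    if is_adt:
        return False
    if object_kind.upper() in UNSUPPORTED_OBJECT_KINDS:
        return False
    return data_type.upper() not in UNSUPPORTED_DATA_TYPES


print(is_discoverable("TABLE", "VARCHAR2"))  # True
print(is_discoverable("TABLE", "BLOB"))      # False
print(is_discoverable("VIEW", "VARCHAR2"))   # False
```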