Data Discovery Overview

Data Discovery helps you find sensitive data in your Oracle databases.

How Data Discovery Searches for Sensitive Data

Protecting sensitive data begins with knowing what sensitive data you have and where it is located. Data Discovery finds sensitive data in your target databases primarily by using sensitive types. It also searches for dictionary-based referential relationships to find parent-child relationships. Optionally, you can have Data Discovery search for non-dictionary referential relationships (application-level relationships).

Data Discovery searches for sensitive columns in your Oracle databases using the Oracle predefined and user-defined sensitive types that you choose. You tell Data Discovery what to look for, and it finds the sensitive columns that meet your criteria.

To help you validate the discovered sensitive columns, you can choose to collect sample data from your target databases during data discovery. Please be careful when using this feature because the sample data is sensitive data. Only authorized people should be able to collect and view the sample data.

Discovery through Sensitive Types

A sensitive type defines regular expressions that help search for sensitive columns based on column names, data, and comments. Oracle Data Safe provides over 170 predefined sensitive types that you can use to search for sensitive data. The predefined sensitive types are organized into categories, making it easy to find and use relevant sensitive types. You cannot modify or delete predefined sensitive types. You can, however, create your own sensitive types and sensitive categories. Data Discovery does not discover sensitive columns that are object data types.

The top-level categories for predefined sensitive types are as follows:

  • Identification Information: Includes sensitive types for national, personal, and public identifiers. Examples are US Social Security Number (SSN), Canadian Social Insurance Number (SIN) and other national IDs, Visa Number, and Full Name.
  • Biographic Information: Includes sensitive types for address, family data, extended PII, and restricted processing data. Examples are Full Address, Mother's Maiden Name, Date of Birth, and Religion.
  • IT Information: Includes sensitive types for user IT data and device data. Examples are User ID, Password, and IP Address.
  • Financial Information: Includes sensitive types for payment card data and bank account data. Examples are Card Number, Card Security PIN, and Bank Account Number.
  • Healthcare Information: Includes sensitive types for health insurance data, healthcare provider data, and medical data. Examples are Health Insurance Number, Healthcare Provider, and Blood Type.
  • Employment Information: Includes sensitive types for employee basic data, organization data, and compensation data. Examples are Job Title, Termination Date, Income, and Stock.
  • Academic Information: Includes sensitive types for student basic data, institution data, and performance data. Examples are Financial Aid, College Name, Grade, and Disciplinary Record.
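
As a rough illustration of how a sensitive type's regular expressions can be applied to column names, comments, and data, the following Python sketch models one type with three patterns. The SensitiveType class, its field names, and the patterns themselves are hypothetical and are not Oracle's predefined expressions. Treating a match on any one of the three as sufficient is also an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class SensitiveType:
    """Hypothetical stand-in for a sensitive type: one regex per search target."""
    name: str
    name_pattern: str     # matched against the column name
    comment_pattern: str  # matched against the column comment
    data_pattern: str     # matched against sampled column values

    def matches(self, column_name, comment="", sample_values=()):
        # Assumption: a column is a candidate if its name, its comment,
        # or any sampled value matches the corresponding pattern.
        if re.search(self.name_pattern, column_name, re.IGNORECASE):
            return True
        if comment and re.search(self.comment_pattern, comment, re.IGNORECASE):
            return True
        return any(re.fullmatch(self.data_pattern, v) for v in sample_values)


# Illustrative patterns only; not the expressions shipped with Oracle Data Safe.
ssn = SensitiveType(
    name="US Social Security Number (SSN)",
    name_pattern=r"(^|_)(SSN|SOCIAL_SEC)",
    comment_pattern=r"social security",
    data_pattern=r"\d{3}-\d{2}-\d{4}",
)

by_name = ssn.matches("EMP_SSN")
by_comment = ssn.matches("TAX_ID", comment="Employee social security number")
by_data = ssn.matches("COL_17", sample_values=["123-45-6789"])
no_match = ssn.matches("ORDER_TOTAL", comment="Order amount", sample_values=["19.99"])
print(by_name, by_comment, by_data, no_match)
```

In this sketch, the first three columns are flagged through their name, comment, and data respectively, and the last column is not flagged.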

Discovery through Dictionary-Based Referential Relationships

Data Discovery also searches the Oracle data dictionary to find relationships between primary key columns and foreign key columns. It then flags those related columns as sensitive. For example, suppose that you have two tables. The first is called CUSTOMERS, and it stores information like the customer’s first name, last name, and start date. The second table is called LOCATIONS, and it stores information about all of your sales locations. The LOCATION_ID in the CUSTOMERS table is configured as a foreign key and references the primary key, which is LOCATION_ID in the LOCATIONS table. Data Discovery automatically finds this type of referential relationship. In this example, if there is a sensitive type for location, LOCATION_ID in both tables would be captured as sensitive.
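
The idea can be sketched in Python with a small in-memory model of the data dictionary. The ForeignKey type and the propagate_sensitivity function are hypothetical names introduced for illustration. Flagging in both directions across a relationship is a simplifying assumption, not a description of how Data Discovery is implemented.

```python
from typing import NamedTuple


class ForeignKey(NamedTuple):
    """One dictionary-defined relationship between two (table, column) pairs."""
    child: tuple   # column holding the foreign key
    parent: tuple  # primary key column it references


def propagate_sensitivity(sensitive, foreign_keys):
    """Return the sensitive set extended with every column linked to it by a foreign key."""
    result = set(sensitive)
    changed = True
    while changed:  # repeat so that chains of relationships are followed
        changed = False
        for fk in foreign_keys:
            # Assumption: if exactly one side is sensitive, flag the other side too.
            if (fk.child in result) != (fk.parent in result):
                result.update((fk.child, fk.parent))
                changed = True
    return result


# The CUSTOMERS / LOCATIONS example from the text.
fks = [ForeignKey(child=("CUSTOMERS", "LOCATION_ID"),
                  parent=("LOCATIONS", "LOCATION_ID"))]
found_by_type = {("LOCATIONS", "LOCATION_ID")}  # matched by a sensitive type
flagged = propagate_sensitivity(found_by_type, fks)
print(sorted(flagged))
```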

Discovery through Non-Dictionary Referential Relationships

In Oracle Data Safe, you have the option to also use non-dictionary referential relationships to find sensitive columns. These are relationships between database columns that are defined in applications, but not in the Oracle data dictionary. Data Discovery uses column name patterns and column data patterns from your selected sensitive types to discover potential relationships between columns.

For example, suppose that a parent table is called CUSTOMER and a related table is called PAYMENT_METHOD. The sensitive column is CUST_NAME in the parent table and CUST_NM in the related table. If the related table was created without showing a link in the data dictionary to the parent table (that is, no foreign key information was entered into the data dictionary), the relationship between the parent and related table is a “non-dictionary referential relationship.”
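
A minimal Python sketch of this kind of matching follows. It pairs columns in different tables whose names match the same pattern and that have no foreign key in the data dictionary. The CUSTOMER_NAME pattern and the candidate_relationships function are hypothetical. The sketch uses column name patterns only, whereas Data Discovery also uses column data patterns.

```python
import re
from itertools import combinations

# Hypothetical column-name pattern for a "customer name" sensitive type.
CUSTOMER_NAME = re.compile(r"^CUST(OMER)?_(NAME|NM)$", re.IGNORECASE)


def candidate_relationships(columns, dictionary_fks, pattern):
    """Pair columns in different tables that match the same pattern
    but have no foreign key recorded in the data dictionary."""
    matching = [c for c in columns if pattern.match(c[1])]
    candidates = []
    for a, b in combinations(matching, 2):
        if a[0] == b[0]:
            continue  # same table: not a cross-table relationship
        if (a, b) in dictionary_fks or (b, a) in dictionary_fks:
            continue  # already known through the data dictionary
        candidates.append((a, b))
    return candidates


# (table, column) pairs from the CUSTOMER / PAYMENT_METHOD example.
# START_DATE and CARD_TYPE are made-up filler columns.
columns = [
    ("CUSTOMER", "CUST_NAME"),
    ("CUSTOMER", "START_DATE"),
    ("PAYMENT_METHOD", "CUST_NM"),
    ("PAYMENT_METHOD", "CARD_TYPE"),
]
pairs = candidate_relationships(columns, dictionary_fks=set(), pattern=CUSTOMER_NAME)
print(pairs)
```

The result is a list of candidate relationships to review, not confirmed relationships.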

Sensitive Data Models

Data Discovery saves the discovery results as a sensitive data model to a specified compartment in Oracle Cloud Infrastructure. You can find sensitive data models to which you have access on the Sensitive Data Models page in Oracle Data Safe. The results consist of sensitive columns and referential relationships. When changes occur on a target database, you can perform incremental updates to a sensitive data model, add and remove sensitive columns from the sensitive data model, and manage the referential relationships between the sensitive columns. You can download a sensitive data model, modify it offline, and then upload it into the same or other Oracle Data Safe regions. A sensitive data model is associated with one target database at a time, although you can change that target database if needed.

You can create an empty sensitive data model directly, allowing for a tailored approach to tracking and masking sensitive objects. Instead of running data discovery and removing unwanted columns, you can create a new sensitive data model with no predefined columns and subsequently add only columns of interest.

To help you understand your sensitive data and for record keeping, Data Discovery provides downloadable reports for sensitive data models and incremental discoveries. Both types of reports provide totals of sensitive tables, columns, and values, as well as details about the sensitive columns. The sensitive columns are categorized based on their sensitive types.

You can optionally store metadata in a sensitive data model, including sample data and estimated row counts. This information gives you a perspective on the quantity of the different types of sensitive data in your target databases.
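
To make the report totals concrete, the following Python sketch computes them from a handful of rows. The row layout, the table and column names, and the summarize function are hypothetical. Deriving the sensitive value total by summing estimated row counts is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (schema, table, column, sensitive_type, estimated_row_count)
sensitive_columns = [
    ("HCM1", "EMPLOYEES", "SSN", "US Social Security Number (SSN)", 1000),
    ("HCM1", "EMPLOYEES", "FULL_NAME", "Full Name", 1000),
    ("HCM1", "DEPENDENTS", "FULL_NAME", "Full Name", 250),
]


def summarize(rows):
    """Compute report-style totals and group the columns by sensitive type."""
    tables = {(schema, table) for schema, table, _, _, _ in rows}
    by_type = defaultdict(list)
    for schema, table, column, stype, _ in rows:
        by_type[stype].append(f"{schema}.{table}.{column}")
    return {
        "sensitive_tables": len(tables),
        "sensitive_columns": len(rows),
        # Assumption: one sensitive value per estimated row in each column.
        "sensitive_values": sum(row[4] for row in rows),
        "columns_by_type": dict(by_type),
    }


summary = summarize(sensitive_columns)
print(summary)
```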

You can use a sensitive data model to implement other security controls, such as data masking. For example, you can define a masking policy using a sensitive data model and use it to mask the sensitive data on target databases. You can reuse a sensitive data model for multiple masking policies.

Data Discovery Dashboard in Oracle Cloud Infrastructure

The Data Discovery dashboard provides a high-level view of your sensitive data across the target databases in your selected compartment(s). You can explore key features and workflows with the guided tour option by clicking the "Take the tour" button in the Data Discovery dashboard.

Common sensitive types tab

The Common sensitive types tab on the Data Discovery dashboard provides you with an overview of how frequently the 21 common sensitive types are used across your target database fleet. The 21 common sensitive types have been identified by Oracle as the sensitive types that are most likely to be present within a database.

The Common sensitive types chart helps you to identify which sensitive types are most common within your target databases, by showing you a percentage breakdown of the 21 common sensitive types across your target database fleet.

The Discovery run summary table helps you identify whether Data Discovery is being well utilized across your target database fleet, by showing you how many databases have and have not had a sensitive data model created.


An example of the Common sensitive types tab in Data Discovery. A pie chart shows how many of the sensitive columns fall into each of the 21 common sensitive types. The discovery run summary shows that, of the four target databases in the compartment, two have had discovery run and two have not. The table at the bottom lists each of the 21 common sensitive types, how many target databases contain the sensitive type, and how many sensitive columns there are across your target databases for each sensitive type.

Target databases tab

The charts at the top of the dashboard focus on your top five target databases. The Top 5 sensitive types (by sensitive columns) chart helps you to identify the five sensitive types that are most common within your target databases and how many columns have these sensitive types. The Sensitive columns chart helps you to identify which target databases have the most sensitive columns, by showing you a percentage breakdown of sensitive columns across the top five targets. The Sensitive values chart helps you to identify which target databases contain the most sensitive values, by showing you a percentage breakdown of sensitive values across the top five targets.

Illustration: data-discovery-dashboard.png

Below the charts is a sensitive data summary for the target databases in the selected compartment. The summary lets you compare statistics across the target databases, including the number of sensitive data models created for each target database and the number of sensitive types, sensitive schemas, sensitive tables, sensitive columns, and sensitive values on each target database.

From the sensitive data summary, you can click on a target database name to view the Sensitive Data Models table, which lists sensitive data models associated with the selected target database. For each sensitive data model, this table shows you the target name and the quantity of each of the following within the model: sensitive types, sensitive schemas, sensitive tables, sensitive columns, and sensitive values.

You can click on a sensitive data model name to go deeper and view a graph that shows the percentage and distribution of sensitive types within the sensitive data model. This page also provides a Sensitive Columns table that lists each sensitive type, its data type, and row count, as well as the schema, table, and column where the type is stored.

Data Discovery Workflow

Before you create a sensitive data model, you need to do the following:

  1. Obtain the appropriate permissions in Oracle Cloud Infrastructure Identity and Access Management, and then register your target database.
  2. (Optional) If the schema-level statistics are not up to date, gather schema statistics on your target database to ensure accurate results. To do this, run the dbms_stats.gather_schema_stats procedure. Run this procedure only when needed, because it is a resource-intensive operation. See GATHER_SCHEMA_STATS Procedures for information about the parameters that you can include. The following example gathers statistics on the HCM1 schema:
    exec dbms_stats.gather_schema_stats(ownname => 'HCM1');

Now you are ready to create a sensitive data model. In Data Discovery, follow these general steps to create one:

  1. Provide sensitive data model information and select a target database.
  2. (Optional) If the schemas on the target database have been updated since the stated time and date, click Refresh Database Schemas.
  3. Select the schemas in which you want to find sensitive data. You can also select all schemas. Only schemas that are not Oracle maintained are displayed and selectable.
  4. Select the sensitive types to search for on your target database. You can also select all sensitive types.
  5. (Optional) Select discovery options, including whether to retrieve sample data and whether to search for application-level referential relationships.

After your sensitive data model is initially populated with sensitive columns, your next steps are as follows:

  1. Review the resulting sensitive columns.
  2. Modify the sensitive data model, as needed, so that it accurately reflects the sensitive data in the target database.
  3. Set up event notifications. For example, you can subscribe to the Sensitive Data Model Create Begin event to be automatically informed if a sensitive data model is created.

Over time, you may want to do these tasks:

  1. Use the sensitive data model with other target databases. To do this, you can download and upload the sensitive data model into a different Oracle Data Safe region. You can also associate a sensitive data model with a different target database.
  2. Move your sensitive data model to a different compartment.
  3. Delete your sensitive data model.
  4. Create a sensitive data model manually, allowing for a tailored approach to tracking and masking sensitive objects.
    • Instead of running data discovery and removing unwanted columns, you can create a new sensitive data model with no predefined columns and subsequently add only columns of interest.
    • To do this, click the Create empty sensitive data model button on the Dashboard or select the Create empty sensitive data model check box within the Discover sensitive data panel.
    • After the new sensitive data model is created, click Add columns to manually add columns of interest to the sensitive data model.

Prerequisites for Using Data Discovery

These are the prerequisites for using Data Discovery:

  • Register the target databases that you want to use with Data Discovery.
    • If a target database is already registered with Oracle Data Safe by someone else, you need to obtain READ permission on the target database resource in Oracle Cloud Infrastructure Identity and Access Management (IAM) to run discovery jobs.
  • Grant the Data Discovery role on the target database. A Database Administrator can grant this role to the Oracle Data Safe Service Account on the target database.
  • Obtain permission in IAM to use the Data Discovery feature in Oracle Data Safe. A tenancy administrator can grant these permissions. These resources require permissions:
    • data-safe-discovery-jobs

      Requires manage permission in order to run discovery jobs.

    • data-safe-sensitive-data-models

      Requires manage for running discovery jobs and for modifying sensitive data models.

    • data-safe-sensitive-types

      Requires manage for creating sensitive types.

    • data-safe-work-requests

      Requires read permission to view work requests.

    As an alternative to selectively granting permissions, you can grant permissions on data-safe-discovery-family in the relevant compartments, which would include permissions on all of the resources above. See data-safe-discovery-family Resource in the Administering Oracle Data Safe guide for more information.
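
As an illustration, the permissions listed above can be written as IAM policy statements in the general form "Allow group <group> to <verb> <resource-type> in compartment <compartment>". The following Python sketch builds those statements. The group name DiscoveryAdmins and the compartment name Sales are hypothetical, and using the manage verb with data-safe-discovery-family is an assumption. Confirm the exact verbs against the Administering Oracle Data Safe guide before creating a policy.

```python
# Resource types and verbs as listed in the prerequisites above.
REQUIRED = {
    "data-safe-discovery-jobs": "manage",
    "data-safe-sensitive-data-models": "manage",
    "data-safe-sensitive-types": "manage",
    "data-safe-work-requests": "read",
}


def policy_statements(group, compartment, use_family=False):
    """Build IAM policy statements for one group in one compartment."""
    if use_family:
        # Assumption: "manage" on the family covers all of the resources above.
        grants = {"data-safe-discovery-family": "manage"}
    else:
        grants = REQUIRED
    return [
        f"Allow group {group} to {verb} {resource} in compartment {compartment}"
        for resource, verb in grants.items()
    ]


# Hypothetical group and compartment names.
selective = policy_statements("DiscoveryAdmins", "Sales")
family = policy_statements("DiscoveryAdmins", "Sales", use_family=True)
for statement in selective + family:
    print(statement)
```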

Note:

Because Data Discovery has moved from the Oracle Data Safe Console to Security Center in Oracle Cloud Infrastructure, an administrator must migrate existing Data Discovery privileges to IAM. After this migration is completed, additional user groups can be granted privileges in IAM to use the Data Discovery feature.

See Also:

The Administering Oracle Data Safe guide provides these sections to help with establishing the prerequisites: