Data Discovery Overview

Data Discovery helps you find sensitive data in your Oracle databases.

About Data Discovery

Protecting sensitive data begins with knowing what sensitive data you have, and where it is located. Data Discovery inspects the metadata and actual data in your Oracle databases to discover sensitive data and provides comprehensive results listing the sensitive columns and related information.

Data Discovery uses sensitive types that define the kinds of data to look for. Oracle Data Safe provides over 125 predefined sensitive types that you can use to search for sensitive data. The sensitive types cover personal data pertaining to identification, biographic, IT, financial, healthcare, employment, and academic information. The predefined sensitive types are organized under categories, making it easy to find and use relevant sensitive types. You can also create your own sensitive types. You tell Data Discovery what to look for, and it finds the sensitive columns that meet your criteria.
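To make the idea of a sensitive type concrete, the following minimal Python sketch models a sensitive type as a pair of regular expressions applied to column names and column comments. It is an illustration only: the SensitiveType class, the patterns, the category labels, and the sample columns are all assumptions made for this example, not the definitions that Oracle Data Safe actually uses.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitiveType:
    """Hypothetical stand-in for a sensitive type (not Oracle Data Safe's definition)."""
    name: str
    category: str
    column_name_pattern: str     # regex applied to the column name
    column_comment_pattern: str  # regex applied to the column comment

    def matches(self, column_name: str, comment: str = "") -> bool:
        # A column is a candidate if its name matches, or if its comment matches.
        if re.search(self.column_name_pattern, column_name, re.IGNORECASE):
            return True
        return bool(comment) and bool(
            re.search(self.column_comment_pattern, comment, re.IGNORECASE)
        )


# Illustrative patterns only.
EMAIL = SensitiveType("Email Address", "Identification", r"E?MAIL", r"e-?mail")
INCOME = SensitiveType("Income", "Employment", r"SALARY|INCOME", r"salary|income")

# (schema, table, column, comment) rows, as they might come from the data dictionary.
columns = [
    ("HCM1", "EMPLOYEES", "EMAIL", ""),
    ("HCM1", "EMPLOYEES", "SAL_AMT", "Monthly salary in USD"),
    ("HCM1", "EMPLOYEES", "HIRE_DATE", ""),
]


def discover(columns, sensitive_types):
    """Return (schema, table, column, sensitive type name) for every match."""
    hits = []
    for schema, table, column, comment in columns:
        for sensitive_type in sensitive_types:
            if sensitive_type.matches(column, comment):
                hits.append((schema, table, column, sensitive_type.name))
    return hits


results = discover(columns, [EMAIL, INCOME])
```

In this sketch, EMAIL is found through its column name and SAL_AMT through its comment, while HIRE_DATE matches neither sensitive type. A real discovery job can also examine the data itself, which this sketch does not attempt.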

You can optionally collect sample data from your target databases. Sample data can help you validate the discovered sensitive columns. Use this feature with care, however, because the samples themselves are sensitive data. Only authorized people should be able to collect and see the sample data.

Data Discovery saves the discovery results as a sensitive data model. A sensitive data model consists of discovered sensitive columns and referential relationships. You can perform incremental updates to a sensitive data model and manually add and remove columns from a sensitive data model.
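As a rough mental model, a sensitive data model can be pictured as a set of sensitive columns plus a set of referential relationships between them. The Python sketch below is an assumption-laden illustration: the class names, fields, and the behavior of incremental_update are hypothetical and do not describe how Oracle Data Safe stores or updates an SDM internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SensitiveColumn:
    schema: str
    table: str
    column: str
    sensitive_type: str


@dataclass
class SensitiveDataModel:
    """Hypothetical in-memory picture of a sensitive data model."""
    name: str
    columns: set = field(default_factory=set)
    # (child column, parent column) pairs
    relationships: set = field(default_factory=set)

    def add_column(self, col: SensitiveColumn) -> None:
        self.columns.add(col)

    def remove_column(self, col: SensitiveColumn) -> None:
        self.columns.discard(col)
        # Drop relationships that refer to the removed column.
        self.relationships = {
            (child, parent)
            for child, parent in self.relationships
            if child != col and parent != col
        }

    def incremental_update(self, discovered) -> set:
        """Add newly discovered columns, keep existing ones, return what was added."""
        new_columns = set(discovered) - self.columns
        self.columns |= new_columns
        return new_columns


emp_id = SensitiveColumn("HCM1", "EMPLOYEES", "EMPLOYEE_ID", "Employee ID")
hist_id = SensitiveColumn("HCM1", "JOB_HISTORY", "EMPLOYEE_ID", "Employee ID")
email = SensitiveColumn("HCM1", "EMPLOYEES", "EMAIL", "Email Address")

sdm = SensitiveDataModel("HCM1_SDM")
sdm.add_column(emp_id)
sdm.add_column(hist_id)
sdm.relationships.add((hist_id, emp_id))

added = sdm.incremental_update([emp_id, email])  # only EMAIL is new
sdm.remove_column(hist_id)                       # also removes its relationship
```

Keeping the columns in a set makes the incremental update a simple set difference, so rerunning discovery does not duplicate columns that the model already contains.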

You can use a sensitive data model to implement other security controls, such as data masking. For example, you can define a masking policy using a sensitive data model and use it to mask the sensitive data on target databases.

Sensitive data models are stored in the Oracle Data Safe Library, enabling you to reuse a sensitive data model for multiple masking policies. You can export a sensitive data model and import it into other Oracle Data Safe Libraries for reuse. The verification feature identifies any differences between a sensitive data model and a selected target database.
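Conceptually, verification is a comparison between the columns listed in a sensitive data model and the columns that exist on a target database. The following Python sketch shows that comparison as a set difference. The verify_sdm function, its return shape, and the sample data are assumptions for illustration; the actual checks that Oracle Data Safe performs may be broader.

```python
def verify_sdm(sdm_columns, target_columns):
    """Compare (schema, table, column) tuples from an SDM with those on a target.

    Returns the schemas and columns that the SDM lists but the target lacks.
    """
    target = set(target_columns)
    target_schemas = {schema for schema, _, _ in target}

    missing_schemas = sorted(
        {schema for schema, _, _ in sdm_columns if schema not in target_schemas}
    )
    # Report individual columns only for schemas that do exist on the target.
    missing_columns = sorted(
        col for col in set(sdm_columns)
        if col[0] in target_schemas and col not in target
    )
    return {"missing_schemas": missing_schemas, "missing_columns": missing_columns}


sdm_columns = [
    ("HCM1", "EMPLOYEES", "EMAIL"),
    ("HCM1", "EMPLOYEES", "SSN"),
    ("PAYROLL", "PAYMENTS", "ACCOUNT_NO"),
]
target_columns = [
    ("HCM1", "EMPLOYEES", "EMAIL"),
    ("HCM1", "EMPLOYEES", "HIRE_DATE"),
]

differences = verify_sdm(sdm_columns, target_columns)
```

Here the target lacks the PAYROLL schema entirely and the SSN column in HCM1.EMPLOYEES, so both appear in the result, while HIRE_DATE is ignored because the SDM does not list it.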

To help you understand your sensitive data and for record keeping, Data Discovery provides a report that lists the sensitive columns and details about those columns. The sensitive columns are categorized based on their sensitive types. The report also includes the total number of sensitive tables, columns, and values discovered. A chart lets you compare the amount of sensitive data at sensitive category and sensitive type levels. You can also download this report from the Oracle Data Safe console.
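The totals in the report can be thought of as simple aggregations over the discovered columns. The Python sketch below computes the number of sensitive tables, columns, and values, and groups the value counts by sensitive category and sensitive type. The row layout, the summarize function, and the sample figures are invented for this example and do not reflect the report's actual format.

```python
from collections import Counter

# (category, sensitive type, schema, table, column, number of values)
discovered = [
    ("Identification", "Email Address", "HCM1", "EMPLOYEES", "EMAIL", 107),
    ("Identification", "Phone Number", "HCM1", "EMPLOYEES", "PHONE", 107),
    ("Employment", "Income", "HCM1", "JOB_HISTORY", "SALARY", 40),
]


def summarize(rows):
    """Aggregate discovered columns into report-style totals."""
    tables = {(schema, table) for _, _, schema, table, _, _ in rows}
    values_by_category = Counter()
    values_by_type = Counter()
    for category, sensitive_type, _, _, _, value_count in rows:
        values_by_category[category] += value_count
        values_by_type[(category, sensitive_type)] += value_count
    return {
        "sensitive_tables": len(tables),
        "sensitive_columns": len(rows),
        "sensitive_values": sum(value_count for *_, value_count in rows),
        "values_by_category": dict(values_by_category),
        "values_by_type": dict(values_by_type),
    }


summary = summarize(discovered)
```

The values_by_category and values_by_type dictionaries correspond to the two levels at which the chart compares the amount of sensitive data.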

Data Discovery Workflow

The general workflow for Data Discovery involves these main steps:

  1. Register the target database on which you want to discover sensitive data.
  2. Gather schema statistics on your target database before running the data discovery job to ensure accurate results. To do this, run the dbms_stats.gather_schema_stats procedure. See GATHER_SCHEMA_STATS Procedures for information about the parameters that you can include. The following example gathers statistics on the HCM1 schema:
    exec dbms_stats.gather_schema_stats(ownname => 'HCM1');
  3. Create a data discovery job using the Data Discovery wizard to discover the sensitive data on the target database and generate a sensitive data model (SDM). In the wizard, you follow these general steps:
    1. Specify the target database and schemas that you want Data Discovery to search.
    2. Specify the sensitive types to be used for data discovery. You can select individual sensitive types and/or categories of sensitive types. Optionally, you can instruct Data Discovery to find non-dictionary referential relationships.
    3. Run the data discovery job. Data Discovery identifies sensitive columns by examining column names, comments, data samples, object relations, and so on, and generates an SDM.
    4. Review the sensitive columns in the SDM. If needed, you can add and remove sensitive columns.
    5. (Optional) Before exiting the wizard, click Back, modify the selection of sensitive types, and rerun the data discovery job. Review the generated SDM again. Repeat this step until you feel the SDM is accurate and complete.
  4. Analyze your sensitive data in the target database by viewing the Data Discovery report.
  5. Manage the SDM:
    • Verify the SDM against each target database with which you plan to use it. Verification checks whether the target database has the schemas and sensitive columns listed in the SDM and reports any differences.
    • Update the SDM when needed. You can use the Data Discovery wizard to perform incremental updates. You can also manually add and remove sensitive columns.
    • To use the SDM with other target databases, download the SDM and upload it into a different Oracle Data Safe Library.
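
The wizard step that optionally finds non-dictionary referential relationships concerns relationships that applications rely on but that are not declared as foreign key constraints in the data dictionary. The Python sketch below shows one naive heuristic for spotting such candidates: a column in another table that shares its name with a primary key column and has no declared foreign key. This heuristic, the function name, and the sample data are assumptions for illustration only and are not the method that Data Discovery uses.

```python
def find_candidate_relationships(primary_keys, columns, declared_fks):
    """Suggest (child, parent) pairs that look related but have no declared FK.

    primary_keys: list of (table, column) primary key columns
    columns:      list of (table, column) for all columns in scope
    declared_fks: set of ((child_table, child_col), (parent_table, parent_col))
    """
    candidates = []
    for parent in primary_keys:
        for child in columns:
            if child[0] == parent[0]:
                continue  # same table, not a cross-table relationship
            if child[1] != parent[1]:
                continue  # naive rule: names must match exactly
            if (child, parent) in declared_fks:
                continue  # already known to the data dictionary
            candidates.append((child, parent))
    return candidates


primary_keys = [("EMPLOYEES", "EMPLOYEE_ID")]
columns = [
    ("EMPLOYEES", "EMPLOYEE_ID"),
    ("JOB_HISTORY", "EMPLOYEE_ID"),  # has a declared foreign key
    ("BONUSES", "EMPLOYEE_ID"),      # no constraint declared
    ("BONUSES", "AMOUNT"),
]
declared_fks = {(("JOB_HISTORY", "EMPLOYEE_ID"), ("EMPLOYEES", "EMPLOYEE_ID"))}

candidates = find_candidate_relationships(primary_keys, columns, declared_fks)
```

Only BONUSES.EMPLOYEE_ID is reported, because JOB_HISTORY.EMPLOYEE_ID already has a declared constraint. Exact name matching produces both false positives and false negatives, so candidates found this way would still need review.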