Harvesting Technical Metadata

Extract data structure information from your data sources into your data catalog repository.

The process of extracting data structure information is known as harvesting.

What is a Data Asset?

To harvest a data source, you first register it as a data asset in your data catalog instance. A data asset is any physical data store or stream of data, such as a database, a cloud storage container, or a message stream.

When you harvest a data asset, the Data Catalog harvester extracts, standardizes, and indexes metadata from the data asset to create a unified and searchable repository in the data catalog. You can then browse or explore the data catalog to view the harvested data entities and attributes, and to annotate and enrich the data assets.

Harvesting a data source involves the following steps:

1. Identify the connectivity details needed to connect to the data source.
2. Create a data asset.
3. Add a connection to your data asset.
4. Harvest the data asset.
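
The ordering of these steps matters: a connection belongs to an existing data asset, and a harvest needs at least one connection. The following is a minimal sketch of that dependency in plain Python. It does not call Data Catalog; the class `DataAssetSketch`, the three functions, and the required keys `host`, `port`, and `user` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataAssetSketch:
    """Hypothetical stand-in for a data asset; not a Data Catalog API object."""
    name: str
    source_type: str
    connections: list = field(default_factory=list)
    harvested: bool = False


def create_data_asset(name, source_type):
    # Second step: register the data source as a data asset.
    return DataAssetSketch(name=name, source_type=source_type)


def add_connection(asset, details):
    # Third step: attach the connectivity details identified in the first step.
    # The required keys below are assumptions made for this sketch.
    missing = [key for key in ("host", "port", "user") if key not in details]
    if missing:
        raise ValueError(f"missing connectivity details: {missing}")
    asset.connections.append(dict(details))


def harvest(asset):
    # Fourth step: refuse to harvest until a connection exists.
    if not asset.connections:
        raise RuntimeError("add a connection before harvesting")
    asset.harvested = True
    return asset


# First step: identify connectivity details (example values only).
details = {"host": "db.example.com", "port": 1521, "user": "catalog_reader"}
asset = create_data_asset("sales_db", "Oracle Database")
add_connection(asset, details)
harvest(asset)
```

Calling `harvest` on an asset that has no connection raises `RuntimeError` in this sketch, which mirrors the order of the steps listed above.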

Supported Data Sources for Data Assets

You can use the following data sources (accessible using public or private IPs) to create data assets in Data Catalog.

Note

This is a list of supported data sources, not certified data sources.


Data Source Type	Version
Oracle Database	12.1, 12.2, 18, 19, 20, 21
Oracle Database on Oracle Cloud Infrastructure	12.1, 12.2, 18, 19
Exadata DB Systems	12.1, 12.2, 18, 19
Oracle Cloud Infrastructure Object Storage	Latest
Autonomous Database for Analytics and Data Warehousing	18c/19c
Autonomous Database for Transaction Processing and Mixed Workloads	18c/19c
MySQL	8.0.x
OCI MySQL HeatWave Service	8.0.25-u3-cloud
PostgreSQL	10.1, 9.6, 9.5, 9.4, 9.3, 9.2, 9.1, 9.0, 8.4, 8.3, 8.2
Apache Hive	CDH 5.4 and higher
Apache Hive	Apache 1.0, 2.0, 3.0 and higher
Microsoft SQL Server	2019, 2017, 2016 Service Pack 2, 2014 Service Pack 3, 2012 Service Pack 4
IBM DB2 LUW (DB2 for Linux, UNIX and Windows)	10.5.0.11
IBM DB2 LUW (DB2 for Linux, UNIX and Windows)	11.5.5.0
IBM DB2 AS400	7.1 and higher
Apache Kafka	2.12-2.3.0
Microsoft Azure SQL Database	12.00.2000

You can also connect to on-premises data sources that are connected to Oracle Cloud Infrastructure Virtual Cloud Networks (VCNs).

Depending on the type of data asset you create, you use different data structures to browse the data entities. For example, if you create an Oracle Database data asset, you browse through database objects to review the table and view data entities.

Note

For data assets of type Oracle Database or Autonomous Database, if the database version is Oracle Database 12c or later, the Data Catalog harvester doesn't harvest the Oracle-maintained schemas and other common user schemas.

Harvested Objects for Data Sources

The harvested objects for different data sources are listed in the following table:


Data Source	Harvested Objects
Apache Hive	Hive databases, Tables, Columns
Apache Kafka	Topics, Messages, Attributes
Oracle Cloud Infrastructure Object Storage	Buckets, Files (File types: CSV, Avro, ORC, Parquet, JSON, XML, Excel), Fields (based on the file types)
OCI Data Catalog Metastore	Catalogs, Databases, Tables, Columns
Autonomous Data Warehouse, Autonomous Transaction Processing, IBM DB2, Microsoft Azure SQL Database, Microsoft SQL Server, MySQL, Oracle Database, PostgreSQL	Schemas, Tables, Views, Columns, Constraints (Primary Key and Foreign Key), Comments (applicable only for Oracle Database)

Supported File Types

The following file types are supported for Oracle Cloud Infrastructure Object Storage:

Comma-Separated Value (CSV) files (.csv, .csv.gz). The supported separators are , (comma), \t (tab), | (vertical bar), and ; (semicolon).
XML files (.xml, .xsd)
Avro files (.avro, .avro.gz)
Excel files (.xls, .xlsx)
Apache Parquet files (.parquet, .pq)
Apache ORC files (.orc)
Simple JSON files (.json, .json.gz)
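
Before uploading a delimited file, it can help to confirm that it uses one of the four supported separators. The following is a minimal sketch using the `csv.Sniffer` class from the Python standard library, restricted to the separators listed above. The function name `detect_separator` is an illustrative choice, and the check approximates how a delimiter might be inferred; it is not the harvester's own logic.

```python
import csv

# The four separators listed as supported: comma, tab, vertical bar, semicolon.
SUPPORTED_SEPARATORS = ",\t|;"


def detect_separator(sample):
    """Return the supported separator used in the sample text, or None."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=SUPPORTED_SEPARATORS)
    except csv.Error:
        # The sniffer could not settle on any of the supported separators.
        return None
    return dialect.delimiter


pipe_sample = "id|name|city\n1|Ada|London\n2|Linus|Helsinki\n"
semicolon_sample = "id;name;city\n1;Ada;London\n2;Linus;Helsinki\n"
pipe_result = detect_separator(pipe_sample)
semicolon_result = detect_separator(semicolon_sample)
```

Passing the first few lines of a file is usually enough for the sniffer to decide.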

If you choose to harvest unsupported file types, the Data Catalog harvester extracts only the basic information from those files, such as names and paths.
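
To estimate in advance which objects in a bucket would yield field-level metadata, the extension list above can be turned into a simple lookup. This is a sketch only: the function name `classify_object` is hypothetical, and matching extensions case-insensitively is an assumption rather than documented harvester behavior.

```python
# Extension-to-type mapping taken from the supported file types listed above.
SUPPORTED_EXTENSIONS = {
    ".csv": "CSV", ".csv.gz": "CSV",
    ".xml": "XML", ".xsd": "XML",
    ".avro": "Avro", ".avro.gz": "Avro",
    ".xls": "Excel", ".xlsx": "Excel",
    ".parquet": "Parquet", ".pq": "Parquet",
    ".orc": "ORC",
    ".json": "JSON", ".json.gz": "JSON",
}


def classify_object(object_name):
    """Return the supported file type for an object name, or None.

    None corresponds to an unsupported type, for which only basic
    information such as the name and path would be extracted.
    """
    name = object_name.lower()  # assumption: case-insensitive matching
    for extension, file_type in SUPPORTED_EXTENSIONS.items():
        if name.endswith(extension):
            return file_type
    return None


compressed_csv = classify_object("sales/2021/data.csv.gz")
spreadsheet = classify_object("reports/summary.xlsx")
plain_text = classify_object("notes/readme.txt")
bare_gzip = classify_object("logs/archive.gz")
```

Note that a bare `.gz` object matches nothing here, because only the compressed CSV, Avro, and JSON variants appear in the list.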

Data Entities and Attributes

A data asset contains one or more data entities. A data entity is a collection of data such as a database table or view, or a single logical file. A data entity normally has many attributes that describe its data. An attribute describes a data item with a name and data type.


Data Asset	Data Entities	Attributes
Database	Tables and Views	Columns
File Container	Files	Fields
Data Stream	Event or Topic or Payload	Keys
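
The database row of this mapping can be illustrated with a small stand-in. The sketch below uses an in-memory SQLite database, which is not one of the supported data sources and is used only because it runs anywhere. It reads tables and views as data entities and their columns as attributes, each with a name and a data type. The function name `harvest_sqlite` is hypothetical.

```python
import sqlite3


def harvest_sqlite(conn):
    """Return {entity_name: [(attribute_name, data_type), ...]} for tables and views."""
    entities = {}
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    for (entity_name,) in rows:
        # pragma_table_info lists one row per column of the table or view.
        columns = conn.execute(
            "SELECT name, type FROM pragma_table_info(?) ORDER BY cid",
            (entity_name,),
        ).fetchall()
        entities[entity_name] = columns
    return entities


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE VIEW customer_names AS SELECT id, name FROM customers")
catalog = harvest_sqlite(conn)
```

In this sketch, `catalog` has two data entities, `customers` and `customer_names`, and each maps to a list of its attributes.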