Harvesting Technical Metadata
Harvesting is a process that extracts data structure information from your data sources into your data catalog repository.
What is a Data Asset?
To harvest your data source, you need to register your data source as a data asset in your data catalog instance. A data asset is any physical data store or stream of data such as a database, a cloud storage container, or a message stream.
Supported Data Sources for Data Assets
You can use the following data sources, accessible through public or private IP addresses, to create data assets in Data Catalog.
This is a list of supported data sources, not certified data sources.
|Data Source Type||Version|
|Oracle Database on Oracle Cloud Infrastructure||12.1|
|Exadata DB Systems||12.1|
|Oracle Cloud Infrastructure Object Storage||Latest|
|Autonomous Data Warehouse||18c/19c|
|Autonomous Transaction Processing||18c/19c|
|OCI MySQL Database Service||8.0.25-u3-cloud|
|9.6, 9.5, 9.4, 9.3, 9.2, 9.1, and 9.0|
|8.4, 8.3, and 8.2|
|Apache Hive||CDH 5.4 and higher; Apache 1.0, 2.0, 3.0 and higher|
|Microsoft SQL Server||2019; 2016 Service Pack 2; 2014 Service Pack 3; 2012 Service Pack 4|
|Microsoft Azure SQL Database||12.00.2000|
You can also connect to on-premises data sources that are connected to Oracle Cloud Infrastructure Virtual Cloud Networks (VCNs).
Depending on the type of data asset you create, you use different data structures to browse the data entities. For example, if you create an Oracle Database data asset, you browse through database objects to review the table and view data entities.
For data assets of type Oracle Database or Autonomous Database, if the database version is Oracle Database 12c or later, the Data Catalog harvester doesn't harvest the Oracle-maintained schemas or other common user schemas.
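As an illustration of which schemas that rule leaves in scope, the following minimal sketch filters a list of schema records. It assumes the records mirror the `ALL_USERS` dictionary view, which in Oracle Database 12c and later includes the `ORACLE_MAINTAINED` (`Y`/`N`) and `COMMON` (`YES`/`NO`) columns. Whether the harvester relies on these exact flags is an assumption, and `SchemaInfo` and `harvestable_schemas` are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaInfo:
    """One row of schema metadata (hypothetical shape, modeled on ALL_USERS)."""
    username: str
    oracle_maintained: str  # 'Y' or 'N'
    common: str             # 'YES' or 'NO'


def harvestable_schemas(schemas):
    """Return the names of schemas that are neither Oracle maintained nor common users."""
    return [
        s.username
        for s in schemas
        if s.oracle_maintained != "Y" and s.common != "YES"
    ]


if __name__ == "__main__":
    sample = [
        SchemaInfo("SYS", "Y", "YES"),       # Oracle maintained: skipped
        SchemaInfo("C##ADMIN", "N", "YES"),  # common user: skipped
        SchemaInfo("SALES", "N", "NO"),      # local application schema: kept
    ]
    print(harvestable_schemas(sample))
```

Under these assumptions, only locally created application schemas such as `SALES` would appear as browsable content after a harvest.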
Supported File Types
The following file types are supported for Oracle Object Storage:
- Comma-Separated Value (CSV) files
  The supported separators are
- XML files
- Avro files
- Excel files
- Apache Parquet files
- Apache ORC files
- Simple JSON files
If you choose to harvest unsupported file types, the Data Catalog harvester extracts only the basic information from those files, such as names and paths.
What are Data Entities and Attributes?
A data asset contains one or more data entities. A data entity is a collection of data such as a database table or view, or a single logical file. A data entity normally has many attributes that describe its data. An attribute describes a data item with a name and data type.
|Data Asset||Data Entities||Attributes|
|Database||Tables and Views||Columns|
|Data Stream||Event or Topic or Payload||Keys|
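The relationship between data assets, data entities, and attributes can be pictured as a simple containment hierarchy. The sketch below is only an illustrative model, not the Data Catalog object model: the class names, the `attribute_names` helper, and the sample `sales_db` asset with its `ORDERS` table are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """A data item described by a name and a data type (for example, a column)."""
    name: str
    data_type: str


@dataclass
class DataEntity:
    """A collection of data such as a table, a view, or a single logical file."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class DataAsset:
    """A physical data store or stream that contains one or more data entities."""
    name: str
    asset_type: str
    entities: List[DataEntity] = field(default_factory=list)

    def attribute_names(self, entity_name):
        """Return the attribute names of the named entity, or raise KeyError."""
        for entity in self.entities:
            if entity.name == entity_name:
                return [a.name for a in entity.attributes]
        raise KeyError(entity_name)


# Hypothetical database asset with one table entity and two column attributes.
orders = DataEntity(
    "ORDERS",
    [Attribute("ORDER_ID", "NUMBER"), Attribute("ORDER_DATE", "DATE")],
)
sales_db = DataAsset("sales_db", "Database", [orders])

if __name__ == "__main__":
    print(sales_db.attribute_names("ORDERS"))
```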
Harvesting a data source involves the following steps:
- Identify connectivity details to connect to the data source.
- Create a data asset.
- Add a connection to your data asset.
- Harvest the data asset.
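The order of the steps above matters: a data asset needs a connection before it can be harvested. The following sketch models that ordering with an in-memory stand-in. It does not call the Data Catalog service or any Oracle SDK; `HarvestWorkflow`, its methods, and the sample connectivity values are hypothetical and exist only to show the sequence.

```python
class HarvestWorkflow:
    """In-memory stand-in for the harvesting steps (not the Data Catalog API)."""

    def __init__(self):
        self.assets = {}

    def create_data_asset(self, name, asset_type):
        """Register a data source as a data asset."""
        self.assets[name] = {
            "type": asset_type,
            "connection": None,
            "harvested": False,
        }

    def add_connection(self, asset_name, connectivity):
        """Attach the connectivity details to the data asset."""
        self.assets[asset_name]["connection"] = dict(connectivity)

    def harvest(self, asset_name):
        """Mark the asset as harvested; refuse if no connection was added."""
        asset = self.assets[asset_name]
        if asset["connection"] is None:
            raise RuntimeError("add a connection before harvesting")
        asset["harvested"] = True
        return asset


if __name__ == "__main__":
    # Step 1: identify connectivity details (placeholder values).
    details = {"host": "db.example.com", "port": 1521, "service": "sales"}

    workflow = HarvestWorkflow()
    workflow.create_data_asset("sales_db", "Oracle Database")  # Step 2
    workflow.add_connection("sales_db", details)               # Step 3
    print(workflow.harvest("sales_db")["harvested"])           # Step 4
```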