Harvest from Autonomous Databases with Private Access

Harvesting is a process that extracts technical metadata from your data source into your data catalog. This tutorial provides the steps to harvest from a data source that is only accessible privately.

In this tutorial, you:

  1. Create the policies needed to harvest an autonomous database with a private URL.
  2. Obtain the autonomous database access details.
  3. Create a private endpoint in Data Catalog.
  4. Attach the private endpoint to your data catalog.
  5. Create a data asset.
  6. Add a connection for the data asset.
  7. Harvest the data asset.

For additional information, see configuring a private network.

Before You Begin

To successfully perform this tutorial, you must have the following:

If you already have the autonomous database you want to harvest from, you can use the details for that database to complete this tutorial. If you don't have an autonomous database with private access and want to try this tutorial, use the following instructions to set up the resources you need.

Setting up the Resources Needed for this Tutorial

1. Create Access Policies for Network Resources

You create policies in Oracle Cloud Infrastructure to allow access to the various resources.

Before you can create a private network in your tenancy, you must have the required networking permissions.

In this setup, you create a policy to allow you to perform all networking operations in any compartment in your tenancy.

Perform the following steps:

  1. Open the navigation menu and click Identity & Security. Under Identity, click Policies.
  2. Click Create Policy.
  3. In the Create Policy panel, enter a unique name for the policy. The name must be unique across all policies in your tenancy. You cannot change the name later. For example, create-private-network-policy.
  4. Enter a Description such as Grant permissions to create a private network.
  5. In the Policy Builder section, move the slider to Show manual editor, and enter the policy rule. For example, for the data-catalog-users group, enter the following policy rule:
    allow group data-catalog-users to manage virtual-network-family in tenancy
    Note

    This policy allows users in the data-catalog-users group to perform all network-related operations in any compartment in the tenancy.
  6. Click Create.
You have successfully created the policy to access networking resources.

2. Create a Virtual Cloud Network

A Virtual Cloud Network (VCN) is a virtual, private network that you set up in a single Oracle Cloud Infrastructure region. A VCN has a single, contiguous IPv4 CIDR block of your choice.

The allowable VCN size range is /16 to /30. Decide on the CIDR block before you create a VCN. You can't change the CIDR value later. For your reference, here's a CIDR Calculator.
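Because the CIDR value can't be changed later, it can help to check a candidate block before entering it. The following is a minimal sketch using only Python's standard ipaddress module. The function name validate_vcn_cidr is hypothetical, and the check covers only the /16 to /30 size range stated above, not any other rule the service may apply.

```python
import ipaddress

# Size limits taken from the text above: /16 is the largest block, /30 the smallest.
MIN_PREFIX = 16
MAX_PREFIX = 30


def validate_vcn_cidr(cidr: str) -> ipaddress.IPv4Network:
    """Return the parsed IPv4 network if its prefix length is within the allowed range."""
    # strict=True rejects blocks that have host bits set, such as 10.0.0.1/16.
    network = ipaddress.IPv4Network(cidr, strict=True)
    if not MIN_PREFIX <= network.prefixlen <= MAX_PREFIX:
        raise ValueError(
            f"{cidr}: prefix length must be between /{MIN_PREFIX} and /{MAX_PREFIX}"
        )
    return network
```

For the tutorial value, validate_vcn_cidr("10.0.0.0/16") returns the parsed network, while a block such as 10.0.0.0/8 raises ValueError.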

To create a VCN, complete the following steps:

  1. Open the navigation menu, click Networking, and then click Virtual Cloud Networks.
  2. Click Create VCN.
  3. Enter a NAME to identify your VCN and select the compartment you have permission to work in.
  4. Enter the CIDR block for the VCN. For this tutorial, you can enter 10.0.0.0/16.
  5. Select DNS RESOLUTION and enter a DNS LABEL.
  6. Click Create VCN.

The VCN is created and the Virtual Cloud Networks Details page for the VCN displays.

By default, a route table, a set of DHCP options, and a security list are automatically created for the VCN. You use these default components when you create a private subnet.

3. Create a Private Subnet

Subnets are divisions you create in a VCN. Each subnet consists of a contiguous range of IP addresses that do not overlap with other subnets in the VCN. You create a private subnet when you don't want the resources created in the subnet to have public IP addresses.

Complete the following steps to create a private subnet:

  1. Click Create Subnet from the Virtual Cloud Networks Details page of the VCN you created in the previous step.
  2. Enter a NAME for the private subnet.
  3. Retain the default REGIONAL selection for SUBNET TYPE.
  4. Enter the CIDR block for the private subnet. For this tutorial, you can enter 10.0.0.0/24.
  5. Select the default ROUTE TABLE.
  6. Select PRIVATE SUBNET for SUBNET ACCESS.
  7. Select Use DNS Hostnames in this Subnet for DNS RESOLUTION.
  8. Enter a DNS LABEL.
  9. Select the default DHCP OPTIONS and default SECURITY LISTS.
  10. Click Create Subnet.

The private subnet is created and displayed on the Subnets page in the compartment you chose.
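The two constraints on a subnet CIDR described in this section (it must fall inside the VCN, and it must not overlap another subnet in the same VCN) can be checked offline. This is a sketch under stated assumptions: check_subnet is a hypothetical helper built on Python's standard ipaddress module, and it does not query your tenancy for existing subnets, so you pass them in yourself.

```python
import ipaddress


def check_subnet(vcn_cidr, new_subnet_cidr, existing_subnet_cidrs=()):
    """Raise ValueError if the new subnet is outside the VCN or overlaps another subnet."""
    vcn = ipaddress.ip_network(vcn_cidr)
    new_subnet = ipaddress.ip_network(new_subnet_cidr)

    # The subnet range must lie entirely within the VCN range.
    if not new_subnet.subnet_of(vcn):
        raise ValueError(f"{new_subnet} is not inside VCN {vcn}")

    # The subnet range must not share any address with an existing subnet.
    for existing in existing_subnet_cidrs:
        if new_subnet.overlaps(ipaddress.ip_network(existing)):
            raise ValueError(f"{new_subnet} overlaps existing subnet {existing}")
```

With the tutorial values, check_subnet("10.0.0.0/16", "10.0.0.0/24") passes without raising. A second subnet such as 10.0.1.0/24 also passes when 10.0.0.0/24 is listed as existing.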

4. Create a Network Security Group

When you create an autonomous database in a VCN, you need to specify the Network Security Group (NSG) for the database. An NSG consists of security rules that apply only to a specific group of VNICs.

Without security rules, no traffic is allowed in and out of VNICs in the VCN.

Complete the following steps to create an NSG with ingress and egress rules:

  1. Click Network Security Groups from the Virtual Cloud Networks Details page of the VCN you created previously.
  2. Click Create Network Security Group.
  3. Enter a NAME for the NSG.
  4. Ensure you have permissions to work in the selected compartment and click Next.
  5. Select Ingress for DIRECTION.
  6. Select CIDR for SOURCE TYPE and enter the CIDR for the private subnet in SOURCE CIDR. For this tutorial, you can enter 10.0.0.0/24.
  7. Select TCP for IP PROTOCOL.
  8. Enter 1522 in DESTINATION PORT RANGE.
  9. Click + Another Rule.
  10. Select Egress for DIRECTION.
  11. Select CIDR for DESTINATION TYPE and enter the CIDR for the private subnet in DESTINATION CIDR. For this tutorial, you can enter 10.0.0.0/24.
  12. Retain All Protocols for IP PROTOCOL.
  13. Click Create.
The security rules are added to the NSG.

Important

In this tutorial, the autonomous database is created in the same subnet that is used in Data Catalog to configure the private network for harvesting. For this scenario, you have created the ingress and egress rules specifying the CIDR of the private subnet.

Your autonomous database may be in a different private subnet than the subnet used in Data Catalog to configure the private network for harvesting. In that case, you must create the ingress and egress rules specifying the CIDR of the VCN.
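To see why the source CIDR matters, the ingress rule from this section can be modeled as a simple match on source address, protocol, and destination port. The sketch below is a deliberately simplified model, not how Oracle Cloud Infrastructure evaluates rules. The IngressRule class and its allows method are hypothetical names, and the example addresses are assumptions chosen to fall inside and outside 10.0.0.0/24.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    source_cidr: str
    protocol: str
    port_min: int
    port_max: int

    def allows(self, source_ip: str, protocol: str, port: int) -> bool:
        """Return True if the traffic matches the source CIDR, protocol, and port range."""
        in_cidr = ipaddress.ip_address(source_ip) in ipaddress.ip_network(self.source_cidr)
        return in_cidr and protocol == self.protocol and self.port_min <= port <= self.port_max


# Rule created in this tutorial: TCP port 1522 from the private subnet.
subnet_rule = IngressRule("10.0.0.0/24", "TCP", 1522, 1522)

# Rule for the case where the database and the Data Catalog private endpoint
# are in different subnets: the source is the CIDR of the whole VCN.
vcn_rule = IngressRule("10.0.0.0/16", "TCP", 1522, 1522)
```

In this model, subnet_rule admits a source such as 10.0.0.25 but rejects 10.0.1.25, which sits in a different subnet of the same VCN, while vcn_rule admits both.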

5. Create an Autonomous Database with Private Access

The networking steps to create an Autonomous Data Warehouse or Autonomous Transaction Processing database with a Private IP are the same. For the purposes of this tutorial, you can create either of these databases or even both.

Complete the following steps to create an Autonomous Data Warehouse with private access:

  1. Open the navigation menu and click Oracle Database. Under Autonomous Database, click Autonomous Data Warehouse.
  2. Click Create Autonomous Database.
  3. Ensure you have permission to work in the selected compartment and enter a display name and database name for the autonomous database.
  4. Retain Data Warehouse as the workload type.
  5. Retain Shared Infrastructure as the deployment type.
  6. Configure the database with database version 19c, 1 OCPU, and 1 TB of storage, and enable Auto scaling.
  7. Enter a password for the ADMIN user. You need this password when you connect to this database later in this tutorial.
  8. In the Choose network access section, select Private endpoint access only.
  9. Select the VCN, subnet, and NSG you created in the previous steps.
  10. Enter a Hostname prefix. This text appears in the database's private URL.
  11. Retain the BYOL option for license type.
  12. Click Create Autonomous Database.
The autonomous database provisioning process is initiated and the Autonomous Database Details page displays.

6. Create Security Rules

When you create a VCN, a security list is created by default for the VCN. You can add more security rules to this default security list or create a security list to permit traffic in and out of your VCN. In this tutorial, you add security rules to the default security list.

Complete the following steps to add the required security rules to the default security list:

  1. Open the navigation menu, click Networking, and then click Virtual Cloud Networks.
  2. Click the VCN you created previously to view the VCN details.
  3. On the Virtual Cloud Networks Details page, click Security Lists.
  4. Click the Default Security List for <your vcn>.
  5. Click Egress Rules.
  6. Click Add Egress Rules.
  7. Enter the CIDR of your private subnet. For this tutorial, enter 10.0.0.0/24.
  8. Select All Protocols for IP PROTOCOL.
  9. Click Add Egress Rules.
  10. Click Ingress Rules.
  11. Click Add Ingress Rules.
  12. Enter the CIDR of your private subnet. For this tutorial, enter 10.0.0.0/24.
  13. Select TCP for IP PROTOCOL.
  14. Enter 1521–1522 for DESTINATION PORT RANGE.
  15. Click Add Ingress Rules.
The security rules are added to the default security list.

Important

In this tutorial, the autonomous database is created in the same subnet that is used in Data Catalog to configure the private network for harvesting. For this scenario, you have created the ingress and egress rules specifying the CIDR of the private subnet.

Your autonomous database may be in a different private subnet than the subnet used in Data Catalog to configure the private network for harvesting. In that case, you must create the ingress and egress rules specifying the CIDR of the VCN.
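The choice between the subnet CIDR and the VCN CIDR described in this note can be written as a small decision function. This is only a sketch of the rule as stated above. The name rule_cidr and its parameters are hypothetical, and the function assumes both subnets belong to the same VCN.

```python
import ipaddress


def rule_cidr(vcn_cidr: str, database_subnet_cidr: str, endpoint_subnet_cidr: str) -> str:
    """Return the CIDR to use in the ingress and egress rules."""
    vcn = ipaddress.ip_network(vcn_cidr)
    database_subnet = ipaddress.ip_network(database_subnet_cidr)
    endpoint_subnet = ipaddress.ip_network(endpoint_subnet_cidr)

    # Both subnets are expected to be part of the same VCN.
    for subnet in (database_subnet, endpoint_subnet):
        if not subnet.subnet_of(vcn):
            raise ValueError(f"{subnet} is not inside VCN {vcn}")

    # Same subnet: the subnet CIDR is enough. Different subnets: use the VCN CIDR.
    if database_subnet == endpoint_subnet:
        return str(database_subnet)
    return str(vcn)
```

For the setup in this tutorial, where both resources use 10.0.0.0/24, the function returns the subnet CIDR. If the database were in 10.0.1.0/24 instead, it would return 10.0.0.0/16.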

1. Create Access Policies

To configure Data Catalog to access the private network of a data source, you need access to networking and data catalog resources.

If you already have access to perform all Data Catalog and Networking operations in your required compartments, you may skip this step.

To create the policy needed to configure a private network in Data Catalog, perform the following steps:

  1. Open the navigation menu and click Identity & Security. Under Identity, click Policies.
  2. Click Create Policy.
  3. In the Create Policy panel, enter a unique name for the policy. The name must be unique across all policies in your tenancy. You cannot change the name later. For example, data-catalog-private-endpoint-policy.
  4. Next, enter a description such as Grant permissions to create private endpoint and then select Keep Policy Current.
  5. In the Policy Statements field, enter the following policy rule:
    allow group data-catalog-users to manage data-catalog-private-endpoints in tenancy
    Note

    This policy allows users in the data-catalog-users group to perform all data catalog private endpoint operations in any compartment in the tenancy.
  6. Click + Another Statement.
  7. Enter the following policy rule.
    allow group data-catalog-users to manage virtual-network-family in tenancy
    Note

    This policy allows users in the data-catalog-users group to perform all network-related operations in any compartment in the tenancy.
  8. Click Create.
You have successfully created the policies to access the required resources for configuring a private network in Data Catalog.
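If you manage several groups, the two statements in this step can be generated rather than typed by hand. The sketch below only formats strings in the exact shape shown above. The helper names are hypothetical, and it does not validate the statements against the policy syntax or create anything in your tenancy.

```python
def manage_statement(group: str, resource_type: str) -> str:
    """Format a tenancy-wide 'manage' policy statement in the form used in this tutorial."""
    return f"allow group {group} to manage {resource_type} in tenancy"


def private_endpoint_policy(group: str) -> list:
    """Return the two statements needed to configure a private network in Data Catalog."""
    return [
        manage_statement(group, "data-catalog-private-endpoints"),
        manage_statement(group, "virtual-network-family"),
    ]
```

Calling private_endpoint_policy("data-catalog-users") yields the two policy rules entered in steps 5 and 7.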

2. Obtain Data Source Details

You need the private network and database connection information for the autonomous database you want to harvest.

Obtain the following details for the autonomous database:

Information Needed: For configuring the private network, you need the VCN name, the subnet name, and the private URL for the database.
Instructions to Obtain Information:
  1. From the navigation menu in the Console, click Autonomous Data Warehouse.
  2. View the details for the database you want to harvest.
  3. From the Network section, note the VCN, Subnet, and Private Endpoint URL for the database.
  Tip

  If you have more databases in this network (same VCN and subnet) that you want to harvest, make a note of the Private URL for those databases too.

Information Needed: For creating the data asset, you need the database name.
Instructions to Obtain Information:
  From the autonomous database details page, note the database name from the General Information section.

Information Needed: For adding a connection, you need the database wallet and login credentials.
Instructions to Obtain Information:
  1. From the autonomous database details page, click DB Connection.
  2. Click Download Wallet.
  3. Enter a password for this wallet. You do not use this password in this tutorial.
  4. Click Download.
  5. Save the wallet file on your local machine.

You also need the credentials (username and password) for the database that you specified when you created the autonomous database. If you did not create the autonomous database, obtain the credentials from your admin. While harvesting, you are able to view only the database entities you have access to.
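If you noted private URLs for several databases, the private endpoint step later in this tutorial takes them as a single comma-separated value. The following sketch tidies a list of noted URLs into that form. The function name join_private_urls is hypothetical, and the host names in the usage note are placeholders rather than real endpoints.

```python
def join_private_urls(urls) -> str:
    """Trim, lowercase, and de-duplicate private URLs, then join them with commas."""
    cleaned = []
    for url in urls:
        host = url.strip().lower()  # DNS names are case-insensitive
        if host and host not in cleaned:
            cleaned.append(host)
    return ",".join(cleaned)
```

For example, join_private_urls([" db1.adb.example.com ", "db2.adb.example.com"]) produces a value with the two placeholder hosts separated by a single comma and no surrounding spaces.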

3. Create a Private Endpoint

You create a Data Catalog private endpoint to configure the network access details for the autonomous database data sources you want to harvest.

To create a private endpoint in Data Catalog, perform the following steps:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Data Catalog.
  2. Click Private Endpoints. The Private Endpoints page displays.
  3. Click Create Private Endpoint. A Create Private Endpoint panel displays.
  4. Ensure you have permission to work in the selected compartment, and enter a name for the private endpoint. For example, XYZ Private Endpoint.
  5. Select the VCN and subnet where the autonomous database is hosted.
  6. Enter the DNS zone (Private Endpoint URL) for the autonomous database. Use a comma to enter more than one data source private URL.
    Note

    To view the Private Endpoint URL of the autonomous database in the Console, from the navigation menu click Autonomous Database and view the details for your autonomous database. The Private URL is displayed under Network.
  7. Click Create.
The private endpoint is being created. The create process can take a couple of minutes. When the private endpoint is created successfully, the private endpoint is in ACTIVE status.

If the private endpoint status changes to FAILED, ensure you have created the access policies and set up your private network correctly.
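If you script this step instead of watching the Console, the wait for ACTIVE or FAILED becomes a polling loop. The sketch below shows only the loop. get_state is a hypothetical callable that you would supply to return the current status string, and any status other than ACTIVE or FAILED is assumed to mean the operation is still in progress.

```python
import time
from typing import Callable

# Statuses named in the text above that end the wait.
TERMINAL_STATES = ("ACTIVE", "FAILED")


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state until it returns ACTIVE or FAILED, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout_s} seconds")
        sleep(interval_s)
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays. A FAILED result is returned rather than raised, so the caller can decide whether to recheck policies and network setup.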

4. Attach a Private Endpoint

You attach a private endpoint to a data catalog to allow data assets to be created for data sources available in the private network.

To attach a private endpoint to a data catalog, perform the following steps:

  1. Click Data Catalogs.
  2. Click the Actions icon (three dots) for the data catalog where you want to attach the private endpoint and select Attach Private Endpoint.
  3. Select the private endpoint you created in the previous step and click Attach.
The data catalog status changes to Updating, and the private endpoint is being attached. After the private endpoint is attached successfully, the status of the data catalog changes to Active.

5. Create a Data Asset

You are now ready to register your private IP autonomous database with Data Catalog as a data asset. In this tutorial, you create an Autonomous Data Warehouse data asset.

To create an autonomous database data asset, perform the following steps:

  1. Click the data catalog instance where you attached the private endpoint in the previous step.
  2. From your data catalog Home tab, click Create Data Asset from the Quick Actions tile.
  3. In the Create Data Asset panel, enter a Name to uniquely identify your data asset. Optionally, enter a description.
  4. From the Type drop-down list, select Autonomous Data Warehouse.
  5. In the Database Name field, enter the database name you specified when you created the autonomous database.
    Note

    To view the autonomous database name in the Console, from the navigation menu click Autonomous Database and view the details for your autonomous database. The database name is displayed under General Information.
  6. Select the Use private endpoint check box.
  7. Click Create.
You have successfully created your autonomous database data asset.

6. Add a Connection

After you register a data source as a data asset in your data catalog, you create a connection to your data asset to be able to harvest it. You can create multiple connections to your data source. At least one connection is needed to be able to harvest a data asset.

For autonomous database data source types, you can use secrets in Oracle Cloud Infrastructure Vault to store the password that you need to connect to the source using a connection. By using OCI Vault, you provide the OCID of the secret when specifying the connection details, so you don't have to enter the actual password when you create the data asset. You can also use secrets for the Oracle wallet and passwords instead of uploading the wallet when you create your data asset.

A vault is a container for keys and secrets. Secrets store credentials such as the passwords required for connecting to data sources. You use an encryption key in a vault to encrypt and import secret contents to the vault. Secret contents are base64-encoded. Data Catalog uses the same key to retrieve and decrypt secrets while connecting a data asset to the data source. For more information about vault, key, and secret, see Overview of Vault. For information about copying the secret OCID, see View Secret Details.

To use an Oracle wallet with secrets in OCI Vault, you must:
  • Provide a wallet password when you download the wallet.
  • Remove the .p12 file from the downloaded wallet zip.
  • Use any base64 encoder to encode the modified wallet zip to base64.
  • Copy the base64-encoded data to a secret in a vault.
  • Create a secret for the database password.
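
The wallet preparation steps above (remove the .p12 file, then base64-encode the modified zip) can be sketched with Python's standard zipfile and base64 modules. The function name encode_wallet_without_p12 is hypothetical. The sketch works on bytes in memory and does not create the secret in the vault. You would still copy the returned text into a secret yourself.

```python
import base64
import io
import zipfile


def encode_wallet_without_p12(wallet_zip_bytes: bytes) -> str:
    """Return the base64 text of the wallet zip with every .p12 entry removed."""
    source = zipfile.ZipFile(io.BytesIO(wallet_zip_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            # Skip the .p12 file, as the steps above require.
            if info.filename.lower().endswith(".p12"):
                continue
            target.writestr(info, source.read(info.filename))
    # The new archive is complete only after the with block closes it.
    return base64.b64encode(output.getvalue()).decode("ascii")
```

To use it on a downloaded wallet, read the zip file in binary mode and pass its contents to the function, for example with open(path, "rb") where path is the location where you saved the wallet.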

To add a connection for the autonomous database data asset, follow these steps:

  1. On the Home tab, click Data Assets.
  2. In the Data Assets list, select the autonomous database data asset that you created.
  3. On the data asset details page, under Summary, in the Connections section, click Add Connection.
  4. In the Add Connection panel, enter the details as described in the following table:
    Field - Description
    Name - Enter a unique name for your connection.
    Description - Enter a short description for your connection.
    Type - Select Generic.
    Use Wallet - Select this option if you want to upload the file with the client credentials that is downloaded from the autonomous database. When you select this option, the following fields appear:
      • Wallet - Upload the file with the client credentials that is downloaded from your autonomous database. For more information, see downloading client credentials.
      • TNS Alias - From the TNS Alias drop-down list, select a TNS Alias. For better performance, select the <name>_low option.
      • User Name - Enter the admin user name that is set when the autonomous database is created.
      • Password - Enter the password for the autonomous database admin user name.
    Use Vault Secret OCID - Select this option if you want to enter the OCID of the secret that is created in OCI Vault for the client credentials file and file password. When you select this option, the following fields appear:
      • Vault Secret OCID for Wallet - Enter the OCID of the secret that is created in OCI Vault for the wallet. For information about copying the secret OCID, see View Secret Details.
      • User Name - Enter the admin user name that is set when the autonomous database is created.
      • Vault Secret OCID for Password - Enter the OCID of the secret that is created in OCI Vault for the password of the autonomous database admin user name.
    Make this the default connection for the data asset - Select this check box if you want to make this connection the default connection for the data asset.
    Test Connection - Click the button to test your connection.
  5. Click Add.
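
To see which TNS aliases a wallet offers before you open the Add Connection panel, you can list them from the tnsnames.ora file inside the wallet. The sketch below rests on assumptions: that each entry in the file begins with its alias at the start of a line followed by an equals sign, and that one alias ends in _low. The function names and the sample text in the usage note are illustrative only.

```python
import re

# An alias is assumed to be a name at the start of a line followed by "=".
# Continuation lines that start with "(" are not matched.
_ALIAS_PATTERN = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*=", re.MULTILINE)


def tns_aliases(tnsnames_text: str) -> list:
    """Return the alias names found in the text of a tnsnames.ora file."""
    return _ALIAS_PATTERN.findall(tnsnames_text)


def pick_low_alias(aliases) -> str:
    """Return the first alias ending in _low, the option suggested for harvesting."""
    for alias in aliases:
        if alias.lower().endswith("_low"):
            return alias
    raise ValueError("no alias ending in _low was found")
```

For a file whose entries start with mydb_high and mydb_low (placeholder names), tns_aliases returns both names in file order and pick_low_alias selects mydb_low.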

7. Harvest the Data Asset

You are now ready to harvest your autonomous database data asset. Your autonomous database must have the data from which you want to harvest the technical metadata. If you used the setup instructions in this tutorial, you can harvest metadata from the default data that is available in your autonomous database.

To harvest your autonomous database data asset, perform the following steps:

  1. On the data asset details page, click Harvest.
  2. The Select Connection page displays and the default connection is selected. Click Next.
  3. The Select Data Entities page displays. View and add all the data entities you want to harvest from the Available ADW Schema section.
    1. Click the add icon for each data entity you want to include in the harvest job.
    2. Click Add All to select all the entities for harvesting.
    3. Use the Filter ADW Schema box to find a data entity from the available data entities.
    4. Use the page navigation icons to browse all the data entities.
    5. Click the remove icon for any selected data entity that you want to remove from the harvest job.
    6. To clear your selection and start over, click Remove All.
    After you have reviewed the data entities you want to harvest from the Selected ADW Schema / Data Entities section, click Next.
  4. The Create Job page displays. In the Job Name field, enter a unique name to identify the harvest job.
  5. Optionally, enter a Description.
  6. Select Run job now and then click Create Job.
  7. The job to harvest your autonomous database data asset is created successfully and the Jobs tab displays. To view job details, click the job name.
Your data asset is harvested successfully and you can review the harvest job details.