Harvest On-Premises Data Sources

Harvesting is a process that extracts technical metadata from your data source into your data catalog. This tutorial provides the steps to harvest from an on-premises Oracle Database data source that is connected to Oracle Cloud Infrastructure using a Virtual Cloud Network (VCN).

In this tutorial, you:

  1. Create the policies needed to harvest on-premises data sources.
  2. Obtain the on-premises database access details.
  3. Create a private endpoint in Data Catalog.
  4. Attach the private endpoint to your data catalog.
  5. Create a data asset.
  6. Add a connection for the data asset.
  7. Harvest the data asset.

For additional information, see configuring a private network.

Before You Begin

To successfully perform this tutorial, you must have the following:

Before you can harvest an on-premises data source, you must connect your on-premises data source to Oracle Cloud Infrastructure.

Connecting an On-Premises Data Source to Oracle Cloud Infrastructure

1. Create Access Policies for Network Resources

You create policies in Oracle Cloud Infrastructure to allow access to the various resources. Before you can create a private network in your tenancy, you must have the required networking permissions.

In this setup, you create a policy to allow you to perform all networking operations in any compartment in your tenancy.

Perform the following steps:

  1. Open the navigation menu and click Identity & Security. Under Identity, click Policies.
  2. Click Create Policy.
  3. In the Create Policy panel, enter a unique name for the policy. The name must be unique across all policies in your tenancy. You cannot change the name later. For example, create-private-network-policy.
  4. Enter a Description such as Grant permissions to create a private network.
  5. In the Policy Builder section, move the slider to Show manual editor, and enter the policy rule. For example, for the data-catalog-users group, enter the following policy rule:
    allow group data-catalog-users to manage virtual-network-family in tenancy
    Note: This policy allows users in the data-catalog-users group to perform all network-related operations in any compartment in the tenancy.
  6. Click Create.
You have successfully created the policy to access networking resources.

2. Create a Virtual Cloud Network

A Virtual Cloud Network (VCN) is a virtual, private network that you set up in a single Oracle Cloud Infrastructure region. A VCN has a single, contiguous IPv4 CIDR block of your choice.

The allowable VCN size range is /16 to /30. Decide on the CIDR block before you create a VCN. You can't change the CIDR value later. For your reference, here's a CIDR Calculator.

To create a VCN, complete the following steps:

  1. Open the navigation menu, click Networking, and then click Virtual Cloud Networks.
  2. Click Create VCN.
  3. Enter a NAME to identify your VCN and select the compartment you have permission to work in.
  4. Enter the CIDR block for the VCN.
    Important: The CIDR for your VCN and on-premises network should not overlap.
  5. Select DNS RESOLUTION and enter a DNS LABEL.
  6. Click Create VCN.

The VCN is created and the Virtual Cloud Networks Details page for the VCN displays.

By default, a route table, a set of DHCP options, and a security list are automatically created for the VCN.
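Before you commit to a CIDR block, you can sanity-check it offline. The following Python sketch uses only the standard ipaddress module to test the two constraints stated above: the /16 to /30 size range and no overlap with the on-premises network. The function name and the example ranges are hypothetical and are not part of any Oracle tool.

```python
import ipaddress


def check_vcn_cidr(vcn_cidr: str, on_prem_cidr: str) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    # IPv4Network raises ValueError if the string is not a valid IPv4 network
    # (for example, if host bits are set).
    vcn = ipaddress.IPv4Network(vcn_cidr)
    on_prem = ipaddress.IPv4Network(on_prem_cidr)

    problems = []
    if not 16 <= vcn.prefixlen <= 30:
        problems.append(f"/{vcn.prefixlen} is outside the allowed /16 to /30 range")
    if vcn.overlaps(on_prem):
        problems.append(f"{vcn} overlaps the on-premises network {on_prem}")
    return problems


# Hypothetical example values; substitute your own ranges.
print(check_vcn_cidr("10.0.0.0/16", "172.16.0.0/12"))    # no problems expected
print(check_vcn_cidr("172.16.5.0/24", "172.16.0.0/12"))  # overlap reported
```

Because the CIDR value cannot be changed later, a check like this is cheapest to run before you click Create VCN.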

3. Create a Dynamic Routing Gateway

A Dynamic Routing Gateway (DRG) is a virtual router that you must add to your VCN to provide a path for private network traffic between your VCN and on-premises network.

After you create a DRG, you configure access to the on-premises network using either VPN Connect or FastConnect. In this setup, instructions to use DRG with FastConnect are provided.

To create a DRG, complete the following steps:

  1. Open the navigation menu and click Networking. Under Customer Connectivity, click Dynamic Routing Gateway.
  2. Click Create Dynamic Routing Gateway.
  3. Ensure you have permissions to work in the selected compartment or select the compartment where you want to create the DRG.
  4. Enter a NAME for the DRG.
  5. Click Create Dynamic Routing Gateway. The DRG is created and the details page displays. You now configure the DRG with FastConnect.
  6. Click Virtual Circuits.
  7. Click the FastConnect link in the Virtual Circuits table. Alternatively, from the navigation menu select Networking and then click FastConnect.
  8. Create your FastConnect connections depending on whether you want to connect with an Oracle Partner, a third-party provider, or colocate with Oracle. While creating your FastConnect connections, select a Private Virtual Circuit and the DRG you created previously.
You have created a DRG and configured the connections to your on-premises data source.

4. Attach the DRG to the VCN

After you have configured the Virtual Circuits for your DRG, you attach this DRG to your VCN. You can attach only one DRG to a VCN at a time, and you can attach a DRG to only one VCN at a time.

Complete the following steps to attach a DRG to a VCN:

  1. Open the navigation menu, click Networking, and then click Virtual Cloud Networks.
  2. Click the VCN you created previously.
  3. Click Dynamic Routing Gateways.
  4. Click Attach Dynamic Routing Gateway.
  5. Select the compartment where you created the DRG and then select the DRG you created and configured previously.
  6. Click Attach.
You have attached the DRG to your VCN.

5. Update the Route Table

You route your private subnet traffic to the DRG using a route table for the private subnet.

Complete the following steps to update the default route table for your private subnet:

  1. Open the navigation menu, click Networking, and then click Virtual Cloud Networks.
  2. Click the VCN you created previously to view the VCN details.
  3. Click Route Tables.
  4. Click Default Route Table for <your-vcn> or the route table you specified for the private subnet.
  5. Click Add Route Rules.
  6. In the Add Route Rules panel, select Dynamic Routing Gateway for TARGET TYPE.
  7. In DESTINATION CIDR BLOCK, enter the CIDR for your on-premises network.
  8. Click Add Route Rules.
The route rule is added to the subnet's route table.

6. Create a Dynamic Host Configuration Protocol (DHCP) Option

You use DHCP options for a VCN to automatically provide configuration information to instances when they boot up.

To create DHCP options, complete the following steps:

  1. Open the navigation menu, click Networking, and then click Virtual Cloud Networks.
  2. Select the VCN you are configuring for your on-premises data source.
  3. Click DHCP Options.
  4. Click Create DHCP Options.
  5. Enter a NAME for the DHCP options.
  6. Ensure you have permissions to work in the selected compartment or select the compartment where you want to create the DHCP options.
  7. Select Internet and VCN Resolver for DNS Type.
  8. Click Create DHCP Options.
The set of DHCP options is created and then displayed on the DHCP Options page of the compartment you chose.

7. (Optional) Create a NAT Gateway

You use a Network Address Translation (NAT) gateway to give instances in your private network access to the internet without exposing them to incoming internet connections. Creating a NAT gateway is an optional step that you can complete for your VCN. You can harvest the on-premises Oracle Database without creating a NAT gateway.

  1. Open the navigation menu, click Networking, and then click Virtual Cloud Networks.
  2. Select the VCN you are configuring for your on-premises data source.
  3. Click NAT Gateways.
  4. Click Create NAT Gateway.
  5. Enter a NAME for the NAT Gateway.
  6. Ensure you have permissions to work in the selected compartment or select the compartment where you want to create the NAT Gateway.
  7. Click Create NAT Gateway.
  8. Create a route rule that directs the required traffic from your private subnet to the NAT gateway. Click Route Tables.
  9. Select the route table for your VCN.
  10. Click Add Route Rule.
  11. Select NAT Gateway for Target Type.
  12. Enter 0.0.0.0/0 for Destination CIDR Block.
  13. Select the compartment where you created the NAT Gateway previously.
  14. Select the NAT Gateway you created previously for Target NAT Gateway.
  15. Click Add Route Rule.
You have created and configured the NAT Gateway for your VCN. Any subnet traffic with a destination that matches the rule is routed to the NAT gateway.
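To see how the DRG rule and the optional NAT gateway rule coexist in one route table, consider the following Python sketch. It assumes the usual most-specific-match (longest prefix) selection among route rules, and it ignores traffic that stays inside the VCN. The on-premises CIDR, the target labels, and the function name are hypothetical.

```python
import ipaddress

# Hypothetical route rules: (destination CIDR, target).
ROUTE_RULES = [
    ("192.168.0.0/16", "DRG"),     # assumed on-premises network
    ("0.0.0.0/0", "NAT Gateway"),  # optional catch-all rule
]


def pick_target(destination_ip, rules=ROUTE_RULES):
    """Return the target of the most specific matching rule, or None."""
    ip = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in rules:
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # The rule with the longest prefix is the most specific match.
    return max(matches, key=lambda match: match[0])[1]


print(pick_target("192.168.10.5"))  # on-premises address goes to the DRG
print(pick_target("8.8.8.8"))       # everything else goes to the NAT gateway
```

In this model, adding the 0.0.0.0/0 rule does not divert traffic bound for the on-premises database, because the on-premises rule is more specific.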
8. Create Security Rules

When you create a VCN, a security list is created by default for the VCN. You can add more security rules to this default security list or create a security list to permit traffic in and out of your VCN. In this tutorial, you add a security rule to the default security list.

Complete the following steps to add the required security rule to the default security list:

  1. Open the navigation menu, click Networking, and then click Virtual Cloud Networks.
  2. Click the VCN you created previously to view the VCN details.
  3. On the Virtual Cloud Networks Details page, click Security Lists.
  4. Click the Default Security List for <your vcn>.
  5. Click Egress Rules.
  6. Ensure you have the default egress rule created to allow traffic for all protocols.
  7. Click Ingress Rules.
  8. Click Add Ingress Rules.
  9. Enter the CIDR of your on-premises network that contains the DNS server IP used to resolve the FQDNs of your on-premises data sources.
  10. Select TCP for IP PROTOCOL.
  11. Enter 1521–1522 for DESTINATION PORT RANGE.
  12. Click Add Ingress Rules.
The security rule is added to the default security list.
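The ingress rule you just added can be read as a three-part test on incoming traffic: source address, protocol, and destination port. The following Python sketch models that test with the standard library only. The class name and the on-premises CIDR are hypothetical, and real security lists have more options than are modeled here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    source_cidr: str
    protocol: str
    port_min: int
    port_max: int

    def allows(self, source_ip: str, protocol: str, dest_port: int) -> bool:
        """Return True if the traffic matches every part of the rule."""
        return (
            protocol.upper() == self.protocol.upper()
            and ipaddress.ip_address(source_ip) in ipaddress.ip_network(self.source_cidr)
            and self.port_min <= dest_port <= self.port_max
        )


# Hypothetical on-premises CIDR; ports 1521 to 1522 come from the steps above.
ORACLE_LISTENER_RULE = IngressRule("192.168.0.0/16", "TCP", 1521, 1522)

print(ORACLE_LISTENER_RULE.allows("192.168.1.10", "TCP", 1521))  # inside the rule
print(ORACLE_LISTENER_RULE.allows("192.168.1.10", "TCP", 1523))  # port out of range
```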
9. Create a Private Subnet

Subnets are divisions you create in a VCN. Each subnet consists of a contiguous range of IP addresses that do not overlap with other subnets in the VCN. You create a private subnet when you don't want the resources created in the subnet to have public IP addresses.

Complete the following steps to create a private subnet:

  1. Click Create Subnet from the Virtual Cloud Networks Details page of the VCN you created previously.
  2. Enter a NAME for the private subnet.
  3. Retain the default REGIONAL selection for SUBNET TYPE.
  4. Enter the CIDR block for the private subnet.
  5. Select the default ROUTE TABLE or the route table you updated with the DRG.
  6. Select PRIVATE SUBNET for SUBNET ACCESS.
  7. Select Use DNS Hostnames in this Subnet for DNS RESOLUTION.
  8. Enter a DNS LABEL.
  9. Select the DHCP OPTIONS you created previously.
  10. Select the default SECURITY LISTS where you added the security rule for your VCN.
  11. Click Create Subnet.

The private subnet is created and displayed on the Subnets page in the compartment you chose.
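The subnet CIDR block you enter must fall inside the VCN CIDR block and must not overlap any other subnet in the VCN. A minimal Python sketch of that check, using only the standard ipaddress module, might look like the following. The function name and all address ranges are hypothetical.

```python
import ipaddress


def check_subnet(vcn_cidr, new_subnet_cidr, existing_subnet_cidrs=()):
    """Return a list of problems; an empty list means no problem was found."""
    vcn = ipaddress.ip_network(vcn_cidr)
    new_subnet = ipaddress.ip_network(new_subnet_cidr)

    problems = []
    if not new_subnet.subnet_of(vcn):
        problems.append(f"{new_subnet} is not inside the VCN range {vcn}")
    for cidr in existing_subnet_cidrs:
        if new_subnet.overlaps(ipaddress.ip_network(cidr)):
            problems.append(f"{new_subnet} overlaps existing subnet {cidr}")
    return problems


# Hypothetical values: a /16 VCN with one existing /24 subnet.
print(check_subnet("10.0.0.0/16", "10.0.1.0/24", ["10.0.0.0/24"]))
```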

1. Create Access Policies

To configure Data Catalog to access the private network of a data source, you need access to networking and data catalog resources.

If you already have access to perform all Data Catalog and Networking operations in your required compartments, you may skip this step.

To create the policy needed to configure a private network in data catalog, perform the following steps:

  1. Open the navigation menu and click Identity & Security. Under Identity, click Policies.
  2. Click Create Policy.
  3. In the Create Policy panel, enter a unique name for the policy. The name must be unique across all policies in your tenancy. You cannot change the name later. For example, data-catalog-private-endpoint-policy.
  4. Enter a Description such as Grant permissions to create a private network.
  5. In the Policy Builder section, move the slider to Show manual editor, and enter the policy rule. For example, for the data-catalog-users group, enter the following policy rule:
    allow group data-catalog-users to manage data-catalog-private-endpoints in tenancy
    Note: This policy allows users in the data-catalog-users group to perform all data catalog private endpoint operations in any compartment in the tenancy.
  6. Click + Another Statement.
  7. Enter the following policy rule.
    allow group data-catalog-users to manage virtual-network-family in tenancy
    Note: This policy allows users in the data-catalog-users group to perform all network-related operations in any compartment in the tenancy.
  8. Click Create.
You have successfully created the policies to access the required resources for configuring a private network in Data Catalog.
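Both policy rules follow the same pattern: a group, a verb, a resource type, and a scope. If you keep policies in scripts or review them in bulk, a small helper such as the following Python sketch can generate the statements consistently. The helper name and the compartment name are hypothetical. The sketch only builds text and does not call any Oracle Cloud Infrastructure API.

```python
# Verbs accepted in policy statements, from least to most permissive.
VERBS = ("inspect", "read", "use", "manage")


def policy_statement(group, verb, resource_type, compartment=None):
    """Build one policy statement; scope is the tenancy unless a compartment is given."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    location = f"compartment {compartment}" if compartment else "tenancy"
    return f"allow group {group} to {verb} {resource_type} in {location}"


# The two statements used in this step.
STATEMENTS = [
    policy_statement("data-catalog-users", "manage", "data-catalog-private-endpoints"),
    policy_statement("data-catalog-users", "manage", "virtual-network-family"),
]

for statement in STATEMENTS:
    print(statement)

# A narrower variant, assuming a hypothetical compartment named catalog-network.
print(policy_statement("data-catalog-users", "manage", "virtual-network-family", "catalog-network"))
```

Scoping a statement to a compartment instead of the tenancy is a common way to grant less than the tenancy-wide access shown in this tutorial.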

2. Obtain Data Source Details

You need the private network and database connection information for the on-premises Oracle Database you want to harvest.

Obtain the following details for the on-premises Oracle Database from your administrator:

  • For configuring the private network, you need the VCN and subnet name and the URL for the Oracle Database.
  • For creating the data asset, you need the Oracle Database host, port, and database service name or SID.
  • For adding a connection, you need the database login credentials.
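The difference between a service name and a SID is easiest to see in the two familiar Oracle JDBC thin-driver URL forms. The following Python sketch builds either form from the host, port, and database details listed above. It is only an illustration: how Data Catalog composes its connection internally is not described in this tutorial, and the function name and host name are hypothetical.

```python
def jdbc_thin_url(host, port, service_name=None, sid=None):
    """Build an Oracle JDBC thin URL from exactly one of service_name or sid."""
    if (service_name is None) == (sid is None):
        raise ValueError("provide exactly one of service_name or sid")
    if service_name is not None:
        # Service name form: slashes around the host and before the service.
        return f"jdbc:oracle:thin:@//{host}:{port}/{service_name}"
    # SID form: colon-separated host, port, and SID.
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


# Hypothetical host and database identifiers.
print(jdbc_thin_url("ora01.db.example.com", 1521, service_name="orclpdb1"))
print(jdbc_thin_url("ora01.db.example.com", 1521, sid="ORCL"))
```

Confirm with your administrator which identifier the database exposes, because the two are not interchangeable.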

3. Create a Private Endpoint

You create a Data Catalog private endpoint to configure the network access details for the on-premises Oracle Database data source you want to harvest.

To create a private endpoint in Data Catalog, perform the following steps:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Data Catalog.
  2. Click Private Endpoints. The Private Endpoints page displays.
  3. Click Create Private Endpoint. A Create Private Endpoint panel displays.
  4. Ensure you have permission to work in the selected compartment, and enter a name for the private endpoint. For example, XYZ Private Endpoint.
  5. Select the VCN and subnet that are used to connect your on-premises Oracle Database to Oracle Cloud Infrastructure.
  6. Enter the DNS zone for the Oracle Database. Use a comma to enter more than one data source DNS zone.
  7. Click Create.
Creating the private endpoint can take a couple of minutes. When the private endpoint is created successfully, its status is ACTIVE.

If the private endpoint status changes to FAILED, ensure you have created the access policies and set up your private network correctly.
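The DNS zone field accepts a comma-separated list, so it is worth confirming that the database host name you plan to use falls under one of the zones you enter. The following Python sketch performs that check with plain string handling. How Data Catalog itself evaluates the zones is not described here, so treat the matching rule (exact zone or any name beneath it) as an assumption. All names are hypothetical.

```python
def parse_dns_zones(field_value):
    """Split a comma-separated DNS zone list into normalized zone names."""
    return [
        zone.strip().lower().rstrip(".")
        for zone in field_value.split(",")
        if zone.strip()
    ]


def host_in_zones(fqdn, zones):
    """Return True if fqdn equals a zone or is a name beneath one."""
    name = fqdn.lower().rstrip(".")
    return any(name == zone or name.endswith("." + zone) for zone in zones)


# Hypothetical zones and host name.
zones = parse_dns_zones("db.example.com, corp.example.net")
print(zones)
print(host_in_zones("ora01.db.example.com", zones))
```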

4. Attach a Private Endpoint

You attach a private endpoint to a data catalog to allow data assets to be created for data sources available in the private network.

To attach a private endpoint to a data catalog, perform the following steps:

  1. Click Data Catalogs.
  2. Click the Actions icon (three dots) for the data catalog where you want to attach the private endpoint and select Attach Private Endpoint.
  3. Select the private endpoint you created in the previous step and click Attach.
The data catalog status changes to Updating, and the private endpoint is being attached. After the private endpoint is attached successfully, the status of the data catalog changes to Active.

5. Create a Data Asset

You are now ready to register your on-premises Oracle Database data source with Data Catalog as a data asset.

To create an Oracle Database data asset, perform the following steps:

  1. Click the data catalog instance where you attached the private endpoint in the previous step.
  2. From your data catalog Home tab, click Create Data Asset from the Quick Actions tile.
  3. In the Create Data Asset panel, enter a Name to uniquely identify your data asset. Optionally, enter a description.
  4. From the Type drop-down list, select Oracle Database.
  5. In the Host field, enter the database hostname.
  6. In the Port field, enter the database port.
  7. In the Database field, enter the database service name or SID.
  8. Select the Use private endpoint check box.
  9. Click Create.
You have successfully created your Oracle Database data asset.

6. Add a Connection

After creating the Oracle Database data asset, you add a connection for the data asset.

For Oracle Database, you can use secrets in Oracle Cloud Infrastructure Vault to store the password that you need to connect to the source using a connection. When you use OCI Vault, you provide the OCID of the secret when specifying the connection details, so you don't have to enter the actual password when you create the connection.

A vault is a container for keys and secrets. Secrets store credentials such as required passwords for connecting to data sources. You use an encryption key in a vault to encrypt and import secret contents to the vault. Secret contents are base64-encoded. Data Catalog uses the same key to retrieve and decrypt secrets while connecting a data asset to the data source. For more information about vault, key, and secret, see Overview of Vault. For information about copying the secret OCID, see View Secret Details.
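If you need to supply secret contents in base64 form, the encoding itself is a one-line operation in most languages. The following Python sketch shows the round trip with the standard base64 module. The function names are hypothetical. Note that base64 is only an encoding and provides no protection by itself; the protection comes from the vault encryption key described above.

```python
import base64


def encode_secret(plaintext: str) -> str:
    """Return the base64 text form of a UTF-8 string."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret(encoded: str) -> str:
    """Reverse encode_secret."""
    return base64.b64decode(encoded).decode("utf-8")


# Placeholder value only; never hard-code a real password in a script.
encoded = encode_secret("hello")
print(encoded)                 # aGVsbG8=
print(decode_secret(encoded))  # hello
```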

To add a connection for the Oracle Database data asset, follow these steps:

  1. On the Home tab, click Data Assets.
  2. In the Data Assets list, select the Oracle Database data asset that you created.
  3. On the data asset details page, under Summary, in the Connections section, click Add Connection.
  4. In the Add Connection panel, enter the details as described in the following table:
    Field: Description
    Name: Enter a unique name for your connection.
    Description: Enter a short description for your connection.
    Type: Select JDBC.
    User Name: Enter your Oracle Database user name.
    Use Password: Select this option if you want to enter the password associated with your Oracle Database user name. When you select this option, the following field appears:
    • Password - Enter the password associated with your Oracle Database user name.
    Use Vault Secret OCID: Select this option if you want to enter the OCID of the secret that is created in OCI Vault for the password associated with your Oracle Database user name. When you select this option, the following field appears:
    • Vault Secret OCID for Password - Enter the OCID of the secret that is created in OCI Vault for the password associated with your Oracle Database user name. For information about copying the secret OCID, see View Secret Details.
    Enable TLS: Select this check box if you want to enable TLS for this connection.
    Make this the default connection for the data asset: Select this check box if you want to make this connection the default connection for the data asset.
    Test Connection: Click the button to test your connection.
  5. Click Add.

7. Harvest the Data Asset

You are now ready to harvest your Oracle Database data asset.

To harvest your Oracle Database data asset, perform the following steps:

  1. Click Harvest on the data asset details page for the data asset.
  2. The Select Connection page displays and the default connection is selected. Click Next.
  3. The Select Data Entities page displays. View and add all the data entities you want to harvest from the Available Oracle Schema section.
    1. Click the add icon for each data entity you want to include in the harvest job.
    2. Click Add All to select all the entities for harvesting.
    3. Use the Filter Oracle Schema box to find a data entity from the available data entities.
    4. Use the page navigation icons to browse all the data entities.
    5. Click the remove icon for any selected data entity that you want to remove from the harvest job.
    6. If you need to start over, click Remove All and then start over.
    After you have reviewed the data entities you want to harvest from the Selected Oracle Schema / Data Entities section, click Next.
  4. The Create Job page displays. In the Job Name field, enter a unique name to identify the harvest job.
  5. Optionally, enter a Description.
  6. Select Run job now and then click Create Job.
  7. The job to harvest your Oracle Database data asset is created successfully and the Jobs tab displays. To view job details, click the job name.
Your data asset is harvested successfully and you can review the harvest job details.