Data Profiles and Semantic Recommendations

When you create a dataset, Oracle Analytics performs column-level profiling to produce a set of semantic recommendations to repair or enrich your data. When you create workbooks, you can also include knowledge enrichments in your visualizations by adding them from the Data Panel.

Note:

Knowledge enrichments are usually enabled by default, but workbook editors can enable or disable them for datasets that they own or have editing privileges for. Oracle Analytics doesn't automatically provide enrichment recommendations for datasets generated from a data flow. In this case, the dataset owner or administrator must first enable the knowledge enrichments option for the dataset. See Enable Knowledge Enrichments for Datasets.

These recommendations are based on the system automatically detecting a specific semantic type during the profile step. How the data is sampled for profiling depends on the source. For example, datasets based on local subject areas are profiled using a simple Top N sample.

Semantic types fall into categories such as geographic locations (identified by city names, for example), recognizable patterns (such as credit card numbers, email addresses, and social security numbers), dates, and recurring patterns. You can also create your own custom semantic types.

Semantic Type Categories

Profiling is applied to various semantic types.

Semantic type categories are profiled to identify:

  • Geographic locations such as city names.
  • Patterns such as those found in credit card numbers or email addresses.
  • Recurring patterns such as hyphenated phrase data.

Semantic Type Recommendations

Recommendations to repair, enhance, or enrich the dataset are determined by the type of data.

Examples of semantic type recommendations:

  • Enrichments - Adding a new column to your data that corresponds to a specific detected type, such as a geographic location. For example, adding population data for a city.
  • Column Concatenations - When two columns are detected in the dataset, one containing first names and the other containing last names, the system recommends concatenating the names into a single column. For example, a first_name_last_name column.
  • Semantic Extractions - When a semantic type is composed of subtypes, for example a us_phone number that includes an area code, the system recommends extracting the subtype into its own column.
  • Part Extraction - When a generic pattern separator is detected in the data, the system recommends extracting parts of that pattern. For example, if the system detects repeated hyphenation in the data, it recommends extracting the parts into separate columns to potentially make the data more useful for analysis.
  • Date Extractions - When dates are detected, the system recommends extracting parts of the date that might augment the analysis of the data. For example, you might extract the day of week from an invoice or purchase date.
  • Full and Partial Obfuscation/Masking/Delete - When sensitive fields are detected such as a credit card number, the system recommends a full or partial masking of the column, or even removal.
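To make the recommendation types above concrete, here is a minimal Python sketch of what four of them do to a single value. It is not Oracle Analytics code: the function names, the space used to join names, and the default of four visible characters are all assumptions chosen for illustration.

```python
from datetime import date

# Day names are listed explicitly so the result does not depend on locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def concatenate_names(first_name: str, last_name: str) -> str:
    """Column concatenation: join a first name and a last name."""
    return f"{first_name} {last_name}"


def extract_parts(value: str, separator: str = "-") -> list:
    """Part extraction: split a value on a repeating separator."""
    return value.split(separator)


def extract_day_of_week(d: date) -> str:
    """Date extraction: derive the day of week from a date."""
    return DAY_NAMES[d.weekday()]


def mask_partial(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Partial masking: hide everything except the last `visible` characters."""
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]
```

For instance, `extract_parts("A-B-C")` yields three parts that could each become a column, and `mask_partial` applied to a 16-digit card number leaves only the last four digits readable.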

Recognized Pattern-Based Semantic Types

Semantic types are identified based on patterns found in your data.

Recommendations are provided for these semantic types:

  • Dates (in more than 30 formats)
  • US Social Security Numbers (SSN)
  • Credit Card Numbers
  • Credit Card Attributes (CVV and Expiration Date)
  • Email Addresses
  • North American Numbering Plan (NANP) Phone Numbers
  • US Addresses
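As a rough illustration of pattern-based recognition, the sketch below classifies a single value with regular expressions. The patterns are deliberately simplified and hypothetical; the rules that the Oracle Analytics profiler actually applies are not described in this topic, and the type names used as dictionary keys are assumptions.

```python
import re

# Simplified, hypothetical patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "nanp_phone": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
}


def detect_type(value: str):
    """Return the name of the first pattern that matches the whole value."""
    candidate = value.strip()
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(candidate):
            return name
    return None  # no pattern matched
```

Using `fullmatch` rather than `search` means a value is only classified when the entire string fits the pattern, which avoids labeling free text that merely contains an email address or phone number.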

Reference-Based Semantic Types

Recognition of semantic types is determined by loaded reference knowledge provided with the service.

Reference-based recommendations are provided for these semantic types:

  • Country names
  • Country codes
  • State names (Provinces)
  • State codes
  • County names (Jurisdictions)
  • City names (Localized Names)
  • Zip codes

Recommended Enrichments

Recommended enrichments are based on the semantic types.

Enrichments are determined based on the geographic location hierarchy:

  • Country
  • Province (State)
  • Jurisdiction (County)
  • Longitude
  • Latitude
  • Population
  • Elevation (in Meters)
  • Time zone
  • ISO country codes
  • Federal Information Processing Series (FIPS)
  • Country name
  • Capital
  • Continent
  • GeoNames ID
  • Languages spoken
  • Phone country code
  • Postal code format
  • Postal code pattern
  • Currency name
  • Currency abbreviation
  • Geographic top-level domain (GeoTLD)
  • Area (in Square Kilometers)
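Conceptually, a reference-based enrichment is a lookup: a column value is matched against reference knowledge, and an attribute of the matched entry becomes a new column. The sketch below shows that idea with a tiny, hand-written reference table. The table, the `enrich` function, and the naming of the new column are assumptions for illustration and do not reflect how the service stores or applies its reference knowledge.

```python
# Tiny illustrative reference table keyed by lowercase country name.
REFERENCE = {
    "france": {"iso_code": "FR", "capital": "Paris", "continent": "Europe"},
    "japan": {"iso_code": "JP", "capital": "Tokyo", "continent": "Asia"},
    "canada": {"iso_code": "CA", "capital": "Ottawa",
               "continent": "North America"},
}


def enrich(rows, column, attribute):
    """Return copies of `rows` with a new '<column>_<attribute>' column.

    Values that are not found in the reference table get None.
    """
    new_column = f"{column}_{attribute}"
    enriched = []
    for row in rows:
        match = REFERENCE.get(str(row[column]).strip().lower())
        enriched.append({**row, new_column: match[attribute] if match else None})
    return enriched
```

Calling `enrich([{"country": "France"}], "country", "capital")` returns a row with an added `country_capital` value of `"Paris"`, while an unrecognized country would receive `None`.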

Required Thresholds

The profiling process uses thresholds to decide whether a column is classified as a specific semantic type.

As a general rule, 85% of the data values in the column must meet the criteria for a single semantic type in order for the system to make the classification determination. As a result, a column that contains 70% first names and 30% “other” values doesn't meet the threshold requirements, and therefore no recommendations are made.
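The threshold rule can be expressed as a simple ratio check. The following is a minimal sketch, assuming that every value in the column counts toward the total and that exactly 85% is sufficient; how the profiler treats nulls or sampled data is not stated here. The `FIRST_NAMES` set and the function name are hypothetical.

```python
# Hypothetical reference set of first names, for illustration only.
FIRST_NAMES = {"Ana", "Luis", "Mei"}


def meets_threshold(values, is_match, threshold=0.85):
    """Return True when the share of matching values reaches the threshold."""
    if not values:
        return False
    matches = sum(1 for value in values if is_match(value))
    return matches / len(values) >= threshold


def is_first_name(value):
    return value in FIRST_NAMES


# 70% first names and 30% other values: below the 85% threshold.
mixed_column = ["Ana"] * 70 + ["n/a"] * 30
# 90% first names: above the threshold.
clean_column = ["Luis"] * 90 + ["n/a"] * 10
```

With these inputs, `meets_threshold(mixed_column, is_first_name)` is `False`, matching the 70/30 example above, and `meets_threshold(clean_column, is_first_name)` is `True`.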

Custom Knowledge Recommendations

Use custom knowledge recommendations to augment the Oracle Analytics system knowledge. Custom knowledge enables the Oracle Analytics semantic profiler to identify more business-specific semantic types and make more relevant and governed enrichment recommendations. For example, you might add a custom knowledge reference that classifies prescription medication into USP drug categories such as Analgesics or Opioid.

Ask your administrator to upload custom knowledge files to Oracle Analytics. When you enrich datasets, Oracle Analytics presents enrichment recommendations based on this semantic data. When you create workbooks, you can also include knowledge enrichments in your visualizations by adding them from the Data Panel.

Creating Your Own Custom Knowledge Files

When you create custom knowledge files, follow these guidelines:

  • Create a data file in CSV or Microsoft Excel (XLSX) format. The maximum file size you can upload is 250 MB.
  • Populate the first column with the key, which Oracle Analytics uses to profile the data. For example, the key might be a date with the grain of day to enable data to be analyzed by fiscal year.
  • Populate the other columns with the enrichment values.

Ask your administrator to upload your custom knowledge file to Oracle Analytics.
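The guidelines above can be sketched in Python using the standard `csv` module: the key goes in the first column and the enrichment values follow. The column names, the sample rows, and the helper names are hypothetical, and the rows are illustrative rather than an authoritative USP mapping. The size check assumes that 250 MB means 250 × 1024 × 1024 bytes, which this topic doesn't specify.

```python
import csv
import io

MAX_UPLOAD_BYTES = 250 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def build_knowledge_csv(key_column, enrichment_columns, rows):
    """Return CSV text with the key first, then the enrichment columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([key_column, *enrichment_columns])
    for row in rows:
        writer.writerow([row[key_column],
                         *(row[name] for name in enrichment_columns)])
    return buffer.getvalue()


def within_upload_limit(csv_text):
    """Check the encoded size against the assumed 250 MB limit."""
    return len(csv_text.encode("utf-8")) <= MAX_UPLOAD_BYTES


# Illustrative rows only.
sample_rows = [
    {"medication": "Ibuprofen", "usp_category": "Analgesics"},
    {"medication": "Morphine", "usp_category": "Opioid"},
]
csv_text = build_knowledge_csv("medication", ["usp_category"], sample_rows)
```

Writing `csv_text` to a `.csv` file produces a file in one of the two accepted formats, with `medication` as the key that the profiler would match against.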

Example - Integrate Business Timeframes into Your Data

This example illustrates how you can add business timeframes into sales data and enable the analysis of sales by fiscal year if the original dataset doesn't contain fiscal data.

The example visualization shows sales by quarter in years 2019, 2020, 2021, 2022, and 2023, where each year is represented in a different color. You don't have fiscal data in your source sales data, so you deploy additional custom knowledge to add fiscal data to your dataset.

First, you prepare fiscal data in a file named Fiscal Calendar.xlsx. Your file contains date (mm-dd-yyyy), fiscal year, fiscal month, and fiscal week. For example, your source file could have 01-23-2025 in the date column, 2025 in the fiscal year column, and the corresponding values in the remaining columns.
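If you need to generate the rows for such a file, the sketch below shows one way to do it in Python. The fiscal rules are assumptions made for illustration: the fiscal year starts on the first day of a configurable month, it is labeled with the calendar year in which it ends, and fiscal weeks are counted in seven-day blocks from the first day of the fiscal year. Your organization's fiscal calendar might follow different rules. The resulting rows can be saved as CSV or XLSX with a tool of your choice.

```python
from datetime import date, timedelta


def fiscal_row(d, fiscal_start_month=1):
    """Build one calendar row keyed by a date formatted as mm-dd-yyyy."""
    # Calendar year in which the current fiscal year began.
    start_year = d.year if d.month >= fiscal_start_month else d.year - 1
    fiscal_year_start = date(start_year, fiscal_start_month, 1)
    # Assumption: label the fiscal year by the calendar year in which it ends.
    fiscal_year = start_year if fiscal_start_month == 1 else start_year + 1
    return {
        "date": d.strftime("%m-%d-%Y"),
        "fiscal_year": fiscal_year,
        "fiscal_month": (d.month - fiscal_start_month) % 12 + 1,
        "fiscal_week": (d - fiscal_year_start).days // 7 + 1,
    }


def fiscal_calendar(first_day, last_day, fiscal_start_month=1):
    """Return one row per day from first_day through last_day, inclusive."""
    rows = []
    current = first_day
    while current <= last_day:
        rows.append(fiscal_row(current, fiscal_start_month))
        current += timedelta(days=1)
    return rows
```

With the default January start, `fiscal_row(date(2025, 1, 23))` produces `01-23-2025` in the date column and `2025` as the fiscal year, in line with the example row described above.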

Ask your administrator to upload Fiscal Calendar.xlsx to the custom knowledge area in Console.

You then create a dataset containing Sales and ORDER_DATE, and in the dataset editor select Enrich ORDER_DATE with Fiscal Year and Enrich ORDER_DATE with Fiscal Month in the enrichment recommendations. Oracle Analytics adds these two enrichments to the dataset.

Finally, you create a workbook and add Fiscal Year and Fiscal Qtr (under ORDER_DATE) and Sales to a visualization. Note: You can add Fiscal Year and Fiscal Qtr directly without having to add the original ORDER_DATE column.