Enrich and Transform Reference

Use this reference information to help you enrich and transform your data.

Transform Reference

Find out about the data transform options that you can access in the transform editor by right-clicking on a dataset column. For example, to categorize racing lap times in a dataset column, you might right-click the 'Lap Time' column and select Bin.

Option Description
Bin Creates your own custom groups for number ranges. For example, you can create bins for an Age column with age ranges binned into Pre-Teen, Young Adult, Adult, or Senior based on custom requirements.
Convert to Date Changes the data type of the column to date and deletes any values that aren’t dates from the column.
Convert to Number Changes the data type of the column to number, which deletes any values that aren't numbers from the column.
Convert to Text Changes the data type of a column to text.
Create Creates a column based on a function.
Duplicate Creates a column with identical content of the selected column.
Edit Edits the column. For example, you can change the name, select another column, or update functions.
Group, Conditional Group Select Group to create your own custom groups. For example, you can group States together with custom regions, and you can categorize dollar amounts into groups indicating small, medium, and large.
Hide Hides the column in the Data Panel and in the visualizations. If you want to see the hidden columns, click Hidden columns (ghost icon) on the page footer. You can then unhide individual columns or unhide all the hidden columns at the same time.
Log Calculates the natural logarithm of an expression.
Lowercase Updates the contents of a column with the values all in lowercase letters.
Power Raises the values of a column to the power that you specify. The default power is 2.
Rename Allows you to change the name of any column.
Replace Changes specific text in the selected column to any value that you specify. For example, you can change all instances of Mister to Mr. in the column.
Sentence Case Updates the contents of a column to make the first letter of the first word of a sentence uppercase.
Split Splits a specific column value into parts. For example, you can split a column called, Name, into first and last name.
Square Root Creates a column populated with the square root of the value in the column selected.
Uppercase Updates the contents of a column with the values in all uppercase letters.

Data Profiles and Semantic Recommendations

When you create a dataset, Oracle Analytics performs column-level profiling to produce a set of semantic recommendations to repair or enrich your data. When you create workbooks, you can also include knowledge enrichments in your visualizations by adding them from the Data Panel.

These recommendations are based on the system automatically detecting a specific semantic type during the profile step. For example, datasets based on local subject areas are profiled using a simple Top N sample.

There are categories of semantic types such as geographic locations identified by city names, recognizable patterns as in credit cards, email addresses and social security numbers, dates, and recurring patterns. You can also create your own custom semantic types.

Semantic Type Categories

Profiling is applied to various semantic types.

Semantic type categories are profiled to identify:

  • Geographic locations such as city names.
  • Patterns such as those found with credit cards numbers or email addresses.
  • Recurring patterns such as hyphenated phrase data.

Semantic Type Recommendations

Recommendations to repair, enhance, or enrich the dataset, are determined by the type of data.

Examples of semantic type recommendations:

  • Enrichments - Adding a new column to your data that corresponds to a specific detected type, such as a geographic location. For example, adding population data for a city.
  • Column Concatenations - When two columns are detected in the dataset, one containing first names and the other containing last names, the system recommends concatenating the names into a single column. For example, a first_name_last_name column.
  • Semantic Extractions - When a semantic type is composed of subtypes, for example a us_phone number that includes an area code, the system recommends extracting the subtype into its own column.
  • Part Extraction - When a generic pattern separator is detected in the data, the system recommends extracting parts of that pattern. For example if the system detects a repeating hyphenation in the data, it recommends extracting the parts into separate columns to potentially make the data more useful for analysis.
  • Date Extractions - When dates are detected, the system recommends extracting parts of the date that might augment the analysis of the data. For example, you might extract the day of week from an invoice or purchase date.
  • Full and Partial Obfuscation/Masking/Delete - When sensitive fields are detected such as a credit card number, the system recommends a full or partial masking of the column, or even removal.

Recognized Pattern-Based Semantic Types

Semantic types are identified based on patterns found in your data.

Recommendations are provided for these semantic types:

  • Dates (in more than 30 formats)
  • US Social Security Numbers (SSN)
  • Credit Card Numbers
  • Credit Card Attributes (CVV and Expiration Date)
  • Email Addresses
  • North American Plan Phone Numbers
  • US Addresses

Reference-Based Semantic Types

Recognition of semantic types is determined by loaded reference knowledge provided with the service.

Reference-based recommendations are provided for these semantic types:

  • Country names
  • Country codes
  • State names (Provinces)
  • State codes
  • County names (Jurisdictions)
  • City names (Localized Names)
  • Zip codes

Recommended Enrichments

Recommended enrichments are based on the semantic types.

Enrichments are determined based on the geographic location hierarchy:

  • Country
  • Province (State)
  • Jurisdiction (County)
  • Longitude
  • Latitude
  • Population
  • Elevation (in Meters)
  • Time zone
  • ISO country codes
  • Federal Information Processing Series (FIPS)
  • Country name
  • Capital
  • Continent
  • GeoNames ID
  • Languages spoken
  • Phone country code
  • Postal code format
  • Postal code pattern
  • Phone country code
  • Currency name
  • Currency abbreviation
  • Geographic top-level domain (GeoLTD)
  • Square KM

Required Thresholds

The profiling process uses specific thresholds to deciside about specific semantic types.

As a general rule, 85% of the data values in the column must meet the criteria for a single semantic type in order for the system to make the classification determination. As a result, a column that might contain 70% first names and 30% “other”, doesn't meet the threshold requirements and therefore no recommendations are made.

Custom Knowledge Recommendations

Use custom knowledge recommendations to augment the Oracle Analytics system knowledge. Custom knowledge enables the Oracle Analytics semantic profiler to identify more business-specific semantic types and make more relevant and governed enrichment recommendations. For example, you might add a custom knowledge reference that classifies prescription medication into USP drug categories Analgesics or Opioid.

Tutorial icon Tutorial

You can use existing semantic files such as Unsupervised Semantic Parsing (USP) files, or you can create your own semantic files. Ask your administrator to upload custom knowledge files to Oracle Analytics. When you enrich datasets, Oracle Analytics presents enrichment recommendations based on this semantic data. When you create workbooks, you can also include knowledge enrichments in your visualizations by adding them from the Data Panel.

Creating Your Own Custom Knowledge Files

When you create your own semantic files, follow these guidelines:

  • Create a data file in CSV or Microsoft Excel (XLSX) format.
  • Populate the first column with the key, which Oracle Analytics uses to profile the data.
  • Populate the other columns with the enrichment values.

Ask your administrator to upload your custom knowledge file to Oracle Analytics.

General Custom Format Strings

You can use general custom format strings to create custom time or date formats.

The table shows the general custom format strings and the results that they display. These allow the display of date and time fields in the user's locale.

General Format String Result

[FMT:dateShort]

Formats the date in the locale's short date format. You can also type [FMT:date].

[FMT:dateLong]

Formats the date in the locale's long date format.

[FMT:dateInput]

Formats the date in a format acceptable for input back into the system.

[FMT:time]

Formats the time in the locale's time format.

[FMT:timeHourMin]

Formats the time in the locale's time format but omits the seconds.

[FMT:timeInput]

Formats the time in a format acceptable for input back into the system.

[FMT:timeInputHourMin]

Formats the time in a format acceptable for input back into the system, but omits the seconds.

[FMT:timeStampShort]

Equivalent to typing [FMT:dateShort] [FMT:time]. Formats the date in the locale's short date format and the time in the locale's time format. You can also type [FMT:timeStamp].

[FMT:timeStampLong]

Equivalent to typing [FMT:dateLong] [FMT:time]. Formats the date in the locale's long date format and the time in the locale's time format.

[FMT:timeStampInput]

Equivalent to [FMT:dateInput] [FMT:timeInput]. Formats the date and the time in a format acceptable for input back into the system.

[FMT:timeHour]

Formats the hour field only in the locale's format, such as 8 PM.

YY or yy

Displays the last two digits of the year, for example 11 for 2011.

YYY or yyy

Displays the last three digits of the year, for example, 011 for 2011.

YYYY or yyyy

Displays the four-digit year, for example, 2011.

M

Displays the numeric month, for example, 2 for February.

MM

Displays the numeric month, padded to the left with zero for single-digit months, for example, 02 for February.

MMM

Displays the abbreviated name of the month in the user's locale, for example, Feb.

MMMM

Displays the full name of the month in the user's locale, for example, February.

D or d

Displays the day of the month, for example, 1.

DD or dd

Displays the day of the month, padded to the left with zero for single-digit days, for example, 01.

DDD or ddd

Displays the abbreviated name of the day of the week in the user's locale, for example, Thu for Thursday.

DDDD or dddd

Displays the full name of the day of the week in the user's locale, for example, Thursday.

DDDDD or ddddd

Displays the first letter of the name of the day of the week in the user's locale, for example, T for Thursday.

r

Displays the day of year, for example, 1.

rr

Displays the day of year, padded to the left with zero for single-digit day of year, for example, 01.

rrr

Displays the day of year, padded to the left with zero for single-digit day of year, for example, 001.

w

Displays the week of year, for example, 1.

ww

Displays the week of year, padded to the left with zero for single-digit weeks, for example, 01.

q

Displays the quarter of year, for example, 4.

h

Displays the hour in 12-hour time, for example 2.

H

Displays the hour in 24-hour time, for example, 23.

hh

Displays the hour in 12-hour time, padded to the left with zero for single-digit hours, for example, 01.

HH

Displays the hour in 24-hour time, padded to the left with zero for single digit hours, for example, 23.

m

Displays the minute, for example, 7.

mm

Displays the minute, padded to the left with zero for single-digit minutes, for example, 07.

s

Displays the second, for example, 2.

You can also include decimals in the string, such as s.# or s.00 (where # means an optional digit, and 0 means a required digit).

ss

Displays the second, padded to the left with zero for single-digit seconds, for example, 02.

You can also include decimals in the string, such as ss.# or ss.00 (where # means an optional digit, and 0 means a required digit).

S

Displays the millisecond, for example, 2.

SS

Displays the millisecond, padded to the left with zero for single-digit milliseconds, for example, 02.

SSS

Displays the millisecond, padded to the left with zero for single-digit milliseconds, for example, 002.

tt

Displays the abbreviation for ante meridiem or post meridiem in the user's locale, for example, pm.

gg

Displays the era in the user's locale.