Using Data Flow Operators

In Data Integration, data flow operators represent input sources, output targets, and transformations that can be used in a data flow.

From the Operators panel, drag and drop operators onto the canvas to design a data flow. Then use the Details tab on the Properties panel to configure basic and required properties for each operator. Where applicable, use the Advanced options tab to specify other properties.

In general, a data flow operator can have one or more inbound ports, and one or more outbound ports for data to flow through. For example, you can connect the same source outbound port to inbound ports on a filter, a join, and a target operator. You can also connect another source operator to the same join inbound port.

Your data flow must include at least one source operator and one target operator to be valid. While Data Integration supports multiple target operators in a data flow, a target operator can have only one inbound port.

To connect operators, hover over an operator until you see the connector (small circle) on the right side of the operator. Then drag and drop the connector to the next operator you want to connect to. A connection is valid when a line connects the operators after you drop the connector.

A connection line symbolizes how data flows from one node to another. While you might be able to drag a visible connector from one object to another, you cannot have more than one inbound connection line to a filter, expression, aggregate, distinct, sort, or target operator.

Important

For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.

Data Operators

Data Integration provides the source operator and the target operator for adding the input data entities that feed a data flow and the output data entities that store the transformed data.

Selecting a data asset, connection, and schema

To configure the input or output data entity for a data operator in a data flow, you begin by selecting a data asset, connection, and schema.

You make the selections only in the order displayed on the Details tab of the Properties panel, by clicking Select when the option is enabled next to the resource type.

For example, when you first add a source operator, only Data asset is enabled for selection. The select option for the subsequent resource type, Connection, is enabled only after you make a selection for the preceding object.

When you click Select next to a resource, you are presented with a panel to select the object you want. You can use the resource menu to select the object, or click View all to use another panel to browse or search, and then select the object.

Upon selecting a resource object, each subsequent selection is based on the parent-child relation inherited from the preceding selection. A breadcrumb at the top of the selection panel shows the selection hierarchy. For example, assume that you have selected the data asset "Object Storage data asset 1" and the connection "Default connection." When you select the Schema, the breadcrumb displays as "From Object Storage data asset 1 using connection Default connection."

Selecting a data entity

After selecting a data asset, connection, and schema (or bucket), you select a data entity from the list of available data entities.

In general, you can do the following in the Browse data entities panel:

  • Browse the available data entities and select an entity by its name.

  • Filter the available list to search and then select. In the search field, enter a partial or full entity name and press Enter to start the search. Search is case-sensitive. For example, if the available data entities include BANK_US and BANK_EU, enter BANK and then select from the filtered list.

  • Use one or more parameters in the search string. For example: CUSTOMERS_${COUNTRY}

    To use a parameterized data entity name to select the input or output resource, see Using Parameters in Data Entity Names.

Depending on the data asset type of your resource, after selecting a data entity, more configuration might be necessary on the Details or Advanced options tab of the Properties panel.

Assigning parameters to resource objects

To keep a resource object such as a data asset or connection in a data flow from being permanently bound to a specific resource, you assign a parameter to that object.

You can assign the parameter before or after you select the resource object.

  • In the data flow, add the source or target operator. After making a selection for a resource type, click Assign parameter next to the resource name to use another panel to select and assign a parameter for the selected object. If a suitable parameter type is not available, you can add a parameter and then assign it.

    Note

    Assign parameter is not available for a data entity whose name includes a parameter (such as BANK_${REGION}). You cannot bind a parameterized entity resource to another parameter.
  • Alternatively, add the source or target operator, then click Assign parameter to use one panel to assign a parameter, and select the resource for the object at the same time. In the Add parameter panel, you select a resource based on the parent-child relation inherited from the preceding selection. The breadcrumb at the top of the panel shows the selection hierarchy.

See also Using Data Flow Parameters.

Using parameters in data entity names

You can include one or more data flow parameters in the name of the data entity resource you specify for a data operator.

The syntax to use for a data flow parameter in a data entity name is ${PARAMETER_NAME}. For example: CUSTOMERS_${COUNTRY}

A parameter name is case-sensitive, and each parameter must have a default value.

For example, CUSTOMERS_${COUNTRY} might return the database table CUSTOMERS_USA, and BANK_${COUNTRY}/* might return the Object Storage files in BANK_EU.
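Data Integration resolves the placeholders at run time. Because Python's `string.Template` uses the same ${NAME} syntax, it can serve as a quick local illustration of the substitution; the parameter values below are hypothetical examples, not product defaults:

```python
from string import Template

def resolve_entity_name(name: str, params: dict) -> str:
    # substitute() raises KeyError for a missing value, which mirrors
    # the requirement that every parameter must have a default value.
    return Template(name).substitute(params)

print(resolve_entity_name("CUSTOMERS_${COUNTRY}", {"COUNTRY": "USA"}))
# CUSTOMERS_USA
print(resolve_entity_name("BANK_${COUNTRY}/*", {"COUNTRY": "EU"}))
# BANK_EU/*
```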

To use parameters in data entity names when you configure a data operator, you can:

  • Add the parameter to the data flow before you add the data operator
  • Add the parameter at the time you are configuring the operator's data entity

How to add a parameter

In your data flow, select Parameters from the View menu on the canvas toolbar to open the Parameters panel.

In the Parameters panel, click Config and then click Add.

In the Add parameter panel, configure a parameter of the appropriate data type, for example VARCHAR or NUMERIC, and add a default value.

During data entity configuration in a data flow, you can search available data entities by entering the parameter name in the Browse data entities panel. In the search field, start typing ${ followed by any character. If the front part of the parameter name matches existing parameters in the data flow, a list of suggested names displays. You can then select a parameter from the list.

How to add a parameter at the time of data entity configuration

In the Browse data entities panel, you can do the following:

  • From the More actions menu, select Add data flow parameter to use the Add data flow parameter panel. Specify the data type, default value, and other properties for the parameter you want to add and use.

  • In the search field, start typing ${ followed by any character. If the front part of the parameter name matches existing parameters in the data flow, a list of suggested names displays. You can then select a parameter from the list.

  • In the search field, enter the parameter name. If the parameter name does not yet exist in the data flow, and you press Enter, Data Integration displays the Add data flow parameter panel. Alternatively, after entering the parameter name, select Add data flow parameter from the More actions menu.

    In the Add data flow parameter panel, specify the data type, default value, and other properties for the parameter you want to add and use.

Source Operator

Use the source operator to specify the data entities that serve as the input to your data flow.

Adding and configuring a source operator

You can add multiple source operators to a data flow.

Important

If you are using a hierarchical data entity for a source operator, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab or Data tab in the Properties panel.
  1. From the Operators panel, drag and drop a Source operator onto the canvas.
  2. With the source operator in focus, on the Details tab of the Properties panel, enter a name for the source operator in the Identifier field, or leave the default name as-is.
  3. To configure the input source for this operator, in the resource order as presented under the identifier, click Select next to the resource to select the Data asset, Connection, and Schema, as described in Selecting a data asset, connection, and schema.

    For Schema, note the following:

    • For an Object Storage data asset, select the Compartment first and then select the Bucket (schema).
    • For a BICC Oracle Fusion Applications data asset, select the BICC offering as the schema. The BICC offering you select provides the BICC view objects (VO) for data extraction.

    After selecting the data asset, connection, and schema (or bucket), select a data entity from the list of available entities. How you select a data entity depends on the data asset type for this source operator. See step 4 or step 5.

  4. To select a data entity that is not from an Object Storage data asset, note the following:
    • In the Select data entity panel, where applicable you can:

      • Use the Data entity menu to select a data entity by its name.

      • Click Browse data entities to use another panel to browse or search the available list, and then select a data entity.

        See Selecting a Data Entity, if you need help with searching the list by using a full or partial name.

        You can include one or more data flow parameters in the name of a selected data entity, by using the syntax ${PARAMETER_NAME}. For example: BANK_${REGION}

        See Using Parameters in Data Entity Names, if you need help with adding and using parameters when configuring entity names.

    • For a BICC Oracle Fusion Applications data asset, select a BICC VO as the data entity. Then go to step 6.

    • For a BIP Oracle Fusion Applications data asset, select a report, and specify the staging location. Then go to step 7.

    • For a database data asset, you can click Enter custom SQL to enter a single SQL statement in the editor.

      By providing a SQL query statement, you can define a database source entity and the shape of the entity at the same time. For example, you can define and filter the data in a SQL statement instead of adding a filter operator after you define the entity to use for the source operator.

      After entering your SQL query, click Validate to check your statement for errors.

    • After selecting a data entity, the name of the selected data entity is displayed next to Data entity on the Details tab in the Properties panel.

      • If you provided a SQL statement, the label SQL_ENTITY<nnnnnnnnn> is displayed, for example: SQL_ENTITY123456789

      • If you used a parameter in the entity name, the label includes the parameter name, for example: BANK_${REGION}

        Note

        For a data entity that includes a parameter in the name, you cannot bind the entity resource to another parameter. The Assign parameter option is not available.

  5. To select a data entity that is from an Object Storage data asset:
    1. Use the Select data entity panel to select a single entity by name, or select one or more entities by using a pattern expression.
      • Click Browse by name to use another panel to browse or search the available list, and then select one data entity by name.

        See Selecting a Data Entity, if you need help with searching the list by using a full or partial name.

        You can include one or more data flow parameters in the name of a selected data entity, by using the syntax ${PARAMETER_NAME}. For example: BANK_${REGION}.csv

        See Using Parameters in Data Entity Names, if you need help with adding and using parameters when configuring a data entity by name.

      • Click Browse by pattern to use a regular expression (including wildcards) to select one or more entities that match a specific directory and file pattern. For example, enter CUSTOMERS* in the field and press Enter to filter the list. See Testing and Using a Pattern.

        You can also include parameters in the pattern expression, by using the syntax ${PARAMETER_NAME}. For example: CUSTOMERS_${REGION}/*

        See Using Parameters in Data Entity Names, if you need help with adding and using parameters when configuring data entity names by pattern.

        Note

        When you use a pattern expression, all existing files that match the pattern are assumed to have the same structure. The matched files are treated as a single entity in the data flow. Any future new files that match the pattern are also processed.

    2. After making your data entity selection by name or by pattern, complete the entity configuration in the Select data entity panel.

      How you complete the configuration depends on the File type you choose for the selected Object Storage data entity.

      • Select the File type. Your choices are: CSV, JSON, PARQUET, AVRO

        Complete the rest of the fields if your selected file type is CSV.

      • The default Encoding for CSV and JSON is UTF-8, which cannot be changed.

      • Select the Compression type (compression method) that is used for your file type.

        For CSV and JSON file types, your choices are: Auto (Default), Gzip, Bzip2, Deflate, Lz4, and Snappy. If you do not know the compression method that is used to compress the file, select Auto (Default). For PARQUET and AVRO, only Auto (Default) is available.

      • If you used Browse by name to select the data entity and your selected file type is CSV, you can click Preview raw data to display the first 10 lines of the file.

    3. After completing the data entity configuration, click Select.

      You are brought back to the Details tab in the Properties panel. The name of the selected data entity or entity pattern expression is displayed next to Data entity. For example, CUSTOMERS_EU.csv, CUSTOMERS*, or CUSTOMERS_${REGION}/*

      Note

      For a data entity that includes a parameter in the name, you cannot bind the entity resource to another parameter. The Assign parameter option is not available.

  6. For an Oracle Fusion Applications data asset with a BICC connection, complete the following on the Details tab:
    1. Select one of the following Extract strategy options.
      • Full: Extracts and loads all data from the optional Initial extract date that you specify. If you don't provide a date in Initial extract date, then all data is extracted.
      • Incremental: Extracts and loads only data that is created or modified from the Last extract date that you must specify.
    2. Select a different BICC external storage configuration for the extracted data, if you don't want to use the Data asset default. The Data asset default is the BICC external storage that is configured within the BICC Fusion Applications data asset when the asset was created or edited.
    3. Select Enable BI broker if you want the BICC extractor to read input rows directly from the Fusion Applications database.
  7. For an Oracle Fusion Applications data asset with a BIP connection, complete the following on the Advanced options tab:

    Enable row limit: Specify the maximum number of rows to retrieve. After you select the Enable row limit check box, enter the following details:

    • In the Row limit field, specify the row limit.
    • If you expect a large data volume, specify the size in the Chunk size field under Enable chunking. If the specified row limit is greater than 100,000, specifying the chunk size is mandatory.
    • The BIP properties section displays the properties from the report you selected as the data entity. Select a property for each of the Number of rows returned and Starting offset fields. You cannot select the same property in both fields.
  8. (Optional) On the Details tab, select the refresh icon next to the selected entity name to ensure that you are using the entity's latest schemas during the current working session in the designer. Only the metadata of the entity is fetched, not the actual data changes.
  9. (Optional) On the Advanced options tab, depending on the data asset type for this source operator, you can select or clear the applicable options.
    • Fetch file metadata as attributes: For Object Storage and Amazon S3 (Simple Storage Service), select the check box to include file metadata (such as file name and file size) as attributes in the returned data.
    • Allow schema drift: By default, schema drift is enabled so that Data Integration dynamically handles schema definition changes during data flow design time and runtime. Schema drift refers to changes to the schema definitions in the specified data entity, such as attributes that are added or removed, and attribute data types or entity names that are changed.

      If you clear the Allow schema drift check box, you disable schema drift to lock the schema definitions when the data flow is defined. When schema drift is disabled, Data Integration uses a fixed shape of the specified data entity even when the underlying shape has changed.

    • Allow pushdown: By default, pushdown is enabled to let Data Integration use the source database system to process transformation operations, if the system supports pushdown. Only Oracle databases, Oracle Autonomous Data Warehouse, Oracle Autonomous Transaction Processing, and MySQL support pushdown. Less data is pulled from the source when Data Integration uses the source database system to process operations.

      By clearing the Allow pushdown check box, you disable pushdown. When pushdown is disabled, Data Integration pulls all data directly from the source system and then processes the transformations.

  10. In the Attributes tab, view the attributes from the data entity you selected. For a BICC Fusion Applications data asset, the attributes are the BICC columns from the VO you selected.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  11. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    For a BICC Fusion Applications data asset, only 10 rows of data are shown.

    The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified form, click the complex data value shown, for example, [...] or {...}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  12. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
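The name-pattern filter in step 10 uses shell-style wildcards (? and *). A minimal local analogue of that filtering, using Python's `fnmatch` with hypothetical attribute names:

```python
from fnmatch import fnmatchcase

# Hypothetical attribute names; fnmatchcase is case-sensitive, and the
# pattern *_CODE keeps only names that end in _CODE.
attributes = ["ZIP_CODE", "COUNTRY_CODE", "CUSTOMER_NAME"]
filtered = [a for a in attributes if fnmatchcase(a, "*_CODE")]
print(filtered)  # ['ZIP_CODE', 'COUNTRY_CODE']
```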
Using file patterns

When configuring Oracle Object Storage as a data source, you can use a regular expression to specify a file pattern for selecting one or more data entities.

A file pattern is a rule that identifies the files matching a directory and file name, and specifies how the matched files are handled when found.

Syntax to use

Data Integration supports the glob pattern syntax for specifying a file pattern.

  • An asterisk, *, matches any number of characters (including none).
  • Two asterisks, **, work like * but cross directory boundaries to match complete paths.
  • A question mark, ?, matches exactly one character.
  • Braces specify a collection of subpatterns. For example:
    • {sun,moon,stars} matches "sun", "moon", or "stars".
    • {temp*,tmp*} matches all strings beginning with "temp" or "tmp".
  • Square brackets convey a set of single characters or, when the hyphen character (-) is used, a range of characters. For example:
    • [aeiou] matches any lowercase vowel.
    • [0-9] matches any digit.
    • [A-Z] matches any uppercase letter.
    • [a-z,A-Z] matches any uppercase or lowercase letter.

    Within the square brackets, *, ?, and \ match themselves.

  • All other characters match themselves.
  • To match *, ?, or the other special characters, you can escape them by using the backslash character, \. For example: \\ matches a single backslash, and \? matches the question mark.
Examples
  • *.html: Matches all strings that end in .html
  • ???: Matches all strings with exactly three letters or digits
  • *[0-9]*: Matches all strings containing a numeric value
  • *.{htm,html,pdf}: Matches any string ending with .htm, .html, or .pdf
  • a?*.java: Matches any string beginning with a, followed by at least one letter or digit, and ending with .java
  • {foo*,*[0-9]*}: Matches any string beginning with foo or any string containing a numeric value
  • directory1/20200209/part-*[0-9]*json: Matches all files in the folder where the file name starts with part-, contains any number of digits, and ends with json
  • directory3/**.csv: Matches all files with an extension of csv that are in the folder directory3 and its subfolders
  • directory3/*.csv: Matches all files with an extension of csv that are only in the main folder directory3. Files in subfolders are not included.
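To experiment with these rules outside the console, the glob subset above can be translated into a regular expression. The following sketch is illustrative only (nested braces and bracket escapes are simplified) and is not the service's actual matcher:

```python
import re

def glob_to_regex(pattern: str) -> str:
    # Translate the glob subset described above into a Python regex.
    out, i, n = [], 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == '*':
            if i + 1 < n and pattern[i + 1] == '*':
                out.append('.*')      # ** matches across directory boundaries
                i += 2
            else:
                out.append('[^/]*')   # * stays within one path segment
                i += 1
        elif c == '?':
            out.append('[^/]')        # exactly one character
            i += 1
        elif c == '[':                # set or range, e.g. [0-9], copied as-is
            j = pattern.index(']', i + 1)
            out.append(pattern[i:j + 1])
            i = j + 1
        elif c == '{':                # {a,b} becomes an alternation
            j = pattern.index('}', i + 1)
            parts = pattern[i + 1:j].split(',')
            out.append('(?:' + '|'.join(glob_to_regex(p) for p in parts) + ')')
            i = j + 1
        elif c == '\\':               # \x escapes the next character
            out.append(re.escape(pattern[i + 1]))
            i += 2
        else:
            out.append(re.escape(c))  # all other characters match themselves
            i += 1
    return ''.join(out)

def glob_match(pattern: str, name: str) -> bool:
    return re.fullmatch(glob_to_regex(pattern), name) is not None
```

For example, `glob_match("directory3/*.csv", "directory3/sub/a.csv")` returns False because * does not cross the / boundary, while the ** form matches.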
Testing and using a pattern

You can test the expression to ensure that the pattern you want to use retrieves the Object Storage files for one or more data entities.

  1. In the Select data entity panel, click Browse by pattern.

  2. In the Browse data entities by pattern panel, select Test pattern from the More actions menu.

  3. In the Test pattern panel, in the Search pattern field, enter a pattern expression you want to test before using. For example, enter department1/2020/*.json to find all files with the extension .json that are in the directory department1/2020.

    Currently, you cannot use parameter syntax when testing a pattern.

  4. To test your search pattern, in the Test filenames block, provide a file name or multiple file names delimited by a new line. For example, for the pattern BANK_C*/*, the file names might be:

    BANK_CITY/part-00002-0aaf87d57fbc-c000.csv	
    BANK_COUNTRY/part-00000-a66df3ab02fd-c000.csv
  5. Click Test pattern.

    Verify that your test file names are returned in the Resulting file name block.

  6. Click Use pattern to add your pattern expression to the Browse data entities by pattern panel.

    You are returned to the Browse data entities by pattern panel. The files that match your pattern expression are displayed in the table.

  7. Click Select pattern.

    You are returned to the Select data entity panel. The pattern expression is displayed next to Data entity.

    When you use a pattern expression, all existing files that match the pattern are assumed to have the same structure. The matched files are treated as a single entity in the data flow. Any future new files that match the pattern are also processed.

Target Operator

Use the target operator to specify the data entities that serve as the output for storing transformed data.

Selecting a target data entity

For each target operator that you add to a data flow, you select a Data asset, Connection, Schema, and Data entity.

All selections are performed on the Details tab of the Properties panel for a target operator. See Selecting a data asset, connection, and schema and Selecting a data entity.

Consider also the following when specifying the data entity to use as a target. Depending on the data asset type, there might be other specifications to consider.

Output target configuration

How data is moved into the target data entity depends on the Integration strategy you choose. You can insert new records, truncate before inserting new records, append to existing data, or merge existing records.

In addition, you can use an existing data entity or create an entity to serve as the output on a target operator:

  • To create a data entity, select the Create new data entity check box. You are prompted to enter a name for the data entity that is to be created on the target you are configuring.
  • To use an existing data entity, do not select the Create new data entity check box. You are prompted to select an existing data entity after selecting the data asset, connection, and schema.
New data entity for a database

When creating a database data entity as a target output, if the entity name you provide does not exist, a data entity is created and data is inserted into the output. If the entity name you provide exists, the outcome of the operation depends on the shape of the target on the data flow and the shape of the existing entity.

A shape is based on the number and name of attributes, data types, type characteristics such as length and precision, and whether data is present. For example, data type varchar2(4000) is not the same shape as varchar2(2000) even though the attribute name on the data flow target is the same as the name of the attribute on the existing data entity.

  • If the shape of the target entity on the data flow is the same as the shape of the existing entity, the database operation:
    • Inserts data, if no data is present
    • Updates or inserts data, if data is present
  • If the shape of the target entity on the data flow is not the same as the shape of the existing entity:
    • The operation adds the attribute to the existing entity when the attribute name does not exist.

      For example, the target on the data flow has the attribute EMPLOYEE_ID NUMBER(10), but EMPLOYEE_ID NUMBER is not present in the existing entity.

    • When the attribute name is present in the existing entity, but the data type shape is not the same:
      • When attribute data is present, the database operation fails when the data shape is smaller. For example, when the target on the data flow has EMPLOYEE_COMMENTS VARCHAR2(4000) and the existing entity has EMPLOYEE_COMMENTS VARCHAR2(2000), the operation fails. When the target on the data flow has EMPLOYEE_COMMENTS VARCHAR2(2000) and the existing entity has EMPLOYEE_COMMENTS VARCHAR2(4000), the operation succeeds.

      • When attribute data is not present, the operation modifies the attribute and inserts data.

        If your target is an autonomous database, you might encounter the database error ORA-22859: invalid modification of columns when an attempt is made to modify a column object, REF, VARRAY, nested table, or LOB column type. Create a column of the preferred type and copy the column data to the new type using the appropriate type constructor.
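The outcome rules above can be summarized in a small decision sketch. This is a hypothetical simplification: shapes are reduced to a type name and length, while the real comparison also considers precision and other type characteristics.

```python
def plan_for_attribute(flow_attr, existing_attr, existing_has_data):
    """Decide the outcome for one target attribute, per the rules above.
    Attributes are (name, type, length) tuples; existing_attr is None when
    the attribute is absent from the existing entity."""
    _, ftype, flen = flow_attr
    if existing_attr is None:
        return "add attribute to existing entity"
    _, etype, elen = existing_attr
    if (ftype, flen) == (etype, elen):
        return "update or insert" if existing_has_data else "insert"
    if existing_has_data:
        # A larger flow shape cannot fit into the smaller existing column.
        return "fails" if flen > elen else "succeeds"
    return "modify attribute, then insert"

print(plan_for_attribute(("EMPLOYEE_COMMENTS", "VARCHAR2", 4000),
                         ("EMPLOYEE_COMMENTS", "VARCHAR2", 2000), True))
# fails
```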

Single output file (Object Storage)

If one of the following conditions applies, you can choose to write the output to a single file when using Object Storage as the target data entity:

  • When the integration strategy is Insert, and the Create new data entity check box is selected
  • When the integration strategy is Overwrite, and you use an existing data entity (Create new data entity check box is not selected)

When writing output to a single file, you can include parameters in the file name using the syntax ${PARAMETER_NAME}. For example, the file name can be written as:

/bucket1/file_${REGION}.csv
/bucket1/file_${CURRENT_DATE}.csv
Adding and configuring a target operator

You can add multiple target operators to a data flow. Each target can have only one inbound port.

Important

If you are using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes, Map, and Data tabs in the Properties panel.
  1. From the Operators panel, drag and drop a Target operator onto the canvas.
  2. With the target operator in focus, on the Details tab of the Properties panel, enter a name for the target operator in the Identifier field, or leave the default name as-is.
  3. Select the Integration strategy that determines how to move data into the target. You can choose from the following options:
    • Insert: Inserts new records, or appends the records when the data exists on the target.
    • Overwrite: Performs a truncate on the target before inserting new records.
    • Merge: Inserts new records, and merges existing ones. The primary key that you select for the data entity becomes the default merge key. You can click edit, and in the Merge key panel, select a unique key other than the primary key to merge your data.
    Note

    • If you select the Overwrite strategy, you cannot choose to create a new data entity for the target entity.
    • If you select the Merge strategy, you cannot use Object Storage for the target entity.
  4. (Optional) Instead of updating an existing entity on the target, you can create an entity on the target. Select the Create new data entity check box next to the Integration strategy field.

    In a later step, you are prompted to enter a name for the data entity that is to be created on the target.

  5. In the resource order as presented under the Integration strategy field, click Select next to the resource to select the data asset, connection, and schema.
    Note

    • Not all types are available as target data assets. For example, PostgreSQL and Hive are not available as targets.
    • If you selected the Merge integration strategy, you cannot select an Object Storage data asset.
    • If you are using an Object Storage data asset, select the compartment first and then select the bucket (schema).
  6. If you selected the Create new data entity check box: Enter a name for the entity that is to be created. A new data entity is created only if the name does not exist.

    See also Parameters in Data Entity Names if you want to use a parameter for part of the new entity name.

    • For Oracle Database, Autonomous Data Warehouse, or Autonomous Transaction Processing, enter the new data entity name.

      If the entity name that you provide exists, the shape of the existing entity and the target on the data flow determines the outcome of the operation.

    • For Object Storage, enter the new data entity name, followed by a forward slash (/). For example, enter newdirectory/. However, if you select the Create output as a single file check box because you want to create a single file output, enter the new entity name without the forward slash (/) at the end.

      Then select the file type and compression type. The compression types that are available for selection depend on the file type:

      • For CSV and JSON, the compression type choices are None (Default), Gzip, Bzip2, Deflate, Lz4, and Snappy. The default encoding is UTF-8, which cannot be changed.
      • For Parquet, the compression type choices are Snappy (Default), Gzip, Lz4, and None.
      • For Avro, the compression type choices are Snappy (Default), Gzip, Lz4, Bzip2, and None.
  7. If you did not select the Create new data entity check box: Select an existing data entity. Also take note of the following:
    • For Object Storage, if you select the Create output as a single file check box, select the entity name that does not have the forward slash (/) at the end.

      Then select the file type and compression type. The compression types that are available for selection depend on the file type:

      • For CSV and JSON, the compression type choices are None (Default), Gzip, Bzip2, Deflate, Lz4, and Snappy. The default encoding is UTF-8, which cannot be changed.
      • For Parquet, the compression type choices are Snappy (Default), Gzip, Lz4, and None.
      • For Avro, the compression type choices are Snappy (Default), Gzip, Lz4, Bzip2, and None.
    • To ensure that you are using the entity's latest schema during the current working session in the designer, you can select the refresh icon next to the entity name. Only the metadata of the entity is fetched, not the actual data changes.
  8. (Optional) (Single output file) If you are configuring an Object Storage target data entity, select the Create output as a single file check box if you want to use a single output file; otherwise, the operation creates multiple files. The single output file is overwritten every time the task is run.

    Creating a single output file might affect the performance of Data Integration. Do not use the single output file option for large datasets.

    If you are creating a new data entity and you select Create output as a single file, do not add the forward slash (/) at the end of the new entity name.

    If you are not creating a new data entity, the Integration strategy must be Overwrite to select Create output as a single file. Select an existing data entity that does not have the forward slash (/) at the end of the entity name.

  9. For Autonomous Data Warehouse or Autonomous Transaction Processing, select the staging location if a default location is not defined for the asset, or edit the default location.
    Note

    If your target is Autonomous Data Warehouse or Autonomous Transaction Processing, the staging location option is enabled for you to select an Object Storage bucket. Data Integration uses the staging location to stage your data before loading the data into the target.

    If your target data asset is configured with a default staging location, Data Integration automatically uses that bucket for staging unless you explicitly edit the value to specify a different staging location.

    To use Autonomous Data Warehouse or Autonomous Transaction Processing as your target, you must first enable the PAR_MANAGE request permission on the staging bucket, for example:

    allow any-user to manage buckets in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.permission='PAR_MANAGE', request.principal.id='<workspace-ocid>'}

    Learn more about pre-authenticated requests.

    1. For Staging location, click Select or Edit.
    2. (For edit) In the Select staging location panel, clear the Use default staging location settings check box if you want to select a different staging location.
    3. Click the Data asset menu and select an Object Storage data asset.
    4. Then select the Connection, Compartment, and the Schema (bucket), and click Select.
  10. (Optional) For Autonomous Data Warehouse or Autonomous Transaction Processing, enter a value for Reject limit on the Advanced options tab.

    You can specify the maximum number of erroneous rows that can fail to load into the target before the integration task fails. For example, if your data source has 1,000 rows and you set the reject limit at 200, the task fails immediately after the 200th erroneous row is rejected.

    If you don't specify a value, the default is zero, which means the task fails upon the first erroneous row being rejected.

    If the task fails, check the logs for the names of the tables where rows were rejected. Then query the affected tables and rows in the autonomous database.

  11. (Optional) Enter a value for Load order on the Advanced options tab.

    Only non-negative integers are allowed. The default is 0. The load order specifies the order of the target nodes that data is loaded into. The target with value 0 is loaded first, then the target with value 1, and so on. If multiple nodes have the same value, those targets are loaded in no particular order.
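    The load-order semantics described above can be sketched as a simple ascending sort. This is an illustrative sketch only (the target names are hypothetical, and this is not a Data Integration API); note that the service gives no guaranteed relative order for targets that share a load-order value.

    ```python
    # Hypothetical target nodes paired with their Load order values.
    targets = [("SALES_SUMMARY", 0), ("SALES_AUDIT", 1), ("SALES_RAW", 0)]

    # Load in ascending load-order value; equal values have no guaranteed
    # relative order in the service (Python's stable sort keeps input order).
    load_sequence = sorted(targets, key=lambda t: t[1])
    print([name for name, _ in load_sequence])
    # -> ['SALES_SUMMARY', 'SALES_RAW', 'SALES_AUDIT']
    ```
    
    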

  12. Under the Attributes tab, you can select individual attributes or use the filter icons to filter the list of attributes by a name pattern or by an attribute type. Then apply bulk exclude rules to the selected or filtered attributes.

    You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
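    The ? and * wildcards in the name-pattern filter behave like shell-style globs. A minimal sketch of how such a filter could be evaluated, using Python's standard fnmatch module (the attribute names are illustrative, and this is not the service's implementation):

    ```python
    import fnmatch
    import re

    attributes = ["AREA_CODE", "ZIP_CODE", "CITY", "COUNTRY_CODE"]

    # '*' matches any run of characters and '?' exactly one character,
    # as in the filter field. fnmatch.translate builds the equivalent regex.
    pattern = "*_CODE"
    regex = re.compile(fnmatch.translate(pattern))
    matched = [a for a in attributes if regex.match(a)]
    print(matched)  # -> ['AREA_CODE', 'ZIP_CODE', 'COUNTRY_CODE']
    ```
    
    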

  13. In the Data tab, view a sampling of the data, based on any rules you have applied in the Attributes tab. Transformations cannot be applied on a target operator as the data is read-only.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  14. In the Map tab, determine how the incoming attributes map to the target data entity's attributes.

    For target entities with hierarchical data types, only first-level fields can be mapped. See Hierarchical Data Types to understand what is supported.

  15. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
Viewing data

The Data tab displays the transformed data based on the operators applied in the data flow.

Important

If you are using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported.

You can filter data in the target entity by a name pattern or data type. To filter data by a name pattern, enter a simple regex pattern or wildcards ? and * in the Filter by Pattern field. To filter data by a type, select the data type from the menu next to the pattern field.

Transformations cannot be applied to the target operator as the data is read-only.

Mapping attributes

The Map tab is only available for a target operator.

Important

If you are using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported.

When you're creating a new target data entity, the Map tab is not available. The incoming attributes are used to create the table or file structure with a 1-to-1 mapping.

When using an existing target data entity, you map the incoming attributes to the target data entity's attributes. The actions you can perform are:

Map by Position

Maps incoming attributes to target entity attributes according to their position in the lists.

From the Actions menu, select Auto-map by position. The Auto-map by position rule is added.

Map by Name

Maps incoming attributes to target attributes with the same name.

From the Actions menu, select Auto-map by name. The Auto-map by name rule is added.

Map by Pattern

Maps incoming attributes to target attributes based on simple, user-defined regex rules.

From the Actions menu, select Map by pattern.

You can define custom patterns using asterisk (*) and question mark (?) symbols. An asterisk is a wildcard for any number of characters in a string pattern; a question mark is a wildcard for a single character. For example, *INPUT? matches any attribute name that consists of any number of characters, followed by the string INPUT, followed by exactly one character, such as NEWINPUTS.

You can also use $n to refer to capture groups. For example, suppose you want to map LAST_NAME, FIRST_NAME, and USERNAME from a source or upstream operator to TGT_LAST_NAME, TGT_FIRST_NAME, and TGT_USERNAME in the target data entity. You would enter *NAME in the Source pattern field and TGT_$1 in the Target pattern field. $1 refers to the first capture group in the source pattern, which in this case is the asterisk (*): the characters that the asterisk matches in a source attribute name must also appear at the $1 position of the matching target attribute name.

Note

Pattern matching is case-sensitive.

Click Preview mapping to test your source and target patterns.
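A hedged sketch of the pattern-mapping idea in Python regex terms, where the capture group is taken to span the whole matched source name so that the documented results (LAST_NAME to TGT_LAST_NAME, and so on) are reproduced; this reading of the $1 semantics is an assumption, not the service's implementation:

```python
import re

source_attrs = ["LAST_NAME", "FIRST_NAME", "USERNAME", "CITY"]

# Translate the glob-style source pattern '*NAME' into a regex, then build
# the target name from the template 'TGT_$1' (assumed semantics).
glob_as_regex = re.compile(r"^(.*NAME)$")

mappings = {
    m.group(1): "TGT_" + m.group(1)
    for a in source_attrs
    if (m := glob_as_regex.match(a))
}
print(mappings)
# -> {'LAST_NAME': 'TGT_LAST_NAME', 'FIRST_NAME': 'TGT_FIRST_NAME', 'USERNAME': 'TGT_USERNAME'}
```

As in the service, matching here is case-sensitive, and CITY is left unmapped because it does not match the source pattern.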

Direct Manual Map

Drag and drop an incoming attribute from the source list to an attribute in the target list to create a mapping.

Alternatively, you can select Manual map from the Actions menu. Then use the Map Attribute dialog to create a mapping by selecting a source attribute and a target attribute.

Remove a Mapping

Removes the selected mapping.

Select View rules. In the Rules panel, select one or more rules and click Remove. Alternatively, you can select Remove from the rule's Actions menu to clear that rule.

Clear All Mappings

Removes all mappings.

From the Actions menu, select Reset mappings. All manual and auto-map rules are removed.

Shaping Operators

Important

For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.

Filter Operator

Use the filter operator to select a subset of data from the inbound port to continue downstream to the outbound port based on a filter condition.

Adding and configuring a filter operator
  1. From the Operators panel, drag and drop a Filter operator onto the canvas.
  2. With the filter operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Next to Filter condition, click Create.
  4. In the Create Filter Condition panel, Condition Builder section, you can double-click or drag and drop attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
    Note

    In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
  5. Click Create.
  6. (Optional) Click Assign parameter to use parameters so that the filter condition is not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
  7. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  9. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
Creating a filter condition

Use the Condition Builder to visually select elements to create a filter condition. You can also enter a filter condition manually in the editor.

Creating a filter condition lets you select a subset of data from an upstream operator based on the condition.

The elements that you can use in a filter condition include incoming attributes, parameters, and functions. You can double-click or drag and drop an element from the list to add to the editor to build a condition. You can validate the condition before creating it.

Incoming displays the attributes from the upstream operator that are coming into this filter operator.

For example, to filter data by a city name, you can create the condition expression as:

FILTER_1.ADDRESSES.CITY='Redwood Shores'

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Suppose you want to use a parameter for the city name in the filter condition. You can create a VARCHAR parameter with the name P_VARCHAR_CITY, and set the default value to Redwood Shores. Then you can create the filter expression as:

FILTER_1.ADDRESSES.CITY=$P_VARCHAR_CITY

Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.

For example, to filter data by a city name or by population, you can use the OR function to create the filter condition expression as:

FILTER_1.COUNTRIES.CITY=$P_VARCHAR_CITY OR FILTER_1.COUNTRIES.POPULATION>100000000
FILTER_1.COUNTRIES.CITY=$P_VARCHAR_CITY OR FILTER_1.COUNTRIES.POPULATION>$P_NUMERIC

Here's a list of functions that are available for you to add when you construct conditions:

Hash Functions

FunctionDescriptionExample
MD5(all data types)Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types)Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength)Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512.SHA2(column_name, 256)
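As a rough analogy only (not the service's implementation), the same digests can be computed on a column value with Python's hashlib; the hex-string lengths reflect the 128-, 160-, and 256-bit digest sizes:

```python
import hashlib

value = "ORACLE"

# MD5, SHA-1, and SHA-2 digests of a value, as hex strings.
# SHA2 with bitLength 0 is documented as equivalent to 256.
md5_hex = hashlib.md5(value.encode("utf-8")).hexdigest()
sha1_hex = hashlib.sha1(value.encode("utf-8")).hexdigest()
sha2_256_hex = hashlib.sha256(value.encode("utf-8")).hexdigest()

print(len(md5_hex), len(sha1_hex), len(sha2_256_hex))  # -> 32 40 64
```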
Arithmetic Functions

FunctionDescriptionExample
ABS(numeric)Returns the absolute value of the numeric value.ABS(-1)
CEIL(numeric)Returns the smallest integer not less than the numeric value.CEIL(-1.2)
FLOOR(numeric)Returns the largest integer not greater than the numeric value.FLOOR(-1.2)
MOD(numeric1, numeric2)Returns the remainder after numeric1 is divided by numeric2.MOD(8,2)
POWER(numeric1, numeric2)Raises numeric1 to the power of numeric2.POWER(2,3)
ROUND(numeric1, numeric2)Returns numeric1 rounded to numeric2 decimal places.ROUND(2.5,0)
TRUNC(numeric1, numeric2)Returns numeric1 truncated to numeric2 decimal places.TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7
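The arithmetic functions above have close Python analogues, sketched below for illustration (not the service's implementation); note that Python's built-in round uses banker's rounding, so it differs from ROUND on exact halves:

```python
import math

# TRUNC to n decimal places, built by hand from math.trunc.
def trunc(x, n=0):
    factor = 10 ** n
    return math.trunc(x * factor) / factor

print(abs(-1))           # ABS(-1) -> 1
print(math.ceil(-1.2))   # CEIL(-1.2) -> -1
print(math.floor(-1.2))  # FLOOR(-1.2) -> -2
print(8 % 2)             # MOD(8,2) -> 0
print(2 ** 3)            # POWER(2,3) -> 8
print(round(2.5, 0))     # banker's rounding gives 2.0, unlike ROUND(2.5,0) -> 3
print(trunc(2.5, 0))     # TRUNC(2.5,0) -> 2.0
```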

Date and Time Functions

FunctionDescriptionExample
CURRENT_DATEReturns the current date.CURRENT_DATE returns today's date
CURRENT_TIMESTAMPReturns the current date and time with time zone in the session time zone.CURRENT_TIMESTAMP returns today's date and current time
DATE_ADD(date, number)Returns the date that is the specified number of days after the specified date.DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEEE: Named day of the week, such as Sunday
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday

DATE_FORMAT(2020-10-10, 'yyyy-MM-dd') returns '2020-10-10'. The first argument is a Date object representing Oct 10th, 2020.

DATE_FORMAT(2018-junio-17, 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date)Returns the date's day in the month.DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date)Returns the date's day in the week.DAYOFWEEK('2020-12-25') returns 6 in the U.S. because Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date)Returns the date's day in the year.DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime)Returns the datetime's hour value.HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date)Returns the date's last day of the month.LAST_DAY('2020-12-25') returns 31
MINUTE(datetime)Returns the datetime's minute value.MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date)Returns the date's month value.MONTH('2020-06-25') returns 6
QUARTER(date)Returns the quarter of year the date is in.QUARTER('2020-12-25') returns 4
SECOND(datetime)Returns the datetime's second value.SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr])Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17

TO_TIMESTAMP(expr, format_string[, localeStr])Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM
TO_TIMESTAMP('2020-10-10 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on Oct 10th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
YEAR(date)Returns the date's year value.YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul', or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1st, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months.)

INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
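A few of the date and time functions above have rough Python analogues, sketched here for orientation only; the service's parsing and edge-case semantics may differ:

```python
from datetime import date, datetime, timedelta

# DATE_ADD('2017-07-30', 1)
d = date(2017, 7, 30)
print(d + timedelta(days=1))                # -> 2017-07-31

# HOUR / MINUTE / SECOND of '2020-12-25 15:10:30'
dt = datetime(2020, 12, 25, 15, 10, 30)
print(dt.hour, dt.minute, dt.second)        # -> 15 10 30

# TO_DATE('31 December 2016', 'dd MMMM yyyy'), via strptime
parsed = datetime.strptime("31 December 2016", "%d %B %Y")
print(parsed.date())                        # -> 2016-12-31

# A WEEKOFYEAR-style ISO week number
print(date(2020, 12, 25).isocalendar()[1])  # -> 52
```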
Analytic Functions

FunctionDescriptionExample
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] )Returns the value evaluated at the row that is the first row of the window frame.FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)Returns the value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])Returns the value evaluated at the row that is the last row of the window frame.LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)Returns the value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause)Returns the rank of the current row with gaps, counting from 1.RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause)Returns the unique number of the current row within its partition, counting from 1.ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
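These window semantics can be illustrated outside the service. The following is a minimal Python sketch (an illustration only, not the service's implementation) of LAG, LEAD, and RANK over a single ordered partition; the list stands in for the rows selected by the partition_clause and order_by_clause:

```python
# Hypothetical illustration of LAG, LEAD, and RANK semantics on one
# ordered partition.
def lag(values, offset=1, default=None):
    # Value from `offset` rows before each row, or `default` at the edge.
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    # Value from `offset` rows after each row, or `default` at the edge.
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

def rank(sort_keys):
    # Rank with gaps: tied keys share a rank; the next distinct key
    # jumps by the number of tied rows (1, 1, 3, ...).
    ranks, current = [], 0
    for i, key in enumerate(sort_keys):
        if i == 0 or key != sort_keys[i - 1]:
            current = i + 1
        ranks.append(current)
    return ranks

names = ['ABC', 'ABC', 'XYZ']
print(lag(names, 1, 'hello'))   # ['hello', 'ABC', 'ABC']
print(lead(names, 1, 'hello'))  # ['ABC', 'XYZ', 'hello']
print(rank(names))              # [1, 1, 3]
```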
String Functions

FunctionsDescriptionExample
CAST(value AS type)Returns the specified value in the specified type.CAST("10" AS INT) returns 10
CONCAT(string, string)Returns the combined values of strings or columns.CONCAT('Oracle','SQL') returns OracleSQL
INITCAP(string)Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])Returns the (1-based) index of the first occurrence of substring in string.INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)Returns the string with all letters changed to lowercase.LOWER('ORACLE') returns oracle
LENGTH(string)Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.LENGTH('Oracle') returns 6
LTRIM(string)Returns the string with leading spaces removed.LTRIM(' Oracle') returns 'Oracle'
NVL(expr1, expr2)Returns expr2 if expr1 is null; otherwise returns expr1.NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, all occurrences of search are removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)Returns the string with trailing spaces removed.RTRIM('Oracle ') returns 'Oracle'
SUBSTRING(string, position[, substring_length])Returns the substring starting at position. SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
TO_CHAR(expr[, format[, locale]])Converts a date into a string, based on the format and optional locale provided. Converts a number into a string; no format is required.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday

TO_CHAR(123) returns 123

TO_CHAR(Date'2020-10-30', 'yyyy.MM.dd', 'en-US') returns 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)Returns a string with all letters changed to uppercase.UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.RPAD('XYZ', 6, '+' ) returns 'XYZ+++'
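As a rough illustration of how REGEXP_SUBSTR matches and extracts capturing groups, here is a Python sketch using the standard re module. Note that Python's re does not support POSIX classes such as [[:alnum:]], so the URL pattern below substitutes [A-Za-z0-9]:

```python
import re

def regexp_substr(s, pattern, group_idx=0):
    # Sketch of REGEXP_SUBSTR semantics: return the first match of
    # `pattern` in `s`; with group_idx > 0, return that capturing
    # group instead of the whole match.
    m = re.search(pattern, s)
    return m.group(group_idx) if m else None

# [A-Za-z0-9] replaces the POSIX class [[:alnum:]] used in the table.
print(regexp_substr('https://www.oracle.com/products',
                    r'https://([A-Za-z0-9]+\.?){3,4}/?'))
print(regexp_substr('22;33;44', r'([0-9.]*);([0-9.]*);([0-9.]*)', 1))  # 22
```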
Operator (Comparison) Functions

FunctionDescriptionExample
CASE WHEN condition1 THEN result1 ELSE result2 ENDReturns the value for which a condition is met.CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1> 0, otherwise returns XYZ
ANDThe logical AND operator. Returns true if both operands are true, otherwise returns false.(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
ORThe logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.(x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true"
NOTThe logical NOT operator.
LIKEPerforms string pattern matching, whether string1 matches the pattern in string2.
=Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.x = 10 returns "true" when value of x is 10, else it returns "false"
!=Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false.x != 10 returns "false" if value of x is 10, else it returns "true"
>Tests for an expression greater than. Returns true if expr1 is greater than expr2.x > 10 returns "true" if value of x is greater than 10, else it returns "false"
>=Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2.x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false"
<Tests for an expression less than. Returns true if expr1 is less than expr2.x < 10 returns "true" if value of x is less than 10, else it returns "false"
<=Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2.x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false"
||Concatenates two strings.'XYZ' || 'hello' returns 'XYZhello'
BETWEENEvaluates a range.FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
INTests whether an expression matches a list of values.FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
Unique ID Functions

FunctionDescriptionExample
NUMERIC_ID()Generates a universally unique identifier that is a 64-bit number for each row.NUMERIC_ID() returns, for example, 3458761969522180096 and 3458762008176885761
ROWID()Generates monotonically increasing 64-bit numbers.ROWID() returns, for example, 0, 1, 2, and so on
UUID()Generates a universally unique identifier that is a 128-bit String for each row.UUID() returns, for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
MONOTONICALLY_INCREASING_ID() Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. MONOTONICALLY_INCREASING_ID() returns, for example, 8589934592 and 25769803776
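The sample values in the MONOTONICALLY_INCREASING_ID row are consistent with a common Spark-style bit layout, in which the upper 31 bits carry a partition id and the lower 33 bits a per-partition row index. A Python sketch of that layout (an assumption about the underlying engine, not a documented contract):

```python
import uuid

def monotonically_increasing_id(partition_id, row_index):
    # Assumed Spark-style layout: upper 31 bits hold the partition id,
    # lower 33 bits the row index, so ids are unique and increasing
    # but not consecutive across partitions.
    return (partition_id << 33) | row_index

print(monotonically_increasing_id(1, 0))  # 8589934592
print(monotonically_increasing_id(3, 0))  # 25769803776

# UUID() corresponds to a random 128-bit identifier:
print(uuid.uuid4())
```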
Conditional Functions

FunctionDescriptionExample
COALESCE(value, value [, value]*)Returns the first non-null argument, if it exists, otherwise returns null.COALESCE(NULL, 1, NULL) returns 1
NULLIF(value, value)Returns null if the two values equal each other, otherwise returns the first value.NULLIF('ABC','XYZ') returns ABC
Hierarchical Functions

FunctionDescriptionExample
SCHEMA_OF_JSON(string)Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)Creates a new column as Array type. The input columns must all have the same data type.

TO_ARRAY(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]
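As a rough analogy for the JSON functions above, the following Python sketch uses the standard json module, with dicts standing in for Struct and Map columns and lists for Arrays. Unlike FROM_JSON in the service, json.loads infers types rather than taking an explicit schema:

```python
import json

def from_json(s):
    # FROM_JSON parses a JSON string into a structured value; here a
    # dict plays the role of the Struct/Map column.
    return json.loads(s)

def to_json(value):
    # TO_JSON serializes a structured value back to a JSON string.
    return json.dumps(value, separators=(',', ':'))

record = from_json('{"Zipcode":704,"City":"ORACLE CITY"}')
print(record['Zipcode'])  # 704

nested = {"s1": [1, 2, 3], "s2": {"key": "value"}}
print(to_json(nested))    # {"s1":[1,2,3],"s2":{"key":"value"}}
```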

Join Operator

Use the join operator to link data from multiple inbound sources.

Adding and configuring a join operator
  1. From the Operators panel, drag and drop a Join operator onto the canvas.
  2. With the join operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Select the Join type. The types of join supported are:
    Join Types
    Join Type Description
    Inner Selects the data from the two inbound sources where the join condition is met.
    Left outer Selects all the data from inbound source 1 and the data from inbound source 2 where the join condition is met.
    Right outer Selects the data from inbound source 1 where the join condition is met and all the data from inbound source 2.
    Full outer Selects all data from the two inbound sources, matching rows where matches can be made and inserting nulls where no matches are found.
  4. Next to Join condition, click Create.
  5. In the Create Join Condition panel, Condition Builder section, you can double-click or drag and drop attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
    Note

    In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
  6. Click Create.
  7. (Optional) Click Assign parameter to use parameters so that the join condition is not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
  8. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  9. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  10. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
Creating a join condition

Use the Condition Builder to visually select elements to create a join condition. You can also enter a join condition manually in the editor.

Creating a join condition lets you select data from two inbound sources based on the condition.

The elements that you can use in a join condition include incoming attributes, parameters, and functions. You can double-click or drag and drop an element from the list to add to the editor to build a condition. You can validate the condition before creating it.

Incoming displays the attributes from the upstream ports connected to the join operator as two separate JOIN folders. View the attributes from each port by expanding or collapsing the appropriate JOIN folder. For example, JOIN_1_1 and JOIN_1_2.

JOIN_1_1.BANK_CUSTOMER.ADDRESS_ID = JOIN_1_2.BANK_ADDRESS.ADDRESS_ID

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Suppose you want to join two sources and retain only the rows where BANK_NAME='ABC Bank'. You can create a VARCHAR parameter with the name P_VARCHAR, and set the default value to ABC BANK. Then you can create the join expression as:

JOIN_1_1.ADDRESSES.BANK_ID = JOIN_1_2.BANK.BANK_ID AND JOIN_1_2.BANK.BANK_NAME = $P_VARCHAR
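To see what that parameterized condition evaluates to, here is a plain-Python sketch of the inner join, with illustrative row dicts standing in for the two inbound sources and P_VARCHAR for the expression parameter bound at run time:

```python
# Illustrative data only; in the data flow these rows come from the
# two upstream ports of the join operator.
P_VARCHAR = 'ABC Bank'

addresses = [{'BANK_ID': 1, 'ADDRESS_ID': 10},
             {'BANK_ID': 2, 'ADDRESS_ID': 20}]
banks = [{'BANK_ID': 1, 'BANK_NAME': 'ABC Bank'},
         {'BANK_ID': 2, 'BANK_NAME': 'XYZ Bank'}]

# Inner join on BANK_ID, restricted to rows where BANK_NAME = $P_VARCHAR.
joined = [{**a, **b}
          for a in addresses
          for b in banks
          if a['BANK_ID'] == b['BANK_ID'] and b['BANK_NAME'] == P_VARCHAR]
print(joined)  # [{'BANK_ID': 1, 'ADDRESS_ID': 10, 'BANK_NAME': 'ABC Bank'}]
```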

Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.

Here's a list of functions that are available for you to add when you construct conditions:

Hash Functions

FunctionDescriptionExample
MD5(all data types)Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types)Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength)Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512.SHA2(column_name, 256)
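A hedged Python sketch of these hash functions using the standard hashlib module (hashing the value's string form is an assumption; the service may hash a different serialization of the column value):

```python
import hashlib

def md5_hex(value):
    # MD5 of the value's string form; returns 32 hex characters.
    return hashlib.md5(str(value).encode('utf-8')).hexdigest()

def sha1_hex(value):
    # SHA-1 returns 40 hex characters.
    return hashlib.sha1(str(value).encode('utf-8')).hexdigest()

def sha2_hex(value, bit_length):
    # bitLength 0 is treated as 256, matching the table above.
    bits = 256 if bit_length == 0 else bit_length
    algo = {256: hashlib.sha256, 384: hashlib.sha384, 512: hashlib.sha512}[bits]
    return algo(str(value).encode('utf-8')).hexdigest()

print(md5_hex('Oracle'))           # 32 hex characters
print(len(sha2_hex('Oracle', 0)))  # 64
```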
Arithmetic Functions

FunctionDescriptionExample
ABS(numeric)Returns the absolute value of the numeric value.ABS(-1) returns 1
CEIL(numeric)Returns the smallest integer not less than the numeric value.CEIL(-1.2) returns -1
FLOOR(numeric)Returns the largest integer not greater than the numeric value.FLOOR(-1.2) returns -2
MOD(numeric1, numeric2)Returns the remainder after numeric1 is divided by numeric2.MOD(8,2)
POWER(numeric1, numeric2)Raises numeric1 to the power of numeric2.POWER(2,3)
ROUND(numeric1, numeric2)Returns numeric1 rounded to numeric2 decimal places.ROUND(2.5,0)
TRUNC(numeric1, numeric2)Returns numeric1 truncated to numeric2 decimal places.TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7
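One behavior worth illustrating: ROUND in this table rounds halves away from zero, whereas Python's built-in round() rounds halves to even (round(2.5) == 2). A sketch of ROUND and TRUNC semantics with the decimal module:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN

def sql_round(value, places=0):
    # ROUND uses half-away-from-zero, unlike Python's built-in round().
    q = Decimal(1).scaleb(-places)
    return float(Decimal(str(value)).quantize(q, rounding=ROUND_HALF_UP))

def sql_trunc(value, places=0):
    # TRUNC drops digits past the requested decimal place (toward zero).
    q = Decimal(1).scaleb(-places)
    return float(Decimal(str(value)).quantize(q, rounding=ROUND_DOWN))

print(sql_round(2.5, 0))  # 3.0
print(sql_trunc(2.5, 0))  # 2.0
print(8 % 2)              # MOD(8, 2)   -> 0
print(2 ** 3)             # POWER(2, 3) -> 8
```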

Date and Time Functions

FunctionDescriptionExample
CURRENT_DATEReturns the current date.CURRENT_DATE returns today's date
CURRENT_TIMESTAMPReturns the current date and time with time zone in the session time zone.CURRENT_TIMESTAMP returns today's date and current time
DATE_ADD(date, number)Returns the date that is the specified number of days after the specified date.DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEEE: Named day of the week, such as Sunday
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday

DATE_FORMAT(2020-10-10, 'yyyy-MM-dd') returns '2020-10-10'. The first argument is a Date object representing Oct 10th, 2020.

DATE_FORMAT(2018-junio-17, 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date)Returns the date's day in the month.DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date)Returns the date's day in the week.DAYOFWEEK('2020-12-25') returns 6 in the U.S. because Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date)Returns the date's day in the year.DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime)Returns the datetime's hour value.HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date)Returns the date's last day of the month.LAST_DAY('2020-12-25') returns 31
MINUTE(datetime)Returns the datetime's minute value.MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date)Returns the date's month value.MONTH('2020-06-25') returns 6
QUARTER(date)Returns the quarter of year the date is in.QUARTER('2020-12-25') returns 4
SECOND(datetime)Returns the datetime's second value.SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr])Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17
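The format patterns above are Java SimpleDateFormat-style. In Python terms, parsing the first TO_DATE example looks like this, with 'dd MMMM yyyy' mapped to the strftime codes '%d %B %Y':

```python
from datetime import datetime

# 'dd' -> '%d' (day of month), 'MMMM' -> '%B' (full month name),
# 'yyyy' -> '%Y' (four-digit year).
d = datetime.strptime('31 December 2016', '%d %B %Y').date()
print(d)  # 2016-12-31
```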

TO_TIMESTAMP(expr, format_string[, localeStr])Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM
TO_TIMESTAMP('2020-10-10 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on Oct 10th, 2020
WEEK(date)

Returns the date's week-in-the-month value.

WEEK('2020-06-25') returns 4
YEAR(date)Returns the date's year value.YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24
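The 31-days-per-month rule can be sketched as follows (an illustration of the rule as stated, not the service's exact implementation; d1 is taken to be the later date):

```python
from datetime import date
import calendar

def months_between(d1, d2):
    # Whole months when both dates share a day of month, or both fall
    # on their month's last day; otherwise the leftover days are
    # pro-rated over a 31-day month.
    last1 = calendar.monthrange(d1.year, d1.month)[1]
    last2 = calendar.monthrange(d2.year, d2.month)[1]
    months = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    if d1.day == d2.day or (d1.day == last1 and d2.day == last2):
        return float(months)
    return months + (d1.day - d2.day) / 31.0

print(months_between(date(2022, 7, 25), date(2020, 7, 25)))   # 24.0
print(months_between(date(2022, 1, 31), date(2021, 12, 31)))  # 1.0
```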

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1st, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example
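A sketch of both conversions with Python's datetime module, pinned to UTC for reproducibility (the service uses the system time zone, which is why the table's examples assume PST and UNIX_TIMESTAMP('1970-01-01 00:00:00', ...) returns 28800 there):

```python
from datetime import datetime, timezone

def from_unixtime(seconds, fmt='%Y-%m-%d %H:%M:%S'):
    # Renders an epoch value as a timestamp string; pinned to UTC here,
    # whereas the service formats in the system time zone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)

def unix_timestamp(text, fmt='%Y-%m-%d %H:%M:%S'):
    # Parses `text` as a UTC wall-clock time and returns epoch seconds.
    return int(datetime.strptime(text, fmt)
               .replace(tzinfo=timezone.utc).timestamp())

print(from_unixtime(0))                       # 1970-01-01 00:00:00
print(unix_timestamp('1970-01-01 00:00:00'))  # 0
```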

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).

INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
Analytic Functions

FunctionDescriptionExample
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] )Returns the value evaluated at the row that is the first row of the window frame.FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)Returns the value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])Returns the value evaluated at the row that is the last row of the window frame.LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)Returns the value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause)Returns the rank of the current row with gaps, counting from 1.RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause)Returns the unique number of the current row within its partition, counting from 1.ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
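The window-function semantics above can be sketched in plain Python (illustrative only; this is not the Data Integration expression language), here over a single partition already sorted by the ORDER BY key:

```python
# Sketch of LAG, LEAD, and ROW_NUMBER over one ordered partition.
def lag(rows, offset=1, default=None):
    # Value from `offset` rows before the current row, or `default`.
    return [rows[i - offset] if i - offset >= 0 else default
            for i in range(len(rows))]

def lead(rows, offset=1, default=None):
    # Value from `offset` rows after the current row, or `default`.
    return [rows[i + offset] if i + offset < len(rows) else default
            for i in range(len(rows))]

def row_number(rows):
    # Unique 1-based number for each row in the partition.
    return list(range(1, len(rows) + 1))

banks = ['ANZ', 'BOA', 'HSBC']  # one partition, ordered by BANK_NAME
```

With `banks` as the partition, `lag(banks, 2, 'hello')` yields `['hello', 'hello', 'ANZ']`, mirroring the LAG example in the table.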
String Functions

FunctionsDescriptionExample
CAST(value AS type)Returns the specified value in the specified type.CAST("10" AS INT) returns 10
CONCAT(string, string)Returns the combined values of strings or columns.CONCAT('Oracle','SQL') returns OracleSQL
INITCAP(string)Returns the string with the first letter of each word capitalized and all other letters in lowercase; words are delimited by white space.INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])Returns the (1-based) index of the first occurrence of substring in string.INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)Returns the string with all letters changed to lowercase.LOWER('ORACLE') returns oracle
LENGTH(string)Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.LENGTH('Oracle') returns 6
LTRIM(string)Returns the string with leading spaces removed.LTRIM(' Oracle') returns 'Oracle'
NVL(expr1, expr2)Returns expr2 if expr1 is null; otherwise returns expr1.NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, the occurrences of search are removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)Returns the string with trailing spaces removed.RTRIM('Oracle ') returns 'Oracle'
SUBSTRING(string, position[, substring_length])Returns the substring starting at position. SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
TO_CHAR(expr[, format[, locale]])Converts a date into a string, based on the format and optional locale provided. Converts a number into a string; no format is required.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday

TO_CHAR(123) returns 123

TO_CHAR(Date'2020-10-30', 'yyyy.MM.dd', 'en-US') returns 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)Returns a string with all letters changed to uppercase.UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.RPAD('XYZ', 6, '+' ) returns 'XYZ+++'
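A few of the less obvious behaviors (1-based INSTR indexing, INITCAP word handling, padding) can be sketched in Python; these helpers are illustrative, not the product's implementations, and the pad character is assumed to be a single character:

```python
def initcap(s):
    # First letter of each space-delimited word uppercased, rest lowercased.
    return ' '.join(w.capitalize() for w in s.split(' '))

def instr(s, sub, start=1):
    # 1-based index of the first occurrence of sub; 0 when not found.
    return s.find(sub, start - 1) + 1

def lpad(s, length, pad=' '):
    # Left-pad with a single pad character up to the requested length.
    return s.rjust(length, pad)
```

For example, `instr('OracleSQL', 'SQL')` is 7 because the index counts from 1, unlike Python's own 0-based `find`.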
Operator (Comparison) Functions

FunctionDescriptionExample
CASE WHEN condition1 THEN result1 ELSE result2 ENDReturns the value for which a condition is met.CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ
ANDThe logical AND operator. Returns true if both operands are true, otherwise returns false.(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
ORThe logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.(x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true"
NOTThe logical NOT operator.
LIKEPerforms string pattern matching, whether string1 matches the pattern in string2.
=Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.x = 10 returns "true" when value of x is 10, else it returns "false"
!=Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false.x != 10 returns "false" if value of x is 10, else it returns "true"
>Tests for an expression greater than. Returns true if expr1 is greater than expr2.x > 10 returns "true" if value of x is greater than 10, else it returns "false"
>=Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2.x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false"
<Tests for an expression less than. Returns true if expr1 is less than expr2.x < 10 returns "true" if value of x is less than 10, else it returns "false"
<=Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2.x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false"
||Concatenates two strings.'XYZ' || 'hello' returns 'XYZhello'
BETWEENEvaluates a range.FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
INTests whether an expression matches a list of values.FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
Unique ID Functions

FunctionDescriptionExample
NUMERIC_ID()Generates a universally unique identifier that is a 64-bit number for each row.NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761
ROWID()Generates monotonically increasing 64-bit numbers.ROWID() returns for example, 0, 1, 2, and so on
UUID()Generates a universally unique identifier that is a 128-bit String for each row.UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
MONOTONICALLY_INCREASING_ID() Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. MONOTONICALLY_INCREASING_ID() returns for example, 8589934592 and 25769803776
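The contrast between ROWID-style consecutive numbers and MONOTONICALLY_INCREASING_ID-style sparse numbers can be sketched in Python. The bit layout below follows Spark's convention and is an assumption, not documented internals:

```python
import uuid
from itertools import count

row_id = count(0)  # ROWID-style: 0, 1, 2, ...

def monotonically_increasing_id(partition, row):
    # Sparse but increasing: partition index in the upper bits,
    # per-partition row number in the lower 33 bits.
    return (partition << 33) | row

unique_str = str(uuid.uuid4())  # UUID-style 128-bit identifier as a string
```

Under this layout, `monotonically_increasing_id(1, 0)` is 8589934592 and `monotonically_increasing_id(3, 0)` is 25769803776, the two sample values in the table.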
Conditional Functions

FunctionDescriptionExample
COALESCE(value, value [, value]*)Returns the first non-null argument, if it exists, otherwise returns null.COALESCE(NULL, 1, NULL) returns 1
NULLIF(value, value)Returns null if the two values equal each other, otherwise returns the first value.NULLIF('ABC','XYZ') returns ABC
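COALESCE and NULLIF reduce to two short rules, sketched here in Python with None standing in for null:

```python
def coalesce(*values):
    # First non-null argument, else null.
    return next((v for v in values if v is not None), None)

def nullif(a, b):
    # Null when the two values are equal, else the first value.
    return None if a == b else a
```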
Hierarchical Functions

FunctionDescriptionExample
SCHEMA_OF_JSON(string)Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)Creates a new column as Array type. The input columns must all have the same data type.

TO_ARRAY(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]
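The round trip between structured columns and JSON strings that TO_JSON and FROM_JSON perform can be illustrated with Python's json module, with dicts and lists standing in for Struct, Map, and Array types:

```python
import json

# TO_JSON-like: a struct holding an array and a map, serialized compactly.
record = {"s1": [1, 2, 3], "s2": {"key": "value"}}
encoded = json.dumps(record, separators=(',', ':'))

# FROM_JSON-like: parse the string back into a structured value.
decoded = json.loads(encoded)
```

Here `encoded` equals {"s1":[1,2,3],"s2":{"key":"value"}}, the same string shown in the TO_JSON example.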

Expression Operator

Use the expression operator to perform one or more transformations on a single row of data to create new, derivative fields.

To use the Expression operator to change the data type of a group of attributes, see Changing the data type of a group of attributes.

Adding and configuring an expression operator
  1. From the Operators panel, drag and drop an Expression operator onto the canvas.
  2. With the expression operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Under Expressions, click Add expression.
  4. In the Add Expression panel, enter a name for the expression in the Identifier field, or leave the name as-is.
  5. (Optional) To apply the expression to two or more attributes, select Allow bulk selection. Then select the attributes by name or by using a regular expression pattern. For example, suppose you have three name attributes in your source data set and you want to apply the UPPER function to all three source attributes. You can use a pattern to select this group of source attributes.
    1. For Source attributes, select Pattern and click Edit pattern. Then enter the regular expression that matches a group of source attributes. For example, enter *NAME to match attributes FIRST_NAME, LAST_NAME, and FULL_NAME.
    2. Select a data type from the menu.
    3. For Target attributes, enter the regular expression to define the target attribute names in Pattern. By default, $0 is entered, which refers to the original source attribute names. Update the pattern to name the target attributes according to your requirements. For example, enter $0_UPPER to append the text UPPER to the original source attribute names.
    4. By default, Use source attribute data types is selected. You can retain the selection. If you uncheck Use source attribute data types, then select a Data type and complete the fields corresponding to the type selected.
  6. (Optional) If you did not select Allow bulk selection, under Expression data type, you can select Infer data type if you want the expression builder to detect the data type from the expression that you enter. You can click Preview data type and Validate to preview and refresh the data type, and validate the expression.

    Infer data type is useful when you are working with complex data types such as Map, Array, and Struct that might have levels of nested types.

  7. If you did not select the check boxes Allow bulk selection and Infer data type, select a value from the Data type menu, and then complete the fields corresponding to the type selected.
  8. In the Expression builder section, visually construct an expression by double-clicking or dragging incoming attributes, parameters, or functions to add to the editor to build your expression. Or you can manually write the expression yourself. See Adding an expression. You can also validate the expression.

    You can use parameters in your expressions, including user-defined parameters that have been created in the data flow, and system parameters generated by Data Integration at runtime.

    If you selected Allow bulk selection, then use %MACRO_INPUT% in your expression to indicate the attributes on which the function is to be applied. For example, suppose you used the pattern *NAME to match the source attributes FIRST_NAME, LAST_NAME, and FULL_NAME. Now you can specify a function as UPPER(%MACRO_INPUT%) to apply the function on all the attributes that match your pattern.

    To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes check box. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you are adding the expression the first time. When you edit the expression, the Exclude incoming attributes check box is not available.

    Note

    To replace a placeholder in a function with another element, highlight the placeholder and then double-click another element from the list to add it to the function.
  9. Click Add.
    You can repeat the steps from step 3 to add more expressions, if necessary.
  10. Under the Attributes tab, select to view the incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes, and then apply bulk exclude rules to the selected or the filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards such as ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.

      To change the data type of a group of attributes, select Change data type. In the dialog that displays, select the new Data type for the selected or filtered attributes. Then select any other properties that are applicable to the new data type you have selected. When you click Apply, an expression is added to the Details tab instead of a rule.

    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  11. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

    See also Adding an expression.

  12. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
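The bulk-selection steps above amount to matching attribute names against a wildcard pattern and substituting each match into the expression wherever %MACRO_INPUT% appears. A Python sketch of that mechanic (the attribute names are hypothetical):

```python
import fnmatch

attributes = ['FIRST_NAME', 'LAST_NAME', 'FULL_NAME', 'BANK_ID']

# The pattern *NAME selects the three name attributes.
selected = [a for a in attributes if fnmatch.fnmatch(a, '*NAME')]

# UPPER(%MACRO_INPUT%) expands once per selected attribute.
expressions = ['UPPER(%MACRO_INPUT%)'.replace('%MACRO_INPUT%', a)
               for a in selected]
```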
Changing the data type of a group of attributes

You can use the Expression operator to change the data type of attributes using a bulk transformation action.

Instead of using a CAST function to change an attribute data type one attribute at a time, you can add the Expression operator and apply a bulk transformation action on several attributes at the same time.
  1. Connect the expression operator with the appropriate incoming object.
  2. With the expression operator in focus on the canvas, under the Attributes tab of the Properties panel, select to view the incoming attributes (linked to the expression operator on the left side).
  3. If you have a few attributes to transform, use the check boxes to select the attributes you want to transform.
  4. If you have many attributes to transform, apply a filter on the attributes:
    1. Click the filter icon in the Name column. In the filter field that is displayed, enter a simple regex pattern using wildcards (? and *) to filter the attributes by a name pattern. For example, enter ADD* to match attributes ADDRESS1, ADDRESS2, and ADDRESS_KEY.
    2. Click the filter icon in the Type column. In the menu that is displayed, select the current attribute type of the attributes you want to change. For example, suppose the type of ADDRESS1 and ADDRESS2 is VARCHAR, and the type of ADDRESS_KEY is NUMERIC. You can select VARCHAR to transform ADDRESS1 and ADDRESS2 only.
  5. From the Actions menu, select Change data type.
  6. In the Change data type dialog, select the new Data type for the selected attributes or the filter-applied attributes.

    Then specify any other properties that are applicable to the new data type you have selected.

    To keep the original attributes in the data set, select Keep source attributes. If the check box is not selected, only the new attributes are available in the resulting data.

  7. Click Apply.
  8. Select the Details tab.
    Review the expression that is added to the operator for you.
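Conceptually, the filters and the bulk action combine a name pattern with a type check, then generate one cast per surviving attribute. A Python sketch using the ADDRESS example from the steps above (the CHAR target type here is a placeholder, not a recommendation):

```python
import fnmatch

attributes = {'ADDRESS1': 'VARCHAR', 'ADDRESS2': 'VARCHAR',
              'ADDRESS_KEY': 'NUMERIC'}

# Name filter ADD* plus type filter VARCHAR leaves ADDRESS1 and ADDRESS2.
selected = [name for name, dtype in attributes.items()
            if fnmatch.fnmatch(name, 'ADD*') and dtype == 'VARCHAR']

# One cast expression is generated per selected attribute.
casts = {name: f'CAST({name} AS CHAR)' for name in selected}
```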
Adding an expression

Use the Expression Builder to visually select elements to build an expression in the editor. You can also manually write an expression yourself.

Take note of the following when you create expressions:

  • Enclose a string literal within single quotation marks. For example: CONCAT('We ', 'like') and UPPER('oracle').

  • Enclose an attribute name within double quotation marks. For example: UPPER("Sales") and CONCAT(CONCAT("EXPRESSION_1.EMP_UTF8_EN_COL_CSV.EN_NAME", ' '), "EXPRESSION_1.EMP_UTF8_EN_COL_CSV.NAME").

    Using double quotation marks around attribute names is mandatory for multibyte characters, and names that have special characters in the fully qualified name.

The Add Expression panel has two sections: Expression information and Expression builder. The Expression information fields let you specify a name and data type for the expression. You can also create the expression to apply to two or more attributes. When working with complex data types such as Map, Array, and Struct that might have levels of nested types, you can choose to let the builder detect the data type from the expression that you enter. When you let the builder infer the data type, you can preview and refresh the data type, and validate the expression.

The Expression builder section lists the elements for building an expression. The elements that you can use in an expression include incoming attributes, parameters, and functions. Double-click or drag an element from the list to add to the editor to build the expression, or manually write the expression yourself. You can validate the expression before creating it.

Incoming displays the attributes from the upstream operator that are coming into this expression operator. Below the attributes list is a check box that lets you apply Exclude rules. To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes check box. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you are adding the expression the first time. When you edit the expression, the Exclude incoming attributes check box is not available.

Parameters include user-defined parameters and system-generated parameters.

User-defined parameters are the expression parameters that have been added in the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). See Adding an Expression Parameter. The syntax is $PARAMETER_NAME. For example: EXPRESSION_1.ADDRESSES.POSTAL_CODE=$P_CODE

Data Integration generates system parameters such as SYS.TASK_START_TIME. The values of system parameters can be used in expressions to log system information. The syntax is ${SYSTEM_PARAMETER}. For example: ${SYS.TASK_RUN_NAME}

Functions are the functions available in Data Integration that you can use in an expression. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments. You can also add user defined functions that you have created in your workspace. For example: MYLIBRARY.MYFUNCTION

The list of functions available in Data Integration for you to use is as follows:

Hash Functions

FunctionDescriptionExample
MD5(all data types)Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types)Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength)Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer. SHA2 (column_name, bitLength can be set to 0 (equivalent to 256), 256, 384, or 512).
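The three hash functions map directly onto Python's hashlib, shown here as an illustrative sketch; note how a bitLength of 0 falls back to 256 for SHA2:

```python
import hashlib

def md5(value):
    # MD5 checksum of the value's string form, as a hex string.
    return hashlib.md5(str(value).encode()).hexdigest()

def sha1(value):
    # SHA-1 hash, as a hex string.
    return hashlib.sha1(str(value).encode()).hexdigest()

def sha2(value, bit_length):
    # SHA-2 family; a bit length of 0 is treated as 256.
    bits = 256 if bit_length == 0 else bit_length
    return hashlib.new(f'sha{bits}', str(value).encode()).hexdigest()
```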
Arithmetic Functions

FunctionDescriptionExample
ABS(numeric)Returns the absolute value of the numeric value.ABS(-1) returns 1
CEIL(numeric)Returns the smallest integer not less than the numeric value.CEIL(-1.2) returns -1
FLOOR(numeric)Returns the largest integer not greater than the numeric value.FLOOR(-1.2) returns -2
MOD(numeric1, numeric2)Returns the remainder after numeric1 is divided by numeric2.MOD(8,2)
POWER(numeric1, numeric2)Raises numeric1 to the power of numeric2.POWER(2,3)
ROUND(numeric1, numeric2)Returns numeric1 rounded to numeric2 decimal places.ROUND(2.5,0)
TRUNC(numeric1, numeric2)Returns numeric1 truncated to numeric2 decimal places.TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7
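Stripped of locale handling, TO_NUMBER's job is to discard currency symbols and grouping separators while keeping the sign, digits, and decimal separator. A simplified Python sketch (the real function is locale- and pattern-aware):

```python
def to_number(s, decimal='.'):
    # Keep sign, digits, and the decimal separator; drop everything else
    # (currency codes, grouping separators, and so on).
    cleaned = ''.join(ch for ch in s if ch.isdigit() or ch in ('-', decimal))
    return float(cleaned.replace(decimal, '.'))
```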

Date and Time Functions

FunctionDescriptionExample
CURRENT_DATEReturns the current date.CURRENT_DATE returns today's date
CURRENT_TIMESTAMPReturns the current date and time with time zone in the session time zone.CURRENT_TIMESTAMP returns today's date and current time
DATE_ADD(date, number)Returns the date that is the specified number of days after the specified date.DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEEE: Named day of the week, such as Sunday
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday

DATE_FORMAT(2020-10-10, 'yyyy-MM-dd') returns '2020-10-10'. The first argument is a Date object representing Oct 10th, 2020.

DATE_FORMAT(2018-junio-17, 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date)Returns the date's day in the month.DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date)Returns the date's day in the week.DAYOFWEEK('2020-12-25') returns 6 in the U.S. because Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date)Returns the date's day in the year.DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime)Returns the datetime's hour value.HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date)Returns the date's last day of the month.LAST_DAY('2020-12-25') returns 31
MINUTE(datetime)Returns the datetime's minute value.MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date)Returns the date's month value.MONTH('2020-06-25') returns 6
QUARTER(date)Returns the quarter of year the date is in.QUARTER('2020-12-25') returns 4
SECOND(datetime)Returns the datetime's second value.SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr])Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17
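The date patterns above differ from Python's strptime codes, but a small translation table makes the correspondence concrete. This sketch covers only a subset of the patterns and assumes an English locale:

```python
from datetime import datetime

# Partial mapping from the patterns above to strptime codes (assumed subset).
PATTERNS = {'yyyy': '%Y', 'MMMM': '%B', 'MMM': '%b', 'MM': '%m',
            'dd': '%d', 'HH': '%H', 'mm': '%M', 'ss': '%S'}

def to_date(value, fmt):
    # Replace longer tokens first so MMMM is not consumed as MM + MM.
    for token in sorted(PATTERNS, key=len, reverse=True):
        fmt = fmt.replace(token, PATTERNS[token])
    return datetime.strptime(value, fmt).date()
```

For example, `to_date('31 December 2016', 'dd MMMM yyyy')` produces the date 2016-12-31, matching the first TO_DATE example.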

TO_TIMESTAMP(expr, format_string[, localeStr])Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM
TO_TIMESTAMP('2020-10-10 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on Oct 10th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
YEAR(date)Returns the date's year value.YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24
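The whole-number rule for MONTHS_BETWEEN can be made precise in a short Python sketch; this follows the 31-days-per-month convention described above and is illustrative, not the engine's code:

```python
import calendar
from datetime import date

def months_between(d1, d2):
    months = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    last1 = calendar.monthrange(d1.year, d1.month)[1]
    last2 = calendar.monthrange(d2.year, d2.month)[1]
    # Whole number when the days match, or both dates are month-ends.
    if d1.day == d2.day or (d1.day == last1 and d2.day == last2):
        return float(months)
    # Otherwise the leftover days are scaled by 31 days per month.
    return months + (d1.day - d2.day) / 31.0
```

Under these rules, `months_between(date(2022, 2, 28), date(2022, 1, 31))` is exactly 1.0 because both dates are month-ends; mismatched days produce a fraction.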

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
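Both conversions are a matter of attaching one time zone, then viewing the same instant in another. The Python sketch below uses fixed hour offsets in place of zone IDs such as 'GMT+1' (region-based IDs would need the zoneinfo module):

```python
from datetime import datetime, timedelta, timezone

FMT = '%Y-%m-%d %H:%M:%S.%f'

def from_utc_timestamp(ts, offset_hours):
    # Interpret ts as UTC, then render it in the target zone.
    parsed = datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)
    local = parsed.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.replace(tzinfo=None)

def to_utc_timestamp(ts, offset_hours):
    # Interpret ts in the given zone, then render it in UTC.
    zone = timezone(timedelta(hours=offset_hours))
    parsed = datetime.strptime(ts, FMT).replace(tzinfo=zone)
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)
```

With an offset of 1 standing in for GMT+1, '2017-07-14 02:40:00.0' becomes 03:40 local and 01:40 UTC respectively, matching the two examples above.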
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1st, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example
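The 28800 result above can be checked with a short Python sketch (illustrative only; 'America/Los_Angeles' is used here as the IANA equivalent of PST):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Midnight 1970-01-01 in US Pacific time (UTC-8 in winter) is 8 hours
# after the Unix epoch, i.e. 8 * 3600 = 28800 seconds.
dt = datetime(1970, 1, 1, 0, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles"))
print(int(dt.timestamp()))  # 28800
```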

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (the interval must be less than 100 years).

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (the interval must be less than 100 years).

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (the interval must be less than 100 months).

INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
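Day-to-second intervals map naturally onto Python's timedelta, as this sketch of the DAY TO SECOND example shows (a rough analogy only; year and month intervals have no fixed length in seconds and are not represented this way):

```python
from datetime import timedelta

# INTERVAL '11 10:09:08.555' DAY TO SECOND(3) expressed as a timedelta
iv = timedelta(days=11, hours=10, minutes=9, seconds=8, milliseconds=555)
print(iv)  # 11 days, 10:09:08.555000
```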
Analytic Functions

FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])

Returns the value evaluated at the row that is the first row of the window frame.

FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.

LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)

Returns the value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.

LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.

LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])

Returns the value evaluated at the row that is the last row of the window frame.

LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.

LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)

Returns the value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.

LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.

RANK() OVER ([ partition_clause ] order_by_clause)

Returns the rank of the current row with gaps, counting from 1.

RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.

ROW_NUMBER() OVER ([ partition_clause ] order_by_clause)

Returns the unique number of the current row within its partition, counting from 1.

ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
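The offset behavior of LAG and LEAD within one ordered partition can be sketched with plain Python lists (an illustration of the semantics described above, not the engine's implementation; the sample values are made up):

```python
def lag(values, offset=1, default=None):
    """Value offset rows before each row; default where no such row exists."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    """Value offset rows after each row; default where no such row exists."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

bank_names = ["A Bank", "B Bank", "C Bank"]  # one ordered partition
print(lag(bank_names, 2, "hello"))   # ['hello', 'hello', 'A Bank']
print(lead(bank_names, 2, "hello"))  # ['C Bank', 'hello', 'hello']
```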
String Functions

CAST(value AS type)

Returns the specified value in the specified type.

CAST("10" AS INT) returns 10

CONCAT(string, string)

Returns the combined values of strings or columns.

CONCAT('Oracle','SQL') returns OracleSQL

INITCAP(string)

Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.

INITCAP('oRACLE sql') returns Oracle Sql

INSTR(string, substring[, start_position])

Returns the (1-based) index of the first occurrence of substring in string.

INSTR('OracleSQL', 'SQL') returns 7

LOWER(string)

Returns the string with all letters changed to lowercase.

LOWER('ORACLE') returns oracle

LENGTH(string)

Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.

LENGTH('Oracle') returns 6

LTRIM(string)

Returns the string with leading spaces removed from the left.

LTRIM(' Oracle') returns 'Oracle'

NVL(expr1, expr2)

Returns the argument that is not null.

NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)

Returns the string with trailing spaces removed from the right.

RTRIM('Oracle ') returns 'Oracle'

SUBSTRING(string, position[, substring_length])

Returns the substring starting at position.

SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
TO_CHAR(expr[, format[, locale]])

Converts a date into a string based on the format, or converts a number into a string (no format is required).

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of the week, such as 0 for Sunday
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday

TO_CHAR(123) returns 123

TO_CHAR(Date'2020-10-30', 'yyyy.MM.dd', 'en-US') returns 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)

Returns a string with all letters changed to uppercase.

UPPER('oracle') returns ORACLE

LPAD(str, len[, pad])

Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.

LPAD('ABC', 5, '*') returns '**ABC'

RPAD(str, len[, pad])

Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.

RPAD('XYZ', 6, '+') returns 'XYZ+++'
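Several of these string functions have close Python analogues, shown here as sketches of the documented behavior (not the engine's implementation):

```python
def initcap(s: str) -> str:
    # Capitalize the first letter of each space-delimited word, lowercase the rest
    return " ".join(w.capitalize() for w in s.split(" "))

print(initcap("oRACLE sql"))        # Oracle Sql
print("ABC".rjust(5, "*"))          # **ABC   (LPAD('ABC', 5, '*'))
print("XYZ".ljust(6, "+"))          # XYZ+++  (RPAD('XYZ', 6, '+'))
print("OracleSQL".find("SQL") + 1)  # 7       (INSTR is 1-based; find is 0-based)
```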
Operator (Comparison) Functions

CASE WHEN condition1 THEN result1 ELSE result2 END

Returns the value for which a condition is met.

CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ

AND

The logical AND operator. Returns true if both operands are true, otherwise returns false.

(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"

OR

The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.

(x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true"

NOT

The logical NOT operator.

LIKE

Performs string pattern matching, whether string1 matches the pattern in string2.

=

Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.

x = 10 returns "true" when the value of x is 10, else it returns "false"

!=

Tests for inequality. Returns true if expr1 does not equal expr2, otherwise returns false.

x != 10 returns "false" if the value of x is 10, else it returns "true"

>

Tests for an expression greater than. Returns true if expr1 is greater than expr2.

x > 10 returns "true" if the value of x is greater than 10, else it returns "false"

>=

Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2.

x >= 10 returns "true" if the value of x is greater than or equal to 10, else it returns "false"

<

Tests for an expression less than. Returns true if expr1 is less than expr2.

x < 10 returns "true" if the value of x is less than 10, else it returns "false"

<=

Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2.

x <= 10 returns "true" if the value of x is less than or equal to 10, else it returns "false"

||

Concatenates two strings.

'XYZ' || 'hello' returns 'XYZhello'

BETWEEN

Evaluates a range.

FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007

IN

Tests whether an expression matches a list of values.

FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
Unique ID Functions

NUMERIC_ID()

Generates a universally unique identifier that is a 64-bit number for each row.

NUMERIC_ID() returns, for example, 3458761969522180096 and 3458762008176885761

ROWID()

Generates monotonically increasing 64-bit numbers.

ROWID() returns, for example, 0, 1, 2, and so on

UUID()

Generates a universally unique identifier that is a 128-bit String for each row.

UUID() returns, for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d

MONOTONICALLY_INCREASING_ID()

Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers.

MONOTONICALLY_INCREASING_ID() returns, for example, 8589934592 and 25769803776
Conditional Functions

COALESCE(value, value [, value]*)

Returns the first non-null argument, if it exists, otherwise returns null.

COALESCE(NULL, 1, NULL) returns 1

NULLIF(value, value)

Returns null if the two values equal each other, otherwise returns the first value.

NULLIF('ABC','XYZ') returns ABC
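COALESCE and NULLIF translate directly into short Python helpers, with None standing in for null (illustrative sketches of the documented semantics):

```python
def coalesce(*values):
    """Return the first non-null argument, or None if all are null."""
    return next((v for v in values if v is not None), None)

def nullif(a, b):
    """Return None if the two values are equal, otherwise the first value."""
    return None if a == b else a

print(coalesce(None, 1, None))  # 1
print(nullif("ABC", "XYZ"))     # ABC
print(nullif("ABC", "ABC"))     # None
```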
Hierarchical Functions

SCHEMA_OF_JSON(string)

Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)

Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Maps, into a JSON string.

TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)

Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)

Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)

Creates a new column as Array type. The input columns must all have the same data type.

TO_ARRAY(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]
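The JSON round-trip that FROM_JSON and TO_JSON perform can be imitated with Python's json module (only an approximation: here the types are inferred from the JSON text rather than driven by a declared schema):

```python
import json

# FROM_JSON analogue: parse a JSON string into a structured value
record = json.loads('{"Zipcode": 704, "City": "ORACLE CITY"}')
print(record["Zipcode"], record["City"])  # 704 ORACLE CITY

# TO_JSON analogue: serialize a struct-like value back to a JSON string
doc = {"s1": [1, 2, 3], "s2": {"key": "value"}}
print(json.dumps(doc, separators=(",", ":")))  # {"s1":[1,2,3],"s2":{"key":"value"}}
```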

Duplicating an expression

You can duplicate expressions that have been added to an expression operator.

  1. On the data flow canvas, select the expression operator.
  2. Under the Details tab of the Properties panel, review the list of expressions.
  3. Select Duplicate from the Actions menu corresponding to the expression you want to duplicate.
    No confirmation dialog displays; the duplicated expression is added to the list right away.
Moving an expression

You can change the order of expressions that have been added to an expression operator.

  1. On the data flow canvas, select the expression operator.
  2. Under the Details tab of the Properties panel, review the list of expressions.
  3. For the expression you want to move, select Move up, Move down, Move to top, or Move to bottom from the Actions menu.
    Only relevant move actions are available, depending on the current position of the expression. For example, Move up and Move to top are not available for the first expression in the list.

Aggregate Operator

Use the aggregate operator to perform calculations, like sum or count, on all rows or a group of rows to create new, derivative attributes.

Adding and configuring an aggregate operator
  1. From the Operators panel, drag and drop an Aggregate onto the canvas.
  2. With the aggregate operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Under Group by, select one of the following:
    • Attributes: Use the menu to select one or more attributes to group by.
    • Pattern: Click Add pattern to add a pattern that selects the attributes to group by. You use a regular expression pattern to match two or more attributes.
    Note

    The expressions you add are performed on each selected attribute. The attributes that continue downstream include the attributes you group by plus the attributes derived from the expressions you add.
  4. Under Expressions, click Add expression.
  5. In the Add Expression panel, enter a name for the aggregate expression in the Identifier field, or leave the name as-is.
  6. (Optional) To use regular expression patterns to select the source attributes where you want to apply the expression operation, select Allow bulk selection and then select Pattern. For example, suppose that you have three order attributes in your data set and you want to apply the SUM function to all three of them. You can use a pattern to select this group of source attributes.
    1. For Source attributes, enter the regular expression that matches a group of source attributes in Pattern. For example, enter *ORDER to match attributes FIRST_ORDER, LAST_ORDER, and FINAL_ORDER.
    2. Then for Data type, select a value from the menu and complete the fields corresponding to the type selected.
    3. For Target attributes, enter the regular expression to define the target attribute names in Pattern. By default, $0 is entered to refer to the original source attributes. Update the pattern to name the target attribute according to your requirements. For example, enter $0_SUM to append text SUM to the original source attribute names.
    4. By default, Use source attribute data types is selected. You can retain the selection. If you uncheck Use source attribute data types, then select a Data type and complete the fields corresponding to the type selected.
  7. (Optional) If you did not select Allow bulk selection, under Expression data type, you can select Infer data type if you want the Expression Builder to detect the data type from the expression that you enter. You can click Preview data type and Validate to preview and refresh the data type, and validate the expression.

    Letting the builder infer the data type is useful when you are working with complex data types such as Map, Array, and Struct that might have levels of nested types.

  8. If you did not select the check boxes Allow bulk selection and Infer data type, select a value from the Data type menu, and then complete the fields corresponding to the type selected.
  9. In the Expression builder section, visually construct an expression by double-clicking or dragging incoming attributes, parameters, or functions into the editor. Or you can manually write the expression yourself. You can also validate the expression.

    If you selected Allow bulk selection, then use %MACRO_INPUT% in your expression to indicate the attributes on which the function is to be applied. For example, suppose you used the pattern *NAME to match source attributes FIRST_NAME, LAST_NAME, and FULL_NAME. Now you can specify a function as UPPER(%MACRO_INPUT%) to apply the function on all the attributes that match your pattern.

    To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes check box. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you are adding the expression the first time. When you edit the expression, the Exclude incoming attributes check box is not available.

    Note

    To replace a placeholder in a function with another element, highlight the placeholder and then double-click another element from the list to add it to the function.
  10. Click Add.
    You can repeat the steps from step 4 to add more expressions, if necessary.
  11. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  12. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab. You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
  13. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
Adding an aggregate expression

Use the Expression Builder to visually select elements to build an expression in the editor. You can also manually write an expression yourself.

The Add Expression panel has two sections: Expression information and Expression builder. The Expression information fields let you specify a name and data type for the expression. You can also create the expression to apply to two or more attributes. When working with complex data types such as Map, Array, and Struct that might have levels of nested types, you can choose to let the builder detect the data type from the expression that you enter. When you let the builder infer the data type, you can preview and refresh the data type, and validate the expression.

The Expression builder section lists the elements for building an expression. The elements that you can use in an expression include incoming attributes, parameters, and functions. Double-click or drag an element from the list to add to the editor to build the expression, or manually write the expression yourself. You can validate the expression before creating it.

Incoming displays the attributes from the upstream operator that are coming into this expression operator. Below the attributes list is a check box that lets you apply Exclude rules. To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes check box. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you are adding the expression the first time. When you edit the expression, the Exclude incoming attributes check box is not available.

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments. Here's a list of functions available to use to build your expressions:

COUNT(value[, value]*)

Returns the number of rows for which one or more supplied expressions are all non-null.

COUNT(expr1)

COUNT(*)

Returns the total number of retrieved rows, including rows containing null.

COUNT(*)

MAX(value)

Returns the maximum value of the argument.

MAX(expr)

MIN(value)

Returns the minimum value of the argument.

MIN(expr)

SUM(numeric)

Returns the sum calculated from values of a group.

SUM(expr1)

AVG(numeric)

Returns the average of numeric values in an expression.

AVG(AGGREGATE_1.src1.attribute1)
LISTAGG(column[, delimiter]) WITHIN GROUP (order_by_clause)

Concatenates values of the input column with the specified delimiter, for each group based on the order clause.

column contains the values you want to concatenate together in the result.

The delimiter separates the column values in the result. If a delimiter is not provided, then an empty character is used.

order_by_clause determines the order that the concatenated values are returned.

This function can only be used as an aggregator, and can be used with or without grouping. If used without grouping, the result is a single row. If used with grouping, the function returns a row for each group.

Consider a table with two columns, id, name. The table has three rows. The id column values are 101, 102, 102. The name column values are A, B, C.

+-----+--------+
| id  | name  |
+-----+--------+
| 101 | A     |
+-----+--------+
| 102 | B     |
+-----+--------+
| 102 | C     |
+-----+--------+

Example 1: Without grouping

LISTAGG(name, '-') WITHIN GROUP (ORDER BY id) returns the name column with the value A-B-C

+--------+
| name   |
+--------+
| A-B-C  |
+--------+

Example 2: Group by the id

LISTAGG(name, '-') WITHIN GROUP (ORDER BY id) returns the name column with the values A and B-C in two groups.

+--------+
| name   |
+--------+
| A      |
+--------+
| B-C    |
+--------+
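Both LISTAGG examples can be reproduced with itertools.groupby over the sample id/name rows above (a sketch of the documented semantics, not the engine's implementation):

```python
from itertools import groupby

rows = [(101, "A"), (102, "B"), (102, "C")]  # the sample id/name table above

def listagg(values, delimiter=""):
    """Concatenate values with the delimiter (empty string if omitted)."""
    return delimiter.join(values)

# Example 1: without grouping -- a single concatenated row
print(listagg([name for _, name in rows], "-"))  # A-B-C

# Example 2: grouped by id -- one concatenated row per group
# (rows must already be sorted by the grouping key for groupby)
for key, group in groupby(rows, key=lambda r: r[0]):
    print(key, listagg([name for _, name in group], "-"))
# 101 A
# 102 B-C
```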

Distinct Operator

Use the distinct operator to return distinct rows with unique values.
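Conceptually, the distinct operator deduplicates whole rows while preserving their order, similar to this Python sketch (illustrative only; in practice the operator works on the attributes you keep after filtering):

```python
def distinct(rows):
    """Return rows with duplicates removed, keeping first occurrences."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

print(distinct([(1, "A"), (2, "B"), (1, "A"), (3, "C")]))
# [(1, 'A'), (2, 'B'), (3, 'C')]
```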

Adding and configuring a distinct operator

  1. From the Operators panel, drag and drop a Distinct operator onto the canvas.
  2. With the distinct operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.

    The Details tab configurations are set by default for the distinct operator.

  3. To view the output of the operator, click the Data tab.
    Distinct rows with unique values are displayed. You can also apply transforms to the data, or select attributes to view a data profile. Learn more about Data Xplorer.
  4. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes. The distinct operator operates only on the filtered attributes, and outputs distinct rows with unique values.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  5. In the Data tab, view a sampling of the data, based on any rules you have applied in the Attributes tab.

The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {...}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  6. In the Validation tab, check for warnings or errors that may cause your data flow to fail.

Sort Operator

Use the sort operator to perform sorting of data in ascending or descending order.

When using a sort operator, apply it after other operators. Keeping the sort operator immediately before the target operator ensures that data is inserted into the target in a specific order.

Adding and configuring a sort operator

When using a sort operator, apply the sort operator after other shaping operators, and before the operator that requires data to be sorted.

For example, apply the sort operator before a target operator to insert data in the target in a specific sort order.

  1. From the Operators panel, drag and drop a Sort operator onto the canvas.
  2. With the sort operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. In the Sort conditions section, click Add.

    The Add Sort Condition panel displays all the attribute fields fetched from the source table.

  4. In the Add sort condition panel, select the attribute to sort the data by.
  5. To filter attributes using name patterns, enter a name pattern, for example, *CITY*.
  6. For Sort order, select Ascending or Descending, and then click Add.

    Each condition you add is displayed in the sort condition list.

    Note

    You can add multiple sort conditions one by one. Sorting is applied in the order of the conditions in the list: the data is first sorted by the first condition, then the sorted data is resorted by the second condition, and so on.

    For string data types, the sort operation occurs based on the lexicographic order.

  7. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile is not displayed for complex data attributes. To view the hierarchy of a complex data attribute in a simplified structure, click the complex data shown, for example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  9. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
Adding sort conditions

After you add a sort operator on the canvas, and connect it with another operator, you can add a sort condition.

From the Add Sort Condition panel, you can select attributes to sort from the fields listed, or filter attribute names using patterns. For string data types, the sort operation occurs based on the lexicographic order.

To add sort conditions:

  1. Under the Details tab, go to the Sort conditions section, and click Add.

    The Add Sort Condition panel displays all the attribute fields fetched from the source table.

  2. In the Add sort condition panel, select the attribute to sort the data by.
  3. To filter attributes using name patterns, enter a name pattern, for example, *CITY*.
  4. For Sort order, select Ascending or Descending, and then click Add.

    Each condition you add is displayed in the sort condition list.

    Note

    You can add multiple sort conditions one by one. Sorting is applied in the order of the conditions in the list: the data is first sorted by the first condition, then the sorted data is resorted by the second condition, and so on.
Prioritizing sort conditions

Move the sort conditions up or down in the list to prioritize the sort.

This enables you to sort first by a high priority sort condition, and then re-sort the sorted data by the next condition in the list. For example, if you want to sort first by address and then by the postal code, move the sort condition with address to the top.

To prioritize sort conditions:

  1. Under the Details tab, go to the Sort conditions section.
  2. Click the Actions menu corresponding to the sort condition you want to move up or down.
  3. To apply a sort condition earlier, select Move up.
  4. To apply a sort condition later, select Move down.
    The sort conditions are applied in order, from the top of the list to the bottom.
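The prioritization described above behaves like a multi-key sort: the condition at the top of the list is the primary key, and each following condition breaks ties. A Python sketch of the semantics (the attribute names here are illustrative, not from a real entity):

```python
# Invented rows for illustration.
rows = [
    {"ADDRESS": "Main St", "POSTAL_CODE": "94107"},
    {"ADDRESS": "Elm St", "POSTAL_CODE": "10001"},
    {"ADDRESS": "Main St", "POSTAL_CODE": "60601"},
]

# Sort conditions in priority order: (attribute, descending?)
conditions = [("ADDRESS", False), ("POSTAL_CODE", False)]

# Apply conditions from lowest priority to highest. Python's sort is stable,
# so the final pass (the first condition in the list) dominates. Strings
# compare lexicographically, matching the sort operator's behavior.
for attribute, descending in reversed(conditions):
    rows.sort(key=lambda row: row[attribute], reverse=descending)
```

Moving a condition up in the list corresponds to moving it earlier in `conditions`, making it the dominant sort key.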
Editing sort conditions

To edit sort conditions:

  1. Under the Details tab, go to the Sort conditions section.
  2. Click the Actions menu corresponding to the sort condition you want to modify, and then select Edit.
  3. In the Edit sort condition panel, select the new attribute, or filter the attributes by entering a name pattern.
    Attributes and patterns that are already selected, including the attributes associated with a pattern, are greyed out (unavailable) for further selection.
Deleting sort conditions

You can delete your sort conditions one by one or do a bulk delete.

To delete sort conditions:

  1. Under the Details tab, go to the Sort conditions section.
  2. To delete sort conditions one by one, click the Actions menu corresponding to the sort condition you want to delete.
  3. To delete multiple sort conditions, select the check boxes corresponding to each, and then click Delete at the top of the list.

Union Operator

Use the union operator to perform a union operation between two or more data sets.

Understanding union types

You can perform a union operation on up to ten source operators. You must configure at least two source inputs. You can choose to perform the union operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.

Union by attribute name

Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are combined during a union by name operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1, primary input
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Southlake, Texas
Accounting 1700 New Jersey
Data Entity 2
Warehouse LOCATION-ID Department
Denver 1600 Benefits
New York 1400 Construction
Resulting Data Entity
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Southlake, Texas
Accounting 1700 New Jersey
Benefits 1600 Denver
Construction 1400 New York
Union by attribute position

Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are combined during a union by position operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Southlake, Texas
Accounting 1700 New Jersey
Data Entity 2, primary input
Warehouse LOCATION-ID Department
Denver 1600 Benefits
New York 1400 Construction
Resulting Data Entity
Warehouse LOCATION-ID Department
Denver 1600 Benefits
New York 1400 Construction
IT 1400 San Francisco
Shipping 1500 Southlake, Texas
Accounting 1700 New Jersey
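The two union modes differ only in how the secondary input's attributes are matched to the primary input. The following Python sketch illustrates both semantics using one row from each example entity above; it is an illustration of the behavior, not Data Integration's implementation.

```python
primary = [  # Data entity 1, primary input
    {"Department": "IT", "LOCATION-ID": 1400, "Warehouse": "San Francisco"},
]
secondary = [  # Data entity 2, same attributes in a different order
    {"Warehouse": "Denver", "LOCATION-ID": 1600, "Department": "Benefits"},
]

# The primary input supplies the result's attribute names and order.
primary_attrs = list(primary[0])

def union_by_name(primary_rows, other_rows):
    # Attributes are matched by case-sensitive name, regardless of position.
    return primary_rows + [{a: row[a] for a in primary_attrs} for row in other_rows]

def union_by_position(primary_rows, other_rows):
    # The Nth attribute of the other input feeds the Nth result attribute.
    return primary_rows + [dict(zip(primary_attrs, row.values())) for row in other_rows]
```

With union by name, Denver stays under Warehouse; with union by position, Denver (the other input's first value) lands under Department (the primary input's first attribute).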
Adding and configuring a union operator

  1. To add a union operator to your data flow, drag and drop the Union operator from the Operators panel onto the canvas. You can perform the union operation on up to ten data sources.
    The Details tab for the union operator displays in the Properties panel.
  2. On the Details tab, enter a name for this union operation in the Identifier field, or leave the name as-is.
  3. Specify how you want the union operation to be performed using the Match input attributes by field. You can choose from the following two options:
    • Name: The attribute names from the sources are matched to perform the union operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the union by name operation works.
    • Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the union by position operation works.
  4. For Union all, select the check box if you want the operation to return all rows from all the connected data entities without eliminating any duplicates.
  5. For Primary input, select the source that should be considered as the key source in the union operation. This primary input determines the attribute names, attribute data types, and attribute order for the resulting attributes from the union operation. By default, the first source added to the data flow is set as the primary input.
  6. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  7. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile is not displayed for complex data attributes. To view the hierarchy of a complex data attribute in a simplified structure, click the complex data shown, for example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  8. In the Validation tab, check for warnings or errors that may cause your data flow to fail.

Minus Operator

Use the minus operator to compare two data entities and return the rows that are present in one entity but not present in the other entity.

You can choose to keep or eliminate duplicate rows in the resulting data.

Understanding minus types

You can perform a minus operation on two source operators only. You can choose to perform the minus operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.

Minus by attribute name

Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are subtracted during a minus by name operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1, primary input
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Austin
Accounting 1700 New Jersey
Data Entity 2
Department Warehouse LOCATION-ID
Benefits Denver 1600
IT San Francisco 1400
Resulting Data Entity
Department LOCATION-ID Warehouse
Shipping 1500 Austin
Accounting 1700 New Jersey
Minus by attribute position

Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are subtracted during a minus by position operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Austin
Accounting 1700 New Jersey
Data Entity 2, primary input
Department-Name Location Warehouse-City
Benefits 1600 Denver
IT 1400 San Francisco
Resulting Data Entity
Department-Name Location Warehouse-City
Benefits 1600 Denver
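Once the two inputs are aligned to the primary input's attribute order (by name or by position), the minus operation reduces to row subtraction from the primary input. A sketch of the semantics, including the Minus all option, using rows from the example entities above; this is an illustration only, not the product's implementation:

```python
primary = [  # Data entity 1
    ("IT", 1400, "San Francisco"),
    ("Shipping", 1500, "Austin"),
    ("Accounting", 1700, "New Jersey"),
]
other = [  # Data entity 2, realigned to the primary attribute order
    ("Benefits", 1600, "Denver"),
    ("IT", 1400, "San Francisco"),
]

def minus(primary_rows, other_rows, minus_all=False):
    exclude = set(other_rows)
    kept, seen = [], set()
    for row in primary_rows:
        if row in exclude:
            continue  # present in the other input: subtracted
        if not minus_all:
            if row in seen:
                continue  # the default behavior eliminates duplicate rows
            seen.add(row)
        kept.append(row)
    return kept

# minus(primary, other) keeps the Shipping and Accounting rows.
```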
Adding and configuring a minus operator

  1. To add a minus operator to your data flow, drag and drop the Minus operator from the Operators panel onto the canvas.
    The Details tab for the minus operator displays in the Properties panel.
  2. On the Details tab, enter a name for this minus operation in the Identifier field, or leave the name as-is.
  3. Specify how you want the minus operation to be performed using the Match input attributes by field. You can choose from the following two options:
    • Name: The attribute names from the sources are matched to perform the minus operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the minus by name operation works.
    • Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the minus by position operation works.
  4. For Minus all, select the check box if you want the operation to return all rows found in one data entity that are not present in the other data entity, without eliminating any duplicates.
  5. For Primary input, select the source to be used as the primary source for the minus operation. The primary input determines the attribute names, attribute data types, and attribute order in the resulting output. By default, the first source that is connected to the minus operator is set as the primary input.
    Primary Input  Description
    Left minus  Selects all the remaining distinct rows from inbound source 1 after eliminating rows that match inbound source 2.
    Right minus  Selects all the remaining distinct rows from inbound source 2 after eliminating rows that match inbound source 1.
  6. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  7. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile is not displayed for complex data attributes. To view the hierarchy of a complex data attribute in a simplified structure, click the complex data shown, for example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  8. In the Validation tab, check for warnings or errors that may cause your data flow to fail.

Intersect Operator

Use the intersect operator to compare two or more data entities and return the rows that are present in all the connected entities.

You can choose to keep or eliminate duplicate rows in the resulting data.

Understanding intersect types

You can perform an intersect operation on two or more source operators. You can choose to perform the operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.

Intersect by attribute name

Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are intersected by attribute name. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1, primary input
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Austin
Accounting 1700 New Jersey
Data Entity 2
Department Warehouse LOCATION-ID
Benefits Denver 1600
IT San Francisco 1400
Resulting Data Entity
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Intersect by attribute position

Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are intersected by attribute position. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Austin
Accounting 1700 New Jersey
Data Entity 2, primary input
Department-Name Location Warehouse-City
Benefits 1600 Denver
IT 1400 San Francisco
Resulting Data Entity
Department-Name Location Warehouse-City
IT 1400 San Francisco
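As with minus, once the inputs are aligned to the primary input's attribute order, intersect reduces to keeping the primary input's rows that also appear in the other input. The Intersect all option keeps duplicate matches; the default returns distinct rows. A sketch using rows from the example above (an illustration only, not the product's implementation):

```python
primary = [  # Data entity 2, primary input
    ("Benefits", 1600, "Denver"),
    ("IT", 1400, "San Francisco"),
]
other = [  # Data entity 1, realigned to the primary attribute order
    ("IT", 1400, "San Francisco"),
    ("Shipping", 1500, "Austin"),
    ("Accounting", 1700, "New Jersey"),
]

def intersect(primary_rows, other_rows, intersect_all=False):
    common = set(other_rows)
    kept, seen = [], set()
    for row in primary_rows:
        if row not in common:
            continue  # not present in the other input
        if not intersect_all:
            if row in seen:
                continue  # the default behavior eliminates duplicate rows
            seen.add(row)
        kept.append(row)
    return kept

# intersect(primary, other) returns only the IT row.
```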
Adding and configuring an intersect operator

  1. To add an intersect operator to your data flow, drag and drop the Intersect operator from the Operators panel onto the canvas.
    The Details tab for the intersect operator displays in the Properties panel.
  2. On the Details tab, enter a name for this intersect operation in the Identifier field, or leave the name as-is.
  3. Specify how you want the intersect operation to be performed using the Match input attributes by field. You can choose from the following two options:
    • Name: The attribute names from the sources are matched to perform the intersect operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the intersect by name operation works.
    • Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the intersect by position operation works.
  4. For Intersect all, select the check box if you want the operation to return all rows in the resulting output, without eliminating any duplicates.
  5. For Primary input, select the source to be used as the primary source for the intersect operation. The primary input determines the attribute names, attribute data types, and attribute order in the resulting output. By default, the first source that is connected to the intersect operator is set as the primary input.
  6. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  7. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile is not displayed for complex data attributes. To view the hierarchy of a complex data attribute in a simplified structure, click the complex data shown, for example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  8. In the Validation tab, check for warnings or errors that may cause your data flow to fail.

Split Operator

Use the split operator to divide one source of input data into two or more output ports based on split conditions that are evaluated in a sequence.

Each split condition has an output port. Data that satisfies a condition is directed to the corresponding output port.

By default, a split operator is configured with the Unmatched condition, which is always the last condition in the sequence. You cannot add a condition after the Unmatched condition, and you cannot delete the Unmatched condition.

The operator evaluates the conditions one at a time. After all conditions in the sequence have been evaluated, data that does not meet any condition is directed to the Unmatched output port.

Split example

Suppose you have the data entity BANK, with the attributes BANK_ID and BANK_NAME.

You set up two split conditions. The full sequence, including the Unmatched condition, is as follows:

Condition Output Port  Condition
CONDITION1  SPLIT_1.BANK.BANK_ID<102
CONDITION2  SPLIT_1.BANK.BANK_ID<104
UNMATCHED  The default UNMATCHED condition directs all data that does not meet the other conditions in the sequence to the UNMATCHED output port.
BANK Data Entity

The data entity has four rows.

BANK_ID BANK_NAME
101 A Bank 101
102 B Bank 102
103 C Bank 103
104 D Bank 104
Condition1 Output, First Matching Condition

CONDITION1 returns one matching row.

BANK_ID BANK_NAME
101 A Bank 101
Condition2 Output, First Matching Condition

CONDITION2 returns two matching rows (from the unmatched rows after CONDITION1).

BANK_ID BANK_NAME
102 B Bank 102
103 C Bank 103
Unmatched Condition Output, First Matching Condition

The UNMATCHED condition returns the remaining row.

BANK_ID BANK_NAME
104 D Bank 104
Condition1 Output, All Matching Conditions

CONDITION1 returns one matching row.

BANK_ID BANK_NAME
101 A Bank 101
Condition2 Output, All Matching Conditions

All data is evaluated by CONDITION2, returning three matching rows.

BANK_ID BANK_NAME
101 A Bank 101
102 B Bank 102
103 C Bank 103
Unmatched Condition Output, All Matching Conditions

The UNMATCHED condition returns the rows that meet neither CONDITION1 nor CONDITION2.

BANK_ID BANK_NAME
104 D Bank 104
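The port outputs above follow from how the operator routes each row through the condition sequence under the two match modes, First matching condition and All matching conditions. The following Python sketch reproduces the example; it illustrates the routing semantics only, not Data Integration's implementation.

```python
# Rows, conditions, and port names mirror the BANK example above.
rows = [
    {"BANK_ID": 101, "BANK_NAME": "A Bank 101"},
    {"BANK_ID": 102, "BANK_NAME": "B Bank 102"},
    {"BANK_ID": 103, "BANK_NAME": "C Bank 103"},
    {"BANK_ID": 104, "BANK_NAME": "D Bank 104"},
]
conditions = [
    ("CONDITION1", lambda row: row["BANK_ID"] < 102),
    ("CONDITION2", lambda row: row["BANK_ID"] < 104),
]

def split(rows, conditions, first_match_only=True):
    ports = {name: [] for name, _ in conditions}
    ports["UNMATCHED"] = []
    for row in rows:
        matched = False
        for name, condition in conditions:
            if condition(row):
                ports[name].append(row)
                matched = True
                if first_match_only:
                    break  # remove the row from further evaluation
        if not matched:
            ports["UNMATCHED"].append(row)
    return ports

first = split(rows, conditions, first_match_only=True)   # First matching condition
every = split(rows, conditions, first_match_only=False)  # All matching conditions
```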
Adding and configuring a split operator

When you add a split operator on the canvas, by default the split operator icon is displayed as expanded, showing the Unmatched split condition. The Unmatched condition directs all data that does not meet the other conditions in the sequence to its own output port.

  1. From the Operators panel, drag and drop a Split operator onto the canvas.
  2. With the split operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Under Match, specify how you want to handle incoming data that meets a split condition. You can choose from the following two options:
    • First matching condition: Data that matches the first condition in the sequence is removed from further processing by subsequent conditions.

      For example, consider a sequence with four conditions, including the last Unmatched condition: Condition1, Condition2, Condition3, and Unmatched.

      For Condition1, the operation does not find matches. For Condition2, matches are found. The operation routes the matched data to the output port that's specified for Condition2, and then evaluates the unmatched data for Condition3. If no matches are found for Condition3, all the data is routed to the output port that's specified for the Unmatched condition. However, if matches are found for Condition3, the operation routes the matched data to the output port for Condition3, and then routes the remaining data to the port for the Unmatched condition.

    • All matching conditions: Data that matches any condition is evaluated for all conditions in the sequence.

      For example, consider a sequence with Condition1 and Condition2. Matches are found after Condition1 is evaluated. The matched data is routed to the output port that's specified for Condition1. All data (matched and unmatched) is then evaluated by Condition2. Data that matches Condition2 is routed to the corresponding output port. Data that does not meet Condition1 and Condition2 is routed to the output port for the Unmatched split condition.

  4. Under Split conditions, click Add to add a split condition.
    1. In the Add Split Condition panel, edit the Identifier value, or leave it as-is.
    2. In the Condition Builder section, you can double-click or drag and drop attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
      Note

      In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
    3. Click Add.
      The first condition you add is inserted before the Unmatched condition. You can also see the sequence of conditions in the expanded operator icon on the canvas.
    4. Repeat the steps to add conditions to form a sequence.
      The second condition is inserted into the sequence between the first and Unmatched conditions. Subsequent conditions are always inserted before the Unmatched condition. The sequence of split conditions is also reflected in the expanded split operator icon on the canvas.
  5. To connect an output port on a split condition to another operator on the canvas:
    1. Expand the split operator icon to display the conditions sequence.
      If the split operator icon is not expanded, you can still connect the operator to the next operator but the link is created from the port of the Unmatched condition.
    2. Hover over the end of a condition and then drag the small circle on the right side of the condition to the next shaping operator or a target.
    3. Connect each condition to the appropriate operator or target.
    Each condition output port can be linked to multiple shaping operators and target operators. If the split operator icon is not expanded on the canvas, you can hover over a link line to see the condition name of that link.
  6. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  7. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Conditional output port menu shows the same number of items as the number of conditions configured on the operator, including the Unmatched condition. By default, the output from the first condition port is shown. To change the output view, select the port of the condition output from the menu.

    The Data Profile is not displayed for complex data attributes. To view the hierarchy of a complex data attribute in a simplified structure, click the complex data shown, for example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  8. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
Adding a split condition

Use the Condition Builder to visually select elements to create and add a split condition. You can also enter a split condition manually in the editor.

You can add split conditions to an existing sequence of conditions. Split conditions are added to the end of the sequence, before the Unmatched condition. You cannot add your own condition to the Unmatched condition.

The elements that you can use in a split condition include incoming attributes, parameters, and functions. You can double-click or drag and drop an element from the list to add it to the editor and build a condition. You can validate the condition before creating it.

Incoming displays the attributes from the upstream port. For example:

SPLIT_1.BANK.BANK_NAME='ABC Bank'

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Suppose you create a VARCHAR parameter with the name P_VARCHAR_NAME, and set the default value to ABC BANK. Then you can use the parameter in a split condition as:

SPLIT_1.BANK.BANK_NAME=$P_VARCHAR_NAME

Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.

Suppose you create a VARCHAR parameter with the name P_VARCHAR_LIKE, and set the default value to B%. Then you can use the parameter in a split condition as:

SPLIT_1.BANK.BANK_NAME LIKE $P_VARCHAR_LIKE
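Conceptually, when a split routes each row to its first matching condition, the operator walks the condition sequence in order and falls through to Unmatched. A minimal Python sketch of that routing, with stand-in predicates instead of Data Integration expressions:

```python
# Each condition is a (name, predicate) pair, evaluated in sequence.
# The predicates below are hypothetical stand-ins for split expressions.
conditions = [
    ("CONDITION1", lambda row: row["BANK_NAME"] == "ABC Bank"),
    ("CONDITION2", lambda row: row["BANK_NAME"].startswith("B")),
]

def route(row):
    # The first matching condition wins; Unmatched always comes last.
    for name, predicate in conditions:
        if predicate(row):
            return name
    return "UNMATCHED"

rows = [{"BANK_NAME": "ABC Bank"}, {"BANK_NAME": "Bank of B"}, {"BANK_NAME": "XYZ"}]
print([route(r) for r in rows])  # ['CONDITION1', 'CONDITION2', 'UNMATCHED']
```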

Here's a list of functions that are available for you to add when you construct conditions:

Hash Functions

  • MD5(all data types): Calculates an MD5 checksum of the data type, and returns a string value. Example: MD5(column_name)
  • SHA1(all data types): Calculates a SHA-1 hash value of the data type, and returns a string value. Example: SHA1(column_name)
  • SHA2(all data types, bitLength): Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. Example: SHA2(column_name, bitLength)
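These hash functions correspond to the standard MD5, SHA-1, and SHA-2 digests rendered as hex strings. A plain-Python illustration with hashlib, assuming string inputs (column values in a real flow can be any data type):

```python
import hashlib

def md5_hex(value: str) -> str:
    return hashlib.md5(value.encode("utf-8")).hexdigest()

def sha2_hex(value: str, bit_length: int = 256) -> str:
    # bitLength 0 is treated as 256, matching the SHA2 description above.
    bits = 256 if bit_length == 0 else bit_length
    algo = {256: hashlib.sha256, 384: hashlib.sha384, 512: hashlib.sha512}[bits]
    return algo(value.encode("utf-8")).hexdigest()

print(md5_hex("hello"))        # 5d41402abc4b2a76b9719d911017c592
print(len(sha2_hex("hello")))  # 64 (hex characters for SHA-256)
```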
Arithmetic Functions

  • ABS(numeric): Returns the absolute value of the numeric value. Example: ABS(-1)
  • CEIL(numeric): Returns the smallest integer not less than the numeric value. Example: CEIL(-1.2)
  • FLOOR(numeric): Returns the largest integer not greater than the numeric value. Example: FLOOR(-1.2)
  • MOD(numeric1, numeric2): Returns the remainder after numeric1 is divided by numeric2. Example: MOD(8,2)
  • POWER(numeric1, numeric2): Raises numeric1 to the power of numeric2. Example: POWER(2,3)
  • ROUND(numeric1, numeric2): Returns numeric1 rounded to numeric2 decimal places. Example: ROUND(2.5,0)
  • TRUNC(numeric1, numeric2): Returns numeric1 truncated to numeric2 decimal places. Example: TRUNC(2.5,0)
  • TO_NUMBER(expr[, format, locale]): Converts expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7
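The difference between ROUND and TRUNC is easy to miss: TRUNC simply drops digits past the requested decimal place, while ROUND adjusts the last kept digit. A small Python sketch of that distinction, assuming half-up tie-breaking for ROUND (the list above does not state the tie-breaking rule):

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def round_n(value, places=0):
    # Half-up rounding; Python's built-in round() uses half-even instead.
    quantum = Decimal(1).scaleb(-places)
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

def trunc_n(value, places=0):
    # TRUNC drops digits past the given decimal place without rounding.
    factor = 10 ** places
    return math.trunc(value * factor) / factor

print(round_n(2.5, 0))  # 3.0
print(trunc_n(2.5, 0))  # 2.0
```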

Date and Time Functions

  • CURRENT_DATE: Returns the current date. Example: CURRENT_DATE returns today's date
  • CURRENT_TIMESTAMP: Returns the current date and time with time zone in the session time zone. Example: CURRENT_TIMESTAMP returns today's date and current time
  • DATE_ADD(date, number): Returns the date that is the specified number of days after the specified date. Example: DATE_ADD('2017-07-30', 1) returns 2017-07-31
  • DATE_FORMAT(expr, format[, locale]): Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEEE: Named day of the week, such as Sunday
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday

DATE_FORMAT(2020-10-10, 'yyyy-MM-dd') returns '2020-10-10'. The first argument is a Date object representing Oct 10th, 2020.

DATE_FORMAT(2018-junio-17, 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

  • DAYOFMONTH(date): Returns the date's day in the month. Example: DAYOFMONTH('2020-12-25') returns 25
  • DAYOFWEEK(date): Returns the date's day in the week. Example: DAYOFWEEK('2020-12-25') returns 6 in the U.S. because Sunday is considered to be 1, Monday is 2, and so on.
  • DAYOFYEAR(date): Returns the date's day in the year. Example: DAYOFYEAR('2020-12-25') returns 360
  • WEEKOFYEAR(date): Returns the date's week in the year. Examples: WEEKOFYEAR('2022-07-28') returns 30; WEEKOFYEAR('2022-07-28 13:24:30') returns 30
  • HOUR(datetime): Returns the datetime's hour value. Example: HOUR('2020-12-25 15:10:30') returns 15
  • LAST_DAY(date): Returns the last day of the date's month. Example: LAST_DAY('2020-12-25') returns 31
  • MINUTE(datetime): Returns the datetime's minute value. Example: MINUTE('2020-12-25 15:10:30') returns 10
  • MONTH(date): Returns the date's month value. Example: MONTH('2020-06-25') returns 6
  • QUARTER(date): Returns the quarter of the year that the date is in. Example: QUARTER('2020-12-25') returns 4
  • SECOND(datetime): Returns the datetime's second value. Example: SECOND('2020-12-25 15:10:30') returns 30
  • TO_DATE(string, format_string[, localeStr]): Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17
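The case-sensitive patterns above map closely onto the familiar strptime codes, for example dd → %d, MMMM → %B, and yyyy → %Y. The first TO_DATE example sketched in plain Python:

```python
from datetime import datetime

# 'dd MMMM yyyy' corresponds approximately to '%d %B %Y' in strptime terms.
parsed = datetime.strptime("31 December 2016", "%d %B %Y").date()
print(parsed)  # 2016-12-31
```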

TO_TIMESTAMP(expr, format_string[, localeStr]): Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM
TO_TIMESTAMP('2020-10-10 11:10:10', 'YYYY-MM-DD HH:mm:SS') returns a TIMESTAMP object representing 11:10:10 AM on Oct 10th, 2020
  • WEEK(date): Returns the date's week value. Example: WEEK('2020-06-25') returns 4
  • YEAR(date): Returns the date's year value. Example: YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1st, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example
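Both conversions hinge on the session time zone, which explains the non-zero result for the epoch itself. A Python sketch using America/Los_Angeles as a stand-in for the PST session zone in the examples:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Stand-in for the "PST" session time zone used in the examples above.
SESSION_TZ = ZoneInfo("America/Los_Angeles")

def from_unixtime(unix_time: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    # Render epoch seconds as a local timestamp string.
    return datetime.fromtimestamp(unix_time, tz=SESSION_TZ).strftime(fmt)

def unix_timestamp(s: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    # Interpret a local timestamp string as epoch seconds.
    return int(datetime.strptime(s, fmt).replace(tzinfo=SESSION_TZ).timestamp())

print(from_unixtime(1255033470))              # 2009-10-08 13:24:30
print(unix_timestamp("1970-01-01 00:00:00"))  # 28800
```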

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (the year value must be less than 100).

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (the year value must be less than 100).

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (the month value must be less than 100).

INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
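A day-to-second interval is a plain elapsed duration, so it maps naturally onto a timedelta; year and month intervals have no such fixed-length equivalent because month lengths vary. For example, the INTERVAL '11 10:09:08.555' DAY TO SECOND(3) value above:

```python
from datetime import timedelta

# INTERVAL '11 10:09:08.555' DAY TO SECOND(3) as an elapsed duration.
interval = timedelta(days=11, hours=10, minutes=9, seconds=8, milliseconds=555)
print(interval)  # 11 days, 10:09:08.555000
```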
Analytic Functions

  • FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]): Returns the value evaluated at the row that is the first row of the window frame. Example: FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over the current row and 1 row after it, partitioned by BANK_ID and in ascending order of BANK_NAME.
  • LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause): Returns the value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL. Example: LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
  • LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]): Returns the value evaluated at the row that is the last row of the window frame. Example: LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over the current row and 1 row after it, partitioned by BANK_ID and in ascending order of BANK_NAME.
  • LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause): Returns the value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL. Example: LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
  • RANK() OVER ([ partition_clause ] order_by_clause): Returns the rank of the current row with gaps, counting from 1. Example: RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
  • ROW_NUMBER() OVER ([ partition_clause ] order_by_clause): Returns the unique number of the current row within its partition, counting from 1. Example: ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
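The windowed LAG behavior (partition, order, offset, default) can be sketched in a few lines of plain Python; this illustrates the semantics, not how Data Integration executes it:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows for illustration.
rows = [
    {"BANK_ID": 1, "BANK_NAME": "A"},
    {"BANK_ID": 1, "BANK_NAME": "B"},
    {"BANK_ID": 1, "BANK_NAME": "C"},
    {"BANK_ID": 2, "BANK_NAME": "D"},
]

def lag_over(rows, value_key, partition_key, order_key, offset=1, default=None):
    # LAG(value, offset, default) OVER (PARTITION BY ... ORDER BY ...)
    out = []
    data = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(data, key=itemgetter(partition_key)):
        part = list(group)
        for i in range(len(part)):
            # Look offset rows back within the partition; else use default.
            out.append(part[i - offset][value_key] if i >= offset else default)
    return out

print(lag_over(rows, "BANK_NAME", "BANK_ID", "BANK_NAME"))
# [None, 'A', 'B', None]
```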
String Functions

  • CAST(value AS type): Returns the specified value in the specified type. Example: CAST("10" AS INT) returns 10
  • CONCAT(string, string): Returns the combined values of strings or columns. Example: CONCAT('Oracle','SQL') returns OracleSQL
  • INITCAP(string): Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space. Example: INITCAP('oRACLE sql') returns Oracle Sql
  • INSTR(string, substring[, start_position]): Returns the (1-based) index of the first occurrence of substring in string. Example: INSTR('OracleSQL', 'SQL') returns 7
  • LOWER(string): Returns the string with all letters changed to lowercase. Example: LOWER('ORACLE') returns oracle
  • LENGTH(string): Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces. Example: LENGTH('Oracle') returns 6
  • LTRIM(string): Returns the string with leading spaces removed. Example: LTRIM(' Oracle') returns Oracle
  • NVL(expr1, expr2): Returns the argument that is not null. Example: NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
  • REGEXP_SUBSTR(string, regexp[, RegexGroupIdx]): Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement): Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
  • RTRIM(string): Returns the string with trailing spaces removed. Example: RTRIM('Oracle ') returns Oracle
  • SUBSTRING(string, position[, substring_length]): Returns the substring starting at position. Example: SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
TO_CHAR(expr[, format]): Converts a date into a string, based on the format. Converts a number into a string; no format is required.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday

TO_CHAR(123) returns 123

TO_CHAR(Date'2020-10-30', 'yyyy.MM.dd', 'en-US') returns 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

  • UPPER(string): Returns a string with all letters changed to uppercase. Example: UPPER('oracle') returns ORACLE
  • LPAD(str, len[, pad]): Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. Example: LPAD('ABC', 5, '*') returns '**ABC'
  • RPAD(str, len[, pad]): Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. Example: RPAD('XYZ', 6, '+') returns 'XYZ+++'
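Several of these string functions have near-direct Python counterparts, which makes the semantics easy to check; initcap and instr below are hypothetical helper names written for illustration:

```python
def initcap(s: str) -> str:
    # INITCAP: capitalize the first letter of each space-delimited word.
    return " ".join(word.capitalize() for word in s.split(" "))

def instr(s: str, substring: str) -> int:
    # INSTR: 1-based index of the first occurrence; 0 when not found.
    return s.find(substring) + 1

print(initcap("oRACLE sql"))      # Oracle Sql
print(instr("OracleSQL", "SQL"))  # 7
print("ABC".rjust(5, "*"))        # **ABC   (like LPAD('ABC', 5, '*'))
print("XYZ".ljust(6, "+"))        # XYZ+++  (like RPAD('XYZ', 6, '+'))
```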
Operator (Comparison) Functions

  • CASE WHEN condition1 THEN result1 ELSE result2 END: Returns the value for which a condition is met. Example: CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ
  • AND: The logical AND operator. Returns true if both operands are true, otherwise returns false. Example: (x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
  • OR: The logical OR operator. Returns true if either or both operands are true, otherwise returns false. Example: (x = 10 OR y = 20) returns "false" if x is not equal to 10 and y is not equal to 20. If either one is true, then it returns "true"
  • NOT: The logical NOT operator.
  • LIKE: Performs string pattern matching, whether string1 matches the pattern in string2.
  • =: Tests for equality. Returns true if expr1 equals expr2, otherwise returns false. Example: x = 10 returns "true" when the value of x is 10, else it returns "false"
  • !=: Tests for inequality. Returns true if expr1 does not equal expr2, otherwise returns false. Example: x != 10 returns "false" if the value of x is 10, else it returns "true"
  • >: Tests for greater than. Returns true if expr1 is greater than expr2. Example: x > 10 returns "true" if the value of x is greater than 10, else it returns "false"
  • >=: Tests for greater than or equal to. Returns true if expr1 is greater than or equal to expr2. Example: x >= 10 returns "true" if the value of x is greater than or equal to 10, else it returns "false"
  • <: Tests for less than. Returns true if expr1 is less than expr2. Example: x < 10 returns "true" if the value of x is less than 10, else it returns "false"
  • <=: Tests for less than or equal to. Returns true if expr1 is less than or equal to expr2. Example: x <= 10 returns "true" if the value of x is less than or equal to 10, else it returns "false"
  • ||: Concatenates two strings. Example: 'XYZ' || 'hello' returns 'XYZhello'
  • BETWEEN: Evaluates a range. Example: FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
  • IN: Tests whether an expression matches a list of values. Example: FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
Unique ID Functions

  • NUMERIC_ID(): Generates a universally unique identifier that is a 64-bit number for each row. Example: NUMERIC_ID() returns, for example, 3458761969522180096 and 3458762008176885761
  • ROWID(): Generates monotonically increasing 64-bit numbers. Example: ROWID() returns, for example, 0, 1, 2, and so on
  • UUID(): Generates a universally unique identifier that is a 128-bit String for each row. Example: UUID() returns, for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
  • MONOTONICALLY_INCREASING_ID(): Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. Example: MONOTONICALLY_INCREASING_ID() returns, for example, 8589934592 and 25769803776
Conditional Functions

  • COALESCE(value, value [, value]*): Returns the first non-null argument, if it exists, otherwise returns null. Example: COALESCE(NULL, 1, NULL) returns 1
  • NULLIF(value, value): Returns null if the two values equal each other, otherwise returns the first value. Example: NULLIF('ABC','XYZ') returns ABC
Hierarchical Functions

SCHEMA_OF_JSON(string): Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column): Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string. Example: TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*): Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*): Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*): Creates a new column as Array type. The input columns must all have the same data type.

TO_ARRAY(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]
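The TO_JSON and FROM_JSON examples above round-trip between structured values and JSON text, which plain Python's json module illustrates directly:

```python
import json

# TO_JSON-style serialization of a nested struct/array/map value.
value = {"s1": [1, 2, 3], "s2": {"key": "value"}}
encoded = json.dumps(value, separators=(",", ":"))
print(encoded)  # {"s1":[1,2,3],"s2":{"key":"value"}}

# FROM_JSON-style parsing back into structured data.
decoded = json.loads('{"Zipcode":704,"City":"ORACLE CITY"}')
print(decoded["Zipcode"], decoded["City"])  # 704 ORACLE CITY
```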

Editing split conditions

You can edit any split condition except the Unmatched condition.

  1. On the designer canvas, select the split operator of the condition you want to modify.
    The Details tab for the split operator displays in the Properties panel. On the canvas, you can expand and collapse the operator icon to show and hide the existing split conditions.
  2. In the Details tab, under Split conditions, select Edit from the Actions menu next to the split condition you want to modify.
  3. In the Edit split condition panel, edit the Identifier value, or leave it as-is.
  4. In the Condition builder section, you can double-click to select, or drag and drop incoming attributes or functions to the editor to build your condition. You can also edit the condition expression manually in the editor, and validate the condition before saving it.
  5. Click Save changes.
Changing the sequence of split conditions

You can move a split condition up or down in the sequence. Only the Unmatched condition cannot be moved.

  1. On the designer canvas, select the split operator that has the sequence of conditions you want to modify.
    The Details tab for the split operator displays in the Properties panel. On the canvas, you can expand and collapse the operator icon to show and hide the existing split conditions.
  2. In the Details tab, under Split conditions, review the sequence of split conditions.
  3. Next to a condition row, from the Actions menu, select the action to move that condition up or down the sequence.
    Notice that the condition position in the expanded split operator icon on the canvas changes accordingly when you move a condition.
Deleting split conditions

You can delete any split condition except the Unmatched condition.

  1. On the designer canvas, select the split operator of the condition you want to delete.
    The Details tab for the split operator displays in the Properties panel. On the canvas, you can expand and collapse the operator icon to show and hide the split conditions.
  2. In the Details tab, under Split conditions, select Delete from the Actions menu next to the condition you want to delete.
  3. To delete two or more conditions, use the column next to the Sequence column to select the conditions, and then click Delete.
  4. If the output port on a deleted condition was connected to another operator or target, ensure that you review the data flow and make the necessary changes after deleting conditions.

Pivot Operator

The pivot operator lets you take unique row values from one attribute in an input source and pivot the values into multiple attributes in the output.

Using input from multiple rows, the pivot operation performs a transformation based on aggregate function expressions, and the values of an attribute that you specify as the pivot key. The result of a pivot operation is a pivoted or rearranged output of rows and attributes.

The number of rows in the output is based on the selection of attributes to group by.

  • If you specify one or more attributes to group by, the incoming rows that have the same group by attribute value are grouped into one row. For example, if you specify one group by attribute that has four unique values, the incoming data is transformed and grouped into four rows in the output.
  • If you don't specify any attributes to group by, all the incoming data is transformed into a single output row.

The number of attributes in the output is:

  • Based on the number of attributes you select to group by
  • A multiple of the number of values you select in the pivot key
  • A result of the number of attributes that the aggregate function expressions transform

For example, if you select one group by attribute and three pivot key values, and add an expression that transforms two attributes, the number of attributes in the output is:

1 + (3 * 2)

The total number of attributes in the resulting pivoted output is calculated as follows:

Number of group by attributes + (Number of pivot key values * Number of attributes that are transformed from expressions)

The names of new attributes in the output are derived from a pattern you add for target attributes when you create the expressions.

The aggregate functions you use with a pivot operator determine the pivoted values in the output. If no data is found, null is inserted where a pivoted value is expected.
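As a quick sanity check, the attribute-count formula can be evaluated directly. A minimal sketch using the example figures from above (one group by attribute, three pivot key values, two transformed attributes):

```python
# Worked check of the output-attribute formula:
# total = group_by_attrs + (pivot_key_values * transformed_attrs)
group_by_attrs = 1     # one group by attribute selected
pivot_key_values = 3   # three pivot key values selected
transformed_attrs = 2  # two attributes transformed by expressions

total_attrs = group_by_attrs + pivot_key_values * transformed_attrs
print(total_attrs)  # 7
```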

Pivot example

Consider the data entity PRODUCT_SALES, which has the attributes STORE, PRODUCT, and SALES. The data entity has five rows. You want to create a pivot on the attribute PRODUCT with an aggregate SUM function on SALES.

You specify to group pivoted rows by the attribute STORE. Each unique STORE value becomes a row in the resulting output. Input rows with the same group by value are grouped into the same row in the pivoted output. If you don't specify a group by attribute, all input rows are transformed into a single row in the resulting output.

You specify PRODUCT as the pivot key, and select all three values to pivot into new attributes in the resulting output.

The aggregate SUM function expression on SALES is:

SUM(PIVOT_1_1.PRODUCT_SALES.SALES)

The pattern for target attributes is:

%PIVOT_KEY_VALUE%
Data Entity PRODUCT_SALES

STORE          PRODUCT       SALES
AB Store       Television    2
AB Store       Television    4
Country-Store  Television    6
Country-Store  Refrigerator  8
E-Store        Coffee maker  10

Pivot Output: Group By STORE, Pivot Key PRODUCT

STORE          TELEVISION  REFRIGERATOR  COFFEE MAKER
AB Store       6           -             -
Country-Store  6           8             -
E-Store        -           -             10

Pivot Output: Without Group By, Pivot Key PRODUCT

TELEVISION  REFRIGERATOR  COFFEE MAKER
12          8             10
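The example output can be reproduced with a short plain-Python sketch (illustrative only, not Data Integration's implementation): group by STORE, use PRODUCT as the pivot key, and aggregate SALES with SUM, inserting null (None) where no data is found:

```python
# Sketch of the pivot: group by STORE, pivot key PRODUCT, SUM(SALES).
rows = [
    ("AB Store", "Television", 2),
    ("AB Store", "Television", 4),
    ("Country-Store", "Television", 6),
    ("Country-Store", "Refrigerator", 8),
    ("E-Store", "Coffee maker", 10),
]
pivot_key_values = ["Television", "Refrigerator", "Coffee maker"]

# Aggregate SALES per (STORE, PRODUCT) pair.
sums = {}
for store, product, sales in rows:
    sums[(store, product)] = sums.get((store, product), 0) + sales

# One output row per unique group by value, in order of first appearance.
stores = []
for store, _, _ in rows:
    if store not in stores:
        stores.append(store)

# Missing combinations get None, mirroring null in the pivoted output.
output = {
    store: {p: sums.get((store, p)) for p in pivot_key_values}
    for store in stores
}
print(output["AB Store"])  # {'Television': 6, 'Refrigerator': None, 'Coffee maker': None}
```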
Adding and configuring a pivot operator

The pivot operator performs a transformation using one or more aggregate function expressions on one or more values of an attribute that you specify as the pivot key.

You can group all the pivoted rows into a single output row, or select group by attributes to create one output row per unique group by value.

  1. From the Operators panel, drag and drop a Pivot operator onto the canvas.
  2. With the pivot operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. (Optional) For Group by, specify the attributes whose distinct values are used to arrange the pivoted data in grouped rows. By default, all incoming rows are aggregated and grouped into one row unless you specify one or more attributes to group the rows by.
    • Select Attributes and then select one or more attributes to group pivoted rows by the same value. For example, if you select an attribute that has two unique values, the resulting output has two rows of aggregated data.
    • Select Pattern to use a regular expression pattern to select the attributes to group by. Only attributes that match the pattern are used to group pivoted rows by the same value.
  4. For Pivot key, select the attribute whose values can be used as new attributes in the pivoted output. Only attributes that are of type VARCHAR, INTEGER, FLOAT, or BOOLEAN can be used as the pivot key.
  5. Under Pivot key values, click Add key values to select one or more values from the selected pivot key that you want to become pivoted attributes in the output. Up to 50 values can be selected.

    You can use an alias for a selected pivot key value. An alias cannot start with a number. Only the _ special character is allowed.

    To use an alias for a selected pivot key value, select Edit from the Actions menu for that key value. In the Edit Pivot Key Alias panel, enter your alias for the key value.

    To delete a pivot key value from the selected list, select the check box next to it and click Delete. Alternatively, you can select Delete from the Actions menu.

  6. Under Expressions, click Add expression to add your aggregate function expressions. See Adding a pivot expression.
  7. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  9. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
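The wildcard behavior of the name pattern filter described above resembles glob matching. A hedged sketch using Python's stdlib fnmatch as a stand-in for the product's actual matcher:

```python
# Illustrative only: fnmatch approximates the ? and * wildcard
# semantics of the Name-column filter; it is not the product's matcher.
from fnmatch import fnmatch

attributes = ["PRODUCT_CODE", "STORE_CODE", "SALES", "STORE"]

# Filter attributes by the name pattern *_CODE.
matched = [a for a in attributes if fnmatch(a, "*_CODE")]
print(matched)  # ['PRODUCT_CODE', 'STORE_CODE']
```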
Adding a pivot expression

One or more aggregate function expressions are required for a pivot operator.

  1. With the pivot operator selected on the data flow canvas, under Expressions in the properties panel, click Add expression.

  2. In the Add Expression panel, enter a name for the expression in the Identifier field, or leave the name as-is.

  3. (Optional) To use a pattern to apply an expression to multiple source attributes, select Allow bulk selection.

    For example, suppose you have two DISCOUNT_ attributes in your data set (DISCOUNT_VALUE and DISCOUNT_RATE) and you want to apply the MAX function to both of them.

    1. Under Source attributes, select Pattern and click Add pattern.

    2. In the Add Source Pattern panel, add a pattern for selecting a group of source attributes that have names beginning with DISCOUNT_. For example, enter DISCOUNT* and click Add. Then select the Data type.

    3. Under Target attributes, use a pattern for the resulting output attribute names.

      By default, the pattern %MACRO_INPUT%_%PIVOT_KEY_VALUE% is already inserted for you. %MACRO_INPUT% corresponds to the names of the source attributes as selected by the pattern you added. %PIVOT_KEY_VALUE% corresponds to the values as selected in the pivot key.

      For example, if %PIVOT_KEY_VALUE% indicates TELEVISION, and %MACRO_INPUT% indicates DISCOUNT_VALUE and DISCOUNT_RATE, then in the output, the pivoted attributes are <pivot_name>.<expression_name>.DISCOUNT_VALUE_TELEVISION and <pivot_name>.<expression_name>.DISCOUNT_RATE_TELEVISION.

    4. You can retain the Use source attribute data types selection. Otherwise, under Expression data type, select the Data type and complete the fields corresponding to the type selected.
  4. If you did not select Allow bulk selection, under Target attributes, use a pattern for the resulting output attribute names.

    By default, the pattern %PIVOT_KEY_VALUE% is already inserted for you. %PIVOT_KEY_VALUE% corresponds to the values as selected in the pivot key.

    For example, if %PIVOT_KEY_VALUE% indicates TELEVISION and TELEPHONE, then in the output, the pivoted attributes are <pivot_name>.<expression_name>.TELEVISION and <pivot_name>.<expression_name>.TELEPHONE.

    Under Expression data type, select the Data type and complete the fields corresponding to the type selected.

  5. In the Expression builder section, double-click or drag incoming attributes, parameters, and aggregate functions to add to the editor to build your expression. You can also manually write the expression yourself, and validate the expression.

    The following table shows the list of aggregate functions that are available for constructing pivot expressions.

    • COUNT(value[, value]*): Returns the number of rows for which one or more supplied expressions are all non-null. Example: COUNT(expr1)
    • COUNT(*): Returns the total number of retrieved rows, including rows containing null. Example: COUNT(*)
    • MAX(value): Returns the maximum value of the argument. Example: MAX(expr)
    • MIN(value): Returns the minimum value of the argument. Example: MIN(expr)
    • SUM(numeric): Returns the sum calculated from values of a group. Example: SUM(expr1)
  6. To construct a pivot expression, specify the attribute or attributes and the aggregate function.

    • If you selected Allow bulk selection, then use %MACRO_INPUT% in your expression to indicate the attributes on which the function is to be applied.

      For example, if you used the pattern DISCOUNT* to match source attributes DISCOUNT_RATE and DISCOUNT_VALUE, then you can specify an aggregate function such as SUM(numeric) to apply the function on all the attributes that match your pattern. %MACRO_INPUT% replaces the numeric placeholder in the function:

      SUM(%MACRO_INPUT%)
    • If you did not select Allow bulk selection, then specify the attribute in the function.

      For example, your data entity is PRODUCT_SALES and you want to use an aggregate SUM function on the SALES attribute. You can specify the function such as SUM(numeric), replacing the numeric placeholder in the function with the attribute name:

      SUM(PIVOT_1_1.PRODUCT_SALES.SALES)
    • You can use an expression parameter for the aggregate function name in the pivot expression. An expression parameter has a name, type, and default value.

      For example, the expression parameter P_VARCHAR has type VARCHAR and MIN as the default value. You can specify the aggregate function as:

      $P_VARCHAR(%MACRO_INPUT%)
      $P_VARCHAR(PIVOT_1_1.PRODUCT_SALES.SALES)
  7. In the Add Expression panel, click Add.
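How the target-attribute pattern expands under bulk selection can be sketched with simple string substitution. The macro names are the documented ones; the expansion code itself is illustrative only:

```python
# Sketch of target-attribute name expansion for bulk selection:
# %MACRO_INPUT% is each matched source attribute, and
# %PIVOT_KEY_VALUE% is each selected pivot key value.
macro_inputs = ["DISCOUNT_VALUE", "DISCOUNT_RATE"]  # matched by DISCOUNT*
pivot_key_values = ["TELEVISION"]

pattern = "%MACRO_INPUT%_%PIVOT_KEY_VALUE%"
targets = [
    pattern.replace("%MACRO_INPUT%", m).replace("%PIVOT_KEY_VALUE%", v)
    for m in macro_inputs
    for v in pivot_key_values
]
print(targets)  # ['DISCOUNT_VALUE_TELEVISION', 'DISCOUNT_RATE_TELEVISION']
```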

Lookup Operator

The lookup operator performs a query and transformation using a lookup condition and input from two sources, a primary input source and a lookup input source.

The query operation uses the condition and a value in the primary input to find rows in the lookup source. The transformation appends attributes from the lookup source to the primary source.

You can specify the action to take when multiple rows or no rows are returned by the lookup query. For example, you can specify that non-matching rows are skipped, and that any one matching row is returned when there are multiple matching rows.

The resulting output is a combination of both input sources based on the lookup condition, a value in the primary input, and the preferred actions to take. The primary input determines the order of attributes and rows in the output, with the attributes from the primary input placed before the attributes from the lookup input.

Lookup example

Consider two source data entities in a data flow. Data entity 1 (PAYMENTS) is set as the primary input. Data entity 2 (CUSTOMERS) is set as the lookup input. The lookup condition is set as:

LOOKUP_1_1.PAYMENTS.CUSTOMER_ID = LOOKUP_1_2.CUSTOMERS.CUSTOMER_ID

The resulting lookup output shows how the data from the two inputs are combined and transformed. Attributes from the lookup source are appended to the primary source attributes, with the following behaviors:

  • If the operation does not find a matching record in the lookup source for a primary input value, the primary record is returned with null inserted for the lookup attributes. For example, no matching records are found for the CUSTOMER_ID values 103, 104, and 105. Therefore, in the resulting output, null is populated in the appended attributes CUSTOMER_ID and NAME.
  • If the operation finds multiple matching records for a value in the lookup source, any one matching record is returned.
Data Entity 1, primary input source (PAYMENTS)

PAYMENT_ID  CUSTOMER_ID  AMOUNT
1           101          2500
2           102          1110
3           103          500
4           104          400
5           105          150
6           102          450

Data Entity 2, lookup input source (CUSTOMERS)

CUSTOMER_ID  NAME
101          Peter
102          Paul
106          Mary
102          Pauline

Lookup Output

PAYMENT_ID  CUSTOMER_ID  AMOUNT  CUSTOMER_ID  NAME
1           101          2500    101          Peter
2           102          1110    102          Paul
3           103          500     null         null
4           104          400     null         null
5           105          150     null         null
6           102          450     102          Paul
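The lookup output above can be reproduced with a plain-Python left join that keeps any one matching row per key and returns non-matching rows with nulls (the default actions). This is an illustrative sketch, not the product's implementation:

```python
# Sketch of the lookup: PAYMENTS is the primary input, CUSTOMERS the
# lookup input, joined on CUSTOMER_ID.
payments = [(1, 101, 2500), (2, 102, 1110), (3, 103, 500),
            (4, 104, 400), (5, 105, 150), (6, 102, 450)]
customers = [(101, "Peter"), (102, "Paul"), (106, "Mary"), (102, "Pauline")]

# Keep any one matching row per key (here: the first one encountered).
lookup = {}
for cid, name in customers:
    lookup.setdefault(cid, (cid, name))

# Append lookup attributes; non-matching rows get (None, None) for null.
output = [
    (pid, cid, amount) + lookup.get(cid, (None, None))
    for pid, cid, amount in payments
]
print(output[2])  # (3, 103, 500, None, None)
```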
Adding and configuring a lookup operator

A lookup operator uses two input sources on a data flow.

The following procedure assumes you have added and configured two source operators.

  1. From the Operators panel, drag and drop a Lookup operator onto the canvas.
  2. With the lookup operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. For Primary input, select the source that contains a value for matching one or more rows in the lookup source. Matching is based on a lookup condition, which you create in step 7.
    The primary input also determines the order of attributes and rows in the resulting lookup output.
  4. For Lookup input, select the source that the primary source references to match one or more rows.
    Attributes from the lookup input are appended to the primary input attributes in the resulting output.
  5. For Multiple match, select the action to perform when multiple matching rows are found between the primary source and the lookup source. The action you select determines the rows to use in the resulting output.
    • Return any one matching row: Default action. Selects any one matching row.
    • Return the first matching row: Selects the first row from the rows that match.
    • Return the last matching row: Selects the last row from the rows that match.
    • Return error: Displays an error. Does not complete the operation.
    • Return all matching rows: Selects all matching rows.
  6. For No match, select the action to perform when no matching row is found between the primary source and the lookup source. The action you select determines how to handle the primary source row that does not have a matching row in the lookup source.
    • Skip non-matching rows: Does not include non-matching rows in the resulting output.
    • Return non-matching rows: Default action. Includes non-matching rows in the resulting output with null values.
  7. Next to Lookup condition, click Create.
    1. In the Create Lookup Condition panel, Condition Builder section, you can double-click or drag and drop attributes, parameters, and functions to add to the editor to build a condition. You can also enter the condition expression manually in the editor, and validate the expression.
      Note

      In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
    2. Click Create.
  8. (Optional) Click Assign parameter to use parameters so that the lookup condition is not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
  9. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  10. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  11. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
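The Multiple match options in step 5 map naturally onto simple list operations over the set of rows matching one lookup key. A hedged Python sketch, illustrative only:

```python
# Rows in the lookup source that match a single key (CUSTOMER_ID 102).
matches = [(102, "Paul"), (102, "Pauline")]

first_row = matches[0]    # Return the first matching row
last_row = matches[-1]    # Return the last matching row
any_row = matches[0]      # Return any one matching row (which one is unspecified)
all_rows = matches        # Return all matching rows

def error_on_multiple(rows):
    """Return error: fail instead of choosing among multiple matches."""
    if len(rows) > 1:
        raise ValueError("multiple matching rows found")
    return rows[0]

print(first_row, last_row)
```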
Creating a lookup condition

Use the Condition Builder to visually select elements to create a lookup condition. You can also enter a condition manually in the editor.

A lookup condition lets you use a value in a primary input source to look up records in a lookup input source, returning any rows that match. Rows that do not have a match are returned with null values.

The elements that you can use in a lookup condition include incoming attributes, parameters, and functions. You can double-click or drag and drop an element from the list to add to the editor to build a condition. You can validate the condition before creating it.

Incoming displays the attributes from the upstream input ports in two separate LOOKUP folders. View the attributes from each port by expanding or collapsing the appropriate LOOKUP folder. For example, suppose LOOKUP_1_1 is the primary input and LOOKUP_1_2 is the lookup input. Lookup conditions based on a value in the primary input attribute ADDRESS_ID could be:

LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = '2001'
LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = LOOKUP_1_2.BANK_CUSTOMER.ADDRESS_ID

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Suppose you want to look up bank customers of a specific bank. You can create a VARCHAR parameter with the name P_LOOK_UP, and set the default value to 2001, which is the specific bank value. Then you can create the lookup condition as:

LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = LOOKUP_1_2.BANK_CUSTOMER.ADDRESS_ID AND LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = $P_LOOK_UP

Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.

Here's a list of functions that are available for you to add when you construct conditions:

Hash Functions

  • MD5(all data types): Calculates an MD5 checksum of the data type, and returns a string value. Example: MD5(column_name)
  • SHA1(all data types): Calculates a SHA-1 hash value of the data type, and returns a string value. Example: SHA1(column_name)
  • SHA2(all data types, bitLength): Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. Example: SHA2(column_name, 256)
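The hash functions correspond to the standard MD5, SHA-1, and SHA-2 digests. A stdlib Python sketch using hashlib (hex digest lengths shown; the product's exact output encoding is not specified here):

```python
# Illustrative stdlib counterparts of MD5, SHA1, and SHA2.
import hashlib

value = "ORACLE"
md5_hex = hashlib.md5(value.encode()).hexdigest()
sha1_hex = hashlib.sha1(value.encode()).hexdigest()
sha2_256_hex = hashlib.sha256(value.encode()).hexdigest()  # bitLength 256
sha2_512_hex = hashlib.sha512(value.encode()).hexdigest()  # bitLength 512

# Hex digest lengths: 32, 40, 64, 128 characters respectively.
print(len(md5_hex), len(sha1_hex), len(sha2_256_hex), len(sha2_512_hex))
```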
Arithmetic Functions

  • ABS(numeric): Returns the absolute value of the numeric value. Example: ABS(-1)
  • CEIL(numeric): Returns the smallest integer not less than the numeric value. Example: CEIL(-1.2)
  • FLOOR(numeric): Returns the largest integer not greater than the numeric value. Example: FLOOR(-1.2)
  • MOD(numeric1, numeric2): Returns the remainder after numeric1 is divided by numeric2. Example: MOD(8,2)
  • POWER(numeric1, numeric2): Raises numeric1 to the power of numeric2. Example: POWER(2,3)
  • ROUND(numeric1, numeric2): Returns numeric1 rounded to numeric2 decimal places. Example: ROUND(2.5,0)
  • TRUNC(numeric1, numeric2): Returns numeric1 truncated to numeric2 decimal places. Example: TRUNC(2.5,0)
  • TO_NUMBER(expr[, format, locale]): Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7
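Most of the arithmetic functions have direct Python stdlib counterparts, sketched below. Note that Python's round uses round-half-to-even, which can differ from round-half-up semantics for values like 2.5:

```python
# Illustrative stdlib counterparts of the arithmetic functions.
import math

print(abs(-1))           # ABS(-1) -> 1
print(math.ceil(-1.2))   # CEIL(-1.2) -> -1 (smallest integer not less than the value)
print(math.floor(-1.2))  # FLOOR(-1.2) -> -2 (largest integer not greater than the value)
print(8 % 2)             # MOD(8, 2) -> 0
print(2 ** 3)            # POWER(2, 3) -> 8
print(round(2.5, 0))     # 2.0 in Python (half-to-even); round-half-up would give 3
print(math.trunc(2.5))   # TRUNC(2.5, 0) -> 2
```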

Date and Time Functions

  • CURRENT_DATE: Returns the current date. Example: CURRENT_DATE returns today's date.
  • CURRENT_TIMESTAMP: Returns the current date and time with time zone in the session time zone. Example: CURRENT_TIMESTAMP returns today's date and current time.
  • DATE_ADD(date, number): Returns the date that is the specified number of days after the specified date. Example: DATE_ADD('2017-07-30', 1) returns 2017-07-31.
  • DATE_FORMAT(expr, format[, locale]): Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEEE: Named day of the week, such as Sunday
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday

DATE_FORMAT(2020-10-10, 'yyyy-MM-dd') returns '2020-10-10'. The first argument is a Date object representing Oct 10th, 2020.

DATE_FORMAT(2018-junio-17, 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

  • DAYOFMONTH(date): Returns the date's day in the month. Example: DAYOFMONTH('2020-12-25') returns 25.
  • DAYOFWEEK(date): Returns the date's day in the week. Example: DAYOFWEEK('2020-12-25') returns 6 in the U.S. because Sunday is considered to be 1, Monday is 2, and so on.
  • DAYOFYEAR(date): Returns the date's day in the year. Example: DAYOFYEAR('2020-12-25') returns 360.
  • WEEKOFYEAR(date): Returns the date's week in the year. Examples: WEEKOFYEAR('2022-07-28') returns 30; WEEKOFYEAR('2022-07-28 13:24:30') returns 30.

  • HOUR(datetime): Returns the datetime's hour value. Example: HOUR('2020-12-25 15:10:30') returns 15.
  • LAST_DAY(date): Returns the date's last day of the month. Example: LAST_DAY('2020-12-25') returns 31.
  • MINUTE(datetime): Returns the datetime's minute value. Example: MINUTE('2020-12-25 15:10:30') returns 10.
  • MONTH(date): Returns the date's month value. Example: MONTH('2020-06-25') returns 6.
  • QUARTER(date): Returns the quarter of year the date is in. Example: QUARTER('2020-12-25') returns 4.
  • SECOND(datetime): Returns the datetime's second value. Example: SECOND('2020-12-25 15:10:30') returns 30.
  • TO_DATE(string, format_string[, localeStr]): Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17

  • TO_TIMESTAMP(expr, format_string[, localeStr]): Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of week in month, such as 1 if April 1st is the first Friday of the month.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • a: AM or PM
TO_TIMESTAMP('2020-10-10 11:10:10', 'YYYY-MM-DD HH:mm:SS') returns a TIMESTAMP object representing 11:10:10 AM on Oct 10th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
  • YEAR(date): Returns the date's year value. Example: YEAR('2020-06-25') returns 2020.
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1st, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (the period must be less than 100 years).

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (the period must be less than 100 years).

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (the period must be less than 100 months).

INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
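The day-to-second interval types above map naturally onto a duration value. As a rough plain-Python analogue (an illustration of the semantics, not the Data Integration runtime), the DAY TO SECOND example can be modeled with a timedelta:

```python
from datetime import timedelta

# Rough analogue of INTERVAL '11 10:09:08.555' DAY TO SECOND(3):
# 11 days, 10 hours, 9 minutes, 8 seconds, and 555 thousandths of a second.
iv = timedelta(days=11, hours=10, minutes=9, seconds=8, milliseconds=555)
print(iv)  # 11 days, 10:09:08.555000
```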
Analytic Functions

FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])

Returns the value evaluated at the row that is the first row of the window frame.

FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.

LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)

Returns the value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.

LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.

LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])

Returns the value evaluated at the row that is the last row of the window frame.

LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.

LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)

Returns the value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, default is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.

LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.

RANK() OVER ([ partition_clause ] order_by_clause)

Returns the rank of the current row with gaps, counting from 1.

RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.

ROW_NUMBER() OVER ([ partition_clause ] order_by_clause)

Returns the unique number of the current row within its partition, counting from 1.

ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
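To make the offset-and-default behavior of LAG concrete, the following plain-Python sketch (column values are made up for illustration; this is not the Data Integration engine) emulates LAG(value, 2, 'hello') within one partition ordered descending:

```python
# One partition of (BANK_ID, BANK_NAME) rows, ordered by BANK_NAME DESC,
# then LAG with offset 2 and default 'hello': rows without a row two
# positions earlier get the default.
rows = [("B1", "Alpha"), ("B1", "Beta"), ("B1", "Gamma")]
rows.sort(key=lambda r: r[1], reverse=True)  # ORDER BY BANK_NAME DESC
lagged = [rows[i - 2][0] if i >= 2 else "hello" for i in range(len(rows))]
print(lagged)  # ['hello', 'hello', 'B1']
```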
String Functions

CAST(value AS type)

Returns the specified value in the specified type.

CAST("10" AS INT) returns 10

CONCAT(string, string)

Returns the combined values of strings or columns.

CONCAT('Oracle','SQL') returns OracleSQL

INITCAP(string)

Returns the string with the first letter of each word capitalized and all other letters in lowercase; words are delimited by white space.

INITCAP('oRACLE sql') returns Oracle Sql

INSTR(string, substring[, start_position])

Returns the (1-based) index of the first occurrence of substring in string.

INSTR('OracleSQL', 'SQL') returns 7

LOWER(string)

Returns the string with all letters changed to lowercase.

LOWER('ORACLE') returns oracle

LENGTH(string)

Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.

LENGTH('Oracle') returns 6

LTRIM(string)

Returns the string with leading spaces removed from the left.

LTRIM(' Oracle') returns Oracle

NVL(expr1, expr2)

Returns expr1 if it is not null; otherwise returns expr2.

NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])

Searches for and extracts the substring that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts that specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)

Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, each occurrence of search is removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)

Returns the string with trailing spaces removed from the right.

RTRIM('Oracle ') returns Oracle

SUBSTRING(string, position[, substring_length])

Returns the substring starting at position.

SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
TO_CHAR(expr[, format])

Converts a date into a string, based on the format. Converts a number into a string; no format is required.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for January 1st
  • dd: Numeric day of the month, such as 01 for January 1st
  • DDD: Numeric day of the year, such as 001 for January 1st
  • F: Numeric day of the week, such as 0 for Sunday
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday

TO_CHAR(123) returns 123

TO_CHAR(Date'2020-10-30', 'yyyy.MM.dd', 'en-US') returns 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)

Returns a string with all letters changed to uppercase.

UPPER('oracle') returns ORACLE

LPAD(str, len[, pad])

Returns a string that is left-padded with the specified characters to a certain length. If the pad character is omitted, the default is a space.

LPAD('ABC', 5, '*') returns '**ABC'

RPAD(str, len[, pad])

Returns a string that is right-padded with the specified characters to a certain length. If the pad character is omitted, the default is a space.

RPAD('XYZ', 6, '+') returns 'XYZ+++'
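Several of the string functions above have close standard-library analogues. The following plain-Python sketch illustrates the intended semantics (it is not the Data Integration runtime):

```python
import re

# Plain-Python analogues of INITCAP, LPAD, RPAD, REPLACE, and REGEXP_SUBSTR.
initcap = "oRACLE sql".title()             # INITCAP('oRACLE sql') -> 'Oracle Sql'
lpad = "ABC".rjust(5, "*")                 # LPAD('ABC', 5, '*')   -> '**ABC'
rpad = "XYZ".ljust(6, "+")                 # RPAD('XYZ', 6, '+')   -> 'XYZ+++'
replaced = "ABCabc".replace("abc", "DEF")  # REPLACE(...)          -> 'ABCDEF'
# Capturing group 1, like REGEXP_SUBSTR('22;33;44', ..., 1) -> '22'
group1 = re.search(r"([0-9.]*);([0-9.]*);([0-9.]*)", "22;33;44").group(1)
print(initcap, lpad, rpad, replaced, group1)
```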
Operator (Comparison) Functions

CASE WHEN condition1 THEN result1 ELSE result2 END

Returns the value for which a condition is met.

CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ

AND

The logical AND operator. Returns true if both operands are true, otherwise returns false.

(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"

OR

The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.

(x = 10 OR y = 20) returns "false" if x is not equal to 10 and y is not equal to 20. If either one is true, then it returns "true"

NOT

The logical NOT operator.

LIKE

Performs string pattern matching, determining whether string1 matches the pattern in string2.

=

Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.

x = 10 returns "true" when the value of x is 10, else it returns "false"

!=

Tests for inequality. Returns true if expr1 does not equal expr2, otherwise returns false.

x != 10 returns "false" if the value of x is 10, else it returns "true"

>

Tests for greater than. Returns true if expr1 is greater than expr2.

x > 10 returns "true" if the value of x is greater than 10, else it returns "false"

>=

Tests for greater than or equal to. Returns true if expr1 is greater than or equal to expr2.

x >= 10 returns "true" if the value of x is greater than or equal to 10, else it returns "false"

<

Tests for less than. Returns true if expr1 is less than expr2.

x < 10 returns "true" if the value of x is less than 10, else it returns "false"

<=

Tests for less than or equal to. Returns true if expr1 is less than or equal to expr2.

x <= 10 returns "true" if the value of x is less than or equal to 10, else it returns "false"

||

Concatenates two strings.

'XYZ' || 'hello' returns 'XYZhello'

BETWEEN

Evaluates a range.

FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007

IN

Tests whether an expression matches a list of values.

FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
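The comparison operators behave like their counterparts in most languages. A quick plain-Python truth-value sketch of the examples above (illustrative values only):

```python
# With x = 10 and y = 20, mirror the AND, OR, BETWEEN, and IN examples.
x, y = 10, 20
and_result = (x == 10 and y == 20)   # AND: both operands true  -> True
or_result = (x == 10 or y == 25)     # OR: one operand true     -> True
between = 1003 <= 1005 <= 1007       # BETWEEN-style range test -> True
in_result = 1005 in (1003, 1007)     # IN: not in the list      -> False
print(and_result, or_result, between, in_result)
```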
Unique ID Functions

NUMERIC_ID()

Generates a universally unique identifier that is a 64-bit number for each row.

NUMERIC_ID() returns, for example, 3458761969522180096 and 3458762008176885761

ROWID()

Generates monotonically increasing 64-bit numbers.

ROWID() returns, for example, 0, 1, 2, and so on

UUID()

Generates a universally unique identifier that is a 128-bit String for each row.

UUID() returns, for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d

MONOTONICALLY_INCREASING_ID()

Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers.

MONOTONICALLY_INCREASING_ID() returns, for example, 8589934592 and 25769803776
Conditional Functions

COALESCE(value, value [, value]*)

Returns the first non-null argument, if it exists, otherwise returns null.

COALESCE(NULL, 1, NULL) returns 1

NULLIF(value, value)

Returns null if the two values equal each other, otherwise returns the first value.

NULLIF('ABC','XYZ') returns ABC
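The two conditional functions are easy to emulate; this plain-Python sketch (with None standing in for NULL) reproduces the examples above:

```python
# None plays the role of NULL in this sketch of COALESCE and NULLIF.
def coalesce(*vals):
    """Return the first non-None argument, or None if all are None."""
    return next((v for v in vals if v is not None), None)

def nullif(a, b):
    """Return None if the two values are equal, otherwise the first value."""
    return None if a == b else a

print(coalesce(None, 1, None))  # 1
print(nullif("ABC", "XYZ"))     # ABC
```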
Hierarchical Functions

SCHEMA_OF_JSON(string)

Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)

Converts a column containing a Struct or Array of Structs, or a Map or Array of Maps, into a JSON string.

TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)

Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)

Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)

Creates a new column as Array type. The input columns must all have the same data type.

TO_ARRAY(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]
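The TO_JSON example above can be reproduced with ordinary JSON serialization; in this plain-Python sketch, dicts and lists stand in for the Struct, Map, and Array column types:

```python
import json

# A Struct with an Array field "s1" and a Map field "s2", serialized to
# the same JSON string shown in the TO_JSON example.
value = {"s1": [1, 2, 3], "s2": {"key": "value"}}
encoded = json.dumps(value, separators=(",", ":"))
print(encoded)  # {"s1":[1,2,3],"s2":{"key":"value"}}
```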

Function Operator

Use the function operator to invoke Oracle Cloud Infrastructure Oracle Functions from within Data Integration.

Note

For complex data structures like Map, Array, and Composite (Struct), only the JSON serialization format is currently supported.

Before you use Oracle Functions from within a data flow in Data Integration, ensure that you understand the dependencies, and have completed the prerequisite tasks.

Before you begin

The function you want to invoke must be deployed in Oracle Functions. The function can be written in any language.

Use the following task checklist to ensure that you have the required setup and information for using the function operator with Oracle Functions.

Set up to use and access Oracle Functions

This topic assumes that your tenancy and development environment are already set up for deploying functions to Oracle Functions.

See Preparing for Oracle Functions.

Only functions that are deployed to Oracle Functions can be used with the function operator in Data Integration.

Create policies to control the invocation and management of functions.

This topic assumes you or your tenancy administrator have already created the Oracle Cloud Infrastructure policies to grant access to function-related and network resources. See Summary of Policies to Create for Oracle Functions.

In a production environment, you might want to restrict users to invoking functions in a specific application, or to invoking a specific function only.

For example, to restrict users to invoking functions in a specific workspace, enter the policy statements in the following format:

allow any-user to use fn-invocation in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>'}

allow any-user to read fn-function in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>'}

To restrict users to invoking a specific function in a workspace:

allow any-user to use fn-invocation in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>', target.function.id='<function-ocid>'}

allow any-user to read fn-function in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>', target.function.id='<function-ocid>'}

See Controlling Access to Invoke and Manage Functions.

Create an application in Oracle Functions.

An application is a logical grouping of functions.

In an application, you specify one to three subnets in which to run functions. Functions running in one application are isolated from functions running in another application.

See Creating Applications in Oracle Functions.

Deploy a function in Oracle Functions.

When you deploy a function to Oracle Functions using the Fn Project CLI, the function is built as a Docker image and pushed to a specified Docker registry.

See Creating and Deploying Functions.

Collect the information needed to use the deployed function with the function operator.

When you add and configure a function operator, you need to know the following:

  • The compartment that has the Oracle Functions application in which the function you want to use is deployed
  • The application in Oracle Functions that has the deployed function
  • The name of the deployed function
  • The fields that are defined in the function
Adding and configuring a function operator

A function operator lets you use a function deployed in Oracle Functions to process data from an input source. Primitive and complex data types are supported.

Specify the input shape of the function, and the input and output attributes for the function operator to read and write. Then manually map the source attributes to the input attributes.

Currently, only the JSON serialization format is supported for complex data structures like Map, Array, and Composite (Struct).

The following procedure assumes you have added and configured a source operator with the data entity you want to use with the function.

  1. From the Operators panel, drag and drop a Function operator onto the canvas.
  2. With the function operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Select the OCI Function to use for this operator. See Selecting a function.
  4. Select the Serialization format and define the function input and output data.
  5. Specify the Input attributes and Output attributes for the operator, and the Function configuration values for the function. See Adding a function operator property.

    The Input attributes define the input shape for the function operator, which is one or more incoming attributes for the function to work on. Input attributes are mapped to the incoming attributes from the data entity source.

    The Function configuration values define the input shape of the function, which is one or more fields that are defined in the function. Define a function field with a name and a value. The value of a field matches an input attribute. You can parameterize a function configuration value.

    The Output attributes define the output shape for the function operator, which is one or more attributes for the function to write output to. Add an output attribute for each function field that you want in the output. Subsequent downstream operators can use the output attributes.

  6. (Optional) Under Function properties, expand Function configuration to review the function configuration values. You can modify or parameterize a value.

    BATCH_SIZE is a pre-defined function configuration that represents the number of rows that a function processes at a time. The default is 10,000 rows.

For a function configuration value, you can modify the value or parameterize it.

  7. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Map tab, drag and drop an attribute from the data entity source to the corresponding input attribute on the function to create a manual mapping.
  9. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The function output attributes are appended to the data set.

    The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  10. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
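To make the batching behavior from step 6 concrete: the function receives rows in batches (up to BATCH_SIZE at a time) and returns one output record per input row. The following is a hypothetical sketch in plain Python of what such a function body might look like; the field names, payload shape, and helper name are assumptions for illustration, not the Oracle Functions API:

```python
import json

def process_batch(payload: bytes) -> bytes:
    """Hypothetical handler body: read a JSON array of input rows and
    return a JSON array with one output object per row."""
    rows = json.loads(payload)
    # Illustrative transformation: emit an upper-cased name per row.
    out = [{"ID": r["ID"], "NAME_UPPER": r["NAME"].upper()} for r in rows]
    return json.dumps(out).encode()

print(process_batch(b'[{"ID": 1, "NAME": "ann"}]'))
```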
Selecting a function

The function you want to use with a function operator must be deployed in an application in Oracle Cloud Infrastructure Oracle Functions.

  1. On the canvas, select the function operator.
  2. Under the Details tab of the Properties panel, for OCI Function, click Select.
  3. In the Select an Oracle Function panel, select the Compartment that has the Oracle Functions application in which the function you want to use has been deployed.
  4. Select the Application in Oracle Functions that has the deployed function.
  5. In the OCI Function section, select the function.
  6. Click OK.
Adding a function operator property

After selecting a deployed function, specify the properties that define the input data for the function to process, the fields in the function configuration, and the output data that the function returns.

  1. Under Function properties, click Add property.
  2. In the Add Property panel, select the Type of property you want to specify.
    • Input attributes: Input shape for the operator. Specify one or more attributes to represent the incoming attributes for the function to process. Input attributes are mapped to the incoming attributes from the data entity source.
    • Function configuration: Specify one or more function fields that define the input shape of the function. A field has a name and a value. The value of a field specifies an input attribute.
    • Output attributes: Output shape for the operator. Specify zero or more attributes to represent the output from the function after the data is processed. Add an output attribute for each function field that you want in the output. Subsequent downstream operators can use the output attributes.
  3. Enter a name for the property in the Identifier field.
  4. Select the Data type of the property.
  5. For a primitive data type, depending on the Type and Data type of the property you're specifying, specify the following fields, and other applicable fields.

    • Length: For an input or output attribute, enter a length. For example, you specify a length for a numeric or varchar data type.
    • Value: For a function field, enter a value that matches an input attribute.
  6. For a Map complex data type, specify the key-value pair for a map element. Click Add data type to select the key data type and value data type.

    • Key: Select from primitive data types only.
    • Value: Select from primitive and complex data types.
  7. For an Array complex data type, click Add data type to specify the Element data type. You can select from primitive and complex data types.

  8. For a Composite (Struct) data type, specify the schema child or schema children. To add a schema child, click the + symbol next to the parent schema name.

    In the Add Field panel for a schema child, enter the Identifier, and select a Data type. You can select from primitive and complex data types. Click Add to add the schema child.

    Click the + symbol again if you want to add more than one schema child to the parent schema.

    To edit or delete a schema child, use the Actions menu at the end of the row.

    You cannot delete the parent schema element.

  9. In the Add Property panel, click Add.
  10. Repeat the steps to add the properties you need to define the function input, function fields, and function output.
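Because only JSON serialization is supported for complex types, the Map, Array, and Composite (Struct) properties described in steps 6 through 8 correspond to familiar JSON shapes. The values below are hypothetical examples chosen for illustration, not output from the product:

```python
import json

# A Map has uniform key and value types, an Array has one element data
# type, and a Composite (Struct) has a fixed set of named children.
map_value = {"block": 1, "unit": 1}          # Map: string keys, integer values
array_value = ["John", "Friend"]             # Array: all elements are strings
struct_value = {"Id": 100, "Name": "John"}   # Struct: named, typed children
payload = json.dumps({"map": map_value, "array": array_value, "struct": struct_value})
print(payload)
```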