2 Introduction to Component KMs

This chapter introduces Component KMs. It briefly explains what Component KMs are and describes their types.

2.1 What is a Component KM?

Component KM is a new, improved style of KM development that applies to IKMs, LKMs, and XKMs. KMs for mappings have two implementation styles: the legacy 11g style and the component style. 11g-style KMs are designed to use the monolithic odiRef substitution API object and syntax in their template commands. Component-style KMs are designed to use the newer object-oriented substitution API objects and the newer template flow control syntax in their template commands. Both styles of KM appear in the project and global KM trees as IKMs, LKMs, and CKMs, along with a predefined set of component-style KMs called XKMs. All new LKMs, IKMs, and XKMs should be designed and coded in the component KM style.

Component KMs were introduced in ODI 12c and were first exposed in ODI Studio in release 12.2.1.2.1. A Component KM has the same functionality as any other KM, including tasks and options, and it adds the following features:

  • An associated delegate script whose purpose is to produce an Abstract Syntax Tree (AST). The produced AST object is a tree of Java class instances that describes the metadata for the mapping component in a form tailored to generating code for a specific language and technology.

  • Source and target language fields at the KM and KM task level, similar to the existing source and target technology fields.

  • Support for KM inheritance, which allows a KM to inherit tasks, options, and field values from a base KM.

  • A set of flow control syntax elements to control the code generation flow for the KM task source and target commands.

  • The ability to include a globally sharable code template snippet, known as a “global template”, as part of a KM task source or target command.

  • A Groovy script associated with each KM task, which allows new substitution variables to be defined. The newly defined substitution variables can be used in the task source and target commands.

A Component KM can be one of four types: IKM, LKM, XKM, or CKM. For more details on the types of Component KMs, see "What is a Knowledge Module?".

2.2 Syntax Elements of Component KMs

There are two new basic syntax elements that are specific to Component KMs:

  • The first Component KM syntax element that can be used in Component KM tasks is a command syntax that looks like this: {# command #}. The {#...#} escape characters are somewhat similar to the JSP-type <%...%> escape characters that exist for all KMs, except that the escaped text is not Java. The command syntax resembles the command syntax of Apache Velocity Template Language (VTL), except that it uses an extra character to meet the rigorous demands of heterogeneous code generation. Any syntax used in ODI code generation templates must be distinctive enough to minimize the chance of being confused with actual generated text in some language and technology combination.

  • The other syntax element specific to Component KMs is the escape syntax used to de-reference substitution variables and API calls. The syntax looks like $[variableName] or $[variable.methodName()].

    For example:

    If a variable called "tableName" has been created in the Groovy variable definition script, then the command could contain text like this: CREATE TABLE $[tableName]. If the tableName variable evaluates to MY_TABLE for this mapping, the generated code is rendered as: CREATE TABLE MY_TABLE.

    An example of a method call on an object type variable would be to define command text as: CREATE TABLE $[physicalNode.getName()]. There is a built-in variable physicalNode that is set to the MapPhysicalNode instance of the node that this KM is assigned to. So during code generation, if the name of the target physical node is "TGT_TABLE", then this code would be rendered as: CREATE TABLE TGT_TABLE.

    This syntax is also similar to Apache Velocity Template Language (VTL) syntax, which uses ${variableName}. The square bracket “[” is used instead of “{” because “{” is already heavily used by Java and shell script syntax, as well as other languages. The new substitution API variable de-reference syntax must not be confused with the JSP-type expression syntax, such as <%=variableName%>. The main difference is that the escaped text is not Java: it supports only a variable name or one method call, not an arbitrarily complex Java expression. If a complex Java expression is needed, the best practice for Component KMs is to define a new substitution variable in the Groovy variable definition script associated with the KM task or the KM, and set it equal to the complex expression. The Groovy script supports complex Java or Groovy expressions. In this way, the specialized Java or Groovy code is kept separate from the actual template commands.

  • One other important point about the Component KM syntax is that it applies only to the code generation execution phase. The older JSP-type template substitution syntax, which uses angle brackets (for example, <%...%>), defines four execution phases:

    • code generation phase <%...%>

    • agent phase <?...?>

    • post-agent phase <$...$>

    • pre-execute phase <@...@>

    These escapes determine the phase of the code generation and execution process in which the substitution is done. For the Component KM syntax elements, however, the substitution is always done in the initial code generation phase. Component KM functionality is therefore purely code generation functionality; there is no new agent or execution functionality. If agent-phase or post-agent substitution is needed, for example for late password substitution, then the JSP-type escapes must still be used.
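The $[...] de-reference behavior described in this section can be pictured with a short Python sketch. It is not ODI's code generator: the render function, the PhysicalNode stand-in class, and the regular expression are assumptions made for illustration, and only a bare variable name or a single no-argument method call is handled.

```python
import re

# Matches $[name] or $[name.method()]: a deliberately small subset of the syntax.
REF = re.compile(r"\$\[(\w+)(?:\.(\w+)\(\))?\]")


class PhysicalNode:
    """Hypothetical stand-in for the object bound to the physicalNode variable."""

    def __init__(self, name):
        self._name = name

    def getName(self):
        return self._name


def render(template, variables):
    """Replace each $[...] reference with the string value it resolves to."""

    def resolve(match):
        var_name, method_name = match.group(1), match.group(2)
        value = variables[var_name]  # KeyError if the variable is not in scope
        if method_name is not None:
            value = getattr(value, method_name)()  # one method call, no arguments
        return str(value)

    return REF.sub(resolve, template)


variables = {"tableName": "MY_TABLE", "physicalNode": PhysicalNode("TGT_TABLE")}
print(render("CREATE TABLE $[tableName]", variables))
print(render("CREATE TABLE $[physicalNode.getName()]", variables))
```

The sketch leaves any JSP-type escape text (such as <%...%> or <?...?>) untouched, because it only recognizes the $[...] form.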

2.3 Component KM — Flow Control Commands

Listed below are the flow control commands used in component KMs:

  1. #IF

    Syntax

    {# IF condition #} text [{# ELSIF condition #} text [{# ELSIF condition #} text ...]] [{# ELSE #} text] {# ENDIF #}

    Description

    Conditionally includes text in the generated code. The condition can be a simple substitution variable reference, a method call, or a combination of simple relational condition predicates. Supported relational operators: [==, !=, <, >, !, OR, AND].

    Example

    {# INCLUDE = 'ConstantFromClauseText' #}
    {# IF $[QUERY.isConstantQuery()] #}
    {# ELSE #}FROM {#NL#}
    {# LIST #}  $[QUERY.getFromList().foreach(getText())]{# SEP #} ,{#NL#} {# ENDLIST #}
    {# ENDIF #}
    
  2. #INCLUDE

    Syntax

    {# INCLUDE 'templateName' #}

    Description

    Includes the text of a named global shared template as part of this template text.

    Example

    {# INCLUDE = 'PreInsertList' #}
  3. #TEMPLATE

    Syntax

    {# TEMPLATE name='templateName' technology='technoName' #}

    Description

    Create a local overridden template, or specify a dependent template, with the specified name and technology. The technology can be GENERIC.

    Example

    {# TEMPLATE name='ANSIJoinText' technology='GENERIC' #}
    templateText...
    {# ENDTEMPLATE #}
    
  4. Substitution Variable Reference

    Syntax

    $[variableName[.methodName([primitiveArg[,primitiveArg...]])[.methodName(primitiveArgs)]][.foreach(methodName())]]

    Description

    Substitutes the string value of an in-scope variable or method call result into the generated template text. The outermost square brackets (the one immediately after $ and the final one) are literal; all other square brackets indicate optional syntax. The primitive arguments are method parameters that must be simple literal Java data types such as string or int. The foreach method is a special syntax that is allowed only inside a #LIST command block; it causes the specified method to be called for each list item to return the generated text.

    Example

    Generate the set operation between two queries that are part of a UNION ALL set operation:

    SELECT A, B FROM TAB1
    $[QUERY.getSetOperation()]
    SELECT C, D FROM TAB2
    

    If the currently processing QUERY object has a set operation type of UNION ALL, then the generated code from this command will look like this:

    SELECT A, B FROM TAB1
    UNION ALL
    SELECT C, D FROM TAB2
    

    For more details on the default built-in variables, see the "SQL Structured Substitution API Reference" appendix.

  5. #INDENT

    Syntax

    {# INDENT #}

    Description

    Substitutes a set of tab or space characters into the generated code, depending on the current indent level.

    Example

    {# INDENT #}
  6. #LIST

    Syntax

    {# LIST #} listVariablesAndListText {#SEP#}separatorChars{# ENDLIST #}

    Description

    Substitutes a set of list variables and text into the generated code. The list text must include at least one list variable, and can include multiple list variables, plus arbitrary text and non-list substitution variables. The list text can include other template commands. If there are multiple list variables, the lists must be of the same size. If separator characters are specified, they will be reproduced in the generated code after each list item.

    Example

    {# LIST #} $[INSERT.getColumnList().foreach(getText())] {#SEP#},{#NL#}{# ENDLIST #}
  7. #FOR

    Syntax

    {# FOR (listVar[, listVar ...]) IN ($[listItemAlias][,$[listItemAlias]]) SEP = 'separatorChars' #} text $[listItemAlias] [text $[listItemAlias]] {# ENDFOR #}

    Description

    Substitutes a set of list variables and text into the generated code. If there are multiple list variables, the lists must be of the same size. If separator characters are specified, they will be reproduced in the generated code after each list item.

    Example

    {# FOR ($[QUERY.getSelectList()],$[QUERY.getAliasList()]) IN ($[SL],$[AL]) SEP = ',0x000A' #} $[SL] $[QUERY.getColumnAliasSeparator()] $[AL] {# ENDFOR #}
  8. #INC_INDENT

    Syntax

    {# INC_INDENT #}

    Description

    Increments the current indent count.

    Example

    {# INC_INDENT #}
  9. #DEC_INDENT

    Syntax

    {# DEC_INDENT #}

    Description

    Decrements the current indent count.

    Example

    {# DEC_INDENT #}
  10. #NL

    Syntax

    {#NL#}

    Description

    Substitutes a new line character into the generated code.

    Note:

    New lines in the template are not reproduced in the generated code if the template option REPLICATE_NEWLINE is set to false (the default is true).

    Example

    Hello world{#NL#}
  11. #TEMPLATE_OPTIONS

    Syntax

    {# TEMPLATE_OPTIONS optName='optValue'[ optName='optValue' ...] #}

    Description

    Sets one or more named template options. The options take effect immediately after the TEMPLATE_OPTIONS command in the template.

    Example

    {# TEMPLATE_OPTIONS REPLICATE_NEWLINE='false' #}
  12. Line Comment

    Syntax

    ## commentText

    Description

    Template comment text that will not be reproduced in the generated code.

    Example

    ## This is a comment.
  13. Multi-line Comment

    Syntax

    #* commentText *#

    Description

    Multi-line comment text that will not be reproduced in the generated code.

    Example

    #* This is a comment. *#
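The list expansion performed by #LIST and #FOR can be pictured with a small Python sketch. It illustrates the semantics described above and is not ODI's implementation: the function name expand_for, the use of plain Python lists in place of substitution API lists, the sample column data, and the choice to place the separator between items (rather than also after the last one) are all assumptions.

```python
def expand_for(lists, aliases, body, sep=""):
    """Expand `body` once per position across equally sized lists.

    lists   -- the list values, for example a select list and an alias list
    aliases -- one alias per list, referenced in `body` as $[alias]
    body    -- text repeated for each position
    sep     -- separator placed between items (an assumption of this sketch)
    """
    if len(lists) != len(aliases):
        raise ValueError("one alias is required per list")
    if len({len(values) for values in lists}) > 1:
        raise ValueError("all lists must be the same size")

    rendered = []
    for row in zip(*lists):
        text = body
        for alias, item in zip(aliases, row):
            text = text.replace("$[" + alias + "]", str(item))
        rendered.append(text)
    return sep.join(rendered)


# Hypothetical data standing in for a select list and an alias list.
select_list = ["T.EMPNO", "UPPER(T.ENAME)"]
alias_list = ["EMP_ID", "EMP_NAME"]
print(expand_for([select_list, alias_list], ["SL", "AL"], "$[SL] AS $[AL]", sep=",\n"))
```

With a single list and a single alias, the same function behaves like a #LIST block with a {#SEP#} separator.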

2.4 Global Templates

Global templates are a type of ODI object closely related to Component KMs. They are found in the global object tree, under the Global Templates folder. A global template is basically a snippet of template text that can be reused by any Component KM. It has a name, and an associated language and technology. Some pre-seeded global templates are supplied with ODI when it is installed, for common uses such as SQL queries and inserts, and Spark Python scripts.

  • A global template is used in a KM task command by using the {# INCLUDE #} command, which specifies the template name. During code generation, the template text is substituted into the generated text, at the text position where the {#INCLUDE#} command was inserted.

  • In the KM task command editor, the #INCLUDE statement can be expanded by clicking the “+” code folding button on the left-hand side of the text. When that is done, the global template text will be substituted for the #INCLUDE statement in the command text. It is permitted to have nested #INCLUDE statements, which results in nested template text substitution.

  • When the code generator expands the KM task text as part of generating a session or scenario task, it finds the matching global template for each #INCLUDE statement, based on the template name specified in the #INCLUDE statement and on the technology and language of the template. Multiple global templates can be defined with the same name but with different technologies and languages. The right one is picked by matching the execution technology and the KM task language setting against the technology and language settings of the global template. To pick a specific global template regardless of the execution technology and language, specify the language and technology in the #INCLUDE statement.
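The lookup and nested expansion described above can be sketched in Python. This is an illustration only: the TEMPLATES registry, the template names and texts, and the resolve and expand functions are hypothetical, and the fallback to a GENERIC technology entry is an assumption rather than documented behavior. The regular expression accepts the INCLUDE command with or without the "=" sign, since both forms appear in this chapter.

```python
import re

INCLUDE = re.compile(r"\{#\s*INCLUDE\s*=?\s*'([^']+)'\s*#\}")

# Hypothetical registry: (name, technology, language) -> template text.
TEMPLATES = {
    ("SelectText", "ORACLE", "SQL"): "SELECT {# INCLUDE 'ColumnList' #} FROM DUAL",
    ("SelectText", "GENERIC", "SQL"): "SELECT {# INCLUDE 'ColumnList' #}",
    ("ColumnList", "GENERIC", "SQL"): "A, B",
}


def resolve(name, technology, language):
    """Pick the template matching name, technology, and language."""
    for tech in (technology, "GENERIC"):  # GENERIC fallback is an assumption
        text = TEMPLATES.get((name, tech, language))
        if text is not None:
            return text
    raise KeyError(f"no template {name!r} for {technology}/{language}")


def expand(text, technology, language, depth=0):
    """Replace every INCLUDE command, recursing into nested includes."""
    if depth > 20:  # guard against templates that include each other
        raise RecursionError("INCLUDE nesting too deep")

    def substitute(match):
        included = resolve(match.group(1), technology, language)
        return expand(included, technology, language, depth + 1)

    return INCLUDE.sub(substitute, text)


print(expand("{# INCLUDE 'SelectText' #}", "ORACLE", "SQL"))
print(expand("{# INCLUDE = 'SelectText' #}", "HIVE", "SQL"))
```

Keying the registry on the full (name, technology, language) triple is what allows several templates to share one name, as the text above permits.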

2.5 KM Inheritance

KM inheritance allows a KM to inherit tasks, options, and field values from a base KM. To use KM inheritance, the Base Component KM field must be set for a Component KM. When a base KM is set, the derived KM will inherit the following properties from the base KM:

  • All KM options

  • The base KM language, technology, generation type, generation style settings, as well as the base KM delegate script

  • Some base KM tasks may be inherited, according to the following set of rules:

    1. For each task type (MAP_BEGIN, EX_UNIT_BEGIN, etc.), if no tasks of that type are directly owned by the derived KM, and if base tasks have not explicitly been removed, then all base tasks of that type are inherited. This rule typically applies to a newly created KM that has a base KM set.

    2. If some directly owned tasks of a given task type have been added, then by default no base tasks of that type will be inherited, unless they have been explicitly added using the “Include tasks” button in the KM editor.

    3. Any arbitrary set of base tasks can be included in the derived KM by individually adding them using the "Include tasks" icon present in the KM editor.

    4. Any included base task can be removed by using the remove icon present in the KM editor tasks tab.

    5. Some seeded KMs may use a different type of task inclusion that is available only internally. This involves matching tasks based on a keyword. However, any KM that inherits from these KMs can include and exclude tasks as specified above, and from that point onwards the normal task inclusion rules are followed.
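Rules 1 through 4 can be modeled with a short Python sketch. It is one simplified reading of the rules, not ODI's implementation: the Task and DerivedKM classes, the task names, the ordering of inherited tasks before owned tasks, and the treatment of removals (a removed base task is dropped in every case) are assumptions. The keyword-based inclusion in rule 5 is not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    task_type: str  # for example "MAP_BEGIN" or "EX_UNIT_BEGIN"
    name: str


@dataclass
class DerivedKM:
    base_tasks: list
    own_tasks: list = field(default_factory=list)
    included: set = field(default_factory=set)  # base task names added explicitly
    removed: set = field(default_factory=set)   # base task names removed explicitly

    def effective_tasks(self, task_type):
        """Return the tasks of one type that the derived KM ends up with."""
        own = [t for t in self.own_tasks if t.task_type == task_type]
        base = [t for t in self.base_tasks if t.task_type == task_type]
        if own:
            # Rule 2: owning a task of this type stops default inheritance;
            # only explicitly included base tasks are kept (rule 3).
            inherited = [t for t in base if t.name in self.included]
        else:
            # Rule 1: nothing owned for this type, so base tasks are inherited.
            inherited = base
        # Rule 4: an explicit removal always wins.
        inherited = [t for t in inherited if t.name not in self.removed]
        return inherited + own


def names(tasks):
    return [t.name for t in tasks]


base = [
    Task("MAP_BEGIN", "Create work table"),
    Task("MAP_BEGIN", "Truncate target"),
    Task("MAP_END", "Drop work table"),
]
km = DerivedKM(base_tasks=base)
print(names(km.effective_tasks("MAP_BEGIN")))  # all base MAP_BEGIN tasks
km.own_tasks.append(Task("MAP_BEGIN", "Custom setup"))
print(names(km.effective_tasks("MAP_BEGIN")))  # only the owned task
km.included.add("Truncate target")
print(names(km.effective_tasks("MAP_BEGIN")))  # included base task plus owned task
```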

2.6 Groovy Variable Definition Scripts

Each Component KM task has a source and a target command template, just like the tasks of all KMs. However, each task also has an associated Groovy variable definition script, which can be used to define new substitution variables, starting from the existing built-in substitution API variables. For example, a column list string can be created by looping through the SelectItem objects contained by a SqlQuery object, which in turn is contained by a SqlInsertStatement object.

In addition, there is a groovy variable definition script that is owned by the KM itself. The variables defined in the KM-level variable definition script can be reused by all the tasks in the KM. This is useful for cases where some value may be used by multiple tasks in the KM.

Both the task-level and KM-level Groovy variable definition scripts can be viewed and edited in the KM task command editor.
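As a rough picture of what such a script computes, the Python sketch below builds a column list string by walking stand-in objects. The classes and their attribute names (select_items, query, text) only mirror the object names mentioned above and are assumptions; they are not the ODI substitution API, and a real variable definition script would be written in Groovy against the built-in variables.

```python
from dataclasses import dataclass


@dataclass
class SelectItem:
    text: str  # hypothetical: the generated text of one select-list entry


@dataclass
class SqlQuery:
    select_items: list


@dataclass
class SqlInsertStatement:
    query: SqlQuery


def column_list(insert):
    """Join the text of every select item into one comma-separated string."""
    return ", ".join(item.text for item in insert.query.select_items)


# Stand-in for the INSERT built-in variable of an insert-type KM.
INSERT = SqlInsertStatement(SqlQuery([SelectItem("EMPNO"), SelectItem("ENAME")]))

# The new variable is defined once and is then available to the task commands.
variables = {"columnList": column_list(INSERT)}
print("INSERT INTO TGT (" + variables["columnList"] + ")")
```

Computing the string in the script keeps the loop out of the command template, which then only needs a plain reference to the new variable.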

2.7 Structured Substitution API

Component KMs are the only type of KM that has access to the structured substitution API. The structured substitution API objects are accessed through a set of built-in substitution API variables, similar to the “odiRef” variable that is available inside the <%...%> escape characters for all KMs. There are two types of built-in variables:

  • Built-in Public SDK Variables — These variables are always the same regardless of the KM, and they expose the mapping public SDK objects for the component node and KM.

  • Built-in Structured Substitution API Variables — The substitution API built-in variables are always in all-caps and represent the API objects produced by the particular IKM or LKM that is being edited. The full list of built-in substitution API variables can be obtained from the in-scope tree in the bottom right-hand corner of the KM source/target command editor.

The typical substitution API objects produced by a SQL IKM or LKM are:

  • INSERT - The SqlInsertStatement object that represents the top-level INSERT DML statement that will load the target. It is available only for insert-type KMs.

  • UPDATE - The SqlUpdateStatement object that represents the top-level UPDATE DML statement that will load the target. It is available only for update-type KMs.

  • MERGE - The top-level MergeStatement object that is used to produce the merge statement that loads the target.

  • QUERY - The SqlQuery object that represents the top-level extract query that will fetch the data to be loaded.

  • TABLE - The top-level target Table object for KMs that are loading a target table.

  • FROMLIST - The list of FromClause objects for the top-level SqlQuery object.

  • JOIN - The JoinTable object that represents the current join when generating join code. It is available only inside global templates that are used to produce join text, such as the JoinTable template.

  • ATTR - The current source attribute when generating text for source attributes. It is available only inside global templates that are used to produce source attribute text, for example SourceMapAttribute.

  • COLLIST - The list of columns in the C$ staging table, formatted for an insert statement.

  • SUBQUERY - The FilterSubQuery object that represents the filtering for a Filter subquery component.

The built-in structured substitution API variables that are available for Component KM source and target commands are:

  • physicalNode - The mapping physical node object, of type MapPhysicalNode, to which the Component KM is assigned. For example, for an IKM, the physicalNode built-in variable is a reference to the node associated with a target datastore component; that physical node has this IKM assigned as its IKM.

  • component - The mapping logical component that is associated with the physical node to which the Component KM is assigned.

  • connector - The mapping connector point associated with the physical node. For LKMs assigned to an AP node, this is the input connector point of the target component to which the AP node is connected. For IKMs assigned to a target, this is the datastore output connector point. For XKMs, it is the associated output connector point for the physical node. This is helpful in some cases: for example, a splitter component has a different physical node for each output connector, so it is useful to know which connector is associated with the node and the XKM.

  • generatorContext - The GeneratorContext object associated with the code generator that will use the Component KM. A generator context is a container that holds named properties that are global to the code generator execution.

  • taskControl - A reference to a TaskControl object, which can be used to control the number of session or scenario tasks that are generated for a single Component KM task line. This object should be used in the Groovy script, because its methods do not return any string value that can be used as a template substitution string.

  • upstreamASTList - A java.util.List that contains the AST substitution API objects produced by all nodes upstream of the node to which this Component KM is assigned. The list may contain multiple items if the current component is a multi-input component such as a join; otherwise it contains only one upstream substitution API object. For example, in SQL generation, the upstream AST object is typically a SqlQuery object.

  • baseNode - Applicable only to multi-connect IKMs. A reference to the AP node that is connected to the datastore target node to which the multi-connect IKM is assigned.

  • componentKM - A reference to the owning Component KM object, usable within the template command or Groovy variable definition context.

  • componentGenerator - A ComponentGenerator instance that is used to generate code for the component to which this Component KM is assigned.

  • subtype - The subtype value for this Component KM, or null if the subtype is not set. The subtype is a string value that is used to determine the name of the Component KM generator delegate method that is called to produce the AST substitution API object for this Component KM.

  • targetNameForLoadingTask - Applicable only for IKMs. The name of the target that is loaded by this IKM.

  • collTableName - Applicable for LKMs only, with an LKM type of PERSISTENT. Set to the name of the temporary staging table that will be created to stage the source data.

For more details on the substitution methods used in each built-in object, see the "SQL Structured Substitution API Reference" appendix.

2.8 Task Control Objects

A TaskControl object can be used in the Groovy variable definition scripts and is accessed through the built-in variable “taskControl”. It provides methods that control the number of task instances generated from this task. These methods are:

  • skipTaskGeneration()— Calling this method will cause the code generator to skip generating the main instance of this task. Other instances of the task can still be added by calling instantiateTask().

  • instantiateTask()— Calling this will cause the code generator to instantiate an extra instance of the task in the generated session or scenario. An arbitrary number of task instances can be generated by calling the method multiple times. A hash table containing an alternate set of bound variables for task generation can be passed as a parameter. If one of the variables in the alternate variable set has the same name as a standard built-in variable, then the alternate value will be used instead of the standard default value. Each extra instance of the task will be given a unique name, which will be the base task name plus a uniqueness suffix.
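The effect of these two methods on the generated task list can be simulated in Python. The sketch is a stand-in, not the ODI TaskControl class: the recording fields, the bind and generate_tasks helpers, the sample command, and the "_1", "_2" uniqueness suffix format are assumptions. Only the method names and the override behavior of alternate variables come from the description above.

```python
class TaskControl:
    """Hypothetical stand-in that records what a variable definition script asks for."""

    def __init__(self):
        self.skip_main = False
        self.extra_bindings = []

    def skipTaskGeneration(self):
        self.skip_main = True

    def instantiateTask(self, alternate_variables=None):
        self.extra_bindings.append(dict(alternate_variables or {}))


def bind(command, variables):
    """Substitute $[name] references with the bound variable values."""
    for key, value in variables.items():
        command = command.replace("$[" + key + "]", str(value))
    return command


def generate_tasks(task_name, command, defaults, control):
    """Return (name, text) pairs for the main task and any extra instances."""
    tasks = []
    if not control.skip_main:
        tasks.append((task_name, bind(command, defaults)))
    for index, alternates in enumerate(control.extra_bindings, start=1):
        bound = {**defaults, **alternates}  # alternate values override defaults
        tasks.append((f"{task_name}_{index}", bind(command, bound)))  # assumed suffix
    return tasks


control = TaskControl()
control.skipTaskGeneration()  # no main instance
for partition in ("P2023", "P2024"):
    control.instantiateTask({"partition": partition})  # one extra instance each

defaults = {"table": "SALES", "partition": "PDEFAULT"}
command = "ALTER TABLE $[table] TRUNCATE PARTITION $[partition]"
for name, text in generate_tasks("Truncate", command, defaults, control):
    print(name, "->", text)
```

Skipping the main instance and then instantiating one task per item is a way to turn a single KM task line into a data-driven number of generated tasks.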

2.9 Seeded Component KMs

A set of built-in global Component XKMs, LKMs, and IKMs is provided, starting in ODI version 12.2.1.2.1. These are known as seeded KMs and can be viewed in the global KM tree. A seeded KM cannot be deleted or edited in ODI Studio. However, a seeded KM can be duplicated, and the duplicate can be edited if required. Alternatively, a new Component KM can be created with a seeded KM set as its base KM; the new KM then inherits the functionality of the seeded base KM, and specific functionality can be overridden as desired. Inheriting from the seeded KM in this way is the best practice for customizing a built-in seeded KM.