Oracle Java CAPS Custom Encoders User's Guide Java CAPS Documentation |
The following topics provide information about delimiters:
You can define a set of delimiters, called a delimiter list, for any node in the hierarchical data structure. This delimiter list is used in the external data representation for that node and its descendants. A delimiter list defined for a non-root node overrides any ancestor node's delimiter list, both for the node itself and for its descendants.
Delimiters are defined using the Delimiter List Editor, as illustrated in the following figure. The editor is invoked by clicking the delim property value field in the node's property dialog box and clicking the ellipsis (…) button, or by double-clicking the field. See Defining a Delimiter List for additional information.
Clicking within a field in the Delimiter List Editor enables the field for editing. After typing a value into a field, you must press Enter to set the value. Clicking the drop-down menu button in one of the following three fields displays its menu, as illustrated in the following figure.
Type
Optional
Terminator
Figure 6 Delimiter List Editor: Left Side
Figure 7 Delimiter List Editor: Right Side
Table 8 Delimiter List Editor Command Buttons
Table 9 Delimiter Properties
Delimiter levels are assigned, in order, to those hierarchical levels of an Encoder that contain at least one node specified as delimited. If no node at a particular hierarchical level is delimited, that level is skipped when delimiter levels are assigned.
Delimiter lists are typically specified on the root node, so that the list applies to the entire Encoder. The root node itself is typically not delimited, so that Level 1 would apply to those nodes that are children of the root node. See the following figure and example.
Figure 8 Encoder Hierarchical and Delimiter Levels
For example, if you want to parse the following data:
a^b|c^d|e
you might create a Custom Encoder as follows:
root
    element_1
        field_1
        field_2
    element_2
        field_3
        field_4
    field_5
In this example, the delimiter list is specified on the root node, which is not delimited; therefore, the list has two levels:
Level 1: delimiter |
Level 2: delimiter ^
The Level 1 delimiter (|) applies to element_1, element_2, and field_5. The Level 2 delimiter (^) applies to field_1 through field_4.
If the root node is set to be delimited, the Level 1 delimiters then apply to the root node itself. In the above example, the Level 2 delimiter (^) would then apply to element_1, element_2, and field_5, and a new Level 3 delimiter would apply to field_1 through field_4.
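The level assignment for the non-delimited-root case can be sketched in a few lines of Python. This is only an illustration of the idea, not the Encoder runtime: the function name parse_levels and the nested-list result are assumptions made for this example.

```python
def parse_levels(data, delimiters):
    """Split data recursively, using one delimiter per level (Level 1 first)."""
    if not delimiters:
        return data  # no levels left: this is a leaf value
    head, rest = delimiters[0], delimiters[1:]
    return [parse_levels(part, rest) for part in data.split(head)]

# Level 1 is "|" and Level 2 is "^", as in the example above.
tree = parse_levels("a^b|c^d|e", ["|", "^"])
print(tree)  # [['a', 'b'], ['c', 'd'], ['e']]
```

In this sketch every Level 1 part is split again at Level 2, so the leaf field_5 appears as a one-item list; a real Encoder knows from the node structure that field_5 has no children.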
Delimiter lists can be much more complex than this very simple example. For instance, you can create multiple delimiters of different types at any given level, and you can specify a delimiter list on any node within the Encoder, not only on the root node as shown in the example. See Defining a Delimiter List for a step-by-step description of the procedure for creating a delimiter list.
The Delimiter Type property specifies whether the delimiter is a terminator at the end of the byte sequence (normal), a separator between byte sequences in an array (repeat), or an escape sequence (escape).
Table 10 Delimiter Type Options
An escape delimiter is simply a sequence that is recognized and ignored during parsing. Its purpose is to allow the use of escape sequences to embed byte sequences in data that would otherwise be seen as delimiter occurrences.
For example, if there is a normal delimiter “+” at a given level, and we define an escape delimiter “\+” as shown in the following figure, then aaa+b\+c+ddd will parse as three fields: aaa, b\+c, and ddd. If the escape delimiter were not defined, the sequence would then parse as four fields: aaa, b\, c, and ddd.
Figure 9 Delimiter Type - Escape
If a given level has only an escape delimiter, however, delim and array nodes at that level are treated as having no delimiter defined.
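The aaa+b\+c+ddd example can be reproduced with a small Python sketch. It assumes a single normal delimiter and at most one escape delimiter per level; the function name split_with_escape is hypothetical and is not part of the Encoder API.

```python
def split_with_escape(data, delimiter, escape=None):
    """Split data on delimiter, skipping over occurrences of escape.

    The escape sequence is recognized and kept in the field unchanged;
    it only prevents the delimiter inside it from ending the field.
    """
    fields, current, i = [], "", 0
    while i < len(data):
        if escape and data.startswith(escape, i):
            current += escape          # keep the escape sequence as data
            i += len(escape)
        elif data.startswith(delimiter, i):
            fields.append(current)     # the delimiter ends the current field
            current = ""
            i += len(delimiter)
        else:
            current += data[i]
            i += 1
    fields.append(current)
    return fields

# With the escape delimiter defined: three fields.
print(split_with_escape(r"aaa+b\+c+ddd", "+", escape=r"\+"))
# Without it: four fields.
print(split_with_escape(r"aaa+b\+c+ddd", "+"))
```

The escape sequence is tested before the normal delimiter at each position, which is what lets the longer "\+" win over the "+" it contains.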
Precedence indicates the priority of a delimiter relative to the other delimiters. Precedence is used to resolve delimiter conflicts when one delimiter is a copy or a prefix of another. In case of equal precedence, the innermost delimiter prevails.
By default, all delimiters have precedence 10, so they are all treated equally; fixed fields are hard-coded at precedence 10. Delimiters on parent nodes are not considered when parsing child fields; only the child's own delimiter (or, for a fixed field, its length) is considered. Valid precedence values range from 1 to 100, inclusive. The higher the value, the higher the precedence, and the more likely the delimiter is to be matched.
Changing the precedence of a delimiter changes how it is applied to the input data stream. For example:
root
    element (type delim, delimiter = “^”, repeat)
        field_1 (type fixed, length = 5)
        field_2 (type fixed, length = 8, optional)
Although this will parse "abcde12345678^zyxvuABCDEFGH", it will not parse "abcde^zyxvuABCDEFGH", even though the second fixed field is optional. The reason is that the element's delimiter is ignored within the fixed field, because both have the same precedence. If you want the element's delimiter to be examined within the fixed-field data, you must raise its precedence, for example:
root
    element (type delim, delimiter = “^”, repeat, precedence = 11)
        field_1 (type fixed, length = 5)
        field_2 (type fixed, length = 8, optional)
This will successfully parse the text "abcde^zyxvuABCDEFGH".
A similar argument applies to delimited child nodes. The parser normally attempts to match the child delimiter first; setting the parent delimiter's precedence to 11 forces the parser to match the parent delimiter first.
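The fixed-field example can be modeled in Python to show why the precedence value matters. This is a deliberately simplified model and not the Encoder's actual matching algorithm: it assumes that a delimiter is searched for inside fixed-length data only when its precedence exceeds the fixed-field precedence of 10. The names parse_elements and FIXED_PRECEDENCE are hypothetical.

```python
FIXED_PRECEDENCE = 10  # the text says fixed fields are hard-coded at 10

def parse_elements(data, delimiter="^", precedence=10):
    """Parse repeating elements: field_1 (fixed, 5) + optional field_2 (fixed, 8)."""
    elements, pos = [], 0
    while pos < len(data):
        if precedence > FIXED_PRECEDENCE:
            # Delimiter outranks fixed data: the element ends at the next delimiter.
            end = data.find(delimiter, pos)
            if end == -1:
                end = len(data)
        else:
            # Equal precedence: only the fixed lengths decide where the element ends.
            end = min(pos + 5 + 8, len(data))
        chunk = data[pos:end]
        if len(chunk) not in (5, 13):
            raise ValueError(f"bad element length at offset {pos}")
        elements.append((chunk[:5], chunk[5:]))
        pos = end
        if pos < len(data):
            if not data.startswith(delimiter, pos):
                raise ValueError(f"expected delimiter at offset {pos}")
            pos += len(delimiter)
    return elements

print(parse_elements("abcde12345678^zyxvuABCDEFGH"))
try:
    parse_elements("abcde^zyxvuABCDEFGH")  # the "^" is consumed as field_2 data
except ValueError as err:
    print("precedence 10 fails:", err)
print(parse_elements("abcde^zyxvuABCDEFGH", precedence=11))
```

At precedence 10 the second input fails because the eight characters after "abcde" are read as field_2, delimiter included; at precedence 11 the delimiter is found first and field_2 is treated as absent.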
The Optional property specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty.
Table 11 Optional Mode Options
As illustrative examples, consider the tree structures shown in the following figure and table, where the node a has a caret (^) as its delimiter, and the child nodes b, c, and d all have asterisks (*) as their delimiters.
Example 1: Child node c is optional. (Child nodes c and d must have different values for the match parameter.)
Figure 10 Optional Mode Property (Example 1)
Example 2: Child nodes c and d are both optional.
Figure 11 Optional Mode Property (Example 2)
The Terminator property specifies whether the delimiter should appear after the last node of the current level. For example, it determines whether data should look like "a,b,|" or "a,b|", assuming that "," (comma) is the current-level delimiter and "|" (pipe) is the parent-level delimiter.
The delimiter that terminates the last child of the level in question is referred to as the terminator.
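The two output forms can be sketched in Python. The boolean terminate_last below is a hypothetical stand-in for the Terminator mode options, whose actual names and semantics are listed in Table 12; the function name serialize_level is also an assumption.

```python
def serialize_level(children, delimiter, parent_delimiter, terminate_last):
    """Join child values, optionally writing the delimiter after the last one."""
    body = delimiter.join(children)
    if terminate_last:
        body += delimiter  # this trailing delimiter is the terminator
    return body + parent_delimiter

print(serialize_level(["a", "b"], ",", "|", terminate_last=True))   # a,b,|
print(serialize_level(["a", "b"], ",", "|", terminate_last=False))  # a,b|
```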
Table 12 Terminator Mode Options
Consider the tree structure shown in the following figure, where the node a has a caret (^) as its delimiter, and its child nodes b and c have asterisks (*) as their delimiters.
Figure 12 Terminator Mode Property Example
Note - There is essentially no limitation on which characters you can use as delimiters; however, avoid characters that can be confused with data or that interfere with escape sequences, as described in Escape Option. The backslash (\) is normally used as an escape character; for example, the HL7 protocol uses a double backslash as part of an escape sequence that provides special text formatting instructions. Additionally, a colon (:) is used as a literal in system-generated time strings, so using it as a delimiter can interfere with recovery procedures, for example following a Domain shutdown.
Use a backslash (\) to escape special characters. The following table lists the currently supported escape sequences.
Table 13 Escape Sequences
*For octal values, the leading variable d can only be 0 - 3 (inclusive), while the other two can be 0 - 7 (inclusive). The maximum value is \377.
**For hexadecimal values, the variable d can be 0 - 9 (inclusive) and A - F (inclusive, either upper or lower case). The maximum value is \xFF.
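The octal and hexadecimal ranges in these two footnotes can be expressed as a regular expression. The Python sketch below handles only those two numeric forms, not the other sequences in Table 13, and it assumes that a hexadecimal escape always has exactly two digits; the function name decode_numeric_escapes is hypothetical.

```python
import re

# \ddd : first digit 0-3, other two digits 0-7 (maximum \377).
# \xdd : hex digits 0-9, A-F in either case (maximum \xFF).
#        Exactly two hex digits is an assumption of this sketch.
_ESCAPE = re.compile(r"\\([0-3][0-7]{2})|\\x([0-9A-Fa-f]{2})")

def decode_numeric_escapes(text):
    """Replace octal and hexadecimal escapes with the character they denote."""
    def repl(match):
        octal, hexa = match.groups()
        value = int(octal, 8) if octal is not None else int(hexa, 16)
        return chr(value)
    return _ESCAPE.sub(repl, text)

print(repr(decode_numeric_escapes(r"\174")))  # octal 174 is the pipe character
print(repr(decode_numeric_escapes(r"\x7C")))  # hexadecimal 7C is the same character
print(repr(decode_numeric_escapes(r"\477")))  # leading digit above 3: left unchanged
```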
You can specify multiple delimiters at a given level; for example, if you specify |, ~, and ^ as delimiters for a specific level, the parser will accept any of these delimiters:
root
element (delimiters = “|”, “~”, “^”)
field_1 (delimiter = “#”)
field_2 (delimiters = “|”, “~”, “^”)
This will successfully parse the data abc|def, abc~def, and abc^def.
Figure 13 Multiple Delimiter Example
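Accepting any one of several delimiters at a level can be sketched in Python with a regular-expression alternation. The function name split_any is an assumption for this example; sorting longer delimiters first is a precaution in case one delimiter is a prefix of another, and is not a statement about how the Encoder resolves such conflicts.

```python
import re

def split_any(data, delimiters):
    """Split data wherever any one of the given delimiters occurs."""
    ordered = sorted(delimiters, key=len, reverse=True)  # longest first
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, data)

for sample in ("abc|def", "abc~def", "abc^def"):
    print(split_any(sample, ["|", "~", "^"]))  # ['abc', 'def'] each time
```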
Anchored delimiters must be the starting and ending characters of the specified element.
Begin delimiters mark the beginning of a fixed-length field, whereas end delimiters mark the end of a field. Usually, the term “delimiter” by itself refers to an end delimiter. We use the term “end delimiters” for clarification when begin delimiters are also present.
Because a field marked by a begin delimiter is of fixed length, no delimiter is required to mark the end of the field. Use the Begin Delimiter or Begin Delimiter Detached property to specify a begin delimiter.
Constant delimiters remain unchanged at runtime. Embedded delimiters are embedded in the data, and thus are determined dynamically at runtime. Standard embedded delimiters are specified by the Offset and Length delimiter properties, while embedded begin delimiters are specified by the BegOffset and BegLength delimiter properties.