You can define a set of delimiters, called a delimiter list, for any node in the hierarchical data structure. This delimiter list is used in the external data representation for that node and its descendants. A delimiter list defined for any non-root node overrides the effect of any ancestor node’s delimiter list on both the node itself and its descendants.
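The override rule can be pictured as a lookup that walks from a node toward the root and stops at the first delimiter list it finds. The following Python sketch is only an illustration of that rule; the `Node` class and `effective_delim_list` function are hypothetical names and are not part of the Encoder product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a node in the hierarchical data structure."""
    name: str
    parent: Optional["Node"] = None
    delim_list: Optional[List[str]] = None  # None means no list is defined on this node


def effective_delim_list(node: Node) -> Optional[List[str]]:
    """Return the delimiter list of the nearest node (self or ancestor) that defines one."""
    current = node
    while current is not None:
        if current.delim_list is not None:
            return current.delim_list
        current = current.parent
    return None


root = Node("root", delim_list=["|", "^"])
element_1 = Node("element_1", parent=root, delim_list=["#"])  # overrides the root's list
field_1 = Node("field_1", parent=element_1)                   # inherits from element_1
field_5 = Node("field_5", parent=root)                        # inherits from root
```

In this sketch, `field_1` resolves to the list defined on `element_1`, while `field_5` still resolves to the list defined on the root node.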
Delimiters are defined using the Delimiter List Editor, as illustrated in the following figure. The editor is invoked by clicking the delim property value field in the node's property dialog box and clicking the ellipsis (…) button, or by double-clicking the field. See Defining a Delimiter List for additional information.
Clicking within a field in the Delimiter List Editor enables the field for editing. After typing a value into a field, you must press Enter to set the value. Clicking the drop-down menu button in one of the following three fields displays its menu, as illustrated in the following figure.
Type
Optional
Terminator
| Command | Action |
|---|---|
| Add Level | Adds a new level after the selected level. |
| Add Delimiter | Adds a new delimiter after the selected delimiter, or to the bottom of the list under the selected level. |
| Remove | Deletes the selected line item (level or delimiter) from the list. |
| Remove All | Deletes all items (levels and delimiters) from the list. |
| OK | Saves your entries and closes the editor. |
| Cancel | Discards your entries and closes the editor. |
| Property | Description |
|---|---|
| Level | Assigns consecutive sets of delimiter parameters to delimited nodes in the Encoder node hierarchy. See Delimiter Levels for additional information. |
| Type | Specifies how the delimiter is used. See Delimiter Type for additional information. |
| Precedence | Indicates the priority of a certain delimiter, relative to other delimiters. See Precedence for additional information. |
| Optional | Specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty. See Optional for additional information. Note – Does not apply to children of choice element nodes. |
| Terminator | Specifies how delimiters are to be handled for a specific terminator node in the Encoder tree. See Terminator for additional information. |
| Bytes | Specifies the characters (bytes) to use to end the delimited data for the specified level. Delimiters can have begin bytes, end bytes, or both. The term “bytes” (by itself) always indicates end bytes. See Delimiter Characters (Bytes) for additional information. |
| Offset | Offset of the delimited data field in bytes from the beginning of the data stream (byte 0). The value must be a non-negative integer; the default is 0. |
| Length | Length of the data field in bytes, if it is of fixed length. The value must be a positive integer. Entering a value clears the Bytes field. |
| Detached | When checked, indicates that the specified delimiter is a detached, or non-anchored, delimiter, and does not have to appear at a fixed position. |
| BegBytes | Specifies the characters (bytes) to use to begin the delimited data for the specified level. Delimiters can have begin bytes, end bytes, or both. See Delimiter Characters (Bytes) for additional information. |
| BegOffset | Offset of the fixed-length data field in bytes from the beginning of the data stream (byte 0). The value must be a non-negative integer; the default is 0. |
| BegLength | Length of the data field in bytes, if it is of fixed length and has a beginning delimiter. The value must be a positive integer. Entering a value clears the Bytes field. |
| BegDetached | When checked, indicates that the specified delimiter is a detached (non-anchored) beginning delimiter, and does not have to appear at a fixed position. |
| Skip | When checked, skips identical leading delimiters. The delimiters may be defined either as begin bytes or end bytes. The purpose of this flag is to facilitate parsing tabular data. |
| Collapse | When checked, collapses identical, consecutive end delimiters into a single delimiter. As with the Skip flag, the purpose of this flag is to facilitate parsing tabular data. |
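To make the effect of the Skip and Collapse flags on tabular data more concrete, the following Python sketch splits a space-padded row with and without the two behaviors. It is a simplified model based only on the descriptions above, not the Encoder's parser; the function name `split_fields` and its parameters are hypothetical.

```python
import re


def split_fields(data, delim, skip=False, collapse=False):
    """Split data on delim, optionally modeling the Skip and Collapse flags."""
    d = re.escape(delim)
    if skip:
        # Skip: drop identical delimiters that lead the data.
        data = re.sub(f"^(?:{d})+", "", data)
    # Collapse: treat a run of identical consecutive delimiters as one delimiter.
    pattern = f"(?:{d})+" if collapse else d
    return re.split(pattern, data)


row = "  a   b c"
print(split_fields(row, " "))                            # ['', '', 'a', '', '', 'b', 'c']
print(split_fields(row, " ", collapse=True))             # ['', 'a', 'b', 'c']
print(split_fields(row, " ", skip=True, collapse=True))  # ['a', 'b', 'c']
```

Without either flag, every padding space produces an empty field; with both flags set, the padded columns come out as three clean fields.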
Delimiter levels are assigned in order to those hierarchical levels of an Encoder that contain at least one node that is specified as being delimited. If none of the nodes at a particular hierarchical level is delimited, that hierarchical level is skipped in assigning delimiter levels.
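The assignment rule can be expressed as a short loop over the hierarchical levels, from the root downward. The following Python sketch is an illustration only; the input representation (a list of hierarchical levels, each a list of name and delimited-flag pairs) and the node names are assumptions made for the example.

```python
def assign_delimiter_levels(hierarchy):
    """Map each delimited node to its delimiter level.

    hierarchy: list of hierarchical levels, root first; each level is a list
    of (node_name, is_delimited) tuples.
    """
    assignment = {}
    next_level = 1
    for nodes in hierarchy:
        delimited = [name for name, is_delimited in nodes if is_delimited]
        if not delimited:
            continue  # no delimited node here, so this hierarchical level is skipped
        for name in delimited:
            assignment[name] = next_level
        next_level += 1
    return assignment


# Hypothetical tree: the root and the "wrapper" level are not delimited.
hierarchy = [
    [("root", False)],
    [("header", True), ("body", True)],
    [("wrapper", False)],
    [("item", True)],
]
print(assign_delimiter_levels(hierarchy))  # {'header': 1, 'body': 1, 'item': 2}
```

Because neither the root nor the `wrapper` level contains a delimited node, both are skipped and `item` receives Level 2 rather than Level 4.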
Delimiter lists are typically specified on the root node, so that the list applies to the entire Encoder. The root node itself is typically not delimited, so that Level 1 would apply to those nodes that are children of the root node. See the following figure and example.
For example, if you want to parse the following data:
a^b|c^d|e
you might create a Custom Encoder as follows:
In this example, the delimiter list is specified on the root node, which is not delimited; therefore, the list has two levels:
The Level 1 delimiter (|) applies to element_1, element_2, and field_5. The Level 2 delimiter (^) applies to field_1 - field_4.
If the root node is set to be delimited, the Level 1 delimiters will then apply to it. Using the above example, the Level 2 delimiter (^) would then apply to element_1, element_2, and field_5, and a new Level 3 delimiter would apply to field_1 - field_4.
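As a rough illustration of the two-level case in which the root node is not delimited, the following Python sketch splits the sample data first on the Level 1 delimiter and then on the Level 2 delimiter. It ignores delimiter type, optional, and terminator handling, and the function name is hypothetical.

```python
def parse_two_levels(data, level1="|", level2="^"):
    """Split on the Level 1 delimiter, then split each piece on the Level 2 delimiter."""
    return [chunk.split(level2) for chunk in data.split(level1)]


print(parse_two_levels("a^b|c^d|e"))
# [['a', 'b'], ['c', 'd'], ['e']]
```

The three outer items correspond to element_1, element_2, and field_5; the inner items of the first two correspond to field_1 through field_4.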
Delimiter lists can be much more complex than this very simple example. For instance, you can create multiple delimiters of different types at any given level, and you can specify a delimiter list on any node within the Encoder, not only on the root node as shown in the example. See Defining a Delimiter List for a step-by-step description of the procedure for creating a Delimiter List.
The Delimiter Type property specifies whether the delimiter is a terminator at the end of the byte sequence (normal), a separator between byte sequences in an array (repeat) or an escape sequence.
Table 10 Delimiter Type Options
| Option | Description |
|---|---|
| normal | Indicates the delimiter is a normal delimiter. |
| repeat | Indicates the delimiter is a delimiter that delimits repetitive fields (nodes). If a node is defined to be repetitive, then a repeat delimiter can be used to delimit the repetitive occurrences, while a normal delimiter terminates the repetition. For example, a^b^c1~c2~c3~c4~c5^ where '~' is a delimiter that delimits repetitive nodes and '^' is a normal delimiter that terminates repetitive nodes. |
| escape | Indicates the delimiter is an escape delimiter. The purpose of an escape delimiter is to escape special bytes, such as delimiters, using predefined escape sequences. Once the bytes of the escape delimiter are matched, no action is taken except that the search is continued at the position immediately following the delimiter bytes. |
| quot-esc | The quot-esc delimiter is used to escape special bytes using quotation-style escaping, that is, whatever appears within the (double) quotes is escaped. For example, assume that ',' (comma) is a normal delimiter. To escape ',' in the data, either we can use an escape sequence such as <data>\,<data> or we can use quotation marks such as "<data>,<data>". The bytes defined in the quot-esc delimiter represent the quotation marks. |
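The repeat and quot-esc options can be sketched in a few lines of Python. Both functions below are simplified models with hypothetical names, and they assume single-character delimiters. Leaving the quotation marks in the parsed field is also an assumption; the table above does not say whether the Encoder strips them.

```python
def parse_repeat(data, normal="^", repeat="~"):
    """Split fields on the normal delimiter, then split repeating fields on the repeat delimiter."""
    parts = data.split(normal)
    if parts and parts[-1] == "":
        parts.pop()  # the trailing normal delimiter terminated the last field
    return [part.split(repeat) if repeat in part else part for part in parts]


def split_quot_esc(data, delim=",", quote='"'):
    """Split on delim, except where delim appears between a pair of quote characters."""
    fields, current, in_quotes = [], [], False
    for ch in data:
        if ch == quote:
            in_quotes = not in_quotes
            current.append(ch)  # assumption: quotation marks stay in the field
        elif ch == delim and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(parse_repeat("a^b^c1~c2~c3~c4~c5^"))
# ['a', 'b', ['c1', 'c2', 'c3', 'c4', 'c5']]
print(split_quot_esc('x,"a,b",y'))
# ['x', '"a,b"', 'y']
```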
An escape delimiter is simply a sequence that is recognized and ignored during parsing. Its purpose is to allow the use of escape sequences to embed byte sequences in data that would otherwise be seen as delimiter occurrences.
For example, if there is a normal delimiter “+” at a given level, and we define an escape delimiter “\+” as shown in the following figure, then aaa+b\+c+ddd will parse as three fields: aaa, b\+c, and ddd. If the escape delimiter were not defined, the sequence would then parse as four fields: aaa, b\, c, and ddd.
If a given level contains only an escape delimiter, however, delim and array nodes at that level are treated as having no delimiter defined.
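The behavior described for the “+” and “\+” example can be reproduced with a small scanner that checks for the escape delimiter before the normal delimiter at each position. This Python sketch is a simplified model, not the Encoder's matching algorithm, and `split_with_escape` is a hypothetical name.

```python
def split_with_escape(data, delim, escape=None):
    """Split data on delim, ignoring any occurrence of the escape sequence."""
    fields, start, i = [], 0, 0
    while i < len(data):
        if escape and data.startswith(escape, i):
            # Escape delimiter matched: take no action, continue right after its bytes.
            i += len(escape)
        elif data.startswith(delim, i):
            fields.append(data[start:i])
            i += len(delim)
            start = i
        else:
            i += 1
    fields.append(data[start:])
    return fields


sample = "aaa+b\\+c+ddd"  # the text aaa+b\+c+ddd
print(split_with_escape(sample, "+", escape="\\+"))  # ['aaa', 'b\\+c', 'ddd']
print(split_with_escape(sample, "+"))                # ['aaa', 'b\\', 'c', 'ddd']
```

Note that the escape sequence remains part of the field value (b\+c), matching the example above.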
Precedence indicates the priority of a certain delimiter, relative to the other delimiters. Precedence is used to resolve delimiter conflicts when one delimiter is a copy or prefix of another. In case of equal precedence, the innermost prevails.
By default, all delimiters have precedence 10, which means they are all treated equally; fixed fields are hard-coded at precedence 10. With these defaults, delimiters on parent nodes are not considered when parsing the child fields; only the child’s delimiter (or, for a fixed field, its length) is used. Valid precedence values range from 1 to 100, inclusive. The higher the value, the higher the precedence, and delimiters with higher precedence are more likely to be matched.
Changing the precedence of a delimiter will cause it to be applied to the input data-stream in different ways. For example:
root
  element (type delim, delimiter = “^”, repeat)
    field_1 (type fixed, length = 5)
    field_2 (type fixed, length = 8, optional)
Although this will parse “abcde12345678^zyxvuABCDEFGH”, it will not parse the text “abcde^zyxvuABCDEFGH” even though the second fixed field is optional. The reason is that the element’s delimiter is ignored within the fixed field because they have the same precedence. If you want the element’s delimiter to be examined within the fixed field data, you must change its precedence, for example:
root
  element (type delim, delimiter = “^”, repeat, precedence = 11)
    field_1 (type fixed, length = 5)
    field_2 (type fixed, length = 8, optional)
This will successfully parse the text “abcde^zyxvuABCDEFGH”.
A similar argument can be applied to delimited child nodes. The parser normally attempts to match the child delimiter; setting the precedence of the parent delimiter to 11 forces the parser to match the parent delimiter first.
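The fixed-field example can be mimicked with a toy parser that checks for the element delimiter before reading the optional fixed field only when the delimiter's precedence exceeds the fixed-field precedence of 10. This Python sketch is an assumption-laden model of the behavior described above, not the Encoder's actual matching logic; `parse_elements` and its parameters are hypothetical.

```python
FIXED_PRECEDENCE = 10  # fixed fields are hard-coded at precedence 10


def parse_elements(data, delim="^", delim_precedence=10):
    """Parse repeating elements of field_1 (5 bytes) and optional field_2 (8 bytes)."""
    elements, pos = [], 0
    delim_wins = delim_precedence > FIXED_PRECEDENCE
    while pos < len(data):
        field_1 = data[pos:pos + 5]
        if len(field_1) < 5:
            raise ValueError("field_1 is truncated")
        pos += 5
        field_2 = None
        # The delimiter is only examined inside fixed-field data if it outranks the field.
        at_boundary = pos == len(data) or (delim_wins and data.startswith(delim, pos))
        if not at_boundary:
            field_2 = data[pos:pos + 8]
            if len(field_2) < 8:
                raise ValueError("field_2 is truncated")
            pos += 8
        elements.append((field_1, field_2))
        if pos < len(data):
            if not data.startswith(delim, pos):
                raise ValueError(f"expected {delim!r} at offset {pos}")
            pos += len(delim)
    return elements


print(parse_elements("abcde12345678^zyxvuABCDEFGH"))
# [('abcde', '12345678'), ('zyxvu', 'ABCDEFGH')]
print(parse_elements("abcde^zyxvuABCDEFGH", delim_precedence=11))
# [('abcde', None), ('zyxvu', 'ABCDEFGH')]
```

With the default precedence, the second input raises `ValueError` in this model, because the eight bytes following `abcde` are consumed as field_2 before the delimiter is ever considered.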
The Optional property specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty.
Table 11 Optional Mode Options
| Option | Rule |
|---|---|
| never | If the node is absent, the delimiter is not allowed in either input or output. |
| allow | If the node is absent, the delimiter is allowed in input but will not be emitted in output. |
| cheer | If the node is absent, the delimiter is allowed in input and will also be emitted in output. |
| force | If the node is absent, the delimiter must appear in input and will be emitted in output. Note – Only this option allows trailing delimiters for a sequence of absent optional nodes. |
As illustrative examples, consider the tree structures shown in the following figure and table, where the node a has a caret (^) as its delimiter, and the child nodes b, c, and d all have asterisks (*) as their delimiters.
Example 1: Child node c is optional. (Child nodes c and d must have different values for the match parameter.)
| Option | Input | Output |
|---|---|---|
| never | b*d^ | b*d^ |
| allow | b**d^ | b*d^ |
| cheer | b**d^ | b**d^ |
| force | b**d^ | b**d^ |
Example 2: Child nodes c and d are both optional.
| Option | Input | Output |
|---|---|---|
| never | b^ | b^ |
| allow | b^, b*^, or b**^ | b^ |
| cheer | b^, b*^, or b**^ | b**^ |
| force | b**^ | b**^ |
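The Output columns of both examples follow a simple pattern: with never and allow, absent nodes contribute nothing; with cheer and force, each absent node still contributes its delimiter. The following Python sketch reproduces only that output behavior. The function name `emit` and the use of `None` to represent an absent node are assumptions made for the illustration, and input validation is not modeled.

```python
def emit(values, optional_mode, delim="*", parent_delim="^"):
    """Serialize child values (None = absent optional node) under an Optional mode."""
    if optional_mode in ("never", "allow"):
        # Absent nodes are dropped together with their delimiters.
        fields = [v for v in values if v is not None]
    elif optional_mode in ("cheer", "force"):
        # Absent nodes become empty fields, so their delimiters are still emitted.
        fields = ["" if v is None else v for v in values]
    else:
        raise ValueError(f"unknown optional mode: {optional_mode}")
    return delim.join(fields) + parent_delim


# Example 1: child node c is absent.
print(emit(["b", None, "d"], "allow"))  # b*d^
print(emit(["b", None, "d"], "cheer"))  # b**d^
# Example 2: child nodes c and d are both absent.
print(emit(["b", None, None], "never"))  # b^
print(emit(["b", None, None], "force"))  # b**^
```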
The Terminator property specifies whether or not the delimiter should appear for the last node of the current level. For example, it determines whether data should look like "a,b,|" or "a,b|", assuming "," (comma) is the current level delimiter and "|" (pipe) is the parent level delimiter.
The delimiter that terminates the last child of the level in question is referred to as the terminator.
Table 12 Terminator Mode Options
| Option | Rule |
|---|---|
| never | Specifies that the delimiter is not allowed to be a terminator in input and will not be emitted as a terminator in output. |
| allow | Specifies that the delimiter is allowed to be a terminator in input but will not be emitted as a terminator in output. |
| cheer | Specifies that the delimiter is allowed to be a terminator in input and will be emitted as a terminator in output. |
| force | Specifies that the delimiter must appear as a terminator in input and will also be emitted as a terminator in output. |
Consider the tree structure shown in the following figure, where the node a has a caret (^) as its delimiter, and its child nodes b and c have asterisks (*) as their delimiters.
| Option | Input | Output |
|---|---|---|
| never | c^ | c^ |
| allow | c^ or c*^ | c^ |
| cheer | c^ or c*^ | c*^ |
| force | c*^ | c*^ |
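The four terminator modes can be summarized as a small rule table: whether input without the terminator is accepted, whether input with it is accepted, and whether output emits it. The following Python sketch encodes the example above for the last child only; the names `TERMINATOR_RULES`, `accepts`, and `emit_last` are hypothetical.

```python
# mode: (input accepted without terminator, input accepted with terminator, terminator emitted)
TERMINATOR_RULES = {
    "never": (True, False, False),
    "allow": (True, True, False),
    "cheer": (True, True, True),
    "force": (False, True, True),
}


def accepts(text, mode, last_value="c", delim="*", parent_delim="^"):
    """Return True if text is valid input for the last child under the given mode."""
    without_ok, with_ok, _ = TERMINATOR_RULES[mode]
    if text == last_value + parent_delim:
        return without_ok
    if text == last_value + delim + parent_delim:
        return with_ok
    return False


def emit_last(last_value, mode, delim="*", parent_delim="^"):
    """Serialize the last child, adding the terminator only if the mode emits it."""
    emits_terminator = TERMINATOR_RULES[mode][2]
    return last_value + (delim if emits_terminator else "") + parent_delim


print(accepts("c*^", "never"))  # False
print(emit_last("c", "allow"))  # c^
print(emit_last("c", "cheer"))  # c*^
```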
There is essentially no limitation on the characters you can use as delimiters; however, avoid characters that can be confused with data or that interfere with escape sequences, as described in Escape Option. The backslash (\) is normally used as an escape character; for example, the HL7 protocol uses a double backslash as part of an escape sequence that provides special text formatting instructions. Additionally, a colon (:) is used as a literal in system-generated time strings, so using it as a delimiter can interfere with recovery procedures, for example following a Domain shutdown.
Use a backslash (\) to escape special characters. The following table lists the currently supported escape sequences.
Table 13 Escape Sequences
| Sequence | Description |
|---|---|
| \\ | Backslash |
| \b | Backspace |
| \f | Form feed |
| \n | Newline |
| \r | Carriage return |
| \t | Tab |
| \ddd | Octal number* |
| \xdd | Hexadecimal number** |
*For octal values, the leading variable d can only be 0 - 3 (inclusive), while the other two can be 0 - 7 (inclusive). The maximum value is \377.
**For hexadecimal values, the variable d can be 0 - 9 (inclusive) and A - F (inclusive, either upper or lower case). The maximum value is \xFF.
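To show how these escape sequences map to bytes, the following Python sketch decodes a delimiter string as it might be typed into the Bytes field. It is an illustration rather than the Encoder's implementation: `decode_delimiter` is a hypothetical name, `\f` is assumed to carry its conventional C meaning (byte 0x0C), unrecognized sequences are left untouched, and plain characters are assumed to fit in a single byte.

```python
import re

SIMPLE_ESCAPES = {"\\": 0x5C, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09}

# \xdd (two hex digits), \ddd (octal, first digit 0-3), or a single-letter escape.
ESCAPE_PATTERN = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|([0-3][0-7]{2})|([\\bfnrt]))")


def decode_delimiter(text: str) -> bytes:
    """Convert a delimiter string containing escape sequences into raw bytes."""
    out = bytearray()
    pos = 0
    for match in ESCAPE_PATTERN.finditer(text):
        out += text[pos:match.start()].encode("latin-1")  # literal characters
        hex_digits, octal_digits, simple = match.groups()
        if hex_digits is not None:
            out.append(int(hex_digits, 16))
        elif octal_digits is not None:
            out.append(int(octal_digits, 8))
        else:
            out.append(SIMPLE_ESCAPES[simple])
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


print(decode_delimiter(r"\r\n"))  # b'\r\n'
print(decode_delimiter(r"\x1C"))  # b'\x1c'
print(decode_delimiter(r"\377"))  # b'\xff'
```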
You can specify multiple delimiters at a given level; for example, if you specify |, ~, and ^ as delimiters for a specific level, the parser will accept any of these delimiters:
root
  element (delimiters = “|”, “~”, “^”)
    field_1 (delimiter = “#”)
    field_2 (delimiters = “|”, “~”, “^”)
This will successfully parse the data abc|def, abc~def, and abc^def.
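Accepting any one of several delimiters at a level amounts to splitting on an alternation of the delimiter strings. The following Python sketch illustrates the idea with a hypothetical `split_any` function; it does not model the rest of the tree shown above.

```python
import re


def split_any(data, delimiters):
    """Split data on whichever of the given delimiters occurs."""
    # Longest first, so a delimiter that is a prefix of another does not shadow it.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, data)


for sample in ("abc|def", "abc~def", "abc^def"):
    print(split_any(sample, ["|", "~", "^"]))  # ['abc', 'def'] each time
```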
Anchored delimiters must be the starting and ending characters of the specified element.
Begin delimiters mark the beginning of a fixed-length field, whereas end delimiters mark the end of a field. Usually, the term “delimiter” by itself refers to an end delimiter. We use the term “end delimiters” for clarification when begin delimiters are also present.
Because a field introduced by a begin delimiter is of fixed length, no delimiter is required to mark the end of the field. Use the Begin Delimiter or Begin Delimiter Detached property to specify the begin delimiter.
Constant delimiters remain unchanged at runtime. Embedded delimiters are embedded in the data, and thus are determined dynamically at runtime. Standard embedded delimiters are specified by the Offset and Length delimiter properties, while embedded begin delimiters are specified by the BegOffset and BegLength delimiter properties.
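An embedded delimiter can be thought of as a slice of the incoming data stream, located by Offset and Length, whose bytes then serve as the delimiter. The following Python sketch illustrates that reading of the two properties; it is an interpretation, not the Encoder's implementation, and the function name and sample message are hypothetical. The sample imitates an HL7 header, in which the field separator is the fourth character of the message.

```python
def read_embedded_delimiter(stream: bytes, offset: int = 0, length: int = 1) -> bytes:
    """Read delimiter bytes from the data stream itself (byte 0 is the first byte)."""
    if offset < 0:
        raise ValueError("offset must be a non-negative integer")
    if length <= 0:
        raise ValueError("length must be a positive integer")
    delimiter = stream[offset:offset + length]
    if len(delimiter) != length:
        raise ValueError("data stream is too short to contain the delimiter")
    return delimiter


message = b"MSH|^~\\&|SENDER|RECEIVER"  # illustrative HL7-style header
field_separator = read_embedded_delimiter(message, offset=3, length=1)
print(field_separator)                 # b'|'
print(message.split(field_separator))  # [b'MSH', b'^~\\&', b'SENDER', b'RECEIVER']
```

Because the delimiter is read from each message, two messages that use different separators can be handled by the same definition.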