Designing Custom Encoders

Specifying Delimiters

Delimiter List

You can define a set of delimiters (a delimiter list) for any node in the hierarchical data structure. This delimiter list is used in the external data representation for that node and its descendants. A delimiter list defined for any non-root node overrides the effect of any ancestor node’s delimiter list on both the node itself and its descendants.

Delimiters are defined using the Delimiter List Editor, as illustrated in the following figure. The editor is invoked by clicking the delim property value field in the node's property dialog box and clicking the ellipsis (…) button, or by double-clicking the field. See Defining a Delimiter List for additional information.

Clicking within a field in the Delimiter List Editor enables the field for editing. After typing a value into a field, you must press Enter to set the value. Clicking the drop-down menu button in one of the following three fields displays its menu, as illustrated in the following figure.

Type
Optional
Terminator

Figure 6 Delimiter List Editor: Left Side


Figure 7 Delimiter List Editor: Right Side


Table 8 Delimiter List Editor Command Buttons


Command	Action
Add Level	Adds a new level after the selected level.
Add Delimiter	Adds a new delimiter after the selected delimiter, or to the bottom of the list under the selected level.
Remove	Deletes the selected line item (level or delimiter) from the list.
Remove All	Deletes all items (levels and delimiters) from the list.
OK	Saves your entries and closes the editor.
Cancel	Discards your entries and closes the editor.

Delimiter Properties

Table 9 Delimiter Properties


Property	Description
Level	Assigns consecutive sets of delimiter parameters to delimited nodes in the Encoder node hierarchy. See Delimiter Levels for additional information.
Type	Specifies how the delimiter is used. See Delimiter Type for additional information.
Precedence	Indicates the priority of a certain delimiter, relative to other delimiters. See Precedence for additional information.
Optional	Specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty. See Optional for additional information. Note – Does not apply to children of choice element nodes.
Terminator	Specifies how delimiters are to be handled for a specific terminator node in the Encoder tree. See Terminator for additional information.
Bytes	Specifies the characters (bytes) to use to end the delimited data for the specified level. Delimiters can have begin bytes, end bytes, or both. The term “bytes” (by itself) always indicates end bytes. See Delimiter Characters (Bytes) for additional information.
Offset	Offset of the delimited data field in bytes from the beginning of the data stream (byte 0). Value must be a non-negative integer; the default is `0`.
Length	Length of the data field in bytes, if it is of fixed length. Value must be a positive integer. Entering a value clears the `Bytes` field.
Detached	When checked, indicates that the specified delimiter is a detached, or non-anchored, delimiter, and does not have to appear at a fixed position.
BegBytes	Specifies the characters (bytes) to use to begin the delimited data for the specified level. Delimiters can have begin bytes, end bytes, or both. See Delimiter Characters (Bytes) for additional information.
BegOffset	Offset of the fixed-length data field in bytes from the beginning of the data stream (byte 0). Value must be a non-negative integer; the default is `0`.
BegLength	Length of the data field in bytes, if it is of fixed length and has a beginning delimiter. Value must be a positive integer. Entering a value clears the `Bytes` field.
BegDetached	When checked, indicates that the specified delimiter is a detached (non-anchored) beginning delimiter, and does not have to appear at a fixed position.
Skip	When checked, skips identical leading delimiters. The delimiters may be defined either as begin bytes or end bytes. The purpose of this flag is to facilitate parsing tabular data.
Collapse	When checked, collapses identical, consecutive end delimiters into a single delimiter. As with the `Skip` flag, the purpose of this flag is to facilitate parsing tabular data.
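
To make the Skip and Collapse flags more concrete, the following minimal Python sketch splits one row of space-delimited tabular data. The function `split_row` and its parameters are hypothetical illustrations of the flag semantics described in the table, not the Encoder's actual parser, and the sketch assumes the delimiter acts as a separator between fields.

```python
from __future__ import annotations


def split_row(data: str, delim: str = " ", skip: bool = False,
              collapse: bool = False) -> list[str]:
    """Split one row of tabular data on `delim` (hypothetical helper).

    skip:     drop identical delimiters at the start of the row.
    collapse: treat a run of identical, consecutive delimiters as one.
    """
    if not delim:
        raise ValueError("delimiter must be non-empty")
    if skip:
        while data.startswith(delim):
            data = data[len(delim):]
    fields: list[str] = []
    start = i = 0
    while i < len(data):
        if data.startswith(delim, i):
            fields.append(data[start:i])
            i += len(delim)
            if collapse:
                # Swallow any further copies of the same delimiter.
                while data.startswith(delim, i):
                    i += len(delim)
            start = i
        else:
            i += 1
    fields.append(data[start:])
    return fields


row = "  a   b c"
print(split_row(row))                            # ['', '', 'a', '', '', 'b', 'c']
print(split_row(row, skip=True, collapse=True))  # ['a', 'b', 'c']
```

Without the flags, every extra space produces an empty field; with both flags set, the row yields only the three data values.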

Delimiter Levels

Delimiter levels are assigned in order to those hierarchical levels of an Encoder that contain at least one node that is specified as being delimited. If none of the nodes at a particular hierarchical level is delimited, that hierarchical level is skipped in assigning delimiter levels.

Delimiter lists are typically specified on the root node so that the list applies to the entire Encoder. The root node itself is typically not delimited, so Level 1 applies to the children of the root node. See the following figure and example.

Figure 8 Encoder Hierarchical and Delimiter Levels


For example, if you want to parse the following data:

a^b|c^d|e

you might create a Custom Encoder as follows:

root
- element_1
  - field_1
  - field_2
- element_2
  - field_3
  - field_4
- field_5

In this example, the delimiter list is specified on the root node, which is not delimited; therefore, the list has two levels:

Level 1
- Delimiter |
Level 2
- Delimiter ^

The Level 1 delimiter (|) applies to element_1, element_2, and field_5. The Level 2 delimiter (^) applies to field_1 through field_4.

If the root node is set to be delimited, the Level 1 delimiters apply to the root node itself. In the above example, the Level 2 delimiter (^) would then apply to element_1, element_2, and field_5, and a new Level 3 delimiter would apply to field_1 through field_4.

Delimiter lists can be much more complex than this very simple example. For instance, you can create multiple delimiters of different types at any given level, and you can specify a delimiter list on any node within the Encoder, not only on the root node as shown in the example. See Defining a Delimiter List for a step-by-step description of the procedure for creating a delimiter list.
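
The level assignment in the a^b|c^d|e example can be sketched in Python as a recursive split: the undelimited root's children consume the Level 1 delimiter, and their children consume Level 2. The function `parse_level` and the tuple-based tree description are hypothetical, and the sketch treats each delimiter as a separator because the example data has no trailing delimiters.

```python
from __future__ import annotations


def parse_level(data: str, children: list, levels: list[str]) -> dict:
    """Split `data` on this level's delimiter and map the pieces onto `children`.

    Each child is either a leaf field name (str) or a (name, sub_children) tuple.
    """
    if not levels:
        raise ValueError("delimiter list has fewer levels than the tree")
    delim, deeper = levels[0], levels[1:]
    pieces = data.split(delim)  # separator semantics: no trailing delimiter
    if len(pieces) != len(children):
        raise ValueError(f"expected {len(children)} fields at {delim!r}, got {len(pieces)}")
    result = {}
    for child, piece in zip(children, pieces):
        if isinstance(child, str):
            result[child] = piece  # leaf field
        else:
            name, sub_children = child
            result[name] = parse_level(piece, sub_children, deeper)  # next level down
    return result


ROOT = [("element_1", ["field_1", "field_2"]),
        ("element_2", ["field_3", "field_4"]),
        "field_5"]
print(parse_level("a^b|c^d|e", ROOT, ["|", "^"]))
# {'element_1': {'field_1': 'a', 'field_2': 'b'}, 'element_2': {'field_3': 'c', 'field_4': 'd'}, 'field_5': 'e'}
```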

Delimiter Type

The Delimiter Type property specifies whether the delimiter is a terminator at the end of a byte sequence (normal), a separator between byte sequences in an array (repeat), an escape sequence (escape), or a quotation-style escape (quot-esc).

Table 10 Delimiter Type Options


Option	Description
normal	Indicates the delimiter is a normal delimiter.
repeat	Indicates a delimiter that delimits repetitive fields (nodes). If a node is defined to be repetitive, a repeat delimiter can be used to separate the repeated occurrences, while a normal delimiter terminates the repetition. For example, in `a^b^c1~c2~c3~c4~c5^`, '~' is a repeat delimiter that separates the occurrences of a repetitive node and '^' is a normal delimiter that terminates the repetition.
escape	Indicates an escape delimiter. The purpose of an escape delimiter is to escape special bytes, such as delimiters, using predefined escape sequences. Once the bytes of the escape delimiter are matched, no action is taken except that the search continues at the position immediately following the delimiter bytes.
quot-esc	Indicates a delimiter that escapes special bytes using quotation-style escaping; that is, whatever appears within the (double) quotation marks is escaped. For example, assume that ',' (comma) is a normal delimiter. To escape ',' in the data, you can use either an escape sequence such as `<data>\,<data>` or quotation marks such as `"<data>,<data>"`. The bytes defined in the quot-esc delimiter represent the quotation marks.
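
The repeat and quot-esc rows can be illustrated with two small Python sketches. Both function names are hypothetical; `split_with_repeat` takes the positions of the repetitive fields as an explicit argument (the Encoder would know this from the node definitions), and `split_quoted` assumes single-character delimiter and quote bytes and keeps the quotation marks in the field value, which the text above does not specify either way.

```python
from __future__ import annotations


def split_with_repeat(data: str, repetitive: set[int], normal: str = "^",
                      repeat: str = "~") -> list:
    """The normal delimiter terminates each field; the repeat delimiter
    separates the occurrences of a repetitive field."""
    if not data.endswith(normal):
        raise ValueError(f"expected the data to end with {normal!r}")
    fields = data[:-len(normal)].split(normal)
    return [f.split(repeat) if i in repetitive else f for i, f in enumerate(fields)]


def split_quoted(data: str, delim: str = ",", quote: str = '"') -> list[str]:
    """Split on `delim`, except where it appears between quotation marks."""
    fields, start, in_quotes = [], 0, False
    for i, ch in enumerate(data):
        if ch == quote:
            in_quotes = not in_quotes  # quotes are kept in the field value
        elif ch == delim and not in_quotes:
            fields.append(data[start:i])
            start = i + 1
    fields.append(data[start:])
    return fields


print(split_with_repeat("a^b^c1~c2~c3~c4~c5^", repetitive={2}))
# ['a', 'b', ['c1', 'c2', 'c3', 'c4', 'c5']]
print(split_quoted('x,"y,z",w'))
# ['x', '"y,z"', 'w']
```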

Escape Option

An escape delimiter is simply a sequence that is recognized and ignored during parsing. Its purpose is to allow the use of escape sequences to embed byte sequences in data that would otherwise be seen as delimiter occurrences.

For example, if there is a normal delimiter “+” at a given level, and we define an escape delimiter “\+” as shown in the following figure, then aaa+b\+c+ddd will parse as three fields: aaa, b\+c, and ddd. If the escape delimiter were not defined, the sequence would then parse as four fields: aaa, b\, c, and ddd.

Figure 9 Delimiter Type - Escape


If an escape delimiter is the only delimiter defined at a given level, however, delim and array nodes at that level are treated as if no delimiter were defined.
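
The aaa+b\+c+ddd example can be reproduced with a minimal Python sketch of the escape rule: when the escape bytes match, nothing happens except that scanning resumes just after them. The function `split_with_escape` is hypothetical, and it checks for the escape sequence before the normal delimiter so that an escape beginning with the delimiter bytes would still take effect.

```python
from __future__ import annotations


def split_with_escape(data: str, delim: str, escape: str | None = None) -> list[str]:
    """Split on `delim`; a matched `escape` sequence is skipped over unchanged."""
    fields, start, i = [], 0, 0
    while i < len(data):
        if escape and data.startswith(escape, i):
            i += len(escape)  # no action: resume after the escape bytes
        elif data.startswith(delim, i):
            fields.append(data[start:i])
            i += len(delim)
            start = i
        else:
            i += 1
    if start < len(data):  # sketch choice: a trailing delimiter adds no empty field
        fields.append(data[start:])
    return fields


print(split_with_escape(r"aaa+b\+c+ddd", "+", escape=r"\+"))  # ['aaa', 'b\\+c', 'ddd']
print(split_with_escape(r"aaa+b\+c+ddd", "+"))                # ['aaa', 'b\\', 'c', 'ddd']
```

Python's list display doubles the backslash, so `'b\\+c'` is the three-character field b\+c.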

Precedence

Precedence indicates the priority of a certain delimiter, relative to the other delimiters. Precedence is used to resolve delimiter conflicts when one delimiter is a copy or prefix of another. In case of equal precedence, the innermost delimiter prevails.

By default, all delimiters are at precedence 10, which means they are all considered the same; fixed fields are hard-coded at precedence 10. Delimiters on parent nodes are not considered when parsing the child fields; only the child’s delimiter (or, for a fixed field, its length) is used. The range of valid precedence values is 1 to 100, inclusive. The higher the value, the higher the precedence. Delimiters with higher precedence have a greater chance to be matched.
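
The tie-breaking rule can be sketched in Python as choosing, among the delimiters that match at a position, the one with the highest precedence and, on a tie, the innermost. The `Delim` class, its `depth` field, and `resolve` are hypothetical names; the sketch models only this conflict rule, not the rest of the parser (for example, it ignores that parent delimiters are normally not examined inside child fields).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Delim:
    text: str        # delimiter bytes
    precedence: int  # 1..100; 10 is the default
    depth: int       # nesting depth of the owning node (higher = more inner)

    def __post_init__(self) -> None:
        if not 1 <= self.precedence <= 100:
            raise ValueError("precedence must be between 1 and 100")


def resolve(data: str, pos: int, candidates: list[Delim]) -> Delim | None:
    """Return the delimiter that wins at `pos`, or None if none matches there."""
    matches = [d for d in candidates if data.startswith(d.text, pos)]
    if not matches:
        return None
    # Highest precedence wins; on a tie, the innermost (deepest) delimiter prevails.
    return max(matches, key=lambda d: (d.precedence, d.depth))


parent, child = Delim("|", 10, depth=1), Delim("|", 10, depth=2)
print(resolve("ab|cd", 2, [parent, child]).depth)                   # 2
print(resolve("ab|cd", 2, [Delim("|", 11, depth=1), child]).depth)  # 1
```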

Changing the precedence of a delimiter changes how it is applied to the input data stream. For example:

root
- element (type delim, delimiter = “^”, repeat)
  - field_1 (type fixed, length = 5)
  - field_2 (type fixed, length = 8, optional)

Although this parses “abcde12345678^zyxvuABCDEFGH”, it does not parse the text “abcde^zyxvuABCDEFGH”, even though the second fixed field is optional. The reason is that the element’s delimiter is ignored within the fixed field data, because both have the same precedence. If you want the element’s delimiter to be examined within the fixed field data, you must raise its precedence, for example:

root
- element (type delim, delimiter = “^”, repeat, precedence = 11)
  - field_1 (type fixed, length = 5)
  - field_2 (type fixed, length = 8, optional)

This successfully parses the text “abcde^zyxvuABCDEFGH”.

A similar argument applies to delimited child nodes. The parser normally attempts to match the child delimiter; setting the parent delimiter’s precedence to 11 forces the parser to match the parent delimiter first.
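
The fixed-field example can be approximated in Python to show why raising the element delimiter above precedence 10 changes the outcome. `parse_element`, `FIELDS`, and `CHILD_PRECEDENCE` are hypothetical, and the rule that a higher-precedence delimiter at the start of an optional fixed field marks that field as absent is a simplifying assumption of this sketch.

```python
from __future__ import annotations

FIELDS = [(5, False), (8, True)]  # (length, optional) for field_1 and field_2
CHILD_PRECEDENCE = 10             # fixed fields are hard-coded at precedence 10


def parse_element(data: str, precedence: int, delim: str = "^") -> list[list]:
    """Parse repeated occurrences of `element`; None marks an absent optional field."""
    occurrences, pos = [], 0
    while pos < len(data):
        values = []
        for length, optional in FIELDS:
            chunk = data[pos:pos + length]
            # Only a higher-precedence parent delimiter is examined inside fixed data.
            if precedence > CHILD_PRECEDENCE and delim in chunk:
                if optional and chunk.startswith(delim):
                    values.append(None)  # field absent: the delimiter comes first
                    continue
                raise ValueError(f"delimiter inside fixed field at offset {pos}")
            if len(chunk) < length:
                if optional and not chunk:
                    values.append(None)  # field absent at the end of the data
                    continue
                raise ValueError(f"short fixed field at offset {pos}")
            values.append(chunk)
            pos += length
        occurrences.append(values)
        if data.startswith(delim, pos):
            pos += len(delim)  # repeat delimiter between occurrences
        elif pos < len(data):
            raise ValueError(f"expected {delim!r} at offset {pos}")
    return occurrences


print(parse_element("abcde^zyxvuABCDEFGH", precedence=11))
# [['abcde', None], ['zyxvu', 'ABCDEFGH']]
try:
    parse_element("abcde^zyxvuABCDEFGH", precedence=10)
except ValueError as err:
    print("fails:", err)  # fails: expected '^' at offset 13
```

At precedence 10 the second field swallows "^zyxvuAB" as ordinary fixed-length data, so the parser then finds no delimiter where it expects one.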

Optional

The Optional property specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty.

Table 11 Optional Mode Options


Option	Rule
never	If the node is absent, the delimiter is not allowed in either input or output.
allow	If the node is absent, the delimiter is allowed in input but will not be emitted in output.
cheer	If the node is absent, the delimiter is allowed in input and will also be emitted in output.
force	If the node is absent, the delimiter must appear in input and will be emitted in output. Note – Only this option allows trailing delimiters for a sequence of absent optional nodes.
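
Table 11 can be read as three yes/no rules per option: whether the delimiter is accepted in input, whether it is required in input, and whether it is emitted in output. The following Python sketch encodes the table that way; `OPTIONAL_RULES`, `check_absent_node`, and `emit_absent_node` are hypothetical names, and the sketch handles a single absent node only, not sequences of absent nodes.

```python
# (delimiter accepted in input, delimiter required in input, delimiter emitted in output)
OPTIONAL_RULES = {
    "never": (False, False, False),
    "allow": (True, False, False),
    "cheer": (True, False, True),
    "force": (True, True, True),
}


def check_absent_node(mode: str, delimiter_present: bool) -> None:
    """Validate input for an absent optional node under the given Optional mode."""
    accepted, required, _ = OPTIONAL_RULES[mode]
    if delimiter_present and not accepted:
        raise ValueError(f"{mode}: delimiter not allowed for an absent node")
    if not delimiter_present and required:
        raise ValueError(f"{mode}: delimiter required for an absent node")


def emit_absent_node(mode: str, delim: str) -> str:
    """Return what the encoder writes in place of an absent optional node."""
    return delim if OPTIONAL_RULES[mode][2] else ""


for mode in OPTIONAL_RULES:
    print(mode, repr(emit_absent_node(mode, "*")))
# never ''
# allow ''
# cheer '*'
# force '*'
```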

As illustrative examples, consider the tree structures shown in the following figure and table, where the node a has a caret (^) as its delimiter, and the child nodes b, c, and d all have asterisks (*) as their delimiters.

Example 1: Child node c is optional. (Child nodes c and d must have different values for the match parameter.)

Figure 10 Optional Mode Property (Example 1)


Option	Input	Output
never	*bd^**	*bd^**
allow	bd^**	*bd^**
cheer	bd^**	bd^**
force	bd^**	bd^**

Example 2: Child nodes c and d are both optional.

Figure 11 Optional Mode Property (Example 2)

Option	Input	Output
never	b^	b^
allow	b^, *b^, or b^	b^
cheer	b^, *b^, or b^	b^**
force	b^**	b^**

Terminator

The Terminator property specifies whether the delimiter should appear after the last node of the current level. For example, it determines whether data should look like "a,b,|" or "a,b|", assuming "," (comma) is the current-level delimiter and "|" (pipe) is the parent-level delimiter.

The delimiter that terminates the last child of the level in question is referred to as the terminator.

Table 12 Terminator Mode Options


Option	Rule
never	Specifies that the delimiter is not allowed to be a terminator in input and will not be emitted as terminator in output.
allow	Specifies that the delimiter is allowed to be a terminator in input but will not be emitted as terminator in output.
cheer	Specifies that the delimiter is allowed to be a terminator in input and will be emitted as terminator in output.
force	Specifies that the delimiter must appear as a terminator in input and will also be emitted as terminator in output.
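
Using the "a,b,|" versus "a,b|" illustration above, the following Python sketch shows how the Terminator options affect output and input for one level. `encode_level` and `decode_level` are hypothetical helpers; `decode_level` assumes the parent delimiter has already been stripped from the segment.

```python
TERMINATOR_EMITTED = {"never": False, "allow": False, "cheer": True, "force": True}


def encode_level(values: list, delim: str, parent_delim: str, mode: str) -> str:
    """Join one level's values, adding the terminator only when the mode emits it."""
    text = delim.join(values)
    if TERMINATOR_EMITTED[mode]:
        text += delim
    return text + parent_delim


def decode_level(segment: str, delim: str, mode: str) -> list:
    """Split one level's data (parent delimiter removed), checking the terminator rule."""
    has_terminator = segment.endswith(delim)
    if has_terminator and mode == "never":
        raise ValueError("never: terminator not allowed in input")
    if not has_terminator and mode == "force":
        raise ValueError("force: terminator required in input")
    if has_terminator:
        segment = segment[:-len(delim)]
    return segment.split(delim)


print(encode_level(["a", "b"], ",", "|", "allow"))  # a,b|
print(encode_level(["a", "b"], ",", "|", "cheer"))  # a,b,|
print(decode_level("a,b,", ",", "allow"))           # ['a', 'b']
```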

Consider the tree structure shown in the following figure, where the node a has a caret (^) as its delimiter, and its child nodes b and c have asterisks (*) as their delimiters.

Figure 12 Terminator Mode Property Example

Option	Input	Output
never	c^	c^
allow	c^ or *c^**	c^
cheer	c^ or *c^**	*c^**
force	*c^**	*c^**

Delimiter Characters (Bytes)

Note –

There is essentially no limitation on what characters you can use as delimiters; however, you should avoid characters that can be confused with data or that interfere with escape sequences, as described in Escape Option. The backslash (\) is normally used as an escape character; for example, the HL7 protocol uses a double backslash as part of an escape sequence that provides special text formatting instructions. Additionally, a colon (:) is used as a literal in system-generated time strings, which can interfere with recovery procedures, for example following a Domain shutdown.

Escape Sequences

Use a backslash (\) to escape special characters. The following table lists the currently supported escape sequences.

Table 13 Escape Sequences


Sequence	Description
\\	Backslash
\b	Backspace
\f	Form feed
\n	Newline
\r	Carriage return
\t	Tab
\ddd	Octal number*
\xdd	Hexadecimal number**

*For octal values, the first digit d can only be 0 through 3, while the other two digits can be 0 through 7. The maximum value is \377.

**For hexadecimal values, each digit d can be 0 through 9 or A through F (either upper or lower case). The maximum value is \xFF.
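
As a sketch of how the escape sequences above map to raw bytes, the following Python function decodes a delimiter specification. The name `decode_escapes` is hypothetical; the sketch assumes single-byte (Latin-1) characters, exactly three digits for octal and exactly two for hexadecimal escapes, and maps \f to the conventional form-feed byte 0x0C.

```python
SIMPLE_ESCAPES = {"\\": 0x5C, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09}
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def decode_escapes(text: str) -> bytes:
    """Convert the escape sequences in a delimiter specification into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("latin-1")  # assumes single-byte characters
            i += 1
            continue
        nxt = text[i + 1:i + 2]
        if not nxt:
            raise ValueError("dangling backslash at end of text")
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt in "0123":  # \ddd: first digit 0-3, the other two 0-7
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or any(d not in OCTAL for d in digits):
                raise ValueError(f"bad octal escape at offset {i}")
            out.append(int(digits, 8))
            i += 4
        elif nxt == "x":  # \xdd: two hex digits, upper or lower case
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or any(d not in HEX for d in digits):
                raise ValueError(f"bad hex escape at offset {i}")
            out.append(int(digits, 16))
            i += 4
        else:
            raise ValueError(f"unsupported escape sequence at offset {i}")
    return bytes(out)


print(decode_escapes(r"\r\n"))      # b'\r\n'
print(decode_escapes(r"\x7C\174"))  # b'||'
print(decode_escapes(r"\377"))      # b'\xff'
```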

Multiple Delimiters

You can specify multiple delimiters at a given level; for example, if you specify |, ~, and ^ as delimiters for a specific level, the parser will accept any of these delimiters:

root
- element (delimiters = “|”, “~”, “^”)
- field_1 (delimiter = “#”)
- field_2 (delimiters = “|”, “~”, “^”)

This successfully parses the data abc|def, abc~def, and abc^def.

Figure 13 Multiple Delimiter Example

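
Accepting any one of several delimiters at a level can be sketched in Python with a regular-expression alternation built from the escaped delimiter strings. The function `split_any` is hypothetical, and ordering the alternatives longest first is a choice of this sketch; the Encoder itself resolves overlapping delimiters with the Precedence property.

```python
import re


def split_any(data: str, delimiters: list) -> list:
    """Split `data` wherever any one of `delimiters` occurs."""
    # Longest first, so a delimiter that is a prefix of another cannot shadow it.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, data)


for sample in ("abc|def", "abc~def", "abc^def"):
    print(split_any(sample, ["|", "~", "^"]))  # ['abc', 'def'] each time
```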

Anchored and Detached Delimiters

Anchored delimiters must be the starting and ending characters of the specified element. Detached (non-anchored) delimiters do not have to appear at a fixed position.

Begin and End Delimiters

Begin delimiters mark the beginning of a fixed-length field, whereas end delimiters mark the end of a field. Usually, the term “delimiter” by itself refers to an end delimiter. We use the term “end delimiters” for clarification when begin delimiters are also present.

Begin delimiters are used to signify the beginning of a fixed-length data field. Since the data field is of fixed length, no delimiter is required to mark the end of the field. Specify a begin delimiter with the BegBytes property, and check BegDetached if the begin delimiter is detached (non-anchored).
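
A begin-delimited fixed-length field can be sketched in Python as: locate the begin bytes, then read exactly the configured number of bytes. `read_begin_field` is hypothetical, and the interpretation that an anchored begin delimiter must occur at the current position while a detached one may occur later (with any bytes before it skipped) is an assumption of this sketch.

```python
def read_begin_field(data: str, pos: int, beg_bytes: str, length: int,
                     detached: bool = False) -> tuple:
    """Read a fixed-length field introduced by `beg_bytes`.

    Returns (field_value, position_after_field).
    """
    if detached:
        start = data.find(beg_bytes, pos)  # may appear later in the data
        if start < 0:
            raise ValueError(f"begin delimiter {beg_bytes!r} not found")
    elif data.startswith(beg_bytes, pos):
        start = pos  # anchored: must be right here
    else:
        raise ValueError(f"begin delimiter {beg_bytes!r} expected at offset {pos}")
    start += len(beg_bytes)
    field = data[start:start + length]
    if len(field) != length:
        raise ValueError("data ends before the fixed-length field is complete")
    return field, start + length  # no end delimiter needed: the length ends the field


print(read_begin_field("#ABCDEtail", 0, "#", 5))               # ('ABCDE', 6)
print(read_begin_field("xx#ABCDE", 0, "#", 5, detached=True))  # ('ABCDE', 8)
```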

Constant and Embedded Delimiters

Constant delimiters remain unchanged at runtime. Embedded delimiters are embedded in the data, and thus are determined dynamically at runtime. Standard embedded delimiters are specified by the Offset and Length delimiter properties, while embedded begin delimiters are specified by the BegOffset and BegLength delimiter properties.
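
An embedded delimiter can be sketched in Python as reading the delimiter bytes from the data itself and then splitting with them. The helper names are hypothetical, and the sketch assumes that Offset and Length locate the delimiter bytes in the data stream, as the paragraph above suggests. X12 interchange headers follow the same idea: the character immediately after `ISA` is the element separator.

```python
def embedded_delimiter(data: str, offset: int, length: int) -> str:
    """Read the delimiter bytes from the data at `offset` (Offset/Length)."""
    if length <= 0:
        raise ValueError("length must be a positive integer")
    delim = data[offset:offset + length]
    if len(delim) != length:
        raise ValueError("data too short to contain the embedded delimiter")
    return delim


def split_with_embedded(data: str, offset: int, length: int) -> list:
    delim = embedded_delimiter(data, offset, length)  # determined at runtime
    return data.split(delim)


print(split_with_embedded("HDR*a*b", offset=3, length=1))  # ['HDR', 'a', 'b']
print(split_with_embedded("HDR|a|b", offset=3, length=1))  # ['HDR', 'a', 'b']
```

The same code handles both inputs because the delimiter is taken from the data rather than from a constant.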