Designing Custom Encoders

Specifying Delimiters

Delimiter List

You can define a set of delimiters, called a delimiter list, for any node in the hierarchical data structure. This delimiter list is used in the external data representation for that node and its descendants. A delimiter list defined for a non-root node overrides any ancestor node’s delimiter list, both for the node itself and for its descendants.

Delimiters are defined using the Delimiter List Editor, as illustrated in the following figure. The editor is invoked by clicking the delim property value field in the node's property dialog box and clicking the ellipsis (…) button, or by double-clicking the field. See Defining a Delimiter List for additional information.

Clicking within a field in the Delimiter List Editor enables the field for editing. After typing a value into a field, you must press Enter to set the value. Clicking the drop-down menu button in one of the following three fields displays its menu, as illustrated in the following figure.

Figure 6 Delimiter List Editor: Left Side

Image of Delimiter List Editor (Left Side).

Figure 7 Delimiter List Editor: Right Side

Image of Delimiter List Editor (Right Side).

Table 8 Delimiter List Editor Command Buttons

Command        Action
Add Level      Adds a new level after the selected level.
Add Delimiter  Adds a new delimiter after the selected delimiter, or to the bottom of the list under the selected level.
Remove         Deletes the selected line item (level or delimiter) from the list.
Remove All     Deletes all items (levels and delimiters) from the list.
OK             Saves your entries and closes the editor.
Cancel         Discards your entries and closes the editor.

Delimiter Properties

Table 9 Delimiter Properties

Property     Description
Level        Assigns consecutive sets of delimiter parameters to delimited nodes in the Encoder node hierarchy. See Delimiter Levels for additional information.
Type         Specifies how the delimiter is used. See Delimiter Type for additional information.
Precedence   Indicates the priority of a delimiter relative to other delimiters. See Precedence for additional information.
Optional     Specifies how delimiters for optional nodes are handled when the nodes are absent from the input instance or when their fields are empty. See Optional for additional information. Note: Does not apply to children of choice element nodes.
Terminator   Specifies how delimiters are handled for a specific terminator node in the Encoder tree. See Terminator for additional information.
Bytes        Specifies the characters (bytes) that end the delimited data for the specified level. Delimiters can have begin bytes, end bytes, or both. The term “bytes” by itself always indicates end bytes. See Delimiter Characters (Bytes) for additional information.
Offset       Offset of the delimited data field, in bytes, from the beginning of the data stream (byte 0). The value must be a non-negative integer; the default is 0.
Length       Length of the data field in bytes, if it is of fixed length. The value must be a positive integer. Entering a value clears the Bytes field.
Detached     When checked, indicates that the specified delimiter is a detached (non-anchored) delimiter and does not have to appear at a fixed position.
BegBytes     Specifies the characters (bytes) that begin the delimited data for the specified level. Delimiters can have begin bytes, end bytes, or both. See Delimiter Characters (Bytes) for additional information.
BegOffset    Offset of the fixed-length data field, in bytes, from the beginning of the data stream (byte 0). The value must be a non-negative integer; the default is 0.
BegLength    Length of the data field in bytes, if it is of fixed length and has a beginning delimiter. The value must be a positive integer. Entering a value clears the Bytes field.
BegDetached  When checked, indicates that the specified delimiter is a detached (non-anchored) beginning delimiter and does not have to appear at a fixed position.
Skip         When checked, skips identical leading delimiters. The delimiters can be defined as either begin bytes or end bytes. The purpose of this flag is to facilitate parsing tabular data.
Collapse     When checked, collapses identical, consecutive end delimiters into a single delimiter. As with the Skip flag, the purpose of this flag is to facilitate parsing tabular data.
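The effect of the Skip and Collapse flags on tabular data can be sketched in a few lines of Python. This is only an illustration of the behavior described in the table, not the Encoder's parser; the function name split_tabular and its parameters are hypothetical.

```python
import re


def split_tabular(line, delim=" ", skip=False, collapse=False):
    """Split one line of tabular data (illustrative sketch only)."""
    if not delim:
        raise ValueError("delimiter must not be empty")
    if skip:
        # Skip: ignore identical delimiters at the start of the data.
        while line.startswith(delim):
            line = line[len(delim):]
    if collapse:
        # Collapse: treat a run of identical delimiters as one delimiter.
        return re.split("(?:" + re.escape(delim) + ")+", line)
    return line.split(delim)


row = "  a   b c"
print(split_tabular(row))                            # every space is a delimiter
print(split_tabular(row, skip=True, collapse=True))  # column-style parsing
```

With both flags off, every space counts as a separate delimiter and produces empty fields; with both flags on, the padded columns parse as three fields.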

Delimiter Levels

Delimiter levels are assigned in order to those hierarchical levels of an Encoder that contain at least one node that is specified as being delimited. If none of the nodes at a particular hierarchical level is delimited, that hierarchical level is skipped in assigning delimiter levels.

Delimiter lists are typically specified on the root node, so that the list applies to the entire Encoder. The root node itself is typically not delimited, so that Level 1 would apply to those nodes that are children of the root node. See the following figure and example.

Figure 8 Encoder Hierarchical and Delimiter Levels

Diagram illustrating delimiter levels as described in content.

For example, if you want to parse the following data:


a^b|c^d|e

you might create a Custom Encoder as follows:

In this example, the delimiter list is specified on the root node, which is not delimited; therefore, the list has two levels:

The Level 1 delimiter (|) applies to element_1, element_2, and field_5. The Level 2 delimiter (^) applies to field_1 - field_4.

If the root node is set to be delimited, the Level 1 delimiters will then apply to it. Using the above example, the Level 2 delimiter (^) would then apply to element_1, element_2, and field_5, and a new Level 3 delimiter would apply to field_1 - field_4.

Delimiter lists can be much more complex than this very simple example. For instance, you can create multiple delimiters of different types at any given level, and you can specify a delimiter list on any node within the Encoder, not only on the root node as shown in the example. See Defining a Delimiter List for a step-by-step description of the procedure for creating a delimiter list.
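As a rough sketch of how the two-level list in the example maps onto the node hierarchy, the following Python fragment parses a^b|c^d|e. The tree layout is inferred from the node names in the text (element_1 and element_2 each holding two fields, plus field_5), and the TREE, LEVELS, and parse names are hypothetical; the real Encoder is configured in the editor, not in code.

```python
# Node tree assumed from the example text: a tuple is a parent node with
# children, a plain string is a leaf field.
TREE = [
    ("element_1", ["field_1", "field_2"]),
    ("element_2", ["field_3", "field_4"]),
    "field_5",
]

# Delimiter list on the (undelimited) root: Level 1 is "|", Level 2 is "^".
LEVELS = ["|", "^"]


def parse(data, nodes=TREE, level=0):
    """Split data with the delimiter for this level, then recurse."""
    parts = data.split(LEVELS[level])
    result = {}
    for node, part in zip(nodes, parts):
        if isinstance(node, tuple):
            name, children = node
            result[name] = parse(part, children, level + 1)
        else:
            result[node] = part
    return result


print(parse("a^b|c^d|e"))
```

Level 1 splits the data into element_1, element_2, and field_5; Level 2 is applied only inside the two elements, which is why field_5 is delimited by the pipe rather than the caret.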

Delimiter Type

The Delimiter Type property specifies whether the delimiter is a terminator at the end of the byte sequence (normal), a separator between byte sequences in an array (repeat), or an escape sequence (escape or quot-esc).

Table 10 Delimiter Type Options

Option    Description
normal    Indicates that the delimiter is a normal delimiter.
repeat    Indicates that the delimiter separates repetitive fields (nodes). If a node is defined to be repetitive, a repeat delimiter can delimit the repeated occurrences, while a normal delimiter terminates the repetition. For example, in a^b^c1~c2~c3~c4~c5^, the tilde (~) is a repeat delimiter that separates the repetitive nodes and the caret (^) is a normal delimiter that terminates them.
escape    Indicates that the delimiter is an escape delimiter. The purpose of an escape delimiter is to escape special bytes, such as delimiters, using predefined escape sequences. Once the bytes of the escape delimiter are matched, no action is taken except that the search continues at the position immediately following the delimiter bytes.
quot-esc  Indicates that the delimiter escapes special bytes using quotation-style escaping; that is, whatever appears within the (double) quotes is escaped. For example, assume that the comma (,) is a normal delimiter. To escape a comma in the data, you can use either an escape sequence, such as <data>\,<data>, or quotation marks, such as "<data>,<data>". The bytes defined in the quot-esc delimiter represent the quotation marks.
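The repeat example from the table, a^b^c1~c2~c3~c4~c5^, can be read as follows. This is a minimal Python sketch under the assumption that the caret terminates each field and the tilde separates the occurrences of the repetitive node; parse_repeating is a hypothetical helper, not part of the product.

```python
def parse_repeating(data, normal="^", repeat="~"):
    """Split on the normal delimiter, then split repeated occurrences."""
    fields = data.split(normal)
    if fields and fields[-1] == "":
        # The final normal delimiter terminates the last field, so the
        # empty string after it is not a field of its own.
        fields.pop()
    return [f.split(repeat) if repeat in f else f for f in fields]


print(parse_repeating("a^b^c1~c2~c3~c4~c5^"))
```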

Escape Option

An escape delimiter is simply a sequence that is recognized and ignored during parsing. Its purpose is to allow the use of escape sequences to embed byte sequences in data that would otherwise be seen as delimiter occurrences.

For example, if there is a normal delimiter “+” at a given level, and we define an escape delimiter “\+” as shown in the following figure, then aaa+b\+c+ddd will parse as three fields: aaa, b\+c, and ddd. If the escape delimiter were not defined, the sequence would then parse as four fields: aaa, b\, c, and ddd.

Figure 9 Delimiter Type - Escape

Image of Delimiter List Editor, illustrating Escape specification.

If a given level contains only an escape delimiter, however, delim and array nodes at that level are treated as having no delimiter defined.
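The aaa+b\+c+ddd example can be reproduced with a small scanner. The sketch below assumes the behavior described above: when the escape bytes match, they are left in the field and the search resumes immediately after them. The function name split_with_escape is hypothetical.

```python
def split_with_escape(data, delimiter, escape=None):
    """Split data on delimiter, ignoring occurrences covered by escape."""
    fields, current, i = [], [], 0
    while i < len(data):
        if escape and data.startswith(escape, i):
            # Escape delimiter matched: take no action, keep the bytes in
            # the field, and continue scanning right after them.
            current.append(escape)
            i += len(escape)
        elif data.startswith(delimiter, i):
            fields.append("".join(current))
            current = []
            i += len(delimiter)
        else:
            current.append(data[i])
            i += 1
    fields.append("".join(current))
    return fields


print(split_with_escape("aaa+b\\+c+ddd", "+", escape="\\+"))  # three fields
print(split_with_escape("aaa+b\\+c+ddd", "+"))                # four fields
```

The escape check runs before the delimiter check, so the + that follows the backslash is never seen as a field boundary.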

Precedence

Precedence indicates the priority of a certain delimiter, relative to the other delimiters. Precedence is used to resolve delimiter conflicts when one delimiter is a copy or prefix of another. In case of equal precedence, the innermost prevails.

By default, all delimiters are at precedence 10, which means they are all treated equally; fixed fields are hard-coded at precedence 10. When child fields are parsed, delimiters on parent nodes are not considered; only the child’s delimiter (or, for a fixed field, its length) is considered. Valid precedence values range from 1 to 100, inclusive. The higher the value, the higher the precedence, and delimiters with higher precedence are more likely to be matched.

Changing the precedence of a delimiter changes how it is applied to the input data stream. For example:

Optional

The Optional property specifies how delimiters for optional nodes are to be handled when the nodes are absent from the input instance or when their fields are empty.

Table 11 Optional Mode Options

Option  Rule
never   If the node is absent, the delimiter is not allowed in either input or output.
allow   If the node is absent, the delimiter is allowed in input but will not be emitted in output.
cheer   If the node is absent, the delimiter is allowed in input and will also be emitted in output.
force   If the node is absent, the delimiter must appear in input and will be emitted in output. Note: Only this option allows trailing delimiters for a sequence of absent optional nodes.


As illustrative examples, consider the tree structures shown in the following figure and table, where the node a has a caret (^) as its delimiter, and the child nodes b, c, and d all have asterisks (*) as their delimiters.

Figure 10 Optional Mode Property (Example 1)

Diagram of tree structure as described in content.

Option  Input   Output
never   b*d^    b*d^
allow   b**d^   b*d^
cheer   b**d^   b**d^
force   b**d^   b**d^

Figure 11 Optional Mode Property (Example 2)

Diagram of tree structure as described in content.

Option  Input              Output
never   b^                 b^
allow   b^, b*^, or b**^   b^
cheer   b^, b*^, or b**^   b**^
force   b**^               b**^
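The Output columns of both examples follow one rule: never and allow drop the delimiter of an absent optional node, while cheer and force emit it. A minimal Python sketch of that rule follows; emit_children is a hypothetical name, None stands for an absent node, and only the output side is modeled.

```python
def emit_children(children, mode, child_delim="*", parent_delim="^"):
    """Serialize child fields; None marks an absent optional node."""
    if mode not in ("never", "allow", "cheer", "force"):
        raise ValueError(f"unknown Optional mode: {mode}")
    # cheer and force emit the delimiter of an absent node;
    # never and allow emit nothing for it.
    keep_absent = mode in ("cheer", "force")
    fields = ["" if value is None else value
              for value in children
              if value is not None or keep_absent]
    return child_delim.join(fields) + parent_delim


for mode in ("never", "allow", "cheer", "force"):
    # Example 1: c is absent.  Example 2: c and d are absent.
    print(mode, emit_children(["b", None, "d"], mode),
          emit_children(["b", None, None], mode))
```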

Terminator

The Terminator property specifies whether the delimiter should appear after the last node of the current level. For example, it determines whether data should look like "a,b,|" or "a,b|", assuming that the comma (,) is the current-level delimiter and the pipe (|) is the parent-level delimiter.

The delimiter that terminates the last child of the level in question is referred to as the terminator.

Table 12 Terminator Mode Options

Option  Rule
never   The delimiter is not allowed to be a terminator in input and will not be emitted as a terminator in output.
allow   The delimiter is allowed to be a terminator in input but will not be emitted as a terminator in output.
cheer   The delimiter is allowed to be a terminator in input and will be emitted as a terminator in output.
force   The delimiter must appear as a terminator in input and will also be emitted as a terminator in output.

Consider the tree structure shown in the following figure, where the node a has a caret (^) as its delimiter, and its child nodes b and c have asterisks (*) as their delimiters.

Figure 12 Terminator Mode Property Example

Diagram of tree structure as described in content.

Option  Input       Output
never   c^          c^
allow   c^ or c*^   c^
cheer   c^ or c*^   c*^
force   c*^         c*^
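The table above can be restated as two small checks, one for what is accepted on input and one for what is written on output. The Python below is a sketch of those rules for the last child only; the function names are hypothetical and the delimiters default to the asterisk and caret used in the example.

```python
MODES = ("never", "allow", "cheer", "force")


def emit_last_child(field, mode, delim="*", parent_delim="^"):
    """Output form of the last child under the given Terminator mode."""
    if mode not in MODES:
        raise ValueError(f"unknown Terminator mode: {mode}")
    # Only cheer and force write the terminator.
    terminator = delim if mode in ("cheer", "force") else ""
    return field + terminator + parent_delim


def accepts_last_child(data, field, mode, delim="*", parent_delim="^"):
    """Whether the input form of the last child is valid for the mode."""
    if mode not in MODES:
        raise ValueError(f"unknown Terminator mode: {mode}")
    bare = field + parent_delim
    terminated = field + delim + parent_delim
    if mode == "never":
        return data == bare
    if mode == "force":
        return data == terminated
    return data in (bare, terminated)  # allow and cheer take either form


print(emit_last_child("c", "cheer"))
print(accepts_last_child("c^", "c", "force"))
```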

Delimiter Characters (Bytes)


Note –

There is essentially no limitation on which characters you can use as delimiters; however, avoid characters that can be confused with data or that interfere with escape sequences, as described in Escape Option. The backslash (\) is normally used as an escape character; for example, the HL7 protocol uses a double backslash as part of an escape sequence that provides special text formatting instructions. Additionally, a colon (:) is used as a literal in system-generated time strings. Using it as a delimiter can interfere with recovery procedures, for example, following a Domain shutdown.


Escape Sequences

Use a backslash (\) to escape special characters. The following table lists the currently supported escape sequences.

Table 13 Escape Sequences

Sequence  Description
\\        Backslash
\b        Backspace
\f        Linefeed
\n        Newline
\r        Carriage return
\t        Tab
\ddd      Octal number*
\xdd      Hexadecimal number**

*For octal values, the leading variable d can only be 0 - 3 (inclusive), while the other two can be 0 - 7 (inclusive). The maximum value is \377.

**For hexadecimal values, the variable d can be 0 - 9 (inclusive) and A - F (inclusive, either upper or lower case). The maximum value is \xFF.
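The numeric forms are convenient for non-printable delimiters. As an illustration of the range rules in the two footnotes, the following Python sketch converts \ddd and \xdd sequences in a delimiter string to bytes. It handles only the octal and hexadecimal forms; decode_numeric_escapes is a hypothetical helper, and text that does not match either form (for example \477, whose first digit is out of range) is passed through unchanged.

```python
import re

# \ddd: first digit 0-3, remaining digits 0-7 (maximum \377).
# \xdd: two hexadecimal digits in either case (maximum \xFF).
_NUMERIC = re.compile(r"\\(?:([0-3][0-7]{2})|x([0-9A-Fa-f]{2}))")


def decode_numeric_escapes(text):
    """Return the bytes for a delimiter string with numeric escapes."""
    out = bytearray()
    pos = 0
    for match in _NUMERIC.finditer(text):
        # Copy the literal text before this escape sequence.
        out += text[pos:match.start()].encode("latin-1")
        octal, hexa = match.groups()
        out.append(int(octal, 8) if octal else int(hexa, 16))
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


print(decode_numeric_escapes(r"\x0D\x0A"))  # carriage return + newline
print(decode_numeric_escapes(r"\174"))      # octal 174 is the pipe character
```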

Multiple Delimiters

You can specify multiple delimiters at a given level; for example, if you specify |, ~, and ^ as delimiters for a specific level, the parser will accept any of these delimiters:

Figure 13 Multiple Delimiter Example

Image of Delimiter List Editor illustrating multiple delimiters.
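Accepting any of several delimiters at one level is equivalent to splitting on an alternation. A short Python sketch, with the hypothetical helper split_on_any and the three delimiters from the example:

```python
import re


def split_on_any(data, delimiters):
    """Split data wherever any one of the delimiters occurs."""
    # Longest first, so that a delimiter that is a prefix of another
    # does not hide the longer one.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, data)


print(split_on_any("a|b~c^d", ["|", "~", "^"]))
```

The longest-first ordering is a choice made for this sketch; in the Encoder, conflicts between delimiters are resolved by the Precedence property.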

Anchored and Detached Delimiters

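Anchored delimiters must be the starting and ending characters of the specified element. Detached (non-anchored) delimiters do not have to appear at a fixed position; they are indicated by the Detached and BegDetached properties.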

Begin and End Delimiters

Begin delimiters mark the beginning of a fixed-length data field, whereas end delimiters mark the end of a field. Usually, the term “delimiter” by itself refers to an end delimiter; the term “end delimiter” is used for clarity when begin delimiters are also present.

Because a field introduced by a begin delimiter is of fixed length, no delimiter is required to mark its end. Use the Begin Delimiter or Begin Delimiter Detached property to specify a begin delimiter.
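Reading a field that is introduced by a begin delimiter might look like the following Python sketch. It rests on two assumptions that the text does not spell out: an anchored begin delimiter must sit exactly at the current read position, and a detached one may be found anywhere at or after it. The function read_begin_field and its parameters are hypothetical.

```python
def read_begin_field(data, beg_bytes, field_length, pos=0, detached=False):
    """Return (field, next_pos) for a fixed-length field with begin bytes."""
    if detached:
        # Assumption: a detached delimiter may appear anywhere from pos on.
        start = data.find(beg_bytes, pos)
        if start < 0:
            raise ValueError("begin delimiter not found")
    else:
        # Assumption: an anchored delimiter must be at the current position.
        if not data.startswith(beg_bytes, pos):
            raise ValueError("begin delimiter not at expected position")
        start = pos
    field_start = start + len(beg_bytes)
    field = data[field_start:field_start + field_length]
    if len(field) != field_length:
        raise ValueError("data ends before the fixed-length field is complete")
    # No end delimiter is needed: the length marks where the field stops.
    return field, field_start + field_length


print(read_begin_field(b"<<ABCDE rest", b"<<", 5))
print(read_begin_field(b"junk<<ABCDE", b"<<", 5, detached=True))
```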

Constant and Embedded Delimiters

Constant delimiters remain unchanged at runtime. Embedded delimiters are embedded in the data, and thus are determined dynamically at runtime. Standard embedded delimiters are specified by the Offset and Length delimiter properties, while embedded begin delimiters are specified by the BegOffset and BegLength delimiter properties.
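An HL7 message is a familiar case of embedded delimiters: the field separator is carried in the data itself, as the character that immediately follows the MSH segment name. The Python sketch below shows how an offset and a length could be used to pick the delimiter out of the data at runtime; embedded_delimiter is a hypothetical helper, the sample message is made up, and how the Encoder applies the Offset and Length properties internally is not described here.

```python
def embedded_delimiter(data, offset, length):
    """Read a delimiter of the given length at the given byte offset."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    delim = data[offset:offset + length]
    if len(delim) != length:
        raise ValueError("data is too short to contain the delimiter")
    return delim


message = b"MSH|^~\\&|SENDING_APP"
field_sep = embedded_delimiter(message, offset=3, length=1)
print(field_sep)                 # delimiter found in the data
print(message.split(field_sep))  # fields split with that delimiter
```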