Chapter 2 Implementing the AttrParse Object

The AttrParse object encapsulates a grammar used to parse user listings. It is used primarily by mainframe-based resource adapters that receive a screen of data at a time and must parse out the desired results. (This technique is often called screen scraping.) The Shell Script and Scripted Gateway adapters also use AttrParse with getUser and getAllUsers actions.

The adapters that use the AttrParse object model the screen as a Java string. An instantiation of an AttrParse object contains one or more tokens. Each token defines a portion of the screen. These tokens are used to “tokenize” the screen string and allow the adapters to discover the user properties from the user listing.

After parsing a user listing, AttrParse returns a map of user attribute name/value pairs.

Configuration

As with all other Identity Manager objects, the AttrParse objects are serialized to XML for persistent storage. AttrParse objects can then be configured to support differences in customer environments. For example, the ACF2 mainframe security system is often customized to include additional fields and field lengths. Since AttrParse objects reside in the repository, they can be changed and configured to account for these differences without requiring that a custom adapter be written.

As with all Identity Manager configuration objects, objects that are to be changed should be copied, renamed, and then modified.

For examples of AttrParse objects that ship with Identity Manager, see the sample\attrparse.xml file. It lists the default AttrParse objects used by the screen scraping adapters.

AttrParse Element and Tokens

AttrParse Element

Attributes

Attribute	Description
name	Uniquely defines the AttrParse object. This value will be specified on the Resource Parameters page for the adapter.

Data

One or more tokens that parse user listings. The AttrParse object supports the following tokens, each described later in this chapter: collectCsvHeader, collectCsvLines, eol, flag, int, loop, multiLine, opt, skip, skipLinesUntil, skipToEol, skipWhitespace, str, and t.

Example

The following example reads the first 19 characters of a line, trims extraneous whitespace, and assigns the string as the value of the USERID resource attribute. It then skips forward five characters and extracts the NAME resource attribute. This attribute has a maximum of 21 characters, and whitespace is trimmed. The example then checks for the string "Phone number: ". A telephone number is parsed out and assigned to the PHONE resource attribute. The phone number begins after the space in "Phone number: " and ends at the next space encountered. The trailing space is trimmed.

<AttrParse name='Example AttrParse'>
   <str name='USERID' trim='true' len='19'/>
   <skip len='5'/>
   <str name='NAME' trim='true' len='21'/>
   <t offset='-1'>Phone number: </t>
   <str name='PHONE' trim='true' term=' '/>
</AttrParse>

The following strings satisfy the Example AttrParse grammar. (The symbols represent spaces.)

collectCsvHeader Token

The collectCsvHeader token reads a line designated as the header of a comma separated values (CSV) file.

The Scripted Gateway adapter is the only adapter that can use this token. The collectCsvHeader and collectCsvLines tokens are the only tokens that determine attributes that can be used with this adapter.

Each name in the header must be the same as a resource user attribute on the schema map on the resource adapter. If a string in the header does not match a resource user attribute name, it and the values in the corresponding position in the subsequent data lines will be ignored.

Attributes


Attribute	Description
idHeader	Specifies which value in the header is considered the account ID. This attribute is optional, but recommended. If it is not specified, then the value for the nameHeader attribute will be used.
nameHeader	Specifies which value in the header is considered the name for the account. This is often the same value as idHeader, and if not specified, the value in idHeader is used. This attribute is optional but recommended.
delim	Optional. The string that separates values in the header. The default value is , (comma).
minCount	Specifies the minimum number of instances of the string specified in the delim attribute that a valid header must have.
trim	Optional. If set to true, then if a value has leading or trailing blanks, remove them. The default is false.
unQuote	Optional. If set to true, then if a value is enclosed in quotes, remove them. The default is false.

Data

Example

The following example identifies accountId as the value to be used for the account ID. Whitespace and quotation marks are removed from values.
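A minimal sketch of such a token, assembled only from the attributes in the table above, could read as follows. The header name accountId comes from the description; the exact form is an assumption, not a copy of a shipped AttrParse object.

```xml
<!-- Sketch: accountId is the header column used as the account ID. -->
<!-- trim and unQuote remove surrounding blanks and quotation marks from each value. -->
<collectCsvHeader idHeader='accountId' trim='true' unQuote='true'/>
```

Because nameHeader is omitted here, the value of idHeader would also serve as the account name, per the attribute descriptions above.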

collectCsvLines Token

The collectCsvLines token parses a line in a comma separated values (CSV) file. The collectCsvHeader token must have been previously invoked.

Attributes

If any of the following attributes are not specified, then the value is inherited from the previously-issued collectCsvHeader token.


Attribute	Description
idHeader	Specifies which value is considered the account ID.
nameHeader	Specifies which value is considered the name for the account.
delim	Optional. The string that separates values in the header. The default value is , (comma).
trim	Optional. If set to true, then if a value has leading or trailing blanks, remove them. The default is false.
unQuote	Optional. If set to true, then if a value is enclosed in quotes, remove them. The default is false.

Data

Example
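As a sketch of how this token might follow a header token, the fragment below relies on the inheritance rule stated under Attributes: collectCsvLines omits every attribute, so idHeader, trim, and unQuote would be taken from the preceding collectCsvHeader. The accountId header name is an assumption for illustration.

```xml
<!-- Sketch: the header token sets the options once. -->
<collectCsvHeader idHeader='accountId' trim='true' unQuote='true'/>
<!-- No attributes are given, so the values are inherited from collectCsvHeader. -->
<collectCsvLines/>
```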

eol Token

The eol token matches the end of line character (\n). The parse position will be advanced to the first character on the next line.

Attributes

Data

Example
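A short sketch of the eol token in context. The two str tokens are borrowed from the AttrParse example earlier in this chapter; placing the two fields on consecutive lines is an assumption made for illustration.

```xml
<str name='USERID' trim='true' len='19'/>
<!-- Consume the end of line character; parsing resumes at the start of the next line. -->
<eol/>
<str name='NAME' trim='true' len='21'/>
```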

flag Token

The flag token is often used inside an opt token to determine if a flag that defines an account property exists on a user account. This token searches for a specified string. If the text is found, AttrParse assigns the boolean value true to the attribute, then adds the entry to the attribute map.

The parse position will be advanced to the first character after the matched text.

Attributes


Attribute	Description
name	The name of the attribute to use in the attribute value map. The name is usually the same as a resource user attribute on the schema map on the resource adapter, but this is not a requirement.
offset	The number of characters to skip before searching for the text for the token. The offset can have the following values: 1 or higher — Moves the specified number of characters before trying to match the token’s text. 0 — Searches for text at the current parse position. This is the default value. -1 — Indicates the token’s text will be matched at the current parse position, but the parse position will not go past the string specified in the termToken attribute, if present.
termToken	A string to use as an indicator that the text being searched for is not present. This string is often the first word or label in the next line on the screen output. The parse position will be the character after the termToken string. The termToken attribute can only be used if the offset attribute is negative one (-1).

Data

Examples
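The following sketches show two ways the flag token might be written with the attributes described above. The attribute name AUDIT and the label GROUPS are hypothetical and are not taken from a shipped AttrParse object.

```xml
<!-- Look for the text AUDIT at the current parse position. -->
<!-- If it is found, AUDIT=true is added to the attribute map. -->
<flag name='AUDIT'>AUDIT</flag>

<!-- Same search with offset -1. The termToken names a label (here GROUPS) -->
<!-- whose appearance indicates that the flag text is not present. -->
<flag name='AUDIT' offset='-1' termToken='GROUPS'>AUDIT</flag>
```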

int Token

The int token captures an account attribute that is an integer. The attribute name and integer value will be added to the account attribute map. The parse position will be advanced to the first character after the integer.

Attributes


Attribute	Description
name	The name of the attribute to use in the attribute value map. The name is usually the same as a resource user attribute on the schema map on the resource adapter, but this is not a requirement.
len	Indicates the exact length of the expected integer. The length can have the following values: 1 or higher — Captures the specified number of characters and checks to see if the text is an integer value or if it matches the characters specified in the noval attribute. -1 — The parser will take the longest string of digits starting at the current parse position unless the next characters equal the noval attribute. This is the default value.
noval	Optional. A label on the screen that indicates the attribute does not have an integer value. Essentially, it is a null value indicator. The parse position will be advanced to the first character after the noval string.

Data

Examples
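As an illustration of the two len modes described above, the sketches below use hypothetical attribute names (SALARY and AGE) and a hypothetical noval label (NONE).

```xml
<!-- Capture exactly six characters and check that they form an integer. -->
<int name='SALARY' len='6'/>

<!-- Default len of -1: take the longest run of digits, -->
<!-- unless the text NONE appears at the parse position. -->
<int name='AGE' noval='NONE'/>
```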

loop Token

The loop token repeatedly executes the elements it contains until the input is exhausted.

Attributes

Data

Example
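A sketch of how loop might drive CSV parsing, combining tokens documented elsewhere in this chapter. The object name, the accountId header name, and the minCount values are assumptions chosen for illustration.

```xml
<AttrParse name='Example CSV AttrParse'>
   <!-- Read the header line once. -->
   <collectCsvHeader idHeader='accountId' minCount='2' trim='true' unQuote='true'/>
   <!-- Repeat the contained tokens until the input is exhausted. -->
   <loop>
      <!-- Skip ahead to the next line that has at least two commas. -->
      <skipLinesUntil token=',' minCount='2'/>
      <!-- Parse that line; unspecified options are inherited from collectCsvHeader. -->
      <collectCsvLines/>
   </loop>
</AttrParse>
```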

multiLine Token

The multiLine token matches a pattern that recurs on multiple lines. If the next line matches the multiLine's internal AttrParse string, the parsed output will be added to the account attribute map at the top level. The parse position will be advanced to the first line that doesn't match the internal AttrParse string.

Attributes

Attribute	Description
opt	Indicates that the internal AttrParse string is optional: there might be no lines that match it, in which case parsing continues with the next token.

Data

Example

The following multiLine token matches multiple group lines that have a GROUPS[space][space][space]= tag and a space delimited group list.

<multiLine opt='true'>
   <t>GROUPS[space][space][space]=</t>
   <str name='GROUP' multi='true' delim=' ' trim='true'/>
   <skipToEol/>
</multiLine>

AttrParse would add GROUP = {Group1,Group2,Group3,Group4} to the account attribute map if the following string is read as input:

GROUPS[space][space][space]= Group1[space]Group2\n
GROUPS[space][space][space]= Group3[space]Group4\n
Unrelated text...

opt Token

The opt token parses optional strings that are arbitrarily complex, such as those that are composed of multiple tokens. If the match token is present, then the internal AttrParse string is used to parse the next part of the screen. If an optional section is present, the parse position will be advanced to the character after the end of the optional section. Otherwise, the parse position is unchanged.

Attributes

Data

apMatch: Contains the token to match to determine whether the optional section is present. apMatch is a subtoken that can be used only within the opt token. The apMatch token always contains the flag token as a subtoken.

AttrParse: Specifies how to parse the optional part of the screen. This version of the AttrParse element does not use the name attribute. It can contain any other token.

Example

The following opt token attempts to match a CONSNAME= text token. If it is found, then it will parse a string of length 8, trim whitespace, and add the string to the account attribute map for the NETVIEW.CONSNAME attribute.

<opt>
   <apMatch>
      <t offset='-1'> CONSNAME= </t>
   </apMatch>
   <AttrParse>
      <str name='NETVIEW.CONSNAME' len='8' trim='true' />
   </AttrParse>
</opt>

skip Token

The skip token marks areas of the screen that contain no useful information about the user and can therefore be skipped. The parse position will be advanced to the first character after the skipped characters.
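The two skip tokens described under Examples for this token (one skipping 17 characters, the other a single character) might be written as in this sketch, which follows the form of the skip token in the AttrParse example earlier in this chapter.

```xml
<!-- Skip 17 characters. -->
<skip len='17'/>
<!-- Skip a single character. -->
<skip len='1'/>
```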

Attributes

Attribute	Description
len	Indicates the number of characters to skip on the screen.

Data

Examples

In the following examples, the first token skips 17 characters, while the second skips only one character.

skipLinesUntil Token

The skipLinesUntil token skips over lines of input until one is found that has at least the specified number of instances of a given string.
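A sketch of a token that skips forward to the next line containing two commas, using the token and minCount attributes documented for skipLinesUntil. The values are chosen for illustration.

```xml
<!-- Skip lines until one contains at least two commas. -->
<skipLinesUntil token=',' minCount='2'/>
```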

Attributes

Attribute	Description
token	The string to search for.
minCount	The minimum number of instances of the string specified in the token attribute that must be present.

Data

Example

The following token skips forward to the next line that contains two commas. The parse position will be at the first character of that line.

skipToEol Token

The skipToEol token skips all characters from the current parse position to the end of the current line. The parse position will be advanced to the first character on the next line.

Attributes

Data

Example

The following token skips all characters until the end of the current line. The parse position will be at the first character of the next line.
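The token appears without attributes in the multiLine example earlier in this chapter, so a sketch is simply:

```xml
<!-- Discard the rest of the current line. -->
<skipToEol/>
```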

skipWhitespace Token

The skipWhitespace token is used to skip any number of whitespace characters. The system uses Java's definition of whitespace. The parse position will be advanced to the first non-whitespace character.

Attributes

Data

Example
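A sketch of skipWhitespace in context, for a label followed by a variable amount of padding. The label text and the DEPT attribute name are hypothetical, and the empty-element form of skipWhitespace is assumed by analogy with skipToEol.

```xml
<!-- Match the label. -->
<t>Department:</t>
<!-- Skip however many whitespace characters follow the label. -->
<skipWhitespace/>
<!-- No len is given, so the str token captures up to the next whitespace character. -->
<str name='DEPT'/>
```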

str Token

The str token captures an account attribute that is a string. The attribute name and string value will be added to the account attribute map. The parse position will be advanced to the first character after the string.

Attributes


Attribute	Description
name	The name of the attribute to use in the attribute value map. The name is usually the same as a resource user attribute on the schema map on the resource adapter, but this is not a requirement.
len	Indicates the exact length of the expected string. The length can have the following values: 1 or higher — Captures the specified number of characters, unless the characters equal the noval attribute. -1 — Captures all the characters from the current parse position until the next whitespace character, unless the next characters equal the noval attribute. This is the default.
term	A string that indicates parsing should stop for this str token when any of the characters in the string are reached. If the len attribute is 1 or higher, the str token ends after len characters or at the term character, whichever comes first.
termToken	A string to use as an indicator that the text being searched for is not present. This string is often the first word or label in the next line on the screen output. The parse position will be the character after the termToken string. The string added to the attribute map will be all the characters before the termToken was found. The termToken attribute can only be used if the len attribute is negative one (-1).
trim	Optional. A true or false value that indicates whether the returned value or multiple values (if the multi attribute is specified) are trimmed before being added to the account attribute map. The default value is false.
noval	A label on the screen that indicates the attribute doesn't have a string value. Essentially, it is a null value indicator. The parse position will be advanced to the first character after the noval string.
multiLine	A true or false value that indicates whether the string will span multiple screen lines. This attribute can only be used if a len attribute is supplied and is assigned a value greater than zero. If multiLine is present, end of line characters will be skipped until the number of characters specified in the len attribute have been parsed.
multi	A true or false value that indicates that the string captured is a multi-valued attribute that must be further parsed to find each sub-value. The multiple values can either be appended together using the appendSeparator or can be turned into a list of values.
delim	A delimiter for parsing the multi-valued string. This attribute can only be used if the multi attribute is specified. If this is not specified, then the multi str token is assumed to be delimited by spaces.
append	A true or false value that indicates that the multiple values should be appended together into a string using the appendSeparator. If append is not present, the multiple values will be put into a list for the account attribute value map. This attribute is used in conjunction with the multi attribute.
appendSeparator	Indicates the string to separate the multiple values for an append token. This attribute is only valid if the append attribute is set to true. If the appendSeparator is not present, the append attribute does not use a separator. Instead, it concatenates the multiple values into the result string.

Data

Examples

The following token matches a string of length 21 characters and trims whitespace off of the front and back.
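A sketch of such a token; it has the same form as the NAME token in the AttrParse example earlier in this chapter.

```xml
<!-- Capture 21 characters, then trim leading and trailing whitespace. -->
<str name='NAME' trim='true' len='21'/>
```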

Given the string [space][space]George Washington[space][space], AttrParse adds NAME="George Washington" to the account attribute map.

The following token matches a string of arbitrary length terminated by a ) (right parenthesis).
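A sketch of such a token, built from the term attribute described above. The attribute name STATISTICS.SEC-VIO is the one used in the result described for this example.

```xml
<!-- No len is given; capture stops at the first ')' character. -->
<str name='STATISTICS.SEC-VIO' term=')'/>
```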

Given the string 2 - Monday, Wednesday - )text, AttrParse adds STATISTICS.SEC-VIO="2 - Monday, Wednesday - " to the account attribute map.

The following token matches a list of words delimited by spaces from the current parse position to the end of the current line.
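A sketch of such a token, using the multi, delim, and term attributes described above. How the end of line character is written inside the term attribute is an assumption; the notation \n is used here to match the way this chapter writes line ends.

```xml
<!-- Assumption: '\n' in the term attribute denotes the end of line character. -->
<!-- multi splits the captured text on the delim character into a list of values. -->
<str name='GROUP' multi='true' delim=' ' term='\n'/>
```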

Given the string, Group1 Group2 newGroup lastGroup\n, AttrParse adds a list of group name strings {Group1, Group2, newGroup, lastGroup} to the account attribute map for the GROUP attribute.

The following token performs the same function as the previous example, except that the account attribute map will contain GROUP={Group1:Group2:newGroup:lastGroup}.
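A sketch of such a token, adding the append and appendSeparator attributes described above to a multi-valued str token. As in any sketch of this kind, the \n notation for the end of line character in term is an assumption.

```xml
<!-- append joins the values into one string, separated by ':'. -->
<str name='GROUP' multi='true' delim=' ' term='\n' append='true' appendSeparator=':'/>
```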

t Token

The t token is used to tokenize text. It is commonly used to recognize labels during screen scraping and provide knowledge of where on the screen you are parsing. The parse position will be advanced to the first character after the matched text. The parser always moves left to right within a line of text.

Attributes


Attribute	Description
offset	The number of characters to skip before searching for the text for the token. The offset can have the following values: 1 or higher — Moves the specified number of characters before trying to match the token’s text. 0 — Searches for text at the current parse position. This is the default value. -1 — Indicates the token’s text will be matched at the current parse position, but the parse position will not go past the string specified in the termToken attribute, if present.
termToken	A string that indicates parsing should stop for this token. The parse position will be the character after the termToken string. The termToken attribute can only be used if the offset attribute is negative one (-1).

Sun[TM] Identity Manager 8.0 Resources Reference