Sun Identity Manager 8.1 Resources Reference

Chapter 49 Implementing the AttrParse Object

The AttrParse object encapsulates a grammar used to parse user listings. It is used primarily by mainframe-based resource adapters that receive a screen of data at a time and must parse out the desired results. (This technique is often called screen scraping.) The Shell Script and Scripted Gateway adapters also use AttrParse with getUser and getAllUsers actions.

The adapters that use the AttrParse object model the screen as a Java string. An instantiation of an AttrParse object contains one or more tokens. Each token defines a portion of the screen. These tokens are used to “tokenize” the screen string and allow the adapters to discover the user properties from the user listing.

After parsing a user listing, AttrParse returns a map of user attribute name/value pairs.

Configuration

As with all other Identity Manager objects, the AttrParse objects are serialized to XML for persistent storage. AttrParse objects can then be configured to support differences in customer environments. For example, the ACF2 mainframe security system is often customized to include additional fields and field lengths. Since AttrParse objects reside in the repository, they can be changed and configured to account for these differences without requiring that a custom adapter be written.

As with all Identity Manager configuration objects, objects that are to be changed should be copied, renamed, and then modified.

Editing an AttrParse Object

From the Debug page, select AttrParse from the drop-down menu adjacent to the List Objects button. Click List Objects.

From the list of available objects, select the object you want to edit.

Copy, edit, and rename the object in your XML editor-of-choice.

From the Configure page, select Import Exchange File to import the new file into Identity Manager.

In your resource, change the AttrParse resource attribute to the name of the new AttrParse object.

For examples of AttrParse objects that ship with Identity Manager, see the sample\attrparse.xml file. It lists the default AttrParse objects used by the screen-scraping adapters.

AttrParse Element and Tokens

AttrParse Element

The AttrParse element defines the AttrParse object.

Attributes

Attribute	Description
`name`	Uniquely defines the AttrParse object. This value will be specified on the Resource Parameters page for the adapter.

Data

One or more tokens that parse user listings. The tokens supported by the AttrParse object are described in the sections that follow.

Example

The following example reads the first 19 characters of a line, trims extraneous white space, and assigns the string as the value of the USERID resource attribute. It then skips forward five characters and extracts the NAME resource attribute. This attribute has a maximum of 21 characters, and white space is trimmed. The sample then checks for the string “Phone number: ”. A telephone number is parsed out and assigned to the PHONE resource attribute. The phone number begins after the space in “Phone number: ” and ends at the next space encountered. The trailing space is trimmed.

<AttrParse name='Example AttrParse'>
   <str name='USERID' trim='true' len='19'/>
   <skip len='5'/>
   <str name='NAME' trim='true' len='21'/>
   <t offset='-1'>Phone number: </t>
   <str name='PHONE' trim='true' term=' '/>
</AttrParse>

The following strings satisfy the Example AttrParse grammar. (The · symbols represent spaces.)

gwashington123·····ABCD·George·Washington····Phone·number:·123-1234·
alincoln···········XYZ··Abraham·Lincoln······Phone·number:·321-4321·

In the first case after parsing, the user attribute map would contain:

USERID="gwashington123", NAME="George Washington", PHONE="123-1234"

Similarly, the second user attribute map would contain:

USERID="alincoln", NAME="Abraham Lincoln", PHONE="321-4321"

The rest of the text is ignored.
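
The walk-through above can be mirrored in a few lines of Python. This is only an illustrative sketch of the fixed-width logic the grammar describes, not the AttrParse implementation; the function name `parse_example_line` is hypothetical.

```python
# Hypothetical stand-in for the Example AttrParse grammar (not product code).
def parse_example_line(line):
    attrs = {}
    pos = 0
    # <str name='USERID' trim='true' len='19'/>
    attrs["USERID"] = line[pos:pos + 19].strip()
    pos += 19
    # <skip len='5'/>
    pos += 5
    # <str name='NAME' trim='true' len='21'/>
    attrs["NAME"] = line[pos:pos + 21].strip()
    pos += 21
    # <t offset='-1'>Phone number: </t>
    label = "Phone number: "
    if not line.startswith(label, pos):
        raise ValueError("expected label at position %d" % pos)
    pos += len(label)
    # <str name='PHONE' trim='true' term=' '/>
    end = line.find(" ", pos)
    if end == -1:
        end = len(line)
    attrs["PHONE"] = line[pos:end].strip()
    return attrs


# Build the sample lines with explicit padding so the column widths are exact.
first = ("gwashington123".ljust(19) + "ABCD "
         + "George Washington".ljust(21) + "Phone number: 123-1234 ")
second = ("alincoln".ljust(19) + "XYZ  "
          + "Abraham Lincoln".ljust(21) + "Phone number: 321-4321 ")
print(parse_example_line(first))
print(parse_example_line(second))
```

Anything after the phone number is never read, which matches the statement that the rest of the text is ignored.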

collectCsvHeader Token

The collectCsvHeader token reads a line designated as the header of a comma-separated values (CSV) file.

The Scripted Gateway adapter and Shell Script adapter, among others, can use this token. The collectCsvHeader and collectCsvLines tokens are the only tokens that the Scripted Gateway adapter can use.

Each name in the header must be the same as a resource user attribute on the schema map on the resource adapter. If a string in the header does not match a resource user attribute name, it and the values in the corresponding position in the subsequent data lines will be ignored.

Attributes

Attribute	Description
`idHeader`	Specifies which value in the header is considered the account ID. This attribute is optional, but recommended. If it is not specified, then the value for the `nameHeader` attribute will be used.
`nameHeader`	Specifies which value in the header is considered the name for the account. This is often the same value as `idHeader`, and if not specified, the value in `idHeader` is used. This attribute is optional but recommended.
`delim`	Optional. The string that separates values in the header. The default value is , (comma).
`minCount`	Specifies the minimum number of instances of the string specified in the `delim` attribute that a valid header must have.
`trim`	Optional. If set to `true`, then if a value has leading or trailing blanks, remove them. The default is `false`.
`unQuote`	Optional. If set to `true`, then if a value is enclosed in quotes, remove them. The default is `false`.

Data

None

Example

The following example identifies accountId as the value to be used for the account ID. White space and quotation marks are removed from values.

<collectCsvHeader idHeader='accountId' delim=',' trim='true' unQuote='true'/>

collectCsvLines Token

The collectCsvLines token parses a line in a comma-separated values (CSV) file. The collectCsvHeader token must have been previously invoked.

Attributes

If any of the following attributes are not specified, then the value is inherited from the previously-issued collectCsvHeader token.

Attribute	Description
`idHeader`	Specifies which value is considered the account ID.
`nameHeader`	Specifies which value is considered the name for the account.
`delim`	Optional. The string that separates values in each data line. The default value is , (comma).
`trim`	Optional. If set to `true`, then if a value has leading or trailing blanks, remove them. The default is `false`.
`unQuote`	Optional. If set to `true`, then if a value is enclosed in quotes, remove them. The default is `false`.

Data

None

Example

The following example removes white space and quotation marks from values.

<collectCsvLines trim='yes' unQuote='yes'/>
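
To make the interplay of the header and line tokens concrete, here is a rough Python sketch of the behavior described above: header names select attributes, names missing from the schema map are dropped along with their values, and `trim` and `unQuote` clean each value. The helper names, the sample listing, and the schema set are all hypothetical. The sketch splits naively on the delimiter and does not handle delimiters inside quoted values.

```python
def split_csv(line, delim=",", trim=False, un_quote=False):
    """Split one line and optionally trim blanks and strip enclosing quotes."""
    values = line.split(delim)
    if trim:
        values = [v.strip() for v in values]
    if un_quote:
        values = [v[1:-1] if len(v) >= 2 and v[0] == v[-1] == '"' else v
                  for v in values]
    return values


def parse_csv(lines, schema_attrs, id_header, delim=",", trim=False, un_quote=False):
    """Return {account ID: attribute map} for a header line plus data lines."""
    header = split_csv(lines[0], delim, trim, un_quote)
    users = {}
    for line in lines[1:]:
        row = dict(zip(header, split_csv(line, delim, trim, un_quote)))
        # Header names that are not on the schema map are ignored with their values.
        users[row[id_header]] = {k: v for k, v in row.items() if k in schema_attrs}
    return users


# Hypothetical listing; shoeSize is not on the (assumed) schema map.
listing = [
    'accountId, "fullname", shoeSize',
    'jdoe, "John Doe", 10',
    'asmith, "Ann Smith", 8',
]
users = parse_csv(listing, {"accountId", "fullname"}, "accountId",
                  trim=True, un_quote=True)
print(users)
```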

eol Token

The eol token matches the end-of-line character (\n). The parse position will be advanced to the first character on the next line.

Attributes

None

Data

None

Example

The following token matches the end-of-line character.

<eol/>

flag Token

The flag token is often used inside an opt token to determine if a flag that defines an account property exists on a user account. This token searches for a specified string. If the text is found, AttrParse assigns the boolean value true to the attribute, then adds the entry to the attribute map.

The parse position will be advanced to the first character after the matched text.

Attributes

Attribute	Description
`name`	The name of the attribute to use in the attribute value map. The name is usually the same as a resource user attribute on the schema map on the resource adapter, but this is not a requirement.
`offset`	The number of characters to skip before searching for the text for the token. The offset can have the following values: 1 or higher moves the specified number of characters before trying to match the token’s text. 0 searches for text at the current parse position. This is the default value. -1 indicates the token’s text will be matched at the current parse position, but the parse position will not go past the string specified in the `termToken` attribute, if present.
`termToken`	A string to use as an indicator that the text being searched for is not present. This string is often the first word or label in the next line on the screen output. The parse position will be the character after the `termToken` string. The `termToken` attribute can only be used if the `offset` attribute is negative one (-1).

Data

The text to match.

Examples

The following token matches AUDIT at the current parse position and, if it is found, adds AUDIT_FLAG=true to the user attribute map.

<flag offset='-1' name='AUDIT_FLAG'>AUDIT</flag>

The following token will match xxxxCICS at the current parse position, where xxxx are any four characters, including spaces. If this string is found, AttrParse adds CICS=true to the user attribute map.

<flag offset='4' name='CICS'>CICS</flag>
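
The effect of a positive offset can be sketched in Python as follows. This is a simplified illustration with a hypothetical `match_flag` helper; it covers only offsets of 0 or higher and does not attempt the -1 and termToken behavior.

```python
def match_flag(screen, pos, text, offset=0):
    """Return (found, new_pos) for a flag-style match with a non-negative offset."""
    start = pos + offset  # skip `offset` characters before matching
    if screen.startswith(text, start):
        # Parse position moves to the first character after the matched text.
        return True, start + len(text)
    return False, pos


attrs = {}
found, pos = match_flag("ABCDCICS REST", 0, "CICS", offset=4)
if found:
    attrs["CICS"] = True  # the flag token stores a boolean true
print(attrs, pos)
```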

int Token

The int token captures an account attribute that is an integer. The attribute name and integer value will be added to the account attribute map. The parse position will be advanced to the first character after the integer.

Attributes

Attribute	Description
`name`	The name of the attribute to use in the attribute value map. The name is usually the same as a resource user attribute on the schema map on the resource adapter, but this is not a requirement.
`len`	Indicates the exact length of the expected integer. The length can have the following values: 1 or higher captures the specified number of characters and checks to see if the text is an integer value or if it matches the characters specified in the `noval` attribute. -1 indicates the parser will take the longest string of digits starting at the current parse position unless the next characters equal the `noval` attribute. This is the default value.
`noval`	Optional. A label on the screen that indicates the attribute does not have an integer value. Essentially, it is a null value indicator. The parse position will be advanced to the first character after the `noval` string.

Data

None

Examples

The following token matches a 6-digit integer and puts the integer value of those digits into the attribute value map for the SALARY attribute.

<int name='SALARY' len='6'/>

If the value 010250 is found, AttrParse adds SALARY=10250 to the value map.

The following token matches any number of digits and adds that integer value to the attribute map for the AGE attribute.

<int name='AGE' len='-1' noval='NOT GIVEN'/>

If the value 34 is found, for example, AGE=34 would be added to the attribute map. For the string NOT GIVEN, a value will not be added to the attribute map for the AGE attribute.
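
The two `len` modes and the `noval` check can be approximated in Python as shown below. This is a sketch under stated assumptions, not the product's parser: `parse_int` is a hypothetical helper, and comparing a fixed-length capture to `noval` by exact equality is an assumption.

```python
DIGITS = "0123456789"


def parse_int(screen, pos, length=-1, noval=None):
    """Return (value, new_pos); value is None when the noval label is seen."""
    if length >= 1:
        # Fixed length: capture exactly `length` characters.
        text = screen[pos:pos + length]
        if noval is not None and text == noval:
            return None, pos + length
    else:
        # len = -1: check for the noval label, else take the longest digit run.
        if noval is not None and screen.startswith(noval, pos):
            return None, pos + len(noval)
        end = pos
        while end < len(screen) and screen[end] in DIGITS:
            end += 1
        text = screen[pos:end]
    if not text or any(c not in DIGITS for c in text):
        raise ValueError("no integer at position %d" % pos)
    return int(text), pos + len(text)


print(parse_int("010250 rest", 0, length=6))             # leading zero is dropped
print(parse_int("34 years", 0, noval="NOT GIVEN"))
print(parse_int("NOT GIVEN x", 0, noval="NOT GIVEN"))    # no value is produced
```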

loop Token

The loop token repeatedly executes the elements it contains until the input is exhausted.

Attributes

None

Data

Varies

Example

The following example reads the contents of a CSV file.

<loop>
   <skipLinesUntil token=',' minCount='4' />
   <collectCsvHeader idHeader='accountId' />
   <collectCsvLines />
</loop>

multiLine Token

The multiLine token matches a pattern that recurs on multiple lines. If the next line matches the multiLine’s internal AttrParse string, the parsed output will be added to the account attribute map at the top level. The parse position will be advanced to the first line that doesn’t match the internal AttrParse string.

Attributes

Attribute	Description
`opt`	Indicates the internal AttrParse string might be optional. Indicates that there might be no lines that match the internal AttrParse string and that parsing should continue with the next token.

Data

Any AttrParse tokens to parse a line of data.

Example

The following multiLine token matches multiple group lines that have a GROUPS[space][space][space]= tag and a space delimited group list.

<multiLine opt='true'>
   <t>GROUPS[space][space][space]=</t>
   <str name='GROUP' multi='true' delim=' ' trim='true'/>
   <skipToEol/>
</multiLine>

AttrParse would add GROUP = {Group1,Group2,Group3,Group4} to the account attribute map, given the following string is read as input:

GROUPS[space][space][space]= Group1[space]Group2\n
GROUPS[space][space][space]= Group3[space]Group4\n
Unrelated text...
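
A rough Python equivalent of this example is shown below. It is only a sketch of the described behavior: lines are consumed while they start with the label, the values accumulate under the attribute name used by the str token (GROUP), and parsing stops at the first line that does not match. The function name is hypothetical.

```python
def parse_group_lines(screen):
    """Collect group names from consecutive lines that start with the label."""
    label = "GROUPS   ="  # GROUPS followed by three spaces and an equals sign
    groups = []
    lines = screen.split("\n")
    consumed = 0
    for line in lines:
        if not line.startswith(label):
            break  # first non-matching line ends the multiLine section
        # Space-delimited, trimmed values, as with multi='true' delim=' '.
        groups.extend(line[len(label):].split())
        consumed += 1
    attrs = {"GROUP": groups} if groups else {}
    return attrs, lines[consumed:]


screen = ("GROUPS   = Group1 Group2\n"
          "GROUPS   = Group3 Group4\n"
          "Unrelated text...")
attrs, rest = parse_group_lines(screen)
print(attrs, rest)
```

Returning an empty map when no line matches corresponds to the optional case, where parsing simply continues with the next token.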

opt Token

The opt token parses optional strings that are arbitrarily complex, such as those that are composed of multiple tokens. If the match token is present, then the internal AttrParse string is used to parse the next part of the screen. If an optional section is present, the parse position will be advanced to the character after the end of the optional section. Otherwise, the parse position is unchanged.

Attributes

None

Data

Contains the apMatch token, followed by an AttrParse token.

apMatch. Contains the token to match to determine whether the optional section is present. apMatch is a subtoken that can be used only within the opt token. apMatch token always contains the flag token as a subtoken.

AttrParse. Specifies how to parse the optional part of the screen. This version of the AttrParse element does not use the name argument. It can contain any other token.

Example

The following opt token attempts to match a CONSNAME= text token. If it is found, then it will parse a string of length 8, trim white space, and add the string to the account attribute map for the NETVIEW.CONSNAME attribute.

<opt>
   <apMatch>
      <t offset='-1'> CONSNAME= </t>
   </apMatch>
   <AttrParse>
      <str name='NETVIEW.CONSNAME' len='8' trim='true' />
   </AttrParse>
</opt>

skip Token

The skip token marks areas of the screen that contain no useful information about the user and can therefore be skipped. The parse position will be advanced to the first character after the skipped characters.

Attributes

Attribute	Description
`len`	Indicates the number of characters to skip on the screen.

Data

None

Examples

In the following examples, the first token skips 17 characters, while the second skips only one character.

<skip len='17'/>
<skip len='1'/>

skipLinesUntil Token

The skipLinesUntil token skips over lines of input until one is found that has at least the specified number of instances of a given string.

Attributes

Attribute	Description
`token`	The string to search for.
`minCount`	The minimum number of instances of the string specified in the token attribute that must be present.

Data

None

Example

The following token skips forward to the next line that contains at least two commas. The parse position will be at the first character of that line.

<skipLinesUntil token=',' minCount='2'/>
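
The selection rule amounts to counting occurrences per line, as this small Python sketch shows. The function name and sample lines are hypothetical.

```python
def skip_lines_until(lines, token, min_count):
    """Return the index of the first line with at least min_count tokens."""
    for index, line in enumerate(lines):
        if line.count(token) >= min_count:
            return index
    return len(lines)  # no qualifying line: everything was skipped


lines = [
    "Report generated 2009",
    "a,b",
    "accountId,name,dept",
    "jdoe,John,IT",
]
print(skip_lines_until(lines, ",", 2))
```

In this sample the first two lines are skipped because they contain fewer than two commas, so parsing would resume at the header line.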

skipToEol Token

The skipToEol token skips all characters from the current parse position to the end of the current line. The parse position will be advanced to the first character on the next line.

Attributes

None

Data

None

Example

The following token skips all characters until the end of the current line. The parse position will be at the first character of the next line.

<skipToEol/>

skipWhitespace Token

The skipWhitespace token is used to skip any number of white-space characters. The system uses Java’s definition of white space. The parse position will be advanced to the first non-white-space character.

Attributes

None

Data

None

Example

The following token skips all the white space at the current parse position.

<skipWhitespace/>

str Token

The str token captures an account attribute that is a string. The attribute name and string value will be added to the account attribute map. The parse position will be advanced to the first character after the string.

Attributes

Attribute	Description
`name`	The name of the attribute to use in the attribute value map. The name is usually the same as a resource user attribute on the schema map on the resource adapter, but this is not a requirement.
`len`	Indicates the exact length of the expected string. The length can have the following values: 1 or higher captures the specified number of characters, unless the characters equal the `noval` attribute. -1 captures all the characters from the current parse position until the next white-space character, unless the next characters equal the `noval` attribute. This is the default.
`term`	A string that indicates parsing should stop for this `str` token when any of the characters in the string are reached. If the `len` attribute is 1 or higher, the `str` token ends either at `len` characters or at the `term` character, whichever comes first.
`termToken`	A string to use as an indicator that the text being searched for is not present. This string is often the first word or label in the next line on the screen output. The parse position will be the character after the termToken string. The string added to the attribute map will be all the characters before the termToken was found. The termToken attribute can only be used if the len attribute is negative one (-1).
`trim`	Optional. A `true` or `false` value that indicates whether the returned value or multiple values (if the multi attribute is specified) are trimmed before being added to the account attribute map. The default value is `false`.
`noval`	A label on the screen that indicates the attribute doesn’t have a string value. Essentially, it is a null value indicator. The parse position will be advanced to the first character after the `noval` string.
`multiLine`	A `true` or `false` value that indicates whether the string will span multiple screen lines. This attribute can only be used if a len attribute is supplied and is assigned a value greater than zero. If multiLine is present, end of line characters will be skipped until the number of characters specified in the len attribute have been parsed.
`multi`	A `true` or `false` value that indicates that the string captured is a multi-valued attribute that must be further parsed to find each sub-value. The multiple values can either be appended together using the `appendSeparator` or can be turned into a list of values.
`delim`	A delimiter for parsing the multi-valued string. This attribute can only be used if the multi attribute is specified. If this is not specified, then the multi `str` token is assumed to be delimited by spaces.
`append`	A `true` or `false` value that indicates that the multiple values should be appended together into a string using the `appendSeparator`. If append is not present, the multiple values will be put into a list for the account attribute value map. This attribute is used in conjunction with the multi attribute.
`appendSeparator`	Indicates the string to separate the multiple values for an append token. This attribute is only valid if the append attribute is set to true. If the `appendSeparator` is not present, the append attribute does not use a separator. Instead, it concatenates the multiple values into the result string.

Data

None

Examples

The following token matches a string of length 21 characters and trims white space off the front and back.

<str name='NAME' trim='true' len='21'/>

Given the string [space][space]George Washington[space][space], AttrParse adds NAME="George Washington" to the account attribute map.

The following token matches a string of arbitrary length terminated by a ) (right parenthesis).

<str name='STATISTICS.SEC-VIO' term=')' />

Given the string, 2– Monday, Wednesday - )text, AttrParse adds STATISTICS.SEC-VIO="2– Monday, Wednesday - " to the account attribute map.

The following token matches a list of words delimited by spaces from the current parse position to the end of the current line.

<str name='GROUP' multi='true' delim=' ' trim='true'/>

Given the string, Group1 Group2 newGroup lastGroup\n, AttrParse adds a list of group name strings {Group1, Group2, newGroup, lastGroup} to the account attribute map for the GROUP attribute.

The following token performs the same function as the previous example, except the account attribute map will contain GROUP={Group1:Group2:newGroup:lastGroup}.

<str name='GROUP' multi='true' delim=' ' trim='true' append='true' appendSeparator=':' />
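
The difference between a list result and an appended result can be sketched in Python as follows. The `parse_multi` helper is hypothetical, and dropping empty values between repeated delimiters is an assumption rather than documented behavior.

```python
def parse_multi(text, delim=" ", trim=False, append=False, append_separator=""):
    """Split a multi-valued string, then return a list or one joined string."""
    values = [v for v in text.split(delim) if v != ""]
    if trim:
        values = [v.strip() for v in values]
    if append:
        # With no separator given, the values are simply concatenated.
        return append_separator.join(values)
    return values


line = "Group1 Group2 newGroup lastGroup"
print(parse_multi(line, trim=True))
print(parse_multi(line, trim=True, append=True, append_separator=":"))
print(parse_multi(line, trim=True, append=True))
```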

t Token

The t token is used to tokenize text. It is commonly used to recognize labels during screen scraping and provide knowledge of where on the screen you are parsing. The parse position will be advanced to the first character after the matched text. The parser always moves left to right within a line of text.

Attributes

Attribute	Description
`offset`	The number of characters to skip before searching for the text for the token. The offset can have the following values: 1 or higher moves the specified number of characters before trying to match the token’s text. 0 searches for text at the current parse position. This is the default value. -1 indicates the token’s text will be matched at the current parse position, but the parse position will not go past the string specified in the termToken attribute, if present.
`termToken`	A string that indicates parsing should stop for this token. The parse position will be the character after the termToken string. The termToken attribute can only be used if the offset attribute is negative one (-1).

Data

The text to match.

Examples

The following token matches Address Line 1:[space] at the current parse position.

<t offset='-1'>Address Line 1: </t>

The following token matches xxZip Code:[space] at the current parse position, where xx can be any two characters, including spaces.

<t offset='2'>Zip Code: </t>

The following token matches Phone:[space] at the current parse position. If AttrParse finds the string Employee ID first, then it will generate an error.

<t offset='-1' termToken='Employee ID'>Phone: </t>
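
One possible reading of the termToken behavior is sketched below in Python. The assumption, which the text above does not spell out, is that with an offset of -1 the parser looks forward from the current position for the text and fails if the termToken string appears first. The `match_text` helper is hypothetical.

```python
def match_text(screen, pos, text, term_token=None):
    """Return the position after `text`, or raise if term_token comes first."""
    found = screen.find(text, pos)
    limit = screen.find(term_token, pos) if term_token else -1
    if found == -1 or (limit != -1 and limit < found):
        raise ValueError("%r not found before %r" % (text, term_token))
    # Parse position moves to the first character after the matched text.
    return found + len(text)


print(match_text("Phone: 555-1234  Employee ID: 42", 0, "Phone: ", "Employee ID"))

try:
    match_text("Employee ID: 42  Phone: 555-1234", 0, "Phone: ", "Employee ID")
except ValueError as err:
    print("error:", err)
```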