iPlanet Meta-Directory Configuration and Administration Guide: Chapter 9 Configuring the Universal Text Parser

Chapter 9 Configuring the Universal Text Parser

The Universal Text Parser enables you to link an external data source with the Meta-Directory join engine. With the Universal Text Parser, you can synchronize a wide variety of text-based data with your Meta-Directory views.
This chapter has the following sections:

An Overview of the Universal Text Parser

The task.cfg Configuration File

Setting Up the Universal Text Parser

Creating a task.cfg File for Comma-Separated Value Data Files

Creating a task.cfg File for Name-Value Pair Data Files

Creating a task.cfg File for LDIF Data Files

An Overview of the Universal Text Parser

The Universal Text Parser (UTP) is a generic text file parser and generator that you can use to build connectors for sources that supply data in a text-based format. You customize the Universal Text Parser by describing the text-based data that is input to the Universal connector. You can also specify the output of the Universal connector, enabling you to fully synchronize data with an external data source.
The Universal Text Parser uses Perl scripts to provide the linkage between your external data source and the Universal connector. To use the Universal Text Parser, you export your database to ASCII text, which the Universal Text Parser reads and synchronizes with the connector view. In the other direction, the Universal Text Parser generates a file that your external database can import. (The external database itself performs the export and the import.)
Meta-Directory supplies three pre-configured configuration files for the most common text-based data representations:

Comma-separated values

Name-value pairs

LDIF
You can configure the Universal Text Parser to support data in just about any text format. iPlanet Professional Services offers assistance if you need to create a custom Universal Text Parser module to adapt to different data formats.

Universal Text Parser Modules

The Universal Text Parser consists of the following program modules:

Table 9-1    Universal Text Parser modules

task.cfg	Text configuration file used to describe data that you want to flow into and out of the connector view. You must customize this file for each external data source you want to synchronize with your Meta-Directory views.
template.pl	Perl library that implements the interface between the script and the Universal Text Connector.
universal.pm	Perl module that contains the main engine for interpreting user settings in the configuration file.
textparser.pm	Perl module that contains a generic set of routines for parsing files.
connectorutils.pm	Perl module that contains various generic routines that a Perl connector may use.

The program modules are located in the following directory of your installed Meta-Directory product:
NETSITE_ROOT/bin/utc50/install/templates/universalparser
In the path above, NETSITE_ROOT is the root directory of your iPlanet Meta-Directory installation.

Caution
Do not modify any of the Universal Text Parser modules other than the task.cfg configuration file.

The task.cfg Configuration File

To synchronize information between a text-based data source and your Meta-Directory views, you must properly configure an instance of the Universal Text Parser for each separate text-based data source you need to synchronize.
You configure the linkage between your external data source and the Universal Text Parser by creating a custom configuration file, named task.cfg. The configuration file describes the data format and file specifics of the text-based data.
The task.cfg configuration file describes the text-based data that your external data source exports to the Universal Text Parser. The task.cfg configuration also describes the text-based output of the Universal Text Parser, which you can import into your external database.

Pre-configured Configuration Files

The task.cfg configuration file specifies the details of your text-based data. The iPlanet Meta-Directory package supplies three pre-configured files that you can use as the boilerplate for your task.cfg file. Each pre-configured file supports a specific format of text-based information, as shown in Table 9-2.

Table 9-2    Pre-configured templates for the task.cfg configuration file

Format Supported	Boilerplate File Name	Description
Comma-Separated Value	csv.cfg	Supports the synchronization of data formatted as a comma-separated value (CSV) list.
Name-Value Pair	nvp.cfg	Supports the synchronization of data formatted as a name-value pair (NVP) list.
LDIF	ldif.cfg	Supports the synchronization of data formatted in LDIF.

Each of the pre-configured boilerplate files has an example input file that matches the settings in the boilerplate. Accordingly, these files are named sampledata.csv, sampledata.nvp, and sampledata.ldif, and can be found in the directory containing the rest of the Universal Text Parser files.
To customize a configuration file for your needs, modify the appropriate boilerplate file to fit the entry attributes and mappings of your data stream. Details on how to customize each file are located within the file as comments.

Non-Conforming Formats

If you have text-based data that does not conform to the format of one of the pre-configured configuration files, you can modify the task.cfg file to suit your needs.
The modification of the supplied task.cfg file is supported through the services of the iPlanet Professional Services department. If your needs require a more detailed modification of the Universal Text Parser modules than what is described in this chapter, please contact your iPlanet Professional Services representative.

Setting Up the Universal Text Parser

To set up the files in the Universal Text Parser, do the following:

Copy the following Universal Text Parser modules to the directory containing the text-based data you want to process:

template.pl

connectorutils.pm

textparser.pm

universal.pm

These modules are described in Table 9-1.

Copy the appropriate boilerplate file to the same directory as the modules and rename the file to task.cfg.

The boilerplate file is either csv.cfg, nvp.cfg or ldif.cfg, as described in the section "Pre-configured Configuration Files".

Modify the boilerplate file (now named task.cfg) according to the instructions in one of the following sections:

Creating a task.cfg File for Comma-Separated Value Data Files

Creating a task.cfg File for Name-Value Pair Data Files

Creating a task.cfg File for LDIF Data Files

When modifying the task.cfg file, remember that regular expressions are interpreted using Perl regular expression syntax, not the regular expression syntax used by UNIX® systems.

If you have not already done so, configure the Universal Connector as described in Chapter 8 "Configuring The Universal Connector."

When you create the connector instance, ask the system to load the schema, as described in Chapter 8.

Select the connector instance, then go to the Script tab.

Enter the path name and file name of the script .pl file.

Restart the connector instance, as described in Chapter 8.
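
The file-copying part of the procedure above can be sketched in a few lines of Python. This is only an illustration: the function name stage_utp_files and its parameters are hypothetical, and the sketch assumes that the template directory contains the four modules and the three boilerplate files under the names given in this chapter.

```python
import shutil
from pathlib import Path

# File names come from Table 9-1 and Table 9-2; the helper itself is hypothetical.
UTP_MODULES = ["template.pl", "connectorutils.pm", "textparser.pm", "universal.pm"]
BOILERPLATES = {"csv": "csv.cfg", "nvp": "nvp.cfg", "ldif": "ldif.cfg"}


def stage_utp_files(template_dir, data_dir, data_format):
    """Copy the UTP modules and one boilerplate (renamed to task.cfg) into data_dir."""
    template_dir, data_dir = Path(template_dir), Path(data_dir)
    if data_format not in BOILERPLATES:
        raise ValueError(f"unknown format: {data_format!r}")
    data_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in UTP_MODULES:
        shutil.copy2(template_dir / name, data_dir / name)
        copied.append(data_dir / name)
    # The boilerplate is stored under the fixed name task.cfg.
    shutil.copy2(template_dir / BOILERPLATES[data_format], data_dir / "task.cfg")
    copied.append(data_dir / "task.cfg")
    return copied
```

You would call it with the universalparser template directory of your installation as template_dir and the directory that holds your exported data as data_dir. Editing task.cfg and configuring the connector instance remain manual steps.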

Creating a task.cfg File for Comma-Separated Value Data Files

Use the csv.cfg file as a boilerplate for your task.cfg configuration file if your data is formatted in a comma-separated value (CSV) ASCII text file.

Before You Begin

Consider the following advisory information when synchronizing a comma-separated value data file with the Universal Text Parser:

All attribute names in the LineFormat and ImportLineFormat statements must map to the attribute names in the declared LDAP schema before the Universal Text Parser can process entries. By default, the comma-separated value boilerplate file uses inetOrgPerson as the LDAP schema.

When modifying the task.cfg text file, be sure to use a text editor that does not delete trailing whitespace from the end of lines. Removing the whitespace at the end of certain lines might lead to errors when the Universal Text Parser interprets these lines.

Creating the File

After you have set up the Universal Text Parser, make the following modifications to the task.cfg file to tailor the configuration file to your specific data needs:

Modify the LineFormat statement so that it contains the attribute names of the data you want to synchronize. Separate each attribute name with a comma.

If needed, map the attribute names in the data file you are inputting to the names defined in the specified LDAP schema.

All attribute names must be contained in the LDAP schema that is specified in the task.cfg file. By default, this is inetOrgPerson. If the attribute names in your input file differ from those contained in the declared schema, you must map the attribute names in your input file to ones defined in inetOrgPerson.

Map the attribute names by first specifying the external database attribute name, a colon, then the associated LDAP attribute name. Separate attributes with a comma, as shown in the following example:

LineFormat=ID:uid,NTDOMAINACCOUNT:cn,SURNAME:sn,
GIVENNAME:givenName,INITIALS:initials

The order in which you specify the attributes is important; it must match exactly the order of the data supplied in your comma-separated value data file.

Normally, you can find the order of the attributes listed in the header of the data file. For an example of this, review the comma-separated value data file that is supplied with the product, sampledata.csv.
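
To make the positional nature of LineFormat concrete, here is a minimal Python sketch of how such a statement could be interpreted. It is not the parser's actual implementation. The function names and the sample data row are hypothetical, "#format" suffixes are ignored, and the sketch assumes that a name written without a colon stands for both the external name and the LDAP name.

```python
import csv
import io


def parse_line_format(value):
    """Split a LineFormat value into ordered (external_name, ldap_name) pairs."""
    pairs = []
    for field in value.split(","):
        field = field.strip()
        if not field:
            continue
        # "EXTERNAL:ldap" maps a name; a bare name is assumed to serve as both.
        external, _, ldap = field.partition(":")
        pairs.append((external, ldap or external))
    return pairs


def map_row(pairs, row):
    """Pair each CSV value with the LDAP attribute name at the same position."""
    if len(row) != len(pairs):
        raise ValueError(f"expected {len(pairs)} values, got {len(row)}")
    return {ldap: value for (_, ldap), value in zip(pairs, row)}


line_format = "ID:uid,NTDOMAINACCOUNT:cn,SURNAME:sn,GIVENNAME:givenName,INITIALS:initials"
pairs = parse_line_format(line_format)
# Hypothetical data row, in the same order as the LineFormat fields.
row = next(csv.reader(io.StringIO("1001,jsmith,Smith,John,JS\n")))
entry = map_row(pairs, row)
```

Because values are matched to names purely by position, swapping two fields in LineFormat without swapping the corresponding data columns silently assigns values to the wrong attributes.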

Specify a "format" if your data file uses a separator other than a comma.

The format can be either a character delimiter, a regular expression, or a field size (which is indicated by a digit). To specify a format, follow the external database attribute name with a pound sign (#) and list the necessary format, as shown below in the example for ImportLineFormat.

Specify the ImportLineFormat if you are going to import into your external database any modifications that are generated through the Universal Text Connector.

In the ImportLineFormat statement, supply the names of attributes generated by the Universal Text Connector in the order that you need them to be written to the output file. This is the order in which your external database will import the entry attributes.

Specify the LDAP attribute name, a colon, then the external database attribute name. Follow the external database attribute name with a pound sign (#) and the delimiter, which in the following example is a comma. Separate each attribute listed with a comma, as shown in the following example:

ImportLineFormat=operation#,,uid:ID#,,
cn:NTDOMAINACCOUNT#,,   sn:SURNAME#,,
givenName:GIVENNAME#,,initials:INITIALS#,,

As shown in the example above, the first data value is an operation, the value of which can be either add, modify, or delete. The operation indicates the database action needed to process the respective entry.

ImportLineFormat is like the LineFormat statement in that you must map the external database attributes to attribute names defined in the specified LDAP schema, which is inetOrgPerson by default.

If needed, you can specify a different LDAP object class using the AdditionalAttributes statement. Note, however, that you can use only a single schema in your comma-separated value file.
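
The following Python sketch shows one way to read an ImportLineFormat value of the form used in the example above and to write an output line from it. It is an illustration under stated assumptions, not the Universal Text Parser's own code: the function names and the sample entry are hypothetical, only single-character delimiters are handled, and the sketch assumes that the delimiter is written after each value, which this chapter does not spell out.

```python
import re

# One field: ldapName[:EXTERNALNAME]#<delimiter char>, followed by an optional comma.
FIELD = re.compile(r"\s*([\w-]+)(?::([\w-]+))?#(.),?")


def parse_import_line_format(value):
    """Return ordered (ldap_name, external_name, delimiter) triples."""
    fields, pos = [], 0
    while pos < len(value):
        match = FIELD.match(value, pos)
        if match is None:
            raise ValueError(f"cannot parse field at offset {pos}")
        ldap, external, delimiter = match.groups()
        fields.append((ldap, external or ldap, delimiter))
        pos = match.end()
    return fields


def render_line(fields, operation, entry):
    """Write the values in ImportLineFormat order, each followed by its delimiter."""
    values = dict(entry, operation=operation)
    return "".join(values.get(ldap, "") + delimiter for ldap, _, delimiter in fields)


import_line_format = (
    "operation#,,uid:ID#,,cn:NTDOMAINACCOUNT#,,   sn:SURNAME#,,"
    "givenName:GIVENNAME#,,initials:INITIALS#,,"
)
fields = parse_import_line_format(import_line_format)
# Hypothetical entry, keyed by LDAP attribute name.
entry = {"uid": "1001", "cn": "jsmith", "sn": "Smith", "givenName": "John", "initials": "JS"}
line = render_line(fields, "add", entry)
```

The operation value comes first because operation is the first field in the statement; reordering the fields reorders the columns that your external database has to import.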

Modify the IndexAttribute= statement if you need to provide an attribute to index other than cn. By default, the line reads:

IndexAttribute=cn

To index on a different attribute, specify its name in this statement. Choose an attribute whose values are unique across all database entries. Do not use the distinguished name (dn) as the IndexAttribute value.
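
Before you settle on an index attribute, it can help to confirm that its values really are unique in your data. The Python sketch below is a hypothetical pre-check that you would run yourself on already parsed entries; it is not part of the Universal Text Parser, and the function name is an assumption.

```python
from collections import Counter


def duplicate_index_values(entries, index_attribute="cn"):
    """Return the values of index_attribute that occur in more than one entry."""
    if index_attribute.lower() == "dn":
        raise ValueError("do not use the distinguished name as IndexAttribute")
    counts = Counter(entry.get(index_attribute) for entry in entries)
    # Entries that lack the attribute are counted under None and skipped here.
    return sorted(v for v, n in counts.items() if v is not None and n > 1)


# Hypothetical entries, keyed by LDAP attribute name.
entries = [
    {"uid": "1001", "cn": "jsmith"},
    {"uid": "1002", "cn": "jsmith"},
    {"uid": "1003", "cn": "adoe"},
]
duplicates_by_cn = duplicate_index_values(entries, "cn")
duplicates_by_uid = duplicate_index_values(entries, "uid")
```

In this sample, cn has a repeated value while uid does not, so uid would be the safer choice for IndexAttribute.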

Specify the name of the file containing your comma-separated value data on the line that reads:

InputFile=%ScriptBase%sampledata.csv

On this line, replace the text sampledata.csv with the name of your data file.

The file named by InputFile is generated by DumpCommand, and the file named by OutputFile is used by ImportCommand. By default, the InputFile statement declares that your data file is located in the same directory as the Universal Text Parser modules. Because of this, you must copy your data file to this directory so the Universal Text Parser can read the file. If needed, you can place your data file in a different directory; be sure to supply the full path and file name to your data file in this statement.
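
As a rough mental model of the default InputFile value, the Python sketch below expands the %ScriptBase% placeholder. It assumes, based only on the description above, that %ScriptBase% stands for the directory holding the Universal Text Parser modules, including a trailing path separator. The function name and the directory used in the example are hypothetical.

```python
import os


def expand_script_base(value, script_dir):
    """Replace %ScriptBase% with script_dir plus a trailing path separator."""
    base = os.path.join(script_dir, "")  # joining with "" appends the separator
    return value.replace("%ScriptBase%", base)


# Hypothetical directory that holds the modules and the data file.
script_dir = os.path.join("data", "hr-export")
default_input = expand_script_base("%ScriptBase%sampledata.csv", script_dir)
# A value without the placeholder is returned unchanged.
explicit_input = expand_script_base("/exports/hr/people.csv", script_dir)
```

Under this assumption, keeping the placeholder ties the data file to the module directory, while a full path lets the data live anywhere the connector can read.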

Configure the ImportCommand statement if you are going to import data into your external database. To do so:

Uncomment the ImportCommand statement by removing the pound sign (#) listed at the beginning of the statement.

Specify the import command your external database uses.

If it is supported, supply the command that exports the data from your external database using the DumpCommand statement:

Uncomment the DumpCommand statement by removing the pound sign (#) listed at the beginning of the statement.

Specify the export command your external database uses.

Creating a task.cfg File for Name-Value Pair Data Files

Use the nvp.cfg file as a boilerplate for your task.cfg configuration file if your data is formatted in a name-value pair (NVP) ASCII text file.

Before You Begin

Consider the following advisory information when synchronizing a name-value pair data file with the Universal Text Parser:

All attribute names listed in the data file must map to the attribute names contained in the declared LDAP schema before the Universal Text Parser can process the entries. By default, the name-value pair boilerplate file uses inetOrgPerson as the LDAP schema.

When modifying the task.cfg text file, be sure to use a text editor that does not delete trailing whitespace from the end of lines. Removing the whitespace at the end of certain lines might lead to errors when the Universal Text Parser interprets these lines.

The name-value pairs for each entry should include the object class of the attributes for that record. For example, the following lines show a valid entry in a name-value pair text file:

uid=nvp1_uid
ObjectClass=top
ObjectClass=person
ObjectClass=organizationalperson
ObjectClass=inetOrgPerson
cn=nvp1_cn
sn=nvp1
mail=nvp1@iplanet.com
title=Title for nvp1
description=This is the description for nvp1

If you are importing data into your external database that the Universal Text Parser has generated, the external database must not assume the order of the attributes generated for any given entry.
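
An import routine that looks attributes up by name, rather than by position, satisfies the ordering requirement above. The Python sketch below parses the example entry from this section into a dictionary. The function name is hypothetical, and the sketch assumes one name=value pair per line; how entries are separated in a real file is not covered here.

```python
def parse_nvp_entry(text):
    """Parse name=value lines into {lowercased name: [values]} for one entry."""
    entry = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name-value pair: {line!r}")
        # LDAP attribute names are case-insensitive; repeated names are multi-valued.
        entry.setdefault(name.strip().lower(), []).append(value.strip())
    return entry


SAMPLE = """uid=nvp1_uid
ObjectClass=top
ObjectClass=person
ObjectClass=organizationalperson
ObjectClass=inetOrgPerson
cn=nvp1_cn
sn=nvp1
mail=nvp1@iplanet.com
title=Title for nvp1
description=This is the description for nvp1
"""
entry = parse_nvp_entry(SAMPLE)
object_classes = entry["objectclass"]
```

Because every lookup goes through the attribute name, the same code works if the lines of an entry arrive in a different order.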

Creating the File

After you have set up the Universal Text Parser, make the following modifications to the task.cfg file to tailor the configuration file to your specific data needs:

Modify the IndexAttribute= statement if you need to provide an attribute to index other than cn. By default, the line reads:

IndexAttribute=cn

To index on a different attribute, specify its name in this statement. Choose an attribute whose values are unique across all database entries. Do not use the distinguished name (dn) as the IndexAttribute value.

Specify the name of the file containing your name-value pair data on the line that reads:

InputFile=%ScriptBase%sampledata.nvp

On this line, replace the text sampledata.nvp with the name of your data file.

By default, the InputFile statement declares that your data file is located in the same directory as the Universal Text Parser modules. Because of this, you must copy your external data file to this directory so the Universal Text Parser can read the file. If needed, you can place your data file in a different directory; be sure to supply the full path and file name to your data file in this statement.

Configure the ImportCommand statement if you are going to import data into your external database. To do so:

Uncomment the ImportCommand statement by removing the pound sign (#) listed at the beginning of the statement.

Specify the import command that your external database uses.

If it is supported, supply the command that exports the data from your external database using the DumpCommand statement:

Uncomment the DumpCommand statement by removing the pound sign (#) listed at the beginning of the statement.

Specify the export command that your external database uses.

Creating a task.cfg File for LDIF Data Files

Use the ldif.cfg file as a boilerplate for your task.cfg configuration file if your data is formatted in LDIF.

Before You Begin

Consider the following advisory information when synchronizing an LDIF data file with the Universal Text Parser:

All attribute names in the ImportLineFormat statement must map to the attribute names in the declared LDAP schema before the Universal Text Parser can process the entries. By default, the LDIF boilerplate file uses inetOrgPerson as the LDAP schema.

When modifying the task.cfg text file, be sure to use a text editor that does not delete trailing whitespace from the end of lines. Removing the whitespace at the end of certain lines might lead to errors when the Universal Text Parser interprets these lines.

In particular, note that the two lines ImportAttributeNamesSeparator and AttributeNamesSeparator must each end with a trailing whitespace character.

If you specify an attribute flow rule, you turn on attribute data flow and you must provide the mappings. If you do not specify an attribute flow rule (if you use only entry level data flow), then all the mappings are configured in the LineFormat statement in your task.cfg file.
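
Since an editor can strip trailing whitespace without any visible sign, a quick check after editing may save some debugging. The Python sketch below reports the two separator statements if they no longer end in whitespace. It assumes that these statements are written in the same Name=value form as the other statements in this chapter; the function name and the sample values are hypothetical.

```python
REQUIRED_TRAILING_SPACE = ("ImportAttributeNamesSeparator", "AttributeNamesSeparator")


def missing_trailing_whitespace(cfg_text):
    """Return (line_number, statement_name) for separator lines without trailing whitespace."""
    problems = []
    for number, line in enumerate(cfg_text.splitlines(), start=1):
        name = line.split("=", 1)[0].strip()
        if name in REQUIRED_TRAILING_SPACE and not line.endswith((" ", "\t")):
            problems.append((number, name))
    return problems


# Hypothetical task.cfg fragment: the second line has lost its trailing space.
sample_cfg = "AttributeNamesSeparator=: \nImportAttributeNamesSeparator=:\nIndexAttribute=cn\n"
problems = missing_trailing_whitespace(sample_cfg)
```

To check a real file, pass the text of your task.cfg instead of sample_cfg. An empty result means both statements still end in whitespace.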

Creating the File

After you have set up the Universal Text Parser, make the following modifications to the task.cfg file to tailor the configuration file to your specific data needs:

Specify the ImportLineFormat if you are going to import into your external database any modifications that are generated through the Universal Text Connector.

In the ImportLineFormat statement, supply the names of attributes generated by the Universal Text Connector in the order that you need them to be written to the output file, with each attribute separated by a comma. For example:

ImportLineFormat=distinguishedname:dn,
operation:changetype,objectclass,uid,cn,sn,mail,
title,description

The Universal Text Parser will not generate any attributes excluded from this line.

Before listing the attributes you want to generate, you must specify two special values to support data output in LDIF. You specify these values as follows:

distinguishedname:dn,operation:changetype,
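
The requirement that the two special values come first can be checked mechanically. The following Python sketch is a hypothetical validation helper, not part of the product: it splits an ImportLineFormat value on commas and confirms that distinguishedname:dn and operation:changetype lead the list.

```python
REQUIRED_PREFIX = ["distinguishedname:dn", "operation:changetype"]


def check_ldif_import_line_format(value):
    """Verify the two leading special values and return the remaining attribute names."""
    fields = [field.strip() for field in value.split(",") if field.strip()]
    if fields[:2] != REQUIRED_PREFIX:
        raise ValueError(
            "ImportLineFormat must start with " + ",".join(REQUIRED_PREFIX)
        )
    return fields[2:]


import_line_format = (
    "distinguishedname:dn,operation:changetype,"
    "objectclass,uid,cn,sn,mail,title,description"
)
generated_attributes = check_ldif_import_line_format(import_line_format)
```

The returned list is the set of attributes that will be generated; anything missing from it is excluded from the output, as noted above.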

Modify the IndexAttribute= statement if you need to provide an attribute to index other than cn. By default, the line reads:

IndexAttribute=cn

To index on a different attribute, specify its name in this statement. Choose an attribute whose values are unique across all database entries. Do not use the distinguished name (dn) as the IndexAttribute value.

Specify the name of the file containing your LDIF data on the line that reads:

InputFile=%ScriptBase%sampledata.ldif

On this line, replace the text sampledata.ldif with the name of your data file.

By default, the InputFile statement declares that your data file is located in the same directory as the Universal Text Parser modules. Because of this, you must copy your data file to this directory so the Universal Text Parser can read the file. If needed, you can place your data file in a different directory; be sure to supply the full path and file name to your data file in this statement.

Configure the ImportCommand statement if you are going to import data into your external database. To do so:

Uncomment the ImportCommand statement by removing the pound sign (#) listed at the beginning of the statement.

Specify the import command that your external database uses. For example, you could specify the following ImportCommand statement:

ImportCommand=myDBUtil -l administrator -p password -f "d:\\iplanet\\servers\\utc-cv3\\logs\\test.out"

If it is supported, supply the command that exports the data from your external database using the DumpCommand statement:

Uncomment the DumpCommand statement by removing the pound sign (#) listed at the beginning of the statement.

Specify the export command that your external database uses.
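
The ImportCommand example earlier in this procedure writes every backslash in the Windows path twice. Assuming that this doubling is required whenever a path contains backslashes (the chapter shows it only by example), the Python sketch below builds such a line from an ordinary path. The function name is hypothetical, and myDBUtil with its -l, -p, and -f options is the placeholder utility from the example, not a real command.

```python
def import_command_line(command, output_path):
    """Build an ImportCommand statement, doubling each backslash in the path."""
    escaped = output_path.replace("\\", "\\\\")
    return f'ImportCommand={command} -f "{escaped}"'


statement = import_command_line(
    "myDBUtil -l administrator -p password",
    r"d:\iplanet\servers\utc-cv3\logs\test.out",
)
```

With these inputs, the result is the same statement as in the example, which makes the sketch easy to compare against a line you have written by hand.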

Copyright © 2002 Sun Microsystems, Inc. All rights reserved.

Last Updated April 08, 2002