
Sun ONE Meta-Directory 5.1.1 Administration Guide

Chapter 5
Configuring the Universal Text Parser

The Universal Text Parser links an external data source with the Meta-Directory Join Engine. You can also synchronize various text-based data with the Meta-Directory views.

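This chapter has the following sections:

    • Overview
    • About the Task.cfg Configuration File
    • Setting Up the Universal Text Parser
    • Creating the Configuration Files for Data Files
    • About UTC Data Exchange Format
    • Using Special Characters and UTF8 Data in DN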


Overview

The Universal Text Parser (UTP) is a generic text file parser and generator that you can use to build connectors for sources that supply data in a text-based format. You can customize the Universal Text Parser by describing the text-based data that is input to the Universal connector. You can also specify the output of the Universal connector, enabling you to fully synchronize data with an external data source.

The Universal Text Parser uses Perl scripts to provide the link between the external data source and the Universal connector. To use the Universal Text Parser, export the external database to an ASCII text file, which the Universal Text Parser synchronizes with the Connector View. In the other direction, the Universal Text Parser writes data to a file that the external database can import. The external database itself performs the export and the import.

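Meta-Directory supplies three pre-configured configuration files for the most common text-based data representations:

    • csv.cfg for comma-separated value (CSV) data
    • nvp.cfg for name-value pair (NVP) data
    • ldif.cfg for LDIF data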

You can configure the Universal Text Parser to support data in any text format. In the input text file, UTF8 data in distinguishedname and vrn values must be escaped with the \xx notation defined in RFC 2253. UTF8 data in all other attribute values must be base64 encoded from its UTF8 representation. Special characters in distinguishedname and vrn values must also be escaped.
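As a rough illustration of the encoding rule for attribute values other than the DN, the following Python sketch base64-encodes the UTF-8 bytes of any value that is not plain 7-bit ASCII. The function name is hypothetical and is not part of Meta-Directory; it only shows the encoding step, not how the encoded value is marked in the input file.

```python
import base64


def encode_attribute_value(value: str) -> str:
    """Return value unchanged if it is 7-bit ASCII, else base64 of its UTF-8 bytes.

    Hypothetical helper for preparing input data; not a Meta-Directory API.
    """
    if value.isascii():
        return value
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Made-up sample values: the first is left as-is, the second is encoded.
print(encode_attribute_value("Engineer"))
print(encode_attribute_value("Ingénieur"))
```

Decoding the result with a base64 decoder and interpreting the bytes as UTF-8 yields the original value.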

Universal Text Parser Modules

The Universal Text Parser consists of the following program modules:

Table 5-1  Universal Text Parser modules

| Module | Description |
|---|---|
| task.cfg | Text configuration file used to describe data that you want to flow into and out of the Connector View. You must customize this file for each external data source to synchronize with the Meta-Directory views. |
| template.pl | Perl library that implements the interface between the script and Universal Text Connector. |
| universal.pm | Perl module that contains the main engine for interpreting user settings in the configuration file. |
| textparser.pm | Perl module that contains a generic set of routines for parsing files. |
| connectorutils.pm | Perl module that contains various generic routines that a Perl connector may use. |

These modules are located in the following directory of the installed Meta-Directory:

NETSITE_ROOT/bin/utc50/install/templates/universalparser

NETSITE_ROOT is the root directory of the Sun ONE Meta-Directory installation.


Caution

Do not modify any of the Universal Text Parser modules other than the task.cfg configuration file.



About the Task.cfg Configuration File

To synchronize information between a text-based data source and the Meta-Directory views, you must configure an instance of the Universal Text Parser for each separate text-based data source you need to synchronize.

You configure the link between the external data source and the Universal Text Parser by creating a custom configuration file named task.cfg. This configuration file describes the data format and file specifics of the text-based data.

The task.cfg configuration file describes the text-based data that the external data source exports to the Universal Text Parser. It also describes the text-based output of the Universal Text Parser, which you can import into the external database.

Pre-Configured Files

The task.cfg configuration file specifies the details of the text-based data. The Meta-Directory package includes three pre-configured files that you can use as the boilerplate for your task.cfg file. Each pre-configured file supports a specific format of text-based information, as listed in Table 5-2.

Table 5-2  Pre-configured templates for the task.cfg configuration file

| Format Supported | Boilerplate File Name | Description |
|---|---|---|
| Comma-Separated Value | csv.cfg | Supports the synchronization of data formatted as a comma-separated value (CSV) list. |
| Name-Value Pair | nvp.cfg | Supports the synchronization of data formatted as a name-value pair (NVP) list. |
| LDIF | ldif.cfg | Supports the synchronization of data formatted in LDIF. |

Each of the pre-configured boilerplate files has an example input file that matches the settings in the boilerplate. Accordingly, these files are named sampledata.csv, sampledata.nvp, and sampledata.ldif, and can be found in the directory containing the rest of the Universal Text Parser files.

To customize a configuration file for your requirements, modify the appropriate boilerplate file to match the entry attributes and mappings of the data stream. Each boilerplate file contains comments that explain how to customize it.

Non-Conforming Formats

If you have text-based data that does not conform to the format of one of the pre-configured files, you can modify the task.cfg file.

Modification of the supplied task.cfg file is supported through Sun ONE Professional Services. If you need to modify the Universal Text Parser modules in more detail than this chapter describes, contact your Sun ONE Professional Services representative.


Setting Up the Universal Text Parser

    To set up the files in the Universal Text Parser
  1. Copy the Universal Text Parser modules to the directory containing the text-based data for processing.
  2. Copy the appropriate boilerplate file to the directory used in Step 1 and rename the file to task.cfg.

    The boilerplate file is either csv.cfg, nvp.cfg, or ldif.cfg, as described in the section "Pre-Configured Files".

  3. Modify the boilerplate file (now named task.cfg) according to the instructions in the section that matches your data format:
    • "Creating the File for Comma-Separated Value Files"
    • "Creating the File for Name-Value Pair Files"
    • "Creating the File for LDIF Files"
  4. Configure the Universal Connector as described in Chapter 4, "Configuring the Universal Connector."
    • When you create the connector instance, ensure that you load the schema, as described in "Creating the Universal Connector Instance".
    • Select the connector instance, and then select the Script tab.
    • Enter the path name and file name of the script .pl file.
  5. Restart the connector instance, as described in "Restarting the Connector Instance."


Creating the Configuration Files for Data Files

Creating the File for Comma-Separated Value Files

Use the csv.cfg file as a boilerplate for the task.cfg configuration file if the data is formatted in a comma-separated value (CSV) ASCII text file.

Before You Begin

Consider the following information when synchronizing a comma-separated value data file with the Universal Text Parser:

Creating the File

After you have set up the Universal Text Parser, make the following modifications to the task.cfg file:

  1. Modify the LineFormat statement so that it contains the attribute names of the data you want to synchronize. Separate each attribute name with a comma.
    1. If needed, map the attribute names in the data file you are inputting to the names defined in the specified LDAP schema.

      All attribute names must be contained in the LDAP schema that is specified in the task.cfg file. By default, this is inetOrgPerson. If the attribute names in your input file differ from those contained in the declared schema, you must map the attribute names in your input file to ones defined in inetOrgPerson.

      Map the attribute names by first specifying the external database attribute name, a colon, then the associated LDAP attribute name. Separate attributes with a comma, as shown in the following example:

      LineFormat=ID:uid,NTDOMAINACCOUNT:cn,SURNAME:sn,
      GIVENNAME:givenName,INITIALS:initials

      The order in which you specify the attributes is important; it must exactly match the order of the data supplied in your comma-separated value data file.

      Normally, the header of the data file lists the attributes in order. For an example, review the comma-separated value data file that is supplied with the product, sampledata.csv.

    2. Specify a “format” if your data file uses a separator other than a comma.

      The format can be a character delimiter, a regular expression, or a field size (which is indicated by a digit). To specify a format, follow the external database attribute name with a pound sign (#) and the required format, as shown in the example for ImportLineFormat in the next step.

  2. Specify the ImportLineFormat if you are going to import into your external database any modifications that are generated through the Universal Text Connector.
    1. In the ImportLineFormat statement, supply the names of attributes generated by the Universal Text Connector in the order that you need them to be written to the output file. This is the order in which your external database will import the entry attributes.
    2. Specify the LDAP attribute name, a colon, then the external database attribute name. Follow the external database attribute name with a pound sign (#) and the delimiter, which in the following example is a comma. Separate each attribute listed with a comma, as shown in the following example:

      ImportLineFormat=operation#,,uid:ID#,,
      cn:NTDOMAINACCOUNT#,,   sn:SURNAME#,,
      givenName:GIVENNAME#,,initials:INITIALS#,,

      As shown in the example above, the first data value is an operation, the value of which can be add, modify, or delete. The operation indicates the database action needed to process the respective entry.

      ImportLineFormat is like the LineFormat statement in that you must map the external database attributes to attribute names defined in the specified LDAP schema, which is inetOrgPerson by default.

      If needed, you can specify a different LDAP object class using the AdditionalAttributes statement. Note, however, that you can use only a single schema in your comma-separated value file.

  3. Modify the IndexAttribute= statement if you need to provide an attribute to index other than cn. By default, the line reads:

    IndexAttribute=cn

    To index on a different attribute value, specify the required attribute name in this statement. Use an attribute whose value is unique for each database entry. Do not use the distinguished name (dn) as the IndexAttribute value.

  4. Specify the name of the file containing your comma-separated value data on the line that reads:

    InputFile=%ScriptBase%sampledata.csv

    On this line, replace the text sampledata.csv with the name of your data file.

    InputFile is generated by DumpCommand, and OutputFile is used by ImportCommand. By default, the InputFile statement declares that your data file is located in the same directory as the Universal Text Parser modules, so you must copy your data file to this directory for the Universal Text Parser to read it. If you place your data file in a different directory, supply the full path and file name of your data file in this statement.

  5. Configure the ImportCommand statement if you are going to import data into your external database. To do so:
    1. Uncomment the ImportCommand statement by removing the pound sign (#) listed at the beginning of the statement.
    2. Specify the import command your external database uses.
  6. If your external database supports it, use the DumpCommand statement to supply the command that exports the data from your external database:
    1. Uncomment the DumpCommand statement by removing the pound sign (#) listed at the beginning of the statement.
    2. Specify the export command your external database uses.
  7. Use the MultiValToNoValAttr statement to specify a comma-separated list of attributes whose value can change from one or more values to no value. List the attribute names used in the external data source, not the attribute names used in the Connector View. For example:

    MultiValToNoValAttr=EMAIL,DESCRIPTION
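To make the positional mapping in the LineFormat statement concrete, here is a minimal Python sketch that splits a LineFormat value into name pairs and applies them to one data line. It is only an illustration of the mapping idea, not how universal.pm is implemented. The function names and the sample data line are hypothetical, the per-attribute # format suffix is not handled, and using Python's csv module for field splitting is an assumption about quoting rules.

```python
import csv
import io


def parse_line_format(statement: str) -> list[tuple[str, str]]:
    """Split a LineFormat value into (external_name, ldap_name) pairs.

    An entry without a colon uses the same name on both sides.
    """
    pairs = []
    for item in statement.split(","):
        item = item.strip()
        if not item:
            continue
        external, _, ldap = item.partition(":")
        pairs.append((external, ldap or external))
    return pairs


def map_record(line: str, pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map one comma-separated data line onto LDAP attribute names, by position."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(pairs):
        raise ValueError("field count does not match LineFormat")
    return {ldap: value for (_, ldap), value in zip(pairs, values)}


# LineFormat value taken from the example in this section, joined onto one line.
fmt = "ID:uid,NTDOMAINACCOUNT:cn,SURNAME:sn,GIVENNAME:givenName,INITIALS:initials"
pairs = parse_line_format(fmt)
# Made-up data line whose fields follow the same order as the LineFormat.
print(map_record("1001,jsmith,Smith,John,JS", pairs))
```

Because the mapping is purely positional, a data line whose fields are in a different order than the LineFormat would silently assign values to the wrong attributes, which is why the order must match exactly.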

Creating the File for Name-Value Pair Files

Use the nvp.cfg file as a boilerplate for the task.cfg configuration file if the data is formatted in a name-value pair (NVP) ASCII text file.

Before You Begin

Consider the following information when synchronizing a name-value pair data file with the Universal Text Parser:

    To create the file

After you have set up the Universal Text Parser, make the following modifications to the task.cfg file:

  1. Modify the IndexAttribute= statement if you need to provide an attribute to index other than cn. By default, the line reads:

    IndexAttribute=cn

    To index on a different attribute value, specify the required attribute name in this statement. Use an attribute whose value is unique for each database entry. Do not use the distinguished name (dn) as the IndexAttribute value.

  2. Specify the name of the file containing your name-value pair data on the line that reads:

    InputFile=%ScriptBase%sampledata.nvp

    On this line, replace the text sampledata.nvp with the name of your data file.

    By default, the InputFile statement declares that your data file is located in the same directory as the Universal Text Parser modules, so you must copy your external data file to this directory for the Universal Text Parser to read it. If you place your data file in a different directory, supply the full path and file name of your data file in this statement.

  3. Configure the ImportCommand statement if you are going to import data into your external database. To do so:
    1. Uncomment the ImportCommand statement by removing the pound sign (#) listed at the beginning of the statement.
    2. Specify the import command that your external database uses.
  4. If your external database supports it, use the DumpCommand statement to supply the command that exports the data from your external database:
    1. Uncomment the DumpCommand statement by removing the pound sign (#) listed at the beginning of the statement.
    2. Specify the export command that your external database uses.
  5. Use the MultiValToNoValAttr statement to specify a comma-separated list of attributes whose value can change from one or more values to no value. List the attribute names used in the external data source, not the attribute names used in the Connector View. For example:

    MultiValToNoValAttr=mail,description
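When checking an edited task.cfg, it can help to list the statements that are currently active. The Python sketch below collects KEY=value statements and skips lines commented out with a pound sign. It is a hypothetical checking aid, not the parser that Meta-Directory uses: statements continued across several lines are not handled, and replacing %ScriptBase% with a directory path is an assumption based on the default InputFile location described above. The file name people.nvp and the directory /opt/utp/ are made up.

```python
def read_task_cfg(text: str, script_base: str) -> dict[str, str]:
    """Collect KEY=value statements, skipping blank lines and '#' comments.

    %ScriptBase% is replaced with script_base (an assumed expansion).
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value statement
        settings[key.strip()] = value.strip().replace("%ScriptBase%", script_base)
    return settings


# Made-up configuration text with one commented-out statement.
sample = """\
IndexAttribute=uid
InputFile=%ScriptBase%people.nvp
#ImportCommand=myDBUtil
MultiValToNoValAttr=mail,description
"""
for name, value in read_task_cfg(sample, "/opt/utp/").items():
    print(name, "=", value)
```

A statement that is still commented out, such as ImportCommand in the sample, does not appear in the result, which makes a forgotten uncomment step easy to spot.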

Creating the File for LDIF Files

Use the ldif.cfg file as a boilerplate for the task.cfg configuration file if the data is formatted in LDIF.

Before You Begin

Consider the following information when synchronizing an LDIF data file with the Universal Text Parser:

    To create the file

After you have set up the Universal Text Parser, make the following modifications to the task.cfg file:

  1. Specify the ImportLineFormat if you are going to import into your external database any modifications that are generated through the Universal Text Connector.
    1. In the ImportLineFormat statement, supply the names of attributes generated by the Universal Text Connector in the order that you need them to be written to the output file, with each attribute separated by a comma. For example:

      ImportLineFormat=distinguishedname:dn,
      operation:changetype,objectclass,uid,cn,sn,mail,
      title,description

      The Universal Text Parser will not generate any attributes excluded from this line.

    2. Before listing the attributes you want to generate, you must specify two special values to support data output in LDIF. You specify these values as follows:

      distinguishedname:dn,operation:changetype,

  2. Modify the IndexAttribute= statement if you need to provide an attribute to index other than cn. By default, the line reads:

    IndexAttribute=cn

    To index on a different attribute value, specify the required attribute name in this statement. Use an attribute whose value is unique for each database entry. Do not use the distinguished name (dn) as the IndexAttribute value.

  3. Specify the name of the file containing your LDIF data on the line that reads:

    InputFile=%ScriptBase%sampledata.ldif

    On this line, replace the text sampledata.ldif with the name of your data file.

    By default, the InputFile statement declares that your data file is located in the same directory as the Universal Text Parser modules, so you must copy your data file to this directory for the Universal Text Parser to read it. If you place your data file in a different directory, supply the full path and file name of your data file in this statement.

  4. Configure the ImportCommand statement if you are going to import data into your external database. To do so:
    1. Uncomment the ImportCommand statement by removing the pound sign (#) listed at the beginning of the statement.
    2. Specify the import command that your external database uses. For example, you could specify the following ImportCommand statement:

      ImportCommand=myDBUtil -l administrator -p password -f ‘d:\\sunone\\servers\\utc-cv3\\logs\\test.out’

  5. If your external database supports it, use the DumpCommand statement to supply the command that exports the data from your external database:
    1. Uncomment the DumpCommand statement by removing the pound sign (#) listed at the beginning of the statement.
    2. Specify the export command that your external database uses.
  6. Use the MultiValToNoValAttr statement to specify a comma-separated list of attributes whose value can change from one or more values to no value. List the attribute names used in the external data source, not the attribute names used in the Connector View. For example:

    MultiValToNoValAttr=mail,description
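The effect of the ImportLineFormat statement on LDIF output can be pictured with the following Python sketch. It writes the attributes of one entry in the order given by the statement, renames distinguishedname and operation to dn and changetype, and leaves out any attribute that the statement does not list. The function name, the entry layout, and the sample entry are hypothetical. The sketch is a simplified picture of the ordering and filtering rules only: it does not reproduce the output of the Universal Text Parser, and it ignores LDIF details such as the structure of modify records and base64-encoded values.

```python
def format_ldif_record(entry: dict[str, list[str]], import_line_format: str) -> str:
    """Render entry as LDIF-style lines in the order given by import_line_format.

    Attributes that the format string does not list are not written.
    """
    lines = []
    for item in import_line_format.split(","):
        item = item.strip()
        if not item:
            continue
        source, _, target = item.partition(":")
        for value in entry.get(source, []):
            lines.append(f"{target or source}: {value}")
    return "\n".join(lines) + "\n"


# Made-up entry; telephonenumber is not listed in the format and is dropped.
entry = {
    "distinguishedname": ["uid=jsmith,ou=People,o=example"],
    "operation": ["add"],
    "objectclass": ["top", "person"],
    "uid": ["jsmith"],
    "cn": ["John Smith"],
    "sn": ["Smith"],
    "telephonenumber": ["555-0100"],
}
fmt = "distinguishedname:dn,operation:changetype,objectclass,uid,cn,sn"
print(format_ldif_record(entry, fmt), end="")
```

Listing distinguishedname:dn and operation:changetype first places the dn and changetype lines at the top of each record, ahead of the other attributes.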


About UTC Data Exchange Format

UTC and the Connector Perl Script exchange data using the UTC Data Exchange Format (UDEF). UDEF uses special tags to let you specify binary values and multiple values for a multi-valued attribute.

Binary Values

Binary (base64 encoded) values are prefixed with the tag [B].

Examples:

Multiple Values

There are two ways to specify multiple values. If the data is multiline (that is, each attribute name-value pair is on a separate line), you can specify each value of a multi-valued attribute on its own line.

This option is not available for CSV data, because each CSV record is specified on a single line.

The other way to specify multiple values is to use a comma-separated list of values prefixed with the [L] tag.
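The two tags described so far can be sketched in Python as follows. The helper names are hypothetical, and the sketch builds only the tagged value, not a complete data line. How a value that itself contains a comma is represented inside an [L] list is not covered, and neither is combining the tags for multiple binary values.

```python
import base64


def udef_binary(data: bytes) -> str:
    """Base64-encode data and prefix it with the [B] tag."""
    return "[B]" + base64.b64encode(data).decode("ascii")


def udef_list(values: list[str]) -> str:
    """Join values with commas and prefix the result with the [L] tag."""
    return "[L]" + ",".join(values)


# Made-up sample values.
print(udef_binary(b"\x00\x01\x02"))
print(udef_list(["jsmith@example.com", "john.smith@example.com"]))
```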

Examples:

Multiple Binary Values:


Using Special Characters and UTF8 Data in DN

This section gives guidelines for using special characters and UTF8 data in the DN value when you prepare input data files for UTC-UTP based connectors.

Special characters (“,”, “+”, “\”, “"”, “<”, “>”, or “;”) appearing in the DN must be escaped using “\” (ASCII 92). If the DN contains UTF8 data, it must be escaped using the “\xx” notation (RFC 2253). You must also consider the effect of the configuration parameter EvaluateEscapes, which is specified in the task.cfg file:

EvaluateEscapes=(TRUE|FALSE)

Special Characters in DN

Case 1: EvaluateEscapes=TRUE

Each special character in the DN value must be prefixed with two backslashes.

Case 2: EvaluateEscapes=FALSE or undefined

Each special character in the DN value must be prefixed with one backslash.
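The two cases can be expressed as a small Python sketch. The function name is hypothetical, and the function is meant to be applied to a single attribute value inside the DN (for example, the value of cn), not to the whole DN, because the commas that separate DN components must stay unescaped.

```python
# Special characters listed in this section: , + " \ < > ;
SPECIAL_DN_CHARS = set(',+"\\<>;')


def escape_dn_value(value: str, evaluate_escapes: bool) -> str:
    """Prefix each special character with two backslashes when evaluate_escapes
    is true, and with one backslash otherwise."""
    prefix = "\\\\" if evaluate_escapes else "\\"
    return "".join(prefix + ch if ch in SPECIAL_DN_CHARS else ch for ch in value)


# Made-up value containing a comma.
print(escape_dn_value("Smith, John", evaluate_escapes=False))
print(escape_dn_value("Smith, John", evaluate_escapes=True))
```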

UTF8 Data in DN

All data other than 7-bit ASCII is treated as UTF8 data and must be escaped using the ‘\xx’ notation when it appears in the DN. For details on enabling UTF8 support for Meta-Directory, see "Enabling UTF8 in Indirect Connectors."
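A minimal Python sketch of the ‘\xx’ notation follows: each non-ASCII character is replaced by a backslash and two hexadecimal digits for every byte of its UTF-8 encoding. The function name is hypothetical, and the sketch does not address how these escapes interact with the EvaluateEscapes setting.

```python
def escape_non_ascii(value: str) -> str:
    """Replace each non-ASCII character with backslash-hex escapes of its UTF-8 bytes."""
    out = []
    for ch in value:
        if ord(ch) < 128:
            out.append(ch)  # 7-bit ASCII is left unchanged
        else:
            out.extend(f"\\{byte:02x}" for byte in ch.encode("utf-8"))
    return "".join(out)


# Made-up value: the accented character becomes two escaped bytes.
print(escape_non_ascii("José"))
```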

For more details on escaping special characters and UTF8 data in the DN, see RFC 2253, “Lightweight Directory Access Protocol (v3): UTF-8 String Representation of Distinguished Names,” at http://www.ietf.org/rfc/rfc2253.txt.





Copyright 2004 Sun Microsystems, Inc. All rights reserved.