iPlanet Directory Server Resource Kit 5.1 Tools Reference



Chapter 18   ldifxform


The ldifxform (LDIF transform) tool reformats the contents of an LDIF (LDAP Data Interchange Format) text file. It can be used to analyze or edit the contents of a directory offline by processing the LDIF output of the db2ldif tool (see Chapter 7, "Command-Line Scripts," in the iPlanet Directory Server Command, Configuration and File Reference).

This tool can perform many transformations on LDIF input, such as converting between all of the most common character sets, extracting attribute values, modifying attribute names, ordering entries based on attribute values, or giving detailed statistics. In all cases, modifications are written to a new LDIF file, and the input file is never modified.

This chapter contains the following sections:

  • Command Usage

  • Reformatting Commands

  • Examples and Sample Output

Command Usage

The ldifxform tool acts as a stream filter, reading input from one file, performing any number of transformations and writing the output to another file. Each transformation is specified by a command parameter on the command line. Several compatible transformations may be performed simultaneously.
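The stream-filter model can be sketched in Python as follows. This is a hypothetical illustration, not the ldifxform implementation. The names read_records, run_filter, run_on_text and drop_telephone are invented here, and the parser is deliberately simplified: it unfolds continuation lines and skips comments, but it does not decode base-64 ("::") or URL (":<") values.

```python
import io
from typing import Callable, Iterable, Iterator, List, TextIO, Tuple

# Hypothetical record model: ordered (attribute type, value) pairs,
# with the "dn" line kept as the first pair.
Record = List[Tuple[str, str]]
Transform = Callable[[Record], Record]


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Split LDIF text into records and unfold continuation lines.

    Simplified: skips comments and does not decode "::" or ":<" values.
    """
    record: Record = []
    logical: List[str] = []  # pieces of the current (possibly folded) line

    def flush_line() -> None:
        if logical:
            attr, _, value = "".join(logical).partition(":")
            record.append((attr.strip(), value.lstrip(" ")))
            logical.clear()

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(" ") and logical:
            logical.append(line[1:])  # continuation: drop exactly one leading space
        elif line.startswith("#"):
            continue
        elif line == "":
            flush_line()
            if record:
                yield list(record)
                record.clear()
        else:
            flush_line()
            logical.append(line)
    flush_line()
    if record:
        yield list(record)


def run_filter(src: TextIO, dst: TextIO, transforms: List[Transform]) -> None:
    """Read every record from src, apply each transform in order, write to dst."""
    for record in read_records(src):
        for transform in transforms:
            record = transform(record)
        for attr, value in record:
            dst.write(f"{attr}: {value}\n")
        dst.write("\n")


def run_on_text(text: str, transforms: List[Transform]) -> str:
    """Convenience wrapper: filter an in-memory LDIF string."""
    out = io.StringIO()
    run_filter(io.StringIO(text), out, transforms)
    return out.getvalue()


def drop_telephone(record: Record) -> Record:
    """Example transform in the spirit of -c tcut=telephoneNumber (case-insensitive)."""
    return [(a, v) for a, v in record if a.lower() != "telephonenumber"]
```

Passing several functions in the transforms list corresponds loosely to giving several compatible -c commands; for a file, a call such as run_filter(open("two.ldif"), sys.stdout, [drop_telephone]) would apply the same chain.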

All possible operations are listed in "Reformatting Commands". Some produce LDIF output intended to be reloaded into a directory. For example, renaming an attribute is easier to do on an LDIF file than online through requests to a directory server.

Other reformatting operations do not produce LDIF; they are intended to provide an analysis of directory contents. For example, you may extract all distinct values of a specific attribute and list them under the DN in which they occur. The statistical operations provide counts of entries and attributes.


Syntax

The ldifxform tool has the following syntax:

ldifxform [-i input.ldif] [-o outputFile] -c "command" ...

Where:

  • input.ldif is a readable file that contains the LDIF text input.

  • outputFile is a writable file that will contain the reformatted LDIF. Some transformations and character conversions will produce output that is not valid LDIF.

  • command is one of the supported operations to perform on the input, usually enclosed in double quotes ("") for the shell. See "Reformatting Commands" for the list of all available commands. You may specify multiple commands, each preceded by -c, if they are mutually compatible.

The input and output parameters are optional, and the tool will use the standard input and output if either or both are omitted. However, the use of files is recommended for portability because standard 8-bit input and output are not fully supported on the Windows platform.

The ldifxform -h command will display the usage help text that briefly describes all options.


Options

The ldifxform options and parameters are described in the following table.


Table 18-1    Command-Line Options for the ldifxform Tool 

Option   Parameter     Purpose

-i       input.ldif    Specify the input file that contains the LDIF text to process. When this option is omitted, the tool will read the standard input.

-o       outputFile    Specify the output file for the reformatted LDIF result. Note that some operations do not produce LDIF output. When this option is omitted, the tool will write to the standard output.

-c       command       Specify an operation for the tool to apply to the input. The command parameter is one of the transformations described in "Reformatting Commands". This option may be repeated on the command line when the corresponding operations are compatible.

-h                     Display the usage help text that briefly describes all options.
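For readers scripting around the tool, the option syntax in Table 18-1 can be mirrored with Python's argparse module. This sketch is an illustration under assumptions, not ldifxform's own argument handling: the dest names are invented, UTF-8 file encoding is assumed, and the only compatibility rule enforced is the documented one for statsonly.

```python
import argparse
import sys
from typing import List, Optional, TextIO, Tuple


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical re-creation of the ldifxform option syntax (argparse adds -h)."""
    parser = argparse.ArgumentParser(prog="ldifxform")
    parser.add_argument("-i", dest="input", metavar="input.ldif",
                        help="LDIF input file (default: standard input)")
    parser.add_argument("-o", dest="output", metavar="outputFile",
                        help="output file (default: standard output)")
    parser.add_argument("-c", dest="commands", metavar="command", action="append",
                        required=True, help="reformatting command; may be repeated")
    args = parser.parse_args(argv)
    # statsonly is documented as incompatible with every other command.
    if "statsonly" in args.commands and len(args.commands) > 1:
        parser.error("statsonly cannot be combined with other commands")
    return args


def open_streams(args: argparse.Namespace) -> Tuple[TextIO, TextIO]:
    """Fall back to the standard streams when -i or -o is omitted."""
    src = open(args.input, encoding="utf-8") if args.input else sys.stdin
    dst = open(args.output, "w", encoding="utf-8") if args.output else sys.stdout
    return src, dst
```

Using action="append" is what lets -c appear several times, collecting the commands in command-line order.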



Reformatting Commands



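The following tables list the transformations available through the -c option of the ldifxform tool. The commands are grouped by type of operation:

  • LDIF Transformations

  • Character Set Conversions

  • Attribute Name Modifications

  • Ordering of Entries

  • Directory Statistics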


LDIF Transformations

The LDIF transformations affect the encoding and general appearance of LDIF text files.


Table 18-2    LDIF Text Transformations Using ldifxform  

Command         Formatting Effect

-c nob64        Undoes any base-64 encoding of attribute values. Note that the output will not be reparsable if any values are binary or begin with special characters (such as a space, a colon, or a less-than sign).

-c sevenbit     Reformats the output as seven-bit characters by base-64 encoding any attribute values that contain non-ASCII bytes. The output is always reparsable.

-c longlines    Prevents ldifxform from wrapping lines at the 79th column of the output file. This command is necessary when editing an LDIF file with a UTF-8 aware editor; otherwise, a multibyte character may be wrapped in the middle of its encoding. The output will be reparsable, but many popular system tools (such as grep or sed) may not be able to handle lines longer than 1024 characters.

-c notypes      Removes attribute type names: the output is no longer LDIF. This is useful for generating a list of values (see "Examples and Sample Output").

-c nodn         Removes distinguished names: the output is no longer LDIF. This is also useful for generating a list of values.

-c cleanzero    Removes trailing zero bytes from attribute values. This command is needed only when processing an LDIF file produced by a faulty encoder.
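The rules behind nob64, sevenbit and longlines come from the LDIF format itself (RFC 2849): a value must be base-64 encoded when it is not a "safe string", and a long line is folded by continuing it on lines that begin with a single space. The Python sketch below illustrates both rules. It is not ldifxform's code; the function names are hypothetical, and the fold width of 79 bytes is an assumption based on the longlines description.

```python
import base64


def needs_base64(value: bytes) -> bool:
    """RFC 2849 test: True when a value is not a SAFE-STRING and must be base-64 encoded."""
    if not value:
        return False
    if value[0] in b" :<":       # space, colon and '<' are not SAFE-INIT-CHARs
        return True
    if value.endswith(b" "):     # RFC 2849: values ending in a space SHOULD be encoded
        return True
    # SAFE-CHAR excludes NUL, LF, CR and every byte above 0x7F (non-ASCII)
    return any(b in b"\x00\r\n" or b > 0x7F for b in value)


def attr_line(attr: str, value: bytes) -> str:
    """Seven-bit style output: unsafe values become "attr:: <base-64>"."""
    if needs_base64(value):
        return f"{attr}:: {base64.b64encode(value).decode('ascii')}"
    return f"{attr}: {value.decode('ascii')}"


def fold(line: bytes, width: int = 79) -> bytes:
    """Fold one logical line into physical lines of at most `width` bytes.

    Continuation lines start with a single space. Cutting at byte offsets is what
    can split a multibyte UTF-8 character across two lines.
    """
    out = [line[:width]]
    rest = line[width:]
    while rest:
        out.append(b" " + rest[: width - 1])
        rest = rest[width - 1:]
    return b"\n".join(out)
```

Unfolding with folded.replace(b"\n ", b"") restores the original bytes, so the folded file stays reparsable; the problem longlines avoids is only that an editor displays the first physical line ending in half of a character.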


Character Set Conversions

The ldifxform tool can be used to convert LDIF files to different charsets (character sets). Conversions are useful for porting LDIF files between platforms and for use in directories that require localized data. When porting between platforms, you must ensure that all data in the original can be represented in the target charset.


Table 18-3    Character Set Conversions Using ldifxform  

Command            Formatting Effect

-c from=88591      Converts the input file from the ISO-8859-1 charset into the UTF-8 charset. This conversion allows the source data to be written using ISO-8859-1 text editors.

-c to=88591        Converts the input file from UTF-8 into the ISO-8859-1 charset. The output is no longer LDIF, and characters that cannot be represented in ISO-8859-1 will be stripped out.

-c from=t61        Converts the input file from the T.61 charset into the UTF-8 charset. Use this command when converting data obtained from X.500 or LDAPv2 servers that used T.61 encoding by default.

-c to=t61          Converts the input file from UTF-8 into the T.61 charset. The output is no longer LDIF, and characters that cannot be represented in T.61 will be stripped out.

-c from=mstext     This pair of commands converts between UTF-8 and the Windows Unicode Text file format.
-c to=mstext

-c from=charSet    Additional transformations are supported between UTF-8 and the following platform-specific charSets. Platform-specific transformations will result in data loss for values that cannot be represented in the target charset:
-c to=charSet
                   Solaris platform: 646, 8859-1, 8859-2, 8859-3, 8859-4, 8859-5, 8859-6, 8859-7, 8859-8, 8859-9, 8859-10, eucJP, gb2312, iso2022, KOI8-R, PCK, SJIS, UTF-7, zh_CN.euc, zh_CN.iso2022-7, zh_TW-big5, zh_TW-euc, zh_TW-iso2022-7

                   AIX platform: ASCII-GR, CNS11643.1986-1, CNS11643.1986-2, IBM-1046, IBM-1124, IBM-1129, IBM-850, IBM-856, IBM-932, IBM-eucJP, IBM-eucKR, IBM-eucTW, IBM-sbdTW, IBM-udcJP, IBM-udcTW, ISO8859-1, ISO8859-2, ISO8859-3, ISO8859-4, ISO8859-5, ISO8859-6, ISO8859-7, ISO8859-8, ISO8859-9, JISX0201.1976-0, JISX0208.1983-0, KSC5601.1987-0, TIS-620, big5, ct, fold7, fold8, uucode

                   HP-UX platform: roman8, iso8859_1, iso8859_2, iso8859_5, iso8859_6, iso8859_7, iso8859_8, iso8859_9, tis620, eucJP, sjis, big5, ccdc, eucKR, chinese-gb
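As a rough illustration of what the from=88591, to=88591 and mstext conversions do to the bytes, here is a Python sketch using standard codecs. It is hypothetical: dropping unrepresentable characters with errors="ignore" is an assumption chosen to match "stripped out", and treating the Windows Unicode Text format as UTF-16LE with a byte-order mark is also an assumption. Python's standard library has no T.61 codec, so from=t61 and to=t61 are not shown.

```python
import codecs


def from_88591(data: bytes) -> bytes:
    """Like -c from=88591: every ISO-8859-1 byte maps to one Unicode character."""
    return data.decode("iso-8859-1").encode("utf-8")


def to_88591(data: bytes) -> bytes:
    """Like -c to=88591: characters with no ISO-8859-1 equivalent are dropped."""
    return data.decode("utf-8").encode("iso-8859-1", errors="ignore")


def to_mstext(data: bytes) -> bytes:
    """Assumption: Windows "Unicode Text" is UTF-16LE preceded by a byte-order mark."""
    return codecs.BOM_UTF16_LE + data.decode("utf-8").encode("utf-16-le")


def from_mstext(data: bytes) -> bytes:
    """The "utf-16" codec reads the byte-order mark and strips it."""
    return data.decode("utf-16").encode("utf-8")
```

Because from_88591 can never fail (every byte is a valid ISO-8859-1 character), the lossy step is always the conversion out of UTF-8, which is why the text warns about checking that the data can be represented in the target charset.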


Attribute Name Modifications

These operations simplify global attribute modifications by replacing or removing attributes for all entries in the LDIF input.


Table 18-4    Attribute Name Modifications Using ldifxform  

Command                Formatting Effect

-c suppressoptions     Removes all options other than binary from attribute type names.

-c tcut=attr           Removes the attribute named attr from all entries where it is found. To remove multiple attributes, specify this command multiple times on the command line.

-c tpreserve=attr      Removes all attributes except for the given attr from all entries. To preserve multiple attributes, specify this command multiple times on the command line.

-c treplace=old:new    Replaces the old attribute type name with the new name in all entries where it occurs.
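The four attribute-name commands can be modeled as list filters over an entry's (type, value) pairs. In this hypothetical Python sketch, type names are compared case-insensitively (as LDAP attribute types are, and as the tpreserve=telephonenumber example later in this chapter suggests), and options such as ;lang-en are treated as belonging to the base type they follow. Whether ldifxform itself matches tcut=cn against cn;lang-en is an assumption; the function names and SAMPLE_ENTRY are invented for illustration.

```python
from typing import Iterable, List, Tuple

Entry = List[Tuple[str, str]]  # (attribute type, value) pairs, DN excluded

# Hypothetical entry used to exercise the functions below
SAMPLE_ENTRY: Entry = [
    ("objectclass", "top"),
    ("cn;lang-en", "Babs Jensen"),
    ("telephoneNumber", "555-5550"),
    ("userCertificate;binary", "MIIB"),
]


def base_type(attr: str) -> str:
    """Lower-cased type name without options (options follow ';')."""
    return attr.split(";", 1)[0].lower()


def suppress_options(entry: Entry) -> Entry:
    """Like -c suppressoptions: keep only the binary option."""
    kept = []
    for attr, value in entry:
        parts = attr.split(";")
        opts = [o for o in parts[1:] if o.lower() == "binary"]
        kept.append((";".join([parts[0]] + opts), value))
    return kept


def tcut(entry: Entry, names: Iterable[str]) -> Entry:
    """Like one or more -c tcut=attr: drop the named types."""
    drop = {n.lower() for n in names}
    return [(a, v) for a, v in entry if base_type(a) not in drop]


def tpreserve(entry: Entry, names: Iterable[str]) -> Entry:
    """Like one or more -c tpreserve=attr: keep only the named types."""
    keep = {n.lower() for n in names}
    return [(a, v) for a, v in entry if base_type(a) in keep]


def treplace(entry: Entry, old: str, new: str) -> Entry:
    """Like -c treplace=old:new, preserving any options on the renamed type."""
    out = []
    for a, v in entry:
        if base_type(a) == old.lower():
            a = new + a[len(a.split(";", 1)[0]):]
        out.append((a, v))
    return out
```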


Ordering of Entries

Many directory servers return search results in the order that entries were loaded into the database. By presorting a set of entries known to be static, clients can avoid having to sort results with every query.


Table 18-5    Ordering of LDAP Entries Using ldifxform  

Command          Formatting Effect

-c order         Reorders all entries in the file into hierarchical order.

-c sort=attr     Sorts entries by increasing value (lowest to highest) of the given attr attribute. This is equivalent to alphabetical order for string-valued attributes.

-c sort=^attr    Sorts entries by decreasing value (highest to lowest) of the given attr attribute. This is equivalent to reverse alphabetical order for string-valued attributes.

-c split=N       Generates multiple LDIF output files that can be loaded into a server by N clients in parallel. The output files are named:

                      outputFile_ldifxform_c_n

                 Where:

                   • outputFile is the parameter of the -o option and specifies a writable directory and filename prefix.

                   • c is the number of components in the root DN of the LDIF file.

                   • n is the number of the part, from 1 to N.
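A Python sketch of the ordering commands and the split file names follows. It is illustrative only: sorting by DN depth is one simple way to place every parent before its children, which is assumed here to be what hierarchical order requires; placing entries that lack the sort attribute last and comparing multi-valued attributes by their smallest value are assumptions; and the naive DN splitter ignores escaped commas. All names are hypothetical.

```python
from typing import Dict, List

Entry = Dict[str, List[str]]  # lower-cased attribute type -> values; "dn" holds the DN

# Hypothetical sample: two leaf entries and their parent, deliberately out of order
SAMPLE: List[Entry] = [
    {"dn": ["sn=Minsky,dc=siroe,dc=com"], "telephonenumber": ["555-5559"]},
    {"dn": ["dc=siroe,dc=com"]},
    {"dn": ["sn=Jensen,dc=siroe,dc=com"], "telephonenumber": ["555-5550"]},
]


def rdn_count(dn: str) -> int:
    """Naive component count; does not handle escaped commas such as "cn=Smith\\, John"."""
    return len([part for part in dn.split(",") if part.strip()])


def hierarchical(entries: List[Entry]) -> List[Entry]:
    """Stable sort by depth: a parent has fewer components than any descendant."""
    return sorted(entries, key=lambda e: rdn_count(e["dn"][0]))


def sort_by(entries: List[Entry], spec: str) -> List[Entry]:
    """spec is "attr" (ascending) or "^attr" (descending); entries lacking attr go last."""
    descending = spec.startswith("^")
    attr = spec.lstrip("^").lower()
    with_attr = [e for e in entries if attr in e]
    without = [e for e in entries if attr not in e]
    with_attr.sort(key=lambda e: min(e[attr]), reverse=descending)
    return with_attr + without


def split_names(prefix: str, root_dn: str, n: int) -> List[str]:
    """File names outputFile_ldifxform_c_n for parts 1..n."""
    c = rdn_count(root_dn)
    return [f"{prefix}_ldifxform_{c}_{i}" for i in range(1, n + 1)]
```

A stable sort keeps entries of equal depth in their original order, which is convenient when the input was already partly ordered.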

 


Directory Statistics

The ldifxform tool can be used to analyze the contents of the directory from which the LDIF file is extracted. The output includes a detailed count of DN structures and attribute value occurrences.


Table 18-6    Extracting Directory Statistics Using ldifxform  

Command         Formatting Effect

-c stats        Generates statistical information and appends it as an LDIF comment to the end of the output file.

-c statsonly    Generates and outputs only the statistical information. This command is not compatible with any other reformatting or LDIF transformation commands.
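The per-attribute counters reported by stats and statsonly (the e, v, l and m fields of the #:attrstatsinfo lines, defined in the comments of the sample output under "Examples and Sample Output") can be computed as in this Python sketch. It is hypothetical: lengths are counted in characters rather than bytes, type names are lower-cased, and the i, s and x fields are not reproduced. On the three cn and telephoneNumber values of two.ldif it yields the same figures as that sample output.

```python
from typing import Dict, List, Tuple

Entry = List[Tuple[str, str]]  # (attribute type, value) pairs, DN excluded

# cn and telephoneNumber values of the three entries in two.ldif
SAMPLE: List[Entry] = [
    [("cn", "Babs Jensen"), ("telephoneNumber", "555-5550")],
    [("cn", "Pete Minsky"), ("telephoneNumber", "555-5559")],
    [("cn", "Ted Morris"), ("telephoneNumber", "555-5558")],
]


def attr_stats(entries: List[Entry]) -> Dict[str, Dict[str, int]]:
    """e: entries containing the type, v: total values,
    l: total value length, m: longest single value."""
    stats: Dict[str, Dict[str, int]] = {}
    for entry in entries:
        seen = set()
        for attr, value in entry:
            t = attr.lower()
            s = stats.setdefault(t, {"e": 0, "v": 0, "l": 0, "m": 0})
            if t not in seen:  # count each entry once per type
                s["e"] += 1
                seen.add(t)
            s["v"] += 1
            s["l"] += len(value)
            s["m"] = max(s["m"], len(value))
    return stats


def format_stats(stats: Dict[str, Dict[str, int]]) -> List[str]:
    """Render the counters in the #:attrstatsinfo comment style."""
    return [f"#:attrstatsinfo: t={t} e={s['e']} v={s['v']} l={s['l']} m={s['m']}"
            for t, s in stats.items()]
```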



Examples and Sample Output



The examples in this section demonstrate the output of the ldifxform tool. These examples are based on the input file two.ldif (also used in "Examples and Sample Output" in Chapter 19):

dn: sn=Jensen,dc=siroe,dc=com
objectclass: top
objectclass: person
cn: Babs Jensen
sn: Jensen
telephoneNumber: 555-5550
createTimestamp: 100

dn: sn=Minsky,dc=siroe,dc=com
objectclass: top
objectclass: person
cn: Pete Minsky
sn: Minsky
telephoneNumber: 555-5559
modifyTimestamp: 200

dn: sn=Morris,dc=siroe,dc=com
objectclass: top
objectclass: person
cn: Ted Morris
sn: Morris
telephoneNumber: 555-5558
createTimestamp: 200

The following example shows how to reformat the information in this file as a simple list. The result appears on the standard output because no -o option was specified. It gives the names of all Siroe.com employees with their telephone numbers, sorted by telephone number.

Removing the DNs and attribute type names from the output saves space and makes the information more readable. The "sentinel" markers are used internally by the tool and can be edited out of the result if not needed.

$ ldifxform -i /export/temp/two.ldif \
            -c "tpreserve=telephonenumber" \
            -c "tpreserve=cn" \
            -c "sort=telephonenumber" \
            -c nodn -c notypes

version: 1
#:ordered: TRUE

objectclass: top
objectclass: sentinel

objectclass: top
objectclass: sentinel

 Babs Jensen
 555-5550

 Ted Morris
 555-5558

 Pete Minsky
 555-5559
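The effect of the four -c commands in this example can be reproduced on the same data with a short Python sketch. It is a hypothetical model of the visible result only: the version line, the #:ordered comment and the sentinel entries are not generated, the function names are invented, and the leading space on each value simply mirrors the sample output above.

```python
from typing import List, Tuple

# two.ldif as listed at the start of this section
TWO_LDIF = """\
dn: sn=Jensen,dc=siroe,dc=com
objectclass: top
objectclass: person
cn: Babs Jensen
sn: Jensen
telephoneNumber: 555-5550
createTimestamp: 100

dn: sn=Minsky,dc=siroe,dc=com
objectclass: top
objectclass: person
cn: Pete Minsky
sn: Minsky
telephoneNumber: 555-5559
modifyTimestamp: 200

dn: sn=Morris,dc=siroe,dc=com
objectclass: top
objectclass: person
cn: Ted Morris
sn: Morris
telephoneNumber: 555-5558
createTimestamp: 200
"""


def parse(text: str) -> List[List[Tuple[str, str]]]:
    """Minimal parser: one "attr: value" per line, entries separated by a blank line."""
    entries = []
    for block in text.strip().split("\n\n"):
        pairs = []
        for line in block.splitlines():
            attr, _, value = line.partition(": ")
            pairs.append((attr, value))
        entries.append(pairs)
    return entries


def value_list(text: str) -> List[List[str]]:
    keep = {"cn", "telephonenumber"}  # -c tpreserve=telephonenumber -c tpreserve=cn
    # Filtering on `keep` also drops the dn line, which is the effect of -c nodn.
    entries = [[(a, v) for a, v in e if a.lower() in keep] for e in parse(text)]
    # -c sort=telephonenumber: ascending telephone number
    entries.sort(key=lambda e: next(v for a, v in e if a.lower() == "telephonenumber"))
    return [[v for _, v in e] for e in entries]  # -c notypes: values only


def render(rows: List[List[str]]) -> str:
    """One value per line with a leading space, blank line between entries."""
    return "\n\n".join("\n".join(" " + v for v in row) for row in rows)
```

print(render(value_list(TWO_LDIF))) prints the three name and number pairs in the same order as the tool output above.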

The following example shows the statistical output of the ldifxform tool. Each counter is explained by the generated comment that precedes it.

$ ldifxform -i /export/temp/two.ldif -c statsonly

# Basic statistics
#:linecount: 27
#:entrycount: 4
# Number of nonleaf entries (at least one subordinate)
#:nonleafcount: 1
# Number of leaf entries (no subordinates)
#:leafcount: 4
# Largest number of entries immediately below a nonleaf entry
#:maximmsubr: 4
# Number of levels in the DIT hierarchy
#:maxdepth: 3
# Largest number of AVAs in an RDN of an entry's DN (normally 1)
#:maxrdns: 1
# Attribute types used in the LDIF file
# e is number entries containing this attr, v is total number of
# values, l is total length, m is max length of any one value,
# s is general syntax and x is extra encoding information.
#:attrstatsinfo: t=deletetimestamp e=1 v=1 l=3 m=3 i=1 s=int
#:attrstatsinfo: t=telephonenumber e=3 v=3 l=24 m=8 i=1 s=tel
#:attrstatsinfo: t=sn e=3 v=3 l=18 m=6 i=1 s=cis x=alphanumeric
#:attrstatsinfo: t=cn e=3 v=3 l=32 m=11 i=1 s=cis x=ascii
#:attrstatsinfo: t=objectclass e=4 v=7 l=38 m=11 i=2 s=cis
#                x=alphanumeric
# Counts of values of specific attribute types
#:attrdomaininfo: t=objectclass v=1 nsTombstone
#:attrdomaininfo: t=objectclass v=3 person
#:attrdomaininfo: t=objectclass v=3 top
# Number of entries with the latest createTimestamp value (UTC)
#:lastaddcount: c=1 t=200
# Number of entries with the latest modifyTimestamp value (UTC)
#:lastmodcount: c=1 t=200
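The lastaddcount and lastmodcount counters can be checked by hand against the three entries of two.ldif: the latest createTimestamp (200) occurs once, and the only modifyTimestamp is 200. The following hypothetical Python sketch shows the computation. Comparing timestamps as strings is an assumption that holds only when all values share one fixed-width format, as GeneralizedTime values such as YYYYMMDDHHMMSSZ do.

```python
from typing import Dict, List, Optional, Tuple

# Timestamps from the three entries of two.ldif (attribute names lower-cased)
TWO_LDIF: List[Dict[str, str]] = [
    {"createtimestamp": "100"},
    {"modifytimestamp": "200"},
    {"createtimestamp": "200"},
]


def last_count(entries: List[Dict[str, str]], attr: str) -> Optional[Tuple[int, str]]:
    """Return (number of entries holding the latest value, latest value), or None."""
    values = [e[attr] for e in entries if attr in e]
    if not values:
        return None
    latest = max(values)  # string comparison; valid for fixed-width timestamps
    return values.count(latest), latest
```

The two results correspond to the c= and t= fields of the #:lastaddcount and #:lastmodcount lines.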

Copyright 2002 Sun Microsystems, Inc. All rights reserved.

Last Updated April 15, 2002