Sun ONE Directory Server Resource Kit 5.2 Tools Reference 

Chapter 21
The LDIF Transformation Tool

The ldifxform tool reformats the contents of an LDIF (LDAP Data Interchange Format) text file. This chapter provides instructions on how to use the ldifxform tool. It contains the following sections: Overview, Command Usage, Reformatting Operations, and Command-Line Examples.


Overview

The ldifxform (LDIF transform) tool reformats the contents of an LDIF text file. This tool can perform many transformations on LDIF input, such as converting between the most common character sets, extracting attribute values, modifying attribute names, ordering entries based on attribute values, or giving detailed statistics. In all cases, the result is written to a new file or to the standard output, and the input file is never changed.


Tip

ldifxform can also be used to analyze or edit the contents of a directory offline by processing the LDIF output of the db2ldif tool. For more information, see “db2ldif (Export Database Contents to LDIF)” in Chapter 2 of the Sun ONE Directory Server Reference Manual.



Command Usage

The ldifxform tool acts as a stream filter: it reads input from one file, performs transformations, and writes the output to another file. Each transformation is specified by a command parameter on the command line. Several compatible transformations may be performed in a single invocation.

All possible operations are listed in Reformatting Operations. Some produce LDIF output destined to be reloaded into a directory. For example, renaming an attribute is easier to do in an LDIF file than online through requests to a directory server. Other reformatting operations do not produce LDIF; they are intended to provide an analysis of directory contents. For example, you may extract all distinct values of a specific attribute and list them under the DN in which they occur. The statistical operations provide counts of entries and attributes.

Syntax

The syntax of the ldifxform tool on the command-line takes the following form:

ldifxform [-i input.ldif] [-o outputFile] -c "command" ...

Where input.ldif, outputFile, and command are the parameters described in Table 21-1.

Options

The ldifxform options and parameters are described in Table 21-1. The ldifxform -h command will display usage help text that briefly describes all options.

Table 21-1  Command-Line Options for the ldifxform Tool

| Option | Parameter | Purpose |
|--------|-----------|---------|
| -i | input.ldif | Specify the input file that contains the LDIF text to process. When this option is omitted, the tool will read the standard input. |
| -o | outputFile | Specify the output file for the reformatted LDIF result. Note that some operations do not produce LDIF output. When this option is omitted, the tool will write to the standard output. |
| -c | command | Specify an operation for the tool to apply to the input. The command parameter is one of the transformations described in Reformatting Operations. This option may be repeated on the command line when the corresponding operations are compatible. |
| -h | (none) | Display the usage help text that briefly describes all options. |
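As an illustration of the syntax above, the following Python sketch assembles an ldifxform argument list from the documented options. The helper name build_ldifxform_argv and the file names are hypothetical, and the check that at least one -c command is present is an assumption based on the syntax line, where -c is not shown in brackets.

```python
from typing import List, Optional


def build_ldifxform_argv(commands: List[str],
                         input_file: Optional[str] = None,
                         output_file: Optional[str] = None) -> List[str]:
    """Assemble an argument list that follows the documented syntax."""
    if not commands:
        # Assumption: the syntax line shows -c without brackets, so require one.
        raise ValueError("at least one -c command is required")
    argv = ["ldifxform"]
    if input_file is not None:   # when omitted, the tool reads standard input
        argv += ["-i", input_file]
    if output_file is not None:  # when omitted, the tool writes standard output
        argv += ["-o", output_file]
    for command in commands:     # -c may be repeated for compatible operations
        argv += ["-c", command]
    return argv


# Hypothetical file names, used only for illustration.
argv = build_ldifxform_argv(["tcut=description", "sort=cn"],
                            input_file="in.ldif", output_file="out.ldif")
print(" ".join(argv))
```

On a host where the tool is installed, a list built this way could be handed to subprocess.run; the sketch itself only constructs the arguments and does not run anything.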


Reformatting Operations

The following sections list the transformations available through the -c command parameter of the ldifxform tool. The commands are grouped by type in the subsections that follow.

LDIF Transformations

The LDIF transformations affect the encoding and general appearance of LDIF text files.

Table 21-2  LDIF Text Transformations Using ldifxform

| Command | Formatting Effect |
|---------|-------------------|
| -c nob64 | Undoes any base-64 encoding. Note that the output will not be reparsable if there are any binary-valued attributes or attribute values beginning with special characters. |
| -c sevenbit | Reformats the output as seven-bit characters by base-64 encoding any attribute values that contain non-ASCII bytes. The output is always reparsable. |
| -c longlines | Prevents ldifxform from wrapping lines at the 79th column in the output file. This command is necessary when editing an LDIF file using a UTF-8 aware editor, because a wrapped line may split a character in the middle of its encoding. The output will be reparsable, but many popular system tools (such as grep or sed) may not be able to handle lines longer than 1024 characters. |
| -c notypes | Removes attribute type names; the output is therefore no longer LDIF. This is useful for generating a list of values. (See Command-Line Examples.) |
| -c nodn | Removes distinguished names; the output is therefore no longer LDIF. This is also useful for generating a list of values. |
| -c cleanzero | Removes trailing zero bytes from attribute values. This option is needed only when processing an LDIF file from a buggy encoder. |
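The base-64 handling behind sevenbit and nob64 can be pictured with a short Python sketch. It is not the tool's implementation: it follows the general LDIF convention that a value is written as "attr:: base64" when it starts with a space, colon, or less-than sign, ends with a space, or contains NUL, LF, CR, or non-ASCII bytes. The function names are hypothetical.

```python
import base64
from typing import Tuple


def needs_base64(value: bytes) -> bool:
    """True if the value cannot be written as a plain 'attr: value' line."""
    if value == b"":
        return False
    if value[:1] in (b" ", b":", b"<") or value[-1:] == b" ":
        return True
    # NUL, LF, CR and any byte above 127 are unsafe in a plain value.
    return any(b in (0, 10, 13) or b > 127 for b in value)


def encode_line(attr: str, value: bytes) -> str:
    """Write one attribute line, base-64 encoding only when required."""
    if needs_base64(value):
        return f"{attr}:: {base64.b64encode(value).decode('ascii')}"
    return f"{attr}: {value.decode('ascii')}"


def decode_line(line: str) -> Tuple[str, bytes]:
    """Return (attr, value bytes) from a plain or base-64 attribute line."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):           # 'attr:: ...' marks a base-64 value
        return attr, base64.b64decode(rest[1:].strip())
    return attr, rest.lstrip(" ").encode("utf-8")


plain = encode_line("sn", b"Jensen")
encoded = encode_line("cn", "Bj\u00f8rn Jensen".encode("utf-8"))
print(plain)
print(encoded)
```

The needs_base64 check also shows why undoing base-64 can make output unparsable: a value that required encoding has no valid plain-text form.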

Character Set Conversions

The ldifxform tool can be used to convert LDIF files to different charsets (character sets). Conversions are useful for porting LDIF files between platforms and for use in directories that require localized data. When porting between platforms, you must ensure that all data in the original can be represented in the target charset.

Table 21-3  Character Set Conversions Using ldifxform

| Command | Formatting Effect |
|---------|-------------------|
| -c from=88591 | Converts the input file from the ISO-8859-1 charset into the UTF-8 charset. This conversion allows the source data to be written using ISO-8859-1 text editors. |
| -c to=88591 | Converts the input file from UTF-8 into the ISO-8859-1 charset. The output is no longer LDIF, and characters that cannot be represented in ISO-8859-1 will be stripped out. |
| -c from=t61 | Converts the input file from the T.61 charset into the UTF-8 charset. This option should be used when converting data obtained from X.500 or LDAPv2 servers that used T.61 charset encoding by default. |
| -c to=t61 | Converts the input file from UTF-8 into the T.61 charset. The output is no longer LDIF, and characters that cannot be represented in T.61 will be stripped out. |
| -c from=mstext, -c to=mstext | This pair of commands converts between UTF-8 and the Windows Unicode Text file format. |
| -c from=charSet, -c to=charSet | Additional transformations are supported between UTF-8 and the platform-specific charSets listed below this table. Platform-specific transformations will result in data loss for values that cannot be represented in the target charset. |

Solaris platform: 646, 8859-1, 8859-2, 8859-3, 8859-4, 8859-5, 8859-6, 8859-7, 8859-8, 8859-9, 8859-10, eucJP, gb2312, iso2022, KOI8-R, PCK, SJIS, UTF-7, zh_CN.euc, zh_CN.iso2022-7, zh_TW-big5, zh_TW-euc, zh_TW-iso2022-7

AIX platform: ASCII-GR, CNS11643.1986-1, CNS11643.1986-2, IBM-1046, IBM-1124, IBM-1129, IBM-850, IBM-856, IBM-932, IBM-eucJP, IBM-eucKR, IBM-eucTW, IBM-sbdTW, IBM-udcJP, IBM-udcTW, ISO8859-1, ISO8859-2, ISO8859-3, ISO8859-4, ISO8859-5, ISO8859-6, ISO8859-7, ISO8859-8, ISO8859-9, JISX0201.1976-0, JISX0208.1983-0, KSC5601.1987-0, TIS-620, big5, ct, fold7, fold8, uucode

HP-UX platform: roman8, iso8859_1, iso8859_2, iso8859_5, iso8859_6, iso8859_7, iso8859_8, iso8859_9, tis620, eucJP, sjis, big5, ccdc, eucKR, chinese-gb
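The effect of the from=88591 and to=88591 conversions on plain text can be sketched in Python with the standard codecs. This is only an illustration of the conversion and of the data loss described above, not the tool's code; the function names are hypothetical, and the sketch ignores base-64 encoded values, which would need decoding first.

```python
def from_88591(data: bytes) -> bytes:
    """ISO-8859-1 bytes in, UTF-8 bytes out (the direction of from=88591)."""
    return data.decode("iso-8859-1").encode("utf-8")


def to_88591(data: bytes) -> bytes:
    """UTF-8 bytes in, ISO-8859-1 bytes out (the direction of to=88591).

    Characters with no ISO-8859-1 representation are dropped, mirroring
    the "stripped out" behavior described in the table.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors="ignore")


latin1_line = "cn: Ren\u00e9 Dupont\n".encode("iso-8859-1")
utf8_line = from_88591(latin1_line)

# A CJK character cannot be represented in ISO-8859-1 and is lost.
lossy = to_88591("cn: Ren\u00e9 \u4e2d\n".encode("utf-8"))
print(utf8_line, lossy)
```

Because the reverse conversion silently drops characters, it is worth checking that all data fits the target charset before converting.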

Attribute Modifications

These operations simplify global attribute modifications by replacing or removing attributes for all entries in the LDIF input.

Table 21-4  Attribute Name Modifications Using ldifxform

| Command | Formatting Effect |
|---------|-------------------|
| -c suppressoptions | Remove all options other than binary from attribute type names. |
| -c tcut=attr | Remove the attribute named attr from all entries where it is found. To remove multiple attributes, specify this command multiple times on the command line. |
| -c tpreserve=attr | Remove all attributes except for the given attr from all entries. To preserve multiple attributes, specify this command multiple times on the command line. |
| -c treplace=old:new | Replace the old attribute type name with the new name in all entries where it occurs. |
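The three attribute commands can be modeled as simple filters over an entry's attribute list. The Python sketch below uses hypothetical function names that mirror the command names, and it is not how the tool is implemented. It assumes attribute names are matched case-insensitively, as LDAP attribute names are, and it operates on one entry's (type, value) pairs with the DN kept separately.

```python
from typing import Iterable, List, Tuple

Attrs = List[Tuple[str, str]]  # (attribute type, value) pairs of one entry


def tcut(attrs: Attrs, names: Iterable[str]) -> Attrs:
    """Drop every attribute whose type is in names."""
    drop = {n.lower() for n in names}
    return [(t, v) for t, v in attrs if t.lower() not in drop]


def tpreserve(attrs: Attrs, names: Iterable[str]) -> Attrs:
    """Keep only the attributes whose type is in names."""
    keep = {n.lower() for n in names}
    return [(t, v) for t, v in attrs if t.lower() in keep]


def treplace(attrs: Attrs, old: str, new: str) -> Attrs:
    """Rename attribute type old to new wherever it occurs."""
    return [(new if t.lower() == old.lower() else t, v) for t, v in attrs]


jensen = [("objectclass", "top"), ("objectclass", "person"),
          ("cn", "Babs Jensen"), ("sn", "Jensen"),
          ("telephoneNumber", "555-5550")]

print(tcut(jensen, ["objectclass"]))
print(tpreserve(jensen, ["cn", "telephonenumber"]))
print(treplace(jensen, "telephoneNumber", "homePhone"))
```

Passing several names to tcut or tpreserve corresponds to repeating the command on the command line.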

LDAP Entry Presorting

Many directory servers return search results in the order that entries were loaded into the database. By presorting a set of entries known to be static, clients can avoid having to sort results with every query.

Table 21-5  Ordering of LDAP Entries Using ldifxform

| Command | Formatting Effect |
|---------|-------------------|
| -c order | Reorders all entries in the file into hierarchical order. |
| -c sort=attr | Sorts entries by increasing value (lowest to highest) of the given attr attribute. This is equivalent to alphabetical order for string-valued attributes. |
| -c sort=^attr | Sorts entries by decreasing value (highest to lowest) of the given attr attribute. This is equivalent to reverse alphabetical order for string-valued attributes. |
| -c split=N | Generates multiple LDIF output files that can be loaded into a server by N clients in parallel. The output files are named outputFile_ldifxform_c_n, where outputFile is the parameter of the -o option and specifies a writable directory and filename prefix, c is the number of components in the root DN of the LDIF file, and n is the number of the part, from 1 to N. |
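The sort=attr and sort=^attr forms can be illustrated with a small Python sketch that reads the leading caret as a request for decreasing order. The function name and the dictionary representation of entries are hypothetical. Treating entries that lack the attribute as having an empty value is an assumption; the source does not say how the tool orders such entries.

```python
from typing import Dict, List


def sort_entries(entries: List[Dict[str, str]], spec: str) -> List[Dict[str, str]]:
    """Sort entries by one attribute; a leading '^' means decreasing order."""
    descending = spec.startswith("^")
    attr = (spec[1:] if descending else spec).lower()
    # Assumption: an entry without the attribute sorts as an empty string.
    return sorted(entries, key=lambda e: e.get(attr, ""), reverse=descending)


entries = [
    {"cn": "Babs Jensen", "telephonenumber": "555-5550"},
    {"cn": "Pete Minsky", "telephonenumber": "555-5559"},
    {"cn": "Ted Morris", "telephonenumber": "555-5558"},
]

ascending = [e["cn"] for e in sort_entries(entries, "telephonenumber")]
descending = [e["cn"] for e in sort_entries(entries, "^telephonenumber")]
print(ascending)
print(descending)
```

The comparison here is a plain string comparison, which matches the alphabetical ordering described for string-valued attributes.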

Directory Analysis

The ldifxform tool can be used to analyze the contents of the directory from which the LDIF file is extracted. The output includes a detailed count of DN structures and attribute value occurrences.

Table 21-6  Extracting Directory Statistics Using ldifxform

| Command | Formatting Effect |
|---------|-------------------|
| -c stats | Generates statistical information and appends it as an LDIF comment to the end of the output file. |
| -c statsonly | Generates and outputs statistical information only. This command is not compatible with any other reformatting or LDIF transformation commands. |


Command-Line Examples

The examples in this section show how to use the ldifxform tool. These examples are based on the input file two.ldif detailed in Code Example 21-1.

Code Example 21-1  two.ldif Input File

dn: sn=Jensen,dc=example,dc=com
objectclass: top
objectclass: person
cn: Babs Jensen
sn: Jensen
telephoneNumber: 555-5550
createTimestamp: 100

dn: sn=Minsky,dc=example,dc=com
objectclass: top
objectclass: person
cn: Pete Minsky
sn: Minsky
telephoneNumber: 555-5559
modifyTimestamp: 200

dn: sn=Morris,dc=example,dc=com
objectclass: top
objectclass: person
cn: Ted Morris
sn: Morris
telephoneNumber: 555-5558
createTimestamp: 200

Transforming LDIF to a List

The following command will reformat the information in two.ldif so it is presented as a simple, hierarchical list of employees placed in order by their telephone numbers. The result appears on the standard output because no -o option is specified.

$ ldifxform -i /export/temp/two.ldif -c "tpreserve=telephonenumber" \
            -c "tpreserve=cn" -c "sort=telephonenumber" \
            -c nodn -c notypes

Removing the DNs saves space and makes the information more readable. The sentinel markers are used internally by the tool and can be edited out of the result if not needed.

Code Example 21-2  Hierarchical List of Employees and Telephone Numbers

version: 1
#:ordered: TRUE
objectclass: top
objectclass: sentinel
objectclass: top
objectclass: sentinel
Babs Jensen
555-5550
Ted Morris
555-5558
Pete Minsky
555-5559
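To make the combined effect of the five commands concrete, the Python sketch below reproduces the name and telephone lines of this listing from the text of two.ldif. It is a simplified model, not the tool: the parser handles only plain "attr: value" lines (no folding, base-64, or comments), the function names are hypothetical, and the version, ordered, and sentinel lines that the tool emits are not generated.

```python
from typing import List, Tuple

TWO_LDIF = """\
dn: sn=Jensen,dc=example,dc=com
objectclass: top
objectclass: person
cn: Babs Jensen
sn: Jensen
telephoneNumber: 555-5550
createTimestamp: 100

dn: sn=Minsky,dc=example,dc=com
objectclass: top
objectclass: person
cn: Pete Minsky
sn: Minsky
telephoneNumber: 555-5559
modifyTimestamp: 200

dn: sn=Morris,dc=example,dc=com
objectclass: top
objectclass: person
cn: Ted Morris
sn: Morris
telephoneNumber: 555-5558
createTimestamp: 200
"""

Entry = List[Tuple[str, str]]


def parse_entries(text: str) -> List[Entry]:
    """Split the text on blank lines; each line becomes a (type, value) pair."""
    return [[tuple(line.split(": ", 1)) for line in block.splitlines()]
            for block in text.strip().split("\n\n")]


def to_value_list(text: str, keep: List[str], sort_attr: str) -> List[str]:
    """Model tpreserve + sort + nodn + notypes as one pass over the entries."""
    wanted = {k.lower() for k in keep}

    def sort_key(entry: Entry) -> str:
        return next((v for t, v in entry if t.lower() == sort_attr.lower()), "")

    lines: List[str] = []
    for entry in sorted(parse_entries(text), key=sort_key):
        # 'dn' is not in wanted, so it is dropped; only values are emitted.
        lines += [v for t, v in entry if t.lower() in wanted]
    return lines


values = to_value_list(TWO_LDIF, ["telephonenumber", "cn"], "telephonenumber")
print("\n".join(values))
```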

Statistical Output

The following command shows the statistical output of the ldifxform tool. Each counter is described by the comment that precedes it in the generated output.

$ ldifxform -i /export/temp/two.ldif -c statsonly

The output includes a detailed count of DN structures and attribute value occurrences. Code Example 21-3 details the output.

Code Example 21-3  Statistics Only Output

# Basic statistics
#:linecount: 27
#:entrycount: 4
# Number of nonleaf entries (at least one subordinate)
#:nonleafcount: 1
# Number of leaf entries (no subordinates)
#:leafcount: 4
# Largest number of entries immediately below a nonleaf entry
#:maximmsubr: 4
# Number of levels in the DIT hierarchy
#:maxdepth: 3
# Largest number of AVAs in an RDN of an entry’s DN (normally 1)
#:maxrdns: 1
# Attribute types used in the LDIF file
# e is number entries containing this attr, v is total number of
# values, l is total length, m is max length of any one value,
# s is general syntax and x is extra encoding information.
#:attrstatsinfo: t=deletetimestamp e=1 v=1 l=3 m=3 i=1 s=int
#:attrstatsinfo: t=telephonenumber e=3 v=3 l=24 m=8 i=1 s=tel
#:attrstatsinfo: t=sn e=3 v=3 l=18 m=6 i=1 s=cis x=alphanumeric
#:attrstatsinfo: t=cn e=3 v=3 l=32 m=11 i=1 s=cis x=ascii
#:attrstatsinfo: t=objectclass e=4 v=7 l=38 m=11 i=2 s=cis
#                x=alphanumeric
# Counts of values of specific attribute types
#:attrdomaininfo: t=objectclass v=1 nsTombstone
#:attrdomaininfo: t=objectclass v=3 person
#:attrdomaininfo: t=objectclass v=3 top
# Number of entries with the latest createTimestamp value (UTC)
#:lastaddcount: c=1 t=200
# Number of entries with the latest modifyTimestamp value (UTC)
#:lastmodcount: c=1 t=200
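The e, v, l, and m counters in the attrstatsinfo lines can be recomputed by hand. The Python sketch below does so for the cn, sn, and telephoneNumber values of the three entries shown in Code Example 21-1, following the definitions given in the output comments. The function name is hypothetical, and measuring length with len() on strings is an assumption that holds for these ASCII values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def attr_stats(entries: List[List[Tuple[str, str]]]) -> Dict[str, Dict[str, int]]:
    """Per attribute type: e entries, v values, l total length, m max length."""
    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"e": 0, "v": 0, "l": 0, "m": 0})
    for attrs in entries:
        seen = set()
        for name, value in attrs:
            t = name.lower()
            if t not in seen:          # count each entry once per type
                stats[t]["e"] += 1
                seen.add(t)
            stats[t]["v"] += 1
            stats[t]["l"] += len(value)
            stats[t]["m"] = max(stats[t]["m"], len(value))
    return dict(stats)


entries = [
    [("cn", "Babs Jensen"), ("sn", "Jensen"), ("telephoneNumber", "555-5550")],
    [("cn", "Pete Minsky"), ("sn", "Minsky"), ("telephoneNumber", "555-5559")],
    [("cn", "Ted Morris"), ("sn", "Morris"), ("telephoneNumber", "555-5558")],
]

stats = attr_stats(entries)
for t in ("telephonenumber", "sn", "cn"):
    s = stats[t]
    print(f"t={t} e={s['e']} v={s['v']} l={s['l']} m={s['m']}")
```

For these three attribute types the computed counters agree with the corresponding attrstatsinfo lines in the listing.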




Copyright 2004 Sun Microsystems, Inc. All rights reserved.