Sun ONE Directory Server Resource Kit 5.2 Tools Reference
Chapter 21
The LDIF Transformation Tool
The ldifxform tool reformats the contents of an LDIF (LDAP Data Interchange Format) text file. This chapter explains how to use the ldifxform tool. It contains the following sections:
- Overview
- Command Usage
- Reformatting Operations
- Command-Line Examples
Overview
The ldifxform (LDIF transform) tool reformats the contents of an LDIF text file. It can perform many transformations on LDIF input, such as converting between the most common character sets, extracting attribute values, modifying attribute names, ordering entries based on attribute values, or producing detailed statistics. In all cases, the result is written to a new file, and the input file is never changed.
Tip
ldifxform can also be used to analyze or edit the contents of a directory offline by processing the LDIF output of the db2ldif tool. For more information, see “db2ldif (Export Database Contents to LDIF)” in Chapter 2 of the Sun ONE Directory Server Reference Manual.
Command Usage
The ldifxform tool acts as a stream filter: it reads input from one file, performs transformations, and writes the output to another file. Each transformation is specified by a command parameter on the command line. Several compatible transformations may be performed in a single invocation.
All possible operations are listed in Reformatting Operations. Some produce LDIF output destined to be reloaded into a directory. For example, renaming an attribute is easier to do on an LDIF file than online through requests to a directory server. Other reformatting operations do not produce LDIF; they are intended to provide an analysis of directory contents. For example, you may extract all distinct values of a specific attribute and list them under the DN in which they occur. The statistical operations provide counts of entries and attributes.
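The stream-filter model described above can be pictured with a minimal Python sketch. It is not how ldifxform is implemented; the names read_entries, filter_stream, and drop_dn are hypothetical, and the only LDIF rules assumed are the standard ones (records separated by blank lines, continuation lines starting with a single space).

```python
import io

def read_entries(stream):
    """Yield LDIF records as lists of unfolded lines.

    Records are separated by blank lines; a line that starts with a
    single space continues the previous line (standard LDIF folding).
    """
    entry = []
    for raw in stream:
        line = raw.rstrip("\n")
        if not line:                          # blank line ends the record
            if entry:
                yield entry
                entry = []
        elif line.startswith(" ") and entry:
            entry[-1] += line[1:]             # unfold the continuation
        else:
            entry.append(line)
    if entry:
        yield entry

def filter_stream(src, dst, transform):
    """Apply transform to every record read from src and write it to dst."""
    for entry in read_entries(src):
        for line in transform(entry):
            dst.write(line + "\n")
        dst.write("\n")

def drop_dn(entry):
    """Illustrative transformation only: remove the dn line of a record."""
    return [line for line in entry if not line.lower().startswith("dn:")]

# The input is never modified; the result goes to a separate stream.
sample = "dn: cn=Babs Jensen,o=example\ncn: Babs\n  Jensen\nsn: Jensen\n"
out = io.StringIO()
filter_stream(io.StringIO(sample), out, drop_dn)
print(out.getvalue())
```

Several transformations could be chained by composing such functions, which mirrors the idea of repeating -c for compatible commands.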
Syntax
The ldifxform tool is invoked on the command line as follows:
ldifxform [-i input.ldif] [-o outputFile] -c "command" ...
Where:
- input.ldif is a readable file that contains the LDIF text input.
- outputFile is a writable file that will contain the reformatted LDIF. Some transformations and character conversions will produce output that is not valid LDIF.
- command is one of the supported operations to perform on the input, usually enclosed in double quotes ("") for the shell. See Reformatting Operations for the list of all available commands. You may specify multiple commands, each preceded by -c, if they are mutually compatible.
Options
The ldifxform options and parameters are described in Table 21-1. The ldifxform -h command will display usage help text that briefly describes all options.
Table 21-1 Command-Line Options for the ldifxform Tool
Option | Parameter | Purpose
-i | input.ldif | Specify the input file that contains the LDIF text to process. When this option is omitted, the tool will read the standard input.
-o | outputFile | Specify the output file for the reformatted LDIF result. Note that some operations do not produce LDIF output. When this option is omitted, the tool will write to the standard output.
-c | command | Specify an operation for the tool to apply to the input. The command parameter is one of the transformations described in Reformatting Operations. This option may be repeated on the command line when the corresponding operations are compatible.
-h | (none) | Display the usage help text that briefly describes all options.
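Reformatting Operations
The following sections list the transformations available through the -c command parameter of the ldifxform tool. The commands are grouped by type:
- LDIF Transformations
- Character Set Conversions
- Attribute Modifications
- LDAP Entry Presorting
- Directory Analysis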
LDIF Transformations
The LDIF transformations affect the encoding and general appearance of LDIF text files.
Table 21-2 LDIF Text Transformations Using ldifxform
Command | Formatting Effect
-c nob64 | Undoes any base-64 encoding. Note that the output will not be reparsable if there are any binary-valued attributes or attribute values beginning with special characters.
-c sevenbit | Reformats the output as seven-bit characters by base-64 encoding any attribute values that contain non-ASCII bytes. The output is always reparsable.
-c longlines | Prevents ldifxform from wrapping lines at the 79th column in the output file. This command is necessary when editing an LDIF file with a UTF-8 aware editor, because a character may otherwise be wrapped in the middle of its encoding. The output will be reparsable, but many popular system tools (such as grep or sed) may not be able to handle lines longer than 1024 characters.
-c notypes | Removes attribute type names, so the output is no longer LDIF. This is useful for generating a list of values. (See Command-Line Examples.)
-c nodn | Removes distinguished names, so the output is no longer LDIF. This is also useful for generating a list of values.
-c cleanzero | Removes trailing zero bytes from attribute values. This command is needed only when processing an LDIF file from a buggy encoder.
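To make the sevenbit and nob64 descriptions concrete, here is a small Python sketch of base-64 encoding and decoding of LDIF values. It only illustrates the general LDIF convention (a double colon marks a base-64 value); the function names are hypothetical and the rules are a simplified reading of the LDIF specification, not the tool's actual logic.

```python
import base64

def needs_base64(value: bytes) -> bool:
    """Return True if the value cannot be written as a plain LDIF value.

    Simplified rules: a leading space, colon, or less-than sign, a
    trailing space, any NUL/LF/CR, or any byte above 127 forces base-64.
    """
    if not value:
        return False
    if value[:1] in (b" ", b":", b"<") or value.endswith(b" "):
        return True
    return any(b > 127 or b in (0, 10, 13) for b in value)

def encode_line(attr: str, value: bytes) -> str:
    """Write one attribute line, base-64 encoding the value when required."""
    if needs_base64(value):
        return f"{attr}:: {base64.b64encode(value).decode('ascii')}"
    return f"{attr}: {value.decode('ascii')}"

def decode_line(line: str):
    """Split an LDIF line into (attr, raw bytes), undoing base-64 if present."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):                       # "attr:: <base-64>"
        return attr, base64.b64decode(rest[1:].strip())
    return attr, rest.lstrip(" ").encode("ascii")

value = "Bäbs".encode("utf-8")                     # contains non-ASCII bytes
line = encode_line("cn", value)                    # seven-bit safe line
print(line)
print(decode_line(line))                           # original bytes restored
```

The sketch also shows why decoded output may stop being reparsable: a decoded value that starts with a colon or contains binary data has no unambiguous plain-text form.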
Character Set Conversions
The ldifxform tool can be used to convert LDIF files to different charsets (character sets). Conversions are useful for porting LDIF files between platforms and for use in directories that require localized data. When porting between platforms, you must ensure that all data in the original can be represented in the target charset.
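The requirement that all data be representable in the target charset can be checked before converting. The following Python sketch uses Python codec names such as "latin-1" and "utf-8" as stand-ins; how these map to the charset names accepted by ldifxform is an assumption not covered here, and the function names are hypothetical.

```python
def unrepresentable(text: str, charset: str):
    """Return the characters of text that the target charset cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad

def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from source to target.

    Raises UnicodeEncodeError instead of silently dropping characters
    that do not exist in the target charset.
    """
    return data.decode(source).encode(target)

# ISO 8859-1 (latin-1) has no euro sign, so this value would be lossy.
print(unrepresentable("Zoë €", "latin-1"))
print(convert("Zoë".encode("utf-8"), "utf-8", "latin-1"))
```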
Attribute Modifications
These operations simplify global attribute modifications by replacing or removing attributes for all entries in the LDIF input.
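As an illustration of what a global attribute rename involves, the Python sketch below rewrites the attribute type on every matching line of already-unfolded LDIF text. It is a conceptual example only: the function name is hypothetical, it does not use any ldifxform command, and it deliberately leaves attribute names that appear inside DN values untouched.

```python
def rename_attribute(lines, old, new):
    """Rename attribute type old to new in a list of unfolded LDIF lines."""
    out = []
    for line in lines:
        name, sep, rest = line.partition(":")
        # Keep attribute options such as ";lang-en" or ";binary" intact.
        base, semi, options = name.partition(";")
        if sep and not line.startswith("#") and base.lower() == old.lower():
            line = new + semi + options + sep + rest
        out.append(line)
    return out

entry = [
    "dn: cn=Ted Morris,o=example",
    "cn;lang-en: Ted Morris",
    "cn:: VGVk",
    "telephonenumber: 555-5558",
]
for line in rename_attribute(entry, "cn", "commonName"):
    print(line)
```

Because the base-64 marker is part of the text after the first colon, encoded values keep their double colon without special handling.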
LDAP Entry Presorting
Many directory servers return search results in the order that entries were loaded into the database. By presorting a set of entries known to be static, clients can avoid having to sort results with every query.
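The idea of presorting can be sketched in Python as a one-time sort of entries by an attribute value before they are loaded. This is a simplified, flat sort under stated assumptions: entries are lists of unfolded, plain-text (not base-64) lines, the helper names are hypothetical, and no attempt is made to keep parent entries ahead of their children as a real directory load would require.

```python
def attr_value(entry, attr):
    """Return the first value of attr in an entry, or "" if it is absent."""
    prefix = attr.lower() + ":"
    for line in entry:
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return ""

def presort(entries, attr):
    """Return entries ordered by the given attribute (stable sort)."""
    return sorted(entries, key=lambda e: attr_value(e, attr))

entries = [
    ["cn: Pete Minsky", "telephonenumber: 555-5559"],
    ["cn: Babs Jensen", "telephonenumber: 555-5550"],
    ["cn: Ted Morris", "telephonenumber: 555-5558"],
]
for e in presort(entries, "telephonenumber"):
    print(attr_value(e, "cn"), attr_value(e, "telephonenumber"))
```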
Directory Analysis
The ldifxform tool can be used to analyze the contents of the directory from which the LDIF file is extracted. The output includes a detailed count of DN structures and attribute value occurrences.
Command-Line Examples
The examples in this section show how to use the ldifxform tool. These examples are based on the input file two.ldif detailed in Code Example 21-1.
Transforming LDIF to a List
The following command will reformat the information in two.ldif so it is presented as a simple, hierarchical list of employees placed in order by their telephone numbers. The result appears on the standard output because no -o option is specified.
$ ldifxform -i /export/temp/two.ldif -c "tpreserve=telephonenumber" \
-c "tpreserve=cn" -c "sort=telephonenumber" \
-c nodn -c notypes
Removing the DNs saves space and makes the information more readable. The sentinel markers are used internally by the tool and can be edited out of the result if not needed.
Code Example 21-2 Hierarchical List of Employees and Telephone Numbers
version: 1
#:ordered: TRUE
objectclass: top
objectclass: sentinel
objectclass: top
objectclass: sentinel
Babs Jensen
555-5550
Ted Morris
555-5558
Pete Minsky
555-5559
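Editing the sentinel markers out of such a list is easy to script. The Python sketch below is one possible approach, assuming the output looks exactly like Code Example 21-2: header, comment, and objectclass lines are dropped, and the remaining lines alternate between a name and a telephone number. The function name is hypothetical.

```python
def clean_list(lines):
    """Drop header and sentinel lines, then pair each name with its number."""
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                                # blank or comment line
        if line.lower().startswith(("version:", "objectclass:")):
            continue                                # LDIF header or sentinel
        kept.append(line)
    # Assumes strict name/number alternation, as in the example output.
    return list(zip(kept[0::2], kept[1::2]))

listing = """version: 1
#:ordered: TRUE
objectclass: top
objectclass: sentinel
objectclass: top
objectclass: sentinel
Babs Jensen
555-5550
Ted Morris
555-5558
Pete Minsky
555-5559
""".splitlines()

for name, phone in clean_list(listing):
    print(f"{phone}  {name}")
```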
Statistical Output
The following command produces the statistical output of the ldifxform tool. Each counter is described by the comments generated alongside it.
$ ldifxform -i /export/temp/two.ldif -c statsonly
The output includes a detailed count of DN structures and attribute value occurrences. Code Example 21-3 details the output.
Code Example 21-3 Statistics Only Output
# Basic statistics
#:linecount: 27
#:entrycount: 4
# Number of nonleaf entries (at least one subordinate)
#:nonleafcount: 1
# Number of leaf entries (no subordinates)
#:leafcount: 4
# Largest number of entries immediately below a nonleaf entry
#:maximmsubr: 4
# Number of levels in the DIT hierarchy
#:maxdepth: 3
# Largest number of AVAs in an RDN of an entry’s DN (normally 1)
#:maxrdns: 1
# Attribute types used in the LDIF file
# e is number entries containing this attr, v is total number of
# values, l is total length, m is max length of any one value,
# s is general syntax and x is extra encoding information.
#:attrstatsinfo: t=deletetimestamp e=1 v=1 l=3 m=3 i=1 s=int
#:attrstatsinfo: t=telephonenumber e=3 v=3 l=24 m=8 i=1 s=tel
#:attrstatsinfo: t=sn e=3 v=3 l=18 m=6 i=1 s=cis x=alphanumeric
#:attrstatsinfo: t=cn e=3 v=3 l=32 m=11 i=1 s=cis x=ascii
#:attrstatsinfo: t=objectclass e=4 v=7 l=38 m=11 i=2 s=cis
# x=alphanumeric
# Counts of values of specific attribute types
#:attrdomaininfo: t=objectclass v=1 nsTombstone
#:attrdomaininfo: t=objectclass v=3 person
#:attrdomaininfo: t=objectclass v=3 top
# Number of entries with the latest createTimestamp value (UTC)
#:lastaddcount: c=1 t=200
# Number of entries with the latest modifyTimestamp value (UTC)
#:lastmodcount: c=1 t=200
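Because the counters are written as "#:name: value" lines, the statistics can be post-processed with a short script. The Python sketch below is an assumption-laden example based only on the layout visible in Code Example 21-3: it collects the simple numeric counters and the per-attribute attrstatsinfo fields, and ignores everything else, including wrapped continuation comments. The function name is hypothetical.

```python
def parse_stats(text):
    """Return (counters, attrs) parsed from statsonly-style output."""
    counters, attrs = {}, {}
    for line in text.splitlines():
        if not line.startswith("#:"):
            continue                       # plain "# ..." lines are descriptions
        key, _, value = line[2:].partition(":")
        value = value.strip()
        if key == "attrstatsinfo":
            # Fields look like "t=sn e=3 v=3 ..."; t names the attribute type.
            fields = dict(f.split("=", 1) for f in value.split())
            attrs[fields.pop("t")] = fields
        elif value.isdigit():
            counters[key] = int(value)
    return counters, attrs

sample_stats = """# Basic statistics
#:linecount: 27
#:entrycount: 4
#:attrstatsinfo: t=sn e=3 v=3 l=18 m=6 i=1 s=cis x=alphanumeric
#:attrdomaininfo: t=objectclass v=3 person
"""
counters, attrs = parse_stats(sample_stats)
print(counters)
print(attrs["sn"]["s"])
```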