Sun Java System Directory Server Enterprise Edition 6.1 Deployment Planning Guide

Chapter 4 Defining Data Characteristics

The type of data in your directory determines how you structure the directory, who can access the data, and how access is granted. Data types can include, among others, user names, email addresses, telephone numbers, and information about groups to which users belong.

This chapter explains how to locate, categorize, structure, and organize data. It also explains how to map data to the Directory Server schema. This chapter covers the following topics:

Determining Data Sources and Ownership

The first step in categorizing existing data is to identify where that data comes from and who owns it.

Identifying Data Sources

To identify the data to be included in your directory, locate and analyze existing data sources.

Determining Data Ownership

Data ownership refers to the person or organization that is responsible for ensuring that data is up-to-date. During the data design phase, decide who can write data to the directory. Common strategies for determining data ownership include the following:

As you determine who can write to the data, you might find that multiple individuals require write access to the same information. For example, an information systems or directory management group should have write access to employee passwords. You might also want all employees to have write access to their own passwords. While you generally must give multiple people write access to the same information, try to keep this group small and easy to identify. Small groups help to ensure your data’s integrity.

For information about setting access control for your directory, see Chapter 6, Directory Server Access Control, in Sun Java System Directory Server Enterprise Edition 6.1 Administration Guide and How Directory Server Provides Access Control in Sun Java System Directory Server Enterprise Edition 6.1 Reference.

Distinguishing Between User and Configuration Data

To distinguish between data used to configure Directory Server and other Java Enterprise System servers and the actual user data stored in the directory, do the following:

Identifying Data From Disparate Data Sources

When determining data sources, ensure that you include data from other data sources, including legacy data sources. This data might not be stored in the directory. However, Directory Server might need to have some knowledge of, or control over, the data.

Directory Proxy Server provides a virtual directory feature that aggregates information, in real-time, from multiple data repositories. These repositories include LDAP directories, data that complies with the JDBC specification, and LDIF flat files.

The virtual directory supports complex filters that handle attributes from different data sources. It also supports modifications that combine attributes from different data sources.

During the data analysis phase, you might find that several applications require the same data, but in different formats. Instead of duplicating this information, it is preferable to have each application transform the data to suit its own requirements.

Structuring Data With the Directory Information Tree

The directory information tree (DIT) provides a way to structure directory data so that the data can be referred to by client applications. The DIT interacts closely with other design decisions, including how you distribute, replicate, or control access to directory data.

DIT Terminology

A well-designed DIT provides the following:

The DIT structure follows the hierarchical LDAP model. The DIT organizes data, for example, by group, by people, or by geographical location. It also determines how data is partitioned across multiple servers.

DIT design has an impact on replication configuration and on how you use Directory Proxy Server to distribute data. If you want to replicate or distribute certain portions of a DIT, consider replication and the requirements of Directory Proxy Server at design time. Also, decide at design time whether you require access controls on branch points.

A DIT is defined in terms of suffixes, subsuffixes, and chained suffixes. A suffix is a branch or subtree whose entire contents are treated as a unit for administrative tasks. Indexing is defined for an entire suffix, and an entire suffix can be initialized in a single operation. A suffix is also usually the unit of replication. Data that you want to access and manage in the same way should be located in the same suffix. A suffix can be located at the root of the directory tree, where it is called a root suffix.

Because data can only be partitioned at the suffix level, an appropriate directory tree structure is required to spread data across multiple servers.

The following figure shows a directory with two root suffixes. Each suffix represents a separate corporate entity.

Figure 4–1 Two Root Suffixes in a Single Directory Server

Directory information tree with two root suffixes

A suffix might also be a branch of another suffix, in which case it is called a subsuffix. The parent suffix does not include the contents of the subsuffix for administrative operations. The subsuffix is managed independently of its parent. Because LDAP operation results contain no information about suffixes, directory clients are unaware of whether entries are part of root suffixes or subsuffixes.
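Because a subsuffix is managed independently of its parent, an entry belongs to the most specific suffix that contains it. The following Python sketch illustrates that rule with a longest-match lookup. The suffix names and the helper functions are hypothetical, and the DN parsing is deliberately naive: it assumes DNs contain no escaped commas and compares them case-insensitively.

```python
def split_dn(dn):
    """Split a DN into lowercased RDN strings (naive: no escaped commas)."""
    return [rdn.strip().lower() for rdn in dn.split(",") if rdn.strip()]


def owning_suffix(entry_dn, suffixes):
    """Return the most specific suffix that contains entry_dn, or None."""
    entry = split_dn(entry_dn)
    best = None
    best_len = 0
    for suffix in suffixes:
        parts = split_dn(suffix)
        # A suffix contains the entry if its RDNs match the tail of the entry DN.
        if parts and len(parts) <= len(entry) and entry[-len(parts):] == parts:
            if len(parts) > best_len:
                best, best_len = suffix, len(parts)
    return best


# Hypothetical layout: one root suffix with one subsuffix.
SUFFIXES = ["dc=example,dc=com", "ou=Contractors,dc=example,dc=com"]

print(owning_suffix("uid=bjensen,ou=People,dc=example,dc=com", SUFFIXES))
print(owning_suffix("uid=tmorris,ou=Contractors,dc=example,dc=com", SUFFIXES))
```

The first entry resolves to the root suffix and the second to the subsuffix, even though a client sees both as part of a single tree.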

The following figure shows a directory with a single root suffix and multiple subsuffixes for a large corporate entity.

Figure 4–2 One Root Suffix With Multiple Subsuffixes

Directory information tree with a single root suffix and multiple subsuffixes

A suffix corresponds to an individual database within the server. However, databases and their files are managed internally by the server and database terminology is not used.

Chained suffixes create a virtual DIT by referencing suffixes on other servers. With chained suffixes, Directory Server performs the operation on the remote suffix. The directory then returns the result as if the operation had been performed locally. The location of the data is transparent. The client is unaware that the suffix is chained and that the data is retrieved from a remote server. A root suffix on one server can have subsuffixes that are chained to another server. In this scenario, the client is aware of a single tree structure.

In the special case of cascading chaining, the chained suffix might reference another chained suffix on the remote server, and so on. Each server forwards the operation and eventually returns the result to the server that handles the client’s request.

Designing the DIT

DIT design involves choosing a suffix to contain your data, determining the hierarchical relationship between data entries, and naming the entries in the DIT hierarchy. The following sections describe the design process in more detail.

Choosing a Suffix

The suffix is the name of the entry at the root of the DIT. If you have two or more DITs that do not have a natural common root, you can use multiple suffixes. The default Directory Server installation contains multiple suffixes. One suffix is used to store user data. The other suffixes are for data that is needed by internal directory operations, such as configuration information and directory schema.

All directory entries must be located below a common base entry, the suffix. Each suffix name must be as follows:

It is generally considered best practice to map your enterprise domain name to a Distinguished Name (DN). For example, an enterprise with the domain name example.com would use a DN of dc=example,dc=com.
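The domain-to-DN mapping can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes an ordinary DNS name whose labels need no DN escaping.

```python
def domain_to_suffix(domain):
    """Map a DNS domain name to a dc-style suffix DN."""
    # Drop surrounding whitespace and any trailing dot, then split into labels.
    labels = [label for label in domain.strip().strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    # Each label becomes one domain component (dc) RDN, in the same order.
    return ",".join("dc=" + label for label in labels)


print(domain_to_suffix("example.com"))  # dc=example,dc=com
```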

Creating the DIT Structure and Naming Entries

The structure of a DIT can be flat or hierarchical. Although a flat tree is easier to manage, a degree of hierarchy might be required for data partitioning, replication management, and access control.

Branch Points and Naming Considerations

A branch point is a point at which you define a new subdivision within the DIT. When deciding on branch points, avoid names that are likely to change, because name changes can cause problems. The likelihood of a name changing is proportional to the number of components in the name that can potentially change. The more hierarchical the DIT, the more components in the names, and the more likely the names are to change.

Use the following guidelines when defining and naming branch points:

Table 4–1 Traditional DN Branch Point Attributes

Attribute Name    Definition
c                 A country name.
o                 An organization name. This attribute is typically used to represent a large divisional branching. The branching might include a corporate division, academic discipline, subsidiary, or other major branching within the enterprise. You should also use this attribute to represent a domain name.
ou                An organizational unit. This attribute is typically used to represent a smaller divisional branching of your enterprise than an organization. Organizational units are generally subordinate to the preceding organization.
st                A state or province name.
l                 A locality, such as a city, country, office, or facility name.
dc                A domain component.

Be consistent when choosing attributes for branch points. Some LDAP client applications might fail if the DN format is inconsistent across your DIT. If l (localityName) is subordinate to o (organizationName) in one part of your DIT, ensure that l is subordinate to o in all other parts of your directory.
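One way to check this kind of consistency is to scan a sample of DNs and report any pair of attribute types that appears in both orders. The following Python sketch is illustrative only. The function names and sample DNs are hypothetical, and the parsing assumes single-valued RDNs with no escaped commas.

```python
from itertools import combinations


def rdn_types(dn):
    """Return the attribute types of a DN's RDNs, from leaf to root."""
    return [rdn.split("=", 1)[0].strip().lower()
            for rdn in dn.split(",") if rdn.strip()]


def ordering_conflicts(dns):
    """Return attribute-type pairs that appear in both orders across the DNs."""
    seen = set()
    for dn in dns:
        # combinations() keeps DN order, so the first type of each pair
        # is subordinate to the second.
        for lower, upper in combinations(rdn_types(dn), 2):
            if lower != upper:
                seen.add((lower, upper))
    return sorted({tuple(sorted(pair)) for pair in seen
                   if (pair[1], pair[0]) in seen})


sample = [
    "cn=Printer 1,l=Paris,o=Sales,dc=example,dc=com",   # l below o
    "cn=Printer 2,o=Sales,l=London,dc=example,dc=com",  # o below l
]
print(ordering_conflicts(sample))  # [('l', 'o')]
```

An empty result means that no two attribute types swap places anywhere in the sample.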

Replication Considerations

When designing a DIT, consider which entries will be replicated to other servers. If you want to replicate a specific group of entries to the same set of servers, those entries should fall below a specific subtree. To describe the set of entries to be replicated, specify the DN at the top of the subtree. For more information about replicating entries, see Chapter 4, Directory Server Replication, in Sun Java System Directory Server Enterprise Edition 6.1 Reference.

Access Control Considerations

A DIT hierarchy can enable certain types of access control. As with replication, it is easier to group similar entries and to administer the entries from a single branch.

A hierarchical DIT also enables distributed administration. For example, you can use the DIT to give an administrator from the marketing department access to marketing entries, and an administrator from the sales department access to sales entries.

You can also set access controls based on directory content, rather than the DIT. Use the ACI filtered target mechanism to define a single access control rule. This rule states that a directory entry has access to all entries that contain a particular attribute value. For example, you can set an ACI filter that gives the sales administrator access to all entries that contain the attribute ou=Sales.
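As an illustration, the following Python sketch assembles an ACI value that uses a targetfilter for the ou=Sales example. The helper name, the ACL name, the administrator DN, and the choice of rights (read, search, compare) are assumptions made for this sketch. Check the exact ACI syntax and the rights you need against the Administration Guide before you use such a rule.

```python
def filtered_target_aci(name, target_filter, user_dn,
                        rights=("read", "search", "compare")):
    """Build an ACI value that targets entries matching an LDAP filter."""
    return (
        f'(targetfilter="{target_filter}")'
        f'(version 3.0; acl "{name}"; '
        f'allow ({",".join(rights)}) userdn="ldap:///{user_dn}";)'
    )


# Hypothetical administrator entry and ACL name.
aci = filtered_target_aci(
    name="Sales administrator access",
    target_filter="(ou=Sales)",
    user_dn="uid=sales-admin,ou=People,dc=example,dc=com",
)
print("aci: " + aci)
```

The resulting value would be stored in the aci attribute of an entry at or above the entries that the rule is meant to cover.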

However, ACI filters can be difficult to manage. You must decide which method of access control is best suited to your directory: organizational branching in the DIT hierarchy, ACI filters, or a combination of the two.

Designing a Directory Schema

The directory schema describes the types of data that can be stored in a directory. During schema design, each data element is mapped to an LDAP attribute. Related elements are gathered into LDAP object classes. A well-designed schema helps maintain data integrity by imposing constraints on the size, range, and format of data values. You decide what types of entries your directory contains and the attributes that are available to each entry.

The predefined schema that is included with Directory Server contains the Internet Engineering Task Force (IETF) standard LDAP schema. The schema contains additional application-specific schema to support the features of the server. It also contains Directory Server-specific schema extensions. While this schema meets most directory requirements, you might need to extend the schema with new object classes and attributes that are specific to your directory.

Schema Design Process

Schema design involves doing the following:

Where possible, use the existing schema elements that are defined in the default Directory Server schema. Standard schema elements help to ensure compatibility with directory-enabled applications. Because the schema is based on the LDAP standard, it has been reviewed and agreed to by a large number of directory users.
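The following Python sketch shows what mapping data elements to standard schema elements might look like for a person record. The source field names, the sample record, and the function are hypothetical. The target attributes (uid, cn, sn, mail, telephoneNumber) and the inetOrgPerson object class chain are standard LDAP schema elements, and cn and sn are required for person entries. The sketch assumes plain ASCII values that need no LDIF encoding.

```python
# Hypothetical source fields mapped to standard LDAP attributes.
FIELD_TO_ATTRIBUTE = {
    "login": "uid",
    "full_name": "cn",
    "family_name": "sn",
    "email": "mail",
    "phone": "telephoneNumber",
}
OBJECT_CLASSES = ["top", "person", "organizationalPerson", "inetOrgPerson"]
REQUIRED = {"uid", "cn", "sn"}  # uid names the entry; cn and sn are mandatory


def record_to_ldif(record, base_dn):
    """Render one source record as an LDIF entry below base_dn."""
    attrs = {FIELD_TO_ATTRIBUTE[field]: value
             for field, value in record.items()
             if field in FIELD_TO_ATTRIBUTE and value}
    missing = REQUIRED - attrs.keys()
    if missing:
        raise ValueError("missing required attributes: " + ", ".join(sorted(missing)))
    lines = [f"dn: uid={attrs['uid']},{base_dn}"]
    lines += [f"objectClass: {oc}" for oc in OBJECT_CLASSES]
    lines += [f"{name}: {value}" for name, value in attrs.items()]
    return "\n".join(lines)


record = {
    "login": "bjensen",
    "full_name": "Barbara Jensen",
    "family_name": "Jensen",
    "email": "bjensen@example.com",
}
print(record_to_ldif(record, "ou=People,dc=example,dc=com"))
```

Keeping the mapping in one table makes it easier to apply the same object classes and attributes to every entry of a given type.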

Maintaining Data Consistency

Consistent data assists LDAP client applications in locating directory entries. For each type of information that is stored in the directory, select the required object classes and attributes to support that information. Always use the same object classes and attributes. If you use schema objects inconsistently, it is difficult to locate information.

You can maintain schema consistency in the following ways:

Other Directory Data Resources

For more information about the standard LDAP schema, and about designing a DIT, see the following sites:

For a complete list of the RFCs and standards supported by Directory Server Enterprise Edition, see Appendix A, Standards and RFCs Supported by Directory Server Enterprise Edition, in Sun Java System Directory Server Enterprise Edition 6.1 Evaluation Guide.