Sun Java System Directory Server Enterprise Edition 6.0 Deployment Planning Guide

Chapter 4 Defining Data Characteristics

The type of data in your directory determines how you structure the directory, who can access the data, and how access is granted. Data types can include, among others, user names, email addresses, telephone numbers, and information about groups to which users belong.

This chapter explains how to locate, categorize, structure, and organize data. It also explains how to map data to the Directory Server schema. This chapter covers the following topics:

Determining Data Sources and Ownership

The first step in categorizing existing data is to identify where that data comes from and who owns it.

Identifying Data Sources

To identify the data to be included in your directory, locate and analyze existing data sources.

Identify organizations that provide information.

Locate all the organizations that manage information essential to your enterprise. Typically, these organizations include your information services, human resources, payroll, and accounting departments.
Identify tools and processes that are information sources.

Common sources for information include the following:
- Networking operating systems, such as Windows, Novell Netware, and UNIX® NIS
- Email systems
- Security systems
- PBX or telephone switching systems
- Human resources applications
Determine how centralizing each piece of data affects the management of data.

Centralized data management might require new tools and new processes. Issues can arise when centralization requires increasing staff in some organizations and decreasing staff in others.

Determining Data Ownership

Data ownership refers to the person or organization that is responsible for ensuring that data is up-to-date. During the data design phase, decide who can write data to the directory. Common strategies for determining data ownership include the following:

Allow read-only access to the directory for everyone except a small group of directory content managers.
Allow individual users to manage strategic subsets of information.

These subsets of information might include their passwords, descriptive information about themselves, and their role within the organization.
Allow a person’s manager to write to some strategic subset of that person’s information, such as contact information or job title.
Allow an organization’s administrator to create and manage entries for that organization.

Organization administrators in effect become your directory content managers.
Create roles that give groups of people read or write access privileges.

For example, you might create roles for human resources, finance, or accounting. Allow each of these roles to have read access, write access, or both to the data needed by the group. This data might include salary information, government identification number, and home phone numbers and address.

For more information about roles and grouping entries, see Chapter 9, Directory Server Groups, Roles, and CoS, in Sun Java System Directory Server Enterprise Edition 6.0 Administration Guide and Chapter 8, Directory Server Groups and Roles, in Sun Java System Directory Server Enterprise Edition 6.0 Reference.

As you determine who can write to the data, you might find that multiple individuals require write access to the same information. For example, an information systems or directory management group should have write access to employee passwords. You might also want all employees to have write access to their own passwords. While you generally must give multiple people write access to the same information, try to keep this group small and easy to identify. Small groups help to ensure your data’s integrity.

For information about setting access control for your directory, see Chapter 6, Directory Server Access Control, in Sun Java System Directory Server Enterprise Edition 6.0 Administration Guide and How Directory Server Provides Access Control in Sun Java System Directory Server Enterprise Edition 6.0 Reference.

Distinguishing Between User and Configuration Data

To distinguish between data used to configure Directory Server and other Java Enterprise System servers and the actual user data stored in the directory, do the following:

Provide different backup strategies for user and configuration data.
Provide different high availability standards for user and configuration data.
Shut down, restore, and power up configuration servers quickly.
Keep configuration servers up while performing maintenance on other Directory Server instances.

Identifying Data From Disparate Data Sources

When determining data sources, ensure that you include data from other data sources, including legacy data sources. This data might not be stored in the directory. However, Directory Server might need to have some knowledge of, or control over, the data.

Directory Proxy Server provides a virtual directory feature that aggregates information, in real-time, from multiple data repositories. These repositories include LDAP directories, data that complies with the JDBC specification, and LDIF flat files.

The virtual directory supports complex filters that handle attributes from different data sources. It also supports modifications that combine attributes from different data sources.

During the data analysis phase, you might find that the same data is required by several applications, but in a different format. Instead of duplicating this information, it is preferable to have the applications transform it for their requirements.

Structuring Data With the Directory Information Tree

The directory information tree (DIT) provides a way to structure directory data so that the data can be referred to by client applications. The DIT interacts closely with other design decisions, including how you distribute, replicate, or control access to directory data.

DIT Terminology

A well-designed DIT provides the following:

Simplified directory data maintenance
Flexibility in creating replication policies and access controls
Support for the applications that use the directory
Simplified directory navigation for users

The DIT structure follows the hierarchical LDAP model. The DIT organizes data, for example, by group, by people, or by geographical location. It also determines how data is partitioned across multiple servers.

DIT design has an impact on replication configuration and on how you use Directory Proxy Server to distribute data. If you want to replicate or distribute certain portions of a DIT, consider replication and the requirements of Directory Proxy Server at design time. Also, decide at design time whether you require access controls on branch points.

A DIT is defined in terms of suffixes, subsuffixes, and chained suffixes. A suffix is a branch or subtree whose entire contents are treated as a unit for administrative tasks. Indexing is defined for an entire suffix, and an entire suffix can be initialized in a single operation. A suffix is also usually the unit of replication. Data that you want to access and manage in the same way should be located in the same suffix. A suffix can be located at the root of the directory tree, where it is called a root suffix.

Because data can only be partitioned at the suffix level, an appropriate directory tree structure is required to spread data across multiple servers.

The following figure shows a directory with two root suffixes. Each suffix represents a separate corporate entity.

Figure 4–1 Two Root Suffixes in a Single Directory Server

Directory information tree with two root suffixes

A suffix might also be a branch of another suffix, in which case it is called a subsuffix. The parent suffix does not include the contents of the subsuffix for administrative operations. The subsuffix is managed independently of its parent. Because LDAP operation results contain no information about suffixes, directory clients are unaware of whether entries are part of root suffixes or subsuffixes.

The following figure shows a directory with a single root suffix and multiple subsuffixes for a large corporate entity.

Figure 4–2 One Root Suffix With Multiple Subsuffixes

Directory information tree with a single root suffix
and multiple subsuffixes

A suffix corresponds to an individual database within the server. However, databases and their files are managed internally by the server and database terminology is not used.

Chained suffixes create a virtual DIT by referencing suffixes on other servers. With chained suffixes, Directory Server performs the operation on the remote suffix. The directory then returns the result as if the operation had been performed locally. The location of the data is transparent. The client is unaware that the suffix is chained and that the data is retrieved from a remote server. A root suffix on one server can have subsuffixes that are chained to another server. In this scenario, the client is aware of a single tree structure.

In the special case of cascading chaining, the chained suffix might reference another chained suffix on the remote server, and so on. Each server forwards the operation and eventually returns the result to the server that handles the client’s request.

Designing the DIT

DIT design involves choosing a suffix to contain your data, determining the hierarchical relationship between data entries, and naming the entries in the DIT hierarchy. The following sections describe the design process in more detail.

Choosing a Suffix

The suffix is the name of the entry at the root of the DIT. If you have two or more DITs that do not have a natural common root, you can use multiple suffixes. The default Directory Server installation contains multiple suffixes. One suffix is used to store user data. The other suffixes are for data that is needed by internal directory operations, such as configuration information and directory schema.

All directory entries must be located below a common base entry, the suffix. Each suffix name must be as follows:

Globally unique
Static, so that the name rarely changes
Short, so that entries beneath the suffix are easier to read online
Easy for a person to type and remember

It is generally considered best practice to map your enterprise domain name to a Distinguished Name (DN). For example, an enterprise with the domain name example.com would use a DN of dc=example,dc=com.

Creating the DIT Structure and Naming Entries

The structure of a DIT can be flat or hierarchical. Although a flat tree is easier to manage, a degree of hierarchy might be required for data partitioning, replication management, and access control.

Branch Points and Naming Considerations

A branch point is a point at which you define a new subdivision within the DIT. When deciding on branch points, avoid potential problematic name changes. The likelihood of a name changing is proportional to the number of components in the name that can potentially change. The more hierarchical the DIT, the more components in the names, and the more likely the names are to change.

Use the following guidelines when defining and naming branch points:

Branch your tree to represent only the largest organizational subdivisions in your enterprise.

Limit branch points to divisions, such as Corporate Information Services, Customer Support, Sales, and Professional Services. Make sure that your divisions are stable. Do not perform this kind of branching if your enterprise reorganizes frequently.
Use functional or generic names rather than actual organizational names.

Names change and you do not want to have to change your DIT every time your enterprise renames its divisions. Instead, use generic names that represent the function of the organization. For example, use Engineering instead of Widget Research and Development.
If you have multiple organizations that perform similar functions, create a single branch point for that function instead of branching based on divisional lines.

For example, even if you have multiple marketing organizations that are responsible for a specific product line, create a single Marketing subtree. All marketing entries then belong to that tree.
Try to use only the traditional branch point attributes that are shown in the following table.

Traditional attributes increase the likelihood of retaining compatibility with third-party LDAP client applications. In addition, traditional attributes are known to the default directory schema, which simplifies the construction of entries for the branch distinguished name (DN).
Branch according to the type of data stored in the directory.

For example, you might create a separate branch for people, groups, service, and devices.

Table 4–1 Traditional DN Branch Point Attributes


Attribute Name	Definition
`c`	A country name.
`o`	An organization name. This attribute is typically used to represent a large divisional branching. The branching might include a corporate division, academic discipline, subsidiary, or other major branching within the enterprise. You should also use this attribute to represent a domain name.
`ou`	An organizational unit. This attribute is typically used to represent a smaller divisional branching of your enterprise than an organization. Organizational units are generally subordinate to the preceding organization.
`st`	A state or province name.
`l`	A locality, such as a city, country, office, or facility name.
`dc`	A domain component.

Be consistent when choosing attributes for branch points. Some LDAP client applications might fail if the DN format is inconsistent across your DIT. If l (localityName) is subordinate to o (organizationName) in one part of your DIT, ensure that l is subordinate to o in all other parts of your directory.

Replication Considerations

When designing a DIT, consider which entries will be replicated to other servers. If you want to replicate a specific group of entries to the same set of servers, those entries should fall below a specific subtree. To describe the set of entries to be replicated, specify the DN at the top of the subtree. For more information about replicating entries, see Chapter 4, Directory Server Replication, in Sun Java System Directory Server Enterprise Edition 6.0 Reference.

Access Control Considerations

A DIT hierarchy can enable certain types of access control. As with replication, it is easier to group similar entries and to administer the entries from a single branch.

A hierarchical DIT also enables distributed administration. For example, you can use the DIT to give an administrator from the marketing department access to marketing entries, and an administrator from the sales department access to sales entries.

You can also set access controls based on directory content, rather than the DIT. Use the ACI filtered target mechanism to define a single access control rule. This rule states that a directory entry has access to all entries that contain a particular attribute value. For example, you can set an ACI filter that gives the sales administrator access to all entries that contain the attribute ou=Sales.

However, ACI filters can be difficult to manage. You must decide which method of access control is best suited to your directory: organizational branching in the DIT hierarchy, ACI filters, or a combination of the two.

Designing a Directory Schema

The directory schema describes the types of data that can be stored in a directory. During schema design, each data element is mapped to an LDAP attribute. Related elements are gathered into LDAP object classes. A well-designed schema helps maintain data integrity by imposing constraints on the size, range, and format of data values. You decide what types of entries your directory contains and the attributes that are available to each entry.

The predefined schema that is included with Directory Server contains the Internet Engineering Task Force (IETF) standard LDAP schema. The schema contains additional application-specific schema to support the features of the server. It also contains Directory Server-specific schema extensions. While this schema meets most directory requirements, you might need to extend the schema with new object classes and attributes that are specific to your directory.

Schema Design Process

Schema design involves doing the following:

Mapping your data to the default schema.

To map existing data to the default schema, identify the type of object that each data element describes then select a similar object class from the default schema. Use the common object classes, such as groups, people, and organizations. Select a similar attribute from the matching object class that best matches the data element.
Identifying unmatched data.
Extending the default schema to define new elements to meet your remaining needs.

If data elements exist that do not match the object classes and attributes defined by the default directory schema, you can customize the schema. You can also extend the schema to impose additional constraints on the existing schema. For more information, see About Custom Schema in Sun Java System Directory Server Enterprise Edition 6.0 Administration Guide.
Planning for schema maintenance.

Where possible, use the existing schema elements that are defined in the default Directory Server schema. Standard schema elements help to ensure compatibility with directory-enabled applications. Because the schema is based on the LDAP standard, it has been reviewed and agreed to by a large number of directory users.

Maintaining Data Consistency

Consistent data assists LDAP client applications in locating directory entries. For each type of information that is stored in the directory, select the required object classes and attributes to support that information. Always use the same object classes and attributes. If you use schema objects inconsistently, it is difficult to locate information.

You can maintain schema consistency in the following ways:

Use schema checking to ensure that attributes and object classes conform to the schema rules.

For more information about schema checking, see Chapter 11, Directory Server Schema, in Sun Java System Directory Server Enterprise Edition 6.0 Administration Guide.
Select and apply a consistent data format.

The LDAP schema allows you to place any data on any attribute value. However, you should store data consistently in the DIT by selecting a format appropriate for your LDAP client applications and directory users. With the LDAP protocol and Directory Server, you must represent data using the data formats specified in RFC 4517.

Other Directory Data Resources

For more information about the standard LDAP schema, and about designing a DIT, see the following sites:

RFC 4510: Lightweight Directory Access Protocol (LDAP): Technical Specification Road Map http://www.ietf.org/rfc/rfc4510.txt
RFC 4511: Lightweight Directory Access Protocol (LDAP): The Protocol http://www.ietf.org/rfc/rfc4511.txt
RFC 4512: Lightweight Directory Access Protocol (LDAP): Directory Information Models http://www.ietf.org/rfc/rfc4512.txt
RFC 4513: Lightweight Directory Access Protocol (LDAP): Authentication Methods and Security Mechanisms http://www.ietf.org/rfc/rfc4513.txt
RFC 4514: Lightweight Directory Access Protocol (LDAP): String Representation of Distinguished Names http://www.ietf.org/rfc/rfc4514.txt
RFC 4515: Lightweight Directory Access Protocol (LDAP): String Representation of Search Filters http://www.ietf.org/rfc/rfc4515.txt
RFC 4516: Lightweight Directory Access Protocol (LDAP): Uniform Resource Locator http://www.ietf.org/rfc/rfc4516.txt
RFC 4517: Lightweight Directory Access Protocol (LDAP): Syntaxes and Matching Rules http://www.ietf.org/rfc/rfc4517.txt
RFC 4518: Lightweight Directory Access Protocol (LDAP): Internationalized String Preparation http://www.ietf.org/rfc/rfc4518.txt
RFC 4519: Lightweight Directory Access Protocol (LDAP): Schema for User Applications http://www.ietf.org/rfc/rfc4519.txt
Understanding and Deploying LDAP Directory Services. T. Howes, M. Smith, G. Good. Macmillan Technical Publishing, 1999

For a complete list of the RFCs and standards supported by Directory Server Enterprise Edition, see Appendix A, Standards and RFCs Supported by Directory Server Enterprise Edition, in Sun Java System Directory Server Enterprise Edition 6.0 Evaluation Guide.