Sun Java System Directory Server Enterprise Edition 6.2 Deployment Planning Guide

Chapter 4 Defining Data Characteristics

The type of data in your directory determines how you structure the directory, who can access the data, and how access is granted. Data types can include, among others, user names, email addresses, telephone numbers, and information about groups to which users belong.

This chapter explains how to locate, categorize, structure, and organize data. It also explains how to map data to the Directory Server schema. This chapter covers the following topics:

Determining Data Sources and Ownership

The first step in categorizing existing data is to identify where that data comes from and who owns it.

Identifying Data Sources

To identify the data to be included in your directory, locate and analyze existing data sources.

Determining Data Ownership

Data ownership refers to the person or organization that is responsible for ensuring that data is up-to-date. During the data design phase, decide who can write data to the directory. Common strategies for determining data ownership include the following:

As you determine who can write to the data, you might find that multiple individuals require write access to the same information. For example, an information systems or directory management group should have write access to employee passwords. You might also want all employees to have write access to their own passwords. While you generally must give multiple people write access to the same information, try to keep this group small and easy to identify. Small groups help to ensure your data’s integrity.

For information about setting access control for your directory, see Chapter 6, Directory Server Access Control, in Sun Java System Directory Server Enterprise Edition 6.2 Administration Guide and How Directory Server Provides Access Control in Sun Java System Directory Server Enterprise Edition 6.2 Reference.

Distinguishing Between User and Configuration Data

To distinguish between data used to configure Directory Server and other Java Enterprise System servers and the actual user data stored in the directory, do the following:

Identifying Data From Disparate Data Sources

When determining data sources, ensure that you include data from other data sources, including legacy data sources. This data might not be stored in the directory. However, Directory Server might need to have some knowledge of, or control over, the data.

Directory Proxy Server provides a virtual directory feature that aggregates information, in real-time, from multiple data repositories. These repositories include LDAP directories, data that complies with the JDBC specification, and LDIF flat files.

The virtual directory supports complex filters that handle attributes from different data sources. It also supports modifications that combine attributes from different data sources.

During the data analysis phase, you might find that the same data is required by several applications, but in a different format. Instead of duplicating this information, it is preferable to have the applications transform it for their requirements.

Structuring Data With the Directory Information Tree

The directory information tree (DIT) provides a way to structure directory data so that the data can be referred to by client applications. The DIT interacts closely with other design decisions, including how you distribute, replicate, or control access to directory data.

DIT Terminology

A well-designed DIT provides the following:

The DIT structure follows the hierarchical LDAP model. The DIT organizes data, for example, by group, by people, or by geographical location. It also determines how data is partitioned across multiple servers.

DIT design has an impact on replication configuration and on how you use Directory Proxy Server to distribute data. If you want to replicate or distribute certain portions of a DIT, consider replication and the requirements of Directory Proxy Server at design time. Also, decide at design time whether you require access controls on branch points.

A DIT is defined in terms of suffixes, subsuffixes, and chained suffixes. A suffix is a branch or subtree whose entire contents are treated as a unit for administrative tasks. Indexing is defined for an entire suffix, and an entire suffix can be initialized in a single operation. A suffix is also usually the unit of replication. Data that you want to access and manage in the same way should be located in the same suffix. A suffix can be located at the root of the directory tree, where it is called a root suffix.

Because data can only be partitioned at the suffix level, an appropriate directory tree structure is required to spread data across multiple servers.

The following figure shows a directory with two root suffixes. Each suffix represents a separate corporate entity.

Figure 4–1 Two Root Suffixes in a Single Directory Server

Directory information tree with two root suffixes

A suffix might also be a branch of another suffix, in which case it is called a subsuffix. The parent suffix does not include the contents of the subsuffix for administrative operations. The subsuffix is managed independently of its parent. Because LDAP operation results contain no information about suffixes, directory clients are unaware of whether entries are part of root suffixes or subsuffixes.

The following figure shows a directory with a single root suffix and multiple subsuffixes for a large corporate entity.

Figure 4–2 One Root Suffix With Multiple Subsuffixes

Directory information tree with a single root suffix
and multiple subsuffixes

A suffix corresponds to an individual database within the server. However, databases and their files are managed internally by the server and database terminology is not used.

Chained suffixes create a virtual DIT by referencing suffixes on other servers. With chained suffixes, Directory Server performs the operation on the remote suffix. The directory then returns the result as if the operation had been performed locally. The location of the data is transparent. The client is unaware that the suffix is chained and that the data is retrieved from a remote server. A root suffix on one server can have subsuffixes that are chained to another server. In this scenario, the client is aware of a single tree structure.

In the special case of cascading chaining, the chained suffix might reference another chained suffix on the remote server, and so on. Each server forwards the operation and eventually returns the result to the server that handles the client’s request.

Designing the DIT

DIT design involves choosing a suffix to contain your data, determining the hierarchical relationship between data entries, and naming the entries in the DIT hierarchy. The following sections describe the design process in more detail.

Choosing a Suffix

The suffix is the name of the entry at the root of the DIT. If you have two or more DITs that do not have a natural common root, you can use multiple suffixes. The default Directory Server installation contains multiple suffixes. One suffix is used to store user data. The other suffixes are for data that is needed by internal directory operations, such as configuration information and directory schema.

All directory entries must be located below a common base entry, the suffix. Each suffix name must be as follows:

It is generally considered best practice to map your enterprise domain name to a Distinguished Name (DN). For example, an enterprise with the domain name example.com would use a DN of dc=example,dc=com.

Creating the DIT Structure and Naming Entries

The structure of a DIT can be flat or hierarchical. Although a flat tree is easier to manage, a degree of hierarchy might be required for data partitioning, replication management, and access control.

Branch Points and Naming Considerations

A branch point is a point at which you define a new subdivision within the DIT. When deciding on branch points, avoid potential problematic name changes. The likelihood of a name changing is proportional to the number of components in the name that can potentially change. The more hierarchical the DIT, the more components in the names, and the more likely the names are to change.

Use the following guidelines when defining and naming branch points:

Table 4–1 Traditional DN Branch Point Attributes

Attribute Name  

Definition  

c

A country name. 

o

An organization name. This attribute is typically used to represent a large divisional branching. The branching might include a corporate division, academic discipline, subsidiary, or other major branching within the enterprise. You should also use this attribute to represent a domain name. 

ou

An organizational unit. This attribute is typically used to represent a smaller divisional branching of your enterprise than an organization. Organizational units are generally subordinate to the preceding organization. 

st

A state or province name. 

l

A locality, such as a city, country, office, or facility name. 

dc

A domain component. 

Be consistent when choosing attributes for branch points. Some LDAP client applications might fail if the DN format is inconsistent across your DIT. If l (localityName) is subordinate to o (organizationName) in one part of your DIT, ensure that l is subordinate to o in all other parts of your directory.

Replication Considerations

When designing a DIT, consider which entries will be replicated to other servers. If you want to replicate a specific group of entries to the same set of servers, those entries should fall below a specific subtree. To describe the set of entries to be replicated, specify the DN at the top of the subtree. For more information about replicating entries, see Chapter 4, Directory Server Replication, in Sun Java System Directory Server Enterprise Edition 6.2 Reference.

Access Control Considerations

A DIT hierarchy can enable certain types of access control. As with replication, it is easier to group similar entries and to administer the entries from a single branch.

A hierarchical DIT also enables distributed administration. For example, you can use the DIT to give an administrator from the marketing department access to marketing entries, and an administrator from the sales department access to sales entries.

You can also set access controls based on directory content, rather than the DIT. Use the ACI filtered target mechanism to define a single access control rule. This rule states that a directory entry has access to all entries that contain a particular attribute value. For example, you can set an ACI filter that gives the sales administrator access to all entries that contain the attribute ou=Sales.

However, ACI filters can be difficult to manage. You must decide which method of access control is best suited to your directory: organizational branching in the DIT hierarchy, ACI filters, or a combination of the two.

Grouping Directory Data and Managing Attributes

The directory information tree organizes entries hierarchically. This hierarchy is a type of grouping mechanism. The hierarchy is not well suited for associations between dispersed entries, for organizations that change frequently, or for data that is repeated in many entries. Directory Server groups and roles offer more flexible associations between entries. The class of service (CoS) mechanism enables you to manage attributes so that the attributes are shared between entries. This sharing is done in a way that is invisible to applications.

These entry grouping and attribute management mechanisms are described in detail in Chapter 8, Directory Server Groups and Roles, in Sun Java System Directory Server Enterprise Edition 6.2 Reference and in Chapter 9, Directory Server Class of Service, in Sun Java System Directory Server Enterprise Edition 6.2 Reference.

This section provides an overview of the grouping mechanisms that is sufficient to design an administrative strategy. It does not explain how the mechanisms work or how to set them up.

The section is divided into the following topics:

Static, Dynamic, and Nested Groups

Directory Server distinguishes between two main kinds of groups; static groups and dynamic groups. Nested groups and mixed groups

Although groups may identify members anywhere in the directory, the group definitions themselves should be located under an appropriately named node such as ou=Groups. This makes them easy to find, for example, when defining access control instructions (ACIs) that grant or restrict access when the bind credentials are members of a group.

Static Groups

Static groups explicitly name their member entries. For example, a group of directory administrators would name the specific people who formed part of that group, as shown in the following illustration.

Figure shows logic of static group

The following LDIF extract shows how the members of this static group would be defined.

dn: cn=Directory Administrators, ou=Groups, dc=example,dc=com
...
member: uid=kvaughan, ou=People, dc=example,dc=com
member: uid=rdaugherty, ou=People, dc=example,dc=com
member: uid=hmiller, ou=People, dc=example,dc=com

Dynamic Groups

Dynamic groups specify a filter and all entries that match the filter are members of the group. These groups are dynamic because membership is defined each time the filter is evaluated.

Imagine, for example, that all management employees and their assistants were situated on the 3rd floor of your building, and that the room number of each employee commenced with the number of the floor. If you wanted to create a group containing just the employees on the third floor, you could use the room number to define just these employees, as shown in the following illustration.

Figure shows logic of dynamic group

The following LDIF extract shows how the members of this dynamic group would be defined.

dn: cn=3rd Floor, ou=Groups, dc=example,dc=com
...
memberURL: ldap:///dc=example,dc=com??sub?(roomnumber=3*)

Nested Groups

Nested groups use the DN of another group as the uniqueMember attribute of a static or dynamic group to place groups inside other groups. Directory Server also supports mixed groups, that is groups that reference individual entries, static groups, and dynamic groups.

Imagine for example that you wanted a group containing all directory administrators, and all management employees and their assistants. You could use a combination of the two groups defined earlier to create one nested group, as shown in the following illustration.

Figure shows logic of nested group

The following LDIF extract shows how the members of this nested group would be defined.

dn: cn=Admins and 3rd Floor, ou=Groups, dc=example,dc=com
...
member: cn=Directory Administrators, ou=Groups, dc=example,dc=com
member: cn=3rd Floor, ou=Groups, dc=example,dc=com

Nested groups are not the most efficient grouping mechanism. Dynamic nested groups incur an even greater performance cost. To avoid these performance problems, consider using roles instead.

Managed, Filtered, and Nested Roles

Roles are an entry grouping mechanism. Roles enable you to determine role membership as soon as an entry is retrieved from the directory. Each role has members, or entries that possess the role. As with groups, you can specify role members explicitly or dynamically.

Directory Server supports the following three types of roles:

Deciding Between Groups and Roles

The functionality of the groups and roles mechanisms overlap somewhat. Both mechanisms have advantages and disadvantages. Generally, the roles mechanism is designed to provide frequently required functionality more efficiently. Because the choice of a grouping mechanism influences server complexity and determines how clients process membership information, you must plan your grouping mechanism carefully. To decide which mechanism is more suitable, you need to understand the typical membership queries and management operations that are performed.

Advantages of the Groups Mechanism

Groups have the following advantages:

Advantages of the Roles Mechanism

Roles have the following advantages:

Restricting Permissions on Roles

Be aware of the following issues when using roles:

Managing Attributes With Class of Service

The Class of Service (CoS) mechanism allows attributes to be shared between entries. Like the role mechanism, CoS generates virtual attributes on the entries as the entries are retrieved. CoS does not define membership, but it does allow related entries to share data for coherency and space considerations. CoS values are calculated dynamically when the values are requested. CoS functionality and the various types of CoS are described in detail in the Sun Java System Directory Server Enterprise Edition 6.2 Reference.

The following sections examine the ways in which you can use the CoS functionality as intended, while avoiding performance pitfalls:


Note –

CoS generation always impacts performance. Client applications that search for more attributes than they actually need can compound the problem.

If you can influence how client applications are written, remind developers that client applications perform much better when looking up only those attribute values that they actually need.


Using CoS When Many Entries Share the Same Value

CoS provides substantial benefits for relatively low cost when you need the same attribute value to appear on numerous entries in a subtree.

Imagine, for example, a directory for MyCompany, Inc. in which every user entry under ou=People has a companyName attribute. Contractors have real values for companyName attributes on their entries, but all regular employees have a single CoS-generated value, MyCompany, Inc., for companyName. The following figure demonstrates this example with pointer CoS. Notice that CoS generates companyName values for all permanent employees without overriding real, not CoS-generated, companyName values stored for contractor employees. The company name is generated only for those entries for which companyName is an allowed attribute.

Figure 4–3 Generating CompanyName With Pointer CoS

Figure shows the CompanyName attribute generated with
Pointer CoS.

In cases where many entries share the same value, pointer CoS works particularly well. The ease of maintaining companyName for permanent employees offsets the additional processing cost of generating attribute values. Deep directory information trees (DITs) tend to bring together entries that share common characteristics. Pointer CoS can be used in deep DITs to generate common attribute values by placing CoS definitions at appropriate branches in the tree.

Using CoS When Entries Have Natural Relationships

CoS also provides substantial data administration benefits when directory data has natural relationships.

Consider an enterprise directory in which every employee has a manager. Every employee shares a mail stop and fax number with the nearest administrative assistant. Figure 4–4 demonstrates the use of indirect CoS to retrieve the department number from the manager entry. In Figure 4–5, the mail stop and fax number are retrieved from the administrative assistant entry.

Figure 4–4 Generating DepartmentNumber With Indirect CoS

Figure shows the DepartmentNumber attribute generated
with Indirect CoS.

In this implementation, the manager’s entry has a real value for departmentNumber, and this real value overrides any generated value. Directory Server does not generate attribute values from CoS-generated attribute values. Thus, in the Figure 4–4 example, the department number attribute value needs to be managed only on the manager's entry. Likewise, for the example shown in Figure 4–5, mail stop and fax number attributes need to be managed only on the administrative assistant’s entry.

Figure 4–5 Generating Mail Stop and Fax Number With Indirect CoS

Figure shows Mail Stop and Fax Number attributes generated
with Indirect CoS.

A single CoS definition entry can be used to exploit relationships such as these for many different entries in the directory.

Another natural relationship is service level. Consider an Internet service provider that offers customers standard, silver, gold, and platinum packages. A customer’s disk quota, number of mailboxes, and rights to prepaid support levels depend on the service level purchased. The following figure demonstrates how a classic CoS scheme enables this functionality.

Figure 4–6 Generating Servic-Level Data With Classic CoS

Figure shows service level data generated with Classic
CoS.

One CoS definition might be associated with multiple CoS template entries.

Avoiding Excessive CoS Definitions

Directory Server optimizes CoS when one classic CoS definition entry is associated with multiple CoS template entries. Directory Server does not optimize CoS if many CoS definitions potentially apply. Instead, Directory Server checks each CoS definition to determine whether the definition applies. This behavior leads to performance problems if you have thousands of CoS definitions.

This situation can arise in a modified version of the example shown in Figure 4–6. Consider an Internet service provider that offers customers delegated administration of their customers’ service level. Each customer provides definition entries for standard, silver, gold, and platinum service levels. Ramping up to 1000 customers means creating 1000 classic CoS definitions. Directory Server performance would be affected as it runs through the list of 1000 CoS definitions to determine which apply. If you must use CoS in this sort of situation, consider indirect CoS. In indirect CoS, customers’ entries identify the entries that define their class of service allotments.

When you start approaching the limit of having different CoS schemes for every target entry or two, you are better off updating the real values. You then achieve better performance by reading real, not CoS-generated values.

Designing a Directory Schema

The directory schema describes the types of data that can be stored in a directory. During schema design, each data element is mapped to an LDAP attribute. Related elements are gathered into LDAP object classes. A well-designed schema helps maintain data integrity by imposing constraints on the size, range, and format of data values. You decide what types of entries your directory contains and the attributes that are available to each entry.

The predefined schema that is included with Directory Server contains the Internet Engineering Task Force (IETF) standard LDAP schema. The schema contains additional application-specific schema to support the features of the server. It also contains Directory Server-specific schema extensions. While this schema meets most directory requirements, you might need to extend the schema with new object classes and attributes that are specific to your directory.

Schema Design Process

Schema design involves doing the following:

Where possible, use the existing schema elements that are defined in the default Directory Server schema. Standard schema elements help to ensure compatibility with directory-enabled applications. Because the schema is based on the LDAP standard, it has been reviewed and agreed to by a large number of directory users.

Maintaining Data Consistency

Consistent data assists LDAP client applications in locating directory entries. For each type of information that is stored in the directory, select the required object classes and attributes to support that information. Always use the same object classes and attributes. If you use schema objects inconsistently, it is difficult to locate information.

You can maintain schema consistency in the following ways:

Other Directory Data Resources

For more information about the standard LDAP schema, and about designing a DIT, see the following sites:

For a complete list of the RFCs and standards supported by Directory Server Enterprise Edition, see Appendix A, Standards and RFCs Supported by Directory Server Enterprise Edition, in Sun Java System Directory Server Enterprise Edition 6.2 Evaluation Guide.