Data Distribution Using the Proxy

The distribution feature in Oracle Unified Directory proxy addresses the challenge of large deployments, such as horizontal scalability, where all the entries cannot be held on a single data source, or LDAP server. Using distribution can also help you scale the number of updates per second.

When using the Oracle Unified Directory proxy with distribution, you must first split the data in your database into smaller chunks. To split the data, you can use the split-ldif command. These chunks of data are called partitions. Typically, each partition is stored on a separate server.

The split of the data is based on one of the following distribution algorithms:

Capacity. Entries are added to a partition based on the capacity of each partition. This algorithm is used for Add requests only. All other requests are distributed by the global index catalog or by a broadcast.
Numeric. Entries are split into partitions and distributed based on the numeric value of the naming attribute (for example uid).
Lexico. Entries are split into partitions and distributed based on the alphabetical value of the naming attribute (for example cn).
DN pattern. Entries are split into partitions and distributed based on the pattern (value) of the entry DN.

The type of data distribution you choose will depend on how the data in your directory service is organized. Numeric and lexico distribution have a very specific format for distribution. DN pattern can be adapted to match an existing data distribution model.

If a client request (except Add) cannot be linked to one of the distribution partitions, the Oracle Unified Directory proxy broadcasts the incoming request to all the partitions, unless a global index catalog has been configured.

However, if the request is clearly identified as outside the scope of the distribution, the request is returned with an error indicating that the entry does not exist. For example, if the distribution partitions includes data with uid's from 1–100 (partition1) and 100–200 (partition2) but you run a search where the base DN is uid=222,ou=people,dc=example,dc=com, the Oracle Unified Directory proxy will indicate that the entry does not exist.

Moreover, for the numeric and lexico algorithms, it is the first RDN after the distribution base DN that is used to treat a request. For example, the following search will return an error, as the uid is not the first RDN after the distribution base DN, for example ou=people,dc=example,dc=com.

$ ldapsearch -b "uid=1010,o=admin,ou=people,dc=example,dc=com" "objectclass=*"

Consider the number of partitions carefully. When you define the number of partitions you want in your deployment, you should note that you cannot split and redistribute the data into new partitions without downtime. You can, however, add a new partition with data that has entries outside the initial ones.

For example, if the initial partitions cover data with uids from 1–100 (partition1) and 100–200 (partition2), you can later add a partition3 which includes uids from 200–300. However, you cannot easily split partition1 and partition2 so that partition1 includes uids 1–150 and partition2 includes uids 150–300, for example. Splitting partitions is essentially like reconfiguring a new distribution deployment.

Numeric Distribution

With a distribution using numeric algorithm, the Oracle Unified Directory proxy forwards requests to one of the partitions, based on the numeric value of the first RDN after the distribution base DN in the request. When you set up distribution with numeric algorithm, you split the data of your database into different partitions based on a numerical value of the attribute of your choice, as long as the attribute represents a numerical string. The proxy then forwards all client requests to the appropriate partition, using the same numeric algorithm.

For example, you could split your data into two partitions based on the uid of the entries, as illustrated in Figure 5-7.

Figure 5-7 Numeric Distribution Example

Numeric distribution over two partitions, based on uid.

In this example, a search for an entry with a uid of 1111 is sent to Partition 1, while a search for an entry with a uid of 2345 is sent to Partition 2. Any request for an entry with a uid outside the scope of the partitions defined will indicate that no such entry exists.

Note - The upper boundary limit of a distribution algorithm is exclusive. This means that a search for uid 3000 in the example above returns an error indicating that the entry does not exist.

Example 5-1 Examples of Searches Using Numeric Distribution Algorithm

The following search will be successful:

$ ldapsearch -b "uid=1010,ou=people,cn=example,cn=com" "cn=Ben"

However, the following searches will indicate that the entry does not exist (with result code 32):

$ ldapsearch -b "uid=1010,o=admin,ou=people,cn=example,cn=com" "objectclass=*"

$ ldapsearch -b "uid=99,ou=people,cn=example,cn=com" "objectclass=*"

The following search will be broadcast, as the Oracle Unified Directory proxy cannot determine the partition to which the entry belongs, using the distribution algorithm defined above:

$ ldapsearch -b "ou=people,cn=example,cn=com" "uid=*"

Lexico Distribution

With a distribution using lexico algorithm, the Oracle Unified Directory proxy forwards requests to one of the partitions, based on the alphabetical value of the first RDN after the distribution base DN in the request. When you set up distribution with lexico algorithm, you split the data of your database into different partitions, based on an alphabetical value of the attribute of your choice. The proxy then forwards all client requests to the appropriate partition, using the same algorithm.

For example, you could split your data into two partitions based on the cn of the entries, as illustrated in Figure 5-8.

Figure 5-8 Lexico Distribution Example

Lexico distribution over two partitions, based on alphabetical entries using cn.

In this example, any requests for an entry with a cn starting with B such as Ben are sent to Partition 1, while requests for an entry with a cn from M–Y are sent to Partition 2.

Note - The upper boundary limit of a distribution algorithm is exclusive. This means that a search for cn= Zachary in the example above will indicate that no such entry is found. In order to include entries starting with Z in the search boundaries, then you should use the unlimited keyword. For example, cn=[M..unlimited[ will include all entries beyond M.

Example 5-2 Examples of Searches Using Lexico Distribution Algorithm

The following search will be successful:

$ ldapsearch -b "cn=Ben,ou=people,cn=example,cn=com" "objectclass=*"

The following search will also be successful, as cn=Ben is the first RDN.

$ ldapsearch -b "uid=1010,cn=Ben,ou=people,cn=example,cn=com" "objectclass=*"

However, the following searches will indicate that the entry does not exist (with result code 32):

$ ldapsearch -b "cn=Ben,o=admin,ou=people,cn=example,cn=com" "objectclass=*"

$ ldapsearch -b "cn=Zach,ou=people,cn=example,cn=com" "objectclass=*"

The distribution cannot determine to which partition the following search belongs and will be broadcast:

$ ldapsearch -b "ou=people,cn=example,cn=com" "cn=*"

Capacity Distribution

With a capacity-based distribution, the Oracle Unified Directory proxy sends Add requests based on the capacity of each partition, which is determined by the maximum number of entries the partitions can hold. All other requests are distributed by the global index catalog or by broadcast.

Because the data is distributed to the partitions in a completely random manner, the easiest way to identify on which partition a particular data entry is by using a global index. Global index is mandatory when using capacity distribution. If no global index is set up, all requests other than Add will have to be broadcast. For more information on global indexes, see Global Index Catalog and Configuring Global Indexes By Using the Command Line in Oracle Fusion Middleware Administration Guide for Oracle Unified Directory.

Figure 5-9 Capacity Distribution Example

Capacity-based distribution, where Server 1 has twice the capacity for entries.

In the example illustrated in Figure 5-9, Partition 1 has twice the capacity of Partition 2, therefore Partition 1 will receive twice the add requests sent to Partition 2. This way, both partitions should be full at the same time. When all the partitions are full, the distribution will send one request to each partition at each cycle.

DN Pattern Distribution

With a distribution using DN pattern algorithm, the Oracle Unified Directory proxy forwards requests to one of the partitions, based on the match between a request base DN and a string pattern. The match is only perform on the relative part of the request base DN, that is, the part after the distribution base DN. For example, you could split your data into two partitions based on a the DN pattern in the uid of the entries, as illustrated in Figure 5-10.

Distribution using DN pattern is more onerous than distribution with numeric or lexico algorithm. If possible, use another distribution algorithm.

Figure 5-10 DN Pattern Distribution Example

Distribution using DN pattern of uid, over two partitions.

In this example, all the data entries with a uid that ends with 0, 1, 2, 3, or 4 will be sent to Partition 1. Data entries with a uid that ends with 5, 6, 7, 8, or 9 will be sent to Partition 2.

This type of distribution, although using numerical values is quite different from numeric distribution. In numerical distribution, the data is partitioned based on a numerical range, while DN pattern distribution is based on a pattern in the data string.

Distribution using a DN pattern algorithm is typically used in cases where the distribution partitions do not correspond exactly to the distribution base DN. For example, if the data is distributed as illustrated in Figure 5-11, the data for Partition 1 and Partition 2 is in both base DN ou=people,ou=region1 and ou=people,ou=region2. The only way to distribute the data easily is to use the DN pattern.

Figure 5-11 Example of Directory Information Tree

Figure showing base DN distribution, where partitions are split over ou=region1 and ou=region2.

Example 5-3 Example of DN Pattern Algorithm Split by Region

If the deployment of the information is based in two geographical locations, it may be easier to use the DN pattern distribution to distribute the data. For example, if employee numbers were 4 digit codes, where the first digit indicated the region, then you could have the following:

Region 1	Region 2
1000	2000
1001	2001
1002	2002
1003	2003
1004	2004
1005	2005
1006	2006
1007	2007
1008	2008
1009	2009
1010	2010
...	...

In order to spread the load of data, the entries in each location are split over two servers, where Server 1 contains all entries that end with 0, 1, 2, 3, and 4, while Server 2 contains all the entries that end with 5, 6, 7, 8, and 9, as illustrated in Figure 5-10.

Therefore, a search for DN pattern 1222 would be sent to partition 1, as would 2222.

Skip Navigation Links
Exit Print View
	Oracle Fusion Middleware Deployment Planning Guide for Oracle Unified Directory 11g Release 1 (11.1.1)