access control instruction (ACI)
authentication password syntax
authorization identity control
Common Development and Distribution License
deprecated password storage scheme
Directory Services Markup Language
entry change notification control
extensible match search filter
greater than or equal to search filter
less than or equal to search filter
Lightweight Directory Access Protocol
notice of disconnection unsolicited notification
Password Modify extended operation
Simple Authentication and Security Layer
virtual attributes only control
The Basic Encoding Rules (BER) are a set of Abstract Syntax Notation One encoding rules that define a specific way in which information may be encoded in a binary form. It is used as the underlying mechanism for encoding message.
Many network protocols are text-based, which has the advantages of being relatively easy to understand if you examine the network traffic, and in many cases you can even interact with the target server by telnetting to it and typing in the appropriate commands. However, there are disadvantages as well, including that they are generally more verbose and less efficient to parse than they need to be. On the other hand, other protocols use a binary encoding that is more compact and more efficient. LDAP falls into this category, and uses the ASN.1 (abstract syntax notation one) mechanism, and more specifically the BER (basic encoding rules) flavor of ASN.1. There are a number of other encoding rules (such as DER, PER, and CER) that fall under the ASN.1 umbrella, but LDAP uses BER.
This section discusses the subset of BER that is used by LDAP in particular and does not address other cases.
BER elements use a TLV structure, where TLV stands for “type”, “length”, and “value”. That is, each BER element has one or more bytes (in LDAP, typically only a single byte) that indicates the data type for the element, one or more bytes that indicate the length of the value, and the encoded value itself (where the form of the encoded value depends on the data type), which can be zero or more bytes, as described in the following sections:
The BER type
The BER Length
The BER Value
The BER type indicates the data type for the value of the element. The BER specification provides several different data types, but the most commonly used by LDAP include OCTET STRING (which can be either a text string or just some binary data), INTEGER, BOOLEAN, NULL, ENUMERATED (like an integer, but where each value has a special meaning), SEQUENCE (an ordered collection of other elements, similar to an array), and SET (the same as a sequence, except that the order doesn't matter). There is also a CHOICE element, but it typically allows one of a few different kinds of elements.
The BER type is typically only a single byte, and this byte has data encoded in it. The two most significant bits (the two leftmost bits, because BER uses big endian/network ordering) are used to indicate the class for the element, using these possible class values:
The universal class. Most BER elements have a universal type, so any element with a universal type specifies what kind of data it holds. Examples of universal types include 0x01 (BOOLEAN), 0x02 (INTEGER), 0x04 (OCTET STRING), 0x05 (NULL), 0x0A (ENUMERATED), 0x30 (SEQUENCE), and 0x31 (SET). The binary encodings for all of those type values have the leftmost two bits set to zero.
The application-specific class. This class allows an application to define its own types that are consistent throughout that application. In this context, LDAP is considered an application. For example, when 0x42 appears in LDAP, it indicates an unbind request protocol op, because RFC 2251 section 4.3 states that the unbind request protocol op has a type of [APPLICATION 2].
The context-specific class. This class indicates that the type is specific to a particular usage within a given application. The same type can be re-used in different contexts in the same application as long as there is enough other information to determine which context is applicable in a given situation. For example, in the context of the credentials in a bind request protocol op, the context-specific type 0x80 is used to hold the bind password, but in the context of an extended operation it would be used to hold the request OID.
The private class, not typically used in LDAP.
The next bit (the third from the left) is the primitive/constructed bit. If it is set to zero (off), then the element is considered primitive, and the value is encoded in accordance with the rules of that data type. If it is set to one (on), then it means that the value is constructed from zero or more other ASN.1 elements that are concatenated together in their encoded forms. For example, for the universal SEQUENCE type of 0x30, the binary encoding is 00110000 and the primitive/constructed bit is set to one indicating that the value of the sequence is constructed from zero or more encoded elements.
The final five bits of the BER type byte specify the value of that type, and they are treated as a simple integer value (where 00000 is zero, 00001 is one, 00010 is two, 00011 is three, and so on). The only special value is 11111, which means that the type value is larger than can fit in the five bits allowed, and so multiple bytes are required. This value is not used in LDAP.
The second component in the TLV structure of a BER element is the length. This specifies the size in bytes of the encoded value. For the most part, this uses a straightforward binary encoding of the integer value (for example, if the encoded value is five bytes long, then it is encoded as 00000101 binary, or 0x05 hex), but if the value is longer than 127 bytes then it is necessary to use multiple bytes to encode the length. In that case, the first byte has the leftmost bit set to one and the remaining seven bits are used to specify the number of bytes required to encode the full length. For example, if there are 500 bytes in the length (hex 0x01F4), then the encoded length will actually consist of three bytes: 82 01 F4.
Note that there is an alternate form for encoding the length called the indefinite form. In this mechanism, only a part of the length is given at a time, similar to the chunked encoding that is available in HTTP 1.1. However, this form is not used in LDAP, as specified in RFC 2251 section 5.1.
The BER element contains the actual data of the element. Because BER is a binary encoding, the encodings can take advantage of that to represent the data in a compact form. As such, each data type has its own encoded form:
The NULL element never has a value, and therefore the length is always zero.
The value of this element is encoded as a concatenation of the raw bytes of the data being represented. For example, to represent the string Hello, the encoded value would be 48 65 6C 6C 6F. The value can have a length of zero bytes.
The value of this element is always a single byte. If all the bits in that byte are set to zero (0x00), then the value is FALSE. If one or more of the bytes is set to one, then the value is TRUE. As a result, there are 255 different ways to encode a BOOLEAN value of TRUE, but in practice it is generally encoded as 0xFF (that is, all the bits are set to one).
The value of this element is encoded as a binary integer in two's complement form. Although BER itself does not place a limit on the magnitude of the values that can be encoded, many software implementations have a cap of four or eight bytes (that is, 32-bit or 64-bit integer values), and LDAP generally uses a maximum of 4 bytes (which allows encoding values within the plus or minus 2 billion range). There is always at least one byte in the value.
The value of this element is encoded in exactly the same way as the value of an INTEGER element.
The value of this element is simply a concatenation of the encoded BER elements contained in the sequence. For example, to encode a sequence with two octet string elements encoding the text Hello and there, the encoded sequence value is 04 05 48 65 6C 6C 6F 04 05 74 68 65 72 65. A sequence value can be zero bytes if there are no elements in the sequence.
The value of this element is encoded in exactly the same way as the value of a SEQUENCE element.
The example above for encoding a SEQUENCE value had two complete BER elements concatenated together: the OCTET STRING representations of the strings Hello and there:
04 05 48 65 6C 6C 6F 04 05 74 68 65 72 65
In both of these cases, the first byte is the type (0x04, which is the universal primitive OCTET STRING type), and the second is the length (0x05, indicating that there are five bytes in the value). The remaining five bytes are the encoded representations of the strings Hello and there.
The following example encodes the integer value 3 using a context-specific type value of 5 instead of the universal INTEGER type:
85 01 03
The next example encodes an LDAP bind request protocol op as defined in RFC 2251 section 4.2. A simplified BNF representation of this element is as follows:
BindRequest ::= [APPLICATION 0] SEQUENCE { version INTEGER (1 .. 127), name OCTET STRING, authentication CHOICE { simple [0] OCTET STRING, sasl [3] SEQUENCE { mechanism OCTET STRING, credentials OCTET STRING OPTIONAL } } }
This example encodes a bind request using simple authentication for the user cn=test with a password of password. The complete encoding for this bind request protocol op is:
60 16 02 01 03 04 07 63 6E 3D 74 65 73 74 80 08 70 61 73 73 77 6F 72 64
In analysis, that string of bytes contains the following information:
The first byte is 0x60 and it is the BER type for the bind request protocol op. It comes from the [APPLICATION 0] SEQUENCE portion of the definition. Because it is application-specific, then the class bytes are 01, and because it is a SEQUENCE, it is constructed. Put that together with a type value of zero, the binary representation is 01100000, which is 0x60 hex.
The second byte is 0x16, which indicates the length of the bind request sequence. 0x16 hex is 22 decimal, and the number of bytes after the 0x16 is 22.
The next three bytes are 02 01 03, which is a universal INTEGER value of 3. It corresponds to the version component of the bind request sequence, and it indicates that this is an LDAPv3 bind request.
The next nine bytes are 04 07 63 6E 3D 74 65 73 74, which is a universal OCTET STRING containing the text cn=test. It corresponds to the “name” component of the bind request sequence.
The last component is 80 08 70 61 73 73 77 6F 72 64, which is an element with a type of context-specific primitive 0 and a length of eight bytes. As specified in the definition of the bind request protocol op, context-specific maps to the simple authentication type and that it should be treated as an OCTET STRING, and those eight bytes in the value do represent the encoded string password.