DTLS-SRTP Overview

RFC 5763 specifies the use of DTLS with SRTP, known as DTLS-SRTP. Within this architecture, DTLS, which is based on TLS, provides key management, parameter negotiation, and secure data transfer. SRTP provides confidentiality, message authentication, and replay protection. Combining these protocols as DTLS-SRTP establishes secure SRTP flows.

Exchanging keys in unprotected, text-based SDP is unacceptable because malicious network elements can easily eavesdrop and obtain the plaintext keys, compromising the privacy and integrity of the encrypted media stream. Consequently, the SDP exchange must be protected by a security protocol.

While the SRTP framework provides encryption and authentication procedures and defines the set of default cryptographic transforms required for RFC compliance, it does not specify a key management protocol to securely derive and exchange cryptographic keys. As with SDES deployments, these missing functions must come from another mechanism. DTLS-SRTP, defined by RFC 5763 and RFC 5764, can provide key management for SRTP.
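As an illustration of the key management that DTLS-SRTP adds, the following Python sketch splits keying material exported from a completed DTLS handshake into SRTP master keys and salts. It assumes the SRTP_AES128_CM_HMAC_SHA1_80 protection profile (16-byte keys, 14-byte salts) and the ordering described in RFC 5764. The names SrtpKeys and split_keying_material are hypothetical and are not part of any ESBC interface; obtaining the exported material from a DTLS stack is outside the scope of the sketch.

```python
from typing import NamedTuple

# Lengths assumed for the SRTP_AES128_CM_HMAC_SHA1_80 profile (RFC 5764).
KEY_LEN = 16   # master key length in bytes
SALT_LEN = 14  # master salt length in bytes


class SrtpKeys(NamedTuple):
    """Hypothetical container for the four values carved from the export."""
    client_key: bytes
    server_key: bytes
    client_salt: bytes
    server_salt: bytes


def split_keying_material(material: bytes) -> SrtpKeys:
    """Split exported keying material into client/server keys and salts.

    Order assumed from RFC 5764: client key, server key,
    client salt, server salt.
    """
    expected = 2 * (KEY_LEN + SALT_LEN)
    if len(material) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(material)}")
    salts_start = 2 * KEY_LEN
    return SrtpKeys(
        client_key=material[:KEY_LEN],
        server_key=material[KEY_LEN:salts_start],
        client_salt=material[salts_start:salts_start + SALT_LEN],
        server_salt=material[salts_start + SALT_LEN:],
    )
```

The DTLS client encrypts its outbound SRTP with the client key and salt, and the DTLS server uses the server key and salt, so each direction of a flow is protected with its own material.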

On the ESBC, DTLS-SRTP operation begins with the caller issuing a SIP INVITE with SDP parameters that request a DTLS exchange between the end stations. The callee processes the SIP signaling and SDP request, ultimately issuing a SIP 200 OK to the caller, which is acknowledged. At this point, the DTLS server begins a DTLS handshake sequence between the media endpoints, within which the end stations confirm each other's identity and establish the cryptography to be used for each flow. Once confirmed, the end stations begin exchanging SRTP media.
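The SDP parameters that request a DTLS exchange can be pictured with a small Python sketch. It assumes the attributes described in RFC 5763 and RFC 5764: a media line using the UDP/TLS/RTP/SAVP transport, an a=setup attribute, and an a=fingerprint attribute. The sample offer, its addresses, its fingerprint value, and the function name parse_dtls_attributes are all hypothetical and are not taken from ESBC output.

```python
# Placeholder fingerprint: 32 hex pairs, the size of a SHA-256 digest.
SAMPLE_FINGERPRINT = ":".join(f"{b:02X}" for b in range(32))

# Hypothetical SDP offer requesting DTLS-SRTP for one audio stream.
EXAMPLE_OFFER = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 UDP/TLS/RTP/SAVP 0",
    "a=setup:actpass",
    "a=fingerprint:sha-256 " + SAMPLE_FINGERPRINT,
])


def parse_dtls_attributes(sdp: str) -> dict:
    """Collect the DTLS-related fields from an SDP body.

    Simplification: session-level and media-level attributes are not
    distinguished; the last value seen wins.
    """
    result = {"protocols": [], "setup": None, "fingerprint": None}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            if len(fields) >= 3:
                result["protocols"].append(fields[2])  # transport protocol
        elif line.startswith("a=setup:"):
            result["setup"] = line[len("a=setup:"):].strip()
        elif line.startswith("a=fingerprint:"):
            parts = line[len("a=fingerprint:"):].split(None, 1)
            if len(parts) == 2:
                hash_name, value = parts
                result["fingerprint"] = (hash_name.lower(), value.strip().upper())
    return result


info = parse_dtls_attributes(EXAMPLE_OFFER)
```

In this sketch, info["setup"] holds the offered role (actpass means the offerer accepts either DTLS role), and info["fingerprint"] holds the hash name and value that the far end later compares against the certificate presented in the DTLS handshake.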

On the ESBC, DTLS-SRTP secures the flows between the ESBC and both the caller and the callee. The architecture establishes a client-server relationship and requires mutual authentication. Although the architecture supports features such as early media, it also allows an active station to tear down the call if authentication of the other side fails.

The architecture also uses certificates as a means of confirming identity for both the signaling and media flows. These certificates can be self-signed and do not refer to an authority for confirmation. Instead, each end station hashes its certificate to create a fingerprint, which the opposing end station uses to verify that the end station performing the signaling is also the source of the media. Finally, the architecture establishes the crypto-suite and exchanges the keys to be used to encrypt and decrypt each flow.
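The fingerprint check described above can be sketched in Python as follows. The sketch assumes the fingerprint is a hash over the DER-encoded certificate, written as colon-separated uppercase hexadecimal pairs in the manner of the SDP a=fingerprint attribute. The function names are hypothetical, and the bytes passed in the usage lines stand in for a real DER certificate.

```python
import hashlib


def certificate_fingerprint(cert_der: bytes, hash_name: str = "sha256") -> str:
    """Hash a DER-encoded certificate and format it as XX:XX:...:XX.

    hash_name uses hashlib naming ("sha256"); SDP spells the same
    function "sha-256".
    """
    digest = hashlib.new(hash_name, cert_der).digest()
    return ":".join(f"{b:02X}" for b in digest)


def fingerprint_matches(cert_der: bytes, advertised: str,
                        hash_name: str = "sha256") -> bool:
    """Compare the certificate seen in the DTLS handshake with the
    fingerprint advertised in signaling (case-insensitive)."""
    computed = certificate_fingerprint(cert_der, hash_name)
    return computed == advertised.strip().upper()


# Usage with placeholder bytes standing in for a DER certificate.
advertised = certificate_fingerprint(b"placeholder certificate bytes")
same_source = fingerprint_matches(b"placeholder certificate bytes", advertised)
```

A mismatch means the certificate presented on the media path is not the one announced in signaling, which is the condition under which a station would treat authentication as failed.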