Route Patterns

A Route Pattern describes a pattern that will match a certain set of HTTP request paths. The pattern is matched against the path component of the request URI.

Example

/objects/:object/:id?

This route pattern will match the following paths:

Origins

This section is non-normative.

The syntax of Route Patterns is similar to and inspired by the pattern routing syntax found in a number of web frameworks, including:

Route Patterns emerge out of a desire to create a formal definition of the ad-hoc pattern syntax that these and similar frameworks have popularised.

A goal of Route Patterns is to ensure that it is not possible to define a suite of Route Patterns that is ambiguous, i.e. that for any given request path at most one Route Pattern can be chosen to match against the path. As a consequence, the Route Pattern syntax may be considered less flexible and expressive than the ad-hoc syntaxes that the above web frameworks define.

This is a conscious design trade-off. In the ad-hoc syntaxes, any ambiguity is resolved by the order in which patterns are declared: the first declared pattern is tested first, the second declared pattern second, and so on. Developers can order the pattern declarations to ensure more specific patterns are tested before less specific patterns. This requires one central code location where routes are declared, and careful ordering of the patterns to avoid errors. These requirements may not scale to larger applications where many developers define route patterns and may not be fully aware of conflicting or overlapping patterns, or to applications where route patterns need to be defined in many different locations (e.g. in a pluggable architecture).

The Route Pattern syntax is also somewhat similar to the URI Template syntax, but the applications of URI Templates and Route Patterns differ. URI Templates focus on forming concrete URIs from a template, whereas Route Patterns focus on decomposing the path portion of a URI into its component parts.

Pattern Syntax Rules

A Route Pattern is a string of printable Unicode characters that contains zero or more embedded variable expressions. An expression MAY be a Named Parameter, delimited by a leading colon (‘:’) and either a trailing slash (‘/’) or the end of the string, or it MAY be a Glob Parameter, indicated by the wildcard character (‘*’). A pattern that contains one or more Named Parameters is termed a Named Pattern. A pattern that contains a Glob Parameter is termed a Glob Pattern. A pattern MUST NOT contain both Named Parameters and a Glob Parameter. A pattern lacking any variable expressions is termed a Literal Pattern.

Route-Pattern = named-pattern / glob-pattern / literal-pattern

Path Separator

The slash (‘/’) character delimits the pattern into Path Segments. A Path Separator MUST NOT be followed by another Path Separator. The leading Path Separator in a Route Pattern is implied and may be omitted.

Examples

  • The patterns a/b and /a/b are equivalent
  • The patterns * and /* are equivalent
  • The patterns a/b and a/b/ are not equivalent: the trailing Path Separator is significant and cannot be ignored.
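
The Path Separator rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names normalize_pattern and same_pattern are hypothetical and do not appear in this specification.

```python
def normalize_pattern(pattern):
    """Make the implied leading Path Separator explicit.

    Raises ValueError when a Path Separator is directly followed by
    another Path Separator, which the rules above forbid.
    """
    if "//" in pattern:
        raise ValueError("a Path Separator must not be followed by another Path Separator")
    if not pattern.startswith("/"):
        pattern = "/" + pattern
    return pattern


def same_pattern(a, b):
    # Only the leading separator is normalized; a trailing separator
    # stays significant, so "a/b" and "a/b/" remain different.
    return normalize_pattern(a) == normalize_pattern(b)
```

With this sketch, same_pattern("a/b", "/a/b") and same_pattern("*", "/*") hold, while same_pattern("a/b", "a/b/") does not.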

Reserved Characters

The set of reserved characters is those defined by RFC 3986 Section 2.2.

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

Literal Values

The characters outside of expressions and path separators in a Route Pattern are termed Literal Values. They MAY contain any printable Unicode character except the Reserved Characters.
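
As an illustration, a Literal Value check could look like the following Python sketch. It assumes that "printable" can be approximated by Python's str.isprintable, which is an assumption of this example and not something the specification states; the name is_valid_literal is likewise hypothetical.

```python
# Reserved characters as listed above (RFC 3986 Section 2.2).
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = set(GEN_DELIMS + SUB_DELIMS)


def is_valid_literal(text):
    """True when text is printable and contains no Reserved Character."""
    return text.isprintable() and not any(ch in RESERVED for ch in text)
```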

Named Parameters

The start of a Named Parameter is indicated by the colon character (‘:’). The end of a Named Parameter is indicated by a Path Separator or the end of string. A Named Parameter MAY be suffixed with a Modifier. A given parameter name MUST NOT appear more than once in a Route Pattern. A Route Pattern MAY have zero or more Named Parameters.

named-expression-pattern = *(literal / path-separator / named-expression )

valid-name = [a-zA-Z0-9] / '-' / '_' 
char = [a-zA-Z]
name = char valid-name*
param-decl = name [ '*' / '?' ]

named-expression = ':' param-decl path-separator /
                   ':' param-decl <eos>
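
The grammar for a single Named Parameter can be approximated with a regular expression. The sketch below treats the Modifier as optional, as the prose above states ("MAY be suffixed"), and handles only simple names, not Compound Named Parameters. The name parse_named_expression is hypothetical.

```python
import re

# name = char valid-name*, followed by at most one Modifier ('*' or '?').
PARAM_DECL = re.compile(r"([A-Za-z][A-Za-z0-9_-]*)([*?]?)")


def parse_named_expression(segment):
    """Return (name, modifier) for a segment such as ':id?', or None if invalid.

    The modifier is '' when the parameter carries no Modifier.
    """
    if not segment.startswith(":"):
        return None
    m = PARAM_DECL.fullmatch(segment[1:])
    if m is None:
        return None
    return m.group(1), m.group(2)
```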

Modifiers

A Modifier modifies the matching behavior of a Named Parameter. Only a single Named Parameter in a Route Pattern MAY contain a Modifier and it MUST be the last Named Parameter in the pattern. A Modifier is suffixed to the end of a named parameter expression.

Eager Modifier

The Eager Modifier is indicated by the asterisk character (‘*’) and instructs the matcher to eagerly consume all remaining characters, including any Path Separator characters, up to the end of the string.

Example
/foo/:all-children*

This pattern will match the following paths:

  • /foo/bar – all-children is bound to bar
  • /foo/bar/ – all-children is bound to bar/, the Eager Modifier consumes all characters including the Path Separator
  • /foo/bar/baz – all-children is bound to bar/baz, the Eager Modifier consumes all characters to the end of the string

The Eager Modifier MUST match at least one character, so the above pattern will not match the following path:

  • /foo/ – matching this path would require all-children to be bound to the empty string, which is not permitted.

Optional Modifier

The Optional Modifier is indicated by the question mark character (‘?’) and instructs the matcher that the Named Parameter will match zero or more characters until the end of string is reached.

Example
/foo/:item?

This pattern will match the following paths:

  • /foo/bar – item is bound to bar
  • /foo/ – item is bound to the empty string, the Optional Modifier causes the Named Parameter to match the zero length string.

Compound Named Parameter

A Compound Named Parameter is a Named Parameter where the matching text in the request path is decomposed into named components. Each component is delimited by the comma character (‘,’).

A Compound Named Parameter MAY have an Optional Modifier, but MUST NOT have an Eager Modifier.

Example
/line-items/:order_id,item_id/detail

Glob Parameter

A Glob Parameter is denoted by the wildcard Modifier (the ‘*’ character). The wildcard Modifier MUST appear at the end of the pattern and MUST be preceded by the path separator. Only a single Glob Parameter is permitted in a pattern. A Glob Parameter MUST NOT occur in the same pattern as a Named Parameter.

glob-pattern = *(literal / path-separator) path-separator '*'

A Glob Parameter matches zero or more characters until the end of the string.

Examples
  • /* – Matches all paths
  • /foo/* – Matches all paths starting with /foo/ including the /foo/ path.

Pattern Matching Rules

A Route Pattern is composed of the following tokens:

A Route Pattern is matched against the URL encoded form of a request path by matching each token against its corresponding segment of the request path. The tokens are matched in left-to-right order: the first token matches the left-most segment of the request path, the second token matches the next left-most segment, and so on.

The rules for matching each token type are defined below:

Path Separator Matching

Each path separator token MUST match exactly one ‘/’ character in the request path. A Path Separator MUST NOT match the URL encoded form of the ‘/’ character, i.e. it MUST NOT match the octets %2F or %2f. Since the leading Path Separator in a Route Pattern is optional, the leading Path Separator in a request path is also optional and MAY be omitted.

Examples

  • The pattern /a/b will match the request paths: a/b and /a/b
  • The equivalent pattern a/b will also match a/b and /a/b
  • The pattern /a/b will not match the request paths: a%2Fb or %2fa%2Fb
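
One way to respect this rule is to split the request path on the literal ‘/’ character before any percent-decoding takes place. The following Python sketch shows the idea; split_request_path is a hypothetical helper name.

```python
def split_request_path(path):
    """Split a request path into segments on literal '/' only.

    The path is deliberately not percent-decoded first, so '%2F' and
    '%2f' stay inside their segment instead of acting as separators.
    """
    if path.startswith("/"):
        path = path[1:]  # the leading Path Separator is optional
    return path.split("/")
```

Under this sketch a/b and /a/b both yield the segments a and b, whereas a%2Fb yields a single segment and so cannot match the pattern /a/b.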

Literal Value Matching

Each literal value token MUST match the exact same characters in the request path. Each literal value MUST be URL encoded and compared to the URL encoded request path.

Examples

The pattern a/b will match the following request paths:

  • a/b
  • /a/b
  • /%61/%62 – ‘%61’ is the percent encoded form of the ‘a’ character, ‘%62’ is the percent encoded form of the ‘b’ character.
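
A possible way to compare literals in their URL encoded form is to decode and re-encode each segment, so that equivalent encodings such as a and %61 end up identical. The sketch below uses Python's urllib.parse for this; the function names are hypothetical, and treating "decode then re-encode with no safe characters" as the canonical form is an assumption of this example.

```python
from urllib.parse import quote, unquote


def canonical_segment(segment):
    """Decode then re-encode one segment so equivalent encodings compare equal."""
    return quote(unquote(segment), safe="")


def _segments(text):
    if text.startswith("/"):
        text = text[1:]  # the leading Path Separator is optional
    # Split on literal '/' before decoding, so '%2F' never acts as a separator.
    return [canonical_segment(s) for s in text.split("/")]


def literal_pattern_matches(pattern, path):
    """True when a Literal Pattern matches the request path."""
    return _segments(pattern) == _segments(path)
```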

Named Parameter Matching

A Named Parameter token matches one or more characters up until the next occurrence of a Path Separator or end of string.

Optional Modifier Matching

If a Named Parameter has an Optional Modifier then it will match zero or more characters up until the end of string.

Eager Modifier Matching

If a Named Parameter has an Eager Modifier then it will match all characters until the end of the string.

Examples

The pattern /test/:item will match the following paths:

  • test/101
  • /test/true%2Ffalse
  • /test/a,b,c

The pattern will not match the following paths:

  • /test/101/ – extra trailing slash
  • /test/ – named parameter must match at least one character
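
The three Named Parameter matching rules can be sketched by translating a Named Pattern into a regular expression. This is an illustrative Python sketch under several assumptions: it ignores Compound Named Parameters and percent-encoded literals, it does not validate the pattern, and because Python group names cannot contain ‘-’, hyphens in parameter names are reported as ‘_’. The names compile_named_pattern and match are hypothetical.

```python
import re


def compile_named_pattern(pattern):
    """Translate a Named Pattern into a compiled regular expression."""
    if not pattern.startswith("/"):
        pattern = "/" + pattern  # the leading Path Separator is implied
    parts = []
    for segment in pattern[1:].split("/"):
        if segment.startswith(":"):
            decl = segment[1:]
            if decl.endswith("*"):    # Eager: one or more characters, '/' included
                body = ".+"
            elif decl.endswith("?"):  # Optional: zero or more characters, '/' excluded
                body = "[^/]*"
            else:                     # plain: one or more characters, '/' excluded
                body = "[^/]+"
            group = decl.rstrip("*?").replace("-", "_")
            parts.append("(?P<%s>%s)" % (group, body))
        else:
            parts.append(re.escape(segment))
    # The leading Path Separator of the request path is optional.
    return re.compile("/?" + "/".join(parts))


def match(pattern, path):
    """Return the parameter bindings as a dict, or None when there is no match."""
    m = compile_named_pattern(pattern).fullmatch(path)
    return None if m is None else m.groupdict()
```

For example, match("/test/:item", "test/101") binds item to 101, while match("/test/:item", "/test/") returns None because a Named Parameter without a Modifier must match at least one character.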

Compound Named Parameter Matching

A Compound Named Parameter token matches one or more characters up until the next occurrence of a Path Separator or end of string, wherein the matched characters are further delimited by the comma (‘,’) character. If the Compound Named Parameter has N components, then there MUST be at most N-1 commas in the matched text. If there are more than N-1 comma characters then there MUST be no match. Trailing comma characters MAY be omitted in the matched request path.

Component values in the request path that must contain the comma character MUST use the percent encoded form of the comma character (‘%2C’).

Optional Modifier Matching

If a Compound Named Parameter has an Optional Modifier then it will match zero or more characters up until the end of string.

Examples

The pattern /line-items/:order_id,item_id/detail will match the following paths:

  • /line-items/101,493/detail – order_id is bound to 101, item_id is bound to 493
  • /line-items/101,/detail – order_id is bound to 101, item_id is bound to null
  • /line-items/,493/detail – order_id is bound to null, item_id is bound to 493
  • /line-items/,/detail – order_id is bound to null, item_id is bound to null

Trailing comma separators MAY be omitted so the following path will also be matched:

  • /line-items/101/detail – order_id is bound to 101, item_id is bound to null

If a component value contains the comma character, it must be percent encoded in the request path. For example, given the pattern /books/:title,author:

  • /books/So%20Long%2C%20and%20Thanks%20for%20All%20the%20Fish,Douglas%20Adams will match, the comma character is percent encoded
  • /books/Eats,%20Shoots%20%26%20Leaves,Lynne%20Truss will not match, because the matched range contains two comma characters when at most one is permitted.
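
The component binding described in this section can be sketched as follows. The function receives the text already matched by the Compound Named Parameter (the characters between two Path Separators), which the caller is assumed to have checked is non-empty for a parameter without an Optional Modifier. Bound values are left in their percent encoded form, since this specification does not say whether they are decoded. The name bind_compound is hypothetical, and Python's None stands in for null.

```python
def bind_compound(names, matched_text):
    """Bind comma-delimited components to names.

    Returns a dict of bindings, or None when there are more commas
    than the N-1 permitted for N component names.
    """
    values = matched_text.split(",")
    if len(values) > len(names):
        return None  # more than N-1 commas: no match
    # Trailing commas may be omitted, so pad the missing components.
    values += [""] * (len(names) - len(values))
    return {name: (value if value else None) for name, value in zip(names, values)}
```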

Glob Parameter Matching

A Glob Parameter token matches zero or more characters up until the end of the string.

Examples

The pattern /foo/* will match the following paths:

  • /foo/ – matches the empty string
  • /foo/bar – matches bar
  • /foo/bar/ – matches bar/
  • /foo/bar/baz – matches bar/baz
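
Glob matching reduces to a prefix test. The following Python sketch returns the text consumed by the Glob Parameter; match_glob is a hypothetical name, and the literal prefix is compared character for character without any percent-encoding normalization.

```python
def match_glob(pattern, path):
    """Return the text matched by the Glob Parameter, or None when there is no match."""
    if not pattern.startswith("/"):
        pattern = "/" + pattern  # the leading Path Separator is implied
    if not pattern.endswith("/*"):
        raise ValueError("a Glob Pattern must end with a Path Separator followed by '*'")
    if not path.startswith("/"):
        path = "/" + path        # optional in the request path as well
    prefix = pattern[:-1]        # everything up to and including the final '/'
    return path[len(prefix):] if path.startswith(prefix) else None
```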

Route Pattern Sets

A Collection of Route Patterns is termed a Route Pattern Set. A Route Pattern Set MUST be unambiguous, meaning that for a given request path it should be possible to choose at most one Route Pattern from the set to match the request path.

Route Patterns MUST be ordered within the Route Pattern Set from most specific pattern to least specific pattern. Matching of a request path against a Route Pattern Set MUST proceed from the most specific pattern to least specific pattern. Matching MUST stop at the first matching Route Pattern encountered.

Equivalent & Overlapping Patterns

Equivalent or overlapping Route Patterns MUST NOT occur in the same Route Pattern Set.

Equivalent Patterns

Named Patterns are equivalent if the only difference between the patterns is the names assigned to parameters.

Example

The following two patterns are not permitted in the same Route Pattern Set because the only difference is the name assigned to the Named Parameter:

  • /a/:b
  • /a/:c

Both Named Patterns will match the exact same set of request paths, which causes ambiguity about which one should be chosen to match a given request path.
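
A Route Pattern Set could be screened for equivalent patterns by erasing the parameter names and looking for duplicates. The Python sketch below relies on the fact that ‘:’ and ‘,’ are Reserved Characters, so any name that directly follows one of them belongs to a parameter. The names erase_names and find_equivalent are hypothetical.

```python
import re

# A parameter name directly follows ':' (or ',' inside a Compound Named Parameter).
_NAME = re.compile(r"(?<=[:,])[A-Za-z][A-Za-z0-9_-]*")


def erase_names(pattern):
    """Replace every parameter name with '_', keeping Modifiers and literals."""
    if not pattern.startswith("/"):
        pattern = "/" + pattern  # the leading Path Separator is implied
    return _NAME.sub("_", pattern)


def find_equivalent(patterns):
    """Return pairs of patterns that differ only in their parameter names."""
    seen = {}
    clashes = []
    for pattern in patterns:
        key = erase_names(pattern)
        if key in seen:
            clashes.append((seen[key], pattern))
        else:
            seen[key] = pattern
    return clashes
```

Because Modifiers are preserved, this check reports only Equivalent Patterns; the overlap rules described in the following sections would need separate checks.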

Overlapping Patterns

Overlapping Patterns are Route Patterns where for a subset of request paths, more than one Route Pattern matches, and the Token Precedence Ordering described below does not help resolve which Route Pattern should be chosen.

Overlapping Modifiers

A Route Pattern Set MUST NOT contain two or more Named Patterns, which differ only in the use of a Modifier.

Example

The following three patterns are not permitted in the same Route Pattern Set because the only difference is the modifier assigned to the Named Parameter:

  • /a/:b
  • /a/:b?
  • /a/:b*

Overlapping Literal & Glob Pattern

A Glob Pattern MUST NOT overlap with a Literal Pattern in the same Route Pattern Set.

Example

The following Literal and Glob Pattern overlap, because the Glob Pattern will also match the same request path as the Literal Pattern:

  • /foo/bar/
  • /foo/bar/*

Overlapping Literal & Optional Named Pattern

An Optional Named Pattern MUST NOT overlap with a Literal Pattern in the same Route Pattern Set.

Example

The following Literal and Optional Named Pattern overlap, because the Optional Named Pattern will also match the same request path as the Literal Pattern:

  • /foo/bar/
  • /foo/bar/:baz?

Pattern Ordering

Patterns MUST be ordered in reverse lexicographical ordering. As a consequence, the longest path sharing a common prefix will be matched first.

Example

Given the following Route Pattern Set:

  • /a
  • /b
  • /c/d
  • /c/d/a/1
  • /a/b/c/d/e/

The expected ordering of this set is:

  • /c/d/a/1
  • /c/d
  • /b
  • /a/b/c/d/e/
  • /a
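
For Literal Patterns, this ordering can be reproduced with an ordinary descending string sort. The Python sketch below assumes that "lexicographical" means comparison by Unicode code point, which is how Python compares strings.

```python
patterns = ["/a", "/b", "/c/d", "/c/d/a/1", "/a/b/c/d/e/"]

# A prefix sorts before any longer string that extends it, so reversing
# the sort places the longest path sharing a common prefix first.
ordered = sorted(patterns, reverse=True)

for pattern in ordered:
    print(pattern)
```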

Token Precedence

The different token types are assigned a precedence order from most specific to least specific, which enables a deterministic sort order to be determined for a Route Pattern Set.

Literal Values and Path Separators

Literal Values and Path Separators have the highest precedence as they require an exact match. Literal values are ordered in reverse lexicographical order, so that longer literal tokens are tested before shorter tokens.

Compound Named Parameters

A Compound Named Parameter has second highest precedence, as the requirement to match the comma characters within the matching value makes it more specific than a Named Parameter.

Optional Compound Named Parameters

An Optional Compound Named Parameter has third highest precedence. It is less specific than a Compound Named Parameter because it MAY match an empty string.

Named Parameters

A Named Parameter has fourth highest precedence, matching one or more characters until the next Path Separator or end of string.

Optional Named Parameters

An Optional Named Parameter has fifth highest precedence, matching zero or more characters, not including the Path Separator, until the end of string.

Eager Named Parameters

An Eager Named Parameter has sixth highest precedence, matching one or more characters, including the Path Separator, until the end of string.

Glob Parameters

Glob Parameters have lowest precedence, as they are the least specific pattern, matching zero or more characters until the end of string.

Examples

Given the following Route Pattern Set:

The expected ordering of these Route Patterns from most specific to least specific is:

Example Ordering Implementation

This section is non-normative.

One means by which the specified Route Pattern Ordering can be implemented is to convert each pattern to a canonical string representation and then order the canonical strings in reverse lexicographical order. To accomplish this, each parameter token in the pattern is replaced with a single low value character as shown in the following list, with the lowest precedence token type getting the lowest value character and the highest precedence token type getting the highest value character.

  • Glob -> ‘!’
  • Eager Named -> ‘#’
  • Optional Named -> ‘$’
  • Named -> ‘'’
  • Optional Compound -> ‘(’
  • Compound -> ‘)’

By applying this table to the patterns in the previous example we can see that the canonical strings for the patterns are:

  • /foo/!
  • /b/c/#
  • /b/$
  • /a/'/c/'
  • /a/'/c
  • /a/'
  • /'/b/c
  • /!

Because the substitute characters used fall in the Reserved Character Set, they will never overlap with any literal tokens, and thus never result in any ambiguous overlap between patterns.
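
The canonical string approach can be sketched in Python as follows. The sample patterns used in the usage note are hypothetical: they were chosen only so that their canonical strings agree with the list above and are not taken from this specification. The function names canonical and order_patterns are likewise assumptions of this example, and the code does not validate its input.

```python
def canonical(pattern):
    """Replace each parameter token with its substitute character."""
    if not pattern.startswith("/"):
        pattern = "/" + pattern  # the leading Path Separator is implied
    out = []
    for segment in pattern[1:].split("/"):
        if segment == "*":
            out.append("!")                        # Glob
        elif segment.startswith(":"):
            compound = "," in segment
            if segment.endswith("*"):
                out.append("#")                    # Eager Named
            elif segment.endswith("?"):
                out.append("(" if compound else "$")  # Optional Compound / Optional Named
            else:
                out.append(")" if compound else "'")  # Compound / Named
        else:
            out.append(segment)                    # literal segment, unchanged
    return "/" + "/".join(out)


def order_patterns(patterns):
    """Sort from most specific to least specific via the canonical strings."""
    return sorted(patterns, key=canonical, reverse=True)
```

For instance, canonical("/b/c/:d*") gives /b/c/# and canonical("/a/:b/c/:d") gives /a/'/c/', and order_patterns places a hypothetical /foo/* ahead of /* because ‘f’ sorts above ‘!’.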