atg.core.net
Class URLUtils

java.lang.Object
  extended by atg.core.net.URLUtils

public class URLUtils
extends java.lang.Object

Utility class of static methods for common URL-related operations.
Supersedes java.net.URLDecoder and java.net.URLEncoder. The URL escaping/unescaping functions use the ISO-8859-1 encoding for conversion between characters and bytes.

If you need a customized set of reserved (unescaped) ASCII characters, or a UTF-8 character encoding, then use a URLEscaper. This static implementation delegates to a URLEscaperISO88591 object (with the slight performance overhead for dispatching and data access in a non-static implementation).
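
To make the effect of the character encoding concrete, the following sketch compares ISO-8859-1 and UTF-8 percent-escaping. It uses only the JDK's java.net.URLEncoder (the Charset overload, so Java 10 or later is assumed) and does not call URLUtils or URLEscaper. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of atg.core.net.
final class EscapeCharsetSketch {
    // Percent-encode using ISO-8859-1 bytes: one byte per character up to U+00FF.
    static String escapeLatin1(String s) {
        return URLEncoder.encode(s, StandardCharsets.ISO_8859_1);
    }

    // Percent-encode using UTF-8 bytes: non-ASCII characters take two or more bytes.
    static String escapeUtf8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

With the input "caf\u00e9", escapeLatin1 yields "caf%E9" while escapeUtf8 yields "caf%C3%A9", which is why a receiver must know which encoding the sender used.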

See Also:
URLDecoder, URLEncoder, URLEscaper

Field Summary
static java.lang.String CLASS_VERSION
          Class version string
protected static java.lang.String EMAIL_ADDRESS_DELIMITER
           
protected static java.lang.String NUM_CHAR_REF_END
           
protected static java.lang.String NUM_CHAR_REF_START
           
protected static java.lang.String PERCENT
           
static java.util.regex.Pattern RESERVED_URI_CHARS
           
 
Constructor Summary
URLUtils()
           
 
Method Summary
static boolean containsNumCharRefs(java.lang.String pVal)
          Checks to see if a string contains any numeric character references of the form &#N; where N is either a decimal or hexadecimal value of a character's Unicode code point.
static java.lang.String emailAddressFromPunycode(java.lang.String pEmailAddress)
          Converts the domain portion of an email address from Punycode into Unicode.
static java.lang.String emailAddressToPunycode(java.lang.String pEmailAddress)
          Converts an email address containing non-ASCII characters or numeric character references into Punycode.
static java.lang.String encodeEmailAddresses(java.lang.String pAddressString)
          Encodes an email address, or a comma-delimited list of email addresses.
static void escapeAndAppendUrl(java.lang.StringBuffer sb, char[] str)
          URL encode the contents of a character array and append them to the provided StringBuffer.
static void escapeAndAppendUrlString(java.lang.StringBuffer pDestBuffer, java.lang.String pStr)
          URL encode the contents of the String and append them to the provided StringBuffer.
static java.lang.String escapeUrlString(java.lang.String pStr)
          Return the passed in String as a URL escaped String.
static java.lang.String escapeUrlString(java.lang.StringBuffer pStrBuf)
           
static java.lang.String getAddressPortNumberSeparator(java.lang.String pAddress)
          Return the port separator string for the given address
static java.lang.String getUrlPortNumberSeparator(java.lang.String pUrl)
          Return the port separator string for the given URL
static char hex2Char(char x, char y)
          Takes two ASCII characters, x and y, which represent a two-digit hexadecimal number (%xy), and returns the unsigned byte value in the range 0-255.
static boolean isIPv6Address(java.lang.String pAddress)
          Check to see if a given address is an IPv6 address
static boolean isRelative(java.lang.String pUrl)
          Returns true if the URL is relative
static boolean needsIPv6Wrapper(java.lang.String pAddress)
          Check to see if the address passed is an IPv6 address and it does not already have the square bracket wrapper.
static java.lang.String numCharRefToUnicode(java.lang.String pVal)
          Converts any Unicode values in a string encoded as numeric character references into their literal Unicode character values.
static java.util.Hashtable parsePostData(int len, java.io.InputStream in)
          Parses posted data sent from the client to the server, and builds a Hashtable object with key-value pairs.
static java.util.Hashtable parseQuery(char[] str)
          Parses a query string passed from the client to the server and builds a Hashtable object with key-value pairs.
static java.util.Hashtable parseQueryString(java.lang.String queryString)
          Parses a query string passed from the client to the server and builds a Hashtable object with key-value pairs.
static java.lang.String processEmailAddressPunycode(java.lang.String pEmailAddress, boolean pEncode)
           
protected static java.lang.String processUrlPunycode(java.lang.String pURL, boolean pEncode)
          Finds the domain and path portions of a URL and passes them to the URLProcessor to convert them to or from Punycode.
static java.lang.String removeRelativePaths(java.lang.String pURL)
          Takes an absolute URI that may have "../" in it and removes the "../" entries (by removing each entry and the directory above it in the path as well).
static void resolvePureRelativePath(java.lang.StringBuffer pResultBuffer, java.lang.String pPath, java.lang.String pWorkingDirectory)
          Resolves a pure relative path against a working directory, handling ".." references.
static java.lang.String resolvePureRelativePath(java.lang.String pPath, java.lang.String pWorkingDirectory)
          Resolves a pure relative path against a working directory, handling ".." references.
static java.lang.String stripURIArgs(java.lang.String pURI)
          Strip off any arguments etc found in the URI.
static int unescapeUrl(char[] str)
          Utility wrapper.
static int unescapeUrl(char[] str, int len)
          Utility method for converting from a MIME format called x-www-form-urlencoded.
static java.lang.String unescapeUrlString(java.lang.String pString)
          Utility method for converting from a MIME format called x-www-form-urlencoded to a String.
static java.lang.String URIDirectory(java.lang.String pURI)
          Returns the directory part of a URI (everything up to the filename), which will begin and end with the "/" character.
static java.lang.String URIFilename(java.lang.String pURI)
          Returns the filename part of a URI
static java.lang.String urlEncodeDisallowedOnly(java.lang.String pUrl)
          Percent-encodes only the disallowed characters in a URL.
static java.lang.String urlFromPunycode(java.lang.String pURL)
          Converts a URL containing Punycode or percent-encoded portions into Unicode.
static java.lang.String urlToPunycode(java.lang.String pURL)
          Converts a URL containing non-ASCII characters or numeric character references into Punycode, with the path and query parameter portions percent-encoded.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CLASS_VERSION

public static java.lang.String CLASS_VERSION
Class version string


NUM_CHAR_REF_START

protected static final java.lang.String NUM_CHAR_REF_START
See Also:
Constant Field Values

NUM_CHAR_REF_END

protected static final java.lang.String NUM_CHAR_REF_END
See Also:
Constant Field Values

PERCENT

protected static final java.lang.String PERCENT
See Also:
Constant Field Values

EMAIL_ADDRESS_DELIMITER

protected static final java.lang.String EMAIL_ADDRESS_DELIMITER
See Also:
Constant Field Values

RESERVED_URI_CHARS

public static java.util.regex.Pattern RESERVED_URI_CHARS
Constructor Detail

URLUtils

public URLUtils()
Method Detail

isRelative

public static boolean isRelative(java.lang.String pUrl)
Returns true if the URL is relative


isIPv6Address

public static boolean isIPv6Address(java.lang.String pAddress)
Check to see if a given address is an IPv6 address

Parameters:
pAddress - The address to be checked
Returns:
true if the address is IPv6 address, false if the address is not an IPv6 address

needsIPv6Wrapper

public static boolean needsIPv6Wrapper(java.lang.String pAddress)
Check to see if the address passed is an IPv6 address and it does not already have the square bracket wrapper.

Parameters:
pAddress - The address to be checked
Returns:
true if the address needs an IPv6 square bracket wrapper
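
As an illustration of why the bracket wrapper matters, here is a minimal sketch of the check. The detection heuristic (two or more colons means IPv6) is an assumption made for this example and is not necessarily how URLUtils decides. All names are hypothetical.

```java
// Hypothetical helper, not part of atg.core.net.
final class Ipv6WrapperSketch {
    // Assumed heuristic: "host:port" has at most one colon, an IPv6 literal has two or more.
    static boolean looksLikeIPv6(String address) {
        if (address == null) {
            return false;
        }
        int first = address.indexOf(':');
        return first >= 0 && address.indexOf(':', first + 1) >= 0;
    }

    // True when the literal is IPv6 and is not already wrapped in square brackets.
    static boolean needsBrackets(String address) {
        return looksLikeIPv6(address) && !address.startsWith("[");
    }

    // Wraps a bare IPv6 literal so that a port can be appended unambiguously.
    static String hostForUrl(String address) {
        return needsBrackets(address) ? "[" + address + "]" : address;
    }
}
```

Under these assumptions, hostForUrl("2001:db8::1") gives "[2001:db8::1]", while "[::1]" and "example.com" pass through unchanged.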

getUrlPortNumberSeparator

public static java.lang.String getUrlPortNumberSeparator(java.lang.String pUrl)
Return the port separator string for the given URL

Parameters:
pUrl - The URL to be checked
Returns:
The port separator string for the given URL

getAddressPortNumberSeparator

public static java.lang.String getAddressPortNumberSeparator(java.lang.String pAddress)
Return the port separator string for the given address

Parameters:
pAddress - The address to be checked
Returns:
The port separator string for the given address

URIDirectory

public static java.lang.String URIDirectory(java.lang.String pURI)
Returns the directory part of a URI (everything up to the filename), which will begin and end with the "/" character.


URIFilename

public static java.lang.String URIFilename(java.lang.String pURI)
Returns the filename part of a URI
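
The directory/filename split described for URIDirectory and URIFilename can be sketched as a cut at the last "/" character. This is an illustration with hypothetical names, assuming the URI carries no query or fragment arguments; the actual methods may treat edge cases differently.

```java
// Hypothetical helper, not part of atg.core.net.
final class UriPartsSketch {
    // Everything up to and including the last '/'; empty if there is no '/'.
    static String directory(String uri) {
        return uri.substring(0, uri.lastIndexOf('/') + 1);
    }

    // Everything after the last '/'; the whole string if there is no '/'.
    static String filename(String uri) {
        return uri.substring(uri.lastIndexOf('/') + 1);
    }
}
```

For "/docs/guide/index.html" this gives the directory "/docs/guide/" and the filename "index.html".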


stripURIArgs

public static java.lang.String stripURIArgs(java.lang.String pURI)
Strip off any arguments etc found in the URI. This is done by looking for ';', '?', and '#' and truncating the URI to everything that comes before the first occurrence of any of these characters.
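
The truncation rule can be sketched in a few lines. The class and method names are hypothetical, and this is a reading of the description above rather than the actual implementation.

```java
// Hypothetical helper, not part of atg.core.net.
final class StripArgsSketch {
    // Returns the part of the URI before the first ';', '?' or '#'.
    static String stripArgs(String uri) {
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c == ';' || c == '?' || c == '#') {
                return uri.substring(0, i);
            }
        }
        return uri; // no argument delimiter present
    }
}
```

For example, stripArgs("/a/b.jsp;jsessionid=1?x=2#top") returns "/a/b.jsp".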


resolvePureRelativePath

public static java.lang.String resolvePureRelativePath(java.lang.String pPath,
                                                       java.lang.String pWorkingDirectory)
                                                throws java.lang.IllegalArgumentException
Resolves a pure relative path against a working directory, handling ".." references. Returns the String or null if the path could not be resolved.

Parameters:
pPath - The pure relative path (i.e. does not start with "/") to be resolved.
pWorkingDirectory - The working directory to resolve against, which should begin and end with the "/" character.
Throws:
java.lang.IllegalArgumentException
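
A minimal sketch of this kind of resolution, using a stack of directory segments. The names are hypothetical, and two behaviors are assumptions for the example: a path starting with "/" is rejected with IllegalArgumentException, and null is returned when ".." would climb above the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of atg.core.net.
final class RelativePathSketch {
    static String resolve(String path, String workingDirectory) {
        if (path.startsWith("/")) {
            throw new IllegalArgumentException("not a pure relative path: " + path);
        }
        // Start from the segments of the working directory, e.g. "/site/pages/".
        Deque<String> segments = new ArrayDeque<>();
        for (String dir : workingDirectory.split("/")) {
            if (!dir.isEmpty()) {
                segments.addLast(dir);
            }
        }
        for (String seg : path.split("/")) {
            if (seg.isEmpty() || seg.equals(".")) {
                continue; // nothing to do
            }
            if (seg.equals("..")) {
                if (segments.isEmpty()) {
                    return null; // assumed: cannot climb above the root
                }
                segments.removeLast();
            } else {
                segments.addLast(seg);
            }
        }
        return "/" + String.join("/", segments);
    }
}
```

Resolving "../img/logo.png" against "/site/pages/" gives "/site/img/logo.png" in this sketch.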

resolvePureRelativePath

public static void resolvePureRelativePath(java.lang.StringBuffer pResultBuffer,
                                           java.lang.String pPath,
                                           java.lang.String pWorkingDirectory)
                                    throws java.lang.IllegalArgumentException
Resolves a pure relative path against a working directory, handling ".." references. Returns the String or null if the path could not be resolved.

Parameters:
pPath - The pure relative path (i.e. does not start with "/") to be resolved.
pWorkingDirectory - The working directory to resolve against, which should begin and end with the "/" character.
Throws:
java.lang.IllegalArgumentException

removeRelativePaths

public static java.lang.String removeRelativePaths(java.lang.String pURL)
                                            throws java.lang.IllegalArgumentException
Takes an absolute URI that may have "../" in it and removes the "../" entries (by removing each entry and the directory above it in the path as well).

Throws:
java.lang.IllegalArgumentException - if the path name has too many ../'s in it.

unescapeUrlString

public static java.lang.String unescapeUrlString(java.lang.String pString)
Utility method for converting from a MIME format called x-www-form-urlencoded to a String. Optimized version of java.net.URLDecoder#decode.

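To convert to a String, each character is examined in turn: a '+' becomes a space ' ', a "%xy" sequence becomes the character whose byte value is the hexadecimal number xy (interpreted as ISO-8859-1), and every other character is left unchanged.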

If the string argument is null, an empty string is returned.

Parameters:
pString - String object to be decoded
Returns:
decoded String object, never null
See Also:
hex2Char( char, char ), unescapeUrl( char[], int ), URLDecoder.decode( String )
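
For comparison, a behavior along these lines can be sketched with the JDK alone. The wrapper below is hypothetical and assumes Java 10 or later for the Charset overload of URLDecoder.decode. It mirrors the documented null handling but is not the optimized URLUtils code.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of atg.core.net.
final class UnescapeStringSketch {
    // Decodes x-www-form-urlencoded text as ISO-8859-1; null becomes an empty string.
    static String unescape(String s) {
        if (s == null) {
            return "";
        }
        return URLDecoder.decode(s, StandardCharsets.ISO_8859_1);
    }
}
```

Here unescape("caf%E9+au+lait") returns "caf\u00e9 au lait". Note that the JDK decoder throws IllegalArgumentException on malformed escapes, which may differ from URLUtils.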

unescapeUrl

public static int unescapeUrl(char[] str)
Utility wrapper.

See Also:
unescapeUrl( char[], int )

unescapeUrl

public static int unescapeUrl(char[] str,
                              int len)
Utility method for converting from a MIME format called x-www-form-urlencoded. Optimized version of java.net.URLDecoder#decode. This version operates on a mutable character array.

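Each character is examined in turn: a '+' becomes a space ' ', a "%xy" sequence becomes the character whose byte value is the hexadecimal number xy (interpreted as ISO-8859-1), and every other character is left unchanged.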

Parameters:
str - character array to be unescaped, conversion occurs in place
Returns:
length of significant output in the unescaped char array, or -1 if no change was necessary
See Also:
hex2Char( char, char ), URLDecoder.decode( String )
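
The in-place contract (decode within the same array, return the new length or -1) can be sketched as follows. The names are hypothetical, and leaving malformed "%" sequences untouched is an assumption of this example.

```java
// Hypothetical helper, not part of atg.core.net.
final class UnescapeInPlaceSketch {
    // Decodes the first len chars of str in place.
    // Returns the decoded length, or -1 if nothing needed decoding.
    static int unescape(char[] str, int len) {
        int out = 0;
        boolean changed = false;
        for (int in = 0; in < len; in++) {
            char c = str[in];
            if (c == '+') {
                c = ' ';
                changed = true;
            } else if (c == '%' && in + 2 < len) {
                int hi = Character.digit(str[in + 1], 16);
                int lo = Character.digit(str[in + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    c = (char) ((hi << 4) | lo); // byte value read as ISO-8859-1
                    in += 2;                     // skip the two hex digits
                    changed = true;
                }
            }
            // The write index never overtakes the read index, so this is safe.
            str[out++] = c;
        }
        return changed ? out : -1;
    }
}
```

Decoding the array for "a%20b+c" returns 5 and leaves "a b c" in the first five positions. An input with nothing to decode returns -1.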

escapeUrlString

public static final java.lang.String escapeUrlString(java.lang.String pStr)
Return the passed in String as a URL escaped String.

Uses the ISO-8859-1 character encoding. Equivalent to an optimized version of java.net.URLEncoder#encode.

If the string argument is null, an empty string is returned.

Parameters:
pStr - the String to be translated.
Returns:
the URL encoded String, never null
See Also:
URLEncoder.encode( String )

escapeUrlString

public static final java.lang.String escapeUrlString(java.lang.StringBuffer pStrBuf)

escapeAndAppendUrlString

public static final void escapeAndAppendUrlString(java.lang.StringBuffer pDestBuffer,
                                                  java.lang.String pStr)
URL encode the contents of the String and append them to the provided StringBuffer. Uses the ISO-8859-1 character encoding.

Parameters:
pDestBuffer - the StringBuffer to which to append the result
pStr - the String to be translated
See Also:
escapeAndAppendUrl( StringBuffer, char[] )

escapeAndAppendUrl

public static void escapeAndAppendUrl(java.lang.StringBuffer sb,
                                      char[] str)
URL encode the contents of a character array and append them to the provided StringBuffer.

This algorithm is valid only for the lower 8 bits of the character value. The Unicode range U+0000 - U+00FF is equivalent to ISO-8859-1, so it handles these characters correctly. Unicode characters above U+00FF are truncated and mapped to a single character from ISO-8859-1.

Parameters:
sb - the StringBuffer to which to append the result
str - the character array to be translated
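
The truncation to the lower 8 bits can be seen in a small sketch. The set of characters left unescaped here (letters, digits, '-', '_', '.', '*', with space written as '+') follows java.net.URLEncoder and is an assumption. The reserved set used by URLUtils may differ. All names are hypothetical.

```java
// Hypothetical helper, not part of atg.core.net.
final class EscapeLow8Sketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static void escapeAndAppend(StringBuffer sb, char[] str) {
        for (char ch : str) {
            int b = ch & 0xFF; // keep only the lower 8 bits of the char
            boolean safe = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
                    || (b >= '0' && b <= '9')
                    || b == '-' || b == '_' || b == '.' || b == '*';
            if (safe) {
                sb.append((char) b);
            } else if (b == ' ') {
                sb.append('+');
            } else {
                sb.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
    }
}
```

Appending the characters of "a b\u00e9\u0141" produces "a+b%E9A": U+00E9 is escaped as %E9, while U+0141 loses its upper byte and comes out as the plain letter A (0x41).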

parseQueryString

public static java.util.Hashtable parseQueryString(java.lang.String queryString)
Parses a query string passed from the client to the server and builds a Hashtable object with key-value pairs. Same functionality as javax.servlet.http.HttpUtils#parseQueryString, but this implementation attempts optimization by inlining character unescaping with the key-value delimiter parsing.

The query string should be in the form of a string packaged by the GET or POST method, that is, it should have key-value pairs in the form key=value, with each pair separated from the next by a '&' character.

A key can appear more than once in the query string with different values. However, the key appears only once in the hashtable, with its value being an array of strings containing the multiple values sent by the query string.

The keys and values in the hashtable are stored in their decoded form, so any '+' characters are converted to spaces ' ', and byte values in hexadecimal notation "%xy" are converted to Unicode characters using the ISO-8859-1 encoding.

Parameters:
queryString - a string containing the query to be parsed
Returns:
a Hashtable object built from the parsed key-value pairs, may be empty, but never null
Throws:
java.lang.IllegalArgumentException - if the query string is invalid
See Also:
parseQuery( char[] ), HttpUtils.parseQueryString( String )
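
The multi-value behavior can be sketched as below. This is a plain, unoptimized illustration with hypothetical names, assuming Java 10 or later. Treating a pair without '=' as invalid follows HttpUtils and is an assumption about what URLUtils considers an invalid query string.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical helper, not part of atg.core.net.
final class QueryParseSketch {
    static Hashtable<String, String[]> parse(String query) {
        if (query == null) {
            throw new IllegalArgumentException("query string is null");
        }
        Hashtable<String, String[]> result = new Hashtable<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' in: " + pair);
            }
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.ISO_8859_1);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.ISO_8859_1);
            String[] old = result.get(key);
            if (old == null) {
                result.put(key, new String[] { value });
            } else {
                // Repeated key: grow the value array by one element.
                String[] grown = Arrays.copyOf(old, old.length + 1);
                grown[old.length] = value;
                result.put(key, grown);
            }
        }
        return result;
    }
}
```

Parsing "color=red&color=blue&q=a+b%21" yields two keys: "color" mapped to {"red", "blue"} and "q" mapped to {"a b!"}.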

parseQuery

public static java.util.Hashtable parseQuery(char[] str)
Parses a query string passed from the client to the server and builds a Hashtable object with key-value pairs. The query string should be in the form of a string packaged by the GET or POST method, that is, it should have key-value pairs in the form key=value, with each pair separated from the next by a '&' character.

A key can appear more than once in the query string with different values. However, the key appears only once in the hashtable, with its value being an array of strings containing the multiple values sent by the query string.

The keys and values in the hashtable are stored in their decoded form, so any '+' characters are converted to spaces ' ', and byte values in hexadecimal notation "%xy" are converted to Unicode characters using the ISO-8859-1 encoding.

Parameters:
str - character array representing a query string
Returns:
a Hashtable object built from the parsed key-value pairs, may be empty, but never null
Throws:
java.lang.IllegalArgumentException - if the query string is invalid

parsePostData

public static java.util.Hashtable parsePostData(int len,
                                                java.io.InputStream in)
                                         throws java.lang.IllegalArgumentException
Parses posted data sent from the client to the server, and builds a Hashtable object with key-value pairs. Same functionality as javax.servlet.http.HttpUtils#parsePostData, but this implementation attempts optimization by inlining character unescaping with the key-value delimiter parsing.

The query string should be in the form of a string packaged by the GET or POST method, that is, it should have key-value pairs in the form key=value, with each pair separated from the next by a '&' character.

A key can appear more than once in the query string with different values. However, the key appears only once in the hashtable, with its value being an array of strings containing the multiple values sent by the query string.

The keys and values in the hashtable are stored in their decoded form, so any '+' characters are converted to spaces ' ', and byte values in hexadecimal notation "%xy" are converted to characters.

The conversion from bytes to characters uses the standard ISO-8859-1 encoding, which is compatible with the version in javax.servlet.http.HttpUtils, and also produces the String format expected by the Dynamo character Converter interface.

The return value will be an empty hashtable if len is not a positive value, or if there was an exception reading the stream, or if the stream ended before all the data was read.

Parameters:
len - an integer specifying the length, in characters, of the data to be parsed from the ServletInputStream argument
in - the ServletInputStream object that contains the data sent from the client
Returns:
a Hashtable object built from the parsed key-value pairs (never null)
Throws:
java.lang.IllegalArgumentException - if the stream argument is null
See Also:
parseQuery( char[] ), javax.servlet.http.HttpUtils#parsePostData( int, ServletInputStream )
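
The stream-handling rules above (empty result for a non-positive length, a read error, or a short stream) can be sketched for the body-reading step alone. The helper is hypothetical: it returns the raw body as an ISO-8859-1 string, which would then be handed to a query parser, and an empty string stands in for the cases that produce an empty hashtable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of atg.core.net.
final class PostBodySketch {
    static String readBody(int len, InputStream in) {
        if (in == null) {
            throw new IllegalArgumentException("stream is null");
        }
        if (len <= 0) {
            return "";
        }
        byte[] buf = new byte[len];
        int off = 0;
        try {
            // read() may return fewer bytes than requested, so loop until full.
            while (off < len) {
                int n = in.read(buf, off, len - off);
                if (n < 0) {
                    return ""; // stream ended before len bytes arrived
                }
                off += n;
            }
        } catch (IOException e) {
            return ""; // read failure is treated like missing data
        }
        // ISO-8859-1 maps each byte to the char with the same value.
        return new String(buf, StandardCharsets.ISO_8859_1);
    }
}
```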

hex2Char

public static char hex2Char(char x,
                            char y)
Takes two ASCII characters, x and y, which represent a two-digit hexadecimal number (%xy), and returns the unsigned byte value in the range 0-255.

If the character input values are outside the ASCII range (0x00-0x7f), their values are masked to 7-bits before the look-up. If the (masked) input ASCII character is not a valid hex character (0-9,a-f,A-F), then it does not contribute to the result value. If both inputs are invalid, then the result will be zero.

Neither the hex input nor the byte-valued output can represent the full repertoire of Unicode characters. If the byte return value is interpreted directly as a character, it will appear to be from the ISO-8859-1 repertoire, which is equivalent to the subset of Unicode with a zero upper byte. The name hex2Char is purely historical; it should really be hex2Byte.

Parameters:
x - high order digit of hexadecimal representation
y - low order digit of hexadecimal representation
Returns:
byte value
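
The masking and look-up described above can be sketched with a 128-entry table in which every non-hex character maps to zero. This is an illustrative reading of the description with a hypothetical class name, not the actual source.

```java
// Hypothetical helper, not part of atg.core.net.
final class Hex2ByteSketch {
    // Digit value for each 7-bit ASCII code; non-hex characters stay 0.
    private static final int[] VALUE = new int[128];

    static {
        for (int i = 0; i < 10; i++) {
            VALUE['0' + i] = i;
        }
        for (int i = 0; i < 6; i++) {
            VALUE['a' + i] = 10 + i;
            VALUE['A' + i] = 10 + i;
        }
    }

    // Combines the high digit x and low digit y into a value in the range 0-255.
    static char hex2Char(char x, char y) {
        return (char) ((VALUE[x & 0x7F] << 4) | VALUE[y & 0x7F]);
    }
}
```

In this sketch hex2Char('4', '1') is 0x41 ('A'), hex2Char('e', '9') is 0xE9, an invalid high digit as in hex2Char('z', '5') contributes nothing and gives 5, and two invalid digits give 0.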

encodeEmailAddresses

public static java.lang.String encodeEmailAddresses(java.lang.String pAddressString)
                                             throws javax.mail.internet.AddressException,
                                                    java.io.UnsupportedEncodingException
Encodes an email address, or a comma-delimited list of email addresses.

Parameters:
pAddressString - the email address, or comma-delimited list of email addresses, to encode
Returns:
a comma-delimited String containing all the email addresses in pAddressString, Punycode-encoded
Throws:
javax.mail.internet.AddressException - if there is an error parsing the address string
java.io.UnsupportedEncodingException

emailAddressFromPunycode

public static java.lang.String emailAddressFromPunycode(java.lang.String pEmailAddress)
Converts the domain portion of an email address from Punycode into Unicode.

Parameters:
pEmailAddress - the email address to decode
Returns:
the email address with the domain portion as Unicode

emailAddressToPunycode

public static java.lang.String emailAddressToPunycode(java.lang.String pEmailAddress)
Converts an email address containing non-ASCII characters or numeric character references into Punycode. Only the domain portion of the email address is converted to Punycode.

Parameters:
pEmailAddress - the email address to encode
Returns:
the email address with the domain portion as punycode, if any characters in the domain portion needed to be encoded
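
The domain-only conversion can be illustrated with the JDK's java.net.IDN class. Whether URLUtils uses IDN internally is not stated here, so treat this as an independent sketch with hypothetical names. It also does not handle numeric character references.

```java
import java.net.IDN;

// Hypothetical helper, not part of atg.core.net.
final class EmailIdnSketch {
    // Converts only the part after the last '@' to its ASCII (Punycode) form.
    static String toPunycode(String email) {
        int at = email.lastIndexOf('@');
        if (at < 0) {
            return email; // no domain part to convert
        }
        return email.substring(0, at + 1) + IDN.toASCII(email.substring(at + 1));
    }

    // Converts a Punycode domain part back to Unicode.
    static String fromPunycode(String email) {
        int at = email.lastIndexOf('@');
        if (at < 0) {
            return email;
        }
        return email.substring(0, at + 1) + IDN.toUnicode(email.substring(at + 1));
    }
}
```

For "info@b\u00fccher.example" the encoded form is "info@xn--bcher-kva.example", and fromPunycode reverses it. The local part before '@' is left untouched in both directions.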

processEmailAddressPunycode

public static java.lang.String processEmailAddressPunycode(java.lang.String pEmailAddress,
                                                           boolean pEncode)

urlFromPunycode

public static java.lang.String urlFromPunycode(java.lang.String pURL)
Converts a URL containing Punycode or percent-encoded portions into Unicode.

Parameters:
pURL - the url to decode
Returns:
the url as Unicode, with no punycode or percent encoded portions

urlToPunycode

public static java.lang.String urlToPunycode(java.lang.String pURL)
Converts a URL containing non-ASCII characters or numeric character references into Punycode, with the path and query parameter portions percent-encoded.

Parameters:
pURL - the url to encode
Returns:
the url as punycode, if any characters needed to be encoded

processUrlPunycode

protected static java.lang.String processUrlPunycode(java.lang.String pURL,
                                                     boolean pEncode)
Finds the domain and path portions of a URL and passes them to the URLProcessor to convert them to or from Punycode.


urlEncodeDisallowedOnly

public static java.lang.String urlEncodeDisallowedOnly(java.lang.String pUrl)
                                                throws java.io.UnsupportedEncodingException
Performs percent-encoding on only the disallowed characters in a URL. Does not alter reserved or unreserved characters.

Parameters:
pUrl - a URL to be percent encoded
Returns:
the resulting URL after percent encoding has been performed on only the disallowed characters.
Throws:
java.io.UnsupportedEncodingException - if the UTF-8 encoding is not supported
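
One way to picture "disallowed only" encoding is sketched below. The kept character set is the RFC 3986 reserved and unreserved sets, and '%' is also kept so that existing escapes are not double-encoded. Both choices are assumptions of this example, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of atg.core.net.
final class DisallowedOnlySketch {
    // RFC 3986 reserved and unreserved punctuation, plus '%' (assumed) for existing escapes.
    private static final String KEEP = ":/?#[]@!$&'()*+,;=-._~%";

    static String encodeDisallowed(String url) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < url.length(); ) {
            int cp = url.codePointAt(i);
            i += Character.charCount(cp);
            boolean alnum = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (alnum || (cp < 128 && KEEP.indexOf(cp) >= 0)) {
                sb.append((char) cp);
            } else {
                // Everything else is written as the percent-encoded UTF-8 bytes.
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
        }
        return sb.toString();
    }
}
```

With this sketch, "http://example.com/a b/caf\u00e9?q=x y" becomes "http://example.com/a%20b/caf%C3%A9?q=x%20y": the delimiters ':', '/', '?' and '=' survive, and only the spaces and the non-ASCII letter are encoded.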

numCharRefToUnicode

public static java.lang.String numCharRefToUnicode(java.lang.String pVal)
Converts any Unicode values in a string encoded as numeric character references into their literal Unicode character values. A Unicode value encoded as a numeric character reference takes the form of &#N; where N is either the decimal or hexadecimal value of a character's Unicode code point. For instance, the numeric character reference &#x110C; would be converted into the Korean Hangul Jamo character choseong cieuc (U+110C).

Parameters:
pVal - A string potentially containing numeric character references
Returns:
The string with any numeric character references converted to literal unicode characters, or if no numeric character references are present, then return pVal
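
A regex-based sketch of this conversion follows. The pattern and names are hypothetical: it assumes the hexadecimal form is written with an "x" prefix (&#xN;) as in HTML, and it leaves references to invalid code points untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of atg.core.net.
final class NumCharRefSketch {
    // Group 1: hexadecimal digits (after x or X); group 2: decimal digits.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static boolean containsRefs(String s) {
        return s != null && NCR.matcher(s).find();
    }

    static String toUnicode(String s) {
        if (!containsRefs(s)) {
            return s; // nothing to convert, return the input as is
        }
        Matcher m = NCR.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            String repl = Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : m.group(); // keep invalid references unchanged
            m.appendReplacement(out, Matcher.quoteReplacement(repl));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Both "&#x110C;" and "&#4364;" denote U+110C, so toUnicode("&#x110C; and &#4364;") returns "\u110C and \u110C". A string such as "a &amp; b" contains no numeric character references and is returned unchanged.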

containsNumCharRefs

public static boolean containsNumCharRefs(java.lang.String pVal)
Checks to see if a string contains any numeric character references of the form &#N; where N is either a decimal or hexadecimal value of a character's Unicode code point.

Parameters:
pVal - the string to check
Returns:
true if pVal contains any numeric character references, false if not