You can set the URL normalization properties in the default.xml file.
URL normalization (also called URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The purpose of URL normalization is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent.
| Property Name | Property Value |
|---|---|
| urlnormalizer.order | Space-delimited list of URL normalization class names. Specifies the order in which the URL normalizers will be run. If any normalizer is not activated, it will be silently skipped. If other normalizers not on the list are activated, they will run in random order after the listed normalizers run. |
| urlnormalizer.regex.file | File name (default is regex-normalize.xml). Name of the configuration file used by the RegexUrlNormalizer class. Note that the file must be in the configuration directory. |
| urlnormalizer.loop.count | Integer value (default is 1). Specifies how many times to loop through normalizers, to ensure that all transformations are performed. |
| urlnormalizer.normalize-seeds | Boolean value (default is false). Specifies whether to normalize the seeds. |