You can change the default behavior of the CAS Document Conversion Module by specifying options via JVM property names and values.
Note that you cannot set these options via the standard configuration files.
The two options are:
stellent.fallbackFormat
determines the fallback format, that is, what extraction format will be used if the CAS Document Conversion Module cannot identify the format of a file. The two valid settings are ascii8 (files whose types are specifically unidentifiable are treated as plain-text files, even if they are not plain-text) and none (unrecognized file types are considered to be unsupported types and therefore are not converted). Use the none setting if you are more concerned with preventing many binary and unrecognized files from being incorrectly identified as text. If there are documents that are not being properly extracted (especially text files containing multi-byte character encodings), it may be useful to try the ascii8 option.
stellent.fileId
determines the file identification behavior. The two valid settings are normal (standard file identification behavior occurs) and extended (an extended test is run on all files that are not identified). The extended setting may result in slower crawls than with the normal setting, but it improves the accuracy of file identification.
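As an illustration of the two options described above, the following minimal sketch checks a value against the settings listed in this section and formats it as a JVM system property flag. JVM properties are conventionally passed on the java command line in the form -Dname=value. The StellentOptions class and its jvmFlag method are hypothetical helpers and are not part of CAS; where the resulting flags belong in a particular CAS startup script is not covered here and would need to be confirmed for your installation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of CAS: validates a setting against the
// values listed in this section and formats it as a JVM -D flag.
final class StellentOptions {
    // Option names and their valid settings, as described in this section.
    static final Map<String, List<String>> VALID = Map.of(
        "stellent.fallbackFormat", List.of("ascii8", "none"),
        "stellent.fileId", List.of("normal", "extended"));

    // Returns "-Dname=value", or throws if the name or value is not listed above.
    static String jvmFlag(String name, String value) {
        List<String> allowed = VALID.get(name);
        if (allowed == null) {
            throw new IllegalArgumentException("Unknown option: " + name);
        }
        if (!allowed.contains(value)) {
            throw new IllegalArgumentException(
                "Invalid value '" + value + "' for " + name
                    + "; expected one of " + allowed);
        }
        return "-D" + name + "=" + value;
    }

    public static void main(String[] args) {
        // Prints the flags for treating unrecognized files as unsupported
        // and for running the extended identification test.
        System.out.println(jvmFlag("stellent.fallbackFormat", "none"));
        System.out.println(jvmFlag("stellent.fileId", "extended"));
    }
}
```

Running the sketch prints -Dstellent.fallbackFormat=none and -Dstellent.fileId=extended. Validating the value before building the flag helps catch a misspelled setting before it reaches the JVM.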