Oracle® Outside In Search Export Developer's Guide
Release 8.4.1

Part Number E12887-05

B Search Export Options

Options are parameters affecting the behavior of an export or transformation. This chapter presents both the C/C++ and SOAP options relevant to the Search Export product.

While default values are provided, users are encouraged to set all options for a number of reasons. In some cases, the default values were chosen to provide backwards compatibility. In other cases, the default values were chosen arbitrarily from a range of possibilities.

B.1 Search Export C/C++ Options

These options are available to the developer when using the export engine.

Options are set using the DASetOption call. It is recommended that developers familiarize themselves with all of the options available.

Options may be Local, in which case they only affect the handle for which they are set, or Global, in which case they automatically affect all handles associated with the hDoc and must be set before the call to DAOpenDocument.

B.1.1 Character Mapping

This section discusses character mapping options.

B.1.1.1 SCCOPT_DEFAULTINPUTCHARSET

This option is used in cases where Oracle Outside In cannot determine the character set used to encode the text of an input file. When all other means of determining the file's character set are exhausted, Oracle Outside In will assume that an input document is encoded in the character set specified by this option. This is most often used when reading plain-text files, but may also be used when reading HTML or PDF files. The possible character sets are listed in charsets.h.

When "extended test for text" is enabled (see Section B.1.3.3, "SCCOPT_FIFLAGS"), this option will still apply to plain-text input files that are not identified as EBCDIC or Unicode.

This option supersedes the SCCOPT_FALLBACKFORMAT option for selecting the character set assumed for plain-text files. For backwards compatibility, the deprecated character-set-related values are still supported for SCCOPT_FALLBACKFORMAT, though internally such values are translated into equivalent values for SCCOPT_DEFAULTINPUTCHARSET. As a result, if an application sets both options, the last value set for either option is the one that takes effect.

Handle Types

NULL, VTHDOC

Scope

Global

Data Type

VTDWORD

Default

  • ANSI1252 on Windows and Latin-1 on UNIX.

Data

The possible character set values are listed in charsets.h.

B.1.1.2 SCCOPT_UNMAPPABLECHAR

This option selects the character used when a character is not a valid Unicode character, or does not conform to the XML specification for valid characters. This option takes the Unicode value for the replacement character. If you are using the PageML output format, this option is only valid if the SCCEX_PAGEML_TEXTOUT flag is set in SCCOPT_XML_PAGEML_FLAGS.

Handle Types

VTHDOC

Scope

Local

Data Type

VTWORD

Data

The Unicode value for the character to use.

Default

  • 0xfffd
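
To make "does not conform to the XML specification for valid characters" concrete, the following sketch checks 16-bit code units against the Char production of XML 1.0 and substitutes a replacement value. It only illustrates the rule and is not Oracle Outside In's implementation. The function names is_xml_bmp_char and replace_unmappable are hypothetical, and the sketch makes the simplifying assumption that every surrogate code unit is treated as invalid.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a 16-bit code unit is a legal XML 1.0 character on its own.
   In the Basic Multilingual Plane, XML 1.0 allows 0x9, 0xA, 0xD,
   0x20-0xD7FF and 0xE000-0xFFFD. Surrogates, 0xFFFE and 0xFFFF are not
   legal characters. */
int is_xml_bmp_char(uint16_t c)
{
    if (c == 0x0009 || c == 0x000A || c == 0x000D) return 1;
    if (c >= 0x0020 && c <= 0xD7FF) return 1;
    if (c >= 0xE000 && c <= 0xFFFD) return 1;
    return 0;
}

/* Hypothetical helper: overwrites every illegal code unit in buf with
   `replacement` and returns the number of substitutions made. */
size_t replace_unmappable(uint16_t *buf, size_t len, uint16_t replacement)
{
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        if (!is_xml_bmp_char(buf[i])) {
            buf[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}
```

Passing 0xFFFD as the replacement mirrors the default value of this option.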

B.1.2 Output

This section discusses output options.

B.1.2.1 SCCOPT_RENDERING_PREFER_OIT

This option is valid on 32-bit and 64-bit Linux (Red Hat and SUSE) and Solaris SPARC platforms.

This option is only valid when PageML is the output format.

When this option is set to TRUE, the technology will attempt to use its internal graphics code to render fonts and graphics. When set to FALSE, the technology will render images using the operating system's native graphics subsystem (X11 on UNIX/Linux platforms). Note that this option only works when at least one of the appropriate output solutions is present. For example, if the UNIX $DISPLAY variable does not point to a valid X Server, but the OSGD and/or WV_GD modules required for the Oracle Outside In output solution exist, Oracle Outside In will default to the Oracle Outside In rendering code. The option will fail if neither of these output solutions is present.

It is important for the system to be able to locate usable fonts when this option is set to TRUE. Only TrueType fonts (*.ttf or *.ttc files) are currently supported. To ensure that the system can find them, make sure that the environment variable GDFONTPATH includes one or more paths to these files. If the variable GDFONTPATH can't be found, the current directory is used. If fonts are called for and cannot be found, Search Export will exit with an error. Also note that when copying Windows fonts to a UNIX system, the file extension (.ttf or .ttc) must be lowercase, or the fonts will not be detected during the search for available fonts. Oracle does not provide fonts with any Oracle Outside In product.

Handle Types

NULL, VTHDOC

Scope

Global

Data Type

VTBOOL

Data

One of the following values:

  • TRUE: Use the technology's internal graphics rendering code to produce bitmap output files whenever possible.

  • FALSE: Use the operating system's native graphics subsystem.

Default

FALSE
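
The lowercase-extension requirement for fonts copied from Windows can be checked before deployment. The following is a minimal sketch of such a check. The function name has_detectable_font_ext is hypothetical, and the test only mirrors the rule stated above (a lowercase .ttf or .ttc suffix). It does not reproduce the actual font search.

```c
#include <string.h>

/* Returns 1 if `name` ends in a lowercase ".ttf" or ".ttc" extension,
   0 otherwise. Uppercase or mixed-case extensions return 0. */
int has_detectable_font_ext(const char *name)
{
    size_t len = strlen(name);
    if (len < 4) return 0;
    const char *ext = name + len - 4;
    return strcmp(ext, ".ttf") == 0 || strcmp(ext, ".ttc") == 0;
}
```

A deployment script could run this over each file in the GDFONTPATH directories and rename any file for which it returns 0.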

B.1.3 Input Handling

This section discusses input handling options.

B.1.3.1 SCCOPT_EXTRACTXMPMETADATA

Adobe's Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. This option enables XMP extraction, which passes the XMP metadata straight through without interpreting it. This option is ignored if the SCCOPT_PARSEXMPMETADATA option is enabled. It is independent of the other two "metadata" options:

  • SCCEX_IND_SUPPRESSPROPERTIES will not affect XMP, so if you turn XMP on, but also set SuppressProperties, you will still get the XMP.

  • SCCEX_METADATAONLY will not guarantee that XMP is produced.

Handle Types

VTHDOC

Scope

Local (was Global prior to release 8.2.2)

Data Type

VTBOOL

Data

  • TRUE: This setting enables XMP extraction.

  • FALSE: This setting disables XMP extraction.

Default

  • FALSE

B.1.3.2 SCCOPT_FALLBACKFORMAT

This option controls how files are handled when their specific application type cannot be determined. This normally affects all plain-text files, because plain-text files are generally identified by process of elimination: when a file isn't identified as having been created by a known application, it is treated as a plain-text file.

This option must be set for an hDoc before any subhandle has been created for that hDoc.

A number of values that were formerly allowed for this option have been deprecated. Specifically, the values that selected specific plain-text character sets are no longer to be used. Instead, applications should use the SCCOPT_DEFAULTINPUTCHARSET option for such functionality.

Handle Types

NULL, VTHDOC

Scope

Global

Data Type

VTDWORD

Data

The high VTWORD of this value is reserved and should be set to 0, and the low VTWORD must have one of the following values:

  • FI_TEXT: Unidentified file types will be treated as text files.

  • FI_NONE: Oracle Outside In will not attempt to process files whose type cannot be identified. This will include text files. When this option is selected, an attempt to process a file of unidentified type will cause Oracle Outside In to return an error value of DAERR_FILTERNOTAVAIL (or SCCERR_NOFILTER).

Default

  • FI_TEXT

B.1.3.3 SCCOPT_FIFLAGS

This option affects how an input file's internal format (application type) is identified when the file is first opened by the Oracle Outside In technology. When the extended test flag is in effect, and an input file is identified as being either 7-bit ASCII, EBCDIC, or Unicode, the file's contents will be interpreted as such by the export process.

The extended test is optional because it requires extra processing and cannot guarantee complete accuracy (which would require inspecting every byte in a file to eliminate false positives).

Handle Types

NULL, VTHDOC

Scope

Global

Data Type

VTDWORD

Data

One of the following values:

  • SCCUT_FI_NORMAL: This is the default value. When this is set, standard file identification behavior occurs.

  • SCCUT_FI_EXTENDEDTEST: If set, the File Identification code will run an extended test on all files that are not identified.

Default

  • SCCUT_FI_NORMAL

B.1.3.4 SCCOPT_FORMATFLAGS

This option allows the developer to set flags that enable options that span multiple export products.

Handle Types

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

  • SCCOPT_FLAGS_ALLISODATETIMES: When this flag is set, all Date and Time values are converted to the ISO 8601 standard. This conversion can only be performed using dates that are stored as numeric data within the original file.

  • SCCOPT_FLAGS_STRICTFILEACCESS: When an embedded file or URL can't be opened with the full path, OIT will sometimes try to open the referenced file from other locations, including the current directory. Setting this flag prevents OIT from trying to open the file from any location other than the fully qualified path or URL.

Default

0: All flags turned off

B.1.3.5 SCCOPT_SYSTEMFLAGS

This option controls a number of miscellaneous interactions between the developer and the Outside In Technology.

Handle Type

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

  • SCCVW_SYSTEM_UNICODE: This flag causes the strings in SCCDATREENODE to be returned in Unicode.

Default

0

B.1.3.6 SCCOPT_IGNORE_PASSWORD

This option can disable the password verification of files where the contents can be processed without validation of the password. If this option is not set, the filter should prompt for a password if it handles password-protected files.

As of Release 8.4.0, only the PST and MDB Filters support this option.

Scope

Global

Data Type

VTBOOL

Data

  • TRUE: Ignore validation of the password

  • FALSE: Prompt for the password

Default

FALSE

B.1.3.7 SCCOPT_LOTUSNOTESDIRECTORY

This option allows the developer to specify the location of a Lotus Notes or Domino installation for use by the NSF filter. A valid Lotus installation directory must contain the file nnotes.dll.

Note:

Please see section 2.1.1 for NSF support on Win x86-32 or Win x86-64 or section 3.1.1 for NSF support on Linux x86-32 or Solaris Sparc 32.

Handle Types

NULL

Scope

Global

Data Type

VTLPBYTE

Data

A path to the Lotus Notes directory.

Default

If this option isn't set, then OIT will first attempt to load the Lotus library according to the operating system's PATH environment variable, and then attempt to find and load the Lotus library as indicated in HKEY_CLASSES_ROOT\Notes.Link.

B.1.3.8 SCCOPT_PARSEXMPMETADATA

Adobe's Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. This option enables parsing of the XMP data into normal OIT document properties. Enabling this option may cause the loss of some regular data in premium graphics filters (such as PostScript), but won't affect most formats (such as PDF).

Handle Types

VTHDOC

Scope

Local

Data Type

VTBOOL

Data

  • TRUE: This setting enables parsing XMP.

  • FALSE: This setting disables parsing XMP.

Default

FALSE

B.1.3.9 SCCOPT_PDF_FILTER_REORDER_BIDI

This option controls whether or not the PDF filter will attempt to reorder bidirectional text runs so that the output is in standard logical order as used by the Unicode 2.0 and later specification. This additional processing will result in slower filter performance according to the amount of bidirectional data in the file.

Handle Types

VTHDOC, NULL

Scope

Global

Data Type

VTDWORD

Data

  • SCCUT_FILTER_STANDARD_BIDI

  • SCCUT_FILTER_REORDERED_BIDI

Default

SCCUT_FILTER_STANDARD_BIDI

B.1.3.10 SCCOPT_PROCESS_OLE_EMBEDDINGS

Microsoft PowerPoint versions from 1997 through 2003 could embed OLE documents in PowerPoint files. This option controls which embeddings are processed as native (OLE) documents and which are processed using the alternate graphic.

Note:

The Microsoft PowerPoint application sometimes embeds known Microsoft OLE embeddings (such as Visio or Project) as an "Unknown" type. To process these embeddings, the SCCOPT_PROCESS_OLEEMBED_ALL option is required. Embeddings from products released after Office 2003, such as Office 2007, also fall into this category.

Handle Types

VTHDOC, NULL

Scope

Global

Data Type

VTWORD

Data

  • SCCOPT_PROCESS_OLEEMBED_ALL : Process all embeddings in the file

  • SCCOPT_PROCESS_OLEEMBED_NONE : Process none of the embeddings in the file

  • SCCOPT_PROCESS_OLEEMBED_STANDARD (default) : Process embeddings that are known standard embeddings. These include Office 2003 versions of Word, Excel, Visio etc.

Default

SCCOPT_PROCESS_OLEEMBED_STANDARD

B.1.3.11 SCCOPT_TIMEZONE

This option allows the user to define an offset to GMT that will be applied during date formatting, allowing date values to be displayed in a selectable time zone. This option affects the formatting of numbers that have been defined as date values. This option will not affect dates that are stored as text.

Note:

This option does not apply for spreadsheet files.

Handle Types

NULL, VTHDOC

Scope

Global

Data Type

VTLONG

Data

Integer parameter from -96 to 96, representing 15-minute offsets from GMT. To query the operating system for the time zone set on the machine, specify SCC_TIMEZONE_USENATIVE.

Default

  • 0: GMT time
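
Because the value is expressed in 15-minute units, a conventional offset such as UTC+5:30 has to be converted before it is set. The following is a minimal sketch of that arithmetic. The helper name tz_minutes_to_quarters is hypothetical, and the sketch assumes that positive values mean east of GMT, which the text above does not state.

```c
/* Hypothetical helper: converts a GMT offset in minutes (for example
   +330 for UTC+5:30) to 15-minute units.
   Returns 0 and stores the result in *quarters on success.
   Returns -1 if the offset is not a multiple of 15 minutes or falls
   outside the documented -96..96 range. */
int tz_minutes_to_quarters(int offset_minutes, long *quarters)
{
    if (offset_minutes % 15 != 0) return -1;
    long q = offset_minutes / 15;
    if (q < -96 || q > 96) return -1;
    *quarters = q;
    return 0;
}
```

The resulting value would then be passed as the VTLONG data for this option.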

B.1.3.12 SCCOPT_HTML_COND_COMMENT_MODE

Some HTML includes a special type of comment that will be read by particular versions of browsers or other products. This option allows you to control which of those comments are included in the output.

Handle Type

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

One or more of the following values OR-ed together:

  • HTML_COND_COMMENT_NONE: Don't output any conditional comments. Note: setting any other flag will negate this.

  • HTML_COND_COMMENT_IE5: include the IE 5 comments

  • HTML_COND_COMMENT_IE6: include the IE 6 comments

  • HTML_COND_COMMENT_IE7: include the IE 7 comments

  • HTML_COND_COMMENT_IE8: include the IE 8 comments

  • HTML_COND_COMMENT_IE9: include the IE 9 comments

  • HTML_COND_COMMENT_ALL: include all conditional comments including the versions listed above and any other versions that might be in the HTML.

B.1.3.13 SCCOPT_PDF_FILTER_DROPHYPHENS

This option controls whether or not the PDF filter will drop hyphens at the end of a line. Since most PDF-generating tools create them as generic dashes, it's impossible for Outside In to know if the hyphen is a syllable hyphen or part of a hyphenated word. When this option is set to TRUE, all hyphens at the end of lines will be dropped from the extracted text.

Note:

When this option is TRUE, the character counts for the extracted text may not match the counts used for rendering where the hyphens are required for rendering. This will affect annotations in rendering APIs.

Handle Types

VTHDOC

Scope

Local

Data Type

VTBOOL

Data

  • TRUE: This setting drops hyphens from the end of all lines.

  • FALSE: This setting retains hyphens at the end of all lines.

Default

FALSE

B.1.3.14 SCCOPT_ARCFULLPATH

In the Viewer and rendering products, this option tells the archive display engine to show the full path to a node in the szNode field in response to a SCCVW_GETTREENODE message. It also causes the name fields in DAGetTreeRecord and DAGetObjectInfo to contain the full path instead of just the archive node name.

Data Type

VTBOOL

Data

  • TRUE: Display the full path.

  • FALSE: Do not display the path.

Default

FALSE

B.1.4 Compression

This section discusses compression options.

B.1.4.1 SCCOPT_FILTERLZW

This option can disable access to any files using Lempel-Ziv-Welch (LZW) compression, such as .GIF files, .ZIP files or self-extracting archive (.EXE) files containing "shrunk" files. Attempts to read such files when this option is set to SCCVW_FILTER_LZW_DISABLED will fail and return the error SCCERR_UNSUPPORTEDCOMPRESSION.

The following file types are affected when this option is set to SCCVW_FILTER_LZW_DISABLED:

  • GIF files

  • TIF files using LZW compression

  • PDF files that use internal LZW compression

  • TAZ and TAR archives containing files that are identified as FI_UNIXCOMP

  • ZIP and self-extracting archive (.EXE) files containing "shrunk" files

  • PostScript files using LZW compression

Although this option can disable access to files in ZIP or EXE archives stored using LZW compression, any files in such archives that were stored using any other form of compression will still be accessible.

Handle Types

VTHDOC, HEXPORT

Scope

Local

Data Type

VTDWORD

Data

  • SCCVW_FILTER_LZW_ENABLED: LZW compressed files will be read normally.

  • SCCVW_FILTER_LZW_DISABLED: LZW compressed files will not be read.

Default

SCCVW_FILTER_LZW_ENABLED

B.1.5 XML

This section discusses XML options.

B.1.5.1 SCCOPT_ENABLEALLSUBOBJECTS

Oracle Outside In has an internal flag that is used to optimize several of the input filters for searching. One of the side effects of this optimization is that many embedded bitmaps, including Progressive JPEG, aren't output by the filter. SCCOPT_ENABLEALLSUBOBJECTS can override this internal optimization.

Handle Types

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

One of the following values:

  • SCCVW_FILTER_ENABLEALLSUBOBJECTS: Override the optimizations.

  • SCCVW_FILTER_NORMALSUBOBJECTS: Allow the optimizations.

Default

SCCVW_FILTER_NORMALSUBOBJECTS

B.1.5.2 SCCOPT_XML_DEF_METHOD

This option determines whether the output generated by Search Export references a SearchML or PageML schema, a DTD, or no XML definition at all. This option is not valid when SearchText or SearchHTML is the output format.

Handle Types

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

One of the following values:

  • SCCEX_XML_XDM_DTD: Document Type Definition (DTD)

  • SCCEX_XML_XDM_XSD: XML Schema Definition (XSD)

  • SCCEX_XML_XDM_NONE: No XML definition reference

Default

SCCEX_XML_XDM_NONE

B.1.5.3 SCCOPT_XML_DEF_REFERENCE

This option allows the developer to set a particular file as the XML definition reference.

If the SCCOPT_XML_DEF_METHOD option is set to SCCEX_XML_XDM_XSD or SCCEX_XML_XDM_DTD, the value of this option will be used to reference the schema or DTD, respectively.

Handle Types

VTHDOC

Scope

Local

Data Type

Size (in bytes) of the data being passed, including a terminating NULL.

Data

An array of WORD-sized characters terminated with a WORD-sized NULL (a UCS-2 string). The size passed is the total number of bytes that this UCS-2 string occupies, including the bytes occupied by the terminating NULL.

Default

None
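
The size rule for this option (bytes, not characters, with the terminating NULL counted) is easy to get wrong. The following is a minimal sketch of the calculation. The helper name ucs2_size_in_bytes is hypothetical, and uint16_t is used as a stand-in for the WORD-sized character type.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the size in bytes of a NULL-terminated UCS-2 string,
   including the two bytes of the terminating NULL. */
size_t ucs2_size_in_bytes(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0) n++;            /* count characters before the NULL */
    return (n + 1) * sizeof(uint16_t); /* +1 for the terminator */
}
```

For a five-character reference such as "a.dtd", the size to pass is 12 bytes: five characters plus the terminator, at two bytes each.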

B.1.5.4 SCCOPT_XML_NULLREPLACECHAR

This option specifies a two-byte Unicode character that will be used to replace null characters if null path separators are being used. This option defaults to '/' and is valid for the SearchML 3.x, SearchHTML and SearchText output formats.

Handle Types

VTHDOC

Scope

Local

Data Type

VTWORD

Data

A two-byte Unicode character that will be used to replace null characters if null path separators are being used.

Default

0x002f = "/"

B.1.5.5 SCCOPT_XML_PAGEML_FLAGS

This option allows the developer to set flags that enable options unique to the PageML schema.

Handle Types

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

One or more of the following values bitwise OR-ed together. Note that these flags are valid ONLY for the PageML output format:

  • SCCEX_PAGEML_TEXTOUT: Include text in PageML's output.

  • SCCEX_XML_NO_XML_DECLARATION: Exclude the XML declaration in PageML's output.

Default

  • 0: All flags turned off.

B.1.5.6 SCCOPT_XML_PAGEML_PRINTERNAME

This option is Windows-specific. It is used to set which device context to use to render the pages.

It specifies, as a byte string, the name of the printer whose metrics should be used to calculate pagination information. If unspecified, the default printer will be used. The screen metrics of the system will be used if a printer is not specified and a default printer does not exist. As pagination is affected by the metrics of the device context and installed fonts, PageML XML output can vary between different systems and configurations.

Handle Types

VTHDOC

Scope

Local

Data Type

VTLPVOID

Data

A null-terminated single-byte string for the name of the printer which is the device context that should be used to render pages.

Default

  • NULL

    PageML uses the Windows default printer.

B.1.5.7 SCCOPT_XML_SEARCHML_CHAR_ATTRS

This option allows the developer to track character attributes contained in the input document and choose which are output to tags in the XML document produced.

Handle Types

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

One or more of the following values bitwise OR-ed together. Note that not all flags are valid for all Search Export output formats.

  • SCCEX_XML_SEARCHML_ALLCAPS: Valid for the SearchML 3.x output formats only.

  • SCCEX_XML_SEARCHML_BOLD: Valid for the SearchML 3.x and SearchHTML output formats only.

  • SCCEX_XML_SEARCHML_DUNDERLINE: Valid for the SearchML 3.x and SearchHTML output formats only.

  • SCCEX_XML_SEARCHML_HIDDEN: Not valid for the PageML output format.

  • SCCEX_XML_SEARCHML_ITALIC: Valid for the SearchML 3.x and SearchHTML output formats only.

  • SCCEX_XML_SEARCHML_OCE: When this flag is set, an attribute named oce is added either to <p> or <r> elements as appropriate. (This flag does not affect <unmapped> elements, which will always have an oce attribute.) The value of the attribute is a hex representation of the character set. The value is defined by our core technology, SO_ANSIUNKNOWN for instance. Possible values for this attribute appear in the vtchars.h header file. Valid for the SearchML 3.x output formats only.

  • SCCEX_XML_SEARCHML_OUTLINE: Valid for the SearchML 3.x output formats only.

  • SCCEX_XML_SEARCHML_REVISIONADD: Valid for all output formats. When set, causes added text to be output and appropriately marked.

  • SCCEX_XML_SEARCHML_REVISIONDELETE: Valid for all output formats. When set, causes deleted text to be output and appropriately marked.

  • SCCEX_XML_SEARCHML_SMALLCAPS: Valid for the SearchML 3.x output formats only.

  • SCCEX_XML_SEARCHML_STRIKEOUT: Valid for the SearchML 3.x output formats only.

  • SCCEX_XML_SEARCHML_UNDERLINE: Valid for the SearchML 3.x and SearchHTML output formats only.

Default

  • 0: All flags turned off.

B.1.5.8 SCCOPT_XML_SEARCHML_FLAGS

This option allows the developer to set flags that enable options unique to the following SearchML formats: SearchML 3.x, SearchHTML and SearchText.

This option is not valid for the PageML output format, although there is a similar PageML-specific option (SCCOPT_XML_PAGEML_FLAGS) that includes similar flags.

Handle Types

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

One or more of the following values bitwise OR-ed together. Note that not all flags are valid for all Search Export output formats:

  • SCCEX_ANNOTATIONS: When set, revised or annotated text will be designated as such. An "annotation" is a note or comment that goes along with a document, but is not really part of the document itself. Examples would be comments, footnotes, slidenotes, etc. Valid only for the SearchML 3.x output formats.

  • SCCEX_XML_ENABLEERRORINFO: When this flag is set, SearchML will output an <error> element if an error occurs while processing the main document or any sub-documents. The <error> element has one required attribute, code, which will be a hex value of the error code. The contents of the element will be a string with the description of the error returned from DAGetErrorString. Valid only for the SearchML 3.1 and later output formats.

  • SCCEX_IND_GENERATED: Includes data not originally stored as text in the input document. This can be important content the user would see when viewing the document in the original application (time and owner information in archives, numbers in spreadsheets/databases, etc.).

  • SCCEX_IND_GENERATESYSTEMMETADATA: When this flag is set, system metadata will be generated. This text is "generated" and part of the document properties, so it will be affected by SCCEX_IND_GENERATED and SCCEX_IND_SUPPRESSPROPERTIES. This information is gathered through system calls and may adversely affect performance. Valid only for the SearchML 3.x output formats.

  • SCCEX_IND_SS_CELLINFO: When this flag is set, SearchML will output a <cell> element that will encapsulate data from each non-empty cell in a spreadsheet. (NOTE: Numeric cells are considered empty unless SCCEX_IND_GENERATED is enabled.) The <cell> element will have a required attribute start which will give the location of the cell. It will also have an optional attribute end which will be used to indicate a merged cell. Both the start and end attributes will be in the form RowColumn where the Row will be a letter and Column will be a number (for example <cell start="A1">). Valid only for the SearchML 3.x output formats.

  • SCCEX_IND_SUPPRESSPROPERTIES: Document properties are not produced. Not valid for the PageML output format.

  • SCCEX_METADATAONLY: Produce only metadata.

  • SCCEX_PRODUCEURLS: Produce URL and Book Mark information when it is available. Valid only for the SearchHTML and SearchML 3.x output formats.

  • SCCEX_XML_EMBEDDINGS: Include embeddings.

  • SCCEX_XML_NO_XML_DECLARATION: Exclude the XML declaration. Valid only for the SearchML 3.x output formats.

  • SCCEX_XML_PRODUCEOBJECTINFO: When this flag is set, information for use with IOTYPE_OBJECT will be included in the <document> element. The information will correspond to the fields in the SCCDAOBJECT structure. Valid only for the SearchML 3.x output format.

  • SCCEX_XML_PSTYLENAMES: Include paragraph style name references as an attribute of paragraph tags. Valid only for the SearchML 3.x output formats.

  • SCCEX_XML_SKIPSTYLES: When possible, skip processing the style information. This should result in better performance, but certain output will no longer be available. When this flag is set and an appropriate filter is selected, character attributes, paragraph attributes, font names, and PDF Map Problem warnings will be unavailable, even if they have been requested.

    Note:

    This will only work with optimized input filters, but Microsoft Office, PDF, RTF, MSG, Mime, and HTML are included in the optimized list.

  • SCCEX_XML_SUPPRESSARCHIVESUBDOCS: Subdocuments in archives are not processed.

  • SCCEX_XML_SUPPRESSATTACHMENTS: Attachments are not processed.

Default

  • 0: All flags turned off.

B.1.5.9 SCCOPT_XML_SEARCHML_OFFSET

The value of this option is a Boolean that if set to TRUE will include offset information in the SearchML output according to the schema. If the option is set to FALSE, no offset information is produced.

Handle Types

VTHDOC, VTHEXPORT

Scope

Local

Data Type

VTBOOL

Default

FALSE

B.1.5.10 SCCOPT_XML_SEARCHML_PARA_ATTRS

This option allows the developer to track paragraph attributes contained in the input document and, optionally, include them in the XML output. All lengths are measured in twips. The values that appear in the SearchML output are the values that apply to the first content encountered in a given paragraph. For example, if the character height changes after the initial content in a paragraph, that change will be ignored. Left and first line indents are measured relative to the left page margin. The right indent is measured relative to the right page margin.

This option only affects SearchML output. The option is not valid for the SearchHTML, SearchText and PageML output flavors.

Handle Types

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

One or more of the following values bitwise OR-ed together:

  • SCCEX_XML_SEARCHML_SPACING

  • SCCEX_XML_SEARCHML_HEIGHT

  • SCCEX_XML_SEARCHML_LEFTINDENT

  • SCCEX_XML_SEARCHML_RIGHTINDENT

  • SCCEX_XML_SEARCHML_FIRSTINDENT

Default

  • 0: All flags turned off.
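
Consumers of the SearchML output often need these lengths in more familiar units. The following is a minimal sketch of the conversion, using the standard definitions of 20 twips per point and 1440 twips per inch. The function names are hypothetical.

```c
/* A twip is one twentieth of a point; there are 72 points per inch,
   so 1440 twips per inch. */
double twips_to_points(long twips)
{
    return twips / 20.0;
}

double twips_to_inches(long twips)
{
    return twips / 1440.0;
}
```

For example, a left indent of 720 twips corresponds to 36 points, or half an inch.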

B.1.5.11 SCCOPT_XML_SEARCHML_UNMAPPEDTEXT

This option allows for the production of unmapped text (the original code points from the input document). A new <unmapped> element will be produced to enclose this text. The <unmapped> element will contain base64-encoded text. It will also contain two attributes. "OCE" will contain a hex value representing the character set. "font" will contain a string value of the original font name. This is necessary for non-standard encodings such as Wingdings or Webdings. This option is only valid in the SearchML 3.2 (and higher) schema.

Handle Type

VTHDOC

Scope

Local

Data Type

VTDWORD

Data

One of the following values:

  • SCCEX_XML_JUST_UNMAPPEDTEXT: Output just the unmapped text

  • SCCEX_XML_NO_UNMAPPEDTEXT: Don't output any unmapped text.

  • SCCEX_XML_BOTH_UNMAPPEDTEXT: Output both the original and the unmapped text.

Default

  • SCCEX_XML_NO_UNMAPPEDTEXT
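
An application that reads the <unmapped> element has to base64-decode its contents to recover the original code points. The following is a minimal sketch of a decoder for the standard base64 alphabet. It is a generic illustration rather than code from the Outside In SDK, and the function names are hypothetical. The decoded bytes are in the character set named by the element's character-set attribute and still need to be interpreted accordingly.

```c
#include <stddef.h>

/* Maps one base64 character to its 6-bit value, or -1 if the character
   is not part of the standard alphabet. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes base64 text from `in` into `out`, which holds `out_cap` bytes.
   Whitespace is skipped and decoding stops at the first '=' padding
   character. Returns the number of bytes written, or -1 on an invalid
   character or insufficient output space. */
long b64_decode(const char *in, unsigned char *out, size_t out_cap)
{
    unsigned long acc = 0; /* bit accumulator */
    int bits = 0;          /* number of valid bits held in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; in++) {
        if (*in == ' ' || *in == '\n' || *in == '\r' || *in == '\t')
            continue;
        int v = b64_value(*in);
        if (v < 0) return -1;
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n >= out_cap) return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
            acc &= (1UL << bits) - 1; /* keep only the leftover bits */
        }
    }
    return (long)n;
}
```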

B.1.6 File System

This section discusses file system options.

B.1.6.1 SCCOPT_IO_BUFFERSIZE

This set of three options allows the user to adjust buffer sizes to tailor memory usage to the machine's ability. The numbers specified in these options are in kilobytes. These are advanced options that casual users of Search Export may ignore.

Handle Types

NULL, VTHDOC

Scope

Global

Data Type

SCCBUFFEROPTIONS Structure

Data

A buffer options structure

B.1.6.1.1 SCCBUFFEROPTIONS Structure

typedef struct SCCBUFFEROPTIONStag
{
   VTDWORD dwReadBufferSize;    /* size of the I/O Read buffer in KB */
   VTDWORD dwMMapBufferSize;    /* maximum size for the I/O Memory Map buffer in KB */
   VTDWORD dwTempBufferSize;    /* maximum size for the memory-mapped temp files in KB */
   VTDWORD dwFlags;             /* use flags */
} SCCBUFFEROPTIONS, *PSCCBUFFEROPTIONS;

Parameters

  • dwReadBufferSize: Used to define the amount of data that will be read from disk into memory at any given time. Once the buffer has data, further file reads will proceed within the buffer until the end of the buffer is reached, at which point the buffer will again be filled from the disk. This can lead to performance improvements in many file formats, regardless of the size of the document.

  • dwMMapBufferSize: Used to define a maximum size that a document can be and use a memory-mapped I/O model. In this situation, the entire file is read from disk into memory and all further I/O is performed on the data in memory. This can lead to significantly improved performance, but note that either the entire file can be read into memory, or it cannot. If both of these buffers are set, then if the file is smaller than the dwMMapBufferSize, the entire file will be read into memory; if not, it will be read in blocks defined by the dwReadBufferSize.

  • dwTempBufferSize: The maximum size that a temporary file can occupy in memory before being written to disk as a physical file. Storing temporary files in memory can boost performance on archives and on files that have embedded objects or attachments. If set to 0, all temporary files will be written to disk.

  • dwFlags

    • SCCBUFOPT_SET_READBUFSIZE 1

    • SCCBUFOPT_SET_MMAPBUFSIZE 2

    • SCCBUFOPT_SET_TEMPBUFSIZE 4

    To set any of the three buffer sizes, set the corresponding flag when calling DASetOption.
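
The relationship between the size fields and dwFlags can be sketched as follows. The structure and flag values are reproduced from this section so that the example stands alone. The VTDWORD typedef is a stand-in that assumes a 32-bit unsigned type, and the helper make_buffer_options is hypothetical. In a real application the types and constants come from the SDK headers.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t VTDWORD; /* assumption: stand-in for the SDK's 32-bit type */

typedef struct SCCBUFFEROPTIONStag
{
    VTDWORD dwReadBufferSize; /* size of the I/O read buffer in KB */
    VTDWORD dwMMapBufferSize; /* maximum size of the memory-map buffer in KB */
    VTDWORD dwTempBufferSize; /* maximum size of in-memory temp files in KB */
    VTDWORD dwFlags;          /* which of the sizes above should be applied */
} SCCBUFFEROPTIONS;

#define SCCBUFOPT_SET_READBUFSIZE 1
#define SCCBUFOPT_SET_MMAPBUFSIZE 2
#define SCCBUFOPT_SET_TEMPBUFSIZE 4

/* Hypothetical helper: requests a new read-buffer size and temp-file
   limit while leaving the memory-map size untouched. Only the fields
   whose flags are set are meant to be applied. */
SCCBUFFEROPTIONS make_buffer_options(VTDWORD read_kb, VTDWORD temp_kb)
{
    SCCBUFFEROPTIONS opts;
    memset(&opts, 0, sizeof(opts));
    opts.dwReadBufferSize = read_kb;
    opts.dwTempBufferSize = temp_kb;
    opts.dwFlags = SCCBUFOPT_SET_READBUFSIZE | SCCBUFOPT_SET_TEMPBUFSIZE;
    return opts;
}
```

The populated structure would then be passed as the data for SCCOPT_IO_BUFFERSIZE.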

Default

The default settings for these options are:

  • #define SCCBUFOPT_DEFAULT_READBUFSIZE 2: A 2KB read buffer.

  • #define SCCBUFOPT_DEFAULT_MMAPBUFSIZE 8192: An 8MB memory-map size.

  • #define SCCBUFOPT_DEFAULT_TEMPBUFSIZE 2048: A 2MB temp-file limit.

Minimum and maximum sizes for each are:

  • SCCBUFOPT_MIN_READBUFSIZE 1: Read one Kbyte at a time.

  • SCCBUFOPT_MIN_MMAPBUFSIZE 0: Don't use memory-mapped input.

  • SCCBUFOPT_MIN_TEMPBUFSIZE 0: Don't use memory temp files

  • SCCBUFOPT_MAX_READBUFSIZE 0x003fffff, SCCBUFOPT_MAX_MMAPBUFSIZE 0x003fffff, SCCBUFOPT_MAX_TEMPBUFSIZE 0x003fffff: These maximums correspond to the largest file size possible under the 4GB DWORD limit.

B.1.6.2 SCCOPT_TEMPDIR

From time to time, the technology needs to create one or more temporary files. This option sets the directory to be used for those files.

It is recommended that this option be set as part of a system to clean up temporary files left behind in the event of abnormal program termination. By using this option with code to delete files older than a predefined time limit, the OEM can help to ensure that the number of temporary files does not grow without limit.
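
The cleanup described above could look something like the following minimal sketch for POSIX systems. It is not part of the Outside In API: purge_old_temp_files is a hypothetical helper that removes regular files in the chosen temporary directory whose modification time is older than a caller-supplied age. A Windows build would need the equivalent Win32 directory calls.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: delete regular files directly inside `dir` that were
   last modified more than `max_age_secs` seconds before `now`.
   Returns the number of files removed, or -1 if `dir` cannot be opened. */
int purge_old_temp_files(const char *dir, time_t now, long max_age_secs)
{
    assert(dir != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int removed = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;

        int n = snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;               /* path too long for the buffer: skip it */

        /* "." and ".." are directories, so the S_ISREG test skips them. */
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;

        if (difftime(now, st.st_mtime) > (double)max_age_secs &&
            unlink(path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```

A service might call this at startup with the directory given to SCCOPT_TEMPDIR, for example purge_old_temp_files(dir, time(NULL), 24L * 60 * 60). Dedicating a directory to Outside In temporary files keeps the purge from touching unrelated files.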

Note:

This option will be ignored if SCCOPT_REDIRECTTEMPFILE is set.

Handle Types

NULL, VTHDOC

Scope

Global

Data Type

SCCUTTEMPDIRSPEC structure

B.1.6.2.1 SCCUTTEMPDIRSPEC Structure

This structure is used in the SCCOPT_TEMPDIR option.

SCCUTTEMPDIRSPEC is a C data structure defined in sccvw.h as follows:

typedef struct SCCUTTEMPDIRSPEC
{
   VTDWORD   dwSize;
   VTDWORD   dwSpecType;
   VTBYTE    szTempDirName[SCCUT_FILENAMEMAX];
} SCCUTTEMPDIRSPEC,  * LPSCCUTTEMPDIRSPEC;

dwSpecType describes the contents of szTempDirName. Together, dwSpecType and szTempDirName describe the location of the temporary directory. At this time, only the following dwSpecType values are supported:

  • IOTYPE_ANSIPATH: Windows only. szTempDirName points to a NULL-terminated full path name using the ANSI character set and FAT 8.3 (Win16) or NTFS (Win32 and Win64) file name conventions.

  • IOTYPE_UNICODEPATH: Windows only. szTempDirName points to a NULL-terminated full path name using the Unicode character set and NTFS file name conventions. Note that the length of the path name is limited to SCCUT_FILENAMEMAX bytes, or (SCCUT_FILENAMEMAX / 2) double-byte Unicode characters.

  • IOTYPE_UNIXPATH: X Windows on UNIX platforms only. szTempDirName points to a NULL-terminated full path name using the system default character set and UNIX path conventions.

Specifically not supported at this time is IOTYPE_REDIRECT.

Parameters

  • dwSize: Set to sizeof(SCCUTTEMPDIRSPEC).

  • dwSpecType: IOTYPE_ANSIPATH, IOTYPE_UNICODEPATH or IOTYPE_UNIXPATH

  • szTempDirName: The path to the directory to use for the temporary files. Note that if all SCCUT_FILENAMEMAX bytes in the buffer are filled, there will not be space left for file names.

Default

The system default directory for temporary files. On UNIX systems, this is the value of environment variable $TMP. On Windows systems, it is the value of environment variable %TMP%.
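
The following sketch shows one way the structure might be filled for a UNIX path. To keep it compilable without the SDK, the typedefs are stand-ins and the numeric values given for SCCUT_FILENAMEMAX and IOTYPE_UNIXPATH are placeholders, not the real header values. init_temp_dir_spec and NAME_HEADROOM are hypothetical; the headroom reflects the note above that a completely full buffer leaves no room for file names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for SDK definitions. The two numeric values are placeholders
   chosen for this sketch; use the SDK headers in real code. */
typedef uint32_t VTDWORD;
typedef unsigned char VTBYTE;
#define SCCUT_FILENAMEMAX 256
#define IOTYPE_UNIXPATH   0

typedef struct SCCUTTEMPDIRSPEC
{
   VTDWORD   dwSize;
   VTDWORD   dwSpecType;
   VTBYTE    szTempDirName[SCCUT_FILENAMEMAX];
} SCCUTTEMPDIRSPEC;

/* Hypothetical: bytes kept free in the buffer for temporary file names. */
#define NAME_HEADROOM 32

/* Hypothetical helper: fill `spec` for a UNIX directory path.
   Returns 0 on success, -1 if the path is empty or leaves too little room. */
int init_temp_dir_spec(SCCUTTEMPDIRSPEC *spec, const char *dir)
{
    assert(spec != NULL && dir != NULL);

    size_t len = strlen(dir);
    if (len == 0 || len + 1 + NAME_HEADROOM > SCCUT_FILENAMEMAX)
        return -1;

    memset(spec, 0, sizeof *spec);
    spec->dwSize = (VTDWORD)sizeof(SCCUTTEMPDIRSPEC);
    spec->dwSpecType = IOTYPE_UNIXPATH;
    memcpy(spec->szTempDirName, dir, len + 1);   /* copies the terminator too */
    return 0;
}
```

Because the option is Global, the filled structure would be passed to DASetOption before the call to DAOpenDocument.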

B.1.6.3 SCCOPT_DOCUMENTMEMORYMODE

This option determines the maximum amount of memory that the chunker may use to store the document's data, from 4 MB to 1 GB. The more memory the chunker has available to it, the less often it needs to re-read data from the document.

Handle Types

NULL, VTHDOC

Scope

Global

Data Type

VTDWORD

Parameters

  • SCCDOCUMENTMEMORYMODE_SMALLEST 1 - 4MB

  • SCCDOCUMENTMEMORYMODE_SMALL 2 - 16MB

  • SCCDOCUMENTMEMORYMODE_MEDIUM 3 - 64MB

  • SCCDOCUMENTMEMORYMODE_LARGE 4 - 256MB

  • SCCDOCUMENTMEMORYMODE_LARGEST 5 - 1 GB

Default

SCCDOCUMENTMEMORYMODE_SMALL 2 - 16MB
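
An application that knows roughly how much memory it can spare per document might pick the mode as in the sketch below. The constant values are the ones listed above; memory_mode_limit_mb and memory_mode_for_budget are hypothetical helpers, not SDK functions.

```c
#include <assert.h>
#include <stddef.h>

/* Values as listed in the Parameters section above. */
#define SCCDOCUMENTMEMORYMODE_SMALLEST 1
#define SCCDOCUMENTMEMORYMODE_SMALL    2
#define SCCDOCUMENTMEMORYMODE_MEDIUM   3
#define SCCDOCUMENTMEMORYMODE_LARGE    4
#define SCCDOCUMENTMEMORYMODE_LARGEST  5

/* Documented memory ceiling, in megabytes, for a given mode. */
unsigned long memory_mode_limit_mb(unsigned mode)
{
    static const unsigned long limits[] = { 4, 16, 64, 256, 1024 };
    assert(mode >= SCCDOCUMENTMEMORYMODE_SMALLEST &&
           mode <= SCCDOCUMENTMEMORYMODE_LARGEST);
    return limits[mode - 1];
}

/* Largest mode whose ceiling fits within `budgetMB`; falls back to the
   smallest mode when even 4 MB exceeds the budget. */
unsigned memory_mode_for_budget(unsigned long budgetMB)
{
    unsigned mode;
    for (mode = SCCDOCUMENTMEMORYMODE_LARGEST;
         mode > SCCDOCUMENTMEMORYMODE_SMALLEST; --mode) {
        if (memory_mode_limit_mb(mode) <= budgetMB)
            return mode;
    }
    return SCCDOCUMENTMEMORYMODE_SMALLEST;
}
```

The returned value would be stored in a VTDWORD and passed to DASetOption before the document is opened.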

B.1.6.4 SCCOPT_REDIRECTTEMPFILE

This option is set when the developer wants to use redirected IO to completely take over responsibility for the low level IO calls of the temp file.

Handle Types

NULL, VTHDOC

Scope

Global (not persistent)

Data Type

VTLPVOID: pCallbackFunc

Function pointer of the redirect IO callback.

Redirect call back function:

typedef VTDWORD (* REDIRECTTEMPFILECALLBACKPROC)
     (HIOFILE *phFile,
     VTVOID *pSpec,
     VTDWORD dwFileFlags);

There is another option to handle the temp directory, SCCOPT_TEMPDIR. Only one of these two can be set by the developer. The SCCOPT_TEMPDIR option will be ignored if SCCOPT_REDIRECTTEMPFILE is set. These files may be safely deleted when the Close function is called.

B.2 Search Export SOAP Options

These options are available to the developer when using the export engine through the Transformation Server API.

This chapter details the Web Services implementation of options in Transformation Server. However, there are links to API-specific information for the C and JAVA client interfaces to the technology within each of the following sections.

B.2.1 How Options Work

An option is defined by an identifier and an associated value. The identifier (hOptions) indicates what particular option is being specified. The option value data must be in a form that conforms to the set of supported data types.

Note that it is not necessarily an error to specify options that are not understood by the export engine, but some transformation engines may require that certain options be specified.

B.2.2 Character Mapping

This section discusses character mapping options.

B.2.2.1 defaultInputCharset

This option is used in cases where Oracle Outside In cannot determine the character set used to encode the text of an input file. When all other means of determining the file's character set are exhausted, Oracle Outside In will assume that an input document is encoded in the character set specified by this option. This is most often used when reading plain-text files, but may also be used when reading HTML or PDF files.

When the "extended test for text" is enabled (see Section B.2.4.2, "extendedTestForText"), this option will still apply to plain-text input files that are not identified as EBCDIC or Unicode.

This option supersedes the fallbackFormat option for selecting the character set assumed for plain-text files. For backwards compatibility, use of deprecated character-set-related values is still currently supported for fallbackFormat, though internally such values will be translated into equivalent values for the defaultInputCharset. As a result, if an application were to set both options, the last such value set for either option will be the value that takes effect.

Data Type

DefaultInputCharSet

Data

The SOAP representation of the character set to use, from the values in defaultInputCharSetEnum.

B.2.2.2 unmappableCharacter

This option selects the character used when a character is not a valid Unicode character, or does not conform to the XML specification for valid characters. This option takes the Unicode value for the replacement character. If you are using the PageML output format, this option is only valid if the textOutOn option is set.

Data Type

xsd:unsignedShort

Data

The Unicode value for the character to use.

Default

  • 0xfffd

Links

  • C Client Implementation: XSD_unsignedShort

  • JAVA Client Implementation: UnsignedShort
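
To make the "valid XML character" condition concrete, the sketch below tests 16-bit code units against the Char production of the XML 1.0 specification and substitutes a replacement value for anything outside it. This only illustrates the kind of check involved and is not Outside In's internal logic. Both function names are hypothetical, and surrogate pairs are deliberately not handled, so every surrogate code unit is treated as invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XML 1.0 Char production, restricted to a single 16-bit code unit:
   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] */
int is_xml10_char16(uint16_t u)
{
    return u == 0x9 || u == 0xA || u == 0xD ||
           (u >= 0x20 && u <= 0xD7FF) ||
           (u >= 0xE000 && u <= 0xFFFD);
}

/* Replace every code unit that fails the test with `replacement`
   (0xfffd is the documented default). Returns the number replaced. */
size_t replace_unmappable(uint16_t *text, size_t len, uint16_t replacement)
{
    assert(text != NULL || len == 0);

    size_t replaced = 0;
    for (size_t i = 0; i < len; ++i) {
        if (!is_xml10_char16(text[i])) {
            text[i] = replacement;
            ++replaced;
        }
    }
    return replaced;
}
```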

B.2.3 Output

This section discusses output options.

B.2.3.1 preferOITRendering

This option is only valid on the Linux (Red Hat and Suse) and Solaris Sparc platforms.

This option is only valid when PageML is the output format.

When this option is set to true, the technology will attempt to use its internal graphics code to render fonts and graphics. When set to false, the technology will render images using the operating system's native graphics subsystem (X11 on UNIX/Linux platforms). Note that this option only works when at least one of the appropriate output solutions is present. For example, if the UNIX $DISPLAY variable does not point to a valid X Server, but the OSGD and/or WV_GD modules required for the Oracle Outside In output solution exist, Oracle Outside In will default to the Oracle Outside In rendering code. The option will fail if neither of these output solutions is present.

It is important for the system to be able to locate useable fonts when this option is set to true. Only TrueType fonts (*.ttf or *.ttc files) are currently supported. To ensure that the system can find them, make sure that the environment variable GDFONTPATH includes one or more paths to these files. If the variable GDFONTPATH can't be found, the current directory is used. If fonts are called for and cannot be found, Search Export will exit with an error. Also note that when copying Windows fonts to a UNIX system, the font extension for the files (*.ttf or *.ttc) must be lowercase, or they will not be detected during the search for available fonts. Oracle does not provide fonts with any Oracle Outside In product.

If preferOITRendering is set in a particular instance of tsagent, it cannot be changed in that agent until the agent is terminated.

Data Type

xsd:boolean

Data

One of the following values:

  • true: Use the technology's internal graphics rendering code to produce bitmap output files whenever possible.

  • false: Use the operating system's native graphics subsystem.

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.4 Input Handling

This section discusses input handling options.

B.2.4.1 fallbackFormat

This option controls how files are handled when their specific application type cannot be determined. This normally affects all plain-text files, because plain-text files are generally identified by process of elimination: when a file isn't identified as having been created by a known application, it is treated as a plain-text file.

A number of values that were formerly allowed for this option have been deprecated. Specifically, the values that selected specific plain-text character sets are no longer to be used. Instead, applications should use the defaultInputCharset option for such functionality.

Data Type

FallbackFormatEnum

Data

One of the following values:

  • fallbackToText: Unidentified file types will be treated as text files.

  • noFallbackFormat: Oracle Outside In will not attempt to process files whose type cannot be identified. This will include text files. When this option is selected, an attempt to process a file of unidentified type will cause Oracle Outside In to return an error value of SCCERR_UNSUPPORTEDFORMAT.

Default

  • ASCII-8

Links

  • C Client Implementation: OIT_FallbackFormatEnum

  • JAVA Client Implementation: FallbackFormatEnum

B.2.4.2 extendedTestForText

This option affects how an input file's internal format (application type) is identified when the file is first opened by the Oracle Outside In technology. When the extended test flag is in effect, and an input file is identified as being either 7-bit ASCII, EBCDIC, or Unicode, the file's contents will be interpreted as such by the export process.

The extended test is optional because it requires extra processing and cannot guarantee complete accuracy (which would require the inspection of every single byte in a file to eliminate false positives.)

Data Type

xsd:boolean

Data

One of the following values:

  • false: This is the default value. When this is set, standard file identification behavior occurs.

  • true: If set, the File Identification code will run an extended test on all files that are not identified.

Default

  • false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.4.3 ignorePassword

This option can disable the password verification of files where the contents can be processed without validation of the password. If this option is not set, the filter should prompt for a password if it handles password-protected files.

As of Release 8.4.0, only the PST and MDB Filters support this option.

Data Type

xsd:boolean

Data

  • true: Ignore validation of the password

  • false: Prompt for the password

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.4.4 oleEmbeddings

Microsoft PowerPoint versions from 1997 through 2003 had the capability to embed OLE documents in PowerPoint files. This option controls which embeddings are processed as native (OLE) documents and which are processed using the alternate graphic.

Note:

The Microsoft PowerPoint application sometimes embeds known Microsoft OLE embeddings (such as Visio or Project) as an "Unknown" type. To process these embeddings, the processAll option is required. Embeddings from post-Office 2003 products, such as Office 2007, also fall into this category.

Data Type

OleEmbeddingsEnum

Data

  • processAll: Process all embeddings in the file.

  • processNone: Process none of the embeddings in the file

  • processStandard: Process embeddings that are known standard embeddings.

Default

processStandard

Links

  • C Client Implementation: OIT_OleEmbeddingsEnum

  • JAVA Client Implementation: OleEmbeddingsEnum

B.2.4.5 parseXMPMetaData

Adobe's Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. This option enables parsing of the XMP data into normal OIT document properties. Enabling this option may cause the loss of some regular data in premium graphics filters (such as Postscript), but won't affect most formats (such as PDF).

Data Type

xsd:boolean

Data

  • true: This setting enables parsing XMP.

  • false: This setting disables parsing XMP.

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.4.6 reorderBIDI

This option controls whether or not the PDF filter will attempt to reorder bidirectional text runs so that the output is in standard logical order as used by the Unicode 2.0 and later specification. This additional processing will result in slower filter performance according to the amount of bidirectional data in the file.

Data Type

xsd:boolean

Data

  • true: The PDF filter uses standard ordering.

  • false: The PDF filter will attempt to reorder bidirectional text runs.

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.4.7 timezone

This option allows the user to define an offset to GMT that will be applied during date formatting, allowing date values to be displayed in a selectable time zone. This option affects the formatting of numbers that have been defined as date values (e.g., most dates in spreadsheet cells). This option will not affect dates that are stored as text.

Note:

This option does not apply to spreadsheet files.

Data Type

xsd:int

Data

Integer parameter from -96 to 96, representing 15-minute offsets from GMT. To query the operating system for the time zone set on the machine, specify the numeric value of 61440 (0xF000 in hexadecimal).

Default

  • 0: GMT time

Links

  • C Client Implementation: XSD_int

  • JAVA Client Implementation: Integer

B.2.4.8 extractXMPMetaData

Adobe's Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. This option enables the XMP feature, which does not interpret the XMP metadata, but passes it straight through without any interpretation.

Data Type

xsd:boolean

Data

  • true

  • false

Default

  • false

B.2.4.9 htmlCondCommentIE5On

This option allows you to display content customized for Internet Explorer 5.

Data Type

xsd:boolean

Default

0: off

Links

C Client Implementation: VTBOOL

JAVA Client Implementation: boolean

B.2.4.10 htmlCondCommentIE6On

This option allows you to display content customized for Internet Explorer 6.

Data Type

xsd:boolean

Default

0: off

Links

C Client Implementation: VTBOOL

JAVA Client Implementation: boolean

B.2.4.11 htmlCondCommentIE7On

This option allows you to display content customized for Internet Explorer 7.

Data Type

xsd:boolean

Default

0: off

Links

C Client Implementation: VTBOOL

JAVA Client Implementation: boolean

B.2.4.12 htmlCondCommentIE8On

This option allows you to display content customized for Internet Explorer 8.

Data Type

xsd:boolean

Default

0: off

Links

C Client Implementation: VTBOOL

JAVA Client Implementation: boolean

B.2.4.13 htmlCondCommentIE9On

This option allows you to display content customized for Internet Explorer 9.

Data Type

xsd:boolean

Default

0: off

Links

C Client Implementation: VTBOOL

JAVA Client Implementation: boolean

B.2.4.14 htmlCondCommentAllOn

This option allows you to display all conditional comments.

Data Type

xsd:boolean

Default

0: off

Links

C Client Implementation: VTBOOL

JAVA Client Implementation: boolean

B.2.5 Compression

This section discusses compression options.

B.2.5.1 allowLZW

This option can disable access to any files using Lempel-Ziv-Welch (LZW) compression, such as .GIF files, .ZIP files or self-extracting archive (.EXE) files containing "shrunk" files. Attempts to read such files when this option is set to false will fail and return the error SCCERR_UNSUPPORTEDCOMPRESSION.

The following is a list of file types affected when this option is disabled:

  • GIF files

  • TIF files using LZW compression

  • PDF files that use internal LZW compression

  • TAZ and TAR archives containing files that are identified as FI_UNIXCOMP

  • ZIP and self-extracting archive (.EXE) files containing "shrunk" files

  • Postscript files using LZW compression

Although this option can disable access to files in ZIP or EXE archives stored using LZW compression, any files in such archives that were stored using any other form of compression will still be accessible.

Data Type

xsd:boolean

Data

  • true: LZW compressed files will be read and written normally.

  • false: LZW compressed files will not be read or written.

Default

true

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6 XML

This section pertains to XML options.

B.2.6.1 allCapsOn

When set, causes capitalized text to be output and appropriately marked. Valid for the SearchML 3.x output formats only.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.2 boldOn

When set, causes bold text to be output and appropriately marked. Not valid for the SearchText and PageML output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.3 cellInfoOn

When set, SearchML will output a <cell> element that will encapsulate data from each non-empty cell in a spreadsheet. (Note: Numeric cells are considered empty unless FI DOCS NO HP BUILDING(3.7) is enabled. ) The <cell> element will have a required attribute start which will give the location of the cell. It will also have an optional attribute end which will be used to indicate a merged cell. Both the start and end attributes will be in the form RowColumn where the Row will be a letter and Column will be a number (for example, <cell start="A1">). Valid only for the SearchML 3.x output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.4 changeNumbertoTextOn

Includes data not originally stored as text in the input document. This can be important content the user would see when viewing the document in the original application (time and owner information in archives, numbers in spreadsheets/databases, etc.). Valid for all output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.5 documentPropertiesOn

When set, document properties are included in the output. Default value is false. Not valid for the PageML output format.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.6 doubleUnderlineOn

When set, causes double-underlined text to be included in the output and appropriately marked. Not valid for the SearchText and PageML output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.7 embeddingsOn

Include embeddings. Not valid for the PageML output format.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.8 errorInfoOn

When this flag is set, SearchML will output an <error> element if an error occurs while processing the main document or any sub-documents. The <error> element has one required attribute, code, which will be a hex value of the error code. The contents of the element will be a string with the description of the error returned from DAGetErrorString. Valid only for the SearchML 3.x output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.9 generateSystemMetaDataOn

When this flag is set, system metadata will be generated. This text is "generated" and is part of the document properties. This information is gathered through system calls and may adversely affect performance. Valid only for the SearchML 3.x output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.10 hiddenOn

Include hidden text in the output. Not valid for the PageML output format.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.11 italicOn

Include italic text in the output. Not valid for the SearchText and PageML output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.12 metadataOnlyOn

Produce only metadata. Not valid for the PageML output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.13 originalCharsetOn

When this option is set, an attribute named oce is added either to <p> or <r> elements as appropriate. The value of the attribute is a hex representation of the character set. The value is defined by our core technology, SO_ANSIUNKNOWN for instance. Possible values for this attribute appear in the vtchars.h header file. Valid for the SearchML 3.x output formats only.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.14 outlineOn

Include outlined text in the output. Valid for the SearchML 3.x output formats only.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.15 produceURLsOn

Produce URL information when it is available. Valid for the SearchML 3.x and SearchHTML output formats only.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.16 revisionAddOn

When set, causes added text to be output and appropriately marked. Valid for all output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.17 revisionDeleteOn

When set, causes deleted text to be output and appropriately marked. Valid for all output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.18 revisionsOn

When set, revised or annotated text will be designated as such. Valid only for the SearchML 3.x output format.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.19 smallCapsOn

When set, causes text in small caps to be output and appropriately marked. Valid for the SearchML output format only.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.20 strikeoutOn

When set, causes strikeout text to be output and appropriately marked. Valid for the SearchML 3.x output formats only.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.21 underlineOn

When set, causes underlined text to be output and appropriately marked. Valid for the SearchML 3.x output formats only.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.22 xmlDefinitionMethod

This option determines whether the output generated by Search Export references a SearchML or PageML schema, references a DTD, or contains no definition reference at all. This option is not valid when SearchText or SearchHTML is the output format.

Data Type

XMLDefinitionMethodEnum

Data

One of the following values:

  • dtd: Document Type Definition (DTD)

  • xsd: XML Schema Definition (XSD)

  • noDefinition: No XML definition reference

Default

noDefinition

Links

  • C Client Implementation: OIT_XmlDefinitionMethodEnum

  • JAVA Client Implementation: XmlDefinitionMethodEnum

B.2.6.23 xmlDefinitionLocation

This option allows the developer to set a particular file as the XML definition reference.

If the xmlDefinitionMethod option is set to xsd or dtd, the value of this option will be used to reference the schema or DTD, respectively.

Data Type

xsd:string

Data

A UTF-8 encoded string specifying the location of an xsd or dtd file. If using the C API, this string must be a null-terminated array of single-byte characters.

Default

None

Links

  • C Client Implementation: XSD_string

  • JAVA Client Implementation: String

B.2.6.24 nullReplacementCharacter

This option specifies a two-byte Unicode character that will be used to replace null characters if null path separators are being used. This option defaults to '/' and is valid for the SearchML 3.x, SearchHTML and SearchText output formats.

Data Type

xsd:unsignedShort

Data

A two-byte Unicode character that will be used to replace null characters if null path separators are being used.

Default

0x002f = "/"

Links

  • C Client Implementation: XSD_unsignedShort

  • JAVA Client Implementation: UnsignedShort

B.2.6.25 printerName

This option is Windows-specific. It is used to set which device context to use to render the pages.

It specifies, as a byte string, the name of the printer whose metrics should be used to calculate pagination information. If unspecified, the default printer will be used. The screen metrics of the system will be used if a printer is not specified and a default printer does not exist. As pagination is affected by the metrics of the device context and installed fonts, PageML XML output can vary between different systems and configurations.

Data Type

xsd:string

Data

A null-terminated single-byte string for the name of the printer which is the device context that should be used to render pages.

Default

  • null: PageML uses the Windows default printer.

Links

  • C Client Implementation: XSD_string

  • JAVA Client Implementation: String

B.2.6.26 paragraphStyleNamesOn

Include paragraph style name references as an attribute of paragraph tags. Valid for the SearchML 3.x output formats only.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.27 includeTextOffsets

The value of this option is a Boolean that if set to true will include offset information in the SearchML output according to the schema. If the option is set to false, no offset information is produced.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.28 paragraphAttributes

This option allows the developer to track paragraph attributes contained in the input document and, optionally, include them in the XML output. All lengths are measured in twips. The values that appear in the SearchML output are the values that apply to the first content encountered in a given paragraph. For example, if the character height changes after the initial content in a paragraph, that change will be ignored. Left and first line indents are measured relative to the left page margin. The right indent is measured relative to the right page margin.

Data Type

ParagraphAttributes

Data

The paragraphAttributes option is a complexType data structure composed of Boolean variables, which may be switched on or off in any combination. The variables are:

  • spacing

  • height

  • leftIndent

  • rightIndent

  • firstIndent

Default

  • 0: All flags set to false.

Links

  • C Client Implementation: OIT_ParagraphAttributes

  • JAVA Client Implementation: ParagraphAttributes

B.2.6.29 unmappedText

This option allows for the production of unmapped text (the original code points from the input document). A new <unmapped> element will be produced to enclose this text. The <unmapped> element will contain base64-encoded text. It will also contain two attributes. "OCE" will contain a hex value representing the character set. "Font" will contain a string value of the original font name. This is necessary for non-standard encodings such as Wingdings or Webdings. This option is only valid in the SearchML 3.2 (and higher) schema.

Data Type

SearchMLUnmappedTextEnum

Data

One of the following values:

  • justUnmappedText: Output just the unmapped text

  • noUnmappedText: Don't output any unmapped text.

  • bothUnmappedText: Output both the original and the unmapped text.

Default

  • noUnmappedText

Links

  • C Client Implementation: OIT_SearchMLUnmappedTextEnum

  • JAVA Client Implementation: SearchMLUnmappedTextEnum
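
A consumer of the <unmapped> element has to undo the base64 encoding to recover the original code points. The following is a minimal sketch of a standard base64 decoder, assuming the element content has already been extracted into a NUL-terminated string. base64_decode is a hypothetical helper, not an Outside In function, and interpreting the decoded bytes still requires the character set named by the OCE attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one base64 alphabet character to its 6-bit value, or -1. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode standard base64 from the NUL-terminated string `in` into `out`.
   Whitespace is skipped and decoding stops at the first '=' padding
   character. Returns the number of bytes written, or -1 on an invalid
   character or when `out_cap` is too small. */
long base64_decode(const char *in, unsigned char *out, size_t out_cap)
{
    assert(in != NULL && (out != NULL || out_cap == 0));

    uint32_t acc = 0;   /* bit accumulator */
    int bits = 0;       /* number of not-yet-emitted bits held in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; ++in) {
        if (*in == ' ' || *in == '\n' || *in == '\r' || *in == '\t')
            continue;

        int v = b64_value(*in);
        if (v < 0)
            return -1;

        acc = (acc << 6) | (uint32_t)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n >= out_cap)
                return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return (long)n;
}
```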

B.2.6.30 suppressArchiveSubDocsOn

Subdocuments in archives are not processed. Not valid for the PageML output format.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.31 suppressAttachmentsOn

Attachments are not processed. Not valid for the PageML output format.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.32 textOutOn

This option is valid only for the PageML output format.

When set to true, include text in the PageML output.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.6.33 xmlDeclarationOff

Exclude the XML declaration. Not valid for the SearchText and SearchHTML output formats.

Data Type

xsd:boolean

Default

false

Links

  • C Client Implementation: XSD_boolean

  • JAVA Client Implementation: Boolean

B.2.7 File System

This section applies to file system options.

B.2.7.1 fileAccess

This option supplies information to OIT when information is required to open an input file. This information may be the password of the file or a support file location.

Further information about how Transformation Server implements this option will be forthcoming.

B.2.7.2 readBufferSize

Used to define the number of bytes that will be read from disk into memory at any given time. Once the buffer has data, further file reads will proceed within the buffer until the end of the buffer is reached, at which point the buffer will again be filled from the disk. This can lead to performance improvements in many file formats, regardless of the size of the document.

Data Type

xsd:unsignedInt

Data

The size of the buffer in kilobytes.

Default

2

Links

  • C Client Implementation: XSD_unsignedInt

  • JAVA Client Implementation: UnsignedInt

B.2.7.3 memoryMappedInputSize

Used to define the maximum size a document can be and still use a memory-mapped I/O model. In this situation, the entire file is read from disk into memory and all further I/O is performed on the data in memory. This can lead to significantly improved performance, but note that either the entire file can be read into memory, or it cannot. If both this option and readBufferSize are set, then if the file is smaller than memoryMappedInputSize, the entire file will be read into memory; if not, it will be read in blocks of the size defined by readBufferSize.

Data Type

xsd:unsignedInt

Data

The size of the buffer in kilobytes.

Default

8192

Links

  • C Client Implementation: XSD_unsignedInt

  • JAVA Client Implementation: UnsignedInt

B.2.7.4 tempBufferSize

The maximum size that a temporary file can occupy in memory before being written to disk as a physical file. Storing temporary files in memory can boost performance on archives and on files that have embedded objects or attachments. If set to 0, all temporary files will be written to disk.

Data Type

xsd:unsignedInt

Data

The size of the buffer in kilobytes.

Default

2048

Links

  • C Client Implementation: XSD_unsignedInt

  • JAVA Client Implementation: UnsignedInt