11 Search Export Java Classes

This chapter describes Search Export Java classes.

The following classes are covered:

11.1 ArchiveNode Class

ArchiveNode provides information about an archive node. This is a read-only class where the technology fills in all the values.

Namespace

com.oracle.outsidein

Accessors

boolean isFolder() - A value of true indicates that the archive node is a folder.
int getFileSize() - File size of the archive node
java.util.Date getTime() - Time the archive node was created
int getNodeNum() - Serial number of the archive node in the archive
String getNodeName() - The name of the archive node

11.2 Exporter Interface

This section describes the properties and methods of Exporter.

All of Outside In's Exporter functionality can be accessed through the Exporter Interface. The object returned by the OutsideIn class is an implementation of this interface. This interface derives from the Document Interface, which in turn is derived from the OptionsCache Interface.

Namespace

com.oracle.outsidein

Methods

getExportStatus
```
ExportStatus getExportStatus()
```
This function is used to determine whether there were conversion problems during an export. The ExportStatus object returned may have information about sub-document failures and about areas of a conversion that may not have high fidelity with the original document. When applicable, the number of pages in the output is also provided.
newSubDocumentExporter
```
Exporter newSubDocumentExporter(
      int SubDocId,
      SubDocumentIdentifierTypeValue idType
) throws OutsideInException
```
Create a new Exporter for a subdocument.

SubDocId: Identifier of the subdocument

idType: Type of subdocument

SubDocumentIdentifierTypeValue: This is an enumeration for the type of subdocument being opened.
- XMLEXPORTLOCATOR: Subdocument to be opened is based on output of XML Export. (SubDocId is the value of the object_id attribute of a locator element.)
- ATTACHMENTLOCATOR: Subdocument to be opened is based on the locator value provided by one of the Export SDKs.
- EMAILATTACHMENTINDEX: Subdocument to be opened is based on the index of the attachment from an email message. (SubDocId is the zero-based index of the attachment from an email message file. The first attachment presented by OutsideIn has the index value 0, the second has the index value 1, etc.)
Returns: A new Exporter object for the subdocument
newSubObjectExporter
```
Exporter newSubObjectExporter(
      SubObjectTypeValue objType,
      int data1,
      int data2,
      int data3,
      int data4
) throws OutsideInException
```
Create a new Exporter for a subobject.

objType: Type of subobject

data1: Data identifying the subobject from SearchML

data2: Data identifying the subobject from SearchML

data3: Data identifying the subobject from SearchML

data4: Data identifying the subobject from SearchML

Returns: A new Exporter object for the subobject

SubObjectTypeValue: An enumeration to describe the type of SubObject to open.
- LinkedObject
- EmbeddedObject
- CompressedFile
- Attachment
newArchiveNodeExporter
```
Exporter newArchiveNodeExporter(
      int dwRecordNum
) throws OutsideInException
```
Create a new Exporter for an archive node. You may get the number of nodes in an archive using getArchiveNodeCount. The nodes are numbered from 0 to getArchiveNodeCount - 1.

dwRecordNum: The number of the record to retrieve information about. The first node is node 0 and the total number of nodes may be obtained from getArchiveNodeCount.

Returns: A new Exporter object for the archive node
newArchiveNodeExporter with Search Export Data
```
Exporter newArchiveNodeExporter(
      int flags,
      int params1,
      int params2
) throws OutsideInException
```
Create a new Exporter for an archive node. To use this function, you must first process the archive with Search Export and save the Node data for later use in this function.

flags: Special flags value from Search Export

params1: Data1 from Search Export

params2: Data2 from Search Export

Returns: A new Exporter object for the archive node
export
```
void export() throws OutsideInException
```
Perform the conversion and close the export process. Starting with Release 8.5.3, the source document is also closed by default; see the note below.
```
void export(boolean bLeaveSourceOpen) throws OutsideInException
```
Perform the conversion and keep the source document open or close it based on the value of bLeaveSourceOpen.

bLeaveSourceOpen: If set to true, keeps the source document open for next export process.

Note:
Before Release 8.5.3, calling export() with no parameters would leave the source document open. The default behavior starting with Release 8.5.3 is to close the document after exporting the file. If you would like to keep the file open for other conversions, use export(boolean) with bLeaveSourceOpen set to true.

setDestinationFile
```
OptionsCache setDestinationFile(
      String filename
) throws OutsideInException
```
Set the location of the destination file

filename: Full path to the destination file

Returns: The updated options object
setExportTimeout
```
OptionsCache setExportTimeout(int millisecondsTimeout)
```
This method sets the time that the export process should wait for a response from the Outside In export engine to complete the export of a document, setting an upper limit on the time that will elapse during a call to export(). If the specified length of time is reached before the export has completed, the export operation will be terminated and an OutsideInException will be thrown. If this option is not set, the default timeout is 5 minutes.
newLocalExporter
```
static Exporter newLocalExporter(Exporter source)
```
This method creates and returns an instance of an Exporter object based on the source Exporter. All the options of source are copied to the new Exporter. The source and destination file information will not be copied.

11.2.1 Document Interface

All of the Outside In document-related methods are accessed through the Document Interface.

Namespace

com.oracle.outsidein

Methods

close
```
void close()
```
Closes the currently open document.
getArchiveNodeCount
```
int getArchiveNodeCount() throws OutsideInException
```
Retrieves the number of nodes in an archive file.

Returns the number of nodes in the archive file or 0 if the file is not an archive file.
getFileId
```
FileFormat getFileId(FileIdInfoFlagValue dwFlags) throws OutsideInException
```
Gets the format of the file based on the technology's content-based file identification process.

dwFlags: Option to retrieve the file identification pre-Extended or post-Extended Test

Returns the format identifier of the file.
getObjectInfo
```
ObjectInfo getObjectInfo() throws OutsideInException
```
Retrieves the information about an embedded object.

Returns: An ObjectInfo object with the information about the embedded object
getArchiveNode
```
ArchiveNode getArchiveNode(int nNodeNum) throws OutsideInException
```
Retrieves information about a record in an archive file. You may get the number of nodes in an archive using getArchiveNodeCount.

nNodeNum: The number of the record to retrieve information about. The first node is node 0.

Returns: An ArchiveNode object with the information about the record
saveArchiveNode
```
void saveArchiveNode(
      int nNodeNum,
      File file) throws OutsideInException
```
Extracts a record in an archive file to disk.

nNodeNum: The number of the record to extract. The first node is node 0.

file: The destination file to which the file will be extracted.
saveArchiveNode with Search Export Flags
```
void saveArchiveNode(
      int flags,
      int params1,
      int params2,
      File file) throws OutsideInException
```
Extracts a record in an archive file to disk without reading the data for all nodes in the archive in sequential order. To use this function, you must first process the archive with Search Export and save the Node data for later use in this function. setOpenForNonSequentialAccess must be set to true to use this function.

flags: Special flags value from Search Export

params1: Data1 from Search Export

params2: Data2 from Search Export

file: The destination file to which the file will be extracted
setSourceFile
```
OptionsCache setSourceFile( String filename) throws OutsideInException
```
Set the source document.

filename: Full path of the source document

Returns: The options cache object associated with this document

11.2.2 SeekableByteChannel6 Interface

Enables API users to handle I/O for the source and destination documents. Implement this interface to control I/O operations such as reading, writing, and seeking. This interface mimics the java.nio.channels.SeekableByteChannel interface which is only available in Java 7 and later. Note that SeekableByteChannel6 will be removed in favor of java.nio.channels.SeekableByteChannel if support for Java 6 is dropped in a future release of the Outside In Java API. Until then, this interface must be used if redirected I/O is required.

Namespace

com.oracle.outsidein

Methods

Get position
```
long position()
```
Returns this channel's position.

Set position

SeekableByteChannel6 position(long newPosition)

Sets this channel's position.

read
```
int read(java.nio.ByteBuffer dst)
```
Reads a sequence of bytes from this channel into the given buffer. Bytes are read starting at this channel's current position, and then the position is updated with the number of bytes actually read.
size
```
long size()
```
Returns the current size of the entity to which this channel is connected.
truncate
```
SeekableByteChannel6 truncate(long size)
```
Truncates the entity, to which this channel is connected, to the given size. Never invoked by Outside In and may be implemented by just returning this.
write
```
int write(java.nio.ByteBuffer src)
```
Writes a sequence of bytes to this channel from the given buffer. Bytes are written starting at this channel's current position. The entity to which the channel is connected is grown, if necessary, to accommodate the written bytes, and then the position is updated with the number of bytes actually written.
close
```
void close()
```
Closes this channel. If this channel is already closed then invoking this method has no effect.
isOpen
```
boolean isOpen()
```
Tells whether or not this channel is open.
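
As a rough illustration of the contract described above, the following sketch implements an in-memory channel against java.nio.channels.SeekableByteChannel, the standard interface that SeekableByteChannel6 is described as mimicking. The class name InMemoryChannel and its toByteArray helper are hypothetical and not part of the Outside In API. A real redirected I/O class would implement SeekableByteChannel6 instead, and its declared exceptions may differ.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SeekableByteChannel;
import java.util.Arrays;

/** Hypothetical in-memory channel; a sketch, not part of the Outside In API. */
final class InMemoryChannel implements SeekableByteChannel {
    private byte[] data;          // backing storage, grown on demand
    private int size;             // number of valid bytes in data
    private int position;         // current read/write position
    private boolean open = true;

    InMemoryChannel(byte[] initial) {
        this.data = Arrays.copyOf(initial, Math.max(initial.length, 16));
        this.size = initial.length;
    }

    private void ensureOpen() throws ClosedChannelException {
        if (!open) throw new ClosedChannelException();
    }

    @Override public long position() throws IOException {
        ensureOpen();
        return position;
    }

    @Override public SeekableByteChannel position(long newPosition) throws IOException {
        ensureOpen();
        if (newPosition < 0 || newPosition > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("position out of range: " + newPosition);
        }
        position = (int) newPosition;
        return this;
    }

    @Override public int read(ByteBuffer dst) throws IOException {
        ensureOpen();
        if (position >= size) return -1;                    // end of data
        int n = Math.min(dst.remaining(), size - position);
        dst.put(data, position, n);
        position += n;                                      // advance by bytes actually read
        return n;
    }

    @Override public int write(ByteBuffer src) throws IOException {
        ensureOpen();
        int n = src.remaining();
        int end = position + n;
        if (end > data.length) {                            // grow the entity if necessary
            data = Arrays.copyOf(data, Math.max(end, data.length * 2));
        }
        src.get(data, position, n);
        position = end;                                     // advance by bytes actually written
        size = Math.max(size, end);
        return n;
    }

    @Override public long size() throws IOException {
        ensureOpen();
        return size;
    }

    // The text above notes that truncate may simply return this.
    @Override public SeekableByteChannel truncate(long newSize) throws IOException {
        ensureOpen();
        return this;
    }

    @Override public boolean isOpen() { return open; }

    @Override public void close() { open = false; }         // closing twice has no effect

    /** Returns a copy of the bytes written so far. */
    byte[] toByteArray() { return Arrays.copyOf(data, size); }
}
```

Keeping the position and size in plain int fields limits this sketch to data under 2 GB; a file-backed implementation would not have that restriction.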

11.2.3 OptionsCache Class

This section describes the OptionsCache class.

The options that configure the way outputs are generated are accessed through the OptionsCache class.

All of the options described in the following subsections are available through this interface. Other methods in this interface are described below.

Namespace

com.oracle.outsidein.options

Methods

OptionsCache setSourceFile(File file) throws OutsideInException

Sets the source document to be opened.

file: Full path to source file
OptionsCache setSourceFile(SeekableByteChannel6 redirect) throws OutsideInException

Sets an object that implements SeekableByteChannel6 to be used as the source document. Exporting a file using this method may have issues with files that require the original name of the file (examples: if the extension of the file is needed for identification purposes or if the name of a secondary file depends on the name/path of the original source file).

redirect: Object implementing SeekableByteChannel6 to be used to read the source data containing the input file
OptionsCache setSourceFile(SeekableByteChannel6 redirect, String filename) throws OutsideInException

Sets an object that implements SeekableByteChannel6 to be used as the source document and provides information about the filename.

redirect: Object implementing SeekableByteChannel6 to be used to read the source data containing the input file

filename: A fully qualified path or file name that may be used to derive the extension of the file or name of a secondary file that is dependent on the name/path of the source file
OptionsCache addSourceFile(File file) throws OutsideInException

Sets the next source document file to be exported in sequence. This allows multiple documents to be exported to the same output destination.

file: Full path to source file
OptionsCache addSourceFile(SeekableByteChannel6 redirect)

Set a redirected channel as the next source document to be exported to the original destination file. This method has the same limitations as the similar setSourceFile(SeekableByteChannel6 redirect) method.
OptionsCache addSourceFile(SeekableByteChannel6 redirect, String Filename)

Set a redirected channel as the next source document to be exported to the original destination file. The file name provided is used as in the method setSourceFile(SeekableByteChannel6 redirect, String Filename)
OptionsCache setSourceFormat(FileFormat fileId)

Sets the source format to process the input file as, ignoring the algorithmic detection of the file type.

fileId: the format to treat the input document as.
OptionsCache setDestinationFile(File file) throws OutsideInException

Sets the location of the destination file.

file: Full path to the destination file
OptionsCache setDestinationFile(SeekableByteChannel6 redirect) throws OutsideInException

Sets an object that implements SeekableByteChannel6 to be used as the destination document. An Exporter.export() operation will write the output data to the provided SeekableByteChannel6 object.

redirect: Object implementing SeekableByteChannel6 to be used as the destination document written during an Exporter.export() operation
OptionsCache setDestinationFormat(FileFormat fileId)

Sets the destination file format to which the file should be converted.

fileId: the format to convert the input document(s) to.
OptionsCache setCallbackHandler(Callback callback)

Sets the object to use to handle callbacks.

callback: the callback handling object.
OptionsCache setPasswordsList(List<String> Passwords)

Provides a list of strings to use as passwords for encrypted documents. The technology will cycle through this list until a successful password is found or the list is exhausted.

Passwords: List of strings to be used as passwords.
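
The cycling behavior described for setPasswordsList can be pictured with the following sketch. It is not the library's implementation: PasswordCycleSketch, firstWorking, and the opens predicate are hypothetical names standing in for "try to open the document with this password".

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical illustration of trying passwords in list order. */
final class PasswordCycleSketch {
    /**
     * Returns the first password accepted by the predicate,
     * or an empty Optional when the list is exhausted.
     */
    static Optional<String> firstWorking(List<String> passwords, Predicate<String> opens) {
        for (String candidate : passwords) {
            if (opens.test(candidate)) {
                return Optional.of(candidate);   // stop at the first success
            }
        }
        return Optional.empty();                 // list exhausted without success
    }
}
```

Because candidates are tried in order, placing the most likely passwords first keeps the number of attempts low.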
OptionsCache setLotusNotesId(String NotesIdFile)

Sets the Lotus Notes ID file location.

NotesIdFile: Full path to the Notes ID file.
OptionsCache setOpenForNonSequentialAccess(boolean bOpenForNonSequentialAccess)

Setting this option causes the technology to open archive files in a special mode that is only usable for non-sequential access of nodes.

bOpenForNonSequentialAccess: If set to true, the archive file is opened in the special access mode. Note that turning this flag on for a non-archive file causes an exception to be thrown at export time.

11.2.3.1 DefaultInputCharacterSet

OIT Option ID: SCCOPT_DEFAULTINPUTCHARSET

This option is used in cases where Outside In cannot determine the character set used to encode the text of an input file. When all other means of determining the file's character set are exhausted, Outside In will assume that an input document is encoded in the character set specified by this option. This is most often used when reading plain-text files, but may also be used when reading HTML or PDF files.

Data Type

DefaultInputCharacterSetValue

DefaultInputCharacterSetValue Enumeration

DefaultInputCharacterSetValue can be one of the following enumerations:

SYSTEMDEFAULT

UNICODE

BIGENDIANUNICODE

LITTLEENDIANUNICODE

UTF8

UTF7

ASCII

UNIXJAPANESE

UNIXJAPANESEEUC

UNIXCHINESETRAD1

UNIXCHINESEEUCTRAD1

UNIXCHINESETRAD2

UNIXCHINESEEUCTRAD2

UNIXKOREAN

UNIXCHINESESIMPLE

EBCDIC37

EBCDIC273

EBCDIC274

EBCDIC277

EBCDIC278

EBCDIC280

EBCDIC282

EBCDIC284

EBCDIC285

EBCDIC297

EBCDIC500

EBCDIC1026

DOS437

DOS737

DOS850

DOS852

DOS855

DOS857

DOS860

DOS861

DOS863

DOS865

DOS866

DOS869

WINDOWS874

WINDOWS932

WINDOWS936

WINDOWS949

WINDOWS950

WINDOWS1250

WINDOWS1251

WINDOWS1252

WINDOWS1253

WINDOWS1254

WINDOWS1255

WINDOWS1256

WINDOWS1257

ISO8859_1

ISO8859_2

ISO8859_3

ISO8859_4

ISO8859_5

ISO8859_6

ISO8859_7

ISO8859_8

ISO8859_9

MACROMAN

MACCROATIAN

MACROMANIAN

MACTURKISH

MACICELANDIC

MACCYRILLIC

MACGREEK

MACCE

MACHEBREW

MACARABIC

MACJAPANESE

HPROMAN8

BIDIOLDCODE

BIDIPC8

BIDIE0

RUSSIANKOI8

JAPANESEX0201

Default

SYSTEMDEFAULT

11.2.3.2 DocumentMemoryMode

OIT Option ID: SCCOPT_DOCUMENTMEMORYMODE

This option determines the maximum amount of memory that the chunker may use to store the document's data, from 4 MB to 1 GB. The more memory the chunker has available to it, the less often it needs to re-read data from the document.

Data

SMALLEST: 1 - 4MB
SMALL: 2 - 16MB
MEDIUM: 3 - 64MB
LARGE: 4 - 256MB
LARGEST: 5 - 1 GB

Default

LARGE: 4 - 256MB

11.2.3.3 DropPDFHyphens

This option controls whether or not the PDF filter will drop hyphens at the end of a line. Since most PDF-generating tools create them as generic dashes, it's impossible for Outside In to know if the hyphen is a syllable hyphen or part of a hyphenated word. When this option is set to true, all hyphens at the end of lines will be dropped from the extracted text.

Data Type

boolean

Default

false

11.2.3.4 EnableAllSubObjects

Oracle Outside In has an internal flag that is used to optimize several of the input filters for searching. One of the side effects of this optimization is that many embedded bitmaps, including Progressive JPEG, aren't output by the filter. This option can override this internal optimization.

Data Type

boolean

Default

false

11.2.3.5 EnableAlphaBlending

This option allows the user to enable alpha-channel blending (transparency) in rendering vector images. This is primarily useful to improve fidelity when rendering with a slower graphics engine, such as X-Windows over a network when performance is not an issue.

Data Type

boolean

Default

false

11.2.3.6 ExportPerformanceMode

This option allows for skipping the processing of some or all style information when possible. This should result in better performance, but certain output will no longer be available.

Data Type

ExportPerformanceModeValue

ExportPerformanceModeValue Enumeration

ExportPerformanceModeValue can be one of the following enumerations:

NORMAL - Process the style information normally.
TEXTANDFONTS - Process only the font and character set information within a style.
TEXTONLY - Skip processing all style information.

Default

NORMAL

11.2.3.7 ExtractXMPMetadata

OIT Option ID: SCCOPT_EXTRACTXMPMETADATA

Adobe's Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. This option enables the XMP feature, which passes the XMP metadata straight through without interpreting it. This option will be ignored if the ParseXMPMetadata option is enabled.

Data Type

boolean

Data

true: This setting enables XMP extraction.
false: This setting disables XMP extraction.

Default

false

11.2.3.8 FallbackFormat

This option controls how files are handled when their specific application type cannot be determined. This normally affects all plain-text files, because plain-text files are generally identified by process of elimination: when a file isn't identified as having been created by a known application, it is treated as a plain-text file. It is recommended that NONE be set to prevent the conversion from exporting unidentified binary files as though they were text, which could generate many pages of "garbage" output.

Data Type

FallbackFormatValue

FallbackFormatValue Enumeration

TEXT: Unidentified file types will be treated as text files.
NONE: Outside In will not attempt to process files whose type cannot be identified.

Default

TEXT

11.2.3.9 IECondCommentMode

OIT Option ID: SCCOPT_HTML_COND_COMMENT_MODE

Some HTML input files may include "conditional comments", which are HTML comments that mark areas of HTML to be interpreted in specific versions of Internet Explorer, while being ignored by other browsers. This option allows you to control how the content contained within conditional comments will be interpreted by Outside In's HTML parsing code.

Data

NONE: Don't output any conditional comment
IE5: Include the IE5 comments
IE6: Include the IE6 comments
IE7: Include the IE7 comments
IE8: Include the IE8 comments
IE9: Include the IE9 comments
ALL: Include all conditional comments

11.2.3.10 IgnorePassword

OIT Option ID: SCCOPT_IGNORE_PASSWORD

This option can disable the password verification of files where the contents can be processed without validation of the password. If this option is not set, the filter should prompt for a password if it handles password-protected files.

Data Type

boolean

11.2.3.11 IncludeCharacterAttributes

This option allows the developer to track character attributes contained in the input document and choose which are output to tags in the XML document produced.

Data Type

EnumSet<IncludeCharacterAttributeValues>

Data

An IncludeCharacterAttributeValues object with the character attributes to be included

IncludeCharacterAttributeValues Enumeration

The following set of flags:

REVISIONDELETE

BOLD

ITALIC

UNDERLINE

DOUBLEUNDERLINE

OUTLINE

HIDDEN

STRIKEOUT

SMALLCAPS

ALLCAPS

OCE

REVISIONADD

Default

EnumSet.noneOf(IncludeCharacterAttributeValues.class)
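
For readers less familiar with java.util.EnumSet, the sketch below shows how a flag set of this kind is typically built. The nested Attr enum is a hypothetical stand-in that mirrors a few of the flag names listed above; real code would use IncludeCharacterAttributeValues from the Outside In library instead.

```java
import java.util.EnumSet;

/** Hypothetical illustration of building EnumSet-based flag values. */
final class CharacterAttributeSetSketch {
    /** Stand-in enum mirroring a few documented flag names. */
    enum Attr { BOLD, ITALIC, UNDERLINE, HIDDEN }

    /** The empty set, analogous to the documented default. */
    static EnumSet<Attr> none() {
        return EnumSet.noneOf(Attr.class);
    }

    /** A set that selects bold and italic tracking only. */
    static EnumSet<Attr> boldItalic() {
        return EnumSet.of(Attr.BOLD, Attr.ITALIC);
    }
}
```

An EnumSet plays the role that ORed bit flags play in C APIs, while keeping the values type-safe.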

11.2.3.12 IncludeSearchMLOffset

When this boolean option is set to true, offset information is included in the SearchML output according to the schema. When it is set to false, no offset information is produced.

Data Type

boolean

Default

false

11.2.3.13 InternalRendering

OIT Option ID: SCCOPT_RENDERING_PREFER_OIT

This option is valid on 32- and 64-bit Linux, 32-bit SunOS SPARC, 32-bit HP-UX RISC, and 32-bit AIX PPC.

When this option is set to TRUE, the technology will attempt to use its internal graphics code to render fonts and graphics. When set to FALSE, the technology will render images using the operating system's native graphics subsystem (X11 on UNIX/Linux platforms). This requires that there be an X11 display and a valid DISPLAY variable, regardless of the type of input document.

It is important for the system to be able to locate usable fonts when this option is set to TRUE. Only TrueType fonts (*.ttf or *.ttc files) are currently supported. To ensure that the system can find them, make sure that the environment variable GDFONTPATH includes one or more paths to these files. If the variable GDFONTPATH can't be found, the current directory is used. If fonts are called for and cannot be found, Image Export will exit with an error. Oracle does not provide fonts with any Outside In product.

Note:

The maximum total path size for paths included in GDFONTPATH is 256 characters. Paths longer than this will be truncated and will result in fonts not being discovered.

Data Type

boolean

11.2.3.14 ISODateTimes

OIT Option ID: SCCOPT_FORMATFLAGS

When this flag is set, all Date and Time values are converted to the ISO 8601 standard. This conversion can only be performed using dates that are stored as numeric data within the original file.

Data Type

boolean

Default

false
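
To show what an ISO 8601 date-time looks like, the sketch below formats a numeric timestamp with the standard java.time API. It only illustrates the standard itself: the exact string layout that Outside In emits is not specified in this section, and Iso8601Example and toIso8601 are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical illustration of ISO 8601 formatting for numeric dates. */
final class Iso8601Example {
    /** Formats seconds since the Unix epoch as an ISO 8601 date-time in UTC. */
    static String toIso8601(long epochSeconds) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME
                .format(Instant.ofEpochSecond(epochSeconds).atOffset(ZoneOffset.UTC));
    }
}
```

For example, toIso8601(1000000000L) yields "2001-09-09T01:46:40Z". A date stored only as free text has no numeric value to feed into such a conversion, which is consistent with the limitation stated above.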

11.2.3.15 LotusNotesDirectory

OIT Option ID: SCCOPT_LOTUSNOTESDIRECTORY

This option allows the developer to specify the location of a Lotus Notes or Domino installation for use by the NSF filter. A valid Lotus installation directory must contain the file nnotes.dll.

Type (Common): String

Data

A path to the Lotus Notes directory.

Default

If this option isn't set, then OIT will first attempt to load the Lotus library according to the operating system's PATH environment variable, and then attempt to find and load the Lotus library as indicated in HKEY_CLASSES_ROOT\Notes.Link.

11.2.3.16 NullReplacementCharacter

This option specifies a two-byte Unicode character that will be used to replace null characters if null path separators are being used. This option defaults to '/' and is valid for the SearchML 3.x, SearchHTML and SearchText output formats.

Data Type

int

Default

0x002f = "/"

11.2.3.17 PageMLFlags

This option allows the developer to set flags that enable options unique to the PageML schema.

Data Type

EnumSet<PageMLFlagValues>

PageMLFlagValues Enumeration

The following set of flags:

NOXMLDECLARATION: Do not generate xml declaration
INCLUDETEXT: Include text in PageML output

Default

EnumSet.noneOf(PageMLFlagValues.class)

11.2.3.18 ParseXMPMetadata

OIT Option ID: SCCOPT_PARSEXMPMETADATA

Adobe's Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. This option enables parsing of the XMP data into normal OIT document properties. Enabling this option may cause the loss of some regular data in premium graphics filters (such as Postscript), but won't affect most formats (such as PDF).

Data Type

boolean

Data

true: This setting enables parsing XMP.
false: This setting disables parsing XMP.

Default

false

11.2.3.19 PDFInputMaxEmbeddedObjects

This option allows the user to limit the number of embedded objects that are produced in a PDF file.

Data Type

long

Data

The maximum number of embedded objects to produce in PDF output. Setting this to 0 produces all embedded objects in the input document.

Default

0 – produce all objects.

11.2.3.20 PDFInputMaxVectorPaths

This option allows the user to limit the number of vector paths that are produced in a PDF file.

Data Type

long

Data

The maximum number of paths to produce in PDF output. Setting this to 0 produces all vector objects in the input document.

Default

0 – produce all vector objects.

11.2.3.21 PDFReorderBiDi

OIT Option ID: SCCOPT_PDF_FILTER_REORDER_BIDI

This option controls whether or not the PDF filter will attempt to reorder bidirectional text runs so that the output is in standard logical order as used by the Unicode 2.0 and later specification. This additional processing will result in slower filter performance according to the amount of bidirectional data in the file.

PDFReorderBiDiValue Enumeration

This enumeration defines the type of bidirectional text reordering the PDF filter should perform.

STANDARDBIDI: Do not attempt to reorder bidirectional text runs.
REORDEREDBIDI: Attempt to reorder bidirectional text runs.

11.2.3.22 PDFWordSpacingFactor

This option controls the spacing threshold in PDF input documents. Most PDF documents do not have an explicit character denoting a word break. The PDF filter calculates the distance between two characters to determine if they are part of the same word or if there should be a word break inserted. The space between characters is compared to the length of the space character in the current font multiplied by this fraction. If the space between characters is larger, then a word break character is inserted into the text stream. Otherwise, the characters are considered to be part of the same word and no word break is inserted.

Data Type

float

Data

A value representing the fraction of the width of the space character used to trigger a word break. Valid values are positive values less than 2.

Default

0.85
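
The comparison described above can be sketched as follows. This is an illustration of the stated rule only, not the PDF filter's code; WordBreakSketch, isWordBreak, and join are hypothetical names, and the gap and width values are assumed to be in the same units.

```java
/** Hypothetical illustration of the word-break threshold rule. */
final class WordBreakSketch {
    /** A break is inserted when the gap exceeds spaceWidth multiplied by the factor. */
    static boolean isWordBreak(double gap, double spaceWidth, double factor) {
        return gap > spaceWidth * factor;
    }

    /**
     * Joins glyphs into text, inserting a space before any glyph whose
     * preceding gap triggers a word break. gapBefore[0] is ignored.
     */
    static String join(String[] glyphs, double[] gapBefore, double spaceWidth, double factor) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < glyphs.length; i++) {
            if (i > 0 && isWordBreak(gapBefore[i], spaceWidth, factor)) {
                text.append(' ');
            }
            text.append(glyphs[i]);
        }
        return text.toString();
    }
}
```

With a space width of 3.0 and the default factor of 0.85, the threshold is 2.55, so a gap of 3.0 produces a break while gaps of 0.5 do not. Raising the factor makes breaks less frequent; lowering it makes them more frequent.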

11.2.3.23 PerformExtendedFI

OIT Option ID: SCCOPT_FIFLAGS

This option affects how an input file's internal format (application type) is identified when the file is first opened by the Outside In technology. When the extended test flag is in effect, and an input file is identified as being either 7-bit ASCII, EBCDIC, or Unicode, the file's contents will be interpreted as such by the export process.

The extended test is optional because it requires extra processing and cannot guarantee complete accuracy (which would require the inspection of every single byte in a file to eliminate false positives).

Data Type

boolean

Data

One of the following values:

false: When this is set, standard file identification behavior occurs.
true: If set, the File Identification code will run an extended test on all files that are not identified.

Default

true

11.2.3.24 PrinterName

This option is Windows-specific. It is used to set which device context to use to render the pages.

It specifies, as a byte string, the name of the printer whose metrics should be used to calculate pagination information. If unspecified, the default printer will be used. The screen metrics of the system will be used if a printer is not specified and a default printer does not exist. As pagination is affected by the metrics of the device context and installed fonts, PageML XML output can vary between different systems and configurations.

Data Type

String

Default

None - PageML uses the Windows default printer

11.2.3.25 ProcessOLEEmbeddingMode

OIT Option ID: SCCOPT_PROCESS_OLE_EMBEDDINGS

Microsoft PowerPoint versions from 1997 through 2003 had the capability to embed OLE documents in PowerPoint files. This option controls which embeddings are to be processed as native (OLE) documents and which are processed using the alternate graphic.

Note:

The Microsoft PowerPoint application sometimes embeds known Microsoft OLE embeddings (such as Visio or Project) as an "Unknown" type. To process these embeddings, the ProcessOLEEmbedAll option is required. Embeddings from post-Office 2003 products, such as Office 2007, also fall into this category.

Data

STANDARD: Process embeddings that are known standard embeddings. These include Office 2003 versions of Word, Excel, Visio, etc.
ALL: Process all embeddings in the file.
NONE: Process none of the embeddings in the file.

Default

STANDARD

11.2.3.26 RenderEmbeddedFonts

This option allows you to disable the use of embedded fonts in PDF input files. If the option is set to true, the embedded fonts in the PDF input are used to render text; if the option is set to false, the embedded fonts are not used and the fallback is to use fonts available to Outside In to render text.

Data Type

boolean

Default

true

11.2.3.27 SearchMLFlags

This option allows the developer to set flags that enable options unique to the following SearchML formats: SearchML 3.x, SearchHTML and SearchText.

Data Type

EnumSet<SearchMLFlagValues>

SearchMLFlagValues Enumeration

The following set of flags:

SHOWPARAGRAPHSTYLENAMES: Add paragraph style name reference to p tags
PROCESSEMBEDDINGS: Process embeddings
NOXMLDECLARATION: Don't generate xml declaration
SUPPRESSPROPERTIES: Suppress processing of document properties in all indexing related products.
GENERATETEXT: Produce generated text in all indexing related products.
SUPPRESSATTACHMENTS: Suppress processing of attachments.
SUPPRESSARCHIVESUBDOCS: Suppress processing of sub-documents in archives
METADATAONLY: Produce only metadata.
ANNOTATIONS: Annotation text should be noted as such
PRODUCEURLS: Produce URLs for hyperlinks
PRODUCEOBJECTINFO: Produce information allowing for reference of sub-document objects.
ENABLEERRORINFO: Output sub-document error information.
PRODUCECELLINFO: Output spreadsheet row and column information.
GENERATESYSTEMMETADATA: Generate system metadata
SKIPSTYLES: Skip style information for performance reasons. This option overrides other style related directives.
PRODUCEHIDDENCELLS: Produce hidden cell attribute

Default

EnumSet.noneOf(SearchMLFlagValues.class)

11.2.3.28 SearchMLParaAttributes

This option allows the developer to track paragraph attributes contained in the input document and, optionally, include them in the XML output. This option only affects SearchML output. The option is not valid for the SearchHTML, SearchText and PageML output flavors.

Data Type

EnumSet<Options.SearchMLParaAttributeValues>

SearchMLParaAttributeValues Enumeration

SearchMLParaAttributeValues can be one or more of the following enumerations combined in an EnumSet:

PARAGRAPHSPACING: Track paragraph spacing
CHARACTERHEIGHT: Track Character Height
LEFTINDENT: Track left indent (in twips)
RIGHTINDENT: Track right indent (in twips)
FIRSTINDENT: Track first line indent (in twips)

Default

EnumSet.noneOf(Options.SearchMLParaAttributeValues.class)

11.2.3.29 ShowArchiveFullPath

OIT Option ID: SCCOPT_ARCFULLPATH

This option causes the full path of a node to be returned in "GetArchiveNodeInfo" and "GetObjectInfo".

Data Type

boolean

Data

true: Provide the full path.
false: Do not provide the path.

Default

false

11.2.3.30 StrictFile

When an embedded file or URL cannot be opened with the full path, Outside In will sometimes try to open the referenced file from other locations, including the current directory. When this option is set, Outside In is prevented from trying to open the file from any location other than the fully qualified path or URL.

Data Type

boolean

Default

false

11.2.3.31 TimeZoneOffset

OIT Option ID: SCCOPT_TIMEZONE

This option allows the user to define an offset to GMT that will be applied during date formatting, allowing date values to be displayed in a selectable time zone. This option affects the formatting of numbers that have been defined as date values. This option will not affect dates that are stored as text.

Note:

Daylight saving time is not supported. The sent time of an msg file as displayed in Outlook can differ by an hour from the sent time shown in an image created from that msg file.

Data Type

long

Data

Integer parameter from -96 to 96, representing 15-minute offsets from GMT. To query the operating system for the time zone set on the machine, specify SCC_TIMEZONE_USENATIVE.

Default

0: GMT time
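
The option value counts 15-minute steps, so a fixed offset expressed in minutes is divided by 15 to obtain it. The sketch below shows that arithmetic together with the documented range check. The class TimeZoneOffsetSketch and the helper toQuarterHours() are hypothetical and are not part of the Outside In API. Daylight saving time is not modeled, in line with the note above.

```java
public class TimeZoneOffsetSketch {
    // Documented range of the option: -96 to 96 units of 15 minutes.
    static final long MIN_UNITS = -96;
    static final long MAX_UNITS = 96;

    // Hypothetical helper: converts a fixed offset from GMT, given in minutes,
    // into the 15-minute units that the option expects.
    static long toQuarterHours(int offsetMinutes) {
        if (offsetMinutes % 15 != 0) {
            throw new IllegalArgumentException(
                "Offset is not a multiple of 15 minutes: " + offsetMinutes);
        }
        long units = offsetMinutes / 15;
        if (units < MIN_UNITS || units > MAX_UNITS) {
            throw new IllegalArgumentException(
                "Offset outside the documented range: " + units);
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(toQuarterHours(5 * 60 + 30)); // GMT+5:30 -> 22
        System.out.println(toQuarterHours(-8 * 60));     // GMT-8:00 -> -32
    }
}
```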

11.2.3.32 UnmappableCharacter

OIT Option ID: SCCOPT_UNMAPPABLECHAR

This option selects the character used when a character cannot be found in the output character set. This option takes the Unicode value for the replacement character. It is left to the user to make sure that the selected replacement character is available in the output character set.

Data Type

int

Data

The Unicode value for the character to use.

Default

0x002a = "*"
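
Since the caller is responsible for choosing a replacement character that exists in the output character set, it can help to test the candidate before setting the option. The sketch below does this with the standard java.nio.charset classes. The class ReplacementCharCheck and the helper isEncodable() are hypothetical. The sketch also assumes that a Java Charset matching the chosen output character set is available, which may not hold for every character set that Outside In supports.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplacementCharCheck {
    // Hypothetical helper: returns true if the Unicode code point can be
    // encoded in the given Java charset.
    static boolean isEncodable(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return charset.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // The documented default, 0x002A ("*"), is available in US-ASCII.
        System.out.println(isEncodable(0x002A, StandardCharsets.US_ASCII));
        // U+25A1 (white square) is not part of ISO-8859-1.
        System.out.println(isEncodable(0x25A1, StandardCharsets.ISO_8859_1));
    }
}
```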

11.2.3.33 UnmappedText

This option allows for the production of unmapped text (the original code points from the input document). A new <unmapped> element will be produced to enclose this text. The <unmapped> element will contain base64-encoded text. It will also contain two attributes. "OCE" will contain a hex value representing the character set. "font" will contain a string value of the original font name. This is necessary for non-standard encodings such as Wingdings or Webdings. This option is only valid in the SearchML 3.2 (and higher) schema.

Data Type

UnmappedTextValue

UnmappedTextValue Enumeration

The following set of values:

ONLYUNMAPPED: Output just the unmapped text
NOUNMAPPEDTEXT: No unmapped text is output
BOTH: Both original and unmapped text are output

Default

NOUNMAPPEDTEXT
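
A consumer of the output has to decode the content of the <unmapped> element before it can inspect the original code points. The sketch below shows only that decoding step, assuming the element content uses the standard Base64 alphabet. The class UnmappedTextSketch, its helpers, and the sample string "SGk=" are made up for illustration. How the decoded bytes map to characters depends on the character set named in the "OCE" attribute, which this sketch does not interpret.

```java
import java.util.Base64;

public class UnmappedTextSketch {
    // Hypothetical helper: decodes the text content of an <unmapped> element
    // into raw bytes. Assumes the standard Base64 alphabet.
    static byte[] decodeUnmapped(String elementText) {
        return Base64.getDecoder().decode(elementText.trim());
    }

    // Renders the bytes as upper-case hex so the original code points can be inspected.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "SGk=" is an invented sample value, not output from a real conversion.
        System.out.println(toHex(decodeUnmapped("SGk="))); // prints 4869
    }
}
```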

11.2.3.34 XMLDefinitionReference

This option determines whether the converted file references a specified schema or DTD, or contains no definition reference at all.

Data Type

XMLReference

Data

An XMLReference object that defines the XML Definition Reference to be used.

Default

No reference defined

11.3 ExportStatus Class

The ExportStatus class provides access to information about a conversion. This may include information about sub-document failures and about areas of a conversion that may not have high fidelity with the original document. When applicable, the number of pages in the output is also provided.

Namespace

com.oracle.outsidein

Accessors

long getPageCount() - A count of all of the output pages produced during an export operation.
EnumSet<ExportStatusFlags> getStatusFlags() - Gets the information about possible fidelity issues with the original document.
long getSubDocsFailed() - Number of sub documents that were not converted.
long getSubDocsPassed() - Number of sub documents that were successfully converted.

ExportStatusFlags Enumeration

This enumeration is the set of possible known problems that can occur during an export process.

NoInformationAvailable: No Information is available
MissingMap: A PDF text run was missing the toUnicode table
VerticalText: A vertical text run was present
TextEffects: A run had unsupported text effects applied. One example is Word Art.
UnsupportedCompression: A graphic had an unsupported compression
UnsupportedColorSpace: A graphic had an unsupported color space
Forms: A sub-document had forms
RightToLeftTables: A table had right to left columns
Equations: A file had equations
AliasedFont: The desired font was missing, but a font alias was used
MissingFont: The desired font wasn't present on the system
SubDocFailed: A sub-document was not converted
TypeThreeFont: A type 3 font was encountered.
UnsupportedShading: An unsupported shading pattern was encountered.
InvalidHTML: An HTML parse error, as defined by the W3C, was encountered.
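
After an export, the accessors above are typically read together to decide whether the output needs review. The following sketch shows one way to do that. It uses a local stand-in enum holding a few of the flag names listed above so that it compiles without the Outside In jar. The class ExportStatusSketch, the helpers hasFontIssue() and summarize(), and the rule for what counts as a font issue are all hypothetical. In real code the values would come from getPageCount(), getSubDocsPassed(), getSubDocsFailed(), and getStatusFlags().

```java
import java.util.EnumSet;

public class ExportStatusSketch {
    // Stand-in for the documented enumeration (subset of names only).
    enum ExportStatusFlags { NoInformationAvailable, MissingMap, AliasedFont, MissingFont, SubDocFailed }

    // Hypothetical policy: treat a missing or aliased font as a font issue.
    static boolean hasFontIssue(EnumSet<ExportStatusFlags> flags) {
        return flags.contains(ExportStatusFlags.MissingFont)
            || flags.contains(ExportStatusFlags.AliasedFont);
    }

    // Builds a one-line summary from values an ExportStatus object would supply.
    static String summarize(long pageCount, long passed, long failed,
                            EnumSet<ExportStatusFlags> flags) {
        return "pages=" + pageCount
            + ", subDocsPassed=" + passed
            + ", subDocsFailed=" + failed
            + ", fontIssue=" + hasFontIssue(flags);
    }

    public static void main(String[] args) {
        EnumSet<ExportStatusFlags> flags =
            EnumSet.of(ExportStatusFlags.AliasedFont, ExportStatusFlags.SubDocFailed);
        System.out.println(summarize(12, 3, 1, flags));
    }
}
```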

11.4 FileFormat Class

This class defines the identifiers for file formats.

Namespace

com.oracle.outsidein

Methods

GetDescription

String GetDescription()

This method returns the description of the format.
GetId

int GetId()

This method returns the numeric identifier of the format.
ForId

FileFormat ForId(int id)

This method returns the FileFormat object for the given identifier.

id: The numeric identifier for which the corresponding FileFormat object is returned.

11.5 ObjectInfo Class

ObjectInfo provides all the information available about the OIT Object. This is a read-only class where the technology fills in all the values.

Namespace

com.oracle.outsidein.options

Accessors

ObjectInfo.CompressionValues getCompression() - the type of compression used to store the object, if known.
EnumSet<ObjectInfo.ObjectInfoFlagValues> getFlags() - flags indicating attributes of the object.
FileFormat getFormatId() - the format Identifier of the object.
String getName() - name of the object.

ObjectInfoFlags Enumeration

Bit fields to describe information about an object.

PARTIALFILE: Object would not normally exist outside the source document
PROTECTEDFILE: Object is encrypted or password protected
UNSUPPORTEDCOMPRESSION: Object uses an unsupported compression mechanism
DRMFILE: Object uses Digital Rights Management protection
UNIDENTIFIEDFILE: Object is extracted, but cannot be successfully identified
LINKTOFILE: Object links to a file; it cannot be extracted
ENCRYPTEDFILE: Object is encrypted and can be decrypted with the known password

11.6 Option Interface

The Option Interface provides the methods and properties to retrieve information about an Outside In Option.

Package

com.oracle.outsidein.options

Accessors

String getName() — Gets the name of the option
String getDescription() — Gets the description of the option
Class<?> getDataType() — Gets the type of the option value.
Class<?>[] getItemTypes() — Gets the type parameters for option values that are generics
EnumSet<Option.OutsideInProducts> getSupportingProducts() — Gets the list of products that support this option

Methods

void set(OptionsCache exporter, Object objValue) throws OutsideInException;

This method sets the option on the given exporter object.

exporter — The exporter object
objValue — Value of the option

Note:

If the type of objValue cannot be converted to the data type the option is expecting, an OutsideInException is thrown.

Object get(OptionsCache exporter)

This method gets the currently set value for the option.

exporter: The exporter object whose option value is requested.

OutsideInProducts Enumeration

HTMLEXPORT — Outside In HTML Export
IMAGEEXPORT — Outside In Image Export
PDFEXPORT — Outside In PDF Export
SEARCHEXPORT — Outside In Search Export
WEBVIEWEXPORT — Outside In Web View Export
XMLEXPORT — Outside In XML Export

11.7 OutsideIn Class

This is a utility class that creates an instance of an Exporter object on request.

Namespace

com.oracle.outsidein

Methods

static Exporter newLocalExporter()

This method creates an instance of an Exporter object. It returns a newly created Exporter object.

static Exporter newLocalExporter(Exporter source)

This method creates and returns an instance of an Exporter object based on the source Exporter. All the options of source are copied to the new Exporter. The source and destination file information will not be copied.

OutsideInVersion getCoreVersion()

This static method returns an OutsideInVersion object with information describing the Outside In Core Technology used.

void setLocation(File oilinkDir)

Sets an explicit path to the native Outside In libraries and oilink.exe. If used, this method must be called prior to any other Outside In method or this method will throw an exception. If setLocation() is not used, the location will be determined by searching for the Outside In libraries in the following order:

the location specified in the 'OILinkLocation' Java property
the 'oit' subdirectory under the directory containing oilink.jar
the directory containing oilink.jar
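
The search order above can be sketched as a list of candidate directories, checked first to last. This only mirrors the documented order and says nothing about how Outside In performs the lookup internally. The class OilinkSearchOrderSketch and the helper candidateLocations() are hypothetical, and reading 'OILinkLocation' with System.getProperty() assumes that "Java property" means a JVM system property.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class OilinkSearchOrderSketch {
    // Hypothetical helper: lists candidate directories in the documented order.
    // jarDir is the directory that contains oilink.jar.
    static List<File> candidateLocations(String oilinkLocationProperty, File jarDir) {
        List<File> candidates = new ArrayList<>();
        if (oilinkLocationProperty != null && !oilinkLocationProperty.isEmpty()) {
            candidates.add(new File(oilinkLocationProperty)); // 1. 'OILinkLocation' property
        }
        candidates.add(new File(jarDir, "oit"));              // 2. 'oit' subdirectory
        candidates.add(jarDir);                               // 3. directory of oilink.jar
        return candidates;
    }

    public static void main(String[] args) {
        // Assumption: the property is a system property, for example set with -DOILinkLocation=...
        String property = System.getProperty("OILinkLocation");
        for (File dir : candidateLocations(property, new File("lib"))) {
            System.out.println(dir.getPath());
        }
    }
}
```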

11.8 OutsideInVersion Class

The OutsideInVersion class is used to describe the version of the Outside In Core Module.

Namespace

com.oracle.outsidein

Methods

String GetVersion()

This method returns the version information as a string in the format of “MajorVersion.MinorVersion.DotVersion”.

int getMajorVersion()

The major version component.

int getMinorVersion()

The minor version component.

int getDotVersion()

The dot version component.

11.9 OutsideInException Class

This is the exception that is thrown when an Outside In Technology error occurs.

This class derives from the Exception class. This class has no public methods or properties except those of the parent Exception class.

Namespace

com.oracle.outsidein

11.10 XMLReference Class

The XMLReference class is a data class used to define the XML definition reference to be used.

Namespace

com.oracle.outsidein.options

Methods

ReferenceMethodValue getReferenceMethod()

Retrieves the type of reference.

void setReferenceMethod(ReferenceMethodValue value)

Sets the type of reference.

String getDefinitionReference()

Retrieves the DTD or schema referenced.

void setDefinitionReference(String value)

Sets the DTD or schema referenced.

Constructors

XMLReference()

Creates an instance of an XMLReference object with no XML definition reference.

XMLReference(XMLReference.ReferenceMethodValue, String)

Creates an instance of an XMLReference object that references a DTD or XSD.

ReferenceMethodValue Enumeration

This enumeration is used to set whether Export references a schema, references a DTD, or includes no definition reference when generating output.

DTD: Document Type Definition (DTD)
XSD: XML Schema Definition (XSD)
NONE: No definition reference