Data Access Common Functions

This chapter describes common Data Access functions. The Data Access module is common to all Oracle Outside In SDKs. It provides a way to open a generic handle to a source file. This handle can then be used in the functions described in this chapter.

Deprecated Functions

DAInit and DAThreadInit have both been deprecated and are replaced by DAInitEx. All new implementations should use DAInitEx, although the two older functions will continue to be supported.

DAInitEx

This function tells the Data Access module to perform any necessary initialization it needs to prepare for document access. This function must be called before the first time the application uses the module to retrieve data from any document. This function supersedes the old DAInit and DAThreadInit functions.

Note:

DAInitEx should only be called once per application, at application startup time. Any number of documents can be opened for access between calls to DAInitEx and DADeInit. If DAInitEx succeeds, DADeInit must be called regardless of any other API calls.

If the ThreadOption parameter is set to anything other than DATHREAD_INIT_NOTHREADS, this function also sets up mutex function pointers that prevent threads from clashing in critical sections of the technology’s code. The application must create its threads only after this function has been called: DAInitEx should be called once per process, before the application starts any threads.

Note:

Multithreading is supported on all Windows platforms, on the 32-bit versions of Linux x86 and Solaris SPARC, and on Linux x64 and 64-bit Solaris SPARC. A failure to initialize threading does not impair other API calls: if threading is not initialized, or its initialization fails, stub functions are called instead of the mutex functions.

Prototype

DAERR DAInitEx(VTSHORT ThreadOption, VTDWORD dwFlags);

Parameters

Return Values

DADeInit

This function tells the Data Access module that it will not be asked to read additional documents, so it should perform any cleanup tasks that may be necessary. This function should be called at application shutdown time, and only if the module was successfully initialized with a call to DAInitEx.

Prototype

DAERR DADeInit();

Return Values

DAOpenDocument

Opens a source file to make it accessible by one or more of the data access technologies. If DAOpenDocument succeeds, DACloseDocument must be called regardless of any other API calls.

The software also allows you to specify a file within an archive as the source for a conversion. A “subdocument specification” identifies the item within the archive that the caller wishes to convert. It has the form item.number, where number identifies a particular item within the archive; item numbers are positive integers, and items in the archive are numbered starting from 1.

Nested archives are supported: if the archived item is itself an archive, you can specify an item within it as the “true” target file by appending another number to the subdocument specification, delimited by another dot. For example, to specify item number 3 within an archive, the subdocument specification is item.3. If item number 3 is itself an archive and you wish to specify the fourth item within it, the subdocument specification is item.3.4. Any level of nesting is supported, up to the maximum length of a subdocument specification, DA_MAXSUBDOCSPEC.
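As a minimal sketch, an application can assemble a subdocument specification from a list of item numbers as shown below. The function name build_subdoc_spec is hypothetical, and the out_size argument stands in for a buffer sized to DA_MAXSUBDOCSPEC, whose actual value comes from the SDK headers and is not assumed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "item.N1.N2..." from item numbers.
 * items[0] selects an item in the outer archive, items[1] an item inside
 * that item (when it is itself an archive), and so on.
 * Returns 0 on success, -1 if an item number is 0 or the result would not
 * fit in out (out_size stands in for DA_MAXSUBDOCSPEC). */
static int build_subdoc_spec(const unsigned long *items, size_t count,
                             char *out, size_t out_size)
{
    size_t used;
    size_t i;

    assert(out != NULL);
    if (items == NULL || count == 0 || out_size == 0)
        return -1;

    used = (size_t)snprintf(out, out_size, "item");
    if (used >= out_size)
        return -1;

    for (i = 0; i < count; i++) {
        int n;
        if (items[i] == 0)              /* item numbering starts at 1 */
            return -1;
        n = snprintf(out + used, out_size - used, ".%lu", items[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;                  /* result would be truncated */
        used += (size_t)n;
    }
    return 0;
}
```

For example, passing the item numbers {3, 4} produces item.3.4, the fourth item inside the third item of the outer archive.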

For IO types other than IOTYPE_REDIRECT, the subdocument specification may be included in the file’s path by appending a question mark delimiter followed by the subdocument specification. For example, to specify the third item within the file c:\docs\file.zip, pass the path c:\docs\file.zip?item.3 to DAOpenDocument. DAOpenDocument always attempts to open the full string as a file first: in the unlikely event that a file exists whose name is exactly the path plus the question mark and subdocument specification, that file is opened instead of the archive item.
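The path form can be built the same way. This hypothetical helper, make_subdoc_path, only appends the question mark delimiter and a subdocument specification to a file path; it is a sketch of the string format and does not reproduce how DAOpenDocument resolves the path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<path>?<spec>" into out.
 * Returns 0 on success, -1 on bad arguments or if the result
 * would be truncated. */
static int make_subdoc_path(const char *path, const char *spec,
                            char *out, size_t out_size)
{
    int n;

    assert(out != NULL);
    if (path == NULL || spec == NULL || out_size == 0)
        return -1;
    if (strncmp(spec, "item.", 5) != 0)   /* specifications look like item.N... */
        return -1;

    n = snprintf(out, out_size, "%s?%s", path, spec);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The resulting string, for example c:\docs\file.zip?item.3, is still subject to the file-first rule described above.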

To use this feature when the input file is accessed through redirected IO, the subdocument specification must be supplied in response to the IOGetInfo message IOGETINFO_SUBDOC_SPEC. First, follow the standard redirected IO methods to provide a BASEIO pointer to the archive file itself. Then, to select an item within the archive, the redirected IO object must respond to the IOGETINFO_SUBDOC_SPEC message by copying the subdocument specification of the archive item to be opened into the supplied buffer. This message is received during the processing of DAOpenDocument.

Prototype

DAERR DAOpenDocument(
   VTLPHDOC   lphDoc,
   VTDWORD    dwSpecType,
   VTLPVOID   pSpec,
   VTDWORD    dwFlags);

Parameters

Return Values

IOSPECLINKEDOBJECT Structure

Structure used by DAOpenDocument.

Prototype

typedef struct IOSPECLINKEDOBJECTtag
   {
   VTDWORD    dwStructSize;
   VTSYSPARAM hDoc;
   VTDWORD    dwObjectId;  /* Object identifier. */
   VTDWORD    dwType;      /* Linked Object type */
                           /* (SO_LOCATORTYPE_*) */
   VTDWORD    dwParam1;    /* parameter for DoSpecial call */
   VTDWORD    dwParam2;    /* parameter for DoSpecial call */
   VTDWORD    dwReserved1; /* Reserved. */
   VTDWORD    dwReserved2; /* Reserved. */
} IOSPECLINKEDOBJECT,       * PIOSPECLINKEDOBJECT;

IOSPECARCHIVEOBJECT Structure

Structure used by DAOpenDocument.

Prototype

typedef struct IOSPECARCHIVEOBJECTtag
   {
   VTDWORD dwStructSize;
   VTDWORD hDoc;        /* Parent Doc hDoc */
   VTDWORD dwNodeId;    /* Node ID */
   VTDWORD dwStreamId; 
   VTDWORD dwReserved1; /* Must always be 0 */
   VTDWORD dwReserved2; /* Must always be 0 */
} IOSPECARCHIVEOBJECT,   * PIOSPECARCHIVEOBJECT;

SCCDAOBJECT Structure

Structure used by DAOpenDocument.

Prototype

typedef struct SCCDAOBJECTtag
{
   VTDWORD   dwSize;         /* sizeof(SCCDAOBJECT) */
   VTHDOC    hDoc;           /* DA handle for the document 
                                containing the object */
   VTDWORD   dwObjectType;   /* SCCCA_EMBEDDEDOBJECT, 
                                SCCCA_LINKEDOBJECT, 
                                SCCCA_COMPRESSEDFILE or 
                                SCCCA_ATTACHMENT */
   VTDWORD   dwData1;        /* Data identifying the object */
   VTDWORD   dwData2;        /* Data identifying the object */
   VTDWORD   dwData3;        /* Data identifying the object */
   VTDWORD   dwData4;        /* Data identifying the object */
} SCCDAOBJECT, VTFAR* PSCCDAOBJECT;

DAOpenSubdocumentById

Allows an embedding to be opened using the integer value of the object_id attribute from the locator element.

Prototype

DAERR DAOpenSubdocumentById(
   VTHDOC     hDoc,
   VTLPHDOC   lphDoc,
   VTDWORD    pSpec,
   VTDWORD    dwFlags);

Parameters

DAOpenNextDocument

Allows an existing Export or Data Access document handle to be used or reused when opening a new document, enabling options to be preserved across multiple exports, or allowing multiple documents to be exported to the same output destination.

This function uses an existing “reference” handle as a starting point for opening another document. The reference handle may be either a document handle (obtained through DAOpenDocument) or an export handle (obtained via a call to EXOpenExport). The difference between using these two handle types is that certain document specification types (subdocuments of the original document) will not be successfully opened when a document handle is used as the reference handle. If an Export handle is used as the reference handle, subdocument specifications are allowed.

Since the same handle is used multiple times, only a single call to DACloseDocument is needed. Each document is actually closed when the next document is opened; successive calls to DAOpenNextDocument free the resources used in previous calls.

Using this function allows developers to make multiple calls to the EX functions without having to reset options every time: options can be set once for the original document and are retained for each subsequent document.

Additionally, some export libraries allow multiple source documents to be exported to a single output document. Currently, this is supported only for PDF and multi-page TIFF output. To do this, export the first document normally; then, for each subsequent source document, call DAOpenNextDocument to open it, followed by a call to EXRunExport. EXOpenExport and EXCloseExport should each be called only once for this type of export.

Prototype

DAERR DAOpenNextDocument(
     VTHANDLE hReference,
     VTDWORD  dwSpecType,
     VTLPVOID pSpec,
     VTDWORD  dwFlags );

Parameters

Return Values

DACloseDocument

This function is called to close a file opened by the reader that has not encountered a fatal error.

Prototype

DAERR DACloseDocument(
   VTHDOC hDoc);

Parameters

Return Value

DARetrieveDocHandle

This function returns the document handle associated with any type of Data Access handle. This allows the developer to keep only the value of hItem, instead of both hItem and hDoc.

Prototype

DAERR DARetrieveDocHandle(
   VTHDOC     hItem,
   VTLPHDOC   phDoc);

Parameters

Return Value

DASetOption

This function is called to set the value of a data access option.

Prototype

DAERR DASetOption(
   VTHDOC      hDoc,
   VTDWORD     dwOptionId,
   VTLPVOID    pValue,
   VTDWORD     dwValueSize);

Parameters

Return Value

DASetFileSpecOption

This function is called to set the value of an option that takes a spec and spec type as parameters. It is currently implemented only for setting the template option in HTML Export. This function is needed only if the developer wishes to use redirected IO for the template files. It may also be used to set the template option without redirected IO, although DASetOption can be used in that situation as well.

Prototype

DAERR DASetFileSpecOption(
   VTHDOC     hDoc,
   VTDWORD    dwOptionId,
   VTDWORD    dwSpecType,
   VTLPVOID   pSpec);

Parameters

Return Value

DAGetOption

This function is called to retrieve the value of a data access option. The results of a call to this function are valid only if DASetOption has already been called for the option.

Prototype

DAERR DAGetOption(
   VTHDOC    hItem,
   VTDWORD   dwOptionId,
   VTLPVOID  pValue,
   VTLPDWORD pSize);

Parameters

Return Value

DAGetFileId

This function allows the developer to retrieve the format of the file based on the technology’s content-based file identification process. This can be used to make intelligent decisions about how to process the file and to give the user feedback about the format of the file they are working with.

Note: In cases where File ID returns a value of FI_UNKNOWN, this function will apply the Fallback Format before returning a result.

Prototype

DAERR DAGetFileId(
   VTHDOC      hDoc,
   VTLPDWORD   pdwFileId);

Parameters

Return Value

DAGetFileIdEx

This function allows the developer to retrieve the format of the file based on the technology’s content-based file identification process. This can be used to make intelligent decisions about how to process the file and to give the user feedback about the format of the file they are working with. This function has all the functionality of DAGetFileId and adds the ability to return the raw FI value; in other words, the value returned by normal FI, without applying the Fallback Format.

Prototype

DAERR DAGetFileIdEx(
   VTHDOC      hDoc,
   VTLPDWORD   pdwFileId,
   VTDWORD     dwFlags);

Parameters

Return Value

Values with RAWFI turned off

Input file type | ExtendedFI | FallbackID     | DAGetFileId    | DAGetFileIdEx
true binary     | off        | fallback value | fallback value | fallback value
true binary     | on         | fallback value | fallback value | fallback value
true text       | off        | fallback value | fallback value | fallback value
true text       | on         | fallback value | 40XX           | 40XX

Values with RAWFI turned on

Input file type | ExtendedFI | FallbackID     | DAGetFileId    | DAGetFileIdEx
true binary     | off        | fallback value | fallback value | 1999
true binary     | on         | fallback value | fallback value | 1999
true text       | off        | fallback value | fallback value | 1999
true text       | on         | fallback value | 40XX           | 1999
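Read together, the two tables reduce to a short rule, restated below as a small C model. The enum and function names are hypothetical and are not part of the SDK; 40XX and 1999 are kept exactly as they appear in the tables rather than mapped to specific FI constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two tables above; not an SDK API. */
typedef enum { INPUT_TRUE_BINARY, INPUT_TRUE_TEXT } InputKind;
typedef enum { RESULT_FALLBACK_VALUE, RESULT_40XX, RESULT_1999 } FileIdResult;

/* DAGetFileId: RAWFI has no effect. Only true text with ExtendedFI on
 * yields a 40XX text ID; every other row yields the fallback value. */
static FileIdResult model_get_file_id(InputKind kind, bool extended_fi)
{
    if (kind == INPUT_TRUE_TEXT && extended_fi)
        return RESULT_40XX;
    return RESULT_FALLBACK_VALUE;
}

/* DAGetFileIdEx: with RAWFI on, every row yields 1999; with RAWFI off,
 * it matches DAGetFileId. */
static FileIdResult model_get_file_id_ex(InputKind kind, bool extended_fi,
                                         bool raw_fi)
{
    if (raw_fi)
        return RESULT_1999;
    return model_get_file_id(kind, extended_fi);
}
```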

DAGetErrorString

This function returns to the developer a string describing the input error code. If the error string returned does not fit the buffer provided, it is truncated.

VTVOID DAGetErrorString(
   DAERR      deError,
   VTLPVOID   pBuffer,
   VTDWORD    dwBufSize);

Parameters

Return Value

DAGetObjectInfo

This function returns information about the document or object pointed to by hDoc. The object may be an embedded object, a linked object, or a compressed file.

DAERR DAGetObjectInfo(
   VTHDOC     hDoc,
   VTDWORD    dwInfoId,
   VTLPVOID   pInfo);

Parameters

Return Values

DAGetTreeCount

This function is called to retrieve the number of records in an archive file.

DAERR DAGetTreeCount(
      VTHDOC      hDoc,
      VTLPDWORD   lpRecordCount);

Parameters

Return Value

DAGetTreeRecord

This function is called to retrieve information about a record in an archive file.

DAERR DAGetTreeRecord(
      VTHDOC         hDoc,
      PSCCDATREENODE pTreeNode);

Parameters

Return Values

SCCDATREENODE Structure

This structure is passed by the OEM to the DAGetTreeRecord function. It is defined in sccda.h as follows:

typedef struct SCCDATREENODEtag{
   VTDWORD   dwSize;
   VTDWORD   dwNode;
   VTBYTE    szName[1024];
   VTDWORD   dwFileNameLen;
   VTDWORD   dwFileSize;
   VTDWORD   dwTime;
   VTDWORD   dwFlags;
   VTDWORD   dwCharSet;
   } SCCDATREENODE,   *PSCCDATREENODE;

Parameters

DAOpenTreeRecord

This function is called to open a record within an archive file and make it accessible by one or more of the data access technologies.

Search Export Only: Search Export’s default behavior is to automatically open and process the contents of an archive. Use DAOpenTreeRecord and SCCOPT_XML_SEARCHML_FLAGS to change the default behavior if discrete processing of each document in an archive is desired.

DAERR DAOpenTreeRecord(
      VTHDOC      hDoc,
      VTLPHDOC    lphDoc,
      VTDWORD     dwRecord);

lphDoc is not a file handle.

Parameters

Return Value

DASaveInputObject

This function saves a copy of the document or object pointed to by hDoc. The object may be an embedded object, a linked object or a compressed file.

Some file formats store only partial files as embedded objects, and Outside In cannot create readable files from these objects. Use DAGetObjectInfo with dwInfoId set to DAOBJECT_FLAGS to determine which objects Outside In can successfully extract.

DAERR DASaveInputObject(
   VTHDOC     hDoc,
   VTDWORD    dwSpecType,
   VTLPVOID   pSpec,
   VTDWORD    dwFlags);

Parameters

Return Values

DASaveTreeRecord

This function is called to extract a record in an archive file to disk.

DAERR DASaveTreeRecord(
      VTHDOC      hDoc,
      VTDWORD     dwRecord,
      VTDWORD     dwSpecType,
      VTLPVOID    pSpec,
      VTDWORD     dwFlags);

Parameters

Return Values

Currently, only extracting a single file is supported. There is a known limitation where files in a Microsoft Binder file cannot be extracted.

DACloseTreeRecord

This function is called to close an open record file handle.

Search Export Only: Search Export’s default behavior is to automatically open and process the contents of an archive. Use DACloseTreeRecord and SCCOPT_XML_SEARCHML_FLAGS to change the default behavior if discrete processing of each document in an archive is desired.

DAERR DACloseTreeRecord(
      VTHDOC      hDoc);

Parameters

Return Value

DASetStatCallback

This function registers a callback that the technology calls periodically to indicate that the file is still being processed. Customers can use it with a monitoring process to help identify files that may be hung. Because this callback is called more frequently than other callbacks, it is registered through a separate function.

Use of the Status Callback Function

An application’s status callback function will be called periodically by Oracle Outside In to provide a status message. Currently, the only status message defined is OIT_STATUS_WORKING, which provides a “sign of life” that can be used during unusually long processing operations to verify that Oracle Outside In has not stopped working. If the application decides that it would not like to continue processing the current document, it may use the return value from this function to tell Oracle Outside In to abort.

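The status callback function has two return values defined: OIT_STATUS_CONTINUE, which tells Oracle Outside In to continue processing, and OIT_STATUS_ABORT, which tells Oracle Outside In to stop processing the current document.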

The following is an example of a minimal status callback function.

VTDWORD MyStatusCallback(VTHANDLE hUnique, VTDWORD dwID,
                         VTSYSVAL pCallbackData, VTSYSVAL pAppData)
{
    if (dwID == OIT_STATUS_WORKING)
    {
        /* checkNeedToAbort is an application-defined function. */
        if (checkNeedToAbort(pAppData))
            return (OIT_STATUS_ABORT);
    }

    return (OIT_STATUS_CONTINUE);
}
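The example leaves checkNeedToAbort to the application. One possible implementation, sketched here under the assumption that pAppData carries a pointer to an application-defined watchdog structure, requests an abort when a wall-clock deadline has passed or a cancel flag has been set. AppWatchdog and its fields are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical application state, registered as pAppData with
 * DASetStatCallback (the callback receives it as a VTSYSVAL). */
typedef struct {
    time_t deadline;   /* give up once this wall-clock time has passed */
    int    cancel;     /* set to nonzero when the user asks to stop */
} AppWatchdog;

/* Returns nonzero when the current document should be abandoned,
 * that is, when the status callback should return OIT_STATUS_ABORT. */
static int checkNeedToAbort(const AppWatchdog *wd)
{
    if (wd == NULL)
        return 0;
    if (wd->cancel)
        return 1;
    return difftime(time(NULL), wd->deadline) > 0.0;
}
```

With this signature, the callback would call checkNeedToAbort((const AppWatchdog *)pAppData) rather than passing pAppData directly.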

Prototype

DAERR DASetStatCallback(DASTATCALLBACKFN pCallback, 
   VTHANDLE hUnique, 
   VTLPVOID pAppData);

Parameters

The callback function should be of type DASTATCALLBACKFN. This function has the following signature:

(VTHANDLE hUnique, VTDWORD dwID, VTSYSVAL pCallbackData, VTSYSVAL pAppData)

Return Values

DASetFileAccessCallback

This function sets up a callback that the technology will call into to request information required to open an input file. This information may be the password of the file or a support file location.

Use of the File Access Callback

When the technology encounters a file that requires additional information to access its contents, the application’s callback function will be called for this information. Currently, only two different forms of information will be requested: the password of a document, or the file used by Lotus Notes to authenticate the user information.

The file access callback function has two return values defined:

The callback will be called repeatedly if the information provided is not valid (for example, a wrong password). It is the application’s responsibility to provide the correct information or to return SCCERR_CANCEL.
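Because the callback is called again after each invalid answer, the application needs a way to stop. A minimal sketch, assuming the application has a fixed list of candidate passwords for the document, keeps a cursor across calls and reports when the list is exhausted so that the callback can return SCCERR_CANCEL. PasswordList and next_password are hypothetical, and how the password is placed in pReturnData is not covered in this section, so it is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-document state: candidate passwords to offer, in order. */
typedef struct {
    const char *const *candidates;
    size_t count;
    size_t next;                 /* index of the next candidate to offer */
} PasswordList;

/* Copies the next candidate into buf. Returns 1 if a candidate was
 * supplied, 0 when the list is exhausted (the callback should then
 * return SCCERR_CANCEL instead of being asked again indefinitely). */
static int next_password(PasswordList *list, char *buf, size_t buf_size)
{
    assert(list != NULL && buf != NULL && buf_size > 0);
    while (list->next < list->count) {
        const char *pw = list->candidates[list->next++];
        size_t len = strlen(pw);
        if (len < buf_size) {    /* skip candidates that do not fit */
            memcpy(buf, pw, len + 1);
            return 1;
        }
    }
    return 0;
}
```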

Prototype

DAERR DASetFileAccessCallback (DAFILEACCESSCALLBACKFN pCallback);

Parameters

Return Values

The callback function should be of type DAFILEACCESSCALLBACKFN. This function has the following signature:

typedef VTDWORD (* DAFILEACCESSCALLBACKFN)(VTDWORD dwID, VTSYSVAL pRequestData, VTSYSVAL pReturnData, VTDWORD dwReturnDataSize);

Note:

Not all formats that use passwords are supported. DASetFileAccessCallback applies only to filters that support password-protected files, that is, the filters and core modules that call UTGetFileAccess to request this information.

Only Microsoft Office binary (97-2003), Microsoft Office 2010-2013, Microsoft Outlook PST 97-2016, Lotus NSF, PDF (with RC4 encryption), and 7zip (with AES 128 & 256 bit, ZipCrypto) are currently supported.

Passwords for PST/OST files must use a Windows single-byte character set (code page). For example, Cyrillic characters should use the 1251 character set. Unicode password characters are not supported for PST/OST files.