Oracle® Outside In PDF Export Developer's Guide Release 8.4.0 Part Number E12886-03
PDF Export allows an OEM to convert almost any document, spreadsheet or presentation file into a PDF file.
There may be references to other Oracle Outside In Technology SDKs within this manual. To obtain complete documentation for any other Oracle Outside In product, see:
http://www.oracle.com/technetwork/indexes/documentation/index.html#middleware
and click on Outside In Technology.
This chapter includes the following sections:
The updated list of supported formats is linked from the page http://www.outsideinsdk.com/. Look for the data sheet with the latest supported formats.
In Chapter 4, "Data Access Common Functions," DASaveTreeRecord has a new dwSpecType. IOTYPE_REDIRECT specifies that redirected I/O will be used to save the file.
NSF support has been added for the Win x86-64 platform. Please see Section 2.1.1, "NSF Support" for more details.
The new value SCCUT_WP_EMAILHEADERNONE has been added to the SCCOPT_WPEMAILHEADEROUTPUT option. This will produce no email header fields.
Two new options, SCCOPT_MAILHEADERVISIBLE and SCCOPT_MAILHEADERHIDDEN, allow you to control what email headers are rendered.
A new option has been added: SCCOPT_IMAGEPASSTHROUGH. This feature is used to allow certain input files to circumvent the normal filtering process and to be 'wrapped' in a PDF output file directly. This allows for much faster exporting of the supported file formats, which for release 8.4 are JPEG, JPEG2000, and TIFF.
A new option has been added: SCCOPT_EXPORTEMAILATTACHMENTS. For input files in all OIT-supported email formats that contain attachments, this option instructs the PDF Export process to export the contents of the attachments to PDF. The contents of the export are attached to the end of the email message so that only one PDF output file is produced. In addition, hyperlinks are provided that link to bookmarks marking the beginning of each attachment in the resultant PDF.
A new option, SCCOPT_SYSTEMFLAGS, allows you to control miscellaneous interactions between the developer and the Outside In Technology.
A new option, SCCOPT_FONTEMBEDPOLICY, allows you to determine whether or not to embed Adobe Standard Base 14 fonts.
PDF Export now follows "Font Embedding Guidelines for Adobe Third-party Developers."
When automatic font color is selected in Microsoft Office (the default setting), the application renders the text as white if the text is on a dark background. The Outside In Technology now assumes the same behavior.
A new function has been added: DAInitEx. It replaces DAInit and DAThreadInit and adds an option not to load or save the options file.
A new sample application, extract_archive, demonstrates using the DATree API to extract all nodes in an archive.
Support has been added for AutoCAD 2011 and 2012 files, using the OpenDesign Alliance's Teigha 3.05.00 libraries.
Support has been added for Hangul 2010 documents.
Scalable Vector Graphics (SVG) files are now identified and processed by the XML filter.
When saving email messages that have been created in HTML format, Outlook creates two message bodies, a plain text body and an RTF body that contains embedded HTML. Support has been added for the HTML embedded in the RTF.
Support has been added to extract and render MSGs and EMLs to which a digital signature has been applied.
A new option, SCCOPT_HTML_COND_COMMENT_MODE (SOAP equivalents: htmlCondCommentIE5On, htmlCondCommentIE6On, htmlCondCommentIE7On, htmlCondCommentIE8On, htmlCondCommentIE9On, htmlCondCommentAllOn), allows you to control which special comments found in the HTML are included in the output. These are comments targeted at particular versions of browsers or other products.
PDF files created by Acrobat 10 are now validated and processed.
Support has been added for the extraction of table data in a Microsoft Jet 3.x- or 4.x-based file. This means that for database files created in Access 95, 97, 2000, 2002, 2003, 2007, and 2010, the TABLES data can be extracted.
Support has been added for text extraction from Microsoft OneNote 2007 and 2010 files.
Support has been added for Outlook 2010 PST and OST files, including support for High Encryption in all versions of Outlook PST and OST files.
Support has been added for rendering Outlook MSG files: Note, Task, Appointment, Contact, and Journal.
Support has been added for two types of Office 2003 files: WordProcessingML (Word 2003), text only; and SpreadSheetML (Excel 2003), text only. The XML version of the binary format will be processed, skipping embedded objects and tagging properties.
Support has been added for IBM SmartSuite 9.8 files: Lotus WordPro, Lotus 1-2-3, and Lotus Freelance.
Support has been added for Apple iWork 09 files for Mac OS X: Pages 09 PDF Preview & Text, Numbers 09 PDF Preview & Text, and Keynote 09 PDF Preview & Text.
Support has been added for WordPerfect X5 files: Word Processor, Quattro Pro, and Presentations.
Support has been added for Adobe Creative Suite 5 files: Photoshop CS5, Illustrator CS5, and InDesign CS5.
Support has been added for the IBM AIX PPC (64-bit) OS version 7.1.
Support has been added for Red Hat Linux (x86 64-bit), Red Hat Enterprise Linux (RHEL) 6.
Certification on Windows 2000 has been discontinued.
The core rendering engine has been changed to apply the SMALLCAPS character attribute.
A new function, EXExportStatus, has been added to determine if there were conversion problems during an export.
An output type has been added for PDF/A-2a.
Support has been added for documents with bookmarks. When converted, the bookmarks appear in the bookmark panel of Acrobat Pro.
Support has been added for PDF input for Global Streams in JBIG2 Explicit masks.
Support has been added for viewing compressed PDF files.
The PDF filter has been updated to enable support for PDFs using AES 256-bit encryption.
Note:
Not all formats that use passwords are supported. Only the following are currently supported: Microsoft Office binary (97-2003), Microsoft Office 2007, Lotus NSF, PDF (with RC4 encryption), and Zip (with AES 128- and 256-bit, or ZipCrypto).
Fonts embedded in PDF input files are now extracted and used to render text.
Transformation Server has been ported to Windows x86-64.
The basic architecture of Oracle Outside In technologies is the same across all supported platforms.
Filter/Module | Description |
---|---|
Input Filter | The input filters form the base of the architecture. Each one reads a specific file format or set of related formats and sends the data to OIT through a standard set of function calls. There are more than 150 of these filters that read more than 500 distinct file formats. Filters are loaded on demand by the data access module. |
Export Filter | Architecturally similar to input filters, export filters know how to write out a specific format based on information coming from the chunker module. The export filter produces the page layout for PDF output. |
Chunker | The Chunker module is responsible for caching a certain amount of data from the filter and returning this data to the export filter. |
Export | The Export module implements the export API and understands how to load and run individual export filters. |
Data Access | The Data Access module implements a generic API for access to files. It understands how to identify and load the correct filter for all the supported file formats. The module delivers to the developer a generic handle to the requested file, which can then be used to run more specialized processes, such as the Export process. |
The following terms are used in this documentation.
Term | Definition |
---|---|
Developer | Someone integrating this technology into another technology or application. Most likely this is you, the reader. |
Source File | The file the developer wishes to export. |
Output File | The PDF file being written. |
Data Access Module | The core of Oracle Outside In Data Access, in the SCCDA library. |
Data Access Submodule (also referred to as "Submodule") | This refers to any of the Oracle Outside In Data Access modules, including SCCEX (Export), but excluding SCCDA (Data Access). |
Document Handle (also referred to as "hDoc") | A Document Handle is created when a file is opened using Data Access (see Chapter 4, "Data Access Common Functions"). Each Document Handle may have any number of Subhandles. |
Subhandle (also referred to as "hItem") | Any of the handles created by a Submodule's Open function. Every Subhandle has a Document Handle associated with it. For example, the hExport returned by EXOpenExport is a Subhandle. The DASetOption and DAGetOption functions in the Data Access Module may be called with any Subhandle or Document Handle. The DARetrieveDocHandle function returns the Document Handle associated with any Subhandle. |
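The relationship between a Document Handle and its Subhandles can be pictured with a small model. The sketch below is illustrative only: the DocHandle and SubHandle structs and the open_subhandle and retrieve_doc_handle functions are hypothetical names invented for this example, not the SDK's types or functions. It shows only the idea that every Subhandle carries a reference to the one Document Handle it was opened from, which is the lookup that DARetrieveDocHandle is described as providing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a Document Handle (hDoc). Not an SDK type. */
typedef struct DocHandle {
    const char *source_path;   /* the source file this handle identifies */
    int open_subhandles;       /* how many Subhandles were opened from it */
} DocHandle;

/* Hypothetical model of a Subhandle (hItem). Not an SDK type. */
typedef struct SubHandle {
    const char *submodule;     /* name of the Submodule that created it */
    DocHandle *doc;            /* the Document Handle it belongs to */
} SubHandle;

/* Models a Submodule's Open function: the new Subhandle remembers
   which Document Handle it was created from. */
SubHandle open_subhandle(DocHandle *doc, const char *submodule)
{
    SubHandle item;
    assert(doc != NULL);
    item.submodule = submodule;
    item.doc = doc;
    doc->open_subhandles += 1;
    return item;
}

/* Models the lookup from a Subhandle back to its Document Handle. */
DocHandle *retrieve_doc_handle(const SubHandle *item)
{
    assert(item != NULL);
    return item->doc;
}
```

In this model, opening two Subhandles from one DocHandle leaves both pointing at the same document, so a setting applied through either one can be resolved against the same underlying file.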
Each Oracle Outside In product has an sdk directory, under which there is a subdirectory for each platform on which the product ships (for example, px/sdk/px_win-x86-32_sdk). Under each of these directories are the following three subdirectories:
docs: Contains both a PDF and HTML version of the product manual.
redist: Contains only the files that the customer is allowed to redistribute. These include all the compiled modules, filter support files, .xsd and .dtd files, cmmap000.bin, and third-party libraries.
sdk: Contains the other subdirectories that used to be at the root level of an sdk: common, lib (Windows only), resource, samplefiles, and samplecode (previously samples). In addition, one new subdirectory, demo, has been added. It holds all of the compiled sample apps and other files that are needed to demo the products. These are files that the customer should not redistribute (.cfg files, exportmaps, etc.).
In the root platform directory (for example, px/sdk/px_win-x86-32_sdk), there are two files:
README: Explains the contents of the sdk, and that makedemo must be run in order to use the sample applications.
makedemo (either .bat or .sh, depending on the platform): This script either copies (on Windows) or symlinks (on Unix) the contents of …/redist into …/sdk/demo, so that sample applications can then be run out of the demo directory.
If you load more than one OIT SDK, you must copy files from the secondary installations into the top-level OIT SDK directory as follows:
docs – copy all subdirectories named “[product name]guide” into this directory.
redist – copy all binaries into this directory.
sdk – this directory has several subdirectories: common, demo, lib, resource, samplecode, samplefiles. In each case, copy all of the files from the secondary installation into the top-level OIT SDK subdirectory of the same name. If the top-level OIT SDK directory lacks any directories found in the directory being copied from, just copy those directories over.
Here's a step-by-step overview of how to export a PDF file.
Call DAInitEx to initialize the Data Access technology. This function needs to be called only once per application. If using threading, then pass in the correct ThreadOption.
Set any options that require a NULL handle type (optional). Certain options need to be set before the desired source file is opened. These options are identified by requiring a NULL handle type. They include, but aren't limited to:
SCCOPT_FALLBACKFORMAT
SCCOPT_FIFLAGS
SCCOPT_TEMPDIR
It is also necessary to set the SCCOPT_FONTDIRECTORY option before exporting a document. Files will fail to export unless SCCOPT_FONTDIRECTORY is defined.
Open the Source File. DAOpenDocument is called to create a document handle that uniquely identifies the source file. This handle may be used in subsequent calls to the EXOpenExport function or the open function of any other Data Access Submodule, and will be used to close the file when access is complete. This allows the file to be accessed from multiple Data Access Submodules without reopening.
Set the Options. If you require option values other than the default settings, call DASetOption to set options. Note that options listed in the Options Guide as having "Handle Types" that accept VTHEXPORT may be set any time before EXRunExport is called. For more information on options and how to set them, see Section 4.8, "DASetOption."
Open a Handle to PDF Export. Using the document handle, EXOpenExport is called to obtain an export handle that identifies the file to the specific export product. This handle will be used in all subsequent calls to the specific export functions. The dwOutputId parameter of this function is used to specify that the output file type should be set to either FI_PDF (for generic PDF 1.4) or FI_PDFA (for PDF/A-1a compliance).
Make Any Required Calls to Annotation Functions. This is the point at which any calls to annotation functions (such as EXHiliteText, EXInsertText or EXHideText) should be made.
Export the File. EXRunExport is called to generate the output file(s) from the source file.
Close the Handle to PDF Export. EXCloseExport is called to terminate the export process for the file. After this function is called, the export handle will no longer be valid, but the document handle may still be used.
Close the Source File. DACloseDocument is called to close the source file. After calling this function, the document handle will no longer be valid.
Close PDF Export. DADeInit is called to de-initialize the Data Access technology.
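The order of the steps above, and the order in which handles are released when a step fails, can be sketched as follows. This is a minimal model, not SDK code: the step function is a hypothetical stand-in that only records the name of the call it represents, and the fail_step hook exists purely so the failure paths can be exercised. The real functions take parameters and return error codes that are not shown here, so consult the function reference chapters and the SDK headers for the actual signatures.

```c
#include <assert.h>
#include <string.h>

#define LOG_SIZE 512

static char call_log[LOG_SIZE];       /* names of the calls made, joined by '>' */
static const char *fail_step = NULL;  /* demo hook: name of the call that should fail */

/* Hypothetical stand-in for one SDK call. It appends the call name to
   call_log and returns 0 on success or -1 if this call is set to fail. */
static int step(const char *name)
{
    assert(name != NULL);
    if (call_log[0] != '\0')
        strncat(call_log, ">", LOG_SIZE - strlen(call_log) - 1);
    strncat(call_log, name, LOG_SIZE - strlen(call_log) - 1);
    return (fail_step != NULL && strcmp(fail_step, name) == 0) ? -1 : 0;
}

/* Walks through the export steps in order. Returns 0 if the export ran,
   -1 otherwise. The paths are unused because the calls are only modeled. */
int export_to_pdf(const char *source_path, const char *output_path)
{
    int rc = -1;
    (void)source_path;
    (void)output_path;

    call_log[0] = '\0';                /* start each run with an empty log */

    if (step("DAInitEx") != 0)         /* once per application */
        return rc;
    if (step("DASetOption") != 0)      /* NULL-handle options; this sketch also
                                          sets SCCOPT_FONTDIRECTORY here */
        goto deinit;
    if (step("DAOpenDocument") != 0)   /* creates the document handle */
        goto deinit;
    if (step("DASetOption") != 0)      /* non-default options, if any */
        goto close_doc;
    if (step("EXOpenExport") != 0)     /* output type chosen here, e.g. FI_PDF */
        goto close_doc;

    /* Calls to annotation functions, if any, would go here. */

    if (step("EXRunExport") == 0)      /* generates the output file(s) */
        rc = 0;
    step("EXCloseExport");             /* export handle is invalid after this */

close_doc:
    step("DACloseDocument");           /* document handle is invalid after this */
deinit:
    step("DADeInit");
    return rc;
}
```

After a successful run, call_log holds the nine call names in the order listed in the steps. Setting fail_step to "DAOpenDocument" before calling export_to_pdf shows the shorter path, in which only DADeInit is called after the failure. The goto-based cleanup is one common C idiom for releasing resources in reverse order of acquisition; it is a design choice of this sketch, not something the SDK requires.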
The following notice must be included in the documentation, help system, or About box of any software that uses any of Oracle's executable code:
Oracle Outside In PDF Export © 1991, 2012 Oracle.
The following notice must be included in the documentation of any software that uses Oracle's TIF6 filter (this filter reads TIFF and JPEG formats):
The software is based in part on the work of the Independent JPEG Group.