Note:
Native conversions fail when Inbound Refinery runs as a service on win64 platforms, because services on win64 platforms do not have access to printer services. If you perform native conversions, do not run Inbound Refinery as a service.
For additional information describing the different types of conversion, how and where they are performed, and the advantages of each type, see the "Conversions in WebCenter Content" blog.
Inbound Refinery can convert native files to PDF by either exporting to PDF directly using Oracle Outside In PDF Export (included with Inbound Refinery) or by using third-party applications to output the native file to PostScript and then using a third-party PDF distiller engine to convert the PostScript file to PDF.
PDF conversions require the following components to be installed and enabled on the Inbound Refinery server.
Component Name | Component Description | Enabled on Server |
---|---|---|
PDFExportConverter | Enables Inbound Refinery to use Oracle OutsideIn to convert native formats directly to PDF without the use of any third-party tools. PDF Export is fast, multi-platform, and allows concurrent conversions. | Inbound Refinery Server |
WinNativeConverter | Enables Inbound Refinery to convert native files to a PostScript file with either the native application or OutsideInX and convert the PostScript file to PDF using a third-party distiller engine. This component is for Windows platform only. It replaces the functionality previously made available in the deprecated PDFConverter component. WinNativeConverter offers the best rendition quality of all PDF conversion options when used with the native application on a Windows platform. This does not allow concurrent conversions. WinNativeConverter also enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint and Visio to HTML using the native Office application. | Inbound Refinery Server |
OpenOfficeConversion | Provides cross-platform support allowing Inbound Refinery to convert supported files to PDF using Open Office. Like WinNativeConverter, OpenOfficeConversion doesn't allow concurrent conversions, but unlike WinNativeConverter, it does support UNIX platforms. | Inbound Refinery Server |
There are several factors to consider when choosing a PDF conversion method. System performance (the time it takes to convert a file to PDF format), the fidelity of the PDF output (how closely it matches the look and formatting of the native file), what native applications are needed (such as Microsoft Word or PowerPoint, used to generate the PostScript file converted by Inbound Refinery), and the platform a conversion application requires should all be taken into consideration.
If the speed of conversion is a primary concern, using PDF Export to convert original files directly to PDF is fastest. In addition to not having to use third-party tools, PDF Export allows concurrent PDF conversions and supports Windows, Linux and UNIX platforms.
If the fidelity of the PDF output is a primary concern, then using the native application to open the original file, output to PostScript, and convert the PostScript to PDF is the best option. However, this method is limited to the Windows platform and it cannot run concurrent PDF conversions.
If conversion must be done on a UNIX platform, then using OpenOffice to open a native file and export directly to a PDF file may be the best option. Depending on how it is set up, it may provide greater fidelity than PDF Export. However, unlike PDF Export, it does not support concurrent PDF conversions. Table 23-1 compares conversion methods and lists the platforms they support.
Note:
Regardless of the conversion option used, a PDF is a web-ready version of the native format. A converted PDF should not be expected to be an exact replica of the native format. Many factors such as font substitutions, complexity and format of embedded graphics, table structure, or issues with third-party distiller engines may cause the PDF output to differ from the native format.
Table 23-1 PDF Conversion Methods
Conversion Method | Performance | Fidelity | Supported Platforms | Concurrent PDF Conversions |
---|---|---|---|---|
PDF Export | Best | Good | Windows/UNIX | Yes |
3rd-Party Native Applications | Good | Best | Windows | No |
OpenOffice | Good | Good | Windows/UNIX | No |
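The trade-offs in Table 23-1 can be expressed as a small lookup. The following is a minimal sketch in Python, not part of Inbound Refinery; the names `Method`, `METHODS`, and `candidates` are hypothetical and exist only for illustration. It filters the table rows by platform and concurrency needs, then lists the methods rated "Best" for the chosen priority first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """One row of Table 23-1 (illustrative structure, not a product API)."""
    name: str
    performance: str
    fidelity: str
    platforms: frozenset
    concurrent: bool


# Values copied from Table 23-1.
METHODS = (
    Method("PDF Export", "Best", "Good", frozenset({"windows", "unix"}), True),
    Method("3rd-Party Native Applications", "Good", "Best", frozenset({"windows"}), False),
    Method("OpenOffice", "Good", "Good", frozenset({"windows", "unix"}), False),
)


def candidates(platform, need_concurrency=False, prefer="performance"):
    """Return usable method names, with 'Best' for `prefer` listed first.

    `prefer` is either "performance" or "fidelity".
    """
    usable = [
        m for m in METHODS
        if platform in m.platforms and (m.concurrent or not need_concurrency)
    ]
    # False sorts before True, so "Best" rows come first; ties keep table order.
    usable.sort(key=lambda m: getattr(m, prefer) != "Best")
    return [m.name for m in usable]
```

For example, `candidates("windows", prefer="fidelity")` lists the native-application method first, while `candidates("unix")` never returns it because that method is limited to Windows.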
File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add‐ons. Each Content Server must be configured to send files to refineries for conversion. When a file extension is mapped to a file format and a conversion, files of that type are sent for conversion when they are checked into the Content Server. Use either the File Formats Wizard or the Configuration Manager to set the file extension, file format, and conversion mappings.
All conversions required for Inbound Refinery are available by default in Content Server. For more information about configuring file extensions, file formats, and conversions in your Content Servers, see About MIME Types and Managing File Types.
Conversions available in the Content Server should match those available in the refinery. When a file format is mapped to a conversion in the Content Server, files of that format are sent for conversion upon check-in. One or more refineries must be set up to accept that conversion. Set the conversions that the refinery will accept and queue maximums on the Conversion Listing page. All conversions required for Inbound Refinery are available by default in both Content Server and Inbound Refinery.
For more information about setting accepted conversions, see Setting Accepted Conversions.
To set PDF files as the primary web‐viewable rendition:
When converting documents to PDF using WinNativeConverter, a distiller engine and PDF printer must be obtained, installed and configured. This is not necessary when converting to PDF using either Outside In PDF Export or OpenOffice to open and save documents to PDF.
WinNativeConverter can use several third-party applications to create PDF files of content items. In most cases, a third-party application that can open and print the file is used to print the file to PostScript, and then the PostScript file is converted to PDF using the configured PostScript distiller engine. In some cases, WinNativeConverter can use a third-party application to convert a file directly to PDF.
Note:
A distiller engine is not provided with Inbound Refinery. You must obtain a distiller engine of your choice. The chosen distiller engine must be able to execute conversions via a command-line. The procedures in this section use AFPL Ghostscript as an example. This is a free, robust distiller engine that performs both PostScript to PDF conversion and optimization of PDF files during or after conversion.
To install the PDF printer:
To change third‐party application settings:
To configure timeout settings for PDF file generation:
Inbound Refinery includes Outside In version 8.3.2. When using Outside In to convert graphics to PDF, you can set the margins for the generated PDF from 0–4.23 inches or 0–10.76 cm. By default, Inbound Refinery uses 1‐inch margins on the top, bottom, right, and left.
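Before entering a margin value, it can help to check it against the limits stated above. The following is a minimal sketch in Python that only validates a number against the documented ranges; the function name `check_margin` and the constants are hypothetical and are not part of Inbound Refinery or Outside In.

```python
# Limits and default taken from the text above.
MAX_MARGIN = {"inches": 4.23, "cm": 10.76}
DEFAULT_MARGIN_INCHES = 1.0


def check_margin(value, unit="inches"):
    """Return the margin as a float, or raise ValueError if it is out of range."""
    if unit not in MAX_MARGIN:
        raise ValueError(f"unit must be 'inches' or 'cm', got {unit!r}")
    if not 0 <= value <= MAX_MARGIN[unit]:
        raise ValueError(f"margin must be between 0 and {MAX_MARGIN[unit]} {unit}")
    return float(value)
```

A call such as `check_margin(5, "inches")` raises `ValueError`, because 5 inches exceeds the 4.23-inch maximum.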
To adjust these margins:
Typically, the OpenOffice Listener must always be running on the Inbound Refinery computer, or PDF conversion will fail. When running OpenOffice on Windows, configure an OpenOffice port in the Setup.xcu file and run the OpenOffice Quickstarter. The Quickstarter adds shortcuts to OpenOffice applications to the system tray and runs the OpenOffice Listener as a background process.
By default, the Quickstarter loads at system startup and the OpenOffice icon should be in the system tray. To start the Quickstarter, launch any OpenOffice application. The application can then be closed, and the Quickstarter remains running. To set the Quickstarter to load at system startup, right-click the OpenOffice icon in the system tray, and choose Load OpenOffice.org During System Start-Up.
Note:
OpenOffice can be launched by Inbound Refinery running as a service on Windows XP, 2000, 2003. However, because you must be logged in to Windows to run the OpenOffice Listener, you must always be logged in to Windows when using OpenOffice for PDF conversion even when running Inbound Refinery as a service.
When running OpenOffice on UNIX, it is recommended that you configure an OpenOffice port and run soffice, which acts as the Listener. Soffice can be used on Windows instead of the Quickstarter.
To start soffice, launch the soffice.exe file located in the following directory:
Windows: OpenOffice_install_dir\openoffice.org3\program\
UNIX: OpenOffice_install_dir/openoffice.org3/program
Note:
For versions of OpenOffice prior to 3.x, soffice.exe is located in the following directories:
Windows: OpenOffice_install_dir\program\
UNIX: OpenOffice_install_dir/program
Editing Setup.xcu or main.xcd
Prior to version 3.3 of OpenOffice, the file Setup.xcu was used to configure a listening port. Starting with version 3.3, Setup.xcu was incorporated into the file main.xcd. If configuring a version of OpenOffice prior to 3.3, then the steps below apply to editing the Setup.xcu file. If configuring version 3.3 or later versions of OpenOffice, the steps below apply to editing the main.xcd file.
To configure an OpenOffice port:
As an alternative to configuring an OpenOffice port in the Setup.xcu file and then running the OpenOffice Quickstarter (Windows) or soffice (UNIX or Windows), soffice can be launched from the command line with parameters. However, these settings only apply to the current session. To launch soffice from the command line:
To configure Inbound Refinery to use OpenOffice:
If port 8100 was not used when modifying the OpenOffice Setup.xcu file, do the following:
In the Inbound Refinery administration interface, select Conversion Settings then Third-Party Application Settings.
On the Third-Party Application Settings page, click the Options button for OpenOffice.
On the OpenOffice Options page, in the Port to Connect to the OpenOffice Listener field, enter the port that you used when modifying the OpenOffice Setup.xcu file.
Click Update.
Restart Inbound Refinery.
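Because PDF conversion fails when the OpenOffice Listener is not running, a quick reachability check on the configured port can be useful when troubleshooting. The following is a minimal sketch in Python using only the standard library. The function name `listener_is_reachable` is hypothetical, and the defaults assume the listener runs on the local computer on port 8100. The check only confirms that some process accepts TCP connections on that port; it does not verify that the process is OpenOffice.

```python
import socket


def listener_is_reachable(host="localhost", port=8100, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If a port other than 8100 was configured, pass it explicitly, for example `listener_is_reachable(port=8200)`, where 8200 is only a placeholder value.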
If converting documents using OpenOffice, Oracle Inbound Refinery requires class files distributed with OpenOffice. You must set the path to the OpenOffice class files in the refinery intradoc.cfg file, located in the DomainHome/ucm/ibr/bin directory. To set the path in the intradoc.cfg file:
Inbound Refinery can use OpenOffice to convert some file types directly to PDF. This is done by configuring the OpenOffice listener, which must be running in order for conversions to be successful. Typically, you must be logged in to the computer on which OpenOffice is installed in order for OpenOffice to be able to open and process any documents. However, the OpenOffice listener can be run in headless mode with no graphical user interface.
Note:
Before setting up the OpenOffice listener to run in headless mode, confirm that documents can be converted to PDF using OpenOffice running in a non-headless mode. Also, turn off any extra screens that start up before OpenOffice can be used, such as startup dialogs, tip wizards, or update notices. These cause the refinery process to time out, because conversions will not proceed until these screens are cleared and they are not displayed in headless mode.
The following information describes how to set up headless mode on a Windows host and on a UNIX host.
To convert documents to PDF using OpenOffice without being logged in to a Windows host, you must create a custom service to run the OpenOffice listener in headless mode. The Windows Resource Kits provide the INSTSRV.EXE and SRVANY.EXE utilities to create custom services.
To set up a custom OpenOffice service:
In the MS-DOS command prompt, type the following command:
path\INSTSRV.EXE service_name path\SRVANY.EXE
where path is the path to the Windows Resource Kit, and service_name is the name of your custom service. This name can be anything, but should be descriptive to identify the service. When done, a new service key is created in your Windows registry.
Open the Registry Editor: select Start, select Run, enter regedit, and click OK.
Note:
Back up your registry before editing it.
To back up your registry, choose File then Export, enter a name for the backup file, and click Save. Note the location where the backup file is saved in case you need to restore the registry.
Navigate to the new registry key created in the first step and select the new service key. The new key is located at:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\service_name
With the new key selected, choose Edit then New, then select Key, and name it Parameters.
Right-click on the Parameters key, select New, then select String Value, and name the value Application.
Right-click on the Application string and select Modify.
Type in the full path to soffice.exe, appended with -headless. For example:
C:\Program Files\OpenOffice2.0\program\soffice.exe -headless
Close the Registry Editor and restart the computer.
After the computer has successfully restarted, choose Start then Settings then Control Panel. Choose Administrative Tools then Services to open Windows Services.
On the Windows Services page, right-click the service you just created, choose Properties, and ensure that the service is set up to start automatically.
Select the Log On tab and enable This account. This enables the service to run using a specific user account.
Enter the same user credentials that the Inbound Refinery is using to run.
Note:
The Inbound Refinery user will need to have the right to log on as a service on the Inbound Refinery computer.
Start the service, accept the changes and close Windows Services.
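The strings used in the steps above are easy to mistype, particularly the space between soffice.exe and -headless. The following is a minimal sketch in Python that assembles them from their parts so they can be reviewed before being typed or pasted. It does not touch the registry or create a service. The function names and the example paths are hypothetical, and the sketch assumes the Resource Kit path contains no spaces (a path with spaces would need quoting at the command prompt).

```python
from pathlib import PureWindowsPath

# Registry location given in the steps above.
SERVICES_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"


def instsrv_command(kit_dir, service_name):
    """Build the INSTSRV.EXE command line from the Resource Kit directory."""
    kit = PureWindowsPath(kit_dir)
    return f"{kit / 'INSTSRV.EXE'} {service_name} {kit / 'SRVANY.EXE'}"


def parameters_key(service_name):
    """Return the registry path of the Parameters key for the custom service."""
    return f"{SERVICES_KEY}\\{service_name}\\Parameters"


def application_value(soffice_exe):
    """Return the Application string value: the soffice.exe path plus -headless."""
    return f"{PureWindowsPath(soffice_exe)} -headless"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(instsrv_command(r"C:\reskit", "OpenOfficeHeadless"))
    print(parameters_key("OpenOfficeHeadless"))
    print(application_value(r"C:\Program Files\OpenOffice2.0\program\soffice.exe"))
```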
To convert documents to PDF using OpenOffice without being logged in to a UNIX host, the OpenOffice listener must run in headless mode with no graphical user interface, using a virtual buffer display (X server).
Note:
Each UNIX environment is unique. This information is a general guideline for setting up the OpenOffice listener in headless mode on UNIX platforms. An example of the procedure for Red Hat EL4 is also included.
In general, to configure the OpenOffice listener to run in headless mode on UNIX platforms:
Note:
Before setting up OpenOffice to run in headless mode, ensure that Inbound Refinery is installed and configured correctly to successfully convert documents to PDF using OpenOffice in non-headless mode.
Create a startup script to run Inbound Refinery when the system boots up.
Configure a virtual X server and create a startup script to run it when the system boots up, to enable OpenOffice to run.
Create a startup script to run OpenOffice in headless mode when the system boots up.
Configure the system to run the startup scripts in the following order:
Start Inbound Refinery
Start the virtual X server
Start OpenOffice
Note:
The virtual X server must be started prior to starting OpenOffice, or OpenOffice will not run. Additionally, remember to ensure that the web server is also configured to run when the system boots up.
When running on Windows, Inbound Refinery can use Microsoft Office to convert Microsoft Office files to PDF files. The following Microsoft Office versions are supported:
Microsoft Office 2003
Microsoft Office 2007
Microsoft Office 2010
Note:
Support for Microsoft Office 2007 excludes support for Microsoft Project 2007.
Please note the following important general considerations:
Microsoft Office is used to convert Microsoft Office files to PDF when the Convert to PDF Using third-party applications option is selected on the Primary Web-Viewable Rendition page.
Inbound Refinery can convert a number of special features in Microsoft Office files into links in the generated PDF files. You set the conversion options for Microsoft Office files using the Third-Party Application Settings page.
To keep a conversion of a Microsoft Office file from timing out, all functions requiring user input should be disabled. These include password protection, security notifications, such as disabling of macros, and online access requests to show online content or participate in user feedback programs. For details on how to disable these and other similar features, see the Microsoft documentation for each product.
If a Microsoft Office file was converted to a PDF file successfully, but one or more links in the file could not be converted to links in the PDF file, the conversion status of that file is set to Incomplete. To prevent this from happening, you can set AllowSkippedHyperlinkToCauseIncomplete=false in the intradoc.cfg configuration file located in the refinery DomainDir\ucm\ibr\bin directory.
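Settings such as AllowSkippedHyperlinkToCauseIncomplete are plain name=value entries. The following is a minimal sketch in Python of how such an entry could be added or updated in the text of a configuration file. It assumes the file consists of one name=value entry per line and leaves every other line untouched; the function name `set_cfg_value` is hypothetical. The sketch works on a string, so reading and writing the actual intradoc.cfg file (after making a backup copy) is left to the caller.

```python
def set_cfg_value(text, key, value):
    """Return `text` with key=value set, replacing an existing entry or appending one."""
    lines = text.splitlines()
    entry = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare only the part before the first "=" so values containing "=" are safe.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = entry
            break
    else:
        # No existing entry was found, so add one at the end.
        lines.append(entry)
    return "\n".join(lines) + "\n"
```

For example, `set_cfg_value(cfg_text, "AllowSkippedHyperlinkToCauseIncomplete", "false")` returns the updated text, where `cfg_text` holds the current contents of the file.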
Consider the following when running Inbound Refinery on Windows and using Microsoft Word to convert Word files to PDF:
Any information in a Word file that is outside of the document's print area will not be converted to PDF.
Password-protected files will time out unless the need for a password is removed.
On Word 2003, choose Tools then Options then General. Turn off Show content and links from Microsoft Online under the Online category, and opt out of the Customer Experience Improvement Program under the Customer Feedback category. If you do not, these files might time out.
The following types of links in Word files can be converted to PDF:
Absolute URL links (for example, http://www.example.com). You can also use links that specify targets on the page (for example, http://idvm001/ibr/portal.htm#target). In order to be processed as an absolute URL link, Word must return the http:// prefix as a part of the link. All supported versions of Microsoft Word automatically enforce this rule.
Relative URL links (for example, ../../../../portal.htm). These links do not contain any server name or protocol prefix.
Mailto links (links to email addresses; for example, mailto:support@example.com). In order to be processed as an email link, Word must return the mailto: prefix as a part of the link. All supported versions of Microsoft Word automatically enforce this rule.
Table of Contents links (converted to bookmarks in the generated PDF file).
Bookmarks (internal links to auto-generated or author-generated bookmarks).
Standard heading styles (Heading 1, Heading 2, and so on, which are converted to bookmarks in the generated PDF file).
Links to footnotes and endnotes.
UNC path links (for example, \\server1\c\TestDocs\MSOfficeXP\word\target.doc). This option is not currently available on the Word Options panel. To enable this functionality, you must set the ProcessWordUncLinks=true variable in the refinery connection's intradoc.cfg file (DomainHome\ucm\ibr\bin\intradoc.cfg). In general, UNC paths have no relevance in a web browser; a UNC path is not a URL. Therefore, the PDF must be opened outside of the web browser for UNC path links to be resolved correctly. If you are using UNC path links, you might want to configure the Reader on client computers to open PDF files outside the browser.
Links in text boxes are not converted.
Links within nested tables are not converted. If it is critical to convert links within nested tables, consider using Oracle Outside In PDF Export (included with Inbound Refinery) to convert MS Word documents to PDF, rather than the native MS Word application. For information about different conversion options, see Managing PDF Conversions. Alternately, Adobe Reader has a general preference that allows Adobe Reader to enable links formatted in the document. See the documentation that is available for your version of Adobe Reader for information.
Linked AutoShapes and objects (for example, pictures or WordArt objects) located in tables are not converted.
You might notice in some generated PDF files that the hotspot for a link is sometimes slightly off from the actual text (within a character or two). To date there are no known problems related to this occurrence, and there is currently no solution.
Consider the following when running Inbound Refinery on Windows and using Microsoft Excel to convert Excel files to PDF:
Any information in an Excel file that is outside of the document's print area will not be converted to PDF.
Password-protected files will time out unless the need for a password is removed.
On Excel 2003, choose Tools then Options then General. Turn off Show content and links from Microsoft Online under the Online category, and opt out of the Customer Experience Improvement Program under the Customer Feedback category. If you do not, these files might time out.
Only external links are converted to PDF links. This is because, in the current implementation, it is impossible (or extremely difficult) to know which page of the generated PDF file will contain the target of an internal link (bookmark).
Only the following types of links in Excel files can be converted to PDF:
Absolute URL links (for example, http://www.example.com). You can also use links that specify targets on the page (for example, http://idvm001/ibr/portal.htm#target). In order to be processed as an absolute URL link, Excel must return the http:// prefix as a part of the link. All supported versions of Microsoft Excel automatically enforce this rule.
Relative URL links (for example, ../../../../portal.htm). These links do not contain any server name or protocol prefix.
Mailto links (links to email addresses; for example, mailto:support@example.com). In order to be processed as an email link, Excel must return the mailto: prefix as a part of the link. All supported versions of Microsoft Excel automatically enforce this rule.
Note:
The Excel Options panel does not have separate settings for absolute URL links and relative URL links. If the Process Excel URL Links option is selected, absolute URL links and relative URL links are all converted to PDF.
UNC path links (for example, \\server1\c\TestDocs\MSOfficeXP\word\target.doc). This option is not currently available on the Excel Options panel. To enable this functionality, you must set the ProcessExcelUncLinks=true variable in the refinery connection's intradoc.cfg file (DomainHome\ucm\ibr\bin\intradoc.cfg). In general, UNC paths have no relevance in a web browser; a UNC path is not a URL. Therefore, the PDF must be opened outside of the web browser for UNC path links to be resolved correctly. If you are using UNC path links, you might want to configure the Reader on client computers to open PDF files outside the browser.
Links are only converted if they are located in cells. Links in text boxes, WordArt objects, and so forth are not converted. In the generated PDF file, the hotspot for the link is the cell that contains the link.
The Scaling option on the Page Setup for the worksheet must be set to Adjust to: ###% normal size (and not Fit to Page). Further, the closer to 100% the scale is set, the better the results.
Consider the following when running Inbound Refinery on Windows and using Microsoft PowerPoint to convert PowerPoint files to PDF:
Any information in a PowerPoint file that is outside of the document's print area will not be converted to PDF.
Password-protected files will time out unless the need for a password is removed.
On PowerPoint 2003, choose Tools then Options then General. Turn off Show content and links from Microsoft Online under the Online category, and opt out of the Customer Experience Improvement Program under the Customer Feedback category. If you do not, these files might time out.
PowerPoint has two types of links: Hyperlinks, which behave the same in all Office applications, and Action Settings. The MSOfficeConverter.exe supports the following Action Settings: Hyperlink to: Next Slide, Previous Slide, and URL. All other links should be inserted as hyperlinks.
The following types of hyperlinks in PowerPoint files can be converted to PDF:
Absolute URL links (for example, http://www.example.com). You can also use links that specify targets on the page (for example, http://idvm001/ibr/portal.htm#target). In order to be processed as an absolute URL link, PowerPoint must return the http:// prefix as a part of the link. All supported versions of Microsoft PowerPoint automatically enforce this rule.
Relative URL links (for example, ../../../../portal.htm). These links do not contain any server name or protocol prefix.
Mailto links (links to email addresses; for example, mailto:support@example.com). In order to be processed as an email link, PowerPoint must return the mailto: prefix as a part of the link. All supported versions of Microsoft PowerPoint automatically enforce this rule.
Bookmarks (internal links to auto-generated or author-generated bookmarks).
Note:
The PowerPoint Options panel does not have separate settings for absolute URL links and relative URL links. If the Process PowerPoint Hyperlinks option is selected, absolute URL links and relative URL links are all converted to PDF.
UNC path links (for example, \\server1\c\TestDocs\MSOfficeXP\word\target.doc). This option is not currently available on the PowerPoint Options panel. To enable this functionality, you must set the ProcessPowerPointUncLinks=true variable in the refinery connection's intradoc.cfg file (DomainHome\ucm\ibr\bin\intradoc.cfg). In general, UNC paths have no relevance in a web browser; a UNC path is not a URL. Therefore, the PDF must be opened outside of the web browser for UNC path links to be resolved correctly. If you are using UNC path links, you might want to configure the Reader on client computers to open PDF files outside the browser.
PowerPoint hyperlinks can only be processed if PowerPoint presentations are converted in the Slides format.
It is technically possible to have a link on an object (for example, a text box) over a link on an individual line of text. Because of the way the PDF is assembled, only the link on the object is active in the generated PDF file. Logically, any given spot in a PDF can be assigned only one action, so the top action is performed.
Consider the following when running Inbound Refinery on Windows and using Microsoft Visio to convert Visio files to PDF:
Any information in a Visio file that is outside of the document's print area will not be converted to PDF.
Visio files created using the Cross-Functional Flowchart template might cause a pop-up dialog to appear when Visio attempts to open the file, and thus the refinery process will time out unless this dialog is cleared manually on the refinery computer. To prevent this from happening, do not base files on the Cross-Functional Flowchart template. You can use Outside In to convert Visio files based on the Cross-Functional Flowchart template. Note that Visio links are not converted by Outside In.
Password-protected files will time out unless the need for a password is removed.
On Visio 2003, choose Tools then Options then General. Turn off Show content and links from Microsoft Online under the Online category, and opt out of the Customer Experience Improvement Program under the Customer Feedback category. If you do not, these files might time out.
The following types of links in Visio files can be converted to PDF:
Absolute URL links (for example, http://www.example.com). You can also use links that specify targets on the page (for example, http://idvm001/ibr/portal.htm#target). Visio does not enforce some of the link rules that are enforced by other Office applications. Therefore, the author of the Visio document must use the http:// prefix as a part of the link. If the prefix is not found, the link is converted as a relative URL link, which will probably not produce the intended link. Without the author applying this prefix, there is no way for the conversion engine to distinguish the link from relative and mailto links.
Relative URL links (for example, ../../../../portal.htm). These links do not contain any server name or protocol prefix.
Mailto links (links to email addresses; for example, mailto:support@example.com). Visio does not enforce some of the link rules that are enforced by other Office applications. Therefore, the author of the Visio document must use the mailto: prefix as a part of the link. If the prefix is not found, the link is converted as a relative URL link, which will probably not produce the intended link. Without the author applying this prefix, there is no way for the conversion engine to distinguish the link from absolute and relative links.
Bookmarks (internal links to auto-generated or author-generated bookmarks). When Process internal Visio links is selected on the Visio Options panel, all internal document links to other sheets are included in the generated PDF.
Note:
For proper conversion of internal links in Visio 2003, you must clear the address field when creating the link. By default, the address field is populated with the file name of the file in which the link occurs. If this is not cleared, the link is converted as a link to the original file and prompts the user to download the original file when the link is clicked.
UNC path links (for example, \\server1\c\TestDocs\MSOfficeXP\word\target.doc). This option is not currently available on the Visio Options panel. To enable this functionality, you must set the ProcessVisioUncLinks=true variable in the refinery connection's intradoc.cfg file (DomainHome\ucm\ibr\bin\intradoc.cfg). In general, UNC paths have no relevance in a web browser; a UNC path is not a URL. Therefore, the PDF must be opened outside of the web browser for UNC path links to be resolved correctly. If you are using UNC path links, you might want to configure the Reader on client computers to open PDF files outside the browser.
All Microsoft Visio files should be set so the printer paper size and orientation matches the drawing page size and orientation. Otherwise, links are not converted correctly (they are placed in the wrong location). For example, if the printer paper is set to Letter/Landscape, the drawing page should also be set to Letter/Landscape.
In the generated PDF file, the hotspot for a Visio link is a square that encompasses the shape, even if the shape itself is not a square.
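The prefix rules described for Visio links can be illustrated with a small classifier. The following is a minimal sketch in Python that mirrors the rules as stated in this section; it is not the conversion engine's actual logic, and the function name `classify_link` is hypothetical. Internal bookmarks are not covered. Any address without a recognized prefix falls through to "relative", which is why an absolute address typed without http:// would not produce the intended link.

```python
def classify_link(address):
    """Classify a link address by its prefix, following the rules described above."""
    lowered = address.strip().lower()
    if lowered.startswith("http://"):
        return "absolute"
    if lowered.startswith("mailto:"):
        return "mailto"
    if lowered.startswith("\\\\"):
        # UNC paths begin with two backslashes, for example \\server1\c\...
        return "unc"
    # No recognized prefix: treated as a relative URL link.
    return "relative"
```

For example, `classify_link("www.example.com")` returns "relative", because the http:// prefix is missing.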
Both relative and absolute links can be converted to PDF in Word, Excel, PowerPoint, and Visio files.
Example absolute link:
http://system/ucm/groups/public/documents/addacct/000123.pdf
Example relative link:
..\addacct\000123.pdf
When creating links, absolute and relative links each have advantages and disadvantages. Absolute links are easy to copy and paste. Relative links, however, can eliminate issues if the Content Server is migrated to a new computer or if the IP address and DNS names change, because relative links are always relative to the location of the web-viewable file for the document you are checking in.
Note:
The following procedure applies to Inbound Refinery with Microsoft Office installed, using the Convert to PDF using third-party applications option. This procedure does not apply to configurations using Inbound Refinery on UNIX.
To use relative links in Word, Excel, PowerPoint, and Visio documents:
Log into the refinery.
Choose Conversion Settings then Third‐Party Application Settings.
On the Third-Party Application Settings page, click Options for the third‐party application.
Click Update to save your changes.
Use relative links, instead of absolute links, when authoring documents. It is important to understand that these links are relative to the location of the web viewable file for the document being checked in:
Example 1: Relative linking with the same document type and security
Create a link to document 000123: If the document was checked into security group "public" and document type "adacct", this document has a web viewable URL of:
http://system/ucm/groups/public/documents/adacct/000123.pdf
If you check document 000456 into the same security group and document type, its web viewable URL would be:
http://system/ucm/groups/public/documents/adacct/000456.pdf
Because the URL path is identical to 000123, the relative URL link in the document for 000456 would only need to be:
000123.pdf
Example 2: Relative linking to a different document type
Using the same document names, if you checked document 000456 into the same security group but a different document type, its web viewable URL would look like:
http://system/ucm/groups/public/documents/adcorp/000456.pdf
This means that your relative URL link needs to go up one directory and then into "adacct" to find 000123.pdf. So the relative URL link would be:
..\adacct\000123.pdf
Example 3: Relative linking to a different document security
Now if you also change the security group of document 000456, its web viewable URL would look like:
http://system/ucm/groups/secure/documents/adcorp/000456.pdf
This means that the relative URL link needs to go up three directories (to "groups") and then back down through "public", "documents", and "adacct" to 000123.pdf. So the relative URL link would be:
..\..\..\public\documents\adacct\000123.pdf
Check the documents into the Content Server. When converting the documents to PDF, the refinery will create links relative to the location of the web viewable file for each document you are checking in.
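The three examples above can be checked with a short script. The following is a minimal sketch, not part of Inbound Refinery: the function name relative_link is hypothetical, and it assumes that a relative link is resolved against the folder of the linking document's web viewable URL, as described in the examples.

```python
import posixpath
from urllib.parse import urlsplit


def relative_link(from_url: str, to_url: str) -> str:
    """Return the link to to_url relative to the folder that holds from_url."""
    from_dir = posixpath.dirname(urlsplit(from_url).path)
    to_path = urlsplit(to_url).path
    # relpath walks up with ".." and back down; the examples use backslashes.
    return posixpath.relpath(to_path, start=from_dir).replace("/", "\\")


# Web viewable URLs taken from Examples 1 through 3.
target = "http://system/ucm/groups/public/documents/adacct/000123.pdf"
same_type = "http://system/ucm/groups/public/documents/adacct/000456.pdf"
other_type = "http://system/ucm/groups/public/documents/adcorp/000456.pdf"
other_group = "http://system/ucm/groups/secure/documents/adcorp/000456.pdf"

print(relative_link(same_type, target))    # 000123.pdf
print(relative_link(other_type, target))   # ..\adacct\000123.pdf
print(relative_link(other_group, target))  # ..\..\..\public\documents\adacct\000123.pdf
```

Computing the link from the two web viewable URLs avoids counting directory levels by hand, which is where mistakes are most likely when the security group and the document type both differ.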
TIFF conversion enables the following functionality specific to TIFF (Tagged Image File Format) files:
Creation of a managed PDF file from a single or multiple-page TIFF file.
Creation of a managed PDF file from multiple TIFF files that have been compressed into a single ZIP file.
OCR (Optical Character Recognition) during TIFF-to-PDF conversion. This enables indexing of the text within checked-in TIFF files, so that users can perform full-text searches of these files.
The TiffConverter component is supported on Windows only. For information on file formats and languages that can be converted by PdfCompressor, see the documentation provided by CVISION.
Note:
The TiffConverter component requires CVISION CVista PdfCompressor to perform TIFF-to-PDF conversion with OCR. PdfCompressor is not provided with the TiffConverter component. You must obtain PdfCompressor from CVISION.
TIFF conversions require the following components to be installed and enabled on the specified server.
Component Name | Component Description | Enabled on Server |
---|---|---|
TiffConverter | Enables Inbound Refinery to convert single or multipage TIFF files to PDF complete with searchable text. | Inbound Refinery Server |
TiffConverterSupport | Enables Content Server to support TIFF to PDF conversion. | Content Server |
File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options. Installing and enabling the TiffConverterSupport component on a Content Server adds three TIFFConversion options on the File Formats Wizard page.
For a content item to be processed by Inbound Refinery, its file extension (for example, TIF or TIFF) must be mapped to a format name associated with the TIFFConversion conversion method. The added conversion options for Tiff Converter are not automatically mapped. They must be mapped manually. The following topics describe how to set the mappings:
File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. You can convert TIFF to PDF with OCR or TIFF to PDF without OCR.
To convert TIFF to PDF with OCR:
Log in to the Content Server as an administrator.
From the main menu, choose Administration then Refinery Administration then File Formats Wizard.
On the File Format Wizard page, select tiff, tif to enable Convert TIFF to PDF (TIFFConversion) in the File Type (conversion name) field menu. Selecting this menu item maps the TIF and TIFF file extensions to the image/tiff file format and associates the image/tiff file format with the TIFFConversion conversion method. When TIF or TIFF files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the image/tiff file format to PASSTHRU, so TIF and TIFF files are not processed by Inbound Refinery.
Note:
The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the Content Server has been restarted.
If you have added tifz and tiz file extensions using the Configuration Manager, you can select tifz, tiz on the File Format Wizard page to enable application/zip options in the File Type (conversion name) field menu.
Compressed Tiff to PDF (tifz, tiz): Selecting this menu item maps the TIFZ and TIZ file extensions to the graphic/tiff-x-compressed file format and associates the graphic/tiff-x-compressed file format with the TIFFConversion conversion method. When TIFZ or TIZ files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the graphic/tiff-x-compressed file format to PASSTHRU, so TIFZ and TIZ files are not processed by Inbound Refinery.
Compressed Tiff to PDF (zip): Selecting this menu item maps the ZIP file extension to the application/zip file format and associates the application/zip file format with the TIFFConversion conversion method. When ZIP files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the application/zip file format to PASSTHRU, so that ZIP files are not processed by Inbound Refinery.
Click Update to save all changes.
To convert TIFF to PDF without OCR:
Log in to the Content Server as an administrator.
From the main menu, choose Administration then Refinery Administration then File Formats Wizard.
On the File Format Wizard page, select tiff, tif to enable Convert TIFF to PDF (Direct PDFExport) in the File Type (conversion name) field menu. Selecting this menu item maps the TIF and TIFF file extensions to the image/tiff file format and associates the image/tiff file format with the Direct PDFExport conversion method. When TIF or TIFF files are checked into the Content Server, they are processed by the refinery using Outside In PDF Export and converted to PDF without OCR.
Note:
When the Convert TIFF to PDF (Direct PDFExport) option is used, only the metadata in the resulting PDF is searchable; the text is not.
Click Update to save all changes.
File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes:
Log in to Content Server as an administrator.
From the main menu, choose Administration, then Admin Applets.
From the Applets list, choose Configuration Manager.
The Configuration Manager applet is started.
In the Configuration Manager applet, choose Options then File Formats.
To enable single, unzipped TIFF files (TIF and TIFF) to be processed by Inbound Refinery:
In the File Formats section, check that the image/tiff file format is added and associated with the TIFFConversion conversion method.
Note:
The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the Content Server has been restarted.
In the File Extensions section, check that the tif and tiff file extensions are added and mapped to the image/tiff file format.
To enable TIFF files that have been compressed into a single TIFZ or TIZ file to be processed by Inbound Refinery:
In the File Formats section, check that the graphic/tiff-x-compressed file format is added and associated with the TIFFConversion conversion method.
In the File Extensions section, check that the tifz and tiz file extensions are added and mapped to the graphic/tiff-x-compressed file format.
To enable TIFF files that have been compressed into a single ZIP file to be processed by Inbound Refinery:
In the File Formats section, check that the application/zip file format is added and associated with the TIFFConversion conversion method.
In the File Extensions section, check that the zip file extension is added and mapped to the application/zip file format.
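The mappings checked in the preceding steps can be pictured as two lookup tables: one from file extension to file format, and one from file format to conversion method. The sketch below only models that idea. It is not Content Server code, the function name conversion_for is hypothetical, and treating an unmapped extension as PASSTHRU is a simplifying assumption.

```python
# Extension -> format and format -> conversion, as listed in the steps above.
EXTENSION_TO_FORMAT = {
    "tif": "image/tiff",
    "tiff": "image/tiff",
    "tifz": "graphic/tiff-x-compressed",
    "tiz": "graphic/tiff-x-compressed",
    "zip": "application/zip",
}
FORMAT_TO_CONVERSION = {
    "image/tiff": "TIFFConversion",
    "graphic/tiff-x-compressed": "TIFFConversion",
    "application/zip": "TIFFConversion",
}


def conversion_for(filename: str) -> str:
    """Look up the conversion for a file name (assumption: unmapped means PASSTHRU)."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    file_format = EXTENSION_TO_FORMAT.get(extension)
    return FORMAT_TO_CONVERSION.get(file_format, "PASSTHRU")


print(conversion_for("scan.TIF"))    # TIFFConversion
print(conversion_for("batch.tifz"))  # TIFFConversion
print(conversion_for("notes.txt"))   # PASSTHRU
```

Both tables must be complete for a file to reach the refinery: an extension that maps to a format with no conversion is handled the same way as an extension with no mapping at all.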
The ZIP file extension might be used in multiple ways in your environment. For example, you might be checking in:
Multiple TIFF files compressed into a single ZIP file for Inbound Refinery to convert to a single PDF file with OCR.
Multiple file types compressed into a single ZIP file that should not be processed (the ZIP file should be passed through in its native format).
When using the ZIP file extension in multiple ways, Oracle recommends configuring the Content Server to allow the user to choose how ZIP files are processed at check-in. This is referred to as Allow override format on check-in. To enable this Content Server functionality:
Note:
If you are using the upload applet to check in multiple files, the files are compressed into a single ZIP file before being checked in. In this case Oracle also recommends enabling Allow override format on check-in so the user can choose how the ZIP file is processed when uploading multiple TIFFs.
Tip:
When CVista PdfCompressor merges multiple TIFF files from a compressed ZIP file, the input files are added in lexicographic order according to the standard ASCII character set.
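Lexicographic ASCII order is not the same as numeric page order, which matters when naming the TIFF files in a ZIP file. The following sketch uses Python's built-in string sorting to illustrate the ordering rule. It does not call PdfCompressor, and the file names are made up for the example.

```python
names = ["page10.tif", "page2.tif", "Page3.tif", "page1.tif"]

# Python compares str values by code point, which matches ASCII order for ASCII names.
merge_order = sorted(names)
print(merge_order)  # ['Page3.tif', 'page1.tif', 'page10.tif', 'page2.tif']

# Zero-padding the page numbers makes ASCII order match page order.
padded = sorted(["page10.tif", "page02.tif", "page01.tif"])
print(padded)  # ['page01.tif', 'page02.tif', 'page10.tif']
```

Uppercase letters sort before lowercase letters, and "10" sorts before "2", so consistent case and zero-padded numbers are a simple way to get pages merged in the intended order.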
This section discusses the following topics regarding conversion settings:
When installed on the refinery, the TiffConverter component adds the TIFFConversion option to the Conversion Listing page. This conversion option must be enabled for the refinery to perform conversions on items submitted by the Content Server.
The timeout settings should reflect the processing time required for the size of TIFF files that are commonly checked in to the Content Server. This is highly variable depending on CPU power and TIFF complexity. Perform these tasks to determine the appropriate timeout values for TIFF files:
Time several jobs that are representative of your Inbound Refinery workload by running them in CVista PdfCompressor alone (without Inbound Refinery).
Examine the document history information and evaluate the required processing time.
Change Inbound Refinery timeout settings accordingly.
Note:
Information about Tiff Converter timeouts is recorded in the Inbound Refinery and agent logs.
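One way to time representative jobs is a small wrapper that runs a command and records how long it takes. The sketch below is an assumption-laden illustration: the function name time_command is hypothetical, and the command shown is a placeholder that only starts the Python interpreter, because the PdfCompressor command line is not given here. Substitute the real conversion command for your installation.

```python
import subprocess
import sys
import time


def time_command(command: list[str]) -> float:
    """Run a command to completion and return its wall-clock duration in seconds."""
    started = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - started


# Placeholder command so the sketch runs anywhere; replace it with the real conversion command.
placeholder = [sys.executable, "-c", "pass"]
durations = [time_command(placeholder) for _ in range(3)]
print(f"slowest run: {max(durations):.2f} s")
```

Timing several files of different sizes and taking the slowest run gives a starting point for the timeout values, which can then be refined against the document history information.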
To configure timeout settings for Tiff to PDF file generation:
This section discusses the following topics regarding the CVista PdfCompressor:
These options are specific to CVista PdfCompressor. If the TiffConverter component is not installed, the CVista PdfCompressor Options are not available.
To change the PdfCompressor settings:
Tip:
When CVista PdfCompressor merges multiple TIFF files from a compressed ZIP file, the input files are added in lexicographic order according to the standard ASCII character set.
The following recommended parameter strings should produce optimal results for each given scenario. If these settings do not produce the intended results, modify these strings by removing or appending settings. For more information on these and other available settings, see the online help provided with CVista PdfCompressor (especially "Appendix A: Command-Line Flags for Compression").
Default CVista PdfCompressor Parameters - OCR Enabled
A default string is set when the TiffConverter component is installed unless a string already exists (if the string was set using a previous version of Tiff Converter). The default string has been optimized for typical PdfCompressor usage with OCR enabled:
-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
CVista PdfCompressor Parameters - Horizontal and Vertical OCR Enabled
The following string can be used for typical usage with OCR when both vertical and horizontal text in the same image must be processed (it adds -ocrtwod and -lsize 25 to the default string):
-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -ocrtwod -lsize 25 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
CVista PdfCompressor Parameters - No OCR
The following string can be used for simple conversion (without OCR):
-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
Note:
Changes made in the CVista PdfCompressor user interface do not affect how CVista PdfCompressor functions when called by Tiff Converter.
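Comparing the three recommended strings shows that they share the same leading and trailing flags and differ only in the OCR flags in the middle. The sketch below composes them from those pieces. The grouping and the names BASE, OCR, TWO_DIRECTION_OCR, QUALITY, and parameter_string are hypothetical labels chosen for this illustration, not PdfCompressor terminology; see the PdfCompressor help for the meaning of each flag.

```python
# Pieces obtained by comparing the three recommended strings.
BASE = "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize"
OCR = "-o -ocrmode 1 -ot 120"
TWO_DIRECTION_OCR = "-ocrtwod -lsize 25"
QUALITY = "-qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong"


def parameter_string(ocr: bool = True, two_direction: bool = False) -> str:
    """Build one of the recommended parameter strings from its shared pieces."""
    parts = [BASE]
    if ocr:
        parts.append(OCR)
        if two_direction:
            # Only meaningful together with the OCR flags.
            parts.append(TWO_DIRECTION_OCR)
    parts.append(QUALITY)
    return " ".join(parts)


print(parameter_string())                    # default, OCR enabled
print(parameter_string(two_direction=True))  # horizontal and vertical OCR
print(parameter_string(ocr=False))           # no OCR
```

Building the string from named pieces makes it easier to see what changes between scenarios when you remove or append settings.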
By default, CVista PdfCompressor uses an English OCR dictionary when performing OCR on TIFF files. However, CVista PdfCompressor can perform OCR on several other languages.
To set up multiple OCR languages and enable the user to choose the OCR language at check-in:
Note:
If the following method is used, language parameters should not be specified or passed to the refinery via the CVista PdfCompressor Options Page.
Obtain the appropriate current language files by contacting CVISION:
A lng file is required for each language.
Czech, Polish, and Hungarian also require the latin2.shp file.
Russian also requires the cyrillic.shp file.
Greek also requires the greek.shp file.
Turkish also requires the turkish.shp file.
Place the CVISION language files in the CVista installation directory. The default location is C:\Program Files\CVision\PdfCompressor xx\ where xx stands for the version number of PdfCompressor.
Log in to Content Server as an administrator.
From the main menu, choose Administration then Admin Applets.
From the Applets list, choose Configuration Manager.
On the Configuration Manager page, click the Information Fields tab.
If the OCRLang information field has been added, skip this step. If it has not been added:
In the Field Info section, click Add.
On the Add Custom Info page, in the Field Name field, enter OCRLang. This creates a new information field for CVista language conversion options.
Note:
Enter this field name exactly.
Click OK.
On the Add Custom Info Field page, in the Field Caption field, enter the descriptive caption to be displayed on the Content check-in Form page. For example, OCR Language.
From the Field Type list, choose Text.
Select the Enable Option List check box.
From the Option List Type list, choose Select List Validated.
In the Use option list field, enter xOCRLangList.
Click Edit next to the Use Option List field.
On the Option List page, enter the CVista OCR languages to present as options. The following language names are valid options.
Note:
You can use either the English language name or the native equivalent (if listed). However, you must enter the language options exactly as they appear in the following table.
English | Native |
---|---|
Czech | - |
Danish | Dansk |
Dutch | Nederlands |
English | - |
Finnish | Suomi |
French | Français |
German | Deutsch |
Greek | - |
Hungarian | Magyar |
Italian | Italiano |
Norwegian | Norsk |
Polish | Polski |
Portuguese | Português |
Russian | - |
Spanish | Español |
Swedish | Svenska |
Turkish | - |
Select the Ignore Case check box.
Click OK.
In the Default Value field, enter the default OCR language option.
Click OK to save the settings and return to the Information Fields tab.
Click Update Database Design.
If the OCRLang Information field has been added, but changes must be made to the languages option list and/or the default language:
In the Field Info section, select OCRLang and click Edit.
On the Add Custom Info page, click Edit next to the Use Option List field.
On the Option List page, delete any unused CVista OCR languages.
Click OK.
In the Default Value field, enter the default OCR language option.
Click OK to save the settings and return to the Information Fields tab.
Close the Configuration Manager applet. When a user checks in a TIFF file, the user can override the default OCR language by selecting any of the OCR languages that were set up.
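Because the option list entries must match the table exactly and some languages need an extra .shp file, a small pre-check can catch mistakes before users start checking in files. The following sketch is an illustration only: the function names invalid_options and required_shape_files are hypothetical, and the data is transcribed from the language table and the file requirements listed in the procedure above. The names of the individual lng files are not given here, so they are not checked.

```python
# Valid option list entries: English names and native equivalents from the table.
VALID_OPTIONS = {
    "Czech", "Danish", "Dansk", "Dutch", "Nederlands", "English",
    "Finnish", "Suomi", "French", "Français", "German", "Deutsch",
    "Greek", "Hungarian", "Magyar", "Italian", "Italiano",
    "Norwegian", "Norsk", "Polish", "Polski", "Portuguese", "Português",
    "Russian", "Spanish", "Español", "Swedish", "Svenska", "Turkish",
}

# Extra shape file needed per language, keyed by either spelling of the name.
SHAPE_FILES = {
    "Czech": "latin2.shp",
    "Polish": "latin2.shp", "Polski": "latin2.shp",
    "Hungarian": "latin2.shp", "Magyar": "latin2.shp",
    "Russian": "cyrillic.shp",
    "Greek": "greek.shp",
    "Turkish": "turkish.shp",
}


def invalid_options(options):
    """Return the entries that do not match the table exactly."""
    return [name for name in options if name not in VALID_OPTIONS]


def required_shape_files(options):
    """Return the set of .shp files needed for the chosen languages."""
    return {SHAPE_FILES[name] for name in options if name in SHAPE_FILES}


chosen = ["English", "Deutsch", "Polski", "Russian", "Francais"]
print(invalid_options(chosen))               # ['Francais']
print(sorted(required_shape_files(chosen)))  # ['cyrillic.shp', 'latin2.shp']
```

In this example, "Francais" is rejected because the table spells the option "Français", and the Polish and Russian entries show which shape files must be present in the CVista installation directory.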
XML conversions require the following components to be installed and enabled on the specified server.
Component Name | Component Description | Enabled on Server |
---|---|---|
XMLConverter | Enables Inbound Refinery to produce FlexionDoc and SearchML-styled XML as the primary web-viewable file or as independent renditions, and can use the Xalan XSL transformer to process XSL transformations. | Inbound Refinery Server |
XMLConverterSupport | Enables Content Server to support XML conversions and XSL transformations. | Content Server |
This section discusses the following XML conversion management topics:
File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add-ons. Each Content Server must be configured to send files to refineries for conversion.
When a file extension is mapped to a file format and a conversion, files of that type are sent for conversion when they are checked into the Content Server. File extension, file format, and conversion mappings can be configured using either the File Formats Wizard or the Configuration Manager.
Most conversions required for Inbound Refinery are available by default in Content Server. In addition to the default conversions, the following conversions are added to the Content Server when the XMLConverterSupport component is installed.
Conversion | Description |
---|---|
FlexionXML | Used to convert files to XML using the FlexionDoc schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word, PowerPoint, and so on). To send these standard file types to a refinery for conversion to XML using FlexionDoc, their file formats do not need to be re-mapped to the FlexionXML conversion. This conversion is not available on the File Formats Wizard. It must be mapped using the Configuration Manager. |
SearchML | Used to convert files to XML using the SearchML schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word, PowerPoint, and so on). To send these standard file types to a refinery for conversion to XML using SearchML, their file formats do not need to be re-mapped to the SearchML conversion. This conversion is not available on the File Formats Wizard. It must be mapped using the Configuration Manager. |
XSLT Transformation | After XML Converter converts documents to the FlexionDoc schema, the XSLT conversion allows the resultant XML to be transformed into other XML schema specified by a developer. |
Conversions available in the Content Server should match those available in the refinery. When a file format is mapped to a conversion in the Content Server, files of that format are sent for conversion on check-in. One or more refineries must be set up to accept that conversion.
Most conversions required for Inbound Refinery are available by default. In addition to the default conversions that can be accepted by a refinery, the FlexionXML and SearchML conversions are added to the refinery when the XMLConverter component is installed. The FlexionXML and SearchML conversions are accepted by default.
To set XML files as the primary web-viewable rendition:
Inbound Refinery uses the Xalan XSLT processor and the SAX validator built into the Java virtual machine running Inbound Refinery. To enable transformation, the XMLConverter component must be installed and enabled on the refinery server and the XMLConverterSupport component must be installed and enabled on the Content Server.
To turn on XSL Transformation:
Log in to the refinery server.
Do one of the following:
If the XML rendition is to be the primary web-viewable file, click Conversion Settings then Primary Web Rendition. Enable Convert to XML on the Primary Web-Viewable Rendition Page when it is displayed.
If the XML is to be an additional rendition, click Conversion Settings then Additional Renditions. Enable Create XML renditions for all supported formats on the Additional Renditions Page when it is displayed.
Click XML Options.
On the XML Options page, enable Process XSLT Transformation and select the XML schema to use from the following options:
Produce FlexionDoc XML
Produce SearchML
Click Update to save all changes or Reset to revert to the last saved settings.
To perform XSL transformations, Inbound Refinery needs an XSL template, checked in to Content Server, to apply during the transformation. To check in an XSL template to Content Server:
Create an XSL file. The XSL file specifies how an XML file with a specific Content Type will be transformed to a new XML file. A DTD or schema can be specified for validation and stored in the Content Server, but is not required.
Check the XSL file into the Content Server and associate it to a Content Type.
In the Content check-in Form, select the Content Type from the Type list.
Enter the Content ID according to the following convention:
Content Type.xsl
For example, if the Content Type is Documents, enter documents.xsl.
Enter the XSL file as the Primary File.
Check that the Security Group matches any DTD/schema files in the Content Server associated with the XSL file and the native files that are checked into the Content Server.
Click Check In.
When files are checked in with this Content Type, and a FlexionDoc/SearchML XML file is generated by XML Converter or the checked-in file is XML, this XSL file will be used for XSL transformation to a new XML document.
Repeat these steps for each Content Type to post-process to XML.
When a validation fails, Inbound Refinery collects the errors from the SAX Validation engine, creates an hcsp error page and attempts to check in the page to Content Server.
Manually set up an outgoing provider on Inbound Refinery to the Content Server so that the refinery can check in an error page. The name of the Inbound Refinery provider must match the agent name. For example, if Inbound Refinery is named production_ibr and it is converting files for a Content Server named production_cs, then an outgoing provider named production_cs must be created on the production_ibr Inbound Refinery.
To set up a criteria workflow to be notified regarding XSL transformation failures:
Users: specify the users that should be notified.
Exit Conditions: select At least this many reviewers, and set the value to 0.
Events: For the Entry event, add the following Custom Script Expression:
<$if dDocTitle like "*XSLT Error"$> <$else$> <$wfSet("wfJumpEntryNotifyOff", "1")$> <$wfExit(0,0)$> <$endif$>
For details about using workflows, see Managing Workflows.
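The branch structure of the entry event script can be restated in Python to make its effect easier to follow. This is a sketch of the logic only, not Idoc Script and not how Content Server evaluates it: the function name entry_event and the returned dictionary are hypothetical, and the wildcard match is assumed to be case-sensitive.

```python
from fnmatch import fnmatchcase


def entry_event(title: str) -> dict:
    """Mimic the entry script: keep XSLT error pages, let everything else exit."""
    if fnmatchcase(title, "*XSLT Error"):
        # Empty <$if$> branch: the item stays in the step and the users are notified.
        return {"notify": True, "exit_workflow": False}
    # <$else$> branch: wfJumpEntryNotifyOff is set to "1", then wfExit(0, 0) is called.
    return {"notify": False, "exit_workflow": True}


print(entry_event("000123 XSLT Error"))  # {'notify': True, 'exit_workflow': False}
print(entry_event("Quarterly report"))   # {'notify': False, 'exit_workflow': True}
```

In other words, only content items whose title ends in "XSLT Error" remain in the workflow step and trigger a notification; all other items leave the workflow immediately without notifying anyone.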
Inbound Refinery can convert native Microsoft Office files to HTML by using the native Microsoft Office applications installed on a Windows system. Content Server can be installed on either a Windows or UNIX platform, but for Microsoft Office to HTML conversions to work, Inbound Refinery must be configured on the Windows system where the Microsoft Office native applications are installed.
HTML conversion opens Microsoft Office files in their native applications, saves them as HTML pages, and then collects the HTML output into a compressed ZIP file that is returned to Content Server.
HTML conversion can process the following types of files:
Microsoft Word 2003 through 2010
Microsoft Excel 2003 through 2010
Microsoft PowerPoint 2003 through 2010
Microsoft Visio 2007
When WinNativeConverter is enabled to work with Inbound Refinery, native Microsoft Office files checked into Content Server are sent to Inbound Refinery for conversion. Inbound Refinery automates the process of converting the files to HTML using the native Microsoft Office applications. If a single HTML page is returned to Content Server, it is used as the web-viewable file. If conversion results in multiple HTML pages, the following files are returned to Content Server:
An HCSP page as the primary web-viewable rendition
A ZIP file that includes the HTML output from the Office application
Optionally, a thumbnail rendition of the native Microsoft Office file
When a user clicks on the web-viewable link in Content Server of a document converted to multiple HTML pages by Inbound Refinery, the HCSP page redirects the server to the HTML rendition.
Microsoft Office to HTML conversions require the following components to be installed and enabled on the specified server.
Component Name | Component Description | Enabled on Server |
---|---|---|
WinNativeConverter | Enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint and Visio to HTML using the native Office application. | Inbound Refinery Server |
MSOfficeHtmlConverterSupport | Enables Content Server to support HTML conversions of native Microsoft Office files converted by Inbound Refinery and returned to Content Server in a ZIP file. Requires that ZipRenditionManagement component be installed on the Content Server. | Content Server |
ZipRenditionManagement | Enables Content Server access to HTML renditions created and compressed into a ZIP file by Inbound Refinery. | Content Server |
This section discusses how to configure Content Server to work with Microsoft Office to HTML conversions:
When installed on the refinery, the WinNativeConverter component adds the Word HTML, PowerPoint HTML, Excel HTML, and Visio HTML options to the Conversion Listing page. These conversion options must be enabled for the refinery to perform conversions on items submitted by the Content Server. File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options.
For a Microsoft Office document to be processed by Inbound Refinery, its file extension must be mapped to a format name that is associated with the HTML Conversion method. The added conversion options for HTML Conversion are not automatically mapped: they must be mapped manually. They can be set either using the File Formats Wizard or the Configuration Manager applet. The Configuration Manager applet gives you greater control over which file extensions are mapped to which conversion options. For details, see the following sections:
File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. To make changes:
File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes: