Search Can Assign Titles to PDFs

In many cases, some or all of the articles in your collections are PDFs.

Most PDFs have a visual title: a string of text on the first page that most readers would recognize as the title, for example, Mobile Phone User Manual or How to Read a Stock Report. However, not every visual title is that easy to spot. The first page may contain many lines of text, or no text at all. So how does Search determine which visual title to match against a search request?

Search uses the automatic PDF title discovery feature to determine PDF titles. It finds the visual title of a PDF automatically and uses it both for title matching and as the search result title. A key advantage of title discovery is that you don't have to do any additional authoring to provide the best search result title.

Title discovery evaluates visual factors in the PDF, such as:

  • The size of the font. Larger text on the first page usually denotes a title.

  • The position of the text, for example, whether it is among the first few sentences of the PDF.

  • The length of each phrase and the spacing between lines of text.
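To make the first three factors more concrete, the following Python sketch scores individual lines on a page. It is a toy illustration under stated assumptions, not the algorithm Search actually uses: the `TextLine` model, the `title_score` function, the additive weighting, the 8-word phrase limit, and the 792-point (US Letter) page height are all hypothetical choices, and the line coordinates are invented to mirror the example later in this topic.

```python
from dataclasses import dataclass

PAGE_HEIGHT = 792.0  # assumed US Letter page height, in points (hypothetical)


@dataclass
class TextLine:
    """A line of text on the first page of a PDF (hypothetical model)."""
    text: str
    font_size: float  # in points
    top: float        # distance from the top of the page, in points


def title_score(line: TextLine, body_font_size: float, max_words: int = 8) -> float:
    """Hypothetical score: higher means the line looks more like title text.

    - Font size: larger text relative to the body text scores higher.
    - Position: text nearer the top of the page scores higher.
    - Phrase length: lines longer than max_words are treated as body text.
    """
    if len(line.text.split()) > max_words:
        return 0.0
    size_factor = line.font_size / body_font_size
    position_factor = 1.0 - line.top / PAGE_HEIGHT
    return size_factor + position_factor


# Invented coordinates for a sample first page.
SAMPLE_LINES = [
    TextLine("Oracle", 30, 72),
    TextLine("User Guide", 24, 110),
    TextLine("Knowledge Management", 20, 140),
    TextLine("Version 1.1", 18, 166),
    TextLine("This user guide describes how to use Knowledge applications", 11, 260),
]

if __name__ == "__main__":
    # Print lines from most to least title-like, assuming 11-point body text.
    for line in sorted(SAMPLE_LINES, key=lambda l: title_score(l, 11.0), reverse=True):
        print(f"{title_score(line, 11.0):.2f}  {line.text}")
```

In this sketch the long descriptive sentence scores 0 because of the phrase-length rule, and "Oracle" ranks highest on its own. A per-line score like this can rank candidate lines, but it cannot by itself assemble a title that spans several lines.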

For example, suppose the first page of a PDF contains the following text:

Oracle [Font Size 30, Bold, Red]

User Guide [Font Size 24, Bold]

Knowledge Management [Font Size 20]

Version 1.1 [Font Size 18]

This user guide describes how to use Knowledge applications [Font Size 11]

Most readers wouldn't take the title of this PDF to be:

  • Oracle, or

  • Version 1.1, or

  • This user guide describes how to use Knowledge applications.

Instead, they would read the title as one of:

  • Oracle User Guide or User Guide

  • Oracle User Guide Knowledge Management or User Guide Knowledge Management

  • Oracle User Guide Knowledge Management Version 1.1 or User Guide Knowledge Management Version 1.1

In this example, PDF title discovery determines that the best titles for search accuracy are Oracle User Guide Knowledge Management Version 1.1 or User Guide Knowledge Management Version 1.1.
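The following Python sketch shows one way that font size and line spacing could combine to produce a multi-line title like the one in this example. It is a hypothetical illustration, not how Search is implemented: `discover_visual_title`, the `TextLine` model, the 1.5x size and gap ratios, the use of the smallest font as the body text size, and the vertical positions of the sample lines are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextLine:
    """A line of text on the first page of a PDF (hypothetical model)."""
    text: str
    font_size: float  # in points
    top: float        # distance from the top of the page, in points


def discover_visual_title(lines: List[TextLine],
                          size_ratio: float = 1.5,
                          gap_ratio: float = 1.5) -> Optional[str]:
    """Toy title discovery: join the leading run of large, closely spaced lines.

    Assumptions (illustrative only):
      * the smallest font on the page is the body text size
      * title lines are at least size_ratio times the body text size
      * a vertical gap larger than gap_ratio times the previous line's
        font size ends the title block
    """
    if not lines:
        return None
    ordered = sorted(lines, key=lambda line: line.top)
    body_size = min(line.font_size for line in ordered)
    title_parts: List[str] = []
    previous: Optional[TextLine] = None
    for line in ordered:
        if line.font_size < size_ratio * body_size:
            break  # body-sized text: the title block is over
        if previous is not None and line.top - previous.top > gap_ratio * previous.font_size:
            break  # large vertical gap: a separate block starts here
        title_parts.append(line.text)
        previous = line
    return " ".join(title_parts) or None


# The example page from this topic, with invented vertical positions.
SAMPLE_LINES = [
    TextLine("Oracle", 30, 72),
    TextLine("User Guide", 24, 110),
    TextLine("Knowledge Management", 20, 140),
    TextLine("Version 1.1", 18, 166),
    TextLine("This user guide describes how to use Knowledge applications", 11, 260),
]

if __name__ == "__main__":
    print(discover_visual_title(SAMPLE_LINES))
```

With these assumed positions, the four large lines are joined and the 11-point sentence stops the title block, giving the first of the two candidate titles above. Because the sketch estimates the body text size from the smallest font, a page whose only text is a large title returns None; a more robust approach would need a better body-size estimate.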

If the first page of the PDF has no text, so that no visual title can be found, Search uses one of the following as the search result title:

  • The title in the PDF's properties, if a properties title is defined.

  • The PDF's file name.
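The fallback described above can be expressed as a small Python function. This is a hypothetical sketch: `choose_result_title` and its parameters are illustrative names, and the order (properties title before file name) is an assumption based on the list above. The `discovery_enabled` flag stands in for the CSO_AUTO_PDF_TITLE_DISCOVERY profile option covered under How to Disable the Automatic PDF Title Discovery; when it is off, the function behaves as if no visual title were found.

```python
from typing import Optional


def choose_result_title(visual_title: Optional[str],
                        properties_title: Optional[str],
                        file_name: str,
                        discovery_enabled: bool = True) -> str:
    """Pick a search result title for a PDF (hypothetical sketch).

    Assumed order: discovered visual title, then the title from the PDF's
    properties, then the PDF's file name.
    """
    # With discovery switched off, behave as if no visual title exists.
    if discovery_enabled and visual_title and visual_title.strip():
        return visual_title.strip()
    # Use the properties title only if one is defined and non-blank.
    if properties_title and properties_title.strip():
        return properties_title.strip()
    # Last resort: the PDF's file name.
    return file_name


if __name__ == "__main__":
    print(choose_result_title(None, "Knowledge Management User Guide", "km_guide.pdf"))
    print(choose_result_title(None, None, "km_guide.pdf"))
```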

Note: The visual title that Search assigns to a PDF appears as a search result title only after the full content processing cycle is complete.

How to Disable the Automatic PDF Title Discovery

The PDF title discovery feature is enabled by default, but you can disable it. Be aware, however, that disabling it may severely compromise search accuracy. If you disable it, the following warning message appears:

Enable the automatic title discovery for PDFs. Test thoroughly as this changes impact search accuracy. Search will reflect the change after content processing completes.

When this configuration option is off, the application assigns a search result title to a PDF as if no visual title were available. Changes to this option take effect after the next content processing cycle completes. To disable the automatic PDF title discovery:

  1. Log in and click Setup and Maintenance.

  2. At the Setup menu, scroll down and select Service.

  3. At Functional Areas, select Knowledge Management.

  4. At Knowledge Management, click Manage Knowledge Search Profile Options.

  5. At the Manage Knowledge Search Profile Options page, click CSO_AUTO_PDF_TITLE_DISCOVERY.

  6. At the Site menu, select No.