Setting Up OracleAS Portal Sources

An OracleAS Portal source enables users to search across multiple portal installations and repositories, such as Web pages, disk files, and pages on other OracleAS Portal instances. Oracle Secure Enterprise Search can securely crawl both public and private OracleAS Portal content.

To create an OracleAS Portal source: 

  1. On the Home page, select the Sources secondary tab to display the Sources page.

  2. For Source Type, select OracleAS Portal.

  3. Click Create to display the Create OracleAS Portal Source page.

  4. Complete the following fields. Click Help for additional information.

    • Source Name: Name that you assign to this OracleAS Portal source.

    • URL Base: Base URL for OracleAS Portal.

    • Page Groups: List of page groups in OracleAS Portal retrieved when you click Retrieve Page Groups. Select the ones to crawl.

  5. Click Create & Customize.

  6. Select the Authentication tab.

  7. Select Enable OracleAS Single Sign-On Authentication and enter your credentials.

  8. Click Apply.

  9. Follow the steps for crawling and indexing in "Getting Started Basics for the Oracle SES Administration GUI" to crawl and index the new OracleAS Portal source.

Crawling a Folder or Page

The portal crawler can crawl a subtree under a specific folder or page instead of under an entire portal tree.

To set the boundary rule to crawl a specific folder or page: 

  1. On the Home page, click the Sources secondary tab to display the Sources page.

  2. Select a source and click Edit to display the Edit User-Defined Source page.

  3. Click the URL Boundary Rules subtab.

  4. Under Inclusion Rules for the URL, select the starts with rule and enter the value of the PORTAL_PATH for the folder or page.

    For example, to crawl only the P2 subtree of a portal tree, enter the path from the root to P2, such as /Proot/P1/P2.
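The effect of a "starts with" inclusion rule on PORTAL_PATH can be pictured with a small sketch. This is only an illustration of prefix matching, not the Oracle SES implementation; the function name `in_crawl_boundary` and the sample paths other than /Proot/P1/P2 are hypothetical.

```python
def in_crawl_boundary(portal_path: str, include_prefixes: list[str]) -> bool:
    """Return True if portal_path starts with any inclusion prefix."""
    return any(portal_path.startswith(prefix) for prefix in include_prefixes)


# Hypothetical inclusion rule: crawl only the P2 subtree.
rules = ["/Proot/P1/P2"]

for path in ["/Proot/P1/P2", "/Proot/P1/P2/P3", "/Proot/P1", "/Proot/P4"]:
    print(path, in_crawl_boundary(path, rules))
```

In this sketch, P2 and everything beneath it pass the rule, while its parent and unrelated branches do not. A plain prefix test would also accept a sibling whose name merely begins with the same characters (for example, /Proot/P1/P20), so a more specific rule value may be worth considering if such siblings exist.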

Omitting the Portal Pages for OracleAS Portal Sources

You can omit the portal pages of OracleAS Portal sources from a crawl so that only documents and files are indexed. Because page groups can be mere placeholders for content, omitting them removes results that carry no useful content and can improve the quality of the search result list.

To omit portal pages from being crawled: 

  1. Open the crawler configuration file in a text editor.


  2. Remove the comment symbol (#) preceding NO_INDEX_CONTAINER_PAGE.
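If the edit needs to be scripted rather than made by hand, a sketch along the following lines could remove the comment symbol. It makes several assumptions: that the flag appears at the start of a line preceded only by `#` and optional whitespace, and that the sample text below resembles the real file. The helper `uncomment_flag` and the sample content are hypothetical, and the location of the configuration file is not assumed here.

```python
import re


def uncomment_flag(config_text: str, flag: str) -> str:
    """Remove the leading '#' from any line whose first token is `flag`."""
    pattern = re.compile(
        r"^([ \t]*)#[ \t]*(" + re.escape(flag) + r"\b)",
        re.MULTILINE,
    )
    # Keep the original indentation (group 1) and the flag itself (group 2).
    return pattern.sub(r"\1\2", config_text)


# Hypothetical file content, for illustration only.
sample = "# crawler settings\n#NO_INDEX_CONTAINER_PAGE\nOTHER_SETTING value\n"
print(uncomment_flag(sample, "NO_INDEX_CONTAINER_PAGE"))
```

Ordinary comment lines are left untouched because the pattern requires the flag name immediately after the comment symbol. Back up the configuration file before changing it.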

Smart Incremental Crawl for OracleAS Portal Sources

Oracle SES provides a smart incremental crawl for OracleAS Portal sources, designed to make re-crawls more efficient by not traversing the entire portal tree. Instead of trying to detect all content and permission changes itself, Oracle SES obtains change information from OracleAS Portal. During a re-crawl, the Oracle SES crawler asks OracleAS Portal for a list of changes since a certain date (that is, the last re-crawl date), and OracleAS Portal generates a list of added, updated, and deleted resources.

To enable smart incremental crawl: 

  1. Open the crawler configuration file in a text editor.


  2. Remove the comment symbol (#) preceding PORTAL_SMART_INCR_CRAWL.
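The idea behind a smart incremental crawl (ask for the changes since the last crawl date, then apply only those) can be sketched as follows. This is a conceptual model under stated assumptions, not the protocol Oracle SES and OracleAS Portal actually use: the `Change` record, the `changes_since` and `apply_changes` functions, and all sample paths and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    url: str
    action: str           # "added", "updated", or "deleted"
    changed_at: datetime


def changes_since(log: list[Change], last_crawl: datetime) -> list[Change]:
    """Stand-in for the portal's reply: changes made after the last crawl."""
    return [c for c in log if c.changed_at > last_crawl]


def apply_changes(index: set[str], changes: list[Change]) -> set[str]:
    """Update the set of indexed URLs without walking the whole tree."""
    updated = set(index)
    for change in changes:
        if change.action == "deleted":
            updated.discard(change.url)
        else:
            # "added" or "updated": only this one resource is (re)indexed.
            updated.add(change.url)
    return updated


# Hypothetical change log and index contents.
log = [
    Change("/Proot/P1/a.html", "updated", datetime(2024, 1, 5)),
    Change("/Proot/P1/b.html", "deleted", datetime(2024, 1, 6)),
    Change("/Proot/P1/c.html", "added", datetime(2024, 1, 7)),
    Change("/Proot/P1/old.html", "added", datetime(2023, 12, 1)),
]
index = {"/Proot/P1/a.html", "/Proot/P1/b.html", "/Proot/P1/old.html"}

recent = changes_since(log, last_crawl=datetime(2024, 1, 1))
print(sorted(apply_changes(index, recent)))
```

The work done in this model is proportional to the number of changes rather than to the size of the portal tree, which is the efficiency gain the feature is designed to provide.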

OracleAS Portal Search Attributes

The crawler picks up key attributes offered by OracleAS Portal, as described in Table 6-1.

Table 6-1 OracleAS Portal Source Attributes

Attribute Description


Date the document was created


User name of the person who created the document


User-editable field in which users can enter a full name or any other value


Hierarchy path of the portal page/item in the portal tree (contains page titles)


Hierarchy path of the portal page/item in the portal tree, used for browsing and boundary rules (contains page names)

When searching OracleAS Portal 10.1.2, portal_path appears in uppercase in the browse hierarchy. When searching OracleAS Portal 10.1.4, portal_path appears in lowercase.


Title of the document


Brief description of the document


Keywords of the document


Expiration date of the document


Portal host


Path of the Portal page in the browse hierarchy


Language of the portal page or item


Last modified date of the document


Usually 'text/html' for portal


User-created markers that can be applied to pages or items, such as 'INTERNAL ONLY', 'REVIEWED', or 'DESIGN SPEC'. For example, a Portal containing recipes could have items representing recipes with perspectives such as 'Breakfast', 'Tea', 'Contains Nuts', 'Healthy' and one particular item could have several perspectives assigned to it.


Internal name of the portal page or item


Character set of the portal page or item


Category of the portal page or item


Date the portal page or item was last updated


Person who last updated the page or item


Subtype of the portal page/item (for example, container)


Portal item type


Mimetype of the portal page or item


Date the portal page or item was published


Version number of the portal item

Tips for Using OracleAS Portal Sources

  • An OracleAS Portal source name cannot exceed 35 characters.

  • URL boundary rules are not enforced for URL items. A URL item is the metadata that resides on the OracleAS Portal server. Oracle SES does not touch the display URL or the boundary rules for URL items.

  • The portal_path attribute is used to compare boundary rules. Portal pages and items are organized in a tree structure. When a page is included or excluded, its entire subtree starting with that node is included or excluded.

  • If OracleAS Portal user privileges change, the content the crawler collects might not be properly authorized. For example, suppose that the user specified on the Home - Sources - Authentication page does not have privileges to see certain Portal pages when the source is first crawled. If privileges are later granted to that user, subsequent incremental crawls still do not pick up the content. Similarly, if privileges are revoked from the user, the crawler might still pick up the content.

    To be certain that Oracle SES has the correct set of documents, whenever a user's privileges change, update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and restart the crawl.