Overview of Attributes

Each source has its own set of document attributes. Document attributes, like metadata, describe the properties of a document. The crawler retrieves attribute values and maps them to search attributes. This mapping lets users search documents based on their attributes. Document attributes in different sources can be mapped to the same search attribute, so users can search documents from multiple sources based on the same search attribute.

After you crawl a source, you can see the attributes for that source. Document attribute information is obtained differently depending on the source type.

Document attributes can be used in tasks such as document management, access control, or version control. Different sources can use different attribute names for the same idea; for example, version and revision. Different sources can also use the same attribute name for different ideas; for example, "language" can mean natural language in one source and programming language in another.

Oracle SES has several default search attributes. They can be incorporated in search applications for a more detailed search and richer presentation.

Search attributes are defined in the following ways:

System-defined search attributes, such as title, author, description, subject, and mimetype.
Search attributes created by the Oracle SES administrator.
Search attributes created by the crawler. During crawling, the crawler plug-in maps each document attribute to a search attribute with the same name and data type. If no such search attribute exists, then the crawler creates a new search attribute with the same name and type as the document attribute defined in the crawler plug-in.

Note:

Search attribute names must be unique; two attributes cannot have the same name. For example, if a search attribute exists with a String data type, and another search attribute is discovered by the crawler with the same name but a different data type, then the crawler ignores the second attribute.

To prevent this conflict and allow Oracle SES to index both attributes, check the list of Oracle SES attribute names and types in Oracle SES Attributes before creating new attributes.
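The mapping and conflict rule described above can be sketched as follows. This is an illustrative model only, not Oracle SES code: the class name SearchAttributeRegistry and the method map_document_attribute are hypothetical, and data types are represented as plain strings.

```python
from typing import Dict, Optional


class SearchAttributeRegistry:
    """Hypothetical model of search attribute names and their data types."""

    def __init__(self) -> None:
        # search attribute name -> data type
        self._types: Dict[str, str] = {}

    def map_document_attribute(self, name: str, data_type: str) -> Optional[str]:
        """Return the search attribute the document attribute maps to,
        or None if the attribute is ignored because of a type conflict."""
        existing_type = self._types.get(name)
        if existing_type is None:
            # No search attribute with this name yet: create one.
            self._types[name] = data_type
            return name
        if existing_type == data_type:
            # Same name and same type: reuse the existing search attribute.
            return name
        # Same name but a different type: the second attribute is ignored.
        return None


registry = SearchAttributeRegistry()
first = registry.map_document_attribute("version", "String")
shared = registry.map_document_attribute("version", "String")
conflict = registry.map_document_attribute("version", "Number")
```

In this sketch, first and shared both resolve to the same search attribute, while conflict is None because the name is already taken by a String attribute.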

Attributes For Different Source Types

Table and database sources have no predefined attributes. The crawler collects attributes from columns defined during source creation. You must map the columns to the search attributes.

For Siebel 7.8 sources, specify the attributes in the query while creating the source. For Oracle E-Business Suite and Siebel 8 sources, specify the attributes in the XML data feed.

For many source types, such as OracleAS Portal, e-mail, NTFS, and Microsoft Exchange sources, the crawler picks up key attributes offered by the target systems. For other sources, such as Documentum eRoom or Lotus Notes, an Attribute list parameter is available on the Home - Sources - Customize User-Defined Source page. Any attributes that you define there are collected by the crawler and made available for search.

Using Lists of Values for Search Attributes

The list of values (LOV) for a search attribute can help you specify a search. Global search attributes can be specified on the Global Settings - Search Attributes page.

For user-defined sources where LOV information is supplied through a crawler plug-in, the crawler registers the LOV definition. Use the Oracle SES Administration GUI or the crawler plug-in to specify the attribute LOVs: each attribute value, its display name, and the translation of that display name.

When multiple sources define the LOV for a common attribute, such as title, the user sees all the possible values for the attribute. When the user restricts search within a particular source group, only LOVs provided by the corresponding sources in the source group are shown.
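The scoping behavior described above can be pictured with a small sketch. The data layout (a dictionary keyed by source name) and the function name visible_lov are assumptions made for illustration; they do not correspond to an Oracle SES API.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical layout: source name -> attribute name -> values for that attribute.
LovsBySource = Dict[str, Dict[str, List[str]]]


def visible_lov(
    lovs_by_source: LovsBySource,
    attribute: str,
    source_group: Optional[Iterable[str]] = None,
) -> List[str]:
    """Collect the LOV shown for an attribute, optionally limited to a source group."""
    allowed = None if source_group is None else set(source_group)
    values = set()
    for source, attributes in lovs_by_source.items():
        if allowed is not None and source not in allowed:
            # Sources outside the selected source group contribute nothing.
            continue
        values.update(attributes.get(attribute, []))
    return sorted(values)


# Illustrative data only.
example_lovs: LovsBySource = {
    "intranet": {"author": ["Ana", "Bo"]},
    "wiki": {"author": ["Bo", "Cy"]},
}
all_authors = visible_lov(example_lovs, "author")
wiki_authors = visible_lov(example_lovs, "author", source_group=["wiki"])
```

Without a source group, the result is the union of the values from every source; with a source group, only values from sources in that group remain.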

LOVs can be collected automatically. The following example shows how to have Oracle SES collect LOVs while crawling a fictitious URL.

Create a Web source with http://www.example.com as the starting URL. Do not start crawling yet.
From the Global Settings - Search Attributes page, select the attribute for which Oracle SES should collect LOVs and click Manage Lov. (For example, click Manage Lov for Author.)
Select Source-Specific for the created source, and click Apply.
Click Update Policy.
Choose Document Inspection and click Update, then click Finish.
From the Home - Schedules page, start crawling the Web source. After crawling, the LOV button in the Advanced Search page shows the collected LOVs.

System-Defined Search Attributes

There are also two system-defined search attributes, Urldepth and Infosource Path.

Urldepth measures the number of levels down from the root directory. It is derived from the URL string. In general, the depth is the number of slashes, not counting the slash immediately following the host name or a trailing slash. An adjustment of -2 is made to home pages. An adjustment of +1 is made to dynamic pages, such as the example in Table 3-3 with the question mark in the URL.

Urldepth is used internally for calculating relevance ranking, because a URL with a smaller URL depth is typically more important.

Table 3-3 lists the Urldepth of some example URLs.

Table 3-3 Depth of Example URLs

URL	Urldepth
http://example.com/portal/page/myo/Employee_Portal/MyCompany	4
http://example.com/portal/page/myo/Employee_Portal/MyCompany/	4
http://example.com/portal/page/myo/Employee_Portal/MyCompany.htm	4
http://example.com/finance/finhome/topstories/wall_street.html?.v=46	4
http://example.com/portal/page/myo/Employee_Portal/home.htm	2
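
The depth rule can be approximated in a few lines. This is a sketch under stated assumptions, not the Oracle SES implementation: the function name url_depth is hypothetical, a "dynamic page" is taken to be any URL with a query string, and a "home page" is recognized by a small, assumed set of file names (the text shows only home.htm).

```python
from urllib.parse import urlsplit

# Assumption: file names treated as home pages. Only home.htm appears in the table.
HOME_PAGE_NAMES = {"home.htm", "home.html", "index.htm", "index.html"}


def url_depth(url: str) -> int:
    """Approximate Urldepth from the URL string alone."""
    parts = urlsplit(url)
    # Drop the slash right after the host name and any trailing slash.
    trimmed = parts.path.strip("/")
    depth = trimmed.count("/")
    last_segment = trimmed.rsplit("/", 1)[-1].lower()
    if last_segment in HOME_PAGE_NAMES:
        depth -= 2  # adjustment for home pages
    if parts.query:
        depth += 1  # adjustment for dynamic pages (URL contains "?")
    # No clamping is applied; the text does not say how very shallow
    # home pages are handled.
    return depth


depths = [
    url_depth("http://example.com/portal/page/myo/Employee_Portal/MyCompany"),
    url_depth("http://example.com/portal/page/myo/Employee_Portal/MyCompany/"),
    url_depth("http://example.com/finance/finhome/topstories/wall_street.html?.v=46"),
    url_depth("http://example.com/portal/page/myo/Employee_Portal/home.htm"),
]
```

Under these assumptions, the sketch reproduces the values listed in Table 3-3 for the example URLs.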

Infosource Path is a path representing the source of the document. This internal attribute is used in situations where documents can be browsed by their source. The Infosource Path is derived from the URL string.

For example, for this URL:

http://example.com/portal/page/myo/Employee_Portal/home.htm

The Infosource Path is:

portal/page/myo/Employee_Portal

If the document is submitted through a connector, this value can be set explicitly by using the DocumentMetadata.setSourceHierarchy API.
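
For the URL-derived case, the example above is consistent with taking the URL path and dropping its final segment. The following sketch assumes exactly that rule; the function name infosource_path is hypothetical, and the behavior for URLs that end in a slash is not specified in the text.

```python
from urllib.parse import urlsplit


def infosource_path(url: str) -> str:
    """Derive a source path by removing the host and the final path segment."""
    path = urlsplit(url).path.lstrip("/")
    # rpartition splits at the last slash; the part before it is the containing path.
    directory, _, _ = path.rpartition("/")
    return directory


example_path = infosource_path(
    "http://example.com/portal/page/myo/Employee_Portal/home.htm"
)
```

For the example URL, example_path matches the Infosource Path shown above.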