Content is the material you want to make available to be searched, and can take many forms. Examples include:

Note: For a list of supported formats for file system content, see Appendix A, Indexable File Types. Indexing PDF, HTML, and rich content involves separate installations and licensing; see the ATG Search Installation and Configuration Guide for information.

Oracle ATG Web Commerce Search can index content from either a file system or a repository. In principle, either kind of content can be treated as structured or unstructured; in practice, repository data is usually structured and file system data is usually unstructured. A common exception is XHTML documents, which may be stored in a file system but are structured (see the ATG Search Query Guide for information on the XHTML structure required by Oracle ATG Web Commerce Search).

When you index content as structured, the client application through which end users access Oracle ATG Web Commerce Search can constrain searches to specific text fields (this is called fielded search). A fielded search runs as usual, but only over the text contained in the specified fields; for example, a user might want to search only across product descriptions in a commerce catalog. This differs from constraining by property, which is analogous to “search only items that have brand=BrandX”. (You can achieve a similar effect in unstructured documents using index-only metatags; see the instructions in Adding File System Content to Content Sets.)
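The difference between fielded search and constraining by property can be illustrated with a small in-memory sketch. This is a toy model only, not the Oracle ATG Web Commerce Search API: the `Item` class, the `fielded_search` and `constrain_by_property` functions, and the sample catalog are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Toy stand-in for an indexed structured item (hypothetical)."""
    fields: dict       # searchable text fields, e.g. name, description
    properties: dict   # non-text properties, e.g. brand


def fielded_search(items, term, field_names):
    """Match `term` only within the text of the named fields."""
    term = term.lower()
    return [
        item for item in items
        if any(term in item.fields.get(name, "").lower() for name in field_names)
    ]


def constrain_by_property(items, name, value):
    """Keep only items whose property `name` equals `value` exactly."""
    return [item for item in items if item.properties.get(name) == value]


# Hypothetical sample data.
CATALOG = [
    Item({"name": "Trail Boot", "description": "Waterproof leather upper"},
         {"brand": "BrandX"}),
    Item({"name": "Leather Belt", "description": "Classic brass buckle"},
         {"brand": "BrandY"}),
    Item({"name": "Rain Jacket", "description": "Waterproof shell"},
         {"brand": "BrandX"}),
]

# Fielded search: "leather" is looked for in descriptions only, so the belt
# (which mentions leather only in its name) does not match.
description_hits = fielded_search(CATALOG, "leather", ["description"])

# Property constraint: no text matching, just a filter on brand.
brandx_items = constrain_by_property(CATALOG, "brand", "BrandX")
```

In this sketch the fielded search still performs text matching, only over a narrower scope, whereas the property constraint filters items without looking at their text at all.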

Structured content usually exists in the form of Oracle ATG Web Commerce catalog data. The data loaders and the IndexingOutputConfig component generate XHTML representations of this data, which Search then indexes. You may also want to index file system content as structured if your data already exists as XHTML files. Unstructured content usually consists of text documents in various forms.

In the index, unstructured content is rooted at the /Documents document set, and all structured content is rooted at the /Solutions document set. This organization allows constraints to be applied at a high level (see the Topic Sets and Query Rules chapters for information on features that use the document set, also referred to as a docset).
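As a rough illustration of how rooting content at two document sets permits high-level constraints, the sketch below treats a docset as a path prefix. Only the /Documents and /Solutions roots come from this guide; the helper names and the deeper paths (such as /Solutions/catalog/boots) are hypothetical, and the real product may represent docsets differently.

```python
STRUCTURED_ROOT = "/Solutions"    # root docset for structured content
UNSTRUCTURED_ROOT = "/Documents"  # root docset for unstructured content


def docset_root(is_structured):
    """Return the root document set under which a piece of content is indexed."""
    return STRUCTURED_ROOT if is_structured else UNSTRUCTURED_ROOT


def in_docset(doc_path, docset):
    """True if `doc_path` is the docset itself or lies beneath it.

    The match respects path boundaries, so "/SolutionsArchive/x" is not
    treated as part of "/Solutions".
    """
    docset = docset.rstrip("/")
    return doc_path == docset or doc_path.startswith(docset + "/")


# Hypothetical indexed paths, used only for illustration.
paths = [
    "/Solutions/catalog/boots",
    "/Documents/manuals/care-guide.txt",
    "/Solutions/catalog/jackets",
]

# A constraint on the structured root selects every structured item at once.
structured_only = [p for p in paths if in_docset(p, STRUCTURED_ROOT)]
```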

Once you have identified the content you want to index, you can add it to your search project’s content set. Each search project automatically has one content set when created. If you plan to have only one content set, go to Adding File System Content to Content Sets or Adding Repository Content to Content Sets, depending on your content type.

You can use additional content sets to organize your content if you want to update portions of your index independently, associate content sets with specific sites in a multisite configuration, or allow client software that uses Search to constrain end-user searches by content set.
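The following toy model sketches two of these uses: updating one portion of an index independently, and constraining a search by content set. It is an assumption-laden illustration rather than the product's behavior or API; the `update_content_set` and `search` functions and the content set names "catalog" and "support" are hypothetical.

```python
def update_content_set(index, name, documents):
    """Replace the documents of one content set, leaving the others untouched."""
    index[name] = list(documents)


def search(index, term, content_sets=None):
    """Search all content sets, or only the named ones if a constraint is given."""
    names = list(index) if content_sets is None else content_sets
    term = term.lower()
    return [
        doc
        for name in names
        for doc in index.get(name, [])
        if term in doc.lower()
    ]


# Hypothetical index with two content sets.
index = {}
update_content_set(index, "catalog", ["Trail boot", "Rain jacket"])
update_content_set(index, "support", ["Boot care guide"])

all_hits = search(index, "boot")                  # searches both content sets
support_hits = search(index, "boot", ["support"])  # constrained to one set

# Refresh only the catalog portion; the support content set is not rebuilt.
update_content_set(index, "catalog", ["Hiking boot"])
hits_after_update = search(index, "boot")
```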


Copyright © 1997, 2013 Oracle and/or its affiliates. All rights reserved.