Content is the material you want to make searchable, and it can take many forms.

Note: For a list of supported formats for file system content, see Appendix A, Indexable File Types. Indexing HTML and PDF content involves separate installations and licensing; see the ATG Search Installation and Configuration Guide for information.

ATG Search can index content found in either a file system or an ATG repository. In principle, either type of content can be treated as structured or unstructured. In practice, repository data is usually structured and file system data is usually unstructured. File system data can be structured when it consists of XHTML files (see the ATG Commerce Search Guide for information on the XHTML structure required by ATG Search).

When you index content as structured, the client application through which end users access Search can constrain searches to specific text fields (this is called fielded search). A fielded search is performed as usual, but only over the text contained in the specified fields; for example, a user might want to search only across product descriptions in a Commerce catalog, or only symptoms in a Service solution. This differs from constraining by property, which is analogous to “search only items that have brand=BrandX”.
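The difference between fielded search and a property constraint can be illustrated with a toy in-memory model. This is a sketch only and does not use the ATG Search API; the `Item` class, its `fields` and `properties` attributes, the `search` function, and the sample catalog data are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical indexed item: searchable text fields plus properties."""
    fields: dict                                     # e.g. {"description": "..."}
    properties: dict = field(default_factory=dict)   # e.g. {"brand": "BrandX"}


def search(items, term, in_fields=None, where=None):
    """Return the items whose text contains `term`.

    in_fields: fielded search; look for the term only in these text fields.
    where:     property constraint; consider only items whose properties match.
    """
    term = term.lower()
    results = []
    for item in items:
        # Property constraint: decides WHICH items are eligible at all.
        if where and any(item.properties.get(k) != v for k, v in where.items()):
            continue
        # Fielded search: decides WHERE inside an item the term is looked for.
        names = in_fields if in_fields is not None else item.fields.keys()
        if any(term in item.fields.get(n, "").lower() for n in names):
            results.append(item)
    return results


# Hypothetical sample data.
catalog = [
    Item({"name": "Trail boot", "description": "Waterproof leather upper"},
         {"brand": "BrandX"}),
    Item({"name": "Waterproof jacket", "description": "Light shell"},
         {"brand": "BrandY"}),
]

# Fielded search: match "waterproof" only in product descriptions.
fielded = search(catalog, "waterproof", in_fields=["description"])

# Property constraint: match "waterproof" anywhere, but only in BrandY items.
by_brand = search(catalog, "waterproof", where={"brand": "BrandY"})
```

In this sketch the fielded search finds only the boot, because the jacket mentions "waterproof" in its name rather than its description, while the property-constrained search finds only the jacket.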

Structured content exists in the form of ATG Commerce catalog data and ATG Knowledge solutions. These ATG products generate XHTML files, which ATG Search then indexes in a way that makes fielded search possible. You may want to index file system content as structured if you have many such XHTML files outside of a repository. Otherwise, unstructured content usually consists of text documents in various forms.

Another practical effect of the structured/unstructured distinction is that, in the index, unstructured content is rooted at the /Documents document set and all structured content is rooted at the /Solutions document set. This organization allows constraints to be applied at a high level (see the Topic Sets and Query Rules chapters for information on features that use the document set, also referred to as a docset).
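As a rough illustration of how rooting content under two document sets makes high-level constraints possible, the following sketch models docset paths as POSIX-style paths. Only the /Documents and /Solutions roots come from the text above; the helper functions and the sub-paths are hypothetical, and ATG Search's actual docset handling may differ.

```python
from pathlib import PurePosixPath

# Roots named in the text above.
UNSTRUCTURED_ROOT = PurePosixPath("/Documents")
STRUCTURED_ROOT = PurePosixPath("/Solutions")


def docset_path(relative_path, structured):
    """Hypothetical helper: place content under the root for its type."""
    root = STRUCTURED_ROOT if structured else UNSTRUCTURED_ROOT
    return root / relative_path.lstrip("/")


def in_docset(path, docset):
    """True if `path` is the given document set or lies beneath it."""
    docset = PurePosixPath(docset)
    return path == docset or docset in path.parents


# Hypothetical sub-paths, for illustration only.
manual = docset_path("manuals/guide.pdf", structured=False)
product = docset_path("catalog/boots.xhtml", structured=True)

# A single high-level constraint selects all structured content.
structured_only = [p for p in (manual, product) if in_docset(p, "/Solutions")]
```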

Once you have identified the content you want to index, you can add it to your Search project’s content set. Each Search project automatically has one content set when created. If you plan to have only one content set, go to Adding File System Content to Content Sets or Adding Repository Content to Content Sets, depending on your content type.

You can use additional content sets to organize your content if you want the option of updating portions of your index independently, or allowing client software that uses Search to constrain end-user searches by content set.
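The two uses of additional content sets described above (updating part of the index independently, and constraining searches by content set) can be sketched with a toy model. The `Project` class, its methods, the content set names, and the sample documents are hypothetical and are not part of ATG Search.

```python
class Project:
    """Hypothetical Search project holding named content sets."""

    def __init__(self):
        # A project starts with one content set; the name is an assumption.
        self.content_sets = {"default": []}
        self.index = {}   # content set name -> list of indexed documents

    def add_content(self, doc, content_set="default"):
        """Add a document to a content set, creating the set if needed."""
        self.content_sets.setdefault(content_set, []).append(doc)

    def reindex(self, content_set):
        """Rebuild only one content set's portion of the index."""
        self.index[content_set] = list(self.content_sets[content_set])

    def search(self, term, content_sets=None):
        """Search the index, optionally constrained to some content sets."""
        names = content_sets if content_sets is not None else self.index.keys()
        return [d for n in names for d in self.index.get(n, []) if term in d]


project = Project()
project.add_content("returns policy")
project.add_content("boot care guide", content_set="manuals")
project.reindex("default")
project.reindex("manuals")

# New content arrives in one content set; only that portion is rebuilt.
project.add_content("boot sizing guide", content_set="manuals")
project.reindex("manuals")

all_guides = project.search("guide")
default_guides = project.search("guide", content_sets=["default"])
```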

 