To add content from a file system to a content set:
On the Projects tab, go to Projects > Your Project > Content.
Click the Add Content button under the content set to which you want to add the content.
On the Add Content Set page, enter a name for the new content.
Select File System as the content type. Additional fields appear on the page.
Enter or browse to the Content File Path. This is the root directory where the content is located.
Under Indexing Options, configure the following optional settings:
Select a content-level Text Processing Option Set to apply to this content, if any. The default is to use the project-level set. See the Text Processing Option Sets chapter for information.
Enter a document set name. A document set name creates a sub-path within the final index, which can be used by the client UI to restrict end-user searches.
For example, if there is no Document Set Name, unstructured documents in this content are indexed under the root /Documents. If you provide a Document Set Name of MyContent, the content is indexed under /Documents/MyContent.
Select whether the content should be treated as structured or unstructured during indexing. Examples of structured documents include databases and repositories, while unstructured content consists of documents in a file system.
For each of the file types supported by ATG Search, you can optionally provide file extensions that should be mapped to that type for this content.
Configure additional optional file suppression features:
Blank Extension—Select a file type to which files with blank extensions should be mapped. If you select Suppress, files with no extension are not indexed.
Unspecified Extensions—Select a file type to which files with unmapped extensions should be mapped. If you select Suppress, files with unknown extensions are not indexed.
Suppression—Identify file extensions that should never be indexed in this content.
General file suppression—Identify specific files in this content to suppress. Separate file names with spaces. File names can include asterisks as wildcards. For example: MyDoc* or *Document.doc.
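The extension mapping and suppression options above can be pictured as a small decision function. The following Python sketch is illustrative only: it is not ATG Search code, the file type labels ("HTML", "PDF", "Text") are hypothetical, and the order in which the rules are checked is an assumption rather than documented behavior.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

SUPPRESS = "Suppress"  # stands in for the Suppress choice in the UI


def classify(file_name, extension_map, blank_extension=SUPPRESS,
             unspecified_extensions=SUPPRESS, suppressed_extensions=(),
             suppressed_files=""):
    """Return the file type a file would be indexed as, or None if suppressed.

    The order of the checks below is an assumption made for illustration.
    """
    # General file suppression: space-separated names, '*' as a wildcard.
    # Note: fnmatchcase also interprets '?' and '[...]', which the
    # documentation above does not mention.
    for pattern in suppressed_files.split():
        if fnmatchcase(file_name, pattern):
            return None

    ext = PurePosixPath(file_name).suffix.lstrip(".").lower()
    if not ext:
        # Blank Extension setting
        choice = blank_extension
    elif ext in set(suppressed_extensions):
        # Suppression setting: extensions that are never indexed
        return None
    else:
        # Explicit mapping first, then the Unspecified Extensions setting
        choice = extension_map.get(ext, unspecified_extensions)
    return None if choice == SUPPRESS else choice


# Hypothetical settings for one content definition
RULES = dict(
    extension_map={"htm": "HTML", "pdf": "PDF"},
    blank_extension="Text",
    unspecified_extensions=SUPPRESS,
    suppressed_extensions={"tmp"},
    suppressed_files="MyDoc* *Document.doc",
)

for name in ["guide.pdf", "README", "notes.xyz", "MyDoc1.pdf", "scratch.tmp"]:
    print(name, "->", classify(name, **RULES))
```

With these hypothetical settings, only guide.pdf and README are indexed; the other three files are suppressed by one rule or another.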
Configure Advanced Settings (optional) for this content.
External Access URL—Base URL for the document repository folder containing unstructured documents to be indexed.
When ATG Search finds a sentence in the index that answers a question, it sends that sentence to the end user along with a link to the source of the answer. Some document types, such as PDFs, must be retrieved from a URL external to the index in order for the user to view them. The External Access URL is the URL provided to the end user viewing the document. For example:
http://ww.somestore.com/product_manuals
Converted Document Output Directory—Path to the directory where you want to store image files that are created during the indexing of rich data files.
Converted Document Access URL—Virtual directory of the Converted Document Output Directory. This allows someone to view images (via a URL link) from answers that are derived from your rich data content.
Default Encoding. This optional value changes the default encoding assumed for content that is not tagged with an encoding, when that encoding differs from the platform default encoding.
For example, the platform default encoding on Windows is Windows-1252. You may have documents encoded as Shift_JIS, which is used for Japanese, but these documents are not tagged with their encoding. Specify Shift_JIS in this field, and the documents will be decoded properly.
Note: If a document is properly tagged with an encoding, ATG Search uses that encoding, ignoring both the platform default encoding and the Default Encoding value in the content definition.
The Additional Setting area is for any other property=value settings you may need to apply.
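The precedence described for the Default Encoding setting (document tag first, then the Default Encoding value, then the platform default) can be illustrated with a short Python sketch. The function name choose_encoding is hypothetical, and the sketch only models the rule as stated above; it does not reflect how ATG Search implements decoding internally.

```python
def choose_encoding(tagged_encoding=None, default_encoding=None,
                    platform_default="cp1252"):
    """Pick an encoding using the precedence described above.

    cp1252 is Python's codec name for Windows-1252, the Windows
    platform default used in the example.
    """
    return tagged_encoding or default_encoding or platform_default


# An untagged document whose bytes are actually Shift_JIS
raw = "日本語".encode("shift_jis")

# No Default Encoding set: the platform default is assumed and the text is garbled
garbled = raw.decode(choose_encoding(), errors="replace")

# Default Encoding set to Shift_JIS: the text decodes properly
decoded = raw.decode(choose_encoding(default_encoding="shift_jis"))

print(repr(garbled))
print(decoded)
```

A document that carries its own encoding tag would bypass both fallbacks, which corresponds to passing tagged_encoding in this sketch.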
Click Add Content.
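As a recap of the Document Set Name option in this procedure, the following Python sketch shows how the name determines the index sub-path for unstructured documents. The helper name index_root is hypothetical, and only the /Documents root mentioned in the procedure is modeled; the root used for structured content is not covered here.

```python
def index_root(document_set_name=""):
    """Return the index path under which unstructured documents are placed.

    Mirrors the example in the procedure: no name gives /Documents,
    a name such as MyContent gives /Documents/MyContent.
    """
    name = document_set_name.strip().strip("/")
    return f"/Documents/{name}" if name else "/Documents"


print(index_root())             # /Documents
print(index_root("MyContent"))  # /Documents/MyContent
```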