To add content from a file system to a content set:
On the Projects tab, go to Projects > Your Project > Content.
Click the Add Content link under the content set to which you want to add the content.
On the Add Content Set page, enter a name for the new content.
Select File System as the content type. Additional fields appear on the page.
Enter or browse to the Content File Path. This is the root directory where the content is located.
Under Indexing Options, configure the following optional settings:
Select a content-level Text Processing Option Set to apply to this content, if any. The default is to use the project-level set. See the Text Processing Option Sets chapter for information.
Enter a document set name. A document set name creates a sub-path within the final index, which can be used by the client UI to restrict end-user searches.
For example, if there is no Document Set Name, then unstructured documents in this content are indexed under the root /Documents. If you provide a Document Set Name of MyContent, the content is indexed under /Documents/MyContent.
Select whether the content should be treated as structured or unstructured during indexing. Examples of structured documents include data in repositories, while unstructured content consists of documents in a file system.
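The Document Set Name example above can be sketched as a small path helper. This is only an illustration of the described behavior, not Search's implementation: the function names `index_path` and `in_document_set` are hypothetical, and the prefix test used to restrict a search is an assumption about how a client UI might use the sub-path.

```python
from typing import Optional

ROOT = "/Documents"  # root for unstructured documents, per the example above


def index_path(document_set_name: Optional[str] = None) -> str:
    """Return the index sub-path for content with the given Document Set Name."""
    name = (document_set_name or "").strip().strip("/")
    return f"{ROOT}/{name}" if name else ROOT


def in_document_set(item_path: str, restrict_to: str) -> bool:
    """Assumed restriction rule: an item matches if it sits at or below the path."""
    base = restrict_to.rstrip("/")
    return item_path == base or item_path.startswith(base + "/")


if __name__ == "__main__":
    print(index_path())             # no Document Set Name
    print(index_path("MyContent"))  # Document Set Name of MyContent
```

With no name the helper returns the root, and with MyContent it returns the sub-path described in the example.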
If you are using the Oracle ATG Web Commerce multisite feature, set the value of the Sites metatag to either All Sites or Selected Sites. All Sites means that unstructured content cannot be constrained by site. Selected Sites allows you to associate your unstructured content with the specified sites.
For example, consider the Oracle ATG Web Commerce Reference Store, which includes three sites: ATG Store US, ATG Store Germany, and ATG Home. If you create a content set called Countries, set the Sites metatag to include US and Germany, and your search request specifies the Germany and Home sites, the search returns index items from the Countries content set. If your search request specifies only the Home site, it does not return index items from the Countries content set.
To select a site, click the Select button. Check a site to select it. Click OK when finished selecting sites.
Note: If you later change the sites associated with your content, you must run a full index to ensure that site associations are correctly represented in search results.
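The Reference Store example above amounts to a set-overlap test. The following minimal sketch models only that rule as stated in the text. The function name `content_matches_sites` and the use of `None` to represent All Sites are assumptions made for illustration, not part of the product.

```python
from typing import Iterable, Optional, Set


def content_matches_sites(
    content_sites: Optional[Set[str]], requested_sites: Iterable[str]
) -> bool:
    """Decide whether a content set is eligible for a site-constrained search.

    content_sites is None for All Sites (content is not constrained by site);
    otherwise it holds the Selected Sites for the content set.
    """
    if content_sites is None:
        return True
    # Selected Sites: at least one requested site must be associated.
    return bool(content_sites & set(requested_sites))


if __name__ == "__main__":
    countries = {"ATG Store US", "ATG Store Germany"}
    print(content_matches_sites(countries, ["ATG Store Germany", "ATG Home"]))
    print(content_matches_sites(countries, ["ATG Home"]))
```

A request for Germany and Home overlaps the Countries content set on Germany, so the set is eligible. A request for Home alone has no overlap.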
Identify and create additional index-only metatags as necessary. Index-only metatags allow you to add metatags to unstructured content (structured content is already tagged), allowing users to perform fielded searches. To create a metatag, specify the property’s type, then provide a name and a value. Note that in order to use this feature, your content should be organized to reflect the metatags you want to apply. For example, you may have a directory of text files representing news articles. You could further organize them into subdirectories by topic, then apply metatags to allow constraints. In this example, the metatags would be of type String, have name Subject, and values of Local, National, Global, etc.
Check the Create Document Set box if you want Search to create a document set based on this metatag. Document sets allow end-users to search subsets of content and to browse without query input.
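To make the news-article example concrete, the sketch below shows how a directory layout organized by topic corresponds to a Subject metatag. It illustrates the organization only; it does not describe how Search applies metatags. The function name `subject_metatag`, the dictionary shape, and the sample file names are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def subject_metatag(relative_path: str) -> Optional[Dict[str, str]]:
    """Derive a String metatag named Subject from the top-level topic directory.

    relative_path is relative to the Content File Path, using forward slashes.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 2:
        return None  # file sits directly in the root, so it has no topic directory
    return {"type": "String", "name": "Subject", "value": parts[0]}


if __name__ == "__main__":
    for path in ["Local/council-vote.txt", "Global/summit.txt", "readme.txt"]:
        print(path, "->", subject_metatag(path))
```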
For each of the file types supported by Search, you can optionally provide file extensions that should be mapped to that type for this content.
Configure additional optional file suppression features:
Blank Extension—Select a file type to which files with blank extensions should be mapped. If you select Suppress, files with no extension are not indexed.
Unspecified Extensions—Select a file type to which files with unmapped extensions should be mapped. If you select Suppress, files with unknown extensions are not indexed.
Suppression—Identify file extensions that should never be indexed in this content.
General file suppression—Identify specific files in this content to suppress. Separate file names with spaces. File names can include asterisks as wildcards. For example: MyDoc* or *Document.doc.
Note that if the filename includes a space, you must enclose that name in double quotes. For example:
MyDoc1 "My Doc2" MyDoc3
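To check a suppression list before entering it, the syntax described above (space-separated names, double quotes around names containing spaces, asterisks as wildcards) can be approximated with the Python standard library. This is a rough sketch under assumptions: Search's exact matching rules, such as case sensitivity, are not stated here. Note also that `fnmatchcase` treats `?` and `[...]` as wildcards, and `shlex.split` treats single quotes and backslashes specially, which the product may not. The function names are hypothetical.

```python
import shlex
from fnmatch import fnmatchcase
from typing import List


def parse_suppression_list(field_value: str) -> List[str]:
    """Split the field on spaces, keeping double-quoted names together."""
    return shlex.split(field_value)


def is_suppressed(file_name: str, patterns: List[str]) -> bool:
    """True if the file name matches any pattern (asterisk matches any run of characters)."""
    return any(fnmatchcase(file_name, pattern) for pattern in patterns)


if __name__ == "__main__":
    print(parse_suppression_list('MyDoc1 "My Doc2" MyDoc3'))
    patterns = parse_suppression_list("MyDoc* *Document.doc")
    for name in ["MyDocument.txt", "FinalDocument.doc", "Other.txt"]:
        print(name, is_suppressed(name, patterns))
```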
Configure Advanced Settings (optional) for this content.
External Access URL—Base URL for the document repository folder containing unstructured documents to be indexed.
When Search finds a sentence in the index that answers a question, it sends that sentence to the end user along with a link to the source of the answer. Some document types, such as PDFs, must be retrieved from a URL external to the index in order for the user to view them. This is the URL provided to the end-user viewing the document. For example:
http://www.somestore.com/product_manuals
Converted Document Output Directory—Path to the directory where you want to store image files that are created during the indexing of rich data files.
Converted Document Access URL—Virtual directory of the Converted Document Output Directory. This allows someone to view images (via a URL link) from answers that are derived from your rich data content.
Default encoding. This optional value can be used to change the default encoding assumed for content that is not tagged with an encoding, and where the encoding differs from the platform default encoding.
For example, the platform default encoding on Windows is Windows-1252. You may have documents encoded as Shift_JIS, which is used for Japanese, but these documents are not tagged with their encoding. Specify Shift_JIS in this field, and the documents will be decoded properly.
Note: If a document is properly tagged with an encoding, Search uses that encoding, ignoring both the platform default and the Default Encoding value in the content definition.
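The precedence described above (a document's own encoding tag first, then the content-level Default Encoding, then the platform default) can be sketched as follows. The function names are hypothetical, and `cp1252` is Python's codec name for Windows-1252. This shows the ordering only and is not Search's decoding code.

```python
from typing import Optional

PLATFORM_DEFAULT = "cp1252"  # Windows-1252, the Windows platform default in the example


def choose_encoding(
    tagged: Optional[str],
    content_default: Optional[str],
    platform_default: str = PLATFORM_DEFAULT,
) -> str:
    """Pick the encoding: document tag, then Default Encoding, then platform default."""
    return tagged or content_default or platform_default


def decode_document(
    raw: bytes, tagged: Optional[str] = None, content_default: Optional[str] = None
) -> str:
    """Decode raw document bytes using the precedence above."""
    return raw.decode(choose_encoding(tagged, content_default))


if __name__ == "__main__":
    raw = "日本語".encode("shift_jis")  # an untagged Shift_JIS document
    print(decode_document(raw, content_default="shift_jis"))  # decoded correctly
    print(decode_document(raw))  # platform default applied, text is garbled
```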
The Additional Setting area is for any other property=value settings you may need to apply.
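If you keep such settings in a text file before pasting them in, a small check of the property=value form can catch malformed entries. The sketch below assumes one setting per line, which the text does not specify. The function name `parse_settings` and the property names in the usage example are made up for illustration.

```python
from typing import Dict


def parse_settings(text: str) -> Dict[str, str]:
    """Parse property=value lines into a dict, rejecting malformed lines."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected property=value, got: {line!r}")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    # Hypothetical property names, for illustration only.
    print(parse_settings("exampleProperty=true\nother.setting = a=b"))
```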
Click Add Content.
Repeat for any linked projects to which you want to add the content.