This appendix provides overviews of the autocategorization process and setup tasks, and discusses how to:
Define autocategorization engine vocabularies.
Perform autocategorization.
Note. PeopleSoft Enterprise Portal does not include an autocategorization engine. However, integrating one is relatively straightforward. Before performing content autocategorization, you must install an engine.
See Also
Setting Up to Run the Content Categorization Spider
You can configure and use multiple autocategorization engines. Each engine must be accessible by HTTP GET or POST. The HTTP request typically targets an adapter object that maps the generic autocategorization input parameters to the proprietary interface of a specific engine.
The following table describes the input query parameters to the autocategorization engine.
Note. Certain autocategorization engines may not use all of these parameters. The valid range of values for parameters may vary.
Parameter | Description
DOC | The URL of the document to be classified.
DOC_TYPE | The engine-specific code which indicates that the DOC parameter refers to a URL. This may be necessary for engines that support alternatives, such as sending the entire contents of the document instead of just the URL.
HIERARCHY | The vocabulary into which the document should be classified. This is the same value that you specify in the Vocabulary field when you define vocabularies on the Autocategorization Vocabularies page.
MAX_CATS | The maximum number of categories into which you can classify documents.
THRESHOLD | The minimum confidence ranking required for a document to be classified within a category.
USERID | The user ID to use when accessing the autocategorization engine.
PASSWORD | The password to use when accessing the autocategorization engine.
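As an illustration of how these parameters travel in a request, the following minimal Python sketch assembles a GET URL. The adapter URL, the document URL, the DOC_TYPE code "URL", and the default MAX_CATS and THRESHOLD values are all hypothetical; a real engine defines its own codes and valid ranges.

```python
from urllib.parse import urlencode


def build_request_url(engine_url, doc_url, hierarchy, doc_type="URL",
                      max_cats=3, threshold=0.15, userid=None, password=None):
    """Return a GET URL that carries the generic autocategorization parameters.

    The default values are illustrative assumptions, not product defaults.
    """
    params = {
        "DOC": doc_url,
        "DOC_TYPE": doc_type,
        "HIERARCHY": hierarchy,
        "MAX_CATS": max_cats,
        "THRESHOLD": threshold,
    }
    # Credentials are sent only when supplied; some engines may not use them.
    if userid is not None:
        params["USERID"] = userid
    if password is not None:
        params["PASSWORD"] = password
    # urlencode percent-encodes each value so the document URL survives intact.
    return engine_url + "?" + urlencode(params)


# Hypothetical adapter and document locations.
url = build_request_url(
    "http://engine.example.com/adapter",
    "http://docs.example.com/report.html",
    hierarchy="1",
)
print(url)
```

Because a GET request places every parameter in the URL, a deployment that sends USERID and PASSWORD may prefer POST, which the engine interface also allows.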
The output of the autocategorization request should be a simple text array of classification definitions in the following format. Each element of the array represents a separate classification for the same document. A space character separates element pairs.
<confidence score>,<category name or error message>
The confidence score should be a signed numeric value that conforms to the ranking scheme of the engine. A confidence score of -1 denotes an error; in that case, the second field carries the error message instead of a category name.
For example, output from a successful call to an engine might look like this:
.23,/business/bus_law .18,/computers/internet .17,/business/industry/tech
The output from an unsuccessful call might look like this:
-1,Server Not Responding; Engine May Not Be Running.
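A caller might parse this output along the following lines. This is a sketch under two assumptions that the format description does not state explicitly: category names contain no spaces, and an error response consists of a single element whose message may contain spaces (as in the example above). The function name is hypothetical.

```python
def parse_engine_output(text):
    """Parse '<score>,<category>' elements separated by spaces.

    Returns a list of (score, category) tuples. Raises RuntimeError with the
    engine's message when the response reports an error (score of -1).
    """
    text = text.strip()
    # An error message can itself contain spaces, so test for the -1 marker
    # before splitting the response on whitespace.
    first_score, _, rest = text.partition(",")
    if first_score.strip() == "-1":
        raise RuntimeError(rest.strip())

    results = []
    for element in text.split():
        score, _, category = element.partition(",")
        results.append((float(score), category))
    return results


ok = parse_engine_output(
    ".23,/business/bus_law .18,/computers/internet .17,/business/industry/tech"
)
print(ok)

try:
    parse_engine_output("-1,Server Not Responding; Engine May Not Be Running.")
except RuntimeError as err:
    print("Engine error:", err)
```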
The PeopleSoft Enterprise Portal provides sample ASP and Java servlet adapters as templates:
$PS_HOME/web/integration/autoclassify.asp
$PS_HOME/web/integration/AutoCatEJBServlet.java
$PS_HOME/web/integration/AutoCatNativeServlet.java
The templates illustrate how to access the input parameters of the autocategorization request, forward them to sample autocategorization engines, and then format and return the engines’ responses.
The ASP template demonstrates how to integrate with a Component Object Model interface; the servlets show how to integrate with an Enterprise JavaBeans or custom C interface.
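The shipped templates are written in ASP and Java and are not reproduced here. Purely to illustrate the adapter pattern that they follow, the Python sketch below reads the generic parameters from a query string, calls a stand-in engine, and formats the reply. The names fake_engine and handle_query, the canned results, and the default MAX_CATS and THRESHOLD values are hypothetical; applying THRESHOLD and MAX_CATS inside the adapter is also an assumption, because a real engine may apply them itself.

```python
from urllib.parse import parse_qs


def fake_engine(doc_url, vocabulary):
    """Hypothetical stand-in for a proprietary engine call."""
    return [(0.23, "/business/bus_law"),
            (0.18, "/computers/internet"),
            (0.05, "/sports/golf")]


def handle_query(query_string, engine=fake_engine):
    """Map the generic parameters to the engine and format its response."""
    # Keep the first value of each parameter.
    params = {name: values[0] for name, values in parse_qs(query_string).items()}

    missing = [name for name in ("DOC", "HIERARCHY") if name not in params]
    if missing:
        return "-1,Missing parameter: " + " ".join(missing)

    try:
        max_cats = int(params.get("MAX_CATS", 5))
        threshold = float(params.get("THRESHOLD", 0))
        matches = engine(params["DOC"], params["HIERARCHY"])
    except Exception as exc:
        # Any failure is reported in the error form of the output format.
        return f"-1,{exc}"

    # Keep the highest-scoring categories that meet the threshold.
    kept = sorted((m for m in matches if m[0] >= threshold), reverse=True)
    return " ".join(f"{score:.2f},{category}" for score, category in kept[:max_cats])


print(handle_query("DOC=http%3A%2F%2Fdocs.example.com%2Fa.html"
                   "&HIERARCHY=1&MAX_CATS=2&THRESHOLD=0.1"))
```

Keeping the mapping in a plain function, separate from whatever HTTP layer hosts it, lets the same logic sit behind either a GET or a POST handler.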
This section provides a summary of the tasks required to perform autocategorization.
Note. The categorization spider is used to perform the autocategorization process.
To perform autocategorization:
Define autocategorization engine vocabularies.
This task enables you to set up meaningful names for different vocabularies within an autocategorization engine. Some engines use codes for different vocabularies, such as a number. Assigning a name within Content Categorization enables an administrator to refer to that vocabulary in a more meaningful way.
Define content sources on the Run Categorization Spider page.
This is where the spider locates content.
Create taxonomies.
This is where you create your taxonomy and connect the defined content source to the taxonomy. The top folder name should correspond to the autocategorization engine vocabulary name. Information entered to establish the content source for your taxonomy is based on the content source defined using the Run Categorization Spider page.
This table shows how to complete the Content Source page:
Spider Source Fields | Values
Source Type | Auto Categorized File Server.
Source Name | Select the source name you entered on the Run Categorization Spider page.
Source Path | This value is automatically filled with the URL you entered on the Run Categorization Spider page for the selected source name.
Auto Expand Folder | Allows for subfolders to be created automatically. Note. Auto-created folders cannot have content added manually or added automatically from other content sources.
This section discusses how to define autocategorization engine vocabularies.
Access the Autocategorization Engine Vocabulary Definition page (EPPCM_SPIDR_VOC) (select Content Management, Categorized Contents, Autocategorization).
Use the Autocategorization Engine Vocabulary Definition page to define the autocategorization vocabulary that is used to categorize content.
Name | Enter the name of the autocategorization engine.
URL | Enter the autocategorization engine's URL.
Vocabulary Name and Long Description | Enter a name that is used by the autocategorization engine for a taxonomy into which it can classify documents. The description appears on the Content Categorization pages to help identify the vocabulary.
After the taxonomy has been created and content sources have been mapped, the spider can be invoked.