Configuration Guidelines for Clustering
This section describes some guidelines for configuring cluster discovery.
Guided Search Platform Services Content Acquisition System Relationship Discovery Guide
Configuration for clusters
Clustering parameter descriptions
Sample size
Maximum clusters
Coherence
Maximum precision
Maximum cluster size
Maximum cluster overlap
Tuning strategy for clusters
1: Number of records sampled from the navigation state
2: Maximum refinement precision
3: Maximum number of terms per cluster
4: Cluster Coherence
5: Maximum cluster overlap
6: Maximum number of clusters