Call the
CasCrawler.getCrawlConfig() method to retrieve the
configuration settings of a crawl.
The syntax of the method is:
CasCrawler.getCrawlConfig(CrawlId crawlId, Boolean fillInDefaults)
Where:
crawlId is a CrawlId object that contains the name of the crawl for which the configuration is to be returned.
fillInDefaults is a Boolean flag that, if set to true, fills in the default value for any setting that has not been specified. If a setting is a password, true returns the name but not the value. If the flag is set to false, it does not modify the value for any setting.
If you retrieve a crawl configuration that contains a
ModuleProperty for a password property, the password
value is returned as a zero-length list.
The method returns a
CrawlConfig object, which contains the following:
sourceConfig - a SourceConfig object that contains the seeds, filters, and specific information about the systems from which content is fetched, such as CMS information or whether file properties from the native file system should be gathered for file system crawls.
manipulatorConfig - a list of ManipulatorConfig objects. Each ManipulatorConfig specifies a manipulation that is performed in a particular crawl.
textExtractionConfig - a TextExtractionConfig object that contains the text extraction options, such as whether text extraction should be enabled and the number of retry attempts.
outputConfig - an OutputConfig object that contains the output options, such as whether the records are written to a Record Store instance or a record output file, the path of the output directory, and the output format (binary or XML).
crawlthreads - a property indicating the number of threads per crawl.
To get the configuration settings of a crawl:
1. Make sure that you have created a connection to the CAS Server. (A CasCrawler object named crawler is used in this example.)
2. Set the name for the crawl by first instantiating a CrawlId object and then setting its Id. For example:
// Create a new crawl Id with the name set to Demo.
CrawlId crawlId = new CrawlId("Demo");
3. Call the CasCrawler.getCrawlConfig() method with the crawl ID and the default settings Boolean flag. For example:
CrawlConfig crawlConfig = crawler.getCrawlConfig(crawlId, true);
4. Process the returned CrawlConfig according to the requirements of your application.
The
CasCrawler.getCrawlConfig() method throws a
CrawlNotFoundException if the specified crawl (the
crawlId parameter) does not exist or is
otherwise not found. To catch an exception, use a
try block with the appropriate
catch clause.
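The try/catch handling described above can be sketched as follows. This is a minimal illustration, not the CAS client library itself: the nested CrawlId, CrawlConfig, CrawlNotFoundException, and CasCrawler types are hypothetical stand-ins that mirror only the names and the method signature given in this topic, so that the sketch compiles on its own. The fetchConfig and demoCrawler helpers are also assumptions made for the example. In a real application you would use the classes from the CAS Server API and the crawler object created when connecting to the CAS Server.

```java
import java.util.Optional;

public class GetCrawlConfigSketch {
    // Hypothetical stand-ins for the CAS types (assumptions, not the real classes).
    static class CrawlId {
        private final String id;
        CrawlId(String id) { this.id = id; }
        String getId() { return id; }
    }

    static class CrawlConfig {
        private final CrawlId crawlId;
        CrawlConfig(CrawlId crawlId) { this.crawlId = crawlId; }
        CrawlId getCrawlId() { return crawlId; }
    }

    static class CrawlNotFoundException extends Exception {
        CrawlNotFoundException(String message) { super(message); }
    }

    // Mirrors the method syntax shown in this topic.
    interface CasCrawler {
        CrawlConfig getCrawlConfig(CrawlId crawlId, Boolean fillInDefaults)
                throws CrawlNotFoundException;
    }

    // Returns the configuration, or an empty Optional if the crawl does not exist.
    static Optional<CrawlConfig> fetchConfig(CasCrawler crawler, String crawlName) {
        CrawlId crawlId = new CrawlId(crawlName);
        try {
            // true asks for default values to be filled in for unspecified settings.
            return Optional.of(crawler.getCrawlConfig(crawlId, true));
        } catch (CrawlNotFoundException e) {
            System.err.println("Crawl not found: " + crawlName);
            return Optional.empty();
        }
    }

    // Stub crawler for the sketch: it only knows a crawl named "Demo".
    static CasCrawler demoCrawler() {
        return (crawlId, fillInDefaults) -> {
            if (!"Demo".equals(crawlId.getId())) {
                throw new CrawlNotFoundException(crawlId.getId());
            }
            return new CrawlConfig(crawlId);
        };
    }

    public static void main(String[] args) {
        CasCrawler crawler = demoCrawler();
        System.out.println(fetchConfig(crawler, "Demo").isPresent());
        System.out.println(fetchConfig(crawler, "Missing").isPresent());
    }
}
```

Wrapping the result in an Optional is one design choice among several. An application could equally rethrow the exception or report the missing crawl to the user.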
Note that for CMS crawls (which require a username and password), the
server returns the password as a
null value.
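Because a retrieved password can come back as a null value or as a zero-length list, code that processes a returned configuration may want to treat both cases as "value withheld" rather than "no password configured". The sketch below illustrates one way to do that. The nested ModuleProperty class is a hypothetical stand-in (a name plus a list of string values), and the isWithheld and describe helpers are assumptions made for the example, not part of the CAS Server API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PasswordPropertySketch {
    // Hypothetical stand-in for a module property: a name with a list of values.
    static class ModuleProperty {
        final String name;
        final List<String> values;

        ModuleProperty(String name, List<String> values) {
            this.name = name;
            this.values = values;
        }
    }

    // True when the property carries no value, either as null or as a zero-length list.
    static boolean isWithheld(ModuleProperty property) {
        return property.values == null || property.values.isEmpty();
    }

    // Builds a display string without ever assuming a value is present.
    static String describe(ModuleProperty property) {
        if (isWithheld(property)) {
            return property.name + " = <not returned by server>";
        }
        return property.name + " = " + String.join(",", property.values);
    }

    public static void main(String[] args) {
        ModuleProperty user = new ModuleProperty("username", Arrays.asList("admin"));
        ModuleProperty emptyPassword =
                new ModuleProperty("password", Collections.<String>emptyList());
        ModuleProperty nullPassword = new ModuleProperty("password", null);

        System.out.println(describe(user));
        System.out.println(describe(emptyPassword));
        System.out.println(describe(nullPassword));
    }
}
```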

