Call the CasCrawler.getCrawlConfig() method to retrieve the configuration settings of a crawl.
The syntax of the method is:
CasCrawler.getCrawlConfig(CrawlId crawlId, Boolean fillInDefaults)
Where:
crawlId - a CrawlId object that contains the name of the crawl whose configuration is to be returned.
fillInDefaults - a Boolean flag. If it is set to true, the method fills in the default value for any setting that has not been specified. For a password setting, it returns the setting's name but not its value. If it is set to false, the method returns the settings as they are stored and does not fill in any defaults.
If you retrieve a crawl configuration that contains a ModuleProperty for a password property, the value of that property is returned as a zero-length list.
The method returns a CrawlConfig object, which contains the following:
sourceConfig - a SourceConfig object that contains the seeds, the filters, and source-specific information about the systems from which content is fetched. Examples include CMS information or, for file system crawls, whether file properties from the native file system should be gathered.
manipulatorConfig - a list of ManipulatorConfig objects. Each ManipulatorConfig specifies a manipulation that is performed during the crawl.
textExtractionConfig - a TextExtractionConfig object that contains the text extraction options, such as whether text extraction is enabled and the number of retry attempts.
outputConfig - an OutputConfig object that contains the output options. These include whether records are written to a Record Store instance or to a record output file, the path of the output directory, and the output format (binary or XML).
crawlthreads - a property indicating the number of threads per crawl.
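To make the shape of the returned object concrete, the sketch below mirrors the five parts listed above as simplified Java records and summarizes one configuration. These records are hypothetical stand-ins and not the actual CAS classes. Their fields (for example writeToRecordStore and retryAttempts) and accessor names are assumptions chosen to match the descriptions, and the real classes have more members.

```java
import java.util.List;

// Simplified, hypothetical stand-ins for the objects a CrawlConfig contains.
// The real CAS classes have more members and different accessor names.
public class CrawlConfigModel {

    record SourceConfig(List<String> seeds, List<String> filters) {}

    record ManipulatorConfig(String manipulatorId) {}

    record TextExtractionConfig(boolean enabled, int retryAttempts) {}

    enum OutputFormat { BINARY, XML }

    // Records go either to a Record Store instance or to an output file.
    record OutputConfig(boolean writeToRecordStore, String outputDirectory,
                        OutputFormat format) {}

    record CrawlConfig(SourceConfig sourceConfig,
                       List<ManipulatorConfig> manipulatorConfig,
                       TextExtractionConfig textExtractionConfig,
                       OutputConfig outputConfig,
                       int crawlThreads) {}

    // Builds a one-line summary covering each part of the configuration.
    static String summarize(CrawlConfig c) {
        String target = c.outputConfig().writeToRecordStore()
                ? "Record Store"
                : "file in " + c.outputConfig().outputDirectory();
        return String.format(
                "seeds=%d manipulators=%d textExtraction=%s output=%s (%s) threads=%d",
                c.sourceConfig().seeds().size(),
                c.manipulatorConfig().size(),
                c.textExtractionConfig().enabled() ? "on" : "off",
                target,
                c.outputConfig().format(),
                c.crawlThreads());
    }

    public static void main(String[] args) {
        CrawlConfig demo = new CrawlConfig(
                new SourceConfig(List.of("/data/docs"), List.of()),
                List.of(),
                new TextExtractionConfig(true, 3),
                new OutputConfig(false, "/tmp/output", OutputFormat.XML),
                2);
        System.out.println(summarize(demo));
    }
}
```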
To get the configuration settings of a crawl:
1. Make sure that you have created a connection to the CAS Server. (This example uses a CasCrawler object named crawler.)
2. Set the name of the crawl by instantiating a CrawlId object with that name. For example:
   // Create a new crawl Id with the name set to Demo.
   CrawlId crawlId = new CrawlId("Demo");
3. Call the CasCrawler.getCrawlConfig() method with the crawl ID and the fillInDefaults flag. For example:
   CrawlConfig crawlConfig = crawler.getCrawlConfig(crawlId, true);
4. Process the returned CrawlConfig according to the requirements of your application.
The CasCrawler.getCrawlConfig() method throws a CrawlNotFoundException if the specified crawl (the crawlId parameter) does not exist or is otherwise not found. To handle this case, wrap the call in a try block with a catch clause for CrawlNotFoundException.
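The sketch below shows the try/catch pattern around the call, assuming that a missing crawl is an expected, recoverable condition. So that the example compiles without the CAS client libraries, CrawlId, CrawlConfig, CrawlNotFoundException, and the crawler are replaced by simplified local stand-ins. The InMemoryCrawler class, its map of crawls, and the fetchConfig helper are hypothetical. In an application, you would import the CAS API classes and call getCrawlConfig() on your connected CasCrawler instead.

```java
import java.util.Map;
import java.util.Optional;

public class GetCrawlConfigExample {

    // --- Simplified, hypothetical stand-ins for the CAS API types ---
    record CrawlId(String id) {}

    record CrawlConfig(CrawlId crawlId, int crawlThreads) {}

    static class CrawlNotFoundException extends Exception {
        CrawlNotFoundException(String message) {
            super(message);
        }
    }

    // Hypothetical in-memory replacement for a connected CasCrawler.
    static class InMemoryCrawler {
        private final Map<String, CrawlConfig> crawls;

        InMemoryCrawler(Map<String, CrawlConfig> crawls) {
            this.crawls = crawls;
        }

        CrawlConfig getCrawlConfig(CrawlId crawlId, Boolean fillInDefaults)
                throws CrawlNotFoundException {
            CrawlConfig config = crawls.get(crawlId.id());
            if (config == null) {
                throw new CrawlNotFoundException("No crawl named " + crawlId.id());
            }
            return config; // this stand-in ignores fillInDefaults
        }
    }

    // Returns the crawl's configuration, or an empty Optional if it is not found.
    static Optional<CrawlConfig> fetchConfig(InMemoryCrawler crawler, String name) {
        CrawlId crawlId = new CrawlId(name);
        try {
            return Optional.of(crawler.getCrawlConfig(crawlId, true));
        } catch (CrawlNotFoundException e) {
            // The crawl does not exist: report it and let the caller decide.
            System.err.println("Crawl not found: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        InMemoryCrawler crawler = new InMemoryCrawler(
                Map.of("Demo", new CrawlConfig(new CrawlId("Demo"), 2)));
        System.out.println(fetchConfig(crawler, "Demo").isPresent());
        System.out.println(fetchConfig(crawler, "Missing").isPresent());
    }
}
```

Returning an Optional keeps the "not found" case visible in the method signature. Rethrowing or propagating the exception is equally valid if a missing crawl should abort the caller.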
Note that for CMS crawls (which require a username and password), the server returns the password as a null value.
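A password may come back either as a zero-length list (for a password ModuleProperty) or as null (for a CMS crawl). Code that reads credentials from a retrieved configuration should therefore treat both cases as "value withheld" rather than as a blank password. The following is a minimal sketch that assumes the property values are available as a List<String>. The PasswordCheck class and its isWithheld method are hypothetical helpers and not part of the CAS API.

```java
import java.util.List;

// Hypothetical helper; not part of the CAS API.
public final class PasswordCheck {

    private PasswordCheck() {}

    // True when the server withheld the password value: the values are
    // either null (CMS crawls) or a zero-length list (password ModuleProperty).
    static boolean isWithheld(List<String> values) {
        return values == null || values.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isWithheld(null));
        System.out.println(isWithheld(List.of()));
        System.out.println(isWithheld(List.of("secret")));
    }
}
```

A check like this can, for example, signal that the application must supply the password again before it writes a modified configuration back to the server.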