Oracle Commerce Guided Search - Retrieving crawl configurations

Retrieving crawl configurations

Call the CasCrawler.getCrawlConfig() method to retrieve the configuration settings of a crawl.

The syntax of the method is:

CasCrawler.getCrawlConfig(CrawlId crawlId, Boolean fillInDefaults)

crawlId is a CrawlId object that contains the name of the crawl for which the configuration is to be returned.
fillInDefaults is a Boolean flag. If set to true, the method fills in the default value for any setting that has not been specified; for a password setting, true returns the name but not the value. If set to false, the method returns the settings without modifying any values.

If you retrieve a crawl configuration that contains a ModuleProperty for a password property, the value of that property is returned as a zero-length list.

The method returns a CrawlConfig object, which contains the following:

sourceConfig - a SourceConfig object that contains the seeds, filters, and specific information about the systems from which content is fetched, such as CMS information or whether file properties from the native file system should be gathered for file system crawls.
manipulatorConfig - a list of ManipulatorConfig objects. Each ManipulatorConfig specifies a manipulation that is performed in a particular crawl.
textExtractionConfig - a TextExtractionConfig object that contains the text extraction options, such as whether text extraction should be enabled and the number of retry attempts.
outputConfig - an OutputConfig object that contains the output options, such as whether the records are written to a Record Store instance or a record output file, the path of the output directory, and the output format (binary or XML).
crawlthreads - a property indicating the number of threads per crawl.
loggingLevel - a property indicating the logging level.

To get the configuration settings of a crawl:

Make sure that you have created a connection to the CAS Server. (A CasCrawler object named crawler is used in this example.)
Identify the crawl by instantiating a CrawlId object with the name of the crawl as its ID.
For example:
```
// Create a new crawl Id with the name set to Demo.
CrawlId crawlId = new CrawlId("Demo");
```
Call the CasCrawler.getCrawlConfig() method with the crawl ID and the fillInDefaults Boolean flag.
For example:
```
CrawlConfig crawlConfig = crawler.getCrawlConfig(crawlId, true);
```
Process the returned CrawlConfig according to the requirements of your application.

The CasCrawler.getCrawlConfig() method throws a CrawlNotFoundException if the specified crawl (the crawlId parameter) does not exist or is otherwise not found. To handle the exception, wrap the call in a try block with a catch clause for CrawlNotFoundException.
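The try/catch pattern can be sketched as follows. This is a minimal illustration, not the CAS client library itself: the nested CrawlId, CrawlConfig, CrawlNotFoundException, and CasCrawler classes are simplified stand-ins so that the example compiles on its own, and the addCrawl() and fetchConfig() helpers and the GetCrawlConfigSketch class name are hypothetical. In a real application, you would use the classes supplied with the CAS Server API and the crawler connection that you created earlier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class GetCrawlConfigSketch {

    // Stand-in for the CAS CrawlId class (assumption: only the name is modeled).
    static class CrawlId {
        private final String id;
        CrawlId(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Stand-in for the CAS CrawlConfig class (contents omitted in this sketch).
    static class CrawlConfig { }

    // Stand-in for the exception thrown when a crawl does not exist.
    static class CrawlNotFoundException extends Exception {
        CrawlNotFoundException(String message) { super(message); }
    }

    // Stand-in for the CasCrawler connection; it keeps crawls in memory.
    static class CasCrawler {
        private final Map<String, CrawlConfig> crawls = new HashMap<>();

        // Hypothetical helper used only to populate the stand-in.
        void addCrawl(String name) { crawls.put(name, new CrawlConfig()); }

        CrawlConfig getCrawlConfig(CrawlId crawlId, Boolean fillInDefaults)
                throws CrawlNotFoundException {
            CrawlConfig config = crawls.get(crawlId.getId());
            if (config == null) {
                throw new CrawlNotFoundException("Crawl not found: " + crawlId.getId());
            }
            return config;
        }
    }

    // Retrieves the configuration, or an empty Optional if the crawl is not found.
    static Optional<CrawlConfig> fetchConfig(CasCrawler crawler, String crawlName) {
        CrawlId crawlId = new CrawlId(crawlName);
        try {
            // true requests that unspecified settings be filled in with defaults.
            return Optional.of(crawler.getCrawlConfig(crawlId, true));
        } catch (CrawlNotFoundException e) {
            System.err.println("Could not retrieve configuration: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        CasCrawler crawler = new CasCrawler();
        crawler.addCrawl("Demo");
        System.out.println("Demo found: " + fetchConfig(crawler, "Demo").isPresent());
        System.out.println("Missing found: " + fetchConfig(crawler, "Missing").isPresent());
    }
}
```

Returning an Optional here is a design choice of the sketch, not of the API. An application could equally rethrow the exception or report the missing crawl to the user.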

Note that for CMS crawls (which require a username and password), the server returns the retrieved password as a null value.
