The CAS Server API allows users to build client programs that invoke the CAS Server to programmatically modify and control a variety of file system and CMS crawling operations.

The CAS Server Java Client (as coded in the CasServerSampleClient.java source file) demonstrates a number of basic crawling operations.

The CAS Server Java Client is intended to provide a working example of a client that communicates with a running CAS Server and issues file system crawling requests. The sample client program is therefore a template that you can use as a basis for your own client program.

The package includes all the libraries needed to build clients. It also includes an Ant build script that can compile and run the sample program, as well as Eclipse .project and .classpath files for the sample client.

The sample client application creates a file system crawl, runs a short probe crawl, updates the crawl to add wildcard filters and enable text extraction, runs a full crawl, and then deletes the crawl. Each of these operations is examined in detail later in this section.

Note that a default time limit of 10 seconds is set on both crawls, which means that in most cases the crawl output will not contain all the files on your file system.

The output files are written to the workspace/output/SampleClientTestCrawl directory, using a non-compressed XML file format. You can use a text editor to view the contents of the output.

The Ant build.xml file provides targets that compile and run the sample client program.

As with any Ant build file, you can run a specific target or run them all at once. Before starting Eclipse, you should run at least the compile target so that Eclipse can find the generated Web service stubs.

The file has the following targets:

To run the Ant build script:

The demo file system crawl (named SampleClientTestCrawl) will use C:\ on Windows and / on UNIX as the seed. When the demo crawl finishes, the CAS Service's workspace/output/SampleClientTestCrawl directory should contain two XML-format output files: CrawlerOutput-FULL.xml will have the content of the second crawl (i.e., the updated crawl with file filters), while the timestamped file in the archive directory will have the content from the first crawl.

Assuming that you have opened the CasServerSampleClient.java source in Eclipse or another editor, you should note certain important operations of the Main class.

  1. The values for the host and port of the CAS Service are set by first reading the commandline.properties file. If they are not set, the defaults of localhost and 8500 are used.

    String host = System.getProperty(CAS_HOST_PROPERTY);
    // The port is read the same way; the constant name is assumed to parallel CAS_HOST_PROPERTY.
    String port = System.getProperty(CAS_PORT_PROPERTY);
    if (host == null || "".equals(host)) {
        host = "localhost";
    }
    if (port == null || "".equals(port)) {
        port = "8500";
    }
  2. Arguments are created for the WSDL URL (the service definition interface) and the QName.

    final URL wsdlUrl = new URL("http://" + host + ":" + port + "/cas?wsdl");
    final QName name = new QName("http://endeca.com/itl/cas/2011-12", "CasCrawlerService");
  3. Using the WSDL URL and QName values, create a Web service locator and then use the CasCrawlerService.getCasCrawlerPort() method to get a handle to the CAS Service port.

    CasCrawlerService service = new CasCrawlerService(wsdlUrl, name);
    CasCrawler crawler = service.getCasCrawlerPort();
  4. Using a CrawlId object, set the name of the crawl.

    CrawlId crawlId = new CrawlId();
    crawlId.setId("SampleClientTestCrawl");
  5. Using the sampleCreateCrawl method, create the new file system crawl. Text extraction is not enabled, which means that a probe crawl will be run. Note that the CasCrawler.createCrawl() method actually creates the crawl.

    System.out.println("Creating Crawl with CrawlId '" + crawlId.getId() + "' ...");
    sampleCreateCrawl(crawler, crawlId);
  6. Using the sampleRunFullCrawl method, run the probe crawl, specifying a maximum of 10 seconds for the crawl duration. The CasCrawler.startCrawl() method is used to start the crawl, and the CasCrawler.stopCrawl() method is used to stop the crawl after 10 seconds have elapsed. (A rough sketch of this start/stop pattern appears after this list.)

    System.out.println("Running probe crawl...");
    sampleRunFullCrawl(crawler, crawlId, 10);
  7. Using the sampleUpdateCrawlAddingFiltersAndTextExtraction method, enable text extraction and set wildcard (htm*) filters that are evaluated against the Endeca.FileSystem.Extension record property. The original crawl configuration is retrieved with the CasCrawler.getCrawlConfig() method, and the updated configuration is sent to the CAS Server with the CasCrawler.updateConfig() method. (A rough sketch of this get/update pattern also appears after this list.)

    System.out.println("Adding filters and enabling text extraction...");
    sampleUpdateCrawlAddingFiltersAndTextExtraction(crawler, crawlId);
  8. Using the sampleRunFullCrawl method, run a second full crawl that does text extraction and uses the added filters. As with the previous crawl, a maximum of 10 seconds is specified for the crawl duration.

    System.out.println("Running full crawl...");
    sampleRunFullCrawl(crawler, crawlId, 10);
  9. Using the sampleDeleteCrawl method, delete the SampleClientTestCrawl demo crawl. Note that the CasCrawler.deleteCrawl() method actually deletes the crawl.

    System.out.println("Deleting crawl...");
    sampleDeleteCrawl(crawler, crawlId);
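The helper methods used in these steps are thin wrappers around CasCrawler calls. The following sketch shows the rough shape of sampleRunFullCrawl (step 6) and sampleUpdateCrawlAddingFiltersAndTextExtraction (step 7). The method names startCrawl(), stopCrawl(), getCrawlConfig(), and updateConfig() come from the steps above; the CrawlMode argument, the CrawlConfig type name, and the omitted filter-building details are assumptions, so treat this as an illustration rather than a copy of the sample source.

    // Rough shape of sampleRunFullCrawl: start the crawl in full mode, wait for
    // the time limit, then stop it. (The CrawlMode argument is an assumption.)
    crawler.startCrawl(crawlId, CrawlMode.FULL_CRAWL);
    Thread.sleep(10 * 1000L); // the 10-second limit used by the demo
    crawler.stopCrawl(crawlId);

    // Rough shape of sampleUpdateCrawlAddingFiltersAndTextExtraction: fetch the
    // current configuration, modify it, and send it back to the CAS Server.
    // (The CrawlConfig type name is an assumption based on getCrawlConfig().)
    CrawlConfig config = crawler.getCrawlConfig(crawlId);
    // ... enable text extraction and add an htm* wildcard filter on the
    // Endeca.FileSystem.Extension record property (details omitted) ...
    crawler.updateConfig(config);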

The sample client program also shows the use of other CAS Server API functions, such as the CasCrawler.listCrawls(), CasCrawler.getStatus(), and CasCrawler.getMetrics() methods.
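For example, a minimal sketch of these status-oriented calls might look like the following; it assumes listCrawls() returns a list of CrawlId objects and that getStatus() and getMetrics() each take a CrawlId (the Status and Metrics type names shown here are assumptions).

    // List every crawl defined on the CAS Server.
    for (CrawlId id : crawler.listCrawls()) {
        System.out.println("Found crawl: " + id.getId());
    }

    // Fetch and print the status and metrics of the demo crawl.
    Status status = crawler.getStatus(crawlId);
    Metrics metrics = crawler.getMetrics(crawlId);
    System.out.println("Status: " + status + ", metrics: " + metrics);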

You can modify the file and add other crawling operations, such as changing the output options (to send output to a Record Store instance), adding other types of filters (including date and regex filters), enabling archive expansion, and even returning information about the CAS Server. You can also use the sample code as a basis for creating and running CMS crawls.

These operations, and the methods that call them, are described elsewhere in this guide.

