The CAS Server API allows users to build client programs that invoke the CAS Server to programmatically modify and control a variety of file system and CMS crawling operations.
This topic describes the contents of the CAS Server Java Client directory.
The CAS Server Java Client (in the /sample directory) has the following directory structure:

/cas-server-java-client
   /lib
   /src
   .classpath
   .project
   build.xml
The contents are as follows:

lib – Contains the Java libraries for the CAS Server Java Client application.
src – Contains the Java source file for the CAS Server Java Client application.
.project – The Eclipse project file for the cas-server-java-client project.
build.xml – The Ant build file for the CAS Server Java Client application.
The CAS Server Java Client (as coded in the CasServerSampleClient.java source file) demonstrates a number of basic crawling operations.
The CAS Server Java Client is intended to provide a working example of a client that communicates with a running CAS Server and issues file system crawling requests. The sample client program is therefore a template that you can use as a basis for your own client program.
The package includes all the libraries needed to build clients. It also includes an Ant build script (that can compile and run the sample program), as well as Eclipse .project and .classpath files for the sample client.
The sample client application performs the following actions:

1. Creates a new file system crawl (named SampleClientTestCrawl), with the current working directory of the sample client (.\ on Windows or ./ on UNIX) as the seed.
2. Runs a full crawl. Because text extraction is not yet enabled, this first run is a probe crawl.
3. Updates the crawl configuration by adding file filters and enabling document conversion.
4. Runs a second full crawl, this time using the new filters and extracting text from documents.
Note that a default time limit of 10 seconds is set on both crawls, which means that in most cases the crawl output will not contain all the files on your file system.
The output files are written to the workspace/output/SampleClientTestCrawl directory, using a non-compressed XML file format. You can use a text editor to view the contents of the output.
The Ant build.xml file can compile and run the sample client program. As with any Ant build file, you can run a specific target or run them all at once. Before starting Eclipse, you should run at least the compile target so that Eclipse can find the generated Web service stubs. The file defines several targets, including compile and run-demo.
To run the Ant build script:
From a command prompt, navigate to the cas-server-java-client directory and issue the following command to compile and run the sample client demo:

   ant run-demo [--host <host name>] [--port <port number>]

Note: You can issue the ant compile command if you just want to compile (but not run) the sample client program.
The demo file system crawl (named SampleClientTestCrawl) will use C:\ on Windows or / on UNIX as the seed. When the demo crawl finishes, the CAS Service's workspace/output/SampleClientTestCrawl directory should contain two XML-format output files: CrawlerOutput-FULL.xml will have the content of the second crawl (that is, the updated crawl with file filters), while the timestamped file in the archive directory will have the content from the first crawl.
If you use Eclipse for your projects, the sample client package includes Eclipse .project and .classpath files that you can use to load the sample client project into your workspace.
Assuming that you have opened the CasServerSampleClient.java source in Eclipse or another editor, you should note the following important operations of the Main class.
The values for the host and port of the CAS Service are set by first reading the commandline.properties file. If they do not exist, defaults of localhost and 8500 are used:

   String host = System.getProperty(CAS_HOST_PROPERTY);
   String port = System.getProperty(CAS_PORT_PROPERTY);
   if (host == null || "".equals(host)) {
       host = "localhost";
   }
   if (port == null || "".equals(port)) {
       port = "8500";
   }
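As a standalone illustration of this lookup-with-fallback pattern, the sketch below loads an optional properties file and falls back to the documented defaults. The file name commandline.properties comes from the text above, but the property keys (cas.host, cas.port) are illustrative assumptions, not the sample's actual constants.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class HostPortDefaults {
    /** Return the value, or the fallback when the value is missing or empty. */
    public static String orDefault(String value, String fallback) {
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // commandline.properties is optional; a missing file just means
        // the defaults below are used.
        try (FileInputStream in = new FileInputStream("commandline.properties")) {
            props.load(in);
        } catch (IOException ignored) { }

        // "cas.host" and "cas.port" are hypothetical key names for this sketch.
        String host = orDefault(props.getProperty("cas.host"), "localhost");
        String port = orDefault(props.getProperty("cas.port"), "8500");
        System.out.println(host + ":" + port);
    }
}
```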
Arguments are created for the WSDL URL (the service definition interface) and the QName:

   final URL wsdlUrl = new URL("http://" + host + ":" + port + "/cas?wsdl");
   final QName name = new QName("http://endeca.com/itl/cas/2011-12", "CasCrawlerService");
Using the WSDL URL and QName values, create a Web service locator, and then use the CasCrawlerService.getCasCrawlerPort() method to get a handle to the CAS Service port:

   CasCrawlerService service = new CasCrawlerService(wsdlUrl, name);
   CasCrawler crawler = service.getCasCrawlerPort();
Using a CrawlId object, set the name of the crawl:

   CrawlId crawlId = new CrawlId();
   crawlId.setId("SampleClientTestCrawl");
Using the sampleCreateCrawl method, create the new file system crawl. Text extraction is not enabled, which means that a probe crawl will be run. Note that the CasCrawler.createCrawl() method actually creates the crawl:

   System.out.println("Creating Crawl with CrawlId '" + crawlId.getId() + "' ...");
   sampleCreateCrawl(crawler, crawlId);
Using the sampleRunFullCrawl method, run the probe crawl, specifying a maximum of 10 seconds for the crawl duration. The CasCrawler.startCrawl() method is used to actually start the crawl, and then the CasCrawler.stopCrawl() method is used to stop the crawl after 10 seconds have elapsed:

   System.out.println("Running probe crawl...");
   sampleRunFullCrawl(crawler, crawlId, 10);
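The start-wait-stop pattern described here can be sketched as a small helper. The Crawler interface below is a local stand-in for the generated CasCrawler stub (the real stub's methods take richer argument types), and a production client would poll the crawl status rather than sleep unconditionally; this is a minimal illustration only.

```java
import java.util.concurrent.TimeUnit;

/** Stand-in for the generated CasCrawler stub (illustrative only). */
interface Crawler {
    void startCrawl(String crawlId);
    void stopCrawl(String crawlId);
}

public class TimedCrawl {
    /** Start the crawl, wait up to maxSeconds, then stop it. */
    public static void runWithLimit(Crawler crawler, String crawlId, int maxSeconds) {
        crawler.startCrawl(crawlId);
        try {
            // A real client would poll getStatus() instead of sleeping blindly.
            TimeUnit.SECONDS.sleep(maxSeconds);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        crawler.stopCrawl(crawlId);
    }

    public static void main(String[] args) {
        // Demonstrate the call order with a fake crawler and a zero-second limit.
        Crawler fake = new Crawler() {
            public void startCrawl(String id) { System.out.println("start " + id); }
            public void stopCrawl(String id)  { System.out.println("stop " + id); }
        };
        runWithLimit(fake, "SampleClientTestCrawl", 0);
    }
}
```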
Using the sampleUpdateCrawlAddingFiltersAndTextExtraction method, enable text extraction and set wildcard (htm*) filters that are evaluated against the Endeca.FileSystem.Extension record property. The original crawl configuration is retrieved with the CasCrawler.getCrawlConfig() method, and the updated configuration is sent to the CAS Server with the CasCrawler.updateConfig() method:

   System.out.println("Adding filters and enabling text extraction...");
   sampleUpdateCrawlAddingFiltersAndTextExtraction(crawler, crawlId);
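As a side note on the wildcard semantics: a filter such as htm* matches any extension that begins with htm (both htm and html, for example). The standalone sketch below illustrates that matching behavior by translating the wildcard into a regular expression; this translation is purely illustrative, since the actual filter evaluation happens inside the CAS Server.

```java
import java.util.regex.Pattern;

public class WildcardFilterDemo {
    /** Match a simple '*' wildcard pattern against a value (illustration only). */
    public static boolean wildcardMatches(String pattern, String value) {
        // Quote the literal parts and turn each '*' into ".*".
        String regex = Pattern.quote(pattern).replace("*", "\\E.*\\Q");
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        // "htm*" matches both the "htm" and "html" extensions, but not "txt".
        System.out.println(wildcardMatches("htm*", "htm"));   // true
        System.out.println(wildcardMatches("htm*", "html"));  // true
        System.out.println(wildcardMatches("htm*", "txt"));   // false
    }
}
```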
Using the sampleRunFullCrawl method, run a second full crawl that does text extraction and uses the added filters. As with the previous crawl, a maximum of 10 seconds is specified for the crawl duration:

   System.out.println("Running full crawl...");
   sampleRunFullCrawl(crawler, crawlId, 10);
Using the sampleDeleteCrawl method, delete the SampleClientTestCrawl demo crawl. Note that the class uses the CasCrawler.deleteCrawl() method to actually delete the crawl:

   System.out.println("Deleting crawl...");
   sampleDeleteCrawl(crawler, crawlId);
The sample client program also shows the use of other CAS Server API functions, such as the CasCrawler.listCrawls(), CasCrawler.getStatus(), and CasCrawler.getMetrics() methods.
You can modify the file and add other crawling operations, such as changing the output options (to send output to a Record Store instance), adding other types of filters (including date and regex filters), enabling archive expansion, and even returning information about the CAS Server. You can also use the sample code as a basis for creating and running CMS crawls.
These operations, and the methods that call them, are described elsewhere in this guide.