Developing Portlets and Integration Web Services: Crawlers and Search Services  

Deploying Custom Crawlers

After coding your crawler as explained on the previous pages, you must deploy your code. For details on configuring the necessary objects in the portal, see the next page (Configuring Custom Crawlers).

Deploying Crawler Code

The deployment process is different for Java and .NET. This page summarizes the deployment process for crawlers. For detailed information on deployment, see Deploying Web Services.

Java

Follow the instructions below to deploy Java crawlers.

  1. Compile the class that implements the EDK interface and copy the entire package structure to the appropriate location in your Web application (usually the \WEB-INF\classes directory).

  2. Update the web.xml file in the WEB-INF directory by adding the class to the appropriate *Impl keys. For example, for a crawler, add your class to ContainerProviderImpl and DocumentProviderImpl as shown below.

...
<env-entry>
<env-entry-name>ContainerProviderImpl</env-entry-name>
<env-entry-value>com.plumtree.remote.crawler.helloworld.CrawlContainer</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>

<env-entry>
<env-entry-name>DocumentProviderImpl</env-entry-name>
<env-entry-value>com.plumtree.remote.crawler.helloworld.CrawlDocument</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
...

Note: The *Impl key in the web.xml file must reference the fully-qualified name of the class.
For crawlers, you must enter the names of both provider classes required by the service. If the service uses SCI, you must also enter the fully-qualified name of the appropriate implementation of the IAdminEditor interface.

  3. Start your application server. (In most cases, you must restart your application server after copying a file.)

  4. Test the directory by opening the following page in a Web browser: http://<hostname:port>/edk/services/<servicetype>ProviderSoapBinding (e.g., http://localhost:8080/edk/services/ContainerProviderSoapBinding and http://localhost:8080/edk/services/DocumentProviderSoapBinding). The browser should display the following message: "Hi there, this is an AXIS service! Perhaps there will be a form for invoking the service here..."

  5. Configure the new service in the portal:

    1. Open the portal in a browser and navigate to the Administration folder where you want the new service to be stored.

    2. Click Create Object... and choose the appropriate Web Service type from the menu.

    3. On the Main Page of the Web Service Editor, enter the service provider URLs from Step 4. For SCI pages, enter the Service Configuration page URL(s) on the Advanced URLs page of the Web Service Editor. (URLs are relative to the Remote Server.)

  6. If the service uses DocFetch, see Deploying DocFetch Code below.
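The browser test in the steps above can also be scripted. The following is a minimal sketch, not part of the EDK: it assumes the URL pattern given in the test step (http://<hostname:port>/edk/services/<servicetype>ProviderSoapBinding) and simply looks for the Axis greeting text in the response. The class name SoapBindingCheck and its helper methods are hypothetical names chosen for this example, and the default host localhost:8080 is taken from the sample URLs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of the EDK): smoke-tests the crawler SOAP bindings.
public class SoapBindingCheck {

    // Builds the binding URL from the pattern described in the test step.
    public static String bindingUrl(String hostAndPort, String serviceType) {
        return "http://" + hostAndPort + "/edk/services/" + serviceType + "ProviderSoapBinding";
    }

    // True if the body contains the greeting that Axis displays for a deployed service.
    public static boolean looksLikeAxisGreeting(String body) {
        return body != null && body.contains("this is an AXIS service");
    }

    // Fetches the URL with a plain HTTP GET and returns the response body as text.
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Assumption: host and port default to the values used in the sample URLs.
        String hostAndPort = args.length > 0 ? args[0] : "localhost:8080";
        for (String type : new String[] {"Container", "Document"}) {
            String url = bindingUrl(hostAndPort, type);
            try {
                String status = looksLikeAxisGreeting(fetch(url)) ? "OK" : "unexpected response";
                System.out.println(url + " -> " + status);
            } catch (IOException e) {
                System.out.println(url + " -> failed: " + e.getMessage());
            }
        }
    }
}
```

Run it with the host and port of your application server as the only argument (for example, java SoapBindingCheck myserver:8080). A "failed" or "unexpected response" line usually means the application server was not restarted or the class names in web.xml are incorrect.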

.NET

Add a line to the deployment file (web.config) that specifies the fully qualified name of the class. For a crawler, you must enter values for these parameters, as shown in the code that follows:

o ContainerProviderImpl

o DocumentProviderImpl

o ContainerProviderAssembly

o DocumentProviderAssembly

...
<appSettings>
<add key="ContainerProviderAssembly" value="CompanyStoreCWS"/>
<add key="ContainerProviderImpl" value="Plumtree.CompanyStore.CWS.CompanyStoreContainer"/>
<add key="DocumentProviderAssembly" value="CompanyStoreCWS"/>
<add key="DocumentProviderImpl" value="Plumtree.CompanyStore.CWS.CompanyStoreDocument"/>
...

In addition, you must deploy DocFetch and SCI if they are used by your crawler.

Deploying DocFetch Code

Because the DocFetch mechanism uses a separately deployed servlet or server page to handle click-through requests, deployment is slightly different and requires some extra steps. (In many cases, DocFetch is not used; for details, see Accessing Secured Content.)

Java

Deploy the associated crawler as explained above. Open the WEB-INF\web.xml file and add the fully-qualified name of your class in the DocFetchProvider initialization parameter, as shown in the code that follows:

...
<servlet>
<servlet-name>DocFetch</servlet-name>
<servlet-class>com.plumtree.remote.docfetch.DocFetch</servlet-class>

<!-- Modify the param-value below to reference your class -->
<init-param>
<param-name>DocFetchProvider</param-name>
<param-value>com.mycompany.MyDocFetchProvider</param-value>
</init-param>

</servlet>
...
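Because a mistyped class name in web.xml typically surfaces only at run time, it can help to check the descriptor before restarting the server. The following is a minimal sketch, not part of the EDK: it reads the *Impl env-entry values and the DocFetchProvider init-param using the standard Java XML parser and reports whether each class can be found on the classpath. The class name WebXmlClassCheck and its methods are hypothetical, and the default path WEB-INF/web.xml is an assumption about where the program is run from.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of the EDK): lists the provider classes named in web.xml.
public class WebXmlClassCheck {

    // Returns the trimmed text of the first descendant element with the given tag, or null.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Collects key -> class name for every *Impl env-entry and the DocFetchProvider init-param.
    public static Map<String, String> configuredClasses(InputStream webXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve any external DTD to an empty document so parsing does not need network access.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(webXml);

        Map<String, String> result = new LinkedHashMap<>();
        NodeList entries = doc.getElementsByTagName("env-entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String name = childText(entry, "env-entry-name");
            if (name != null && name.endsWith("Impl")) {
                result.put(name, childText(entry, "env-entry-value"));
            }
        }
        NodeList params = doc.getElementsByTagName("init-param");
        for (int i = 0; i < params.getLength(); i++) {
            Element param = (Element) params.item(i);
            if ("DocFetchProvider".equals(childText(param, "param-name"))) {
                result.put("DocFetchProvider", childText(param, "param-value"));
            }
        }
        return result;
    }

    // True if the named class can be located on the current classpath (without initializing it).
    public static boolean isLoadable(String className) {
        if (className == null) {
            return false;
        }
        try {
            Class.forName(className, false, WebXmlClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: run from the Web application root unless a path is supplied.
        String path = args.length > 0 ? args[0] : "WEB-INF/web.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            for (Map.Entry<String, String> e : configuredClasses(in).entrySet()) {
                String status = isLoadable(e.getValue()) ? "found" : "NOT on classpath";
                System.out.println(e.getKey() + " = " + e.getValue() + " (" + status + ")");
            }
        }
    }
}
```

For the check to be meaningful, run it with WEB-INF\classes and the EDK libraries on the classpath, so that it sees the same classes the application server will see.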

.NET

Add a line to the deployment file (web.config) that specifies the fully qualified name of your class and the associated assembly (DocFetchImpl and DocFetchAssembly).

You must also add three more parameters to the web.config deployment descriptor (DocFetchURL, IndexFilePath, and IndexURLPrefix), as shown in the code that follows:

...
<appSettings>
<add key="DocFetchAssembly" value="MyDocFetch" />
<add key="DocFetchImpl" value="com.mycompany.MyDocFetchProvider" />
<add key="DocFetchURL" value="iis/docfetch.aspx"/>
<add key="IndexFilePath" value="D:\\root\\config\\mydomain"/>
<add key="IndexURLPrefix" value="http://yourhost/IISVirtualDirectory"/>
...
