Implementing Content Crawler Click-Through

The content crawler's click-through implementation must return content in a readable format and set the content type and file name using the appropriate headers.

The following example uses a file, but the crawled resource could be any type of content. If the content is not in a file, the click-through servlet should create a representation with as little extraneous information as possible in a temporary file (for example, for a database, you would retrieve the record and transform it to HTML). See Creating Temporary Files for Indexing. You can also use the Oracle WebCenter Interaction Development Kit (IDK) DocFetch mechanism to handle indexing and click-through; see Implementing Content Crawler DocFetch.

Create the clickThroughServlet, and add a mapping in web.xml.
Complete the implementation of IDocument.getMetaData. Set the ClickThroughURL value to a URL constructed using the following steps:
1. Construct the base URL of the application using the same approach as in the index servlet.
2. Add the servlet mapping to the clickThroughServlet.
3. Add any query string parameters required to access the document from the clickThroughServlet (or aspx page). Remember: The click-through page will have access to Content Source parameters (as administrative preferences), but no access to content crawler settings.
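The three steps above can be sketched as a small helper. This is only an illustration: the class name ClickThroughUrlBuilder, the base URL, the servlet mapping, and the parameter names are assumptions chosen for the example and are not part of the IDK. The important detail is that every query string value is URL-encoded, so file paths containing spaces or backslashes survive the round trip.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper (not an IDK class) that assembles a click-through URL.
public class ClickThroughUrlBuilder {

    // Joins the application base URL, the servlet mapping, and the
    // URL-encoded query string parameters into a single URL string.
    public static String build(String baseUrl, String servletMapping,
                               Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);

        // Make sure exactly one slash separates base URL and mapping.
        boolean baseHasSlash = baseUrl.endsWith("/");
        boolean mappingHasSlash = servletMapping.startsWith("/");
        if (baseHasSlash && mappingHasSlash) {
            url.setLength(url.length() - 1);
        } else if (!baseHasSlash && !mappingHasSlash) {
            url.append('/');
        }
        url.append(servletMapping);

        // Append each parameter, encoding both name and value.
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator)
               .append(encode(entry.getKey()))
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting string is what you would store as the click-through URL in the document metadata. Because content crawler settings are not available at click-through time, anything the servlet needs beyond the Content Source parameters has to travel in this query string.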
To authenticate to the back-end resource, you can use basic authentication, User Preferences, User Info, or credentials from the Content Source. Below are suggestions for each; security will need to be tailored to your content crawler.
- Use Basic Authentication to reuse the credentials that the user supplied to log in to the portal. For example, if the portal uses AD credentials, Basic Auth could be used to access NT files.
- Use (encrypted) User Preferences if the authentication source is different from the one used to log in to the portal, for example, if the portal login uses IPlanet but you need to access an NT or Documentum file.
- Use (encrypted) User Info if the encrypted credentials are stored in another profile source and imported using a profile job.
- Use Content Source credentials when there are a limited number of connections, for example with a database.
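For the Basic Authentication option, the credentials travel in a standard HTTP Authorization header. The sketch below shows how such a header value is built and decoded using only the JDK; the class and method names are hypothetical, and how the header reaches your servlet depends on how the portal and gateway are configured, which is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not an IDK class) for HTTP Basic credentials.
public class BasicAuthHeader {

    // Builds the value of an "Authorization" header for Basic authentication.
    public static String headerValue(String user, String password) {
        String token = user + ":" + password;
        byte[] bytes = token.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    // Decodes a Basic "Authorization" header value into {user, password}.
    // Returns null if the value is missing or is not a Basic credential.
    public static String[] parse(String headerValue) {
        if (headerValue == null || !headerValue.startsWith("Basic ")) {
            return null;
        }
        byte[] decoded = Base64.getDecoder().decode(headerValue.substring(6).trim());
        String token = new String(decoded, StandardCharsets.UTF_8);

        // The user name cannot contain a colon, so split at the first one.
        int colon = token.indexOf(':');
        if (colon < 0) {
            return null;
        }
        return new String[] { token.substring(0, colon), token.substring(colon + 1) };
    }
}
```

Basic credentials are only encoded, not encrypted, so this option is normally appropriate only over a secured connection.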
Extract the parameters from the query string as required.

Display the page.

If there is already an HTML representation of the page, authenticate to the page. If the site is using basic authentication and you are using basic authentication headers, simply redirect to that page. If the site is using basic authentication and you are not using basic authentication, users must log in unless the site and the portal are using the same SSO solution. If the site is using form-based authentication, post to the site and follow the redirect.

If there is not an HTML representation of the page, retrieve the resource and stream it out to the client as shown in the sample code below (Java). If you use a temporary file, put the code in a try-catch-finally block, and delete the file in the finally block.

//imports needed at the top of the servlet source file
import java.io.File;
import java.io.FileInputStream;
import javax.servlet.ServletOutputStream;

//get the content type, passed as a query string parameter
String contentType = request.getParameter("contentType");

//if this is a file, get the file name
String filename = request.getParameter("filename");

//set the content type on the response
response.setContentType(contentType);

//set the content disposition header to tell the browser the file name
//(the name is quoted so that names containing spaces are handled correctly)
response.setHeader("Content-Disposition", "inline; filename=\"" + filename + "\"");

//set the header that tells the gateway to stream this through the gateway
response.setHeader("PTGW-Streaming", "Yes");

//get the content - for a file, get a file input stream based on the path (shown below)
//other repositories may simply provide an input stream
//NOTE: this code contains no error checking
String filePath = request.getParameter("filePath");
File file = new File(filePath);
FileInputStream fileStream = new FileInputStream(file);

//create a byte buffer for reading the file in 40k chunks
int BUFFER_SIZE = 40 * 1024;
byte[] buf = new byte[BUFFER_SIZE];

ServletOutputStream out = response.getOutputStream();

//write each chunk to the response body until the input stream returns -1
//(testing before the first write avoids passing -1 to out.write for an empty file)
int bytesRead;
while ((bytesRead = fileStream.read(buf)) != -1)
{
    out.write(buf, 0, bytesRead);
}

//release the file handle once the body has been written
fileStream.close();
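When the content comes from a temporary file, the same copy loop can be wrapped so that the file is always removed afterwards. The following is a minimal sketch that uses only JDK stream types so it can run outside a servlet container; the class and method names are hypothetical, and in a servlet you would pass response.getOutputStream() as the output stream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper (not an IDK class) that streams a temporary file
// to an output stream and deletes the file when finished.
public class TempFileStreamer {

    private static final int BUFFER_SIZE = 40 * 1024;

    // Copies the file to the output stream in 40k chunks and returns the
    // number of bytes written. The file is deleted even if the copy fails.
    public static long streamAndDelete(File tempFile, OutputStream out)
            throws IOException {
        long total = 0;
        try {
            // try-with-resources closes the input stream before the delete.
            try (InputStream in = new FileInputStream(tempFile)) {
                byte[] buf = new byte[BUFFER_SIZE];
                int bytesRead;
                while ((bytesRead = in.read(buf)) != -1) {
                    out.write(buf, 0, bytesRead);
                    total += bytesRead;
                }
            }
            out.flush();
            return total;
        } finally {
            // Fall back to deletion at JVM exit if the immediate delete fails.
            if (!tempFile.delete()) {
                tempFile.deleteOnExit();
            }
        }
    }
}
```

Closing the input stream before deleting matters on platforms such as Windows, where an open file handle prevents deletion.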

Parent topic: About Content Crawler Click-Through

Oracle WebCenter Interaction Web Service Development Guide