Sun Java System Web Server 7.0 Update 7 Administrator's Guide

Unsupported Formats

The following file formats are not supported for searching and indexing:

  1. MSWORD.



However, customers can still download the relevant filter plugins from Nutch 1.0 and place them in the appropriate directories. Note that this configuration is not supported.

Here are the steps:

  1. Download the parse-msexcel, parse-msword, and parse-mspowerpoint plugins, copy them into the <INSTALL_ROOT>/lib/htmlconvert/plugins directory, and update htmlconvert/conf/nutch-site.xml to include these plugins.


  2. Download lib-jakarta-poi from Nutch 1.0 and replace the existing libraries.
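
As an illustration of the nutch-site.xml update in step 1, the following fragment is a minimal sketch. It assumes that htmlconvert reads the standard Nutch plugin.includes property, whose value is a regular expression matched against plugin directory names. The parse-(text|html|...) portion shown here is an assumption, not the value shipped with the product: keep whatever entries your existing configuration already lists and add only the three Microsoft Office parser names.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Assumed example value: retain the entries already present in your
         file and append msexcel, msword, and mspowerpoint to the parse group. -->
    <value>parse-(text|html|msexcel|msword|mspowerpoint)</value>
  </property>
</configuration>
```

If the file already defines plugin.includes, edit that existing property rather than adding a second one.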