Oracle Internet File System Developer's Guide
Release 1.1

A75172-04


5
Using Parsers

This chapter covers the following topics:

What Is a Parser?

A parser is a Java class that extracts attributes from a local file and stores the information in the repository. More specifically, in the case of a document, a parser:

Note:

Significant improvements have been made to the XML parsing capabilities of Oracle iFS for version 1.1. For more information, see What's New in Oracle iFS 1.1.

Standard Oracle iFS Parsers vs. Custom Parsers

Whether your application requires a custom parser depends upon the format of the documents produced by the application:


Document Format Produced    Parser Options
Standard XML documents      Use the Oracle iFS standard XML parser out-of-the-box.
Custom format               Write a custom parser.

Whether or not you must explicitly invoke a parser depends on how the documents produced by your application are entered into the Oracle Internet File System:

If your application defines a custom class that produces documents in a special format that is not XML, you will need to create a custom parser using the classes and methods provided as part of the Oracle iFS Java API. This custom parser will create Oracle iFS repository objects of your custom class. For example, assume you have defined a Memo class that subclasses the Document class. The Memo class includes the following custom attributes: To, From, Date, and Text (the content of the memo). To store Memo objects in Oracle iFS requires a parser. If the Memo documents are in XML, you can use the Oracle iFS SimpleXmlParser to extract the attributes. If the Memo documents are stored in a special format, you will need to create a custom parser and specify how it is to extract the attributes.
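
As a rough illustration of what "specifying how to extract the attributes" can involve, the following sketch pulls the To, From, Date, and Text values out of a memo stored in a hypothetical plain-text layout: header lines of the form Name: value, then a blank line, then the memo body. The layout and the MemoFieldExtractor class are assumptions made for this example. The sketch uses only the standard Java library and does not call the Oracle iFS API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Oracle iFS API.
class MemoFieldExtractor {

    // Extracts the header fields and the body text from a memo string.
    static Map<String, String> extract(String memo) {
        Map<String, String> attributes = new LinkedHashMap<>();
        String[] lines = memo.split("\\r?\\n");
        int i = 0;
        // Header section: "Name: value" lines up to the first blank line.
        for (; i < lines.length && !lines[i].trim().isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                attributes.put(lines[i].substring(0, colon).trim(),
                               lines[i].substring(colon + 1).trim());
            }
        }
        // Body section: everything after the blank line becomes the Text value.
        StringBuilder text = new StringBuilder();
        for (i++; i < lines.length; i++) {
            text.append(lines[i]).append('\n');
        }
        attributes.put("Text", text.toString().trim());
        return attributes;
    }

    public static void main(String[] args) {
        String memo = "To: Pat\nFrom: Lee\nDate: 2000-06-01\n\nBudget review is on Friday.\n";
        System.out.println(extract(memo));
    }
}
```

In a real custom parser, the extracted values would be set as attributes on the definition of the Memo object rather than returned in a Map.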

Using the Standard Parsers

Out-of-the-box, Oracle iFS includes several standard parsers that will meet most needs of developers creating new applications in Oracle iFS.

The following table lists the Oracle iFS standard parser classes.


Class                   Description
SimpleXmlParser         Creates an object in the Oracle iFS repository from an XML document body. Used as the default parser for all XML documents stored in Oracle iFS. SimpleXmlParser extends XmlParser.
XmlParser               A base class for custom XML parser development.
ClassSelectionParser    Adds custom attributes to all files of a specified format. Performs no actual parsing.

Parsing Options

To understand the parsing options provided by Oracle iFS, consider XML files in three categories:

XML files meant to configure Oracle iFS

XML files to be parsed

XML files to be stored without parsing

Using the SimpleXmlParser

If your customization requirements are minimal, you can define a custom class using XML to add custom attributes. Once you create the type definition file, including any custom attributes, the SimpleXmlParser automatically recognizes the custom attributes and parses them correctly.

When custom XML documents are added to Oracle iFS using any of the protocols or user interfaces, those documents are automatically parsed by the SimpleXmlParser, without any further custom coding.

Specifically, the SimpleXmlParser works as described above for FTP, SMB, the Windows interface, and the Oracle iFS Web interface using Upload via Drag and Drop. If you prefer to use the Oracle iFS Web interface Upload via Browse facility, you need to also click the checkbox for Parse File on Upload.

If your XML documents are added to Oracle iFS by an application, the application must explicitly invoke the SimpleXmlParser.

The ClassSelectionParser

The ClassSelectionParser is unique in that it does not perform any actual parsing. Rather, the ClassSelectionParser allows you to add one or more custom attributes to files with a specific file extension, such as all .doc files, before the files are stored in the repository. The ClassSelectionParser provides the mechanism for mapping a class to a specific file format. For more information, see "Using the ClassSelectionParser".

Using the ClassSelectionParser

Implementing a ClassSelectionParser is a three-step process:

  1. Create a Class Definition

  2. Register the Extension with the ClassSelectionParser

  3. Register the Class

Create a Class Definition

The first step in implementing a ClassSelectionParser is to create the custom class definition. This example defines a custom class for presentation slides, and describes one additional attribute, "NumberOfSlides," to be added to all files with the file extension .ppt.

<?xml version = '1.0' standalone = 'yes'?>
<!--Presentation.xml-->
<ClassObject>
  <Name>Presentation</Name>
  <Superclass RefType='Name'>Document</Superclass>
  <Description>Custom Class for Presentations</Description>
  <Attributes>
    <Attribute>
      <Name>NumberOfSlides</Name>
      <DataType>INTEGER</DataType>
    </Attribute>
  </Attributes>
</ClassObject>

Register the Extension with the ClassSelectionParser

Once the custom class has been created, associate the file extension (ppt) with the parser (ClassSelectionParser) by the usual registration process. You can register the parser using Oracle iFS Manager or XML.

<?xml version = '1.0' standalone = 'yes'?>
<!--RegisterPPTParser.xml-->
<PropertyBundle>
  <Update Reftype='ValueDefault'>ParserLookupByFileExtension</Update>
  <Properties>
    <Property Action = 'add'>
      <Name>ppt</Name>
      <Value Datatype='String'>oracle.ifs.beans.parsers.ClassSelectionParser</Value>
    </Property>
  </Properties>
</PropertyBundle>

Register the Class

Once the parser has been registered, you must register the custom class by adding an entry to the IFS.PARSER.ObjectTypeLookupByFileExtension PropertyBundle. Just as registering a parser requires adding an entry to a PropertyBundle, so registering a class also requires adding an entry to a PropertyBundle. In this case, the registration process associates the file extension (.ppt) with the custom class (Presentation). You only need to specify the actual class name, not the fully qualified path name.

Registering the class completes the process necessary to invoke the ClassSelectionParser. If this step is omitted, the class associated with the ClassSelectionParser defaults to Document; no parsing will occur.

<?xml version = '1.0' standalone = 'yes'?>
<!--RegisterPPTObjectType.xml-->
<PropertyBundle>
  <Update Reftype='ValueDefault'>IFS.PARSER.ObjectTypeLookupByFileExtension</Update>
  <Properties>
    <Property Action = 'Add'>
      <Name>ppt</Name>
      <Value Datatype='String'>Presentation</Value>
    </Property>
  </Properties>
</PropertyBundle>
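
The effect of the two registrations can be pictured as a lookup from file extension to class name, with Document as the fallback when no class has been registered. The following sketch models only that lookup. The ObjectTypeLookup class is a hypothetical stand-in, not the Oracle iFS implementation, and the case-insensitive matching of extensions is an assumption of this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an extension-to-class lookup; not the iFS implementation.
class ObjectTypeLookup {
    private final Map<String, String> classByExtension = new HashMap<>();

    // Associates a file extension (without the dot) with a class name.
    void register(String extension, String className) {
        classByExtension.put(extension.toLowerCase(Locale.ROOT), className);
    }

    // Returns the registered class name, or "Document" when no entry exists.
    String classFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return classByExtension.getOrDefault(extension, "Document");
    }

    public static void main(String[] args) {
        ObjectTypeLookup lookup = new ObjectTypeLookup();
        lookup.register("ppt", "Presentation");
        System.out.println(lookup.classFor("q3-review.ppt")); // registered extension
        System.out.println(lookup.classFor("notes.txt"));     // falls back to Document
    }
}
```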

How Does XML Parsing Work?

When you place an XML representation of a document in Oracle iFS, the SimpleXmlParser is called to create the document object. The following table provides an overview of how parsing an XML document works.


Step 1
  Who: User
  Does what: Loads a local file using any iFS interface or supported protocol.
  Result: MyDocument.xml is loaded into iFS.

Step 2
  Who: Interface or protocol
  Does what: Performs parser lookup based on the file extension.
  Result: If there is no corresponding parser, the document is simply stored "as is," with the content and attributes from Step 1. Or, if there is a parser defined for the file extension, that parser is invoked.

Step 3
  Who: SimpleXmlParser
  Does what: Parses the XML file.
  Result: Creates an object of the type defined in the XML file's <ClassName> tag:
    - A new class, if the value is ClassObject.
    - An instance of an Oracle iFS standard class, such as Document.
    - An instance of a custom class.

To illustrate this sequence, consider the following example.

  1. An end user drags a document instance, such as claim2.xml, into an Oracle iFS folder, /ifs/system/claims.

  2. SMB performs a parser lookup based on the file extension, .xml. Because this is an XML file, the parser lookup finds a match and invokes the SimpleXmlParser.

  3. Because the claim custom class definition file was previously stored in Oracle iFS, and because the XML file's Root Element has the same name as the name of the claim custom type, the SimpleXmlParser recognizes claim2.xml as an instance of claim.xml. The SimpleXmlParser parses claim2.xml, creating an object called claim2.
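
The recognition in step 3 depends on the name of the XML file's root element. As a minimal sketch of that idea, the following code reads the root element name with the standard JAXP DOM parser and compares it with a class name. The RootElementInspector class is hypothetical, and the sketch does not show how the SimpleXmlParser itself is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of the Oracle iFS API.
class RootElementInspector {

    // Returns the name of the root element of a well-formed XML document.
    static String rootElementName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read XML document", e);
        }
    }

    // True when the root element name matches the given class name exactly.
    static boolean isInstanceOf(String xml, String className) {
        return className.equals(rootElementName(xml));
    }

    public static void main(String[] args) {
        String claim2 = "<?xml version='1.0'?><claim><Name>claim2</Name></claim>";
        System.out.println(rootElementName(claim2));
        System.out.println(isInstanceOf(claim2, "claim"));
    }
}
```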

Using a Custom Parser

If you want to parse non-XML documents, such as .doc or .xls documents, you must write a custom parser to create database objects from these documents. To create a custom parser, you can either subclass an existing Oracle iFS parser or create a custom class from scratch, implementing the oracle.ifs.beans.parsers.Parser interface.

The Parser class creates one or more objects. In most cases, the Parser class is used to create the following objects:

A parser determines which type of object to create based on the InputStream or Reader object passed to it. If the InputStream or Reader describes more than one type of object, the parser can either:

Overview of a Parser Application

A parser application includes four components:

The following table describes each component.


Component                                    Description/Sample
Application                                  The application creates an instance of the parser required, then calls the parser, specifying the document representation (required), the name of the ParserCallback object (optional), and the Options object (optional).
Parser                                       The parser executes whatever custom code is needed to create the parsed object, then stores the parsed object in the repository.
ParserCallback                               The application may optionally specify a ParserCallback object. The ParserCallback object's preOperation() or postOperation() methods specify additional processing that is executed before, after, or both before and after the parsing operation takes place.
ParserLookupByFileExtension PropertyBundle   Oracle iFS looks up the name of the parser for this document class in the ParserLookupByFileExtension PropertyBundle.

Writing a Parser Application

Writing a parser application includes the following stages:

  1. Write the Parser Class

  2. Deploy the Parser

  3. Invoke the Parser (in the parser application)

  4. Write a ParserCallback (optional)

For a short parser example, see "Sample Code: A Custom Parser", in this chapter.

For more information about parsers, see the following classes in the Oracle iFS Javadoc:

Write the Parser Class

The purpose of a parser is to identify the properties in a file, and use the properties to create a database object.

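When creating a custom parser, you can choose from two approaches:

Subclass an existing Oracle iFS parser.

Create a custom class from scratch that implements the oracle.ifs.beans.parsers.Parser interface.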

Whichever approach you choose, writing a custom parser means implementing the Parser interface, either directly or indirectly. The Parser interface includes one overloaded method, parse(), which accepts two types of input:

An InputStream, for data that is not character-based

A Reader, for character-based data

Once the parse() method has been called, the balance of the code of the parser itself examines each line and places its content into the appropriate attribute of the object the parser is creating. The syntax and arguments for parse() are described below.

To write a custom parser, you must write two methods:

A constructor

A parse() method

Write a Constructor

Every parser must implement the standard constructor for a parser. The standard constructor takes one parameter, as shown in the following table.


Parameter  Datatype        Description
session    LibrarySession  The LibrarySession of the current user.

Sample Code: A Constructor

public SimplestParser(LibrarySession lib) throws IfsException

Write a parse() Method

The following table describes the parameters of the parse() method.


Parameter  Datatype        Description
stream     InputStream     An InputStream for the parser to read. Use an InputStream for data that is not character-based, such as audio and video data.
reader     Reader          Alternatively, a Reader for the parser to read. A Reader should be used for character-based data.
callback   ParserCallback  Optional parameter. May be null. If specified, the ParserCallback object includes methods that specify processing to be implemented before parsing, after parsing, or both.
options    Hashtable       Optional parameter. May be null. If specified, the Options parameter further controls the behavior of the parser through a set of optional name/value pairs. Commonly used to specify character encoding.

For sample code for the parse() method, see "Sample Code: A Custom Parser" in this chapter.
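
Because the options Hashtable may be null and is commonly used to carry a character encoding, a parser that receives an InputStream typically has to decide which encoding to use before it can read characters. The following sketch shows one way to make that decision with the standard Java library. The option key "example.encoding" and the UTF-8 default are assumptions of this example; this chapter does not give the actual Oracle iFS option name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Hashtable;

// Hypothetical helper for illustration; not part of the Oracle iFS API.
class StreamToReader {
    // Assumed option key; the real iFS option name may differ.
    static final String ENCODING_KEY = "example.encoding";

    // Wraps the stream in a Reader, honoring an encoding option when present.
    static Reader toReader(InputStream stream, Hashtable<String, Object> options) {
        Charset charset = StandardCharsets.UTF_8; // assumed default
        if (options != null && options.get(ENCODING_KEY) != null) {
            charset = Charset.forName(options.get(ENCODING_KEY).toString());
        }
        return new InputStreamReader(stream, charset);
    }

    // Reads the whole Reader into a String.
    static String readAll(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9}; // "cafe" with e-acute in ISO-8859-1
        Hashtable<String, Object> options = new Hashtable<>();
        options.put(ENCODING_KEY, "ISO-8859-1");
        System.out.println(readAll(toReader(new ByteArrayInputStream(latin1), options)));
    }
}
```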

Overview of a Custom Parser

For a custom parser, see "Sample Code: A Custom Parser". This SimplestParser extracts the text between the <TITLE> tags of an HTML document and stores that information in a custom field. This requires that a subclass of Document, named CUSTOM with the attribute TITLE, be registered on the server with the file extension .cus. Customizing the file extension is for illustration purposes only; you could register the file extension as .htm to use a similar parser for real HTML documents.

This is a simplified example and does not take into consideration versioned documents, nor does it address issues concerning local character sets.

Sample Code: A Custom Parser

package simparser;

// These classes provide the building blocks for an iFS document.

import oracle.ifs.beans.Attribute;
import oracle.ifs.beans.Document;
import oracle.ifs.beans.DocumentDefinition;
import oracle.ifs.beans.Format;
import oracle.ifs.beans.LibraryObject;
import oracle.ifs.common.Collection;
import oracle.ifs.common.AttributeValue;

// These classes are used to instantiate a folder object to store the document.

import oracle.ifs.beans.Folder;
import oracle.ifs.beans.FolderPathResolver;

// These classes are used to obtain information about the user at runtime.

import oracle.ifs.beans.DirectoryUser;
import oracle.ifs.beans.PrimaryUserProfile;
import oracle.ifs.beans.LibrarySession;

// These classes are the base classes for creating a parser.

import oracle.ifs.beans.parsers.Parser;
import oracle.ifs.beans.parsers.ParserCallback;
import java.util.Hashtable;

// These are standard Java objects used to process the document content.

import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.BufferedReader;

// This class is used to report exceptions to iFS methods.

import oracle.ifs.common.IfsException;

public class SimplestParser implements Parser
{
  private String title;
  private LibrarySession m_librarySession;
  private Document newDoc;
  private Folder currentFolder;
  private Folder homeFolder;

// The constructor argument captures the current library session, which is
// used to pass information about the user and environment at runtime.

  public SimplestParser(LibrarySession lib) throws IfsException
  {
    m_librarySession = lib;
  }

/* This parser is called by the host protocol at runtime, passing a Reader
 * object with the contents of the document being parsed. The callback is
 * an optional argument that enables the parser to respond to the calling
 * method. The Hashtable is used to store three key parameters:
 * CURRENT_PATH_OPTION: the current working directory.
 * CURRENT_NAME_OPTION: the name of the file being parsed.
 * UPDATE_OBJECT_OPTION: indicates if the document being parsed is
 *                       replacing an object that already exists.
 */
  public LibraryObject parse(Reader htmlStream, ParserCallback callback,
        Hashtable options) throws IfsException
  {
  try
  {

/*  Instantiate a FolderPathResolver, then pass it the CURRENT_PATH_OPTION as
 *  a string. Set the currentFolder variable to the path where the document
 *  to be parsed was inserted.
 */

    FolderPathResolver fpr = new FolderPathResolver(m_librarySession);
    fpr.setRelativePath(options.get(CURRENT_PATH_OPTION).toString());
    currentFolder = fpr.getCurrentDirectory();

/*  Instantiate the string variable documentContent.
 *  Instantiate a BufferedReader object named dataStream and populate it
 *  with the document content passed in to the method.
 */

    String documentContent = "";
    BufferedReader dataStream =
                           new BufferedReader(htmlStream);

// Read the buffered data into the documentContent variable one line at a time.

    for (String line = dataStream.readLine();line != null;
                           line = dataStream.readLine())
    {
      documentContent = documentContent + line + "\n";
    }

//  Send the resulting string to the parseTitle method to extract the title.

    String docTitle = parseTitle(documentContent);

//  Instantiate a DocumentDefinition object.

    DocumentDefinition docDef = new DocumentDefinition(m_librarySession);

//  Instantiate a Collection object and populate it with the list of
//  format extensions. Set the format in the document definition.

    Collection allFormats = m_librarySession.getFormatExtensionCollection();
    docDef.setFormat((Format) allFormats.getItems("cus"));

//  The Classname is the name of the subclass we've defined (CUSTOM).

    docDef.setClassname("CUSTOM");

//  Set the Name attribute in the document definition to the variable passed
//  to the parser in the options Hashtable.

    docDef.setAttribute("NAME", AttributeValue.newAttributeValue
                    (options.get(CURRENT_NAME_OPTION)));

//  Set the custom attribute "TITLE" to the docTitle variable returned by the
//  parseTitle method.

    docDef.setAttribute("TITLE", AttributeValue.newAttributeValue(docTitle));

//  Set the content of the document to the String documentContent.

    docDef.setContent(documentContent);

//  Create a new Document from the DocumentDefinition and assign it to the
//  newDoc field (not a local variable) so that the return statement sees it.

    newDoc = (Document) m_librarySession.createPublicObject(docDef);

/*  Check to see if the UPDATE_OBJECT_OPTION variable is set. If so, update
 *  the document (update). If not, create a new document (addItem).
*/
    if(options.get(UPDATE_OBJECT_OPTION) != null)
    {
      Document currentDoc = (Document) currentFolder.findPublicObjectByPath
                      (docDef.getAttribute("NAME").toString());
      currentDoc.update(docDef);
    }
    else
    {
      currentFolder.addItem(newDoc);
    }
  }

// Catch any exceptions. Set VerboseMessage to true to get a more complete
// report of the methods that threw the exception.

  catch (IfsException ifsExceptionCaught)
  {
    ifsExceptionCaught.setVerboseMessage(true);
    ifsExceptionCaught.printStackTrace();
  }
  catch (Exception exceptionCaught)
  {
    exceptionCaught.printStackTrace();
  }
  return newDoc;
  }

/* parse method called when the protocol sends the file content as an
 * InputStream. This method converts the InputStream to a BufferedReader
 * and forwards it to the first parse method (keeps code concise).
*/

  public LibraryObject parse(InputStream htmlStream, ParserCallback callback,
                             Hashtable options)
  {

// Convert the InputStream htmlStream to the BufferedReader named redirect.

    BufferedReader redirect =
        new BufferedReader(new InputStreamReader(htmlStream));

// Send the resulting BufferedReader to the first parse method.

    try {
      newDoc = (Document) parse(redirect,callback,options);
    }

// Catch and report (in verbose mode) any exceptions.

    catch (IfsException ifsExceptionCaught)
    {
      ifsExceptionCaught.setVerboseMessage(true);
      ifsExceptionCaught.printStackTrace();
    }
    catch (Exception exceptionCaught)
    {
      exceptionCaught.printStackTrace();
    }
    return newDoc;
  }

/*  This is the actual custom parsing routine. It searches the text String
 *  for the tag <TITLE>, skips past it (the tag is 7 characters long), and
 *  extracts the substring up to, but not including, the </TITLE> tag.
 *  If either tag is missing, the title defaults to "Untitled".
 */

  private String parseTitle (String parseString){
    try
    {
      title = parseString.substring((parseString.indexOf("<TITLE>")+ 7),
                  parseString.indexOf("</TITLE>"));
    }
    catch (Exception e)
    {
      title = "Untitled";
      e.printStackTrace();
    }
    return title;
  }
}
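
The parseTitle() routine in the sample matches only an uppercase <TITLE> tag and relies on catching an exception when the tag is absent. As a possible alternative, and not part of the original sample, the following sketch uses a regular expression that ignores case, tolerates attributes on the tag, and returns "Untitled" without raising an exception. The TitleExtractor class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative to parseTitle(); not part of the Oracle iFS API.
class TitleExtractor {
    // Matches <title>...</title> in any letter case; DOTALL lets the title span lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title(?:\\s[^>]*)?>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or "Untitled" when none is found.
    static String extractTitle(String html) {
        Matcher matcher = TITLE.matcher(html);
        if (matcher.find()) {
            String title = matcher.group(1).trim();
            if (!title.isEmpty()) {
                return title;
            }
        }
        return "Untitled";
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><head><title> Claim Form </title></head></html>"));
        System.out.println(extractTitle("<p>No title here</p>"));
    }
}
```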

 

Deploy the Parser

For the protocol servers and other standard Oracle iFS components to access your custom parser, the folder tree containing the class for the parser must reside in the Oracle iFS CLASSPATH. Oracle iFS includes a special directory for this purpose. This directory, called custom_classes, is already in the CLASSPATH environment variable that the Oracle iFS server software uses.

To deploy a parser:

  1. Compile the parser, creating a .class file.

  2. Place the folder tree that contains the resulting .class file in the directory $ORACLE_HOME/ifs/custom_classes on the server where Oracle iFS is installed.



    Note:

    The compiled Java code must be copied to the native file system of the server, not to the Oracle iFS repository. 
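
The folder tree mentioned in step 2 must mirror the Java package of the parser class, because the Java class loader resolves a class by turning its fully qualified name into a relative path. The following sketch computes that path for the sample parser, which is declared in the package simparser. The DeployPath class is a hypothetical helper used only to show the naming rule.

```java
// Hypothetical helper for illustration; not part of the Oracle iFS API.
class DeployPath {

    // Maps a fully qualified class name to the location of its .class file
    // under the given class path directory.
    static String classFilePath(String classPathDir, String qualifiedName) {
        return classPathDir + "/" + qualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // prints $ORACLE_HOME/ifs/custom_classes/simparser/SimplestParser.class
        System.out.println(classFilePath("$ORACLE_HOME/ifs/custom_classes",
                                         "simparser.SimplestParser"));
    }
}
```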


Register the Parser

The purpose of registering a parser is to map a certain file extension to a specific parser. Once this mapping is created, whenever a file with that extension is imported by an Oracle iFS client or protocol, the file will be passed to the custom parser before it is stored in the repository. You can register a parser in either of two ways:


Facility            Advantages/Restrictions
Oracle iFS Manager  Use Oracle iFS Manager for simplicity and ease-of-use. Using Oracle iFS Manager, you can only register a parser that exists on the same instance of Oracle iFS as the Oracle iFS Manager facility.
XML                 Use XML if you prefer to register a parser using a script, or if you need to deploy the parser on a separate Oracle iFS instance.

Each registered parser has two attributes:


Attribute  Datatype  Description                               Example
Extension  String    File extension.                           cus
ClassName  String    Fully qualified classname of the parser.  ifs.demo.SimplestParser.parser.SimplestParser
 

The underlying mechanism for storing the mappings between file extensions and parsers is a PropertyBundle object called "ParserLookupByFileExtension." A PropertyBundle is a list of name/value pairs stored as an array of Property objects. Each Property object stores the mapping between a file extension and a parser as a Name/Value pair:

Name: the file extension, such as cus

Value: the fully qualified class name of the parser

Registering a Parser Using Oracle iFS Manager

To register a parser using Oracle iFS Manager, follow these steps:

  1. From the Oracle iFS Manager Object menu, choose Register.

  2. From the Select Object Type window, choose Parser Lookup.

  3. From the Parser Lookup Registry window, choose Add.

  4. In the Parser Lookup Entry window, fill in the text boxes for the attributes.

  5. Click OK.

Registering a Parser Using XML

To register a parser using XML, write an XML file to add a new Property object to the ParserLookupByFileExtension PropertyBundle, specifying the file extension and class name of the parser.

<?xml version="1.0" standalone="yes"?>
<!--SimplestParser.xml-->
<PROPERTYBUNDLE>
   <UPDATE RefType="valuedefault">ParserLookupByFileExtension</UPDATE>
   <PROPERTIES>
      <PROPERTY ACTION="add">
         <NAME>cus</NAME>
         <VALUE DataType="String">ifs.demo.SimplestParser.parser.SimplestParser</VALUE>
      </PROPERTY>
   </PROPERTIES>
</PROPERTYBUNDLE>

Invoke the Parser

When an application program inserts content into the repository, the application is responsible for invoking the appropriate parser, either a standard Oracle iFS parser or a custom parser. (The protocols automatically call an Oracle iFS parser when they are used to insert documents into the repository.)

In order to parse a document, an application must:

Write a ParserCallback

When a custom application calls a parser, the application may, optionally, pass in a ParserCallback object. A ParserCallback allows an application to provide additional processing before or after the actual parsing. The parser must, therefore, check to see if this optional parameter has been passed in.
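
The following sketch shows the shape of that check from the parser's side: each callback method is invoked only when a callback was supplied. The SketchCallback interface and SketchParser class are simplified stand-ins that use String in place of the Oracle iFS object types, so the code runs without the Oracle iFS libraries. The decision to route exceptions to signalException() in the catch block is an assumption of this example.

```java
// Simplified stand-in for the callback interface; not the Oracle iFS type.
interface SketchCallback {
    String preOperation(String definition);
    void postOperation(String createdObject);
    void signalException(RuntimeException e);
}

// Hypothetical parser skeleton showing where the null checks belong.
class SketchParser {

    // Builds an "object" from the text, calling the callback only if present.
    static String parse(String text, SketchCallback callback) {
        try {
            String definition = text.trim();
            if (callback != null) {
                definition = callback.preOperation(definition);
            }
            String created = "object:" + definition; // stands in for storing the object
            if (callback != null) {
                callback.postOperation(created);
            }
            return created;
        } catch (RuntimeException e) {
            if (callback == null) {
                throw e; // nobody to notify, so propagate
            }
            callback.signalException(e);
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("  memo  ", null));
        System.out.println(parse("  memo  ", new SketchCallback() {
            public String preOperation(String definition) { return definition.toUpperCase(); }
            public void postOperation(String created) { System.out.println("stored " + created); }
            public void signalException(RuntimeException e) { System.out.println("failed: " + e); }
        }));
    }
}
```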

The ParserCallback interface specifies three methods that allow an application to interact with a parser:

preOperation()

postOperation()

signalException()

The preOperation() Method


The application can use preOperation() to alter the LibraryObjectDefinition before the parser uses it to update the repository, in the following ways:

Sample Code: preOperation()

public LibraryObjectDefinition preOperation (LibraryObject lo,
LibraryObjectDefinition def)
throws IfsException

The postOperation() Method


The application can use postOperation() to access the repository object that was created or updated by the parser, or to perform operations after the creation of a LibraryObject.

Parameter Name  Datatype       Description
lo              LibraryObject  The LibraryObject that was created or updated by the parse operation.

Sample Code: postOperation()

public void postOperation (LibraryObject lo)
throws IfsException

The signalException() Method


The application can implement signalException() to intercept any exceptions that occur during parsing. The options are:

Sample Code: signalException()

public void signalException(IfsException e)
throws IfsException

Sample Code: ParserCallback Implementation

The following sample code provides a brief example of implementing the ParserCallback interface:
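
A minimal sketch of such an implementation, under stated assumptions, could look as follows. The classes IfsException, LibraryObject, and LibraryObjectDefinition and the ParserCallback interface declared here are small stand-ins that mirror the signatures shown in this section, so that the code compiles without the Oracle iFS libraries; in a real application you would import the Oracle iFS types instead. The LoggingCallback class, its default name "Untitled", and the choice to rethrow in signalException() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the Oracle iFS types; replace with the real imports in practice.
class IfsException extends Exception {
    IfsException(String message) { super(message); }
}

class LibraryObject {
    final String name;
    LibraryObject(String name) { this.name = name; }
}

class LibraryObjectDefinition {
    String name;
}

interface ParserCallback {
    LibraryObjectDefinition preOperation(LibraryObject lo, LibraryObjectDefinition def)
            throws IfsException;
    void postOperation(LibraryObject lo) throws IfsException;
    void signalException(IfsException e) throws IfsException;
}

// Hypothetical callback that supplies a default name and records each call.
class LoggingCallback implements ParserCallback {
    final List<String> log = new ArrayList<>();

    @Override
    public LibraryObjectDefinition preOperation(LibraryObject lo, LibraryObjectDefinition def) {
        // Alter the definition before the parser uses it.
        if (def.name == null || def.name.isEmpty()) {
            def.name = "Untitled";
        }
        log.add("pre:" + def.name);
        return def;
    }

    @Override
    public void postOperation(LibraryObject lo) {
        // Access the object that the parser created or updated.
        log.add("post:" + lo.name);
    }

    @Override
    public void signalException(IfsException e) throws IfsException {
        // Record the failure, then let it propagate to the caller.
        log.add("error:" + e.getMessage());
        throw e;
    }

    // Simulates the order in which a parser would call the callback.
    static List<String> runDemo() {
        LoggingCallback callback = new LoggingCallback();
        LibraryObjectDefinition def = callback.preOperation(null, new LibraryObjectDefinition());
        callback.postOperation(new LibraryObject(def.name));
        return callback.log;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```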


Copyright © 2000 Oracle Corporation.

All Rights Reserved.
