Oracle Internet File System Developer's Guide Release 1.1 A75172-04
This chapter covers the following topics:
A parser is a Java class that extracts attributes from a local file and stores the information in the repository. More specifically, in the case of a document, a parser:
Significant improvements have been made to the XML parsing capabilities of Oracle iFS for version 1.1. For more information, see What's New in Oracle iFS 1.1.
Whether your application requires a custom parser depends upon the format of the documents produced by the application:
Document Format Produced | Parser Options |
---|---|
Standard XML documents | Use the Oracle iFS standard XML parser out-of-the-box. |
Custom format | Write a custom parser. |
Whether or not you must explicitly invoke a parser depends on how the documents produced by your application are entered into the Oracle Internet File System:
If your application defines a custom class that produces documents in a special, non-XML format, you must create a custom parser using the classes and methods provided by the Oracle iFS Java API. This custom parser creates Oracle iFS repository objects of your custom class. For example, assume you have defined a Memo class that subclasses the Document class. The Memo class includes the following custom attributes: To, From, Date, and Text (the content of the memo). Storing Memo objects in Oracle iFS requires a parser. If the Memo documents are in XML, you can use the Oracle iFS SimpleXmlParser to extract the attributes. If the Memo documents are stored in a special format, you must create a custom parser and specify how it extracts the attributes.
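As a rough illustration of what "specifying how to extract the attributes" can involve for a non-XML memo, the following sketch reads a hypothetical plain-text layout (header lines of the form `To: ...`, a blank line, then the body) into the four Memo attributes. The file layout, the `MemoFormatSketch` class, and its method name are assumptions made for this illustration. The code uses only the JDK and does not call the Oracle iFS API; a real custom parser would pass these values to the repository through the Parser interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: extracts Memo attributes from an assumed text layout. */
final class MemoFormatSketch {

    /**
     * Reads "Name: value" header lines up to the first blank line, then
     * stores the remaining lines under the Text attribute.
     */
    static Map<String, String> extractAttributes(Reader in) {
        Map<String, String> attributes = new LinkedHashMap<>();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            // Header section: stop at the first blank line or at end of input.
            while ((line = reader.readLine()) != null && !line.trim().isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    attributes.put(line.substring(0, colon).trim(),
                                   line.substring(colon + 1).trim());
                }
            }
            // Body section: everything after the blank line.
            StringBuilder text = new StringBuilder();
            while ((line = reader.readLine()) != null) {
                if (text.length() > 0) {
                    text.append('\n');
                }
                text.append(line);
            }
            attributes.put("Text", text.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return attributes;
    }

    public static void main(String[] args) {
        String memo = "To: Pat\nFrom: Lee\nDate: 2000-06-01\n\nBudget review moved to Friday.";
        System.out.println(extractAttributes(new StringReader(memo)));
    }
}
```

The sketch keeps the header names exactly as they appear in the file, so the attribute names are only as reliable as the input; a production parser would validate them against the class definition.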
Out-of-the-box, Oracle iFS includes several standard parsers that will meet most needs of developers creating new applications in Oracle iFS.
The following table lists the Oracle iFS standard parser classes.
To understand the parsing options provided by Oracle iFS, consider XML files in three categories:
XML files meant to configure Oracle iFS
XML files to be parsed
XML files to be stored without parsing
If your customization requirements are minimal, you can define a custom class using XML to add custom attributes. Once you create the type definition file, including any custom attributes, the SimpleXmlParser automatically recognizes the custom attributes and parses them correctly.
When custom XML documents are added to Oracle iFS using any of the protocols or user interfaces, those documents are automatically parsed by the SimpleXmlParser, without any further custom coding.
Specifically, the SimpleXmlParser works as described above for FTP, SMB, the Windows interface, and the Oracle iFS Web interface using Upload via Drag and Drop. If you prefer to use the Oracle iFS Web interface Upload via Browse facility, you must also select the Parse File on Upload checkbox.
If your XML documents are added to Oracle iFS by an application, the application must explicitly invoke the SimpleXmlParser.
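To make the idea of "an XML document whose elements become attributes" more concrete, the sketch below reads a small XML document with the JDK's DOM parser and collects the root element name and its child elements as name/value pairs. This is not the SimpleXmlParser and does not reproduce its behavior; the `XmlAttributeSketch` class, the `ParsedObject` holder, and the Memo element names are assumptions chosen to match the Memo example in this chapter.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: maps an XML document to a class name plus attributes. */
final class XmlAttributeSketch {

    /** Holds the root element name and the values of its child elements. */
    static final class ParsedObject {
        final String className;
        final Map<String, String> attributes = new LinkedHashMap<>();

        ParsedObject(String className) {
            this.className = className;
        }
    }

    static ParsedObject parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xml)));
            Element root = dom.getDocumentElement();
            // Assumption: the root element names the class to create.
            ParsedObject result = new ParsedObject(root.getTagName());
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace and comments; keep only child elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.attributes.put(child.getNodeName(),
                                          child.getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        ParsedObject memo = parse("<Memo><To>Pat</To><From>Lee</From></Memo>");
        System.out.println(memo.className + " " + memo.attributes);
    }
}
```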
The ClassSelectionParser is unique in that it does not perform any actual parsing. Rather, the ClassSelectionParser allows you to add one or more custom attributes to files with a specific file extension, such as all .doc files, before the files are stored in the repository. The ClassSelectionParser provides the mechanism for mapping a class to a specific file format. For more information, see "Using the ClassSelectionParser".
Implementing a ClassSelectionParser is a three-step process:
The first step in implementing a ClassSelectionParser is to create the custom class definition. This example defines a custom class for presentation slides, and describes one additional attribute, "NumberOfSlides," to be added to all files with the file extension .ppt.
<?xml version = '1.0' standalone = 'yes'?>
<!--Presentation.xml-->
<ClassObject>
  <Name>Presentation</Name>
  <Superclass RefType='Name'>Document</Superclass>
  <Description>Custom Class for Presentations</Description>
  <Attributes>
    <Attribute>
      <Name>NumberOfSlides</Name>
      <DataType>INTEGER</DataType>
    </Attribute>
  </Attributes>
</ClassObject>
Once the custom class has been created, associate the file extension (ppt) with the parser (ClassSelectionParser) by the usual registration process. You can register the parser using Oracle iFS Manager or XML.
<?xml version = '1.0' standalone = 'yes'?>
<!--RegisterPPTParser.xml-->
<PropertyBundle>
  <Update Reftype='ValueDefault'>ParserLookupByFileExtension</Update>
  <Properties>
    <Property Action = 'add'>
      <Name>ppt</Name>
      <Value Datatype='String'>
        oracle.ifs.beans.parsers.ClassSelectionParser
      </Value>
    </Property>
  </Properties>
</PropertyBundle>
Once the parser has been registered, you must register the custom class by adding an entry to the IFS.PARSER.ObjectTypeLookupByFileExtension PropertyBundle. Just as registering a parser requires adding an entry to a PropertyBundle, so registering a class also requires adding an entry to a PropertyBundle. In this case, the registration process associates the file extension (.ppt) with the custom class (Presentation). You only need to specify the actual class name, not the fully qualified path name.
Registering the class completes the process necessary to invoke the ClassSelectionParser. If this step is omitted, the class associated with the ClassSelectionParser defaults to Document; no parsing will occur.
<?xml version = '1.0' standalone = 'yes'?>
<!--RegisterPPTObjectType.xml-->
<PropertyBundle>
  <Update Reftype='ValueDefault'>IFS.PARSER.ObjectTypeLookupByFileExtension </Update>
  <Properties>
    <Property Action = 'Add'>
      <Name>ppt</Name>
      <Value Datatype='String'>Presentation</Value>
    </Property>
  </Properties>
</PropertyBundle>
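The effect of the class registration, including the fallback to Document when no entry exists, can be pictured as a simple map lookup. The sketch below is a JDK-only model of that idea, not the Oracle iFS implementation; the `ObjectTypeLookupSketch` class and its methods are hypothetical, and exact-match comparison of extensions is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an extension-to-class lookup with a Document fallback. */
final class ObjectTypeLookupSketch {

    static final String DEFAULT_CLASS = "Document";

    private final Map<String, String> classByExtension = new HashMap<>();

    /** Models adding one Name/Value entry; a leading dot is tolerated. */
    void register(String extension, String className) {
        classByExtension.put(stripDot(extension), className);
    }

    /** Returns the registered class for the file's extension, else Document. */
    String classFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return DEFAULT_CLASS; // the file name has no extension
        }
        // Assumption: extensions are compared exactly as written.
        return classByExtension.getOrDefault(fileName.substring(dot + 1), DEFAULT_CLASS);
    }

    private static String stripDot(String extension) {
        return extension.startsWith(".") ? extension.substring(1) : extension;
    }

    public static void main(String[] args) {
        ObjectTypeLookupSketch lookup = new ObjectTypeLookupSketch();
        lookup.register("ppt", "Presentation");
        System.out.println(lookup.classFor("roadmap.ppt"));
        System.out.println(lookup.classFor("notes.txt"));
    }
}
```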
When you place an XML representation of a document in Oracle iFS, the SimpleXmlParser is called to create the document object. The following table provides an overview of how parsing an XML document works.
To illustrate this sequence, consider the following example.
claim3.xml, into an Oracle iFS folder, /ifs/system/claims.
.xml. Because this is an XML file, the parser lookup finds a match and invokes the SimpleXmlParser.
claim2.xml as an instance of claim.xml. The SimpleXmlParser parses claim2.xml, creating an object called claim2.
If you want to parse non-XML documents, such as .doc or .xls documents, you must write a custom parser to create database objects from these documents. To create a custom parser, you can either subclass an existing Oracle iFS parser or create a custom class from scratch, implementing the oracle.ifs.beans.parsers.Parser interface.
The Parser class creates one or more objects. In most cases, the Parser class is used to create the following objects:
A parser determines which type of object to create based on the InputStream or Reader object passed to it. If the InputStream or Reader describes more than one type of object, the parser can either:
A parser application includes four components:
The following table describes each component.
Writing a parser application includes the following stages:
For a short parser example, see "Sample Code: A Custom Parser", in this chapter.
For more information about parsers, see the following classes in the Oracle iFS Javadoc:
oracle.ifs.beans.parsers.SimpleXmlParser
oracle.ifs.beans.parsers.XmlParser
oracle.ifs.beans.parsers.SimpleTextParser
The purpose of a parser is to identify the properties in a file, and use the properties to create a database object.
When creating a custom parser, you can choose from two approaches:
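Subclass an existing Oracle iFS parser. (The existing parsers already implement the oracle.ifs.beans.parsers.Parser interface.)
Create a custom class from scratch that implements oracle.ifs.beans.parsers.Parser.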
Whichever approach you choose, writing a custom parser means implementing the Parser interface, either directly or indirectly. The Parser interface includes one overloaded method, parse(), which accepts two types of input:
Once the parse() method has been called, the remainder of the parser code examines each line and places its content into the appropriate attribute of the object the parser is creating. The syntax and arguments for parse() are described below.
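The pattern of one overloaded method that accepts either a character stream or a byte stream, with the line-by-line work done in a single place, can be sketched without the Oracle iFS classes. In the example below, the `TwoInputParser` interface and the `KeyValueParserSketch` class are hypothetical stand-ins (the real parse() method takes additional arguments and returns a LibraryObject), the `NAME=value` line format is invented for illustration, and UTF-8 decoding is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an interface with one overloaded parse method. */
interface TwoInputParser {
    Map<String, String> parse(Reader in);
    Map<String, String> parse(InputStream in);
}

/** Examines each line and stores NAME=value pairs as attributes. */
final class KeyValueParserSketch implements TwoInputParser {

    @Override
    public Map<String, String> parse(Reader in) {
        Map<String, String> attributes = new LinkedHashMap<>();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                int equals = line.indexOf('=');
                if (equals > 0) { // ignore lines that are not NAME=value
                    attributes.put(line.substring(0, equals).trim(),
                                   line.substring(equals + 1).trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return attributes;
    }

    @Override
    public Map<String, String> parse(InputStream in) {
        // Assumption: the bytes are UTF-8 text. Delegate to the Reader overload
        // so the parsing logic lives in one place.
        return parse(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

Delegating the InputStream overload to the Reader overload keeps the two entry points consistent, which is the same design the sample parser later in this chapter follows.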
To write a custom parser, you must write two methods:
Every parser must implement the standard constructor for a parser. The standard constructor takes one parameter, as shown in the following table.
Parameter | Datatype | Description |
---|---|---|
 | LibrarySession | The LibrarySession of the current user. |
public SimplestParser(LibrarySession lib) throws IfsException
The following table describes the parameters of the parse() method.
For sample code for the parse() method, see "Sample Code: A Custom Parser" in this chapter.
For a custom parser, see "Sample Code: A Custom Parser". This SimplestParser extracts the text between the <TITLE> tags of an HTML document and stores that information in a custom field. This requires that a subclass of Document, named CUSTOM with the attribute TITLE, be registered on the server with the file extension .cus. Customizing the file extension is for illustration purposes only; you could register the file extension as .htm to use a similar parser for real HTML documents.
This is a simplified example and does not take into consideration versioned documents, nor does it address issues concerning local character sets.
package simparser;

// These classes provide the building blocks for an iFS document.
import oracle.ifs.beans.Attribute;
import oracle.ifs.beans.Document;
import oracle.ifs.beans.DocumentDefinition;
import oracle.ifs.beans.Format;
import oracle.ifs.beans.LibraryObject;
import oracle.ifs.common.Collection;
import oracle.ifs.common.AttributeValue;

// These classes are used to instantiate a folder object to store the document.
import oracle.ifs.beans.Folder;
import oracle.ifs.beans.FolderPathResolver;

// These classes are used to obtain information about the user at runtime.
import oracle.ifs.beans.DirectoryUser;
import oracle.ifs.beans.PrimaryUserProfile;
import oracle.ifs.beans.LibrarySession;

// These classes are the base classes for creating a parser.
import oracle.ifs.beans.parsers.Parser;
import oracle.ifs.beans.parsers.ParserCallback;
import java.util.Hashtable;

// These are standard Java objects used to process the document content.
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.BufferedReader;

// This class is used to report exceptions to iFS methods.
import oracle.ifs.common.IfsException;

public class SimplestParser implements Parser
{
    private String title;
    private LibrarySession m_librarySession;
    private Document newDoc;
    private Folder currentFolder;
    private Folder homeFolder;

    // The constructor argument captures the current library session, which is
    // used to pass information about the user and environment at runtime.
    public SimplestParser(LibrarySession lib) throws IfsException
    {
        m_librarySession = lib;
    }

    /* This parser is called by the host protocol at runtime, passing a Reader
     * object with the contents of the document being parsed. The callback is
     * an optional argument that enables the parser to respond to the calling
     * method. The Hashtable is used to store three key parameters:
     * CURRENT_PATH_OPTION: the current working directory.
     * CURRENT_NAME_OPTION: the name of the file being parsed.
     * UPDATE_OBJECT_OPTION: indicates if the document being parsed is
     * replacing an object that already exists. */
    public LibraryObject parse(Reader htmlStream, ParserCallback callback,
                               Hashtable options) throws IfsException
    {
        try
        {
            /* Instantiate a FolderPathResolver, then pass it the CURRENT_PATH_OPTION as
             * a string. Set the currentFolder variable to the path where the document
             * to be parsed was inserted. */
            FolderPathResolver fpr = new FolderPathResolver(m_librarySession);
            fpr.setRelativePath(options.get(CURRENT_PATH_OPTION).toString());
            currentFolder = fpr.getCurrentDirectory();

            /* Instantiate the string variable documentContent.
             * Instantiate a BufferedReader object named dataStream and populate it
             * with the document content passed in to the method. */
            String documentContent = "";
            BufferedReader dataStream = new BufferedReader(htmlStream);

            // Read the buffered data into the documentContent variable one line at a time.
            for (String line = dataStream.readLine(); line != null;
                 line = dataStream.readLine())
            {
                documentContent = documentContent + line + "\n";
            }

            // Send the resulting string to the parseTitle method to extract the title.
            String docTitle = parseTitle(documentContent);

            // Instantiate a DocumentDefinition object.
            DocumentDefinition docDef = new DocumentDefinition(m_librarySession);

            // Instantiate a Collection object and populate it with the list of
            // format extensions. Set the format in the document definition.
            Collection allFormats = m_librarySession.getFormatExtensionCollection();
            docDef.setFormat((Format) allFormats.getItems("cus"));

            // The Classname is the name of the subclass we've defined (CUSTOM).
            docDef.setClassname("CUSTOM");

            // Set the Name attribute in the document definition to the variable passed
            // to the parser in the options Hashtable.
            docDef.setAttribute("NAME", AttributeValue.newAttributeValue
                (options.get(CURRENT_NAME_OPTION)));

            // Set the custom attribute "TITLE" to the docTitle variable returned by the
            // parseTitle method.
            docDef.setAttribute("TITLE", AttributeValue.newAttributeValue(docTitle));

            // Set the content of the document to the String documentContent.
            docDef.setContent(documentContent);

            // Instantiate a new Document using the DocumentDefinition just defined.
            // Assign to the newDoc field (not a new local variable) so that the
            // return statement below returns the created document.
            newDoc = (Document) m_librarySession.createPublicObject(docDef);

            /* Check to see if the UPDATE_OBJECT_OPTION variable is set. If so, update
             * the document (update). If not, create a new document (addItem). */
            if (options.get(UPDATE_OBJECT_OPTION) != null)
            {
                Document currentDoc = (Document) currentFolder.findPublicObjectByPath
                    (docDef.getAttribute("NAME").toString());
                currentDoc.update(docDef);
            }
            else
            {
                currentFolder.addItem(newDoc);
            }
        }
        // Catch any exceptions. Set VerboseMessage to true to get a more complete
        // report of the methods that threw the exception.
        catch (IfsException ifsExceptionCaught)
        {
            ifsExceptionCaught.setVerboseMessage(true);
            ifsExceptionCaught.printStackTrace();
        }
        catch (Exception exceptionCaught)
        {
            exceptionCaught.printStackTrace();
        }
        return newDoc;
    }

    /* parse method called when the protocol sends the file content as an
     * InputStream. This method converts the InputStream to a BufferedReader
     * and forwards it to the first parse method (keeps code concise). */
    public LibraryObject parse(InputStream htmlStream, ParserCallback callback,
                               Hashtable options)
    {
        // Convert the InputStream htmlStream to the BufferedReader named redirect.
        BufferedReader redirect = new BufferedReader(new InputStreamReader(htmlStream));

        // Send the resulting BufferedReader to the first parse method.
        // Assign to the newDoc field so that the result is the value returned.
        try
        {
            newDoc = (Document) parse(redirect, callback, options);
        }
        // Catch and report (in verbose mode) any exceptions.
        catch (IfsException ifsExceptionCaught)
        {
            ifsExceptionCaught.setVerboseMessage(true);
            ifsExceptionCaught.printStackTrace();
        }
        catch (Exception exceptionCaught)
        {
            exceptionCaught.printStackTrace();
        }
        return newDoc;
    }

    /* This is the actual custom parsing routine. It searches the text String
     * for the tag <TITLE>, starts at the 7th character (the length of the
     * <TITLE> tag), and extracts a substring of all the information through
     * the last character before the </TITLE> tag. */
    private String parseTitle (String parseString)
    {
        try
        {
            title = parseString.substring((parseString.indexOf("<TITLE>") + 7),
                                          parseString.indexOf("</TITLE>"));
        }
        catch (Exception e)
        {
            title = "Untitled";
            e.printStackTrace();
        }
        return title;
    }
}
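The title-extraction step can be exercised on its own, outside Oracle iFS. The following sketch is a standalone variant of that step, written with only the JDK; the `TitleExtractorSketch` class name is hypothetical. Instead of relying on an exception when a tag is missing, it checks the tag positions explicitly, which is one possible way to make the fallback to "Untitled" predictable. Like the sample, it matches the uppercase tags only.

```java
/** Hypothetical standalone variant of the sample parser's title extraction. */
final class TitleExtractorSketch {

    static final String DEFAULT_TITLE = "Untitled";
    private static final String OPEN_TAG = "<TITLE>";
    private static final String CLOSE_TAG = "</TITLE>";

    /** Returns the text between the first <TITLE> and the following </TITLE>. */
    static String extractTitle(String html) {
        int open = html.indexOf(OPEN_TAG);
        if (open < 0) {
            return DEFAULT_TITLE; // no opening tag
        }
        int contentStart = open + OPEN_TAG.length();
        // Search for the closing tag only after the opening tag.
        int close = html.indexOf(CLOSE_TAG, contentStart);
        if (close < 0) {
            return DEFAULT_TITLE; // no closing tag after the opening tag
        }
        return html.substring(contentStart, close).trim();
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<HTML><HEAD><TITLE>Status Report</TITLE></HEAD></HTML>"));
        System.out.println(extractTitle("<HTML><BODY>No title here</BODY></HTML>"));
    }
}
```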
For the protocol servers and other standard Oracle iFS components to access your custom parser, the folder tree containing the class for the parser must reside in the Oracle iFS CLASSPATH. Oracle iFS includes a special directory for this purpose. This directory, called custom_classes, is already in the CLASSPATH environment variable that the Oracle iFS server software uses.
To deploy a parser:
Compile the parser to produce the .class file.
Place the .class file in the directory $ORACLE_HOME/ifs/custom_classes on the server where Oracle iFS is installed.
The purpose of registering a parser is to map a certain file extension to a specific parser. Once this mapping is created, whenever a file with that extension is imported by an Oracle iFS client or protocol, the file will be passed to the custom parser before it is stored in the repository. You can register a parser in either of two ways:
Each registered parser has two attributes:
Attribute | Datatype | Description | Example |
---|---|---|---|
 | String | File extension. | |
 | String | Fully qualified classname of the parser. | |
The underlying mechanism for storing the mappings between file extensions and parsers is a PropertyBundle object called "ParserLookupByFileExtension." A PropertyBundle is a list of name/value pairs stored as an array of Property objects. Each Property object stores the mapping between a file extension and a parser as a Name/Value pair:
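Name: the file extension, for example, cus.
Value: the fully qualified class name of the parser, for example, ifs.demo.SimplestParser.parser.SimplestParser.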
To register a parser using Oracle iFS Manager, follow these steps:
To register a parser using XML, write an XML file to add a new Property object to the ParserLookupByFileExtension PropertyBundle, specifying the file extension and class name of the parser.
<?xml version="1.0" standalone="yes"?>
<!--SimplestParser.xml-->
<PROPERTYBUNDLE>
  <UPDATE RefType="valuedefault">ParserLookupByFileExtension</UPDATE>
  <PROPERTIES>
    <PROPERTY ACTION="add">
      <NAME>po</NAME>
      <VALUE DataType="String">ifs.demo.SimplestParser.parser.SimplestParser</VALUE>
    </PROPERTY>
  </PROPERTIES>
</PROPERTYBUNDLE>
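Because a registration stores the parser as a fully qualified class name, something must eventually turn that string into a parser instance, and every parser exposes a one-argument constructor for that purpose. The sketch below shows one way such a lookup could work using standard Java reflection. It is a model built on assumptions, not a description of Oracle iFS internals: `ParserRegistrySketch`, `Session`, and `DemoParser` are hypothetical stand-ins for the registry, the LibrarySession, and a registered parser.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: resolve a parser class by file extension and instantiate it. */
final class ParserRegistrySketch {

    /** Stand-in for the session object handed to a parser's constructor. */
    static final class Session {
        final String userName;

        Session(String userName) {
            this.userName = userName;
        }
    }

    /** Stand-in parser with the one-argument constructor shape. */
    public static final class DemoParser {
        final Session session;

        public DemoParser(Session session) {
            this.session = session;
        }
    }

    // Name = file extension, Value = fully qualified parser class name.
    private final Map<String, String> parserClassByExtension = new HashMap<>();

    void register(String extension, String parserClassName) {
        parserClassByExtension.put(extension, parserClassName);
    }

    /** Returns a new parser for the file, or null if no parser is registered. */
    Object createParserFor(String fileName, Session session) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1);
        String className = parserClassByExtension.get(extension);
        if (className == null) {
            return null;
        }
        try {
            // Load the class by name and call its single-argument constructor.
            Constructor<?> constructor =
                Class.forName(className).getConstructor(Session.class);
            return constructor.newInstance(session);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate parser " + className, e);
        }
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        registry.register("cus", DemoParser.class.getName());
        Object parser = registry.createParserFor("report.cus", new Session("demo"));
        System.out.println(parser.getClass().getName());
    }
}
```

A lookup of this kind also suggests why the class must be on the server's CLASSPATH: a class name that cannot be loaded fails at the point of use rather than at registration time.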
When an application program inserts content into the repository, the application is responsible for invoking the appropriate parser, either a standard Oracle iFS parser or a custom parser. (The protocols automatically call an Oracle iFS parser when they are used to insert documents into the repository.)
In order to parse a document, an application must:
When a custom application calls a parser, the application may, optionally, pass in a ParserCallback object. A ParserCallback allows an application to provide additional processing before or after the actual parsing. The parser must, therefore, check to see if this optional parameter has been passed in.
The ParserCallback interface specifies three methods that allow an application to interact with a parser:
public LibraryObjectDefinition preOperation (LibraryObject lo,
LibraryObjectDefinition def)
throws IfsException
Parameter Name | Datatype | Description |
---|---|---|
 | LibraryObject | The LibraryObject that was created or updated by the parse operation. |
public void postOperation (LibraryObject lo)
throws IfsException
Parameter Name | Datatype | Description |
---|---|---|
e | IfsException | The potential exception. |
public void signalException(IfsException e)
throws IfsException
The following sample code provides a brief example of implementing the ParserCallback interface:
/*---FolderParsedObject.java---*/
// A nested class that adds each newly parsed object to a target folder.
private static class FolderParsedObject implements ParserCallback
{
    private Folder m_TargetFolder;

    public FolderParsedObject(Folder f)
    {
        m_TargetFolder = f;
    }

    // No pre-processing: return the definition unchanged.
    public LibraryObjectDefinition preOperation(LibraryObject parm1,
        LibraryObjectDefinition parm2) throws IfsException
    {
        return parm2;
    }

    // After parsing, place the new object in the target folder.
    public void postOperation(LibraryObject newObject) throws IfsException
    {
        m_TargetFolder.addItem((PublicObject) newObject);
    }

    // Rethrow any exception reported by the parser.
    public void signalException(IfsException e) throws IfsException
    {
        throw e;
    }
}
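From the parser's side, supporting an optional callback means checking for null before each of the three calls. The sketch below models that flow with JDK types only. The `Callback` interface is a simplified, hypothetical stand-in for ParserCallback (strings replace LibraryObject and LibraryObjectDefinition, and unchecked exceptions replace IfsException), and the exact points at which a real parser invokes each method are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a parser that honors an optional callback. */
final class CallbackFlowSketch {

    /** Simplified stand-in for the three-method callback shape. */
    interface Callback {
        String preOperation(String definition);
        void postOperation(String createdObject);
        void signalException(RuntimeException e);
    }

    /** Example callback that records the order in which it is invoked. */
    static final class RecordingCallback implements Callback {
        final List<String> events = new ArrayList<>();

        public String preOperation(String definition) {
            events.add("pre:" + definition);
            return definition;
        }

        public void postOperation(String createdObject) {
            events.add("post:" + createdObject);
        }

        public void signalException(RuntimeException e) {
            events.add("error:" + e.getMessage());
        }
    }

    /** Creates an "object" from the content; callback may be null. */
    static String parse(String content, Callback callback) {
        try {
            String definition = content.trim();
            if (definition.isEmpty()) {
                throw new IllegalArgumentException("empty content");
            }
            if (callback != null) {
                // Let the application adjust the definition before creation.
                definition = callback.preOperation(definition);
            }
            String created = "object:" + definition;
            if (callback != null) {
                callback.postOperation(created);
            }
            return created;
        } catch (RuntimeException e) {
            if (callback == null) {
                throw e; // nobody to notify, so propagate
            }
            callback.signalException(e);
            return null;
        }
    }
}
```

In this sketch, a caller that passes no callback sees exceptions directly, while a caller that supplies one decides in signalException whether to rethrow or to record the failure.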
Copyright © 2000 Oracle Corporation. All Rights Reserved.