D Pre/Post Processing Support for XML and Complex File Drivers

This appendix describes how to configure and implement the pre and post processing stages for the Oracle Data Integrator driver for XML and Complex Files.

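This appendix includes the following sections:

  • D.1 Overview
  • D.2 Configuring the processing stages
  • D.3 Implementing the processing stages
  • D.4 Example: Groovy Script for Reading XML Data From Within a ZIP File
  • D.5 Example: Groovy Script for Transforming XML Data and Writing to a Different Format
  • D.6 Example: Java Class for Reading Data From HTTP Source Requiring Authentication
  • D.7 Example: Groovy Code Embedded in Configuration XML File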

D.1 Overview

You can now customize the way data is fed to the XML and Complex File drivers. You can set up intermediate processing stages to process the data that is retrieved from an external endpoint using Oracle Data Integrator, or to write the data out to an external endpoint.

You can configure one terminal stage and zero or more junction stages. The terminal stage is the stage that talks to the external endpoint: it reads data from the endpoint or writes data to it. On input, the terminal stage reads the source data from the external endpoint and passes it to the junction stages, which you can configure to process that data.

The source data can be in any format, not necessarily XML or Complex File, until it reaches the XML driver or the Complex File driver. However, when the data is finally handed off to the driver, it must be in the required format. That is, data handed off to the XML driver must be valid XML that adheres to the XSD configured for the data server. Similarly, data handed off to the Complex File driver must exactly match the pattern defined by the nXSD file.
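The hand-off requirement above can be checked outside ODI before a pipeline is wired up. The following is a minimal sketch using only the JDK validation API; the class name `HandOffCheck` and the tiny `SAMPLE_XSD` schema are hypothetical and are not part of the ODI drivers.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical helper: checks a payload against an XSD before it reaches the driver. */
final class HandOffCheck {

    // Illustrative schema only: a <cars> root holding <car name="..."/> elements.
    static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='cars'><xs:complexType><xs:sequence>"
      + "<xs:element name='car' maxOccurs='unbounded'><xs:complexType>"
      + "<xs:attribute name='name' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns true when the XML text is well formed and valid against the XSD text. */
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            // Malformed XML, a schema violation, or an unusable schema.
            return false;
        }
    }
}
```

Running the output of the last input stage through a check like this during development can surface format problems before the driver reports them.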

D.2 Configuring the processing stages

You provide the complete configuration of the intermediate processing stages to the ODI JDBC driver in the form of an XML file. The XSD for the configuration XML file must also be included.

For an input pipeline configuration, the first stage is the one that first processes the input. The last stage is the one that feeds data to the driver. This last stage must provide output that adheres to the format expected by the XML or the Complex File driver.

For an output pipeline configuration, the first stage is the one that accepts the data from the driver. The last stage is the one that writes out the output. The data accepted from the driver has the shape defined by the XSD of the data server.
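The ordering rule for an input pipeline can be pictured as plain stream wrapping. The sketch below does not use the ODI stage API; it models each stage as a hypothetical `UnaryOperator<InputStream>` and applies the stages in configuration order, so the result of the last one is what the driver would read.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical model of an input pipeline; not the ODI implementation. */
final class InputPipelineSketch {

    /** Applies each stage in configuration order; the last result feeds the driver. */
    static InputStream chain(InputStream source, List<UnaryOperator<InputStream>> stages) {
        InputStream current = source;
        for (UnaryOperator<InputStream> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    /** Example processing stage: unwraps GZIP-compressed data. */
    static InputStream gunzip(InputStream in) {
        try {
            return new GZIPInputStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses the text, runs it through buffer and unzip stages, returns what comes out. */
    static String demo(String xml) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
                gz.write(xml.getBytes(StandardCharsets.UTF_8));
            }
            InputStream source = new ByteArrayInputStream(compressed.toByteArray());
            UnaryOperator<InputStream> buffer = BufferedInputStream::new;
            UnaryOperator<InputStream> unzip = InputPipelineSketch::gunzip;
            try (InputStream out = chain(source, List.of(buffer, unzip))) {
                return new String(out.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An output pipeline would be the mirror image under the same assumption: the first stage wraps the stream that receives data from the driver, and the last stage writes to the endpoint.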

After you create the XML file that contains the configuration, ensure that the pipeline_config_file or pcf property of the XML driver or the Complex File driver points to the absolute file location of the XML file.
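Because the property must hold an absolute location, it can help to normalize the path before setting it. The following sketch assumes the driver properties are assembled in a `java.util.Properties` object; the class name `PipelineConfigProperty` is hypothetical, and how the properties reach the driver (data server definition or JDBC URL) is not shown here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper for preparing the pipeline_config_file property. */
final class PipelineConfigProperty {

    /** Returns properties whose pipeline_config_file entry is an absolute, normalized path. */
    static Properties withPipelineConfig(String configFile) {
        // A relative path is resolved against the current working directory.
        Path absolute = Paths.get(configFile).toAbsolutePath().normalize();
        Properties props = new Properties();
        props.setProperty("pipeline_config_file", absolute.toString());
        return props;
    }
}
```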

Example D-1 shows a sample configuration XML file.

Example D-1 Sample Configuration XML File

<?xml version="1.0" encoding="UTF-8"?>
<pipeline xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="pre-post.xsd">
 
   <input-stages>
      <io-stage name="restInput">
         <codeDefinition>
            <javaClass>com.company.org.InputProcessor</javaClass>
         </codeDefinition>
         <debugOutput>http://tempuri.org</debugOutput>
      </io-stage>
      <stage name="BufferInputStage">
         <codeDefinition>
            <javaClass>com.company.org.BufferingClass</javaClass>
         </codeDefinition>
         <props>
            <property name="bufferSizeBytes">2340</property>
         </props>
      </stage>
      <stage name="UnzipStage">
         <codeDefinition>
            <code>[Groovy text in Base64 encoded form]</code>
         </codeDefinition>
      </stage>
   </input-stages>
   <output-stages>
      <io-stage name="restOut">
         <codeDefinition>
            <javaClass>com.company.org.OutputProcessor</javaClass>
         </codeDefinition>
         <debugOutput>http://tempuri.org</debugOutput>
      </io-stage>
      <stage name="SevenZipOutputStage">
         <codeDefinition>
            <code>[Groovy text in Base64 encoded form]</code>
         </codeDefinition>
      </stage>
      <stage name="BufferOutputStage">
         <codeDefinition>
            <javaClass>com.company.org.PushOutput</javaClass>
         </codeDefinition>
         <debugOutput>/scratch/jsmith/view_storage/tmp/bufferout.txt</debugOutput>
      </stage>
   </output-stages>
</pipeline>
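To double-check the order in which stages are declared in a configuration file such as Example D-1, the file can be read with the JDK DOM parser. This is only an inspection sketch; the class name `StageLister` is hypothetical and the code does not reproduce how the driver itself loads the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical inspection helper for a pipeline configuration document. */
final class StageLister {

    /** Returns the name of each io-stage or stage under the given section, in document order. */
    static List<String> stageNames(String configXml, String section) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            NodeList sections = doc.getElementsByTagName(section);
            if (sections.getLength() == 0) {
                return names; // section not configured
            }
            NodeList children = sections.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child instanceof Element) {
                    String tag = child.getNodeName();
                    if (tag.equals("io-stage") || tag.equals("stage")) {
                        names.add(((Element) child).getAttribute("name"));
                    }
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse pipeline configuration", e);
        }
    }
}
```

Called with the content of Example D-1 and the section name `input-stages`, this would list `restInput`, `BufferInputStage`, and `UnzipStage` in that order.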

D.3 Implementing the processing stages

You can implement pre or post data processing for the XML driver and the Complex File driver in three different ways.

  • Groovy Code

    By supplying the Groovy code directly into the configuration XML file. This Groovy code is a part of the dataserver configuration and cannot be re-used. You can supply the Groovy code as a Base64 encoded string or as a plain text string within a CDATA section.

    For an example, see Section D.7, "Example: Groovy Code Embedded in Configuration XML File".

  • Java Class

    By providing the fully qualified name of a Java class. This Java class must be available on the ODI Agent classpath at runtime.

    For ODI Studio, package the class into a JAR and place the JAR in the USER_HOME/odi/oracledi/userlib directory.

    For Standalone or Collocated agents, either place this JAR in the DOMAIN_HOME/lib directory or add it to the classpath using one of the scripts.

    For JEE Agents, deploy the JAR as a shared library, and make the ODI Agent application depend on this shared library.

    For an example, see Section D.6, "Example: Java Class for Reading Data From HTTP Source Requiring Authentication".

  • Groovy Script

    By providing the name of a Groovy script. All the requirements of Java class apply to this Groovy script as well. As an exception you may provide either the name of the script, for example, MyGroovySource.groovy or an absolute path to the script, for example, /home/groupuser/name/MyCustomGroovy.groovy.

    In the former case, the script is looked up as a Java class resource using the ClassLoader, and the usual locator pattern for class resources applies. For example, if the file is not in a JAR, provide the file name as /MyGroovySource.groovy. If the file is in a subdirectory of a JAR, the locator is /com/foo/MyGroovySource.groovy. If you use an absolute path, the Groovy script is accessed as a plain Java File.

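    For examples, see Section D.4, "Example: Groovy Script for Reading XML Data From Within a ZIP File" and Section D.5, "Example: Groovy Script for Transforming XML Data and Writing to a Different Format".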
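The two ways of locating a Groovy script described above correspond to two standard JDK access paths. The sketch below illustrates them side by side; the class name `ScriptAccess` is hypothetical, and the rule the driver uses to choose between the two is not documented here, so none is implemented.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of classpath versus file access to a script. */
final class ScriptAccess {

    /** Class resource lookup, for example "/com/foo/MyGroovySource.groovy"; null when absent. */
    static InputStream fromClasspath(String locator) {
        return ScriptAccess.class.getResourceAsStream(locator);
    }

    /** Plain file access; the path must be absolute. */
    static InputStream fromFile(String absolutePath) {
        Path path = Paths.get(absolutePath);
        if (!path.isAbsolute()) {
            throw new IllegalArgumentException("Not an absolute path: " + absolutePath);
        }
        try {
            return Files.newInputStream(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a throwaway script to a temp file and reads it back by absolute path. */
    static String demo() {
        try {
            Path script = Files.createTempFile("MyCustomGroovy", ".groovy");
            try {
                Files.write(script, "println 'hi'".getBytes(StandardCharsets.UTF_8));
                try (InputStream in = fromFile(script.toAbsolutePath().toString())) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                Files.delete(script);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```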

Note:

Keep the following in mind:
  • Changes to the embedded Groovy code, or to a Groovy script file located via an absolute path, are not picked up unless the XML driver schema is dropped. For a Java class or a Groovy script file located via the classpath, you must restart the JVM to pick up the changes.

  • The inline Groovy code, Groovy script, or Java class must conform to the Java interfaces provided in the Public APIs. The ODI driver chains the resulting code in the order set up in the configuration, and the data flows through the stages as configured.
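For the embedded Groovy option, the script text has to be turned into a Base64 string before it is placed in the `<code>` element. The following is a minimal sketch with `java.util.Base64`; the class name `GroovyCodeEncoder` is hypothetical, and UTF-8 is assumed as the character encoding of the script text because this appendix does not state one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for preparing and inspecting embedded Groovy code. */
final class GroovyCodeEncoder {

    /** Wraps the Base64 form of the Groovy source in a code element. */
    static String toCodeElement(String groovySource) {
        String encoded = Base64.getEncoder()
            .encodeToString(groovySource.getBytes(StandardCharsets.UTF_8));
        return "<code>" + encoded + "</code>";
    }

    /** Decodes the text content of a code element back to Groovy source. */
    static String decode(String codeText) {
        // Trim first: the basic decoder rejects the line breaks that often surround the text.
        byte[] raw = Base64.getDecoder().decode(codeText.trim());
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

The `decode` method is also handy for reviewing what an existing configuration file actually runs.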

D.4 Example: Groovy Script for Reading XML Data From Within a ZIP File

Following is an example of a Groovy script to read XML data from within a ZIP file.

Example D-2 Groovy Script: Read XML Data from within a ZIP file

import java.io.IOException
import java.io.InputStream;
import java.util.Properties;
import java.util.logging.Logger;
import oracle.odi.jdbc.drivers.common.pipeline.api.Stage;
import oracle.odi.jdbc.drivers.common.pipeline.api.TerminalStreamInputStage;

class FileFromZip extends TerminalStreamInputStage {
   public FileFromZip(Properties pStageProperties, String pDataserverUrl,
                             Properties pDataserverProperties, String pJavaEncoding,
     Logger pLogger, String pDebugLocation, String pDebugEncoding, String pStageName) {

   super(pStageProperties, pDataserverUrl, pDataserverProperties,
    pJavaEncoding, pLogger, pDebugLocation, pDebugEncoding, pStageName);
   }

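   // Kept open until close() so that the returned entry stream stays readable.
   private java.util.zip.ZipFile zipFile

   @Override
   public InputStream readSource() throws IOException {
     def xmlName = getStageProperties().get("XML_FILE")
     zipFile = new java.util.zip.ZipFile(new File(getStageProperties().get("ZIP_FILE")))
     def zipEntry = zipFile.entries().find { !it.directory && xmlName.equalsIgnoreCase(it.name) }
     if (zipEntry == null) {
       throw new IOException("Entry " + xmlName + " not found in " + zipFile.name)
     }
     return zipFile.getInputStream(zipEntry)
   }

   @Override
   public void close() throws IOException {
     // Closing the ZipFile also closes the entry stream returned by readSource().
     if (zipFile != null) {
       zipFile.close()
     }
   }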

}
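The entry lookup in Example D-2 can be tried outside ODI with plain JDK classes, which is a convenient way to test the ZIP_FILE and XML_FILE values before configuring the stage. The class `ZipEntryLookup` below is a hypothetical stand-alone sketch, not an ODI stage; unlike the stage, it reads the whole entry into memory and reports a missing entry explicitly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Hypothetical stand-alone version of the case-insensitive entry lookup. */
final class ZipEntryLookup {

    /** Reads the named entry, ignoring case, and fails clearly when it is absent. */
    static byte[] read(Path zip, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            ZipEntry match = zipFile.stream()
                .filter(e -> !e.isDirectory() && entryName.equalsIgnoreCase(e.getName()))
                .findFirst()
                .orElseThrow(() -> new IOException("No entry " + entryName + " in " + zip));
            try (InputStream in = zipFile.getInputStream(match)) {
                return in.readAllBytes();
            }
        }
    }

    /** Builds a one-entry archive in a temp file and reads the entry back. */
    static String demo() {
        try {
            Path zip = Files.createTempFile("personal", ".zip");
            try {
                try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                    out.putNextEntry(new ZipEntry("personal.xml"));
                    out.write("<personal/>".getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
                return new String(read(zip, "PERSONAL.XML"), StandardCharsets.UTF_8);
            } finally {
                Files.delete(zip);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```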

D.5 Example: Groovy Script for Transforming XML Data and Writing to a Different Format

Following is an example of a Groovy script to transform XML data and write it out to a different format.

Example D-3 Groovy Script: Transform XML data and write it to a different format

package oracle.odi.jdbc.driver
 
import groovy.xml.MarkupBuilder;
 
import java.io.IOException;
import java.io.OutputStream
import java.util.Properties;
import java.util.logging.Logger;

import oracle.odi.jdbc.drivers.common.pipeline.api.JunctionStreamOutputStage;
import oracle.odi.jdbc.drivers.common.pipeline.api.Stage;
 
class TransformXmlOutput extends JunctionStreamOutputStage {
 
        private OutputStream output

        public TransformXmlOutput(Properties pStageProperties, String pDataserverUrl,
                        Properties pDataserverProperties, String pJavaEncoding, Logger pLogger,
                        String pDebugLocation, String pDebugEncoding, String pStageName) {
                super(pStageProperties, pDataserverUrl, pDataserverProperties, pJavaEncoding, pLogger,
                                pDebugLocation, pDebugEncoding, pStageName);
        }

        @Override
        public OutputStream writeOutput(OutputStream out) {
                System.out.println("In TransformXmlOutput writeOutput")
                def Writer w = new BufferedWriter(new OutputStreamWriter(out))
                System.out.println("Created writer")
                output = pipeInput { input ->
                        // Perform transformation
                        System.out.println("Piping")
                        def builder = new MarkupBuilder (w);
                        def cars = new XmlSlurper().parse(input)
 
                        System.out.println("Parsed XML")
 
                        builder.mkp.xmlDeclaration(version: "1.0", encoding: "utf-8")
 
                        builder.html(xmlns:"http://www.w3.org/1999/xhtml") {
                                head {
                                        title "Cars collection"
                                }
                                body {
                                        h1("Cars")
                                        ul(){
                                                cars.car.each{car ->
                                                                li(car.@name.toString() + "," + car.country + "," + car.description + ", Age: " + (2012 - car.@year.toInteger()) + " years")
                                                }
                                        }
                                }
                            }
                        w.flush()
                        System.out.println("Closing connectedStage")
                        closeConnectedStage();
                }
        }
 
        @Override
        public void close() throws IOException {
                System.out.println("Closing TransformXmlOutput")
                if(output!= null) {
                        output.flush();
                        output.close()
                }
        }
 
        public static OutputStream pipeInput(Closure read) {
 
                PipedInputStream input = new PipedInputStream()
                PipedOutputStream output = new PipedOutputStream(input)
                getThreadsSource.submit {
                        try{
                                read(input)
                        } catch (Exception e) {
                                System.out.println("Exception in thread")
                                e.printStackTrace();
                                throw e;
                        } finally {
                                output.flush()
                        }
                }
                return output
        }
}
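Example D-3 relies on a piped stream pair: the caller writes into a `PipedOutputStream` while a separate thread reads the connected `PipedInputStream`, transforms the data, and forwards the result. The following sketch shows that pattern with JDK classes only. The class name `PipedTransformSketch`, the single-thread executor, and the upper-casing transformation are stand-ins chosen for illustration, not part of the ODI API.

```java
import java.io.ByteArrayOutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of the piped-stream hand-off between two threads. */
final class PipedTransformSketch {

    static String demo(String text) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            PipedInputStream readEnd = new PipedInputStream();
            PipedOutputStream writeEnd = new PipedOutputStream(readEnd);
            ByteArrayOutputStream downstream = new ByteArrayOutputStream();

            // Worker thread: consume what the producer writes, transform it, forward it.
            Future<Object> done = worker.submit(() -> {
                String in = new String(readEnd.readAllBytes(), StandardCharsets.UTF_8);
                downstream.write(in.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
                return null;
            });

            // Producer: write the data, then close to signal end of data to the reader.
            writeEnd.write(text.getBytes(StandardCharsets.UTF_8));
            writeEnd.close();

            // Wait for the worker so that downstream is complete before it is read.
            done.get(5, TimeUnit.SECONDS);
            return new String(downstream.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("Piped transformation failed", e);
        } finally {
            worker.shutdownNow();
        }
    }
}
```

The reader must run on a different thread from the writer; using both ends of a pipe on one thread can deadlock once the pipe buffer fills.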

D.6 Example: Java Class for Reading Data From HTTP Source Requiring Authentication

Following is an example of a Java class to read data from an HTTP source that requires authentication.

Example D-4 Java Class: Read Data From HTTP Source Requiring Authentication

/**
 * 
 */
package oracle.odi.jdbc.driver.xml;
 
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Properties;
import java.util.logging.Logger;
 
import oracle.odi.jdbc.drivers.common.pipeline.api.TerminalStreamInputStage;
 
/**
 * @author jsmith 
 *
 */
public class FromHttpBasicAuthJava extends TerminalStreamInputStage {
 
   /**
    * @param pStageProperties
    * @param pDataserverUrl
    * @param pDataserverProperties
    * @param pJavaEncoding
    * @param pLogger
    * @param pDebugLocation
    * @param pDebugEncoding
    * @param pStageName
    */
    public FromHttpBasicAuthJava(Properties pStageProperties, String pDataserverUrl,
            Properties pDataserverProperties, String pJavaEncoding,
            Logger pLogger, String pDebugLocation, String pDebugEncoding,
            String pStageName) {
        super(pStageProperties, pDataserverUrl, pDataserverProperties,
                pJavaEncoding, pLogger, pDebugLocation, pDebugEncoding,
                pStageName);
    }
 
    /* (non-Javadoc)
     * @see oracle.odi.jdbc.drivers.common.pipeline.api.TerminalStreamInputStage#readSource()
     */
    @Override
    public InputStream readSource() throws IOException {
        String username = (String)(getStageProperties().get("username"));
        String password = (String)(getStageProperties().get("password"));
        byte[] credential = org.apache.commons.codec.binary.Base64.encodeBase64(
                (username + ":" + password).getBytes());
 
        //pass encoded user name and password as header
        URL url = new URL ("http://localhost:18000/get");
        URLConnection conn = url.openConnection();
        conn.setRequestProperty ("Authorization", "Basic " + new String(credential));
        urlStream = conn.getInputStream();
        // Buffer the raw bytes. Decoding each chunk to a String could corrupt
        // multi-byte characters that straddle a chunk boundary.
        java.io.ByteArrayOutputStream result = new java.io.ByteArrayOutputStream();
        byte[] read = new byte[1024];
        int bytesRead;
        while ((bytesRead = urlStream.read(read)) != -1) {
            result.write(read, 0, bytesRead);
        }

        return new ByteArrayInputStream(result.toByteArray());
    }
 
    /* (non-Javadoc)
     * @see oracle.odi.jdbc.drivers.common.pipeline.api.Stage#close()
     */
    @Override
    public void close() throws IOException {
        if(urlStream != null)
            urlStream.close();
    }
 
    private InputStream urlStream = null;
}
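The Authorization header in Example D-4 is built with Apache Commons Codec. Where that library is not on the classpath, the same header value can be produced with the JDK `java.util.Base64` class, as in this sketch. The class name `BasicAuthHeader` is hypothetical, and UTF-8 is assumed for the credential bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds an HTTP Basic Authorization header value. */
final class BasicAuthHeader {

    /** Returns "Basic " followed by the Base64 form of "username:password". */
    static String value(String username, String password) {
        byte[] raw = (username + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }
}
```

For example, `BasicAuthHeader.value("user", "pass")` yields `Basic dXNlcjpwYXNz`, which would be passed to `conn.setRequestProperty("Authorization", ...)`.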

D.7 Example: Groovy Code Embedded in Configuration XML File

Following is an example of a configuration XML file with Groovy code embedded as a Base64 string.

Example D-5 Configuration XML file with Groovy code embedded as a Base64 string

<?xml version="1.0" encoding="UTF-8"?>
 
<pipeline xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="pre-post.xsd">
 
  <input-stages>
    <io-stage name="fromZip">
      <codeDefinition>
<code>
CgppbXBvcnQgamF2YS5pby5JT0V4Y2VwdGlvbgppbXBvcnQgamF2YS5pby5JbnB1dFN0cmVhbTsKaW1wb3J0IGphdmEudXRpbC5Qcm9wZXJ0aWVzOwppbXBvcnQgamF2YS51dGlsLmxvZ2dpbmcuTG9nZ2VyOwoKaW1wb3J0IG9yYWNsZS5vZGkuamRiYy5kcml2ZXJzLmNvbW1vbi5waXBlbGluZS5hcGkuU3RhZ2U7CmltcG9ydCBvcmFjbGUub2RpLmpkYmMuZHJpdmVycy5jb21tb24ucGlwZWxpbmUuYXBpLlRlcm1pbmFsU3RyZWFtSW5wdXRTdGFnZTsKCmNsYXNzIEZpbGVGcm9tRnJvbVppcCBleHRlbmRzIFRlcm1pbmFsU3RyZWFtSW5wdXRTdGFnZSB7CgoJcHVibGljIEZpbGVGcm9tRnJvbVppcChQcm9wZXJ0aWVzIHBTdGFnZVByb3BlcnRpZXMsIFN0cmluZyBwRGF0YXNlcnZlclVybCwKCQkJUHJvcGVydGllcyBwRGF0YXNlcnZlclByb3BlcnRpZXMsIFN0cmluZyBwSmF2YUVuY29kaW5nLAoJCQlMb2dnZXIgcExvZ2dlciwgU3RyaW5nIHBEZWJ1Z0xvY2F0aW9uLCBTdHJpbmcgcERlYnVnRW5jb2RpbmcsIFN0cmluZyBwU3RhZ2VOYW1lKSB7CgkJc3VwZXIocFN0YWdlUHJvcGVydGllcywgcERhdGFzZXJ2ZXJVcmwsIHBEYXRhc2VydmVyUHJvcGVydGllcywKCQkJCXBKYXZhRW5jb2RpbmcsIHBMb2dnZXIsIHBEZWJ1Z0xvY2F0aW9uLCBwRGVidWdFbmNvZGluZywgcFN0YWdlTmFtZSk7Cgl9CgoJQE92ZXJyaWRlCglwdWJsaWMgSW5wdXRTdHJlYW0gcmVhZFNvdXJjZSgpIHRocm93cyBJT0V4Y2VwdGlvbiB7CgkJZGVmIHppcEZpbGUgPSBuZXcgamF2YS51dGlsLnppcC5aaXBGaWxlKG5ldyBGaWxlKGdldFN0YWdlUHJvcGVydGllcygpLmdldCgiWklQX0ZJTEUiKSkpCgkJZGVmIHppcEVudHJ5ID0gemlwRmlsZS5lbnRyaWVzKCkuZmluZCB7ICFpdC5kaXJlY3RvcnkgJiYgZ2V0U3RhZ2VQcm9wZXJ0aWVzKCkuZ2V0KCJYTUxfRklMRSIpLmVxdWFsc0lnbm9yZUNhc2UoaXQubmFtZSl9CgkJcmV0dXJuIHppcEZpbGUuZ2V0SW5wdXRTdHJlYW0oemlwRW50cnkpCgl9CgoJQE92ZXJyaWRlCglwdWJsaWMgdm9pZCBjbG9zZSgpIHRocm93cyBJT0V4Y2VwdGlvbiB7CgkJLy8gVE9ETyBBdXRvLWdlbmVyYXRlZCBtZXRob2Qgc3R1YgoKCX0KCn0K
</code>
      </codeDefinition>
      <props>
        <property name="ZIP_FILE">/home/myuser/files/personal.zip</property>
        <property name="XML_FILE">personal.xml</property>
      </props>
    </io-stage>
  </input-stages>
</pipeline>