Please note the following when working on Big Data integration projects:
Before ODI 12c (18.104.22.168), Groovy, Jython, or Beanshell code in ODI Procedures/Custom KMs could not access Hadoop/Pig classes unless the JARs containing those classes were added to the ODI classpath.
Starting with ODI 12c (22.214.171.124), the ODI Procedures/Custom KMs can access Hadoop/Pig classes as long as they exist in the paths configured on Hadoop/Pig data servers.
A new property, oracle.odi.prefer.dataserver.packages, is exposed on Hadoop and Pig data servers, as well as Hive data servers. This property lets you specify which packages are loaded child-first rather than parent-first.
Note: Upgraded repositories will not show this property on upgraded Hadoop/Pig data servers. Only new data servers will show this property.
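The difference between child-first and parent-first loading can be sketched with a small classloader. This is only an illustration of the general technique, not ODI's implementation: the class name PreferredPackagesClassLoader, the isPreferred helper, and the prefix-matching rule are all assumptions made for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: classes in "preferred" packages are looked up in this
 * loader's own URLs first (child-first); everything else uses the default
 * parent-first delegation.
 */
class PreferredPackagesClassLoader extends URLClassLoader {
    private final List<String> preferredPackages;

    PreferredPackagesClassLoader(URL[] urls, ClassLoader parent, List<String> preferredPackages) {
        super(urls, parent);
        this.preferredPackages = new ArrayList<>(preferredPackages);
    }

    /** True if className is in a preferred package or one of its sub-packages (assumed matching rule). */
    boolean isPreferred(String className) {
        for (String pkg : preferredPackages) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && isPreferred(name)) {
                try {
                    // Child-first: try this loader's own URLs before asking the parent.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not on this loader's path; fall back to the parent below.
                }
            }
            if (c == null) {
                // Parent-first: standard delegation.
                return super.loadClass(name, resolve);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With a filter of org.apache.pig, a class such as org.apache.pig.PigServer would be taken from the data server's own path when present, while classes outside the filter still come from the parent.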
In a JEE environment, the Agent application may be redeployed. However, because of Pig's shutdown hook, a logging leak, and other undiscovered leaks, the execution classloader that was created is never garbage-collected. Hence, in ODI 12c (12.2.1), if you use Big Data features, do not redeploy the JEE Agent application; restart the server instead.
Any package filter applied to a data server must be as specific as possible. Do not try to make things easier by specifying the widest possible filter. For example, if you specify org.apache as a filter element, you will get a ClassCastException on Beanshell instantiation, XML parser instantiation, and so on. This happens because, according to the Java Language Specification, two classes are cast-compatible only if they have the same type declaration and are loaded by the same classloader. In this example, your interface will be under some sub-package of org.apache, for example, org.apache.util.IMyInterface. The interface class loaded by the Studio classloader/web application classloader is the casting target. When the implementation class is instantiated via reflection, the instance class's interface class is loaded again, this time by the execution classloader. When the JNIEnv code checks whether the caster and castee share the same type declaration, the result is false, because the left-hand side has the Studio/web application classloader and the right-hand side has the execution classloader.
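The cast failure described above can be reproduced outside ODI with two classloaders. The following is a minimal sketch under stated assumptions: Greeter and GreeterImpl are hypothetical stand-ins for org.apache.util.IMyInterface and its implementation, IsolatingLoader plays the role of the execution classloader with an overly wide filter, and the class files are assumed to be readable as resources from the application classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Constructor;

class LoaderIdentityDemo {
    /** Hypothetical interface; the application loader's copy is the cast target. */
    public interface Greeter { String greet(); }

    /** Hypothetical implementation, instantiated via reflection. */
    public static class GreeterImpl implements Greeter {
        public GreeterImpl() { }
        public String greet() { return "hello"; }
    }

    /** Redefines Greeter and GreeterImpl itself instead of delegating to the parent. */
    static class IsolatingLoader extends ClassLoader {
        private static final String PREFIX = LoaderIdentityDemo.class.getName() + "$Greeter";

        IsolatingLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.startsWith(PREFIX)) {
                return super.loadClass(name, resolve); // parent-first for everything else
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) resolveClass(c);
                return c;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) throw new ClassNotFoundException(name);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Returns true if an instance created through the isolating loader can be cast to Greeter. */
    static boolean castSucceeds() {
        try {
            ClassLoader app = LoaderIdentityDemo.class.getClassLoader();
            Class<?> implClass = new IsolatingLoader(app).loadClass(GreeterImpl.class.getName());
            Constructor<?> ctor = implClass.getDeclaredConstructor();
            ctor.setAccessible(true);
            Object instance = ctor.newInstance();
            try {
                Greeter g = (Greeter) instance; // target type comes from the application loader
                return g != null;
            } catch (ClassCastException e) {
                return false; // same name, different defining loader
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both copies of the interface report the same fully qualified name, yet the JVM treats them as distinct types because their defining classloaders differ. A narrower filter that excludes the shared interface's package avoids the second definition.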
Execution classloader instances are cached. Changing the data server package filter or data server classpath results in the creation of a new classloader instance. The old classloader may not be GC'd immediately (or even ever). This can lead to running out of heap space. The only solution is a JVM restart.
When you use the SDK to create a Pig, Hadoop, or any other data server that has the package filtering property set on it, adding more data server properties requires attention to one detail. You must retrieve the current set of properties, add your properties to it, and then set the combined set on the data server. Otherwise, the filtering property will be lost.
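The retrieve, add, and set sequence can be sketched as follows. The FakeDataServer class and its getProperties/setProperties methods are hypothetical stand-ins invented for this example; they are not the ODI SDK API, and the property values shown are illustrative only.

```java
import java.util.Properties;

class DataServerPropertiesDemo {
    /** Hypothetical stand-in for a data server object; NOT an ODI SDK class. */
    static class FakeDataServer {
        private Properties properties = new Properties();

        /** Returns a copy of the current properties. */
        Properties getProperties() {
            Properties copy = new Properties();
            copy.putAll(properties);
            return copy;
        }

        /** Replaces the whole property set, as a setter that overwrites would. */
        void setProperties(Properties newProperties) {
            Properties copy = new Properties();
            copy.putAll(newProperties);
            properties = copy;
        }
    }

    /** Adds extra properties without discarding those already on the server. */
    static void addProperties(FakeDataServer server, Properties extra) {
        Properties merged = server.getProperties(); // 1. retrieve the current set
        merged.putAll(extra);                       // 2. add the new properties
        server.setProperties(merged);               // 3. set the combined set back
    }
}
```

Calling setProperties with only the new entries would replace the whole set and silently drop oracle.odi.prefer.dataserver.packages, which is the loss the paragraph above warns about.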