8 Using Oracle R Advanced Analytics for Hadoop

This chapter describes R support for big data. It contains the following sections:

Note:

Oracle R Advanced Analytics for Hadoop was previously called Oracle R Connector for Hadoop or ORCH. ORCH is still mentioned in this document and in the product for backward compatibility.

8.1 About Oracle R Advanced Analytics for Hadoop

Oracle R Advanced Analytics for Hadoop provides:

  • A general computation framework, in which you can use the R language to write your custom logic as mappers or reducers. The code executes in a distributed, parallel manner using the available compute and storage resources on the Hadoop cluster.

  • An R interface to manipulate Hive tables, which is similar to the transparency layer of Oracle R Enterprise but with a restricted set of functionality.

  • A set of pre-packaged parallel-distributed algorithms.

  • Support for Apache Spark, with which you can execute predictive analytics functions on a Hadoop cluster, either using YARN to dynamically form a Spark cluster or on a dedicated standalone Spark cluster. You can switch Spark execution on or off using the spark.connect() and spark.disconnect() functions, as sketched in the example following this list.

  • The ability to use Spark to execute the neural network analytical function (orch.neural), with significantly improved performance over MapReduce execution.
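
The following sketch shows how Spark execution might be enabled around a call to orch.neural. The connection parameters shown (a YARN client master and a memory size) are illustrative assumptions, not required values; see the spark.connect Help topic for the supported arguments.

# Connect to Spark on YARN (parameter values are illustrative)
spark.connect(master = "yarn-client", memory = "2g")

# Spark-enabled functions such as orch.neural now run on Spark
model <- orch.neural(Y ~ X1 + X2, dfs.dat)   # dfs.dat: an HDFS data identifier

# Revert to MapReduce execution
spark.disconnect()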

8.1.1 Oracle R Advanced Analytics for Hadoop Architecture

Oracle R Advanced Analytics for Hadoop:

  • is built upon Hadoop streaming, a utility that is part of the Hadoop distribution and allows creation and execution of Map or Reduce jobs with any executable or script as the mapper or reducer.

  • is designed for R users to work with a Hadoop cluster in a client-server configuration. Client configurations must conform to the requirements of the Hadoop distribution on which Oracle R Advanced Analytics for Hadoop is deployed.

  • uses the command-line interfaces to HDFS and Hive to communicate from client nodes to Hadoop clusters.

  • builds the logic required to transform an input stream of data into an R data frame object so that it can be readily consumed by user-provided mapper and reducer functions written in R.

  • allows R users to move data from an Oracle Database table or view into Hadoop as an HDFS file, using the Sqoop utility. Similarly, data can be moved back from an HDFS file into Oracle Database, using either the Sqoop utility or Oracle Loader for Hadoop, depending on the size of the data being moved and security requirements.

  • supports R's binary RData representation for input and output, for performance-sensitive analytic workloads. Conversion utilities between the delimiter-separated representation and the RData representation are available as part of Oracle R Advanced Analytics for Hadoop.

  • includes a Hadoop Abstraction Layer (HAL), which manages the similarities and differences across various Hadoop distributions. ORCH auto-detects the Hadoop version at startup.

8.1.2 Oracle R Advanced Analytics for Hadoop Packages and Functions

Oracle R Advanced Analytics for Hadoop includes a collection of R packages that provides:

  • Interfaces to work with the following:

    • Apache Hive tables

    • Apache Hadoop compute infrastructure

    • local R environment

    • Oracle Database tables

    • Proprietary binary RData representations

    • Apache Spark RDD objects

  • Predictive analytic techniques for:

    • linear regression

    • generalized linear models

    • neural networks

    • matrix completion using low rank matrix factorization

    • nonnegative matrix factorization

    • k-means clustering

    • principal components analysis

    • multivariate analysis

    ORAAH 2.6 introduces a full stack of predictive modeling algorithms on Spark. This includes integration of many Spark MLlib capabilities: Linear Model techniques (Linear Regression, LASSO, Ridge Regression), as well as GLM, SVM, k-Means, Gaussian Mixture clustering, Decision Trees, Random Forests, Gradient Boosted Trees, PCA, and SVD. Existing ORAAH custom Spark algorithms are enhanced with the addition of Linear Models and Stepwise capability for both LM and GLM.

    While these techniques have R interfaces, Oracle R Advanced Analytics for Hadoop implements them in either Java or R as distributed, parallel MapReduce jobs, thereby leveraging all nodes of your Hadoop cluster.

You install and load this package as you would any other R package. Using simple R functions, you can perform tasks like these:

  • Access and transform HDFS data using a Hive-enabled transparency layer

  • Use the R language for writing mappers and reducers

  • Copy data between R memory, the local file system, HDFS, Hive, and Oracle Database instances

  • Manipulate Hive data transparently from R

  • Execute R programs as Hadoop MapReduce jobs and return the results to any of those locations

    • With Oracle R Advanced Analytics for Hadoop, MapReduce jobs can be submitted from R for both non-cluster (local) execution and Hadoop cluster execution

    • When Oracle R Enterprise and Oracle R Advanced Analytics for Hadoop are used together on a database server, you can schedule database jobs using the DBMS_SCHEDULER to execute scripts containing ORCH functions

To use Oracle R Advanced Analytics for Hadoop, you should be familiar with MapReduce programming, R programming, and statistical methods.

8.1.3 Oracle R Advanced Analytics for Hadoop APIs

Oracle R Advanced Analytics for Hadoop provides access from a local R client to Apache Hadoop using functions with these prefixes:

  • hadoop: Identifies functions that provide an interface to Hadoop MapReduce

  • hdfs: Identifies functions that provide an interface to HDFS

  • orch: Identifies a variety of functions; orch is a general prefix for ORCH functions

  • ore: Identifies functions that provide an interface to a Hive data store

Oracle R Advanced Analytics for Hadoop uses data frames as the primary object type, but it can also operate on vectors and matrices to exchange data with HDFS. The APIs support the numeric, integer, and character data types in R.

All of the APIs are included in the ORCH library. The functions are listed in "Oracle R Advanced Analytics for Hadoop Functions".
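
The naming convention can be seen in a short, hedged session sketch (the HDFS path is illustrative):

library(ORCH)                # load the ORCH library and all of its APIs

hdfs.ls("/user/oracle")      # hdfs.*: interface to HDFS
ore.connect(type="HIVE")     # ore.*: interface to a Hive data store
orch.version()               # orch.*: general ORCH functions
# hadoop.* functions, such as hadoop.run, submit MapReduce jobs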

See Also:

The R Project website at http://www.r-project.org/

8.1.4 Inputs to Oracle R Advanced Analytics for Hadoop

Oracle R Advanced Analytics for Hadoop can work with delimited text files resident in an HDFS directory, HIVE tables, or binary RData representations of data. If the input data to an Oracle R Advanced Analytics for Hadoop orchestrated map-reduce computation does not reside in HDFS, a copy of the data in HDFS is created automatically prior to launching the computation.

Before Oracle R Advanced Analytics for Hadoop can work with delimited text files, it determines the metadata associated with the files and captures it in a file named __ORCHMETA__, stored alongside the data files. The metadata contains information such as:

  • If the file contains key(s), then the delimiter that is the key separator

  • The delimiter that is the value separator

  • Number and data types of columns in the file

  • Optional names of columns

  • Dictionary information for categorical columns

  • Other Oracle R Advanced Analytics for Hadoop-specific system data

Oracle R Advanced Analytics for Hadoop runs an automatic metadata discovery procedure on HDFS objects as part of hdfs.attach() invocation to create the metadata file. When working with Hive tables, the __ORCHMETA__ file is created automatically from the Hive table definition.

Oracle R Advanced Analytics for Hadoop can optionally convert input data into R's binary RData representation for I/O performance that is on par with a pure Java based map-reduce implementation.

Oracle R Advanced Analytics for Hadoop captures row streams from HDFS files and delivers them to the mapper function written in R, formatted as a data frame object (or optionally as matrix, vector, or list objects generated from the data frame object, or as is when the RData representation is used). To accomplish this, Oracle R Advanced Analytics for Hadoop must recognize the tokens and the data types of the tokens that become the columns of a data frame. It uses R's facilities to parse and interpret tokens in the input row streams. If missing values are not represented using R's "NA" token, they can be explicitly identified by the na.strings argument of hdfs.attach().

Delimited text files that use the same delimiter for both the key and the value are preferred over files that use different key and value delimiters. Read performance for files with the same key and value delimiter is roughly 2x better than for files with different key and value delimiters.

The key delimiter and value delimiter can be specified through the key.sep and val.sep arguments of hdfs.attach() or when running a MapReduce job for its output HDFS data.
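
For example, a comma-delimited HDFS file might be attached as follows; the path, delimiters, and missing-value token are illustrative:

# Attach a delimited HDFS file, declaring "?" as the missing-value token
dfs.id <- hdfs.attach("/user/oracle/sales_data",
                      key.sep = ",", val.sep = ",", na.strings = "?")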

Binary RData representation is the most performance efficient representation of input data in Oracle R Advanced Analytics for Hadoop. When possible, users are encouraged to use this binary data representation for performance sensitive analytics.

8.2 Access to HDFS Files

For Oracle R Advanced Analytics for Hadoop to access the data stored in HDFS, the input files must comply with the following requirements:

  • All input files for a MapReduce job must be stored in one directory as the parts of one logical file. Any valid HDFS directory name and file name extensions are acceptable.

  • Any file in that directory with a name beginning with an underscore (_) is ignored.

All delimiters are supported, and key and value delimiters can be different.

You can also convert a delimited file into binary format, using R's RData representation, for the best I/O performance.

8.3 Access to Apache Hive

Apache Hive provides an alternative storage and retrieval mechanism to HDFS files through a querying language called HiveQL, which closely resembles SQL. Hive uses MapReduce for distributed processing. However, the data is structured and has additional metadata to support data discovery. Oracle R Advanced Analytics for Hadoop uses the data preparation and analysis features of HiveQL, while enabling you to use R language constructs.

8.3.1 ORCH Functions for Hive

ORCH provides these conversion functions to help you move data between HDFS and Hive:

hdfs.toHive
hdfs.fromHive
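
A minimal sketch of their use, assuming hdfs.fromHive accepts the ore.frame created by ore.sync (the table name is illustrative):

ore.connect(type="HIVE")
ore.sync(table="sales_tab")          # expose the Hive table as an ore.frame
ore.attach()                         # put the synced table on the R search path
dfs.id <- hdfs.fromHive(sales_tab)   # Hive table -> HDFS identifier in ORCH
hv <- hdfs.toHive(dfs.id)            # HDFS identifier -> ore.frame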

8.3.2 ORE Functions for Hive

You can connect to Hive and analyze and transform Hive table objects using R functions that have an ore prefix, such as ore.connect. If you are also using Oracle R Enterprise, then you will recognize these functions. The ore functions in Oracle R Enterprise create and manage objects in an Oracle database, and the ore functions in Oracle R Advanced Analytics for Hadoop create and manage objects in a Hive database. You can connect to one database at a time, either Hive or Oracle Database, but not both simultaneously.

Note:

For information about requirements and instructions to set up and use Oracle R Enterprise, refer to Oracle R Enterprise library at: https://docs.oracle.com/cd/E83411_01/index.htm.

For example, ore.connect(type="HIVE") establishes a connection with the default Hive database. ore.hiveOptions(dbname='dbtmp') allows you to change the default database, while ore.showHiveOptions() allows you to examine the current default Hive database.
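
Shown as a short session:

ore.connect(type="HIVE")           # connect to the default Hive database
ore.hiveOptions(dbname='dbtmp')    # change the default database
ore.showHiveOptions()              # examine the current default database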

See Table 8-7 for a list of ORE as.ore.* and is.ore.* functions.

8.3.3 Generic R Functions Supported in Hive

Oracle R Advanced Analytics for Hadoop also overloads the following standard generic R functions with methods to work with Hive objects.

Character methods

casefold, chartr, gsub, nchar, substr, substring, tolower, toupper

This release does not support grepl or sub.

Frame methods
  • attach, show

  • [, $, $<-, [[, [[<-

  • Subset functions: head, tail

  • Metadata functions: dim, length, NROW, nrow, NCOL, ncol, names, names<-, colnames, colnames<-

  • Conversion functions: as.data.frame, as.env, as.list

  • Arithmetic operators: +, -, *, ^, %%, %/%, /

  • Compare, Logic, xor, !

  • Test functions: is.finite, is.infinite, is.na, is.nan

  • Mathematical transformations: abs, acos, asin, atan, ceiling, cos, exp, expm1, floor, log, log10, log1p, log2, logb, round, sign, sin, sqrt, tan, trunc

  • Basic statistics: colMeans, colSums, rowMeans, rowSums, Summary, summary, unique

  • by, merge

  • unlist, rbind, cbind, data.frame, eval

This release does not support dimnames, interaction, max.col, row.names, row.names<-, scale, split, subset, transform, with, or within.

Logical methods

ifelse, Logic, xor, !

Matrix methods

Not supported

Numeric methods
  • Arithmetic operators: +, -, *, ^, %%, %/%, /

  • Test functions: is.finite, is.infinite, is.nan

  • abs, acos, asin, atan, ceiling, cos, exp, expm1, floor, log, log1p, log2, log10, logb, mean, round, sign, sin, sqrt, Summary, summary, tan, trunc, zapsmall

This release does not support atan2, besselI, besselK, besselJ, besselY, diff, factorial, lfactorial, pmax, pmin, or tabulate.

Vector methods
  • show, length, c

  • Test functions: is.vector, is.na

  • Conversion functions: as.vector, as.character, as.numeric, as.integer, as.logical

  • [, [<-, |

  • by, Compare, head, %in%, paste, sort, table, tail, tapply, unique

This release does not support interaction, lengthb, rank, or split.

The following example shows simple data preparation and processing.

Example 8-1 Using R to Process Data in Hive Tables

# Connect to Hive
ore.connect(type="HIVE")

# Attach the current envt. into search path of R
ore.attach()

# create a Hive table by pushing the numeric columns of the iris data set
IRIS_TABLE <- ore.push(iris[1:4])

# Create bins based on Petal Length
 IRIS_TABLE$PetalBins = ifelse(IRIS_TABLE$Petal.Length < 2.0, "SMALL PETALS",
+                        ifelse(IRIS_TABLE$Petal.Length < 4.0, "MEDIUM PETALS",
+                        ifelse(IRIS_TABLE$Petal.Length < 6.0,
+                               "MEDIUM LARGE PETALS", "LARGE PETALS")))

#PetalBins is now a derived column of the HIVE object
> names(IRIS_TABLE)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "PetalBins"   

# Based on the bins, generate summary statistics for each group
aggregate(IRIS_TABLE$Petal.Length, by = list(PetalBins = IRIS_TABLE$PetalBins),
+           FUN = summary)
1        LARGE PETALS    6 6.025000 6.200000 6.354545 6.612500  6.9    0
2 MEDIUM LARGE PETALS    4 4.418750 4.820000 4.888462 5.275000  5.9    0
3       MEDIUM PETALS    3 3.262500 3.550000 3.581818 3.808333  3.9    0
4        SMALL PETALS    1 1.311538 1.407692 1.462000 1.507143  1.9    0
Warning message:
ORE object has no unique key - using random order 

8.3.4 Support for Hive Data Types

Oracle R Advanced Analytics for Hadoop can access any Hive table containing columns with string and numeric data types such as tinyint, smallint, bigint, int, float, and double.

There is no support for these complex data types:

array
binary
map
struct
timestamp
union

If you attempt to access a Hive table containing an unsupported data type, you will receive an error message. To access the table, you must convert the column to a supported data type.

To convert a column to a supported data type:

  1. Open the Hive command interface:

    $ hive
    hive>
    
  2. Identify the column with an unsupported data type:

    hive> describe table_name;
    
  3. View the data in the column:

    hive> select column_name from table_name;
    
  4. Create a table for the converted data, using only supported data types.

  5. Copy the data into the new table, using an appropriate conversion tool.

The first example below shows the conversion of an array. The other two examples show the conversion of timestamp data.

Example 8-2 Converting an Array to String Columns

R> ore.sync(table="t1")
   Warning message:
   table t1 contains unsupported data types 
     .
     .
     .
hive> describe t1;
OK
      col1   int
      col2   array<string>

hive> select * from t1;
OK
1      ["a","b","c"]
2      ["d","e","f"]
3      ["g","h","i"]

hive> create table t2 (c1 string, c2 string, c3 string);
hive> insert into table t2 select col2[0], col2[1], col2[2] from t1;
     .
     .
     .
R> ore.sync(table="t2")
R> ore.ls()
[1] "t2"
R> t2$c1
[1] "a" "d" "g" 

The following example uses automatic conversion of the timestamp data type into string. The data is stored in a table named t5 with a column named tstmp.

Example 8-3 Converting a Timestamp Column

hive> select * from t5;


hive> create table t6 (timestmp string); 
hive> insert into table t6 SELECT tstmp from t5;
 

The following example uses the Hive get_json_object function to extract the two columns of interest from the JSON table into a separate table for use by Oracle R Advanced Analytics for Hadoop.

Example 8-4 Converting a Timestamp Column in a JSON File

hive> select * from t3;
OK
      {"custId":1305981,"movieId":null,"genreId":null,"time":"2010-12-30:23:59:32","recommended":null,"activity":9}

hive> create table t4 (custid int, time string);
 
hive> insert into table t4 SELECT cast(get_json_object(c1, '$.custId') as int), cast(get_json_object(c1, '$.time') as string) from t3;

8.3.5 Usage Notes for Hive Access

The Hive command language interface (CLI) is used for executing queries and provides support for Linux clients. There is no JDBC or ODBC support.

The ore.create function creates Hive tables only as text files. However, Oracle R Advanced Analytics for Hadoop can access Hive tables stored as either text files or sequence files.

You can use the ore.exec function to execute Hive commands from the R console. For an example, run the hive_sequencefile demo.

Oracle R Advanced Analytics for Hadoop can access tables and views in the default Hive database only. To allow read access to objects in other databases, you must expose them in the default database. For example, you can create views.
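
For example, the following hedged sketch exposes a table from another Hive database through a view in the default database (all names are illustrative):

ore.exec("CREATE VIEW sales_v AS SELECT * FROM proddb.sales")
ore.sync(table="sales_v")    # make the new view visible in the R session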

Oracle R Advanced Analytics for Hadoop does not have a concept of ordering in Hive. An R frame persisted in Hive might not have the same ordering after it is pulled out of Hive and into memory. Oracle R Advanced Analytics for Hadoop is designed primarily to support data cleanup and filtering of huge HDFS data sets, where ordering is not critical. You might see warning messages when working with unordered Hive frames:

Warning messages:
1: ORE object has no unique key - using random order 
2: ORE object has no unique key - using random order 

To suppress these warnings, set the ore.warn.order option in your R session:

R> options(ore.warn.order = FALSE)

8.3.6 Example: Loading Hive Tables into Oracle R Advanced Analytics for Hadoop

The following example shows how to load a Hive table into an R data frame for analysis. It uses these Oracle R Advanced Analytics for Hadoop functions:

hdfs.attach
ore.attach
ore.connect
ore.create
ore.hiveOptions
ore.sync

Example 8-5 Loading a Hive Table

# Connect to HIVE metastore and sync the HIVE input table into the R session.
ore.connect(type="HIVE")
ore.sync(table="datatab")
ore.attach()
 
# The "datatab" object is a Hive table with columns named custid, movieid, activity, and rating.
# Perform filtering to remove missing (NA) values from custid and movieid columns 
# Project out three columns: custid, movieid and rating
t1 <- datatab[!is.na(datatab$custid) &
    !is.na(datatab$movieid) & 
    datatab$activity==1, c("custid","movieid", "rating")]
 
# Set HIVE field delimiters to ','. By default, it is Ctrl+a for text files but
# ORCH 2.0 supports only ',' as a file separator.
ore.hiveOptions(delim=',')

# Create another Hive table called "datatab1" after the transformations above.
ore.create (t1, table="datatab1")
 
# Use the HDFS directory, where the table data for datatab1 is stored, to attach
# it to ORCH framework. By default, this location is "/user/hive/warehouse"
dfs.id <- hdfs.attach("/user/hive/warehouse/datatab1")

# dfs.id can now be used with all hdfs.*, orch.* and hadoop.* APIs of ORCH for further processing and analytics.

8.4 Access to Oracle Database

Oracle R Advanced Analytics for Hadoop provides a basic level of database access. You can move the contents of a database table to HDFS, and move the results of HDFS analytics back to the database.

You can then perform additional analysis on this smaller set of data using a separate product named Oracle R Enterprise. It enables you to perform statistical analysis on database tables, views, and other data objects using the R language. You have transparent access to database objects, including support for Business Intelligence and in-database analytics.

Access to the data stored in an Oracle database is always restricted to the access rights granted by your DBA.

Oracle R Enterprise is included in the Oracle Advanced Analytics option to Oracle Database Enterprise Edition. It is not included in the Oracle Big Data Connectors.

8.4.1 Usage Notes for Oracle Database Access

Oracle R Advanced Analytics for Hadoop uses Sqoop to move data between HDFS and Oracle Database. Sqoop imposes several limitations on Oracle R Advanced Analytics for Hadoop:

  • You cannot import Oracle tables with BINARY_FLOAT or BINARY_DOUBLE columns. As a work-around, you can create a view that casts these columns to NUMBER data type.

  • All column names must be in upper case.
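
The following hedged sketch shows a typical round trip between Oracle Database and HDFS. The connection details are illustrative, and the exact argument names may differ; consult the Help topics for orch.connect, hdfs.push, and hdfs.pull.

# Connect to Oracle Database (Sqoop is invoked under the covers)
orch.connect(host = "dbhost", sid = "orcl", user = "rquser")

dfs.id <- hdfs.push(SALES)              # database table -> HDFS file
hdfs.pull(dfs.id, name = "SALES_OUT")   # HDFS file -> database table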

8.4.2 Scenario for Using Oracle R Advanced Analytics for Hadoop with Oracle R Enterprise

The following scenario may help you identify opportunities for using Oracle R Advanced Analytics for Hadoop with Oracle R Enterprise.

Using Oracle R Advanced Analytics for Hadoop, you can look for files that you have access to on HDFS and execute R calculations on data in one such file. You can also upload data stored in text files on your local file system into HDFS for calculations, schedule an R script for execution on the Hadoop cluster using DBMS_SCHEDULER, and download the results into a local file.
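
As a sketch, that flow might look like the following in R; the paths and names are illustrative, and the mapper and reducer shown are placeholders:

# Upload a local text file into HDFS
dfs.id <- hdfs.upload("/home/user/input.dat", dfs.name = "input.dat")

# Execute an R calculation on the HDFS data as a MapReduce job
res <- hadoop.run(dfs.id,
    mapper  = function(key, val) orch.keyval(key, val),
    reducer = function(key, vals) orch.keyval(key, nrow(vals)))

# Download the results into a local file
hdfs.download(res, "/home/user/results.dat")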

Using Oracle R Enterprise, you can open the R interface and connect to Oracle Database to work on the tables and views that are visible based on your database privileges. You can filter out rows, add derived columns, project new columns, and perform visual and statistical analysis.

Again using Oracle R Advanced Analytics for Hadoop, you might deploy a MapReduce job on Hadoop for CPU-intensive calculations written in R. The calculation can use data stored in HDFS or, with Oracle R Enterprise, in an Oracle database. You can return the output of the calculation to an Oracle database and to the R console for visualization or additional processing.

8.5 Oracle R Advanced Analytics for Hadoop Functions

The Oracle R Advanced Analytics for Hadoop functions are described in R Help topics. This section groups them into functional categories and provides brief descriptions.

8.5.1 Native Analytical Functions

The following table describes the native analytic functions.

Table 8-1 Functions for Statistical Analysis

Function Description

orch.cor

Generates a correlation matrix using Pearson's correlation coefficients.

orch.cov

Generates a covariance matrix.

orch.getXlevels

Creates a list of factor levels that can be used in the xlev argument of a model.matrix call. It is equivalent to the .getXlevels function in the stats package.

orch.glm

Fits and uses generalized linear models on data stored in HDFS.

orch.kmeans

Performs k-means clustering on a data matrix that is stored as a file in HDFS.

orch.lm

Fits a linear model using tall-and-skinny QR (TSQR) factorization and parallel distribution. The function computes the same statistical parameters as the Oracle R Enterprise ore.lm function.

orch.lmf

Fits a low rank matrix factorization model using either the jellyfish algorithm or the Mahout alternating least squares with weighted regularization (ALS-WR) algorithm.

orch.neural

Provides a neural network to model complex, nonlinear relationships between inputs and outputs, or to find patterns in the data.

orch.nmf

Provides the main entry point to create a nonnegative matrix factorization model using the jellyfish algorithm. This function can work on much larger data sets than the R NMF package, because the input does not need to fit into memory.

orch.nmf.NMFalgo

Plugs in to the R NMF package framework as a custom algorithm. This function is used for benchmark testing.

orch.princomp

Performs principal components analysis.

orch.recommend

Computes the top n items to be recommended for each user that has predicted ratings based on the input orch.mahout.lmf.asl model.

orch.sample

Provides reservoir sampling.

orch.scale

Performs scaling.
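
For example, a hedged sketch of fitting a distributed linear model on HDFS-resident data (the path and formula are illustrative; see the orch.lm Help topic for the exact argument order):

dfs.dat <- hdfs.attach("/user/oracle/cars_data")  # attach delimited HDFS data
fit <- orch.lm(dist ~ speed, dfs.dat)             # TSQR-based linear model
summary(fit)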

8.5.2 Using the Hadoop Distributed File System (HDFS)

The following table describes the functions that execute HDFS commands from within the R environment.

Table 8-2 Functions for Using HDFS

Function Description

hdfs.cd

Sets the default HDFS path.

hdfs.cp

Copies an HDFS file from one location to another.

hdfs.describe

Returns the metadata associated with a file in HDFS.

hdfs.exists

Verifies that a file exists in HDFS.

hdfs.head

Copies a specified number of lines from the beginning of a file in HDFS.

hdfs.id

Converts an HDFS path name to an R dfs.id object.

hdfs.ls

Lists the names of all HDFS directories containing data in the specified path.

hdfs.mkdir

Creates a subdirectory in HDFS relative to the current working directory.

hdfs.mv

Moves an HDFS file from one location to another.

hdfs.parts

Returns the number of parts composing a file in HDFS.

hdfs.pwd

Identifies the current working directory in HDFS.

hdfs.rm

Removes a file or directory from HDFS.

hdfs.rmdir

Deletes a directory in HDFS.

hdfs.root

Returns the HDFS root directory.

hdfs.setroot

Sets the HDFS root directory.

hdfs.size

Returns the size of a file in HDFS.

hdfs.tail

Copies a specified number of lines from the end of a file in HDFS.
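
These functions follow familiar shell semantics. For example, in a hedged sketch (the paths are illustrative):

hdfs.setroot("/user/oracle")   # set the HDFS root for this session
hdfs.pwd()                     # show the current working directory
hdfs.ls()                      # list directories under the current path
hdfs.mkdir("demo")             # create a subdirectory
hdfs.exists("demo")            # verify that it exists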

8.5.3 Using Apache Hive

The following table describes the functions available in Oracle R Advanced Analytics for Hadoop for use with Hive.

Table 8-3 Functions for Using Hive

Function Description

hdfs.fromHive

Converts a Hive table to an HDFS identifier in ORCH.

hdfs.toHive

Converts an HDFS object identifier to a Hive table represented by an ore.frame object.

ore.create

Creates a database table from a data.frame or ore.frame object.

ore.drop

Drops a database table or view.

ore.get

Retrieves the specified ore.frame object.

ore.pull

Copies data from a Hive table to an R object.

ore.push

Copies data from an R object to a Hive table.

ore.recode

Replaces the values in an ore.vector object.

8.5.4 Using Aggregate Functions in Hive

The following table describes the aggregate functions from the OREstats package that Oracle R Advanced Analytics for Hadoop supports for use with Hive data.

Table 8-4 Oracle R Enterprise Aggregate Functions

Function Description

aggregate

Splits the data into subsets and computes summary statistics for each subset.

fivenum

Returns Tukey's five-number summary (minimum, lower hinge, median, upper hinge, and maximum) for the input data.

IQR

Calculates an interquartile range.

median

Calculates a sample median.

quantile

Generates sample quantiles that correspond to the specified probabilities.

sd

Calculates the standard deviation.

var (Footnote 1)

Calculates the variance.

Footnote 1: For vectors only.
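
For example, using the IRIS_TABLE ore.frame from Example 8-1, these aggregates are computed inside Hive rather than in R memory (a hedged sketch):

median(IRIS_TABLE$Sepal.Length)     # sample median
sd(IRIS_TABLE$Petal.Length)         # standard deviation
quantile(IRIS_TABLE$Sepal.Width, probs = c(0.25, 0.5, 0.75))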

8.5.5 Making Database Connections

The following table describes the functions for establishing a connection to Oracle Database.

Table 8-5 Functions for Using Oracle Database

Function Description

orch.connect

Establishes a connection to Oracle Database.

orch.connected

Checks whether Oracle R Advanced Analytics for Hadoop is connected to Oracle Database.

orch.dbcon

Returns a connection object for the current connection to Oracle Database, excluding the authentication credentials.

orch.dbinfo

Displays information about the current connection.

orch.disconnect

Disconnects the local R session from Oracle Database.

orch.reconnect

Reconnects to Oracle Database with the credentials previously returned by orch.disconnect.

8.5.6 Copying Data and Working with HDFS Files

The following table describes the functions for copying data between platforms, including R data frames, HDFS files, local files, and tables in an Oracle database.

Table 8-6 Functions for Copying Data

Function Description

hdfs.attach

Copies data from an unstructured data file in HDFS into the R framework. By default, data files in HDFS are not visible to the connector. However, if you know the name of the data file, you can use this function to attach it to the Oracle R Advanced Analytics for Hadoop name space.

hdfs.download

Copies a file from HDFS to the local file system.

hdfs.get

Copies data from HDFS into a data frame in the local R environment. All metadata is extracted and all attributes, such as column names and data types, are restored if the data originated in an R environment. Otherwise, generic attributes like val1 and val2 are assigned.

hdfs.pull

Copies data from HDFS into an Oracle database. This operation requires authentication by Oracle Database. See orch.connect.

hdfs.push

Copies data from an Oracle database to HDFS. This operation requires authentication by Oracle Database. See orch.connect.

hdfs.put

Copies data from an R in-memory object (data.frame) to HDFS. All data attributes, like column names and data types, are stored as metadata with the data.

hdfs.sample

Copies a random sample of data from a Hadoop file into an R in-memory object. Use this function to copy a small sample of the original HDFS data for developing the R calculation that you ultimately want to execute on the entire HDFS data set on the Hadoop cluster.

hdfs.upload

Copies a file from the local file system into HDFS.

is.hdfs.id

Indicates whether an R object contains a valid HDFS file identifier.
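
A short sketch of a typical round trip; the hdfs.sample argument shown is an assumption, so see its Help topic for the exact interface:

dfs.id <- hdfs.put(mtcars)                # R data frame -> HDFS
df <- hdfs.get(dfs.id)                    # HDFS -> local data frame
smp <- hdfs.sample(dfs.id, lines = 100)   # small sample for development
is.hdfs.id(dfs.id)                        # TRUE for a valid identifier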

8.5.7 Converting to R Data Types

The following table describes functions for converting and testing data types. The Oracle R Enterprise OREbase package provides these functions.

Table 8-7 Functions for Converting and Testing Data Types

Function Description

as.ore

Coerces an in-memory R object to an ORE object.

as.ore.character

Coerces an in-memory R object to an ORE character object.

as.ore.date

Coerces an in-memory R object to an ORE date object.

as.ore.datetime

Coerces an in-memory R object to an ORE datetime object.

as.ore.difftime

Coerces an in-memory R object to an ORE difftime object.

as.ore.factor

Coerces an in-memory R object to an ORE factor object.

as.ore.frame

Coerces an in-memory R object to an ORE frame object.

as.ore.integer

Coerces an in-memory R object to an ORE integer object.

as.ore.list

Coerces an in-memory R object to an ORE list object.

as.ore.logical

Coerces an in-memory R object to an ORE logical object.

as.ore.matrix

Coerces an in-memory R object to an ORE matrix object.

as.ore.numeric

Coerces an in-memory R object to an ORE numeric object.

as.ore.object

Coerces an in-memory R object to an ORE object.

as.ore.vector

Coerces an in-memory R object to an ORE vector object.

is.ore

Tests whether the specified value is an object of a particular Oracle R Enterprise class.

is.ore.character

Tests whether the specified value is a character.

is.ore.date

Tests whether the specified value is a date.

is.ore.datetime

Tests whether the specified value is a datetime type.

is.ore.difftime

Tests whether the specified value is a difftime type.

is.ore.factor

Tests whether the specified value is a factor.

is.ore.frame

Tests whether the specified value is a frame.

is.ore.integer

Tests whether the specified value is an integer.

is.ore.list

Tests whether the specified value is a list.

is.ore.logical

Tests whether the specified value is a logical type.

is.ore.matrix

Tests whether the specified value is a matrix.

is.ore.numeric

Tests whether the specified value is numeric.

is.ore.object

Tests whether the specified value is an object.

is.ore.vector

Tests whether the specified value is a vector.
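
For example, with a Hive connection established (a hedged sketch):

hv <- ore.push(iris[1:4])                # push an R data frame to Hive
is.ore.frame(hv)                         # TRUE
sl <- as.ore.character(hv$Sepal.Length)  # coerce a numeric column to character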

8.5.8 Using MapReduce

The following table describes functions that you use when creating and running MapReduce programs.

Table 8-8 Functions for Using MapReduce

Function Description

hadoop.exec

Starts the Hadoop engine and sends the mapper, reducer, and combiner R functions for execution. You must load the data into HDFS first.

hadoop.jobs

Lists the running jobs, so that you can evaluate the current load on the Hadoop cluster.

hadoop.run

Starts the Hadoop engine and sends the mapper, reducer, and combiner R functions for execution. If the data is not already stored in HDFS, then hadoop.run first copies the data there.

orch.dryrun

Switches the execution platform between the local host and the Hadoop cluster. No changes in the R code are required for a dry run.

orch.export

Makes R objects from a user's local R session available in the Hadoop execution environment, so that they can be referenced in MapReduce jobs.

orch.keyval

Outputs key-value pairs in a MapReduce job.

orch.keyvals

Outputs a set of key-value pairs in a MapReduce job.

orch.pack

Compresses one or more in-memory R objects that the mappers or reducers must write as the values in key-value pairs.

orch.tempPath

Sets the path where temporary data is stored.

orch.unpack

Restores the R objects that were compressed with a previous call to orch.pack.

orch.create.parttab

Enables partitioned Hive tables to be used with the ORCH MapReduce framework.
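
The following hedged sketch counts values per key with hadoop.run; the data and the mapper and reducer logic are illustrative, and the exact chunking behavior is described in the function Help topics:

# Copy an in-memory data frame into HDFS
dfs.id <- hdfs.put(data.frame(key = rep(1:3, 4), val = rnorm(12)))

res <- hadoop.run(dfs.id,
    mapper  = function(key, val) {
        orch.keyvals(val$key, val$val)    # emit one pair per input row
    },
    reducer = function(key, vals) {
        orch.keyval(key, sum(vals$val))   # sum the values for each key
    })

hdfs.get(res)   # fetch the reduced output into R memory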

8.5.9 Debugging Scripts

The following table lists the functions available to help you debug your R program scripts.

Table 8-9 Functions for Debugging Scripts

Function Description

orch.dbg.lasterr

Returns the last error message.

orch.dbg.off

Turns off debugging mode.

orch.dbg.on

Turns on debugging mode, which prints out the interactions between Hadoop and Oracle R Advanced Analytics for Hadoop, including the R commands.

orch.dbg.output

Directs the output from the debugger.

orch.version

Identifies the version of the ORCH package.

orch.debug

Enables R style debugging of MapReduce R scripts.

8.6 Demos of Oracle R Advanced Analytics for Hadoop Functions

Oracle R Advanced Analytics for Hadoop provides an extensive set of demos, which you can access in the same way as any other R demos.

The demo function lists the demos available in ORCH:

R>  demo(package="ORCH")
Demos in package 'ORCH':
 
hdfs_cpmv               ORCH's copy and move APIs
hdfs_datatrans          ORCH's HDFS data transfer APIs
hdfs_dir                ORCH's HDFS directory manipulation APIs
hdfs_putget             ORCH's get and put API usage
hive_aggregate          Aggregation in HIVE
hive_analysis           Basic analysis & data processing operations
hive_basic              Basic connectivity to HIVE storage
hive_binning            Binning logic
hive_columnfns          Column function
hive_nulls              Handling of NULL in SQL vs. NA in R
     .
     .
     .

To run a demo from this list, use this syntax:

demo("demo_name", package="ORCH")

For example, this command runs the Hive binning demo:

R> demo("hive_binning", package = "ORCH")
 
 
 
        demo(hive_binning)
        ---- ~~~~~~~~~~~~
 
> #
> #     ORACLE R CONNECTOR FOR HADOOP DEMOS
> #
> #     Name: hive_binning.R
> #     Description: Demonstrates binning logic in R
> #
> #
     .
     .
     .

If an error occurs, exit from R without saving the workspace image and start a new session. You should also delete the temporary files created in both the local file system and the HDFS file system:

# rm -r /tmp/orch*
# hdfs dfs -rm -r /tmp/orch*

Upon completion, run these functions:

  1. hadoop.exec, to clean up or remove all empty part files and Hadoop log files.

  2. hadoop.run, to allow overwriting of HDFS objects with the same name.

8.7 Security Notes for Oracle R Advanced Analytics for Hadoop

Oracle R Advanced Analytics for Hadoop can invoke the Sqoop utility to connect to Oracle Database either to extract data or to store results.

Sqoop is a command-line utility for Hadoop that imports and exports data between HDFS or Hive and structured databases. The name Sqoop comes from "SQL to Hadoop." The following explains how Oracle R Advanced Analytics for Hadoop stores a database user password and sends it to Sqoop.

Oracle R Advanced Analytics for Hadoop stores a user password only when the user establishes the database connection in a mode that does not require reentering the password each time. The password is stored encrypted in memory. See the Help topic for orch.connect.

Oracle R Advanced Analytics for Hadoop generates a configuration file for Sqoop and uses it to invoke Sqoop locally. The file contains the user's database password, obtained either by prompting the user or from the encrypted in-memory representation. The file has local user access permissions only. The file is created, the permissions are set explicitly, and only then is the file opened for writing and filled with data.

Sqoop uses the configuration file to generate custom JAR files dynamically for the specific database job and passes the JAR files to the Hadoop client software. The password is stored inside the compiled JAR file; it is not stored in plain text.

The JAR file is transferred to the Hadoop cluster over a network connection. The network connection and the transfer protocol (such as port 5900) are specific to Hadoop.

The configuration file is deleted after Sqoop finishes compiling its JAR files and starts its own Hadoop jobs.

8.8 Third-Party Licenses for ORAAH

Oracle R Advanced Analytics for Hadoop depends on the following third-party products:

8.8.1 ANTLR 4.7

Copyright (c) 2015 Terence Parr, Sam Harwell

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

8.8.2 Scala 2.11.11

http://www.scala-lang.org/license.html

Copyright (c) 2002-2013 EPFL
 Copyright (c) 2011-2013 Typesafe, Inc.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
• Neither the name of the EPFL nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

8.8.3 Scala 2.11.12

https://github.com/scala/scala/blob/2.11.x/doc/LICENSE.md

Scala License
Copyright (c) 2002-2018 EPFL
Copyright (c) 2011-2018 Lightbend, Inc.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the EPFL nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The following third party code may be included in the distribution as part of Scala:
Apache 2.0  License
This license is used by the following third-party libraries:
-- jansi
-- akka
-- ant
BSD License
This license is used by the following third-party libraries:
-- jline
BSD 3-Clause License
This license is used by the following third-party libraries:
-- asm
MIT License
This license is used by the following third-party libraries:
-- jquery
-- jquery-ui
-- jquery-layout
-- sizzle
-- tools tooltip
Public Domain
The following libraries are freely available in the public domain:
-- forkjoin
____________
jline BSD License
---------------------
https://github.com/jline/jline3/blob/master/LICENSE.txt

Copyright (c) 2002-2018, the original author or authors.
All rights reserved.

http://www.opensource.org/licenses/bsd-license.php

Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the following
conditions are met:

Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with
the distribution.

Neither the name of JLine nor the names of its contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.

ASM
https://asm.ow2.io/license.html

ASM: a very small and fast Java bytecode manipulation framework
Copyright (c) 2000-2011 INRIA, France Telecom
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holders nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.

jquery, sizzle, tooltip MIT License
------------------------------------------
https://github.com/jquery/jquery/blob/master/LICENSE.txt
https://github.com/jquery/sizzle/blob/master/LICENSE.txt
Copyright JS Foundation and other contributors, https://js.foundation/

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

_______________
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS



8.8.4 MPICH 3.3a2

COPYRIGHT

The following is a notice of limited availability of the code, and disclaimer
which must be included in the prologue of the code and in all source listings
of the code.

Copyright Notice
 © 2002 University of Chicago

Permission is hereby granted to use, reproduce, prepare derivative works, and
to redistribute to others.  This software was authored by:

Mathematics and Computer Science Division
Argonne National Laboratory, Argonne IL 60439

(and)

Department of Computer Science
University of Illinois at Urbana-Champaign

                              GOVERNMENT LICENSE

Portions of this material resulted from work developed under a U.S.
Government Contract and are subject to the following license: the Government
is granted for itself and others acting on its behalf a paid-up, nonexclusive,
irrevocable worldwide license in this computer software to reproduce, prepare
derivative works, and perform publicly and display publicly.

                                  DISCLAIMER

This computer code material was prepared, in part, as an account of work
sponsored by an agency of the United States Government.  Neither the United
States, nor the University of Chicago, nor any of their employees, makes any
warranty express or implied, or assumes any legal liability or responsibility
for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe
privately owned rights.