Solr Adapter

This adapter provides functions to create full-text indexes and load them into Apache Solr servers. At run time, these functions call the Solr tool org.apache.solr.hadoop.MapReduceIndexerTool to generate a full-text index on HDFS and, optionally, merge it into Solr servers. Within a single query, you can declare and use multiple custom put functions supplied by this adapter together with the built-in put function. For example, you can load data into different Solr collections or into different Solr clusters.

This adapter is described in the following topics:

Prerequisites for Using the Solr Adapter
Built-in Functions for Loading Data into Solr Servers
Custom Functions for Loading Data into Solr Servers
Examples of Solr Adapter Functions
Solr Adapter Configuration Properties

Prerequisites for Using the Solr Adapter

The first time that you use the Solr adapter, ensure that Solr is installed and configured on your Hadoop cluster as described in "Installing Oracle XQuery for Hadoop".

Configuration Settings

Your Oracle XQuery for Hadoop query must use the following configuration properties or the equivalent annotations:

oracle.hadoop.xquery.solr.loader.zk-host
oracle.hadoop.xquery.solr.loader.collection

If the index is loaded into a live set of Solr servers, then this configuration property or the equivalent annotation is also required:

oracle.hadoop.xquery.solr.loader.go-live

You can set the configuration properties using either the -D or the -conf option of the hadoop command when you run the query. See "Running Queries" and "Solr Adapter Configuration Properties".

Example Query Using the Solr Adapter

This example sets OXH_SOLR_MR_HOME and then runs a query, using the hadoop -D option to set the configuration properties:

$ export OXH_SOLR_MR_HOME=/usr/lib/solr/contrib/mr
$ hadoop jar $OXH_HOME/lib/oxh.jar \
    -D oracle.hadoop.xquery.solr.loader.zk-host=/solr \
    -D oracle.hadoop.xquery.solr.loader.collection=collection1 \
    -D oracle.hadoop.xquery.solr.loader.go-live=true \
    ./myquery.xq -output ./myoutput
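
The same properties can also be kept in a Hadoop configuration file and passed with the -conf option. The following is a minimal sketch rather than part of the product documentation: the file name solr-conf.xml is a hypothetical choice, and the values repeat those of the -D example. The script writes the file in the standard Hadoop configuration format and prints, without running it, the command that would use the file.

```shell
# Write a Hadoop configuration file holding the Solr adapter properties.
# The file name solr-conf.xml is an assumption made for this sketch.
cat > solr-conf.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>oracle.hadoop.xquery.solr.loader.zk-host</name>
    <value>/solr</value>
  </property>
  <property>
    <name>oracle.hadoop.xquery.solr.loader.collection</name>
    <value>collection1</value>
  </property>
  <property>
    <name>oracle.hadoop.xquery.solr.loader.go-live</name>
    <value>true</value>
  </property>
</configuration>
EOF

# Print (rather than run) the command that reads the file with -conf.
echo 'hadoop jar $OXH_HOME/lib/oxh.jar -conf solr-conf.xml ./myquery.xq -output ./myoutput'
```

Keeping the settings in a file avoids repeating long -D options when several queries target the same Solr cluster and collection.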

Built-in Functions for Loading Data into Solr Servers

To use the built-in functions in your query, you must import the Solr module as follows:

import module "oxh:solr";

The Solr module contains the following function:

solr:put

The solr prefix is bound to the oxh:solr namespace by default.

solr:put

Writes a single document to the Solr index.

The XML format of the document is specified by Solr at:

https://wiki.apache.org/solr/UpdateXmlMessages

Signature

declare %solr:put function
   solr:put($value as element(doc)) external;

Parameters

$value: A single XML element named doc, which contains one or more field elements, as shown here:

<doc>
<field name="field_name_1">field_value_1</field>
     .
     .
     .
<field name="field_name_N">field_value_N</field>
</doc>

Returns

A generated index that is written into the output_dir/solr-put directory, where output_dir is the query output directory.

Custom Functions for Loading Data into Solr Servers

You can use the following annotations to define functions that generate full-text indexes and load them into Solr.

Signature

Custom functions for generating Solr indexes must have the following signature:

declare %solr:put [additional annotations] 
   function local:myFunctionName($value as node()) external;

Annotations

%solr:put

Declares the Solr put function. Required.

%solr:file(directory_name)

Name of the subdirectory under the query output directory where the index files will be written. Optional. The default value is the local name of the function.

%solr-property:property_name(value)

Controls various aspects of index generation. You can specify multiple %solr-property annotations.

These annotations correspond to the command-line options of org.apache.solr.hadoop.MapReduceIndexerTool. Each MapReduceIndexerTool option has an equivalent Oracle XQuery for Hadoop configuration property and a %solr-property annotation. Annotations take precedence over configuration properties. See "Solr Adapter Configuration Properties" for more information about supported configuration properties and the corresponding annotations.
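
As an illustration of how these annotations combine, the following sketch writes a query file that declares a custom put function. The file name custom-put.xq, the function name local:put-users, and the subdirectory name user-index are hypothetical; the zk-host and collection values are the ones used elsewhere in this chapter. Because go-live is not set, the sketch assumes that the index is only generated under the user-index subdirectory of the query output directory and is not merged into live Solr servers. It has not been run against a cluster.

```shell
# Write an Oracle XQuery for Hadoop query to a file.
# The quoted 'EOF' keeps the XQuery $variables literal.
cat > custom-put.xq <<'EOF'
import module "oxh:text";

declare
   %solr:put
   %solr:file("user-index")
   %solr-property:zk-host("/solr")
   %solr-property:collection("collection1")
function local:put-users($doc as element(doc)) external;

for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
return local:put-users(
   <doc>
      <field name="id">{ $split[1] }</field>
      <field name="name">{ $split[2] }</field>
   </doc>
)
EOF

# Print (rather than run) a command that would execute the query.
echo 'hadoop jar $OXH_HOME/lib/oxh.jar ./custom-put.xq -output ./myoutput'
```

Without the %solr:file annotation, the index files would go to a subdirectory named after the function local name, put-users in this sketch.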

Examples of Solr Adapter Functions

Example 1 Using the Built-in solr:put Function

This example uses the following HDFS text file. The file contains user profile information such as user ID, full name, and age, separated by colons (:).

mydata/users.txt
john:John Doe:45 
kelly:Kelly Johnson:32
laura:Laura Smith: 
phil:Phil Johnson:27

The first query creates a full-text index searchable by name.

import module "oxh:text";
import module "oxh:solr";
(: Split each line on colons and index the first two fields :)
for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
let $id := $split[1]
let $name := $split[2]
return solr:put(
   <doc>
      <field name="id">{ $id }</field>
      <field name="name">{ $name }</field>
   </doc>
)

The second query accomplishes the same result, but uses a custom put function. It also supplies all required configuration values through function annotations, so you do not need to set configuration properties when running this query.

import module "oxh:text";
(: All Solr settings are supplied as annotations on the custom put function :)
declare
   %solr:put
   %solr-property:go-live
   %solr-property:zk-host("/solr")
   %solr-property:collection("collection1")
function local:my-solr-put($doc as element(doc)) external;
for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
let $id := $split[1]
let $name := $split[2]
return local:my-solr-put(
   <doc>
      <field name="id">{ $id }</field>
      <field name="name">{ $name }</field>
   </doc>
)
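
To preview which values the two queries place in the id and name fields, the input can be split the same way outside Hadoop. The sketch below is only a local approximation using awk, not something the adapter provides: it recreates the sample data under a hypothetical local file name, users.txt, splits each line on colons, and prints one doc element per line. The exact serialization that Oracle XQuery for Hadoop passes to Solr may differ.

```shell
# Recreate the sample data locally (users.txt is a hypothetical local copy).
cat > users.txt <<'EOF'
john:John Doe:45
kelly:Kelly Johnson:32
laura:Laura Smith:
phil:Phil Johnson:27
EOF

# Split on ":" and keep the first two fields, as the queries do with
# fn:tokenize; the third field (age) is ignored.
awk -F: '{
  printf "<doc><field name=\"id\">%s</field><field name=\"name\">%s</field></doc>\n", $1, $2
}' users.txt
```

The age field is never read, so the line for laura, which has no age, still yields a complete document.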

Solr Adapter Configuration Properties

The Solr adapter configuration properties correspond to the Solr MapReduceIndexerTool options.

MapReduceIndexerTool is a MapReduce batch job driver that creates Solr index shards from input files, and writes the indexes into HDFS. It also supports merging the output shards into live Solr servers, typically a SolrCloud.

You can specify these properties with the generic -conf and -D hadoop command-line options in Oracle XQuery for Hadoop. Properties specified using this method apply to all Solr adapter put functions in your query. See "Running Queries" and especially "Generic Options" for more information about the hadoop command-line options.

Alternatively, you can specify these properties as Solr adapter put function annotations with the %solr-property prefix. These annotations are identified in the property descriptions. Annotations apply only to the particular Solr adapter put function that contains them in its declaration.
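
The difference in scope between the two methods can be sketched as follows. In this hypothetical query file, two-collections.xq, the built-in solr:put function takes its collection from the -D options on the command line, while the custom function local:put-archive overrides the collection with an annotation and inherits zk-host from the command line. The collection name collection2 and the function name are assumptions made for illustration, and the sketch has not been run against a cluster.

```shell
# Write a query that uses the built-in put function and one custom put function.
# The quoted 'EOF' keeps the XQuery $variables literal.
cat > two-collections.xq <<'EOF'
import module "oxh:text";
import module "oxh:solr";

declare
   %solr:put
   %solr-property:collection("collection2")
function local:put-archive($doc as element(doc)) external;

for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
let $doc :=
   <doc>
      <field name="id">{ $split[1] }</field>
      <field name="name">{ $split[2] }</field>
   </doc>
return (
   solr:put($doc),
   local:put-archive($doc)
)
EOF

# Print (rather than run) the command. The -D properties apply to every put
# function; the annotation applies only to local:put-archive.
echo 'hadoop jar $OXH_HOME/lib/oxh.jar' \
     '-D oracle.hadoop.xquery.solr.loader.zk-host=/solr' \
     '-D oracle.hadoop.xquery.solr.loader.collection=collection1' \
     './two-collections.xq -output ./myoutput'
```

Under these assumptions, solr:put would target collection1 and local:put-archive would target collection2, because annotations take precedence over configuration properties.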