Querying HBase using Spark

Query HBase using Spark.

For more information and examples, see HBase Example Using HBase Spark Connector.
  1. Grant the Spark user ("spark") permission to perform CRUD operations in HBase (R=read, W=write, X=execute, C=create, A=admin), using the "hbase" user:
    sudo -u hbase bash
    kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase
    hbase shell
    grant 'spark', 'RWXCA'
    exit
  2. Sign in to Ranger.
  3. Select the HBase service.
  4. Add or update a policy that grants "create, read, write, execute" access to the Spark user.
  5. Sign in with the Spark user account and create a table named 'person' with column families 'p' and 'c' in HBase:
    sudo su spark
    (kinit as the spark user if required)
    hbase shell
    hbase(main):001:0> create 'person', 'p', 'c'
  6. Start spark-shell:
    spark-shell --jars /usr/lib/hbase/hbase-spark.jar,/usr/lib/hbase/hbase-spark-protocol-shaded.jar,/usr/lib/hbase/* \
      --files /etc/hbase/conf/hbase-site.xml \
      --conf spark.driver.extraClassPath=/etc/hbase/conf
  7. Insert and read data using spark-shell:
    • Inserting data:
      val sql = spark.sqlContext

      import java.sql.Date

      case class Person(name: String,
                        email: String,
                        birthDate: Date,
                        height: Float)

      val personDS = Seq(
        Person("alice", "alice@alice.com", Date.valueOf("2000-01-01"), 4.5f),
        Person("bob", "bob@bob.com", Date.valueOf("2001-10-17"), 5.1f)
      ).toDS

      personDS.write.format("org.apache.hadoop.hbase.spark")
        .option("hbase.columns.mapping",
          "name STRING :key, email STRING c:email, " +
          "birthDate DATE p:birthDate, height FLOAT p:height")
        .option("hbase.table", "person")
        .option("hbase.spark.use.hbasecontext", false)
        .save()

      Results:

      shell> scan 'person'
      ROW       COLUMN+CELL
       alice    column=c:email, timestamp=1568723598292, value=alice@alice.com
       alice    column=p:birthDate, timestamp=1568723598292, value=\x00\x00\x00\xDCl\x87 \x00
       alice    column=p:height, timestamp=1568723598292, value=@\x90\x00\x00
       bob      column=c:email, timestamp=1568723598521, value=bob@bob.com
       bob      column=p:birthDate, timestamp=1568723598521, value=\x00\x00\x00\xE9\x99u\x95\x80
       bob      column=p:height, timestamp=1568723598521, value=@\xA333
      2 row(s)
    • Reading data back:
      val sql = spark.sqlContext
      
      val df = sql.read.format("org.apache.hadoop.hbase.spark")
        .option("hbase.columns.mapping",
          "name STRING :key, email STRING c:email, " +
          "birthDate DATE p:birthDate, height FLOAT p:height")
        .option("hbase.table", "person")
        .option("hbase.spark.use.hbasecontext", false)
        .load()
      df.createOrReplaceTempView("personView")
      
      val results = sql.sql("SELECT * FROM personView WHERE name = 'alice'")
      results.show()
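The binary values in the scan output of step 7 are HBase's big-endian byte encodings of the typed columns, written by the connector. As a minimal, self-contained sketch (plain Scala with java.nio, no HBase or Spark libraries required), here is how alice's p:height and p:birthDate cells decode back to the original values:

```scala
import java.nio.ByteBuffer

// alice's p:height cell prints as @\x90\x00\x00 in the HBase shell,
// i.e. the bytes 0x40 0x90 0x00 0x00: a big-endian IEEE-754 float.
val heightBytes = Array(0x40, 0x90, 0x00, 0x00).map(_.toByte)
val height = ByteBuffer.wrap(heightBytes).getFloat  // 4.5f

// alice's p:birthDate cell is an 8-byte big-endian long holding epoch
// milliseconds; 946713600000 maps back to 2000-01-01 in the timezone
// of the machine that wrote the row.
val dateBytes =
  Array(0x00, 0x00, 0x00, 0xDC, 0x6C, 0x87, 0x20, 0x00).map(_.toByte)
val millis = ByteBuffer.wrap(dateBytes).getLong  // 946713600000L
```

This is why `scan 'person'` shows escaped binary rather than the literals from the dataset: only the connector's column mapping knows the types needed to decode them.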