Querying HBase using Spark
-
Grant the Spark user permission to run CRUD operations in HBase, using the "hbase" user:

  sudo -u hbase bash
  kinit -kt /etc/security/keytabs/hbase.headless.keytab <Spark-user>
  hbase shell
  grant 'spark', 'RWXCA'
  exit
- Log in to Ranger.
- Select the HBase service.
- Add or update the policy to grant "create,read,write,execute" access to the Spark user.
-
Log in with the Spark user account and create a table in HBase:

  sudo su spark
  # kinit with the spark keytab if required
  hbase shell
  hbase(main):001:0> create 'person', 'p', 'c'
-
Start spark-shell:

  spark-shell --jars /usr/lib/hbase/hbase-spark.jar,/usr/lib/hbase/hbase-spark-protocol-shaded.jar,/usr/lib/hbase/* \
    --files /etc/hbase/conf/hbase-site.xml \
    --conf spark.driver.extraClassPath=/etc/hbase/conf
-
Insert and read data using spark-shell:
- Inserting data:

  val sql = spark.sqlContext

  import java.sql.Date

  case class Person(name: String, email: String, birthDate: Date, height: Float)

  var personDS = Seq(
    Person("alice", "alice@alice.com", Date.valueOf("2000-01-01"), 4.5f),
    Person("bob", "bob@bob.com", Date.valueOf("2001-10-17"), 5.1f)
  ).toDS

  personDS.write.format("org.apache.hadoop.hbase.spark")
    .option("hbase.columns.mapping",
      "name STRING :key, email STRING c:email, " +
      "birthDate DATE p:birthDate, height FLOAT p:height")
    .option("hbase.table", "person")
    .option("hbase.spark.use.hbasecontext", false)
    .save()
Results:

  shell> scan 'person'
  ROW     COLUMN+CELL
   alice  column=c:email, timestamp=1568723598292, value=alice@alice.com
   alice  column=p:birthDate, timestamp=1568723598292, value=\x00\x00\x00\xDCl\x87 \x00
   alice  column=p:height, timestamp=1568723598292, value=@\x90\x00\x00
   bob    column=c:email, timestamp=1568723598521, value=bob@bob.com
   bob    column=p:birthDate, timestamp=1568723598521, value=\x00\x00\x00\xE9\x99u\x95\x80
   bob    column=p:height, timestamp=1568723598521, value=@\xA333
  2 row(s)
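The non-printable values in the scan output are simply HBase's big-endian byte encodings of the typed columns (the serialization produced by `org.apache.hadoop.hbase.util.Bytes`). As a cluster-independent aside, a minimal Python sketch reproduces the FLOAT bytes shown above:

```python
import struct

# HBase serializes a FLOAT as a 4-byte big-endian IEEE-754 value.
# alice's height (4.5f) therefore appears in the scan as @\x90\x00\x00,
# and bob's height (5.1f) as @\xA333 ('3' is the printable byte 0x33).
print(struct.pack(">f", 4.5))  # b'@\x90\x00\x00'
print(struct.pack(">f", 5.1))  # b'@\xa333'
```

This is why the connector needs the column types in `hbase.columns.mapping`: the bytes alone do not say whether a cell is a float, a long, or a string.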
- Reading the data back:

  val sql = spark.sqlContext

  val df = sql.read.format("org.apache.hadoop.hbase.spark")
    .option("hbase.columns.mapping",
      "name STRING :key, email STRING c:email, " +
      "birthDate DATE p:birthDate, height FLOAT p:height")
    .option("hbase.table", "person")
    .option("hbase.spark.use.hbasecontext", false)
    .load()

  df.createOrReplaceTempView("personView")

  val results = sql.sql("SELECT * FROM personView WHERE name = 'alice'")
  results.show()