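1) To write data from Spark into HBase, use the saveAsHadoopDataset method of PairRDDFunctions. An RDD of key/value pairs only gets this method through an implicit conversion. In Spark versions before 1.3 that conversion is defined in the SparkContext object, so you need to import it:
import org.apache.spark.SparkContext._
From Spark 1.3 on, the conversion lives in the RDD companion object and is found without this import.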
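To see why the import matters, here is a toy, pure-Scala analog of the mechanism. MiniRDD, MiniPairFunctions, MiniContext, Demo and saveAll are hypothetical names, not Spark classes. MiniContext plays the role of the pre-1.3 SparkContext object that held the conversion, and the extra method only becomes callable after `import MiniContext._`. This is a sketch of the language feature, not Spark's actual code.

```scala
import scala.language.implicitConversions
import scala.collection.mutable

// Toy stand-ins (hypothetical, NOT Spark classes):
// MiniRDD plays RDD, MiniPairFunctions plays PairRDDFunctions.
class MiniRDD[T](val data: Seq[T])

class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  // Stand-in for saveAsHadoopDataset: appends every value to the given sink.
  def saveAll(sink: mutable.Buffer[V]): Unit =
    self.data.foreach { case (_, v) => sink += v }
}

// Plays the role of the pre-1.3 SparkContext object, which held rddToPairRDDFunctions.
object MiniContext {
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(rdd)
}

object Demo {
  import MiniContext._ // remove this import and rdd.saveAll(...) no longer compiles

  def run(): List[String] = {
    val rdd = new MiniRDD(Seq("k1" -> "v1", "k2" -> "v2"))
    val sink = mutable.ArrayBuffer.empty[String]
    rdd.saveAll(sink) // the compiler rewrites this to toPairFunctions(rdd).saveAll(sink)
    sink.toList
  }
}
```

MiniContext is not the companion object of MiniRDD, so the compiler does not search it for conversions unless it is imported. Since Spark 1.3 the real conversion sits in the RDD companion object, which is why the import is no longer required there.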
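2) Under the hood, Spark writes to HBase through HBase's Hadoop output format, org.apache.hadoop.hbase.mapred.TableOutputFormat. It uses that format's RecordWriter to put the records into HBase.
It also relies on Hadoop's JobConf, so the job is configured the same way as a MapReduce job. saveAsHadoopDataset takes a JobConf (the old mapred API), so the TableOutputFormat must come from the org.apache.hadoop.hbase.mapred package, not from org.apache.hadoop.hbase.mapreduce.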
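A rough model of the flow described above may help. Everything here is hypothetical and simplified: MiniPut, MiniRecordWriter, MiniTableOutputFormat and MiniSave are not Hadoop or HBase types. The output format hands out one record writer per partition, and every (key, value) pair goes through write(). As with HBase's mapred TableOutputFormat, the key is ignored and only the Put reaches the table.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; NOT the Hadoop/HBase API.
// A "Put": a row key plus cells keyed by "family:qualifier".
final case class MiniPut(row: String, cells: Map[String, String])

trait MiniRecordWriter[K, V] {
  def write(key: K, value: V): Unit
  def close(): Unit
}

// Plays the role of TableOutputFormat: hands out writers that apply Puts to a "table".
class MiniTableOutputFormat(table: mutable.Map[String, Map[String, String]]) {
  def getRecordWriter(): MiniRecordWriter[String, MiniPut] =
    new MiniRecordWriter[String, MiniPut] {
      // The key is ignored; the row key comes from the Put itself.
      def write(key: String, value: MiniPut): Unit =
        table(value.row) = table.getOrElse(value.row, Map.empty[String, String]) ++ value.cells
      def close(): Unit = () // a real writer would typically flush and release its connection here
    }
}

// Plays the role of saveAsHadoopDataset: one writer per partition, closed when the partition is done.
object MiniSave {
  def saveAsDataset(partitions: Seq[Seq[(String, MiniPut)]], format: MiniTableOutputFormat): Unit =
    partitions.foreach { partition =>
      val writer = format.getRecordWriter()
      try partition.foreach { case (k, v) => writer.write(k, v) }
      finally writer.close()
    }
}

// Usage: two partitions; the two Puts for "r1" touch different columns, so both cells end up in the row.
val demoTable = mutable.Map.empty[String, Map[String, String]]
MiniSave.saveAsDataset(
  Seq(
    Seq("" -> MiniPut("r1", Map("f:c2" -> "a"))),
    Seq("" -> MiniPut("r1", Map("f:c3" -> "b")), "" -> MiniPut("r2", Map("f:c2" -> "c")))
  ),
  new MiniTableOutputFormat(demoTable)
)
```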
3) The code below uses Spark SQL to read data from Hive, transforms it, and writes the result into HBase:
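// Imports (Spark 1.x with Hive support; HBase client and mapred APIs)
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // only needed before Spark 1.3
import org.apache.spark.sql.hive.HiveContext

// Create the JobConf (old mapred API, as required by saveAsHadoopDataset)
val conf = HBaseConfiguration.create()
val jobConf = new JobConf(conf)
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "test") // target HBase table; it must already exist

// Create the HiveContext
val sparkConf = new SparkConf().setAppName("test")
val sc = new SparkContext(sparkConf)
@transient val sqlContext = new HiveContext(sc)
sqlContext.setConf("spark.sql.shuffle.partitions", "3")

// Read from Hive, turn each row into a Put, and save to HBase
val rdd = sqlContext.sql("select C1,C2,C3 from test")
  .rdd // DataFrame -> RDD[Row] (Spark 1.3+); on 1.0-1.2 sql() already returns an RDD, so drop this call
  .map(row => {
    val c1 = row.getString(0) // row key
    val c2 = row.getString(1)
    val c3 = row.getString(2)
    val p = new Put(Bytes.toBytes(c1))
    // column family "f"; on HBase versions before 1.0 use p.add(...) instead of addColumn
    p.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c2"), Bytes.toBytes(c2))
    p.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c3"), Bytes.toBytes(c3))
    // the mapred TableOutputFormat ignores the key, so an empty ImmutableBytesWritable is enough
    (new ImmutableBytesWritable, p)
  })
rdd.saveAsHadoopDataset(jobConf)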
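One practical caveat with the code above: if a Hive column is NULL, Bytes.toBytes(c2) throws a NullPointerException. The HBase client also typically rejects a Put whose row key is empty or that carries no cells. Below is a pure-Scala sketch of a guard that could run before each Put is built. RowToCells and toCells are hypothetical names, and the "f:c2"/"f:c3" column names simply mirror the example.

```scala
// Hypothetical guard (not part of the original example): decide which cells a row
// should produce before any Bytes.toBytes call, so NULL columns are skipped.
object RowToCells {
  // Returns None when no valid Put can be built: null/empty row key, or every column NULL.
  def toCells(c1: String, c2: String, c3: String): Option[(String, Map[String, String])] = {
    val cells = Seq("f:c2" -> c2, "f:c3" -> c3)
      .collect { case (column, value) if value != null => column -> value }
      .toMap
    if (c1 == null || c1.isEmpty || cells.isEmpty) None
    else Some(c1 -> cells)
  }
}

// Usage: a row with one NULL column keeps only the non-NULL cell.
val example = RowToCells.toCells("r1", "a", null)
```

In the Spark job this would mean switching map to flatMap and building a Put only for rows where toCells returns Some.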