Table of Contents
- Introduction
- nc
- Streaming program

Introduction
Spark Streaming is essentially small-batch (micro-batch) computation over Spark RDDs: the driver repeatedly launches a job for each batch interval, and the driver starts the executors that perform the actual computation.
nc
Install nc:

```shell
yum install nc -y
```
Start a listener with:

```shell
nc -lk 8888
```

Note: why `local[2]`? Spark Streaming needs at least two threads here: one for the receiver that ingests data from the socket, and one for the computation that processes each batch.
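Typing lines into the nc terminal by hand works, but for repeatable testing the feed can be scripted. A minimal sketch, assuming the same port 8888 and that this command replaces the interactive `nc -lk 8888` above (the test sentence is made up for illustration):

```shell
# Emit one test line per second and serve the stream on port 8888.
# Run this INSTEAD of the interactive `nc -lk 8888`; the streaming job
# will receive these lines once it connects to the socket.
for i in $(seq 1 100); do
  echo "hello spark hello"
  sleep 1
done | nc -lk 8888
```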
Streaming program
```scala
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingWordCount")
    val sc = new SparkContext(conf)
    // StreamingContext wraps SparkContext; with it, Spark gains the DStream abstraction
    val context = new StreamingContext(sc, Milliseconds(3000))
    val lines: ReceiverInputDStream[String] = context.socketTextStream("192.168.18.100", 8888)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
    // start the computation
    context.start()
    // block until the computation terminates
    context.awaitTermination()
  }
}
```
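The `flatMap` → `map` → `reduceByKey` chain above is the classic word count. As a sanity check of what each batch should print, the same counting logic can be sketched with standard shell tools (a rough analog for checking expected counts, not Spark itself):

```shell
# Split a line into words, one per line (flatMap), then count duplicates
# (reduceByKey). For the input "hello world hello" this prints counts
# "2 hello" and "1 world" (uniq -c pads with leading spaces).
echo "hello world hello" | tr ' ' '\n' | sort | uniq -c
```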
pom.xml:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
</dependency>
```