1. Overview
      Generally speaking, Hadoop covers the vast majority of big-data workloads, but some requirements, such as real-time road traffic conditions or real-time order statistics, need stream processing, which Hadoop handles rather awkwardly.
      There are many frameworks for real-time processing; here we use Apache Storm. The queue would normally be Kafka, but since we already run a RabbitMQ cluster, standing up a separate Kafka cluster just for this would waste resources, so RabbitMQ serves as the queue. Log collection is still done with Flume.

2. Storm Environment Setup
Storm's working principles are not covered here; plenty of introductions are available online.
Storm requires ZooKeeper. Install ZooKeeper first (described in an earlier post, omitted here).
Unpack the Storm distribution.

cd storm/conf
vim storm.yaml

The single-node configuration is as follows:

storm.zookeeper.servers:
    - "192.168.130.132"

nimbus.host: "192.168.130.132"

storm.local.dir: "/opt/stormDir"

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

ui.port: 8089


Save the configuration.


Start the daemons in order:

./bin/storm nimbus &
./bin/storm supervisor &
./bin/storm ui &


Run jps to check that the processes started successfully.



For how to deploy a topology, see:


http://www.blogjava.net/paulwong/archive/2013/09/11/403942.html

3. From RabbitMQ to Storm

1. Storm has no official RabbitMQ support, but fortunately there is community source code on GitHub. Build storm-rabbitmq.jar from it (also included in the Baidu cloud disk link at the end of this post).


2. Create the Scheme


import java.util.ArrayList;
import java.util.List;

import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;

public class RabbitMqScheme implements Scheme {

    @Override
    public List<Object> deserialize(byte[] bytes) {
        // Turn the raw RabbitMQ message body into a one-field tuple
        List<Object> objs = new ArrayList<Object>();
        objs.add(new String(bytes));
        return objs;
    }

    @Override
    public Fields getOutputFields() {
        // Field name consumed by downstream bolts
        return new Fields("str");
    }
}




In the topology's main method, add the following before creating the topology configuration:

RabbitMqScheme scheme = new RabbitMqScheme();
IRichSpout spout = new RabbitMQSpout(scheme);

// host, port, username, password, virtualHost, heartBeat
ConnectionConfig connectionConfig = new ConnectionConfig("192.168.7.79", 5672, "guest", "guest", "/", 10);
ConsumerConfig spoutConfig = new ConsumerConfigBuilder()
        .connection(connectionConfig)
        .queue("test_aaa")   // example: the queue bound to the direct exchange in section 4
        .prefetch(200)       // example prefetch count
        .requeueOnFail()
        .build();
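Wiring the spout and a bolt into the topology might then look roughly like this; a minimal sketch, assuming the old backtype.storm classes, a hypothetical bolt class MyLogBolt (sketched below), and example names and parallelism values:

TopologyBuilder builder = new TopologyBuilder();

// addConfigurations() hands the consumer settings (queue, prefetch, ...) to the spout
builder.setSpout("rabbitmq-spout", spout)
       .addConfigurations(spoutConfig.asMap())
       .setMaxSpoutPending(200);

// MyLogBolt is a placeholder; it consumes the "str" field emitted by the scheme
builder.setBolt("log-bolt", new MyLogBolt(), 2)
       .shuffleGrouping("rabbitmq-spout");

Config conf = new Config();
// Local mode for testing; on a real cluster use StormSubmitter.submitTopology(...) instead
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("rabbitmq-log-topology", conf, builder.createTopology());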



That's it; now just write your processing logic in your own bolt.
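For completeness, a minimal bolt sketch; the class name and the println body are placeholders, replace them with your own logic:

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class MyLogBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // "str" is the field name declared by RabbitMqScheme.getOutputFields()
        String line = input.getStringByField("str");
        System.out.println("received log line: " + line);
        // e.g. parse the line, aggregate counts, or write to storage here
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing downstream in this sketch
    }
}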


4. From Flume to RabbitMQ


1. Flume itself does not support RabbitMQ. Again there is a community project on GitHub, but it targets a fairly old RabbitMQ version, so after some debugging it also works with the version provided in the cloud disk. However, it currently only supports queues on a direct exchange, i.e. the case where routing-key equals the queue name.
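Because of that restriction, the exchange, queue, and binding should already exist in RabbitMQ, with the routing key equal to the queue name. A minimal sketch of declaring them with the Java amqp-client, reusing the names from the Flume config below and the broker credentials from section 3 (the class name DeclareQueue is just an example):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class DeclareQueue {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("192.168.7.79");
        factory.setPort(5672);
        factory.setUsername("guest");
        factory.setPassword("guest");
        factory.setVirtualHost("/");

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // direct exchange; the routing key must equal the queue name
        channel.exchangeDeclare("mq-exchange", "direct", true);
        channel.queueDeclare("test_aaa", true, false, false, null);
        channel.queueBind("test_aaa", "mq-exchange", "test_aaa");

        channel.close();
        connection.close();
    }
}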



2. The Flume configuration is as follows:


a1.sources = r2
a1.sinks = k3
a1.channels = c3

a1.sources.r2.type = spooldir
a1.sources.r2.spoolDir = /var/log/flume_spoolDir_for_rabbitmq
a1.sources.r2.deletePolicy = immediate
a1.sources.r2.basenameHeader = true
a1.sources.r2.channels = c3

a1.channels.c3.type = memory
a1.channels.c3.capacity = 1000
a1.channels.c3.transactionCapacity = 200

a1.sinks.k3.type = com.aweber.flume.sink.rabbitmq.RabbitMQSink
a1.sinks.k3.host = 192.168.7.79
a1.sinks.k3.port = 5672
a1.sinks.k3.virtual-host = /
a1.sinks.k3.username = guest
a1.sinks.k3.password = guest
a1.sinks.k3.exchange = mq-exchange
a1.sinks.k3.routing-key = test_aaa
#a1.sinks.k3.publisher-confirms = true

a1.sinks.k3.channel = c3




After saving the configuration, drop a log file into the directory Flume is watching; you can see it being written into RabbitMQ line by line.




At this point, the log pipeline Flume -> RabbitMQ -> Storm is complete. A concrete end-to-end example will follow in a later post.

 


Required packages:


http://pan.baidu.com/s/1nvzUxi5