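For job-style programs such as Hadoop or Storm, we usually want to debug locally first and only then submit the job to a cluster.

Like Hadoop, Storm has a local mode and a cluster mode. Storm's local mode is the simpler of the two: you do not need to install any Storm software or tools locally (on Windows); pulling in the Storm jar through Maven is enough. This article shows how to debug a simple Storm program on Windows.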

1. A simple wordcount program:

1) Create a Maven project and add the following dependency to pom.xml:

<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.10.0</version>
<scope>provided</scope>
</dependency>


2) RandomSentenceSpout: (acts as the data producer)

package cn.edu.nuc.StormTest.wordcount;

import java.util.Map;
import java.util.Random;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class RandomSentenceSpout extends BaseRichSpout{
/**
*
*/
private static final long serialVersionUID = 1L;
SpoutOutputCollector _collector;
Random _rand;

@Override
public void open(Map conf, TopologyContext context,
SpoutOutputCollector collector) {
_collector = collector;
_rand = new Random();
}

@Override
public void nextTuple() {

// Sleep for a while before producing the next tuple
Utils.sleep(100);

// Candidate sentences
String[] sentences = new String[] { "the cow jumped over the moon",
"an apple a day keeps the doctor away",
"four score and seven years ago",
"snow white and the seven dwarfs",
"i am at two with nature" };

// Pick one sentence at random
String sentence = sentences[_rand.nextInt(sentences.length)];

// Emit the sentence to the downstream bolt
_collector.emit(new Values(sentence));
}

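// Acknowledgement callback (unused here: tuples are emitted without a message ID)
@Override
public void ack(Object id) {
}

// Called when processing of a tuple fails
@Override
public void fail(Object id) {
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// Declare a single output field named "word" (it carries the whole sentence)
declarer.declare(new Fields("word"));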
}
}


3) SplitSentenceBolt: (this bolt plays the role of the map function in MapReduce)

package cn.edu.nuc.StormTest.wordcount;

import java.util.StringTokenizer;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class SplitSentenceBolt extends BaseBasicBolt{
/**
*
*/
private static final long serialVersionUID = 1L;

@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
// Receive a sentence
String sentence = tuple.getString(0);
// Split the sentence into words on whitespace
StringTokenizer iter = new StringTokenizer(sentence);
// Emit each word
while (iter.hasMoreTokens()) {
collector.emit(new Values(iter.nextToken()));
}
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// Declare a single output field
declarer.declare(new Fields("word"));
}
}


4) WordCountBolt: (this bolt plays the role of the reduce function in MapReduce)

package cn.edu.nuc.StormTest.wordcount;

import java.util.HashMap;
import java.util.Map;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordCountBolt extends BaseBasicBolt{
/**
*
*/
private static final long serialVersionUID = 1L;
Map<String, Integer> counts = new HashMap<String, Integer>();

@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
// Receive a word
String word = tuple.getString(0);
// Look up the current count for this word
Integer count = counts.get(word);
if (count == null)
count = 0;
// Increment the count
count++;
// Store the word and its updated count in the map
counts.put(word, count);
System.out.println("hello word!");
System.out.println(word + " " + count);
// Emit the word and its count (fields "word" and "count" respectively)
collector.emit(new Values(word, count));
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// Declare two output fields: word and count
declarer.declare(new Fields("word", "count"));
}
}
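
The split and count logic of the two bolts can also be exercised without Storm, which is handy when checking the counting behaviour on its own. The following is a minimal sketch under that assumption; `WordCountSketch` and `runningCounts` are hypothetical names that do not exist in the original project, and `Map.merge` stands in for the get/increment/put sequence used in WordCountBolt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of the original project.
class WordCountSketch {
    // Mirrors SplitSentenceBolt followed by WordCountBolt in a single thread:
    // tokenizes on whitespace and records one "word count" line per word.
    static List<String> runningCounts(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> lines = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(sentence);
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            // merge() inserts 1 for a new word, otherwise adds 1 to the old count
            int count = counts.merge(word, 1, Integer::sum);
            lines.add(word + " " + count);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : runningCounts("the cow jumped over the moon")) {
            System.out.println(line);
        }
    }
}
```

In the real topology the count bolt runs as several parallel tasks, so the order of the printed lines can differ from this single-threaded version.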


5) TopoMain: (the job submission entry point; it supports both cluster and local modes, and local mode is the one to use when debugging on your own machine)

package cn.edu.nuc.StormTest.wordcount;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;

public class TopoMain {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout());
builder.setBolt("split", new SplitSentenceBolt()).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));
Config conf = new Config();
conf.setDebug(false);
if (args != null && args.length > 0) {
conf.setNumWorkers(3);
StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
} else {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wordcount", conf, builder.createTopology());
Utils.sleep(3000);
cluster.killTopology("wordcount");
cluster.shutdown();
}
}
}
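
The `fieldsGrouping("split", new Fields("word"))` call is what makes the per-task counts correct: tuples with the same value in the "word" field always go to the same count task. The sketch below models that idea with a plain hash-modulo rule. This is an assumption made for illustration only; Storm's actual routing code is internal and differs in detail, and `FieldsGroupingSketch` and `taskFor` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; this is NOT Storm's real routing code.
class FieldsGroupingSketch {
    // Simplified model: derive a task index from the hash of the grouping value.
    // floorMod keeps the index non-negative even for negative hash codes.
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 12; // same parallelism hint as the "count" bolt
        List<String> words = Arrays.asList("the", "cow", "the", "moon", "the");
        for (String word : words) {
            // Every occurrence of "the" prints the same task index
            System.out.println(word + " -> task " + taskFor(word, numTasks));
        }
    }
}
```

With `shuffleGrouping` instead, occurrences of the same word could land on different tasks, and each task's HashMap would hold only a partial count.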


Once the code is written, run TopoMain in Eclipse. The console shows output like this:

7013 [Thread-34-count] INFO  b.s.d.executor - Prepared bolt count:(10)
7021 [Thread-14-count] INFO b.s.d.executor - Preparing bolt count:(2)
7021 [Thread-36-split] INFO b.s.d.executor - Preparing bolt split:(14)
7022 [Thread-36-split] INFO b.s.d.executor - Prepared bolt split:(14)
7022 [Thread-14-count] INFO b.s.d.executor - Prepared bolt count:(2)
hello word!
the 1
hello word!
cow 1
hello word!
jumped 1
hello word!
the 2
hello word!
over 1
hello word!
moon 1


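2. The example above extends Storm's base classes (BaseRichSpout, BaseBasicBolt). The version below implements the interfaces (IRichSpout, IRichBolt) directly:

1) WordReader: (the spout)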

package cn.edu.nuc.StormTest.wordcount1;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class WordReader implements IRichSpout {
private static final long serialVersionUID = 1L;
private SpoutOutputCollector collector;
private FileReader fileReader;
private boolean completed = false;

public boolean isDistributed() {
return false;
}
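/**
* Called first, when the spout is initialized. It receives three arguments:
* the configuration supplied when the topology was created, the TopologyContext
* describing the topology, and the collector used to emit tuples from the spout to the bolts.
*/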
public void open(Map conf, TopologyContext context,
SpoutOutputCollector collector) {
try {
// Read the file path that was put into the configuration when the topology was created
this.fileReader = new FileReader(conf.get("wordsFile").toString());
} catch (FileNotFoundException e) {
throw new RuntimeException("Error reading file ["
+ conf.get("wordsFile") + "]");
}
// Keep the collector for emitting tuples later
this.collector = collector;

}
/**
* The spout's main method: it reads the text file and emits each line to the bolts.
* Storm calls this method in a loop, so once the file has been read it sleeps
* to keep CPU usage down.
*/
public void nextTuple() {
if (completed) {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
// Do nothing
}
return;
}
String str;
// Open the reader
BufferedReader reader = new BufferedReader(fileReader);
try {
// Read all lines
while ((str = reader.readLine()) != null) {
/**
* Emit each line, using the line itself as the message ID; Values is an ArrayList implementation
*/
this.collector.emit(new Values(str), str);
}
} catch (Exception e) {
throw new RuntimeException("Error reading tuple", e);
} finally {
completed = true;
}

}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line"));

}
public void close() {
// TODO Auto-generated method stub
}

public void activate() {
// TODO Auto-generated method stub

}
public void deactivate() {
// TODO Auto-generated method stub

}
public void ack(Object msgId) {
System.out.println("OK:" + msgId);
}
public void fail(Object msgId) {
System.out.println("FAIL:" + msgId);

}
public Map<String, Object> getComponentConfiguration() {
// TODO Auto-generated method stub
return null;
}
}
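
The line-reading loop inside `nextTuple()` can be tried out separately from Storm. The sketch below is one possible standalone version, assuming a hypothetical helper `LineReaderSketch.readAllLines` that is not part of the original code. It takes any `Reader`, so a `StringReader` can stand in for the file during a quick check, and it uses try-with-resources so the reader is closed, which the spout above leaves open.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original project.
class LineReaderSketch {
    // Reads every line from the source, then closes it.
    static List<String> readAllLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // the spout would emit the line here instead
            }
        } catch (IOException e) {
            // Same idea as the spout: surface read errors as unchecked exceptions
            throw new UncheckedIOException("Error reading line", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a file such as d:/test.txt
        Reader source = new StringReader("storm local mode\nword count demo\n");
        for (String line : readAllLines(source)) {
            System.out.println(line);
        }
    }
}
```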

2) WordNormalizer: (bolt, analogous to map)

package cn.edu.nuc.StormTest.wordcount1;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordNormalizer implements IRichBolt{
/**
*
*/
private static final long serialVersionUID = 1L;
private OutputCollector collector;

public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.collector = collector;
}

/**
* The most important method of a bolt: it is called once for every tuple received.
* Here it splits each line of the text file into words and emits them
* for the next bolt to process.
*/
public void execute(Tuple input) {
String sentence = input.getString(0);
String[] words = sentence.split(" ");
for (String word : words) {
word = word.trim();
if (!word.isEmpty()) {
word = word.toLowerCase();
// Emit the word, anchored to the input tuple
List<Tuple> a = new ArrayList<Tuple>();
a.add(input);
collector.emit(a, new Values(word));
}
}
// Ack the input tuple once it has been processed
collector.ack(input);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));

}
public void cleanup() {
// TODO Auto-generated method stub

}
public Map<String, Object> getComponentConfiguration() {
// TODO Auto-generated method stub
return null;
}
}
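
The normalization step in `execute()` is easy to check in isolation. The following sketch reproduces the same split, trim, skip-empty, lowercase sequence in plain Java; `NormalizeSketch` and `normalize` are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original project.
class NormalizeSketch {
    // Same steps as WordNormalizer.execute(): split on a single space,
    // trim each piece, drop empty pieces, and lowercase the rest.
    static List<String> normalize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split(" ")) {
            word = word.trim();
            if (!word.isEmpty()) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Repeated spaces produce empty pieces, which the isEmpty() check removes
        System.out.println(normalize("  The  Cow "));
    }
}
```

Because the delimiter is a single space, words separated only by a tab stay joined as one token; splitting on `"\\s+"` would be one way to handle that if the input file needs it.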

3) WordCounter: (bolt, analogous to reduce)

package cn.edu.nuc.StormTest.wordcount1;

import java.util.HashMap;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Tuple;

public class WordCounter implements IRichBolt{
/**
*
*/
private static final long serialVersionUID = 1L;
Integer id;
String name;
Map<String, Integer> counters;
private OutputCollector collector;

public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.counters = new HashMap<String, Integer>();
this.collector = collector;
this.name = context.getThisComponentId();
this.id = context.getThisTaskId();

}
public void execute(Tuple input) {
String str = input.getString(0);
if (!counters.containsKey(str)) {
counters.put(str, 1);
} else {
Integer c = counters.get(str) + 1;
counters.put(str, c);
}
// Ack the tuple once it has been processed
collector.ack(input);
}
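/**
* Cleanup work when the topology shuts down, such as closing connections and releasing resources.
* Since this is only a demo, we use it to print the counters.
* Note that Storm only guarantees this call in local mode, not on a cluster.
*/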
public void cleanup() {
System.out.println("-- Word Counter [" + name + "-" + id + "] --");
for (Map.Entry<String, Integer> entry : counters.entrySet()) {
System.out.println(entry.getKey() + ": " + entry.getValue());
}
counters.clear();
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// TODO Auto-generated method stub

}
public Map<String, Object> getComponentConfiguration() {
// TODO Auto-generated method stub
return null;
}
}

4) Main class:

package cn.edu.nuc.StormTest.wordcount1;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;

public class WordCountTopologyMain {
public static void main(String[] args) throws InterruptedException {
// Define the topology
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-reader",new WordReader(),1);
builder.setBolt("word-normalizer", new WordNormalizer()).shuffleGrouping("word-reader");
builder.setBolt("word-counter", new WordCounter(),2).fieldsGrouping("word-normalizer", new Fields("word"));
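// Configuration: path of the input file read by WordReader
Config conf = new Config();
conf.put("wordsFile", "d:/test.txt");
conf.setDebug(false);
// Allow at most one un-acked tuple per spout task at a time
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
// Create a local-mode cluster and submit the topology to it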
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("Getting-Started-Toplogie", conf,builder.createTopology());
Utils.sleep(3000);
cluster.killTopology("Getting-Started-Toplogie");
cluster.shutdown();
}
}