Running the storm-starter Topologies
Start ZooKeeper and Storm first.
1. Log in to the node1 node and change to the storm-starter directory
[root ~]# cd /home/local/storm/examples/storm-starter/
2. Run the storm-starter-topologies program
[root storm-starter]# storm jar storm-starter-topologies-0.9.6.jar storm.starter.WordCountTopology wordcountTop
The following error appeared. According to the detailed message, it was a version problem: I had originally installed apache-storm-2.1.0.tar.gz, so I uninstalled it and installed apache-storm-0.9.6.tar.gz instead (for installation steps, see the Storm real-time streaming framework setup article in this column; use the directory from the Storm configuration).
After reinstalling, I found that the Storm UI's default port 8080 was already occupied.
Solution:
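The fix itself appeared only as a screenshot in the original post. One common way to resolve a Storm UI port conflict (an assumption on my part, not necessarily the author's exact steps) is to move the UI to a free port in conf/storm.yaml and restart the UI daemon:

```yaml
# conf/storm.yaml — ui.port is Storm's setting for the web UI port;
# 8081 is only an example, any free port works
ui.port: 8081
```

After editing, restart the UI daemon with `storm ui` so the new port takes effect.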
Running the command again then succeeded:
3. Open a browser and view the Storm UI
4. Click wordcountTop to view the Topology information
5. Check how the nodes are running
The nodes run as follows: while Storm is running the program, each compute node automatically starts worker processes that download and execute its tasks.
[root storm-starter]# jps
It is normal for the core processes to appear more than once; it is also fine if fewer of them show up.
Log in to the node2 node and inspect the data log:
[root ~]# cd /home/local/storm/logs
[root logs]# tail worker-6703.log.1
6. Run the storm list command
[root storm-starter]# storm list
7. Stop the Topology
[root storm-starter]# storm kill wordcountTop
8. Managing the storm-starter project with Maven
Besides its build capabilities, Maven also provides advanced project-management tools. Use Maven to package the project:
[root ~]# cd /home/local/storm/examples/storm-starter/
[root storm-starter]# mvn package
After packaging completes, a new target directory appears under storm-starter, with two jar files generated inside it.
9. Submit and run
[root target]# storm jar storm-starter-0.9.6-jar-with-dependencies.jar storm.starter.WordCountTopology wcTop
10. Stop the wcTop topology
[root target]# storm kill wcTop
Using Eclipse with storm-starter
1. Compile storm-starter into an Eclipse project
[root ~]# cd /home/local/storm/examples/storm-starter/
[root storm-starter]# mvn eclipse:eclipse
2. Run Eclipse to generate the jar package
[root ~]# /usr/local/eclipse/eclipse &
3. After clicking OK, the workbench window opens
4. In Eclipse, click "File -> Import…"
5. In the Import window, select "General -> Existing Projects into Workspace"
6. Click "Next >" and select the "/home/local/storm/examples/storm-starter" project.
7. Click "Finish" to open the project window
8. Click "File -> Export…"
9. In the Export window, select "JAR file"
10. Click "Next >", select all of the src/jvm, multilang, and src packages in the storm-starter project, and enter the JAR file name.
Click "Finish" to generate the jar package.
11. Run the exported storm-starter.jar
[root workspace]# storm jar storm-starter.jar storm.starter.WordCountTopology wcTop
Implementing Word Count with a Topology
Run Eclipse.
1. Create a package
Right-click "storm-starter -> src/jvm" and choose "New -> Package".
In the dialog that opens, enter the Java package name.
2. Create a class
Click "Finish" to create the gdu package, then right-click gdu and choose "New -> Class".
In the dialog, enter the Java class name.
Click "Finish" and enter the following code:
package gdu;
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;
public class SentenceSpout extends BaseRichSpout {
private SpoutOutputCollector collector;
private String[] sentences = {
"Time, so in pursueing we look back, for a walk, never look back.",
"Set off on a journey, bring sunshine, bring beautiful mood.",
"Youth is a ignorant and moving, always leaves drift from place to place.",
"I am the sun, who don't need to rely on the light.",
"Eternity is not a distance but a decision."
};
private int index = 0;
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("sentence"));
}
public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
this.collector = collector;
}
public void nextTuple() {
this.collector.emit(new Values(sentences[index]));
index++;
if (index >= sentences.length) {
index = 0;
}
Utils.sleep(1);
}
}
3. Repeat step 2 to create the following classes
1) Create SplitSentenceBolt.java
The code is as follows:
package gdu;
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
public class SplitSentenceBolt extends BaseRichBolt {
private OutputCollector collector;
public void prepare(Map config, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
public void execute(Tuple tuple) {
String sentence = tuple.getStringByField("sentence");
String[] words = sentence.split(" ");
for(String word : words){
this.collector.emit(new Values(word));
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
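One caveat about the split above: splitting on a single space leaves punctuation glued to words ("back.", "flower") and yields empty tokens when spaces repeat. A plain-Java sketch of a more forgiving tokenizer (illustrative only — the class and method names are mine, not part of storm-starter):

```java
import java.util.ArrayList;
import java.util.List;

public class Tokenizer {
    // Split on runs of non-letter characters (keeping apostrophes),
    // and lower-case each token so "Back" and "back." count as one word.
    public static List<String> tokenize(String sentence) {
        List<String> words = new ArrayList<String>();
        for (String token : sentence.split("[^a-zA-Z']+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase());
            }
        }
        return words;
    }
}
```

Emitting tokens cleaned this way would make the downstream count bolt treat "Time" and "time," as the same word.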
2) Create WordCountBolt.java
The code is as follows:
package gdu;
import java.util.HashMap;
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
public class WordCountBolt extends BaseRichBolt {
private OutputCollector collector;
private HashMap<String, Long> counts = null;
public void prepare(Map config, TopologyContext context, OutputCollector collector) {
this.collector = collector;
this.counts = new HashMap<String, Long>();
}
public void execute(Tuple tuple) {
String word = tuple.getStringByField("word");
Long count = this.counts.get(word);
if(count == null){
count = 0L;
}
count++;
this.counts.put(word, count);
this.collector.emit(new Values(word, count));
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
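The null-check-then-increment pattern in execute can be written more tersely with Map.merge; a standalone sketch of the same counting logic (no Storm dependency; class and method names here are mine):

```java
import java.util.HashMap;
import java.util.Map;

public class CountSketch {
    // Same logic as WordCountBolt.execute: a missing key starts from
    // the initial value 1L, an existing key is summed with 1L.
    public static Map<String, Long> count(String[] words) {
        Map<String, Long> counts = new HashMap<String, Long>();
        for (String w : words) {
            counts.merge(w, 1L, Long::sum);
        }
        return counts;
    }
}
```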
3) Create ReportBolt.java
The code is as follows:
package gdu;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.List;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
public class ReportBolt extends BaseRichBolt {
private HashMap<String, Long> counts = null;
public void prepare(Map config, TopologyContext context, OutputCollector collector) {
this.counts = new HashMap<String, Long>();
}
public void execute(Tuple tuple) {
String word = tuple.getStringByField("word");
Long count = tuple.getLongByField("count");
this.counts.put(word, count);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// this bolt does not emit anything
}
public void cleanup() {
System.out.println("--- FINAL COUNTS ---");
List<String> keys = new java.util.ArrayList<String>();
keys.addAll(this.counts.keySet());
Collections.sort(keys);
for (String key : keys) {
System.out.println(key + " : " + this.counts.get(key));
}
System.out.println("--------------");
}
}
4) Create WordCountTopology.java
The code is as follows:
package gdu;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;
public class WordCountTopology {
private static final String SENTENCE_SPOUT_ID="sentence-spout";
private static final String SPLIT_BOLT_ID="split-bolt";
private static final String COUNT_BOLT_ID="count-bolt";
private static final String REPORT_BOLT_ID="report-bolt";
private static final String TOPOLOGY_NAME="word-count-topology";
public static void main(String[] args) throws Exception {
SentenceSpout spout = new SentenceSpout();
SplitSentenceBolt splitBolt = new SplitSentenceBolt();
WordCountBolt countBolt = new WordCountBolt();
ReportBolt reportBolt = new ReportBolt();
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(SENTENCE_SPOUT_ID, spout);
// SentenceSpout --> SplitSentenceBolt
builder.setBolt(SPLIT_BOLT_ID, splitBolt).shuffleGrouping(SENTENCE_SPOUT_ID);
// SplitSentenceBolt --> WordCountBolt
builder.setBolt(COUNT_BOLT_ID, countBolt).fieldsGrouping(
SPLIT_BOLT_ID, new Fields("word"));
// WordCountBolt --> ReportBolt
builder.setBolt(REPORT_BOLT_ID, reportBolt).globalGrouping(COUNT_BOLT_ID);
Config config = new Config();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(TOPOLOGY_NAME, config, builder.createTopology());
Utils.sleep(50000);
cluster.killTopology(TOPOLOGY_NAME);
cluster.shutdown();
}
}
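The fieldsGrouping on "word" is what keeps each WordCountBolt's private HashMap correct: Storm routes every tuple with the same value of the grouped field to the same bolt task. Conceptually the routing is a hash of the grouped field modulo the number of tasks, as in this plain-Java sketch (Storm's actual implementation differs in detail):

```java
public class FieldsGroupingSketch {
    // Choose a target task index from the grouped field's hash.
    // The same word always maps to the same task, so no two tasks
    // ever hold counts for the same word.
    public static int taskFor(String word, int numTasks) {
        // floorMod keeps the result non-negative even for negative hashes
        return Math.floorMod(word.hashCode(), numTasks);
    }
}
```

A shuffleGrouping here would scatter occurrences of one word across tasks and each task would report only a partial count.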
4. Export the jar package
Right-click "gdu" and choose "Export…".
In the dialog, select "Java -> JAR file", click "Next >" to open the jar dialog, select "swpt", and enter the JAR file name.
Click "Finish" to generate the following file:
5. Run and test
[root@node1 workspace]# storm jar gdu.jar gdu.WordCountTopology wd
Word Count over a Topology Input File
1. Create a gdu-starter project
1) Create the corresponding directories under workspace
[root ~]# mkdir -p ~/workspace/gdu-starter/src/jvm
[root ~]# mkdir -p ~/workspace/gdu-starter/multilang/resources
2) In the gdu-starter directory, write a pom.xml containing the basic components
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>storm.gdu</groupId>
<artifactId>gdu-starter</artifactId>
<packaging>jar</packaging>
<version>0.0.1</version>
<name>gdu-starter</name>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.9.6</version>
<!-- keep storm out of the jar-with-dependencies -->
<scope>provided</scope>
</dependency>
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
<version>3.2.1</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/jvm</sourceDirectory>
<resources>
<resource>
<directory>${basedir}/multilang</directory>
</resource>
</resources>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass />
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.2.1</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<executable>java</executable>
<includeProjectDependencies>true</includeProjectDependencies>
<includePluginDependencies>false</includePluginDependencies>
<classpathScope>compile</classpathScope>
<mainClass>${storm.topology}</mainClass>
</configuration>
</plugin>
</plugins>
</build>
</project>
2. Run mvn eclipse:eclipse to generate the Eclipse project files
[root@node1 gdu-starter]# mvn eclipse:eclipse
3. Open Eclipse and import the project
Click "File -> Import…", select "General -> Existing Projects into Workspace", click "Next >", and choose the project path.
4. Create the package and classes
Right-click "gdu-starter -> src/jvm", choose "New -> Package", and enter the Java package name in the dialog.
5. Create a class
Right-click gdu, choose "New -> Class", and enter the Java class name.
Click "Finish" and enter the following code:
package gdu;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;
public class WordReader extends BaseRichSpout {
private SpoutOutputCollector collector;
private FileReader fileReader;
private boolean completed = false;
public void ack(Object msgId) {
System.out.println("OK:" + msgId);
}
public void close() {
}
public void fail(Object msgId) {
System.out.println("FAIL:" + msgId);
}
public void declareOutputFields(OutputFieldsDeclarer declarer){
declarer.declare(new Fields("line"));
}
public void nextTuple(){
if(completed){
try{
Thread.sleep(1000);
}catch(InterruptedException e){
}
return;
}
String str;
BufferedReader reader = new BufferedReader(fileReader);
try{
while((str = reader.readLine())!=null){
this.collector.emit(new Values(str));
}
}catch(Exception e){
throw new RuntimeException("Error reading tuple" , e);
}finally{
completed = true;
}
}
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
try {
this.fileReader = new FileReader(conf.get("wordFile").toString());
} catch (FileNotFoundException e) {
throw new RuntimeException("Error reading file ["
+ conf.get("wordFile") + "]");
}
this.collector = collector;
}
}
6. Following step 5, create the other classes.
1) Create WordNormalizer.java
The code is as follows:
package gdu;
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
public class WordNormalizer extends BaseRichBolt {
private OutputCollector collector;
public void prepare(Map config, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
public void execute(Tuple tuple) {
String sentence = tuple.getStringByField("line");
String[] words = sentence.split(" ");
for (String word : words) {
word = word.trim();
// skip empty tokens and lower-case each word,
// so the bolt actually normalizes as its name promises
if (!word.isEmpty()) {
this.collector.emit(new Values(word.toLowerCase()));
}
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
2) Create WordCounter.java
The code is as follows:
package gdu;
import java.util.HashMap;
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
public class WordCounter extends BaseRichBolt {
private OutputCollector collector;
Integer id;
String name;
Map<String, Long> counters;
public void prepare(Map config, TopologyContext context, OutputCollector collector) {
this.collector = collector;
this.counters = new HashMap<String, Long>();
this.name = context.getThisComponentId();
this.id = context.getThisTaskId();
}
public void execute(Tuple tuple) {
String word = tuple.getStringByField("word");
Long count = this.counters.get(word);
if (count == null) {
count = 0L;
}
count++;
this.counters.put(word, count);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// this bolt does not emit anything
}
public void cleanup() {
System.out.println("-- Word Counter [" + name + "-" + id + "] --");
for (Map.Entry<String, Long> entry : counters.entrySet()) {
System.out.println(entry.getKey() + ": " + entry.getValue());
}
}
}
3) Create WordCountTop.java
The code is as follows:
package gdu;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;
public class WordCountTop {
public static void main(String[] args) throws Exception {
WordReader spout = new WordReader();
WordNormalizer splitBolt = new WordNormalizer();
WordCounter countBolt = new WordCounter();
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-reader", spout);
builder.setBolt("word-normalizer", splitBolt).shuffleGrouping(
"word-reader");
builder.setBolt("word-counter", countBolt).fieldsGrouping(
"word-normalizer", new Fields("word"));
Config config = new Config();
config.put("wordFile", args[1]); // args[1] is the path of the text file to count
config.setDebug(false);
config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count-top", config,
builder.createTopology());
Utils.sleep(10000);
cluster.killTopology("word-count-top");
cluster.shutdown();
}
}
7. Create the txt document of words to count
Create the file word.txt under gdu-starter/multilang/resources/ with the following content:
[root@node1 gdu-starter]# cd multilang/resources/
[root@node1 resources]# vim word.txt
To see a world in a grain of sand. And a heaven in a wild flower
Hold infinity in the palm of your hand. And eternity in an hour
Life is a chain of moments of enjoyment, not only about survival
No man or woman is worth your tears, and the one who is worth make you cry
I’ve learned… That just one person saying to me, “You’ve made my day!” makes my day.
8. Export the jar package
Right-click "gdu", choose "Export…", select "Java -> JAR file", click "Next >" to open the jar dialog, select "gdu", enter the JAR file name, and click "Finish" to generate the jar package.
9. Run the program
[root@node1 workspace]# storm jar gdu1.jar gdu.WordCountTop wTT /root/workspace/gdu-starter/multilang/resources/word.txt