Map |
DataStream<Integer> dataStream = //...
dataStream.map(new MapFunction<Integer, Integer>() {
@Override
public Integer map(Integer value) throws Exception {
return 2 * value;
}
});
|
FlatMap |
dataStream.flatMap(new FlatMapFunction<String, String>() {
@Override
public void flatMap(String value, Collector<String> out)
throws Exception {
for(String word: value.split(" ")){
out.collect(word);
}
}
});
|
Filter |
dataStream.filter(new FilterFunction<Integer>() {
@Override
public boolean filter(Integer value) throws Exception {
return value != 0;
}
});
|
KeyBy |
dataStream.keyBy("someKey") // Key by field "someKey"
dataStream.keyBy(0) // Key by the first element of a Tuple
Attention 如果是以下类型不可以成为key
-
it is a POJO type but does not override the
hashCode() method and relies on the
Object.hashCode() implementation.
-
it is an array of any type.
|
Reduce |
keyedStream.reduce(new ReduceFunction<Integer>() {
@Override
public Integer reduce(Integer value1, Integer value2)
throws Exception {
return value1 + value2;
}
});
|
Fold |
DataStream<String> result =
keyedStream.fold("start", new FoldFunction<Integer, String>() {
@Override
public String fold(String current, Integer value) {
return current + "-" + value;
}
});
|
Aggregations |
在键控数据流上滚动聚合。min和minBy的区别在
于,min返回最小值,而minBy返回该字段中具有
最小值的元素(max和maxBy也是如此)。
keyedStream.sum(0);
keyedStream.sum("key");
keyedStream.min(0);
keyedStream.min("key");
keyedStream.max(0);
keyedStream.max("key");
keyedStream.minBy(0);
keyedStream.minBy("key");
keyedStream.maxBy(0);
keyedStream.maxBy("key");
|
Window |
dataStream.keyBy(0).window(TumblingEventTimeWindows.
of(Time.seconds(5))); // Last 5 seconds of data
|
WindowAll |
Windows可以在常规数据流上定义。Windows根据
一些特性(例如,最近5秒内到达的数据)对所有流
事件进行分组。
Attention:在许多情况下,这是一个非并行转换。所
有记录将在windowAll操作符的一个任务中收集。
dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5))); // Last 5 seconds of data
|
Window Apply |
将一个通用函数应用于整个窗口。下面是一个手动汇总窗口元素的函数。
Note: If you are using a windowAll transformation, you need to use an AllWindowFunction instead.
windowedStream.apply (new WindowFunction<Tuple2<String,Integer>, Integer, Tuple, Window>() {
public void apply (Tuple tuple,
Window window,
Iterable<Tuple2<String, Integer>> values,
Collector<Integer> out) throws Exception {
int sum = 0;
for (value t: values) {
sum += t.f1;
}
out.collect (new Integer(sum));
}
});
// applying an AllWindowFunction on non-keyed window stream
allWindowedStream.apply (new AllWindowFunction<Tuple2<String,Integer>, Integer, Window>() {
public void apply (Window window,
Iterable<Tuple2<String, Integer>> values,
Collector<Integer> out) throws Exception {
int sum = 0;
for (value t: values) {
sum += t.f1;
}
out.collect (new Integer(sum));
}
});
|
Window Reduce |
将函数reduce函数应用于窗口并返回缩简后的值。
windowedStream.reduce (new ReduceFunction<Tuple2<String,Integer>>() {
public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1, Tuple2<String, Integer> value2) throws Exception {
return new Tuple2<String,Integer>(value1.f0, value1.f1 + value2.f1);
}
});
|
Window Fold |
将函数折叠函数应用于窗口并返回折叠值。将示例函
数应用于序列(1,2,3,4,5)时,将序列折叠成字符串
“start 1-2-3-4-5”:
windowedStream.fold("start", new FoldFunction<Integer, String>() {
public String fold(String current, Integer value) {
return current + "-" + value;
}
});
|
Aggregations
on windows
|
聚合窗口的内容。min和minBy的区别在于,min返
回最小值,而minBy返回该字段中具有最小值的元
素(max和maxBy也是如此)。
windowedStream.sum(0);
windowedStream.sum("key");
windowedStream.min(0);
windowedStream.min("key");
windowedStream.max(0);
windowedStream.max("key");
windowedStream.minBy(0);
windowedStream.minBy("key");
windowedStream.maxBy(0);
windowedStream.maxBy("key");
|
Union |
两个或多个数据流的联合,创建一个包含来自所有流
的所有元素的新流。注意:如果您将一个数据流与它
自己相结合,您将得到结果流中的每个元素两次。
dataStream.union(otherStream1, otherStream2, ...);
|
Window Join |
连接给定键和公共窗口上的两个数据流。
dataStream.join(otherStream)
.where(<key selector>).equalTo(<key selector>)
.window(TumblingEventTimeWindows.of(Time.seconds
(3)))
.apply (new JoinFunction () {...});
|
Interval Join KeyedStream,KeyedStream → DataStream |
Join two elements e1 and e2 of two keyed streams
with a common key over a given time interval, so
that e1.timestamp + lowerBound <= e2.timestamp <= e1.timestamp + upperBound
// this will join the two streams so that
// key1 == key2 && leftTs - 2 < rightTs < leftTs + 2
keyedStream.intervalJoin(otherKeyedStream)
.between(Time.milliseconds(-2), Time.milliseconds(2)) // lower and upper bound
.upperBoundExclusive(true) // optional
.lowerBoundExclusive(true) // optional
.process(new IntervalJoinFunction() {...});
|
Window CoGroup |
将给定键和公共窗口上的两个数据流组合在一起
dataStream.coGroup(otherStream)
.where(0).equalTo(1)
.window(TumblingEventTimeWindows.of(Time.seconds(3)))
.apply (new CoGroupFunction () {...});
|
Connect |
“连接”两个保留其类型的数据流。允许两个流之间
共享状态的连接。
DataStream<Integer> someStream = //...
DataStream<String> otherStream = //...
ConnectedStreams<Integer, String> connectedStreams = someStream.connect(otherStream);
|
CoMap, CoFlatMap |
类似于连接数据流上的映射和平面映射
connectedStreams.map(new CoMapFunction<Integer, String, Boolean>() {
@Override
public Boolean map1(Integer value) {
return true;
}
@Override
public Boolean map2(String value) {
return false;
}
});
connectedStreams.flatMap(new CoFlatMapFunction<Integer, String, String>() {
@Override
public void flatMap1(Integer value, Collector<String> out) {
out.collect(value.toString());
}
@Override
public void flatMap2(String value, Collector<String> out) {
for (String word: value.split(" ")) {
out.collect(word);
}
}
});
|
Split |
根据某些标准将流分成两个或多个流。
SplitStream<Integer> split = someDataStream.split(new OutputSelector<Integer>() {
@Override
public Iterable<String> select(Integer value) {
List<String> output = new ArrayList<String>();
if (value % 2 == 0) {
output.add("even");
}
else {
output.add("odd");
}
return output;
}
});
|
Select |
从拆分流中选择一个或多个流。
SplitStream<Integer> split;
DataStream<Integer> even = split.select("even");
DataStream<Integer> odd = split.select("odd");
DataStream<Integer> all = split.select("even","odd");
|
Iterate DataStream → IterativeStream → DataStream |
通过将一个操作符的输出重定向到前面的某个操作符,在流中创建一个“反馈”循环。这对于定义不断更新模型的算法特别有用。下面的代码从一个流开始,并持续地应用迭代体。大于0的元素被发送回反馈通道,其余的元素被转发到下游。有关完整描述,请参见迭代。
IterativeStream<Long> iteration = initialStream.iterate();
DataStream<Long> iterationBody = iteration.map (/*do something*/);
DataStream<Long> feedback = iterationBody.filter(new FilterFunction<Long>(){
@Override
public boolean filter(Long value) throws Exception {
return value > 0;
}
});
iteration.closeWith(feedback);
DataStream<Long> output = iterationBody.filter(new FilterFunction<Long>(){
@Override
public boolean filter(Long value) throws Exception {
return value <= 0;
}
});
|
Extract Timestamps |
从记录中提取时间戳,以便使用使用事件时间语义的窗口。看到事件时间。
stream.assignTimestamps (new TimeStampExtractor() {...});
|