"Data Algorithms" Reading Notes 6: Moving Average
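Before discussing moving averages, we first need to understand time series data.
1. Time series data
Time series data represents the values of a variable over a period of time. Loosely speaking, it can be formalized as a sequence of triples (k, t, v), where k is a key (for example a stock symbol), t is a time, and v is the value recorded for that key at that time.
In general, whenever the same measurement is recorded repeatedly over a period of time, the result is time series data.
The average of time series values over several consecutive periods is called a moving average. "Moving" means that the average is recomputed as new data arrives: the oldest value is dropped and the newest value is added, so the average "moves" accordingly.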
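As a small worked illustration, the definition and the "drop the oldest, add the newest" update can be written as follows. The symbols n (window size), v_t (value at period t), and MA_t (moving average at period t) are notation introduced here for illustration, not taken from the book.

```latex
\mathrm{MA}_t = \frac{1}{n} \sum_{i=0}^{n-1} v_{t-i},
\qquad
\mathrm{MA}_{t+1} = \mathrm{MA}_t + \frac{v_{t+1} - v_{t+1-n}}{n}
```

For example, with n = 2 and the GOOG prices 194.87, 191.67, 184.70 used later in these notes, the average at the second price is (194.87 + 191.67) / 2 = 193.27, and the update gives 193.27 + (184.70 - 194.87) / 2 = 188.185 at the third.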
2. Requirement
In this example, I use simulated stock data and compute its moving average over a specified window.
3. Test data
3.1.1 Test input 1
GOOG,2004-11-04,184.70
GOOG,2004-11-03,191.67
GOOG,2004-11-02,194.87
GOOG,2013-07-19,896.60
GOOG,2013-07-18,910.68
GOOG,2013-07-17,918.55
3.1.1 Test output 1
GOOG, 2004-11-02, 194.87, 0.0
GOOG, 2004-11-03, 191.67, 193.26999999999998
GOOG, 2004-11-04, 184.7, 188.185
GOOG, 2013-07-17, 918.55, 551.625
GOOG, 2013-07-18, 910.68, 914.615
GOOG, 2013-07-19, 896.6, 903.64
3.1.2 Test input 2
GOOG,2004-11-04,184.70
GOOG,2004-11-03,191.67
GOOG,2004-11-02,194.87
GOOG,2013-07-19,896.60
GOOG,2013-07-18,910.68
GOOG,2013-07-17,918.55
APPL,2013-10-04,483.22
APPL,2013-10-07,485.39
APPL,2013-10-08,484.345
APPL,2013-10-09,483.765
IBM,2013-09-26,189.845
IBM,2013-09-27,188.57
IBM,2013-09-30,186.05
3.1.2 Test output 2
APPL, 2013-10-04, 483.22, 0.0
APPL, 2013-10-07, 485.39, 484.305
APPL, 2013-10-08, 484.345, 484.8675
APPL, 2013-10-09, 483.765, 484.055
GOOG, 2004-11-02, 194.87, 0.0
GOOG, 2004-11-03, 191.67, 193.26999999999998
GOOG, 2004-11-04, 184.7, 188.185
GOOG, 2013-07-17, 918.55, 551.625
GOOG, 2013-07-18, 910.68, 914.615
GOOG, 2013-07-19, 896.6, 903.64
IBM, 2013-09-26, 189.845, 0.0
IBM, 2013-09-27, 188.57, 189.20749999999998
IBM, 2013-09-30, 186.05, 187.31
4. Solving the moving average problem with a plain Java program
Omitted here.
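The original program is omitted. As a rough stand-in, a minimal in-memory sketch might look like the following. It is not the omitted program: the class name PlainMovingAverage, the compute method, and the convention of emitting 0.0 until the window is full (taken from the sample output above) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory version; names are illustrative only. */
final class PlainMovingAverage {

    /** Returns lines of the form "symbol, date, price, movingAverage". */
    static List<String> compute(List<String> csvLines, int window) {
        // Group records by stock symbol; TreeMap keeps symbols in sorted order.
        Map<String, List<String[]>> bySymbol = new TreeMap<>();
        for (String line : csvLines) {
            String[] fields = line.trim().split(",");
            bySymbol.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields);
        }

        List<String> out = new ArrayList<>();
        for (List<String[]> rows : bySymbol.values()) {
            // yyyy-MM-dd strings sort chronologically when compared as text.
            rows.sort(Comparator.comparing((String[] f) -> f[1]));

            Deque<Double> recent = new ArrayDeque<>(); // the last `window` prices
            for (String[] f : rows) {
                double price = Double.parseDouble(f[2]);
                recent.addLast(price);
                if (recent.size() > window) {
                    recent.removeFirst(); // drop the oldest value
                }
                // Re-sum the window each time; simple, and fine for small windows.
                double sum = 0.0;
                for (double p : recent) {
                    sum += p;
                }
                // Assumed convention: 0.0 until the window is full.
                double avg = recent.size() == window ? sum / window : 0.0;
                out.add(f[0] + ", " + f[1] + ", " + price + ", " + avg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "GOOG,2004-11-04,184.70",
                "GOOG,2004-11-03,191.67",
                "GOOG,2004-11-02,194.87");
        compute(input, 2).forEach(System.out::println);
    }
}
```

Everything is held in memory here, which is fine for a few thousand rows but is exactly what stops scaling on large inputs; that is the motivation for the MapReduce version.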
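5. Solving the moving average problem with a MapReduce job
There is a lot of code, so I do not list all of it here; the main techniques are secondary sort and grouping. Only the two most important classes, Stock and MoveAvgReducer, are listed here.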
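To make the roles of secondary sort and grouping more concrete, here is a minimal plain-Java simulation of what the shuffle phase does with composite keys. It does not use the Hadoop API, and the names SecondarySortSketch, Key, SORT_ORDER, GROUP_ORDER, and shuffle are hypothetical. In the real job these roles would be played by the key's compareTo and a grouping comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of sort-then-group; not Hadoop code. */
final class SecondarySortSketch {

    /** Composite key: the natural key (symbol) plus the field to order by (date). */
    static final class Key {
        final String symbol;
        final String date;

        Key(String symbol, String date) {
            this.symbol = symbol;
            this.date = date;
        }
    }

    // Sort order: symbol first, then date (the same order as Stock.compareTo).
    static final Comparator<Key> SORT_ORDER =
            Comparator.comparing((Key k) -> k.symbol).thenComparing((Key k) -> k.date);

    // Grouping looks at the symbol only, so one group holds every date of a symbol.
    static final Comparator<Key> GROUP_ORDER = Comparator.comparing((Key k) -> k.symbol);

    /** Sorts the keys, then cuts the sorted run into groups: symbol -> dates in order. */
    static Map<String, List<String>> shuffle(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);

        Map<String, List<String>> groups = new LinkedHashMap<>();
        Key previous = null;
        List<String> current = null;
        for (Key k : sorted) {
            // A new group starts whenever the grouping comparator sees a difference.
            if (previous == null || GROUP_ORDER.compare(previous, k) != 0) {
                current = new ArrayList<>();
                groups.put(k.symbol, current);
            }
            current.add(k.date);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(
                new Key("IBM", "2013-09-27"),
                new Key("GOOG", "2004-11-04"),
                new Key("IBM", "2013-09-26"),
                new Key("GOOG", "2004-11-02"));
        // prints {GOOG=[2004-11-02, 2004-11-04], IBM=[2013-09-26, 2013-09-27]}
        System.out.println(shuffle(keys));
    }
}
```

Each group then arrives with its dates already in order, so a reducer can compute the moving average in a single pass without sorting in memory.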
5.1 Stock
package data_algorithm.chapter_6;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class Stock implements Writable, WritableComparable<Stock> {
    private String com_name; // company name
    private String date;     // trade date
    private double price;    // stock price
    private double moveAvg;  // moving average, set in the reducer

    public Stock() {
    }

    public Stock(String com_name, String date, double price) {
        this.com_name = com_name;
        this.date = date;
        this.price = price;
    }

    public String getCom_name() {
        return com_name;
    }

    public void setCom_name(String com_name) {
        this.com_name = com_name;
    }

    public String getDate() {
        return date;
    }

    public void setDate(String date) {
        this.date = date;
    }

    public double getPrice() {
        return price;
    }

    public void setPrice(double price) {
        this.price = price;
    }

    public double getMoveAvg() {
        return moveAvg;
    }

    public void setMoveAvg(double moveAvg) {
        this.moveAvg = moveAvg;
    }

    // Order by company name first, then by date.
    @Override
    public int compareTo(Stock o) {
        int comp = this.com_name.compareTo(o.com_name);
        if (comp == 0) {
            comp = this.date.compareTo(o.date);
        }
        return comp;
    }

    // Note: moveAvg is not serialized; only name, date, and price are written.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(com_name);
        out.writeUTF(date);
        out.writeDouble(price);
    }

    // Note: moveAvg is not restored or reset here.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.com_name = in.readUTF();
        this.date = in.readUTF();
        this.price = in.readDouble();
    }

    @Override
    public String toString() {
        return com_name +
                ", " + date +
                ", " + price +
                ", " + moveAvg;
    }
}
5.2 MoveAvgReducer
package data_algorithm.chapter_6;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MoveAvgReducer extends Reducer<Stock, DoubleWritable, Stock, NullWritable> {
    private int window = 2; // moving-average window size

    @Override
    protected void reduce(Stock key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double res = 0;
        double priorVal = 0; // previous price in this group
        int count = 0;
        String priorKey = "";
        for (DoubleWritable dw : values) {
            key.setMoveAvg(0); // reset before each record, see section 6.1
            if (count != 0 && priorKey.equals(key.getCom_name())) {
                // Average of the current and previous price; only correct for window == 2.
                res = (dw.get() + priorVal) / window;
                key.setMoveAvg(res); // set the final moving average value
            }
            priorVal = dw.get();
            priorKey = key.getCom_name();
            count++;
            res = 0; // reset
            context.write(key, NullWritable.get());
        }
        count = 0;
    }
}
The full code is available on my GitHub.
6. Notes
6.1
Careful readers may notice the following line in the Reducer code:
key.setMoveAvg(0);
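What is this line for, and is it redundant?
Its purpose is to reset the moveAvg of each Stock object to 0. It looks redundant but is not: if the line is commented out, the job produces the following result: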
APPL, 2013-10-04, 483.22, 0.0
APPL, 2013-10-07, 485.39, 484.305
APPL, 2013-10-08, 484.345, 484.8675
APPL, 2013-10-09, 483.765, 484.055
GOOG, 2004-11-02, 194.87, 484.055
GOOG, 2004-11-03, 191.67, 193.26999999999998
GOOG, 2004-11-04, 184.7, 188.185
GOOG, 2013-07-17, 918.55, 551.625
GOOG, 2013-07-18, 910.68, 914.615
GOOG, 2013-07-19, 896.6, 903.64
IBM, 2013-09-26, 189.845, 903.64
IBM, 2013-09-27, 188.57, 189.20749999999998
IBM, 2013-09-30, 186.05, 187.31
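As you can see, for every stock symbol except the first, the first record has a nonzero moveAvg. This is clearly wrong: the window is set to 2 in the program, so the first record of a symbol has no previous price to average with. Values such as GOOG, 2004-11-02, 194.87, 484.055 and IBM, 2013-09-26, 189.845, 903.64 are therefore incorrect. Note that each stale value equals the moving average of the last record of the preceding symbol. But why is a value assigned at all? I do not know the exact reason, so I simply added the redundant setMoveAvg() call at the start of the loop.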
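One plausible explanation, which I have not verified against this exact job, is that Hadoop reuses a single key object in the reducer and refills it through readFields() for each record. Because Stock.readFields() restores only com_name, date, and price, moveAvg would keep whatever value was set for the previous record. The sketch below reproduces that effect in plain Java without Hadoop; KeyReuseSketch and MiniStock are hypothetical stand-ins, and the two records are taken from the output above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical demonstration of a stale field on a reused object; not Hadoop code. */
final class KeyReuseSketch {

    /** Stand-in for Stock: moveAvg is not part of the serialized form. */
    static final class MiniStock {
        String name;
        double price;
        double moveAvg;

        MiniStock() {
        }

        MiniStock(String name, double price) {
            this.name = name;
            this.price = price;
        }

        void write(DataOutputStream out) throws IOException {
            out.writeUTF(name);
            out.writeDouble(price);
        }

        void readFields(DataInputStream in) throws IOException {
            name = in.readUTF();
            price = in.readDouble();
            // moveAvg is not touched, so a reused instance keeps its old value.
        }
    }

    /** Writes two records, reads both into the same instance, returns the leftover moveAvg. */
    static double staleAverageAfterReuse() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            new MiniStock("APPL", 483.765).write(out);
            new MiniStock("GOOG", 194.87).write(out);
            out.flush();

            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            MiniStock reused = new MiniStock();
            reused.readFields(in);    // last APPL record
            reused.moveAvg = 484.055; // value computed for that record
            reused.readFields(in);    // first GOOG record, same object
            return reused.moveAvg;    // still the APPL value unless reset explicitly
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(staleAverageAfterReuse()); // prints 484.055
    }
}
```

If this is indeed the cause, the explicit key.setMoveAvg(0) is one valid fix; resetting moveAvg inside readFields() would be another.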