目录
- 问题描述 1
- 基于的系统/算法 1
- 系统设计/算法设计 2
- 实验流程 7
- 实验 8
- 结论 9
7.实验中的踩坑 10
8.项目运行过程 10 - 问题描述
1.题号:41
2.题目:实时网约车系统设计
3.问题描述:
a) 应用背景:网约车是网络预约出租汽车的简称。在构建多样化服务体系方面,出租车将分为巡游出租汽车和网络预约出租汽车。通过互联网的即时通信模式,可以同时缓解空车司机无法寻找到客户、寻车用户无法找到出租车的问题。本题目目标是,实现并维护一个简易的分布式的系统,让每个寻车用户与租车司机实现实时配对。
b) 设计部分设计一个分布式的存储维护方法,分别保存并处理实时的寻车用户和租车司机的信息,提出一个有效的配对方法使双方的等待时间不至于过长、接车距离不至于过远。
c) 实验部分实现并维护一个简易的分布式的管理系统,在实时的数据流上,让每个寻车用户与租车司机实现实时配对。使用上述有效的配对方法降低等待时间和接车距离。
技术难点:
1.司机与乘客信息以流的形式到达,需要从中分别获取司机以及乘客信息并进行处理。
2.如果把所有的司机以及乘客放在一起处理匹配,数据规模较大,恐怕难以在有限时间内完成,因此可以划分为不同区域进行匹配。
3.不能够保证每一轮匹配中所有乘客与司机都能够成功配对,对于匹配失败的司机及乘客需要进行状态的保存,当进行下一轮匹配时,之前匹配失败的乘客应该优先进行匹配。
4.如果有多个满足要求的司机时,应该优先选择与接车地点距离最短的司机进行配对。
2. 基于的系统/算法
系统:在本次毕业设计中,主要使用的是spark的扩展spark-streaming。Spark-streaming将实时流切分为若干批(每批X秒),将每批数据视为RDD进行处理,从而实现对流的处理。虽然无法实现毫秒级的处理速度,但整个流式计算根据业务的需求可以对中间的结果进行叠加或者存储到外部设备。本文转载自http://www.biyezuopin.vip/onews.asp?id=14808由于在网约车系统中数据是以流形式源源不断到来的,需要对流数据进行处理,在处理过程中,还需要对中间状态以及结果进行保存。同时,由于网约车任务并不需要毫秒级的处理速度,因此,采用了spark-streaming作为本次作业的系统。
利用spark-streaming,监听本地特定端口,并每隔一定时间处理一次接受到的数据。在每轮处理时,将接受到的数据根据类型划分为司机与乘客,并与上一轮保存的状态信息进行合并后一起进行匹配。对于匹配成功的数据,将匹配成功信息写入记录文件中,对于匹配失败的数据,则保存到状态中等待下一轮的匹配。
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkContext$;
import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.Function3;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.State;
import org.apache.spark.streaming.StateSpec;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;
import scala.Tuple3;
import scala.collection.JavaConverters;
import java.util.*;
import static org.apache.spark.api.java.StorageLevels.MEMORY_AND_DISK_SER;
public class _receive_info {
public static void main(String args[]){
_receive_info receive = new _receive_info();
SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("online_car-hailing");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(3));
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
JavaPairDStream<String,Location> info = lines.mapToPair(line->{
String value[] = line.split("\\s+");
double longitude = Double.valueOf(value[1].split(":")[1]);
double latitude = Double.valueOf(value[2].split(":")[1]);
String typeId[] = value[0].split(":");
long arriveTime = Long.valueOf(value[3].split(":")[1]);
Location loc = new Location(arriveTime,typeId[0],longitude, latitude,Integer.valueOf(typeId[1]));
return new Tuple2<>(value[0],loc);
});
info.persist(MEMORY_AND_DISK_SER);
JavaPairDStream<String,Location>passengers = info.filter((tuple)->(tuple._1().split(":")[0].equals("passenger")));
receive.savePassengerFile(passengers);
JavaPairDStream<String,Location>drivers = info.filter((tuple)->(tuple._1().split(":")[0].equals("driver")));
receive.saveDriverFile(jssc.sparkContext(),drivers);
JavaPairDStream<Area, Tuple2<Long,Map<String,Location>>> passengers_each_area = receive.mapToArea(passengers);
JavaPairDStream<Area, Tuple2<Long,Map<String,Location>>> drivers_each_area = receive.mapToArea(drivers);
Function3<Area, Optional<Tuple2<Long,Map<String,Location>>>, State<Tuple2<Long,Map<String,Location>>>,Map<String,Location>> function = (word, op, state)->{
if(!state.exists()||!op.isPresent()){
state.update(op.get());
}
else{
Map<String,Location> loc = new HashMap<>();
loc.putAll(state.get()._2());
loc.putAll(op.get()._2());
state.update(new Tuple2<>(System.currentTimeMillis(),loc));
}
return state.get()._2();
};
JavaPairDStream<Area, Tuple2<Long,Map<String,Location>>> passenger_info = passengers_each_area.mapWithState(StateSpec.function(function)).stateSnapshots();
JavaPairDStream<Area, Tuple2<Long,Map<String,Location>>> driver_info = drivers_each_area.mapWithState(StateSpec.function(function)).stateSnapshots();
passenger_info.persist(MEMORY_AND_DISK_SER);
driver_info.persist(MEMORY_AND_DISK_SER);
Tuple3<JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>>,JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>>,JavaPairDStream<Area,Set<Matched>>> tuple3;
JavaPairDStream<Area,Set<Matched>> matched=null;
for(int i = 0;i<9;i++){
driver_info = receive.changeArea(driver_info);
tuple3 = receive.match(passenger_info, driver_info);
passenger_info = tuple3._1();
driver_info = tuple3._2();
if(matched==null){
matched = tuple3._3();
}else{
matched = matched.union(tuple3._3());
}
}
driver_info = driver_info.mapToPair(tuple->new Tuple2<>(new Area(tuple._1()._getRealArea()),tuple._2()));
matched.print();
receive.saveMatchedFile(jssc.sparkContext(),matched);
JavaPairDStream<Area, Tuple2<Long,Map<String,Location>>> driverMatched = receive.recoverArea(driver_info);
JavaPairDStream<Area, Tuple2<Long,Map<String,Location>>> passengerMatched = passenger_info;
Function3<Area, Optional<Tuple2<Long,Map<String,Location>>>, State<Tuple2<Long,Map<String,Location>>>,Map<String,Location>> update = (word, op, state)->{
if(!state.exists()){
state.update(op.get());
}
else{
Map<String,Location> locs = op.get()._2();
Map<String,Location> up = new HashMap<>();
for(String t : locs.keySet()){
if(!locs.get(t).getMatched()){
up.put(t, locs.get(t));
}
}
state.update(new Tuple2<>(System.currentTimeMillis(), up));
}
return state.get()._2();
};
passengerMatched.mapWithState(StateSpec.function(update)).stateSnapshots();
driverMatched.mapWithState(StateSpec.function(update)).stateSnapshots();
jssc.checkpoint("checkpoint");
jssc.start();
try {
jssc.awaitTermination();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> changeArea(JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> pair){
return pair.mapToPair(tuple->{
Area next = tuple._1()._getNextArea();
return new Tuple2<>(next, tuple._2());
});
}
private JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> recoverArea(JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> pair){
return pair.mapToPair(tuple->{
Area realArea = new Area(tuple._1()._getRealArea());
return new Tuple2<>(realArea, tuple._2());
});
}
private JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> mapToArea(JavaPairDStream<String,Location> pair){
JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> areas = pair.mapToPair(l->{
Map<String,Location>loc = new HashMap<>();
loc.put(l._1(), l._2());
return new Tuple2<>(new Area(l._2()._getArea()),new Tuple2<>(System.currentTimeMillis(),loc));
});
JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> area = areas.reduceByKey(((location, location2) -> {
Map<String,Location> map = new HashMap<>();
long time;
if(location._1()>location2._1()){
map.putAll(location2._2());
map.putAll(location._2());
time = location._1();
}else{
map.putAll(location._2());
map.putAll(location2._2());
time = location2._1();
}
return new Tuple2<>(time, map);
}));
return area;
}
private void savePassengerFile(JavaPairDStream<String,Location> dstream){
dstream.map(ds->ds._2()).foreachRDD(rdd->{
if(!rdd.isEmpty()){
SparkSession session = SparkSession.builder().config(rdd.context().getConf()).getOrCreate();
Dataset<Row> df = session.createDataFrame(rdd.rdd(), Location.class);
//df.show();
df.write().mode(SaveMode.Append).save("passenger-info");
}
});
}
private void saveDriverFile(JavaSparkContext sc, JavaPairDStream<String,Location> dstream){
dstream.map(ds->ds._2()).foreachRDD(rdd->{
if(!rdd.isEmpty()){
SparkSession session = SparkSession.builder().config(rdd.context().getConf()).getOrCreate();
Dataset<Row> df = session.createDataFrame(rdd.rdd(), Location.class);
//df.show();
df.write().mode(SaveMode.Append).save("driver-info");
}
});
}
private void saveMatchedFile(JavaSparkContext sc, JavaPairDStream<Area,Set<Matched>> matched){
matched.map(tuple->Arrays.asList(tuple._2().toArray())).foreachRDD(rdd->{
if(rdd!=null&&rdd.count()>0) {
List<Object> l = rdd.reduce((list1, list2) -> {
List<Object> list = new ArrayList<>();
if (!(list2 == null)&&!list1.isEmpty() )
list.addAll(list1);
if (!(list2==null)&&!list2.isEmpty() )
list.addAll(list2);
return list;
});
RDD<Object> r = rdd.context().parallelize(JavaConverters.asScalaIteratorConverter(l.iterator()).asScala().toSeq(), 4, JavaSparkContext$.MODULE$.fakeClassTag());
if(!l.isEmpty())
r.saveAsTextFile("match");
}
});
}
private Tuple3<JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>>,JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>>,JavaPairDStream<Area,Set<Matched>>> match(JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>>passenger, JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>>driver){
JavaPairDStream<Area, Tuple2<Tuple2<Long,Map<String,Location>>,Tuple2<Long,Map<String,Location>>>> areaInfo = passenger.join(driver);
JavaPairDStream<Area, Tuple3<Tuple2<Long,Map<String,Location>>,Tuple2<Long,Map<String,Location>>,Set<Matched>>> pair = areaInfo.mapToPair(tuple->{
Set<Matched> matcheds = new HashSet<>();
Map<String,Location> passengers = tuple._2()._1()._2();
Map<String,Location> drivers = tuple._2()._2()._2();
Map<String,Location> m1 = new HashMap<>();
Map<String,Location> m2 = new HashMap<>();
ArrayList<Tuple2<String,Location>> list1 = new ArrayList<>();
ArrayList<Tuple2<String,Location>> list2 = new ArrayList<>();
for(String key: passengers.keySet()){
list1.add(new Tuple2<>(key,passengers.get(key)));
}
for(String key: drivers.keySet()){
list2.add(new Tuple2<>(key,drivers.get(key)));
}
Comparator comparator = new sortByTime();
list1.sort(comparator);
Long driverTime = tuple._2()._2()._1();
for(Tuple2<String,Location> p : list1){
Tuple2<Location,Location> match = null;
double distance = Integer.MAX_VALUE;
String driverInfo = "";
Location passenger_location = p._2();
if(passenger_location.getMatched()){
continue;
}
for(Tuple2<String,Location> d : list2){
Location driver_location = d._2();
if(driver_location.getMatched()){
continue;
}
double dis = passenger_location._far_away_from(driver_location);
if(dis<distance){
driverInfo = d._1();
match = new Tuple2<>(p._2(),d._2());
distance = dis;
}
}
Location matched_passenger = p._2().birth();
if(match!=null){//&&distance<10){
Location matched_driver = match._2().birth();
match._2().setMatched(true);
matched_passenger.setMatched(true);
matched_driver.setMatched(true);
m1.put(p._1(),matched_passenger);
m2.put(driverInfo, matched_driver);
matcheds.add(new Matched(matched_passenger.birth(), matched_driver.birth()));
}
else{
matched_passenger.setMatched(false);
m1.put(p._1(), matched_passenger);
}
}
if(m2.size()<drivers.size()){
for(String d : drivers.keySet()){
if(!m2.containsKey(d)){
Location loc = drivers.get(d).birth();
loc.setMatched(false);
m2.put(d, loc);
}
}
}
return new Tuple2<>(tuple._1(),new Tuple3<>(new Tuple2<>(System.currentTimeMillis(),m1),new Tuple2<>(System.currentTimeMillis(),m2),matcheds));
});
JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> passenger_info = pair.mapToPair(tuple->new Tuple2<>(tuple._1(),tuple._2()._1()));
JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> driver_info = pair.mapToPair(tuple->new Tuple2<>(tuple._1(),tuple._2()._2()));
JavaPairDStream<Area,Set<Matched>> matched = pair.mapToPair(tuple->new Tuple2<>(tuple._1(),tuple._2()._3()));
JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> passengers = passenger.union(passenger_info);
JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> drivers = driver.union(driver_info);
Function2<Tuple2<Long,Map<String,Location>>,Tuple2<Long,Map<String,Location>>,Tuple2<Long,Map<String,Location>>> reduce = ((tuple1,tuple2)->{
Map<String,Location> map1 = tuple1._2();
Map<String,Location> map2 = tuple2._2();
Long time;
Map<String,Location> map;
if(tuple1._1()>tuple2._1()){
time = tuple1._1();
map = map2;
map.putAll(map1);
}
else{
time = tuple2._1();
map = map1;
map.putAll(map2);
}
return new Tuple2<>(time, map);
});
JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> finalPassenger = passengers.reduceByKey(reduce);
JavaPairDStream<Area,Tuple2<Long,Map<String,Location>>> finalDriver = drivers.reduceByKey(reduce);
return new Tuple3<>(finalPassenger,finalDriver,matched);
}
}
class sortByTime implements Comparator{
@Override
public int compare(Object o, Object t1) {
long value1,value2;
if(o instanceof Location && t1 instanceof Location) {
value1 = (((Location) o).getArriveTime());
value2 = ((Location) o).getArriveTime();
return Long.compare(value1, value2);
}
return 0;
}
}























