Contents

Overview

Summary of approaches

Method 1

Method 2

Method 3

Method 4

Method 5

Method 6


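Overview

Requirement: sort the records in Array("laoduan 30 99", "laozhao 29 9999", "laozhang 28 98", "laoyang 28 99"), where each record holds a name, an age, and an attractiveness score (fv).

Sort rule: order by score descending; when two scores are equal, order by age ascending. The sections below walk through several ways to implement this rule.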
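To make the rule concrete before bringing Spark in, here is a minimal sketch on plain Scala collections. It assumes each record is a space-separated "name age fv" string; `SortRuleDemo` and `sortByRule` are names invented for this illustration.

```scala
object SortRuleDemo {
  // Hypothetical helper: parse "name age fv" records and return the names
  // ordered by fv descending, then age ascending.
  def sortByRule(records: Seq[String]): Seq[String] = {
    val parsed = records.map { r =>
      val fields = r.split(" ")
      (fields(0), fields(1).toInt, fields(2).toInt)
    }
    parsed.sortWith { (a, b) =>
      if (a._3 != b._3) a._3 > b._3 // higher score first
      else a._2 < b._2              // same score: younger first
    }.map(_._1)
  }

  def main(args: Array[String]): Unit = {
    val users = Seq("laoduan 30 99", "laozhao 29 9999", "laozhang 28 98", "laoyang 28 99")
    println(sortByRule(users).mkString(", "))
    // laozhao, laoyang, laoduan, laozhang
  }
}
```

laoduan and laoyang share the score 99, so the age decides and laoyang (28) comes before laoduan (30).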

Summary of approaches

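Method 1

Define a User class that extends Ordered[User], so that a User is compared with another User. The class also mixes in Serializable, because its instances are sent over the network during the shuffle. All attributes are passed to the constructor, and compare is overridden to implement the sort rule.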

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CustomSort1 {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("CustomSort1").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Fields: name, age, attractiveness score (fv).
    // Sort rule: fv descending; on equal fv, age ascending.
    val users = Array("laoduan 30 99", "laozhao 29 9999", "laozhang 28 98", "laoyang 28 99")
    // Parallelize the driver-side data into an RDD
    val lines: RDD[String] = sc.parallelize(users)
    // Split each line and build a User from its fields
    val userRDD: RDD[User] = lines.map(line => {
      val fields = line.split(" ")
      val name = fields(0)
      val age = fields(1).toInt
      val fv = fields(2).toInt
      new User(name, age, fv)
    })
    // Sort the Users by themselves; the ordering comes from User.compare
    val sorted: RDD[User] = userRDD.sortBy(u => u)
    val r = sorted.collect()
    println(r.toBuffer)
    sc.stop()
  }
}


class User(val name: String, val age: Int, val fv: Int) extends Ordered[User] with Serializable {

  // fv descending; on equal fv, age ascending
  override def compare(that: User): Int = {
    if(this.fv == that.fv) {
      this.age - that.age
    } else {
      -(this.fv - that.fv)
    }
  }

  override def toString: String = s"name: $name, age: $age, fv: $fv"
}
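The compare method above orders Ints by subtracting them. That is fine for small values such as ages and scores, but subtraction can overflow for values near the limits of Int. The sketch below contrasts it with `Integer.compare` from the Java standard library; `CompareOverflowDemo` and its two helpers are names invented for this illustration.

```scala
object CompareOverflowDemo {
  // Comparison written as a subtraction, as in the compare method above.
  def bySubtraction(a: Int, b: Int): Int = a - b

  // Overflow-safe alternative from the Java standard library.
  def byIntegerCompare(a: Int, b: Int): Int = Integer.compare(a, b)

  def main(args: Array[String]): Unit = {
    // Small values: both agree that 28 sorts before 30.
    println(bySubtraction(28, 30) < 0)    // true
    println(byIntegerCompare(28, 30) < 0) // true
    // Extreme values: Int.MinValue - 1 wraps around to Int.MaxValue.
    println(bySubtraction(Int.MinValue, 1) < 0)    // false (wrong sign)
    println(byIntegerCompare(Int.MinValue, 1) < 0) // true
  }
}
```

If the sorted fields could ever hold very large or negative values, `Integer.compare(that.fv, this.fv)` would be a drop-in way to express the descending comparison.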

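Method 2

Pass only the attributes that take part in sorting to the helper class. The name, for example, is not used for sorting, so it is not passed in.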

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CustomSort2 {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("CustomSort2").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val users = Array("laoduan 30 99", "laozhao 29 9999", "laozhang 28 98", "laoyang 28 99")
    val lines: RDD[String] = sc.parallelize(users)
    val tpRDD: RDD[(String, Int, Int)] = lines.map(line => {
      val fields = line.split(" ")
      val name = fields(0)
      val age = fields(1).toInt
      val fv = fields(2).toInt
      (name, age, fv)
    })
    // The key only supplies the sort rule: the tuples keep their format, only their order changes
    val sorted: RDD[(String, Int, Int)] = tpRDD.sortBy(tp => new user(tp._2, tp._3))
    println(sorted.collect().toBuffer)
    sc.stop()
  }
}


// Sort key holding only the fields used for comparison
class user(val age: Int, val fv: Int) extends Ordered[user] with Serializable {

  // fv descending; on equal fv, age ascending
  override def compare(that: user): Int = {
    if(this.fv == that.fv) {
      this.age - that.age
    } else {
      -(this.fv - that.fv)
    }
  }
}
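The same idea can be tried without a cluster, because `sortBy` on a Scala List also takes a key function and an implicit Ordering for the key. The following is only a sketch on local collections; `AgeFvKey` and `KeySortDemo` are hypothetical names, not part of the original code.

```scala
// Hypothetical key class holding only the fields that take part in sorting.
class AgeFvKey(val age: Int, val fv: Int) extends Ordered[AgeFvKey] {
  // fv descending; on equal fv, age ascending
  override def compare(that: AgeFvKey): Int =
    if (this.fv == that.fv) this.age - that.age else that.fv - this.fv
}

object KeySortDemo {
  // The key decides the order; the tuples themselves are returned unchanged.
  def sortUsers(users: List[(String, Int, Int)]): List[(String, Int, Int)] =
    users.sortBy(t => new AgeFvKey(t._2, t._3))

  def main(args: Array[String]): Unit = {
    val users = List(("laoduan", 30, 99), ("laozhao", 29, 9999), ("laozhang", 28, 98), ("laoyang", 28, 99))
    println(sortUsers(users))
    // List((laozhao,29,9999), (laoyang,28,99), (laoduan,30,99), (laozhang,28,98))
  }
}
```

The names are still present in the result even though the key never saw them.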

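Method 3

Declare the key as a case class. Instances can then be created without new by passing the arguments directly, and because case classes are Serializable by default, the explicit with Serializable is no longer needed.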

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CustomSort3 {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("CustomSort3").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val users = Array("laoduan 30 99", "laozhao 29 9999", "laozhang 28 98", "laoyang 28 99")
    val lines: RDD[String] = sc.parallelize(users)
    val tpRDD: RDD[(String, Int, Int)] = lines.map(line => {
      val fields = line.split(" ")
      val name = fields(0)
      val age = fields(1).toInt
      val fv = fields(2).toInt
      (name, age, fv)
    })
    // User(...) calls the case class's apply method, so no `new` is needed
    val sorted: RDD[(String, Int, Int)] = tpRDD.sortBy(tp => User(tp._2, tp._3))
    println(sorted.collect().toBuffer)
    sc.stop()
  }
}


case class User(age: Int, fv: Int) extends Ordered[User] {
  // fv descending; on equal fv, age ascending
  override def compare(that: User): Int = {
    if(this.fv == that.fv) {
      this.age - that.age
    } else {
      -(this.fv - that.fv)
    }
  }
}
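What the case modifier adds can be checked in a few lines. This is a small sketch with a hypothetical `SortKey` case class, showing construction without `new`, structural equality, and the Serializable marker that a Spark sort key needs.

```scala
// Hypothetical case class used only for this illustration.
case class SortKey(age: Int, fv: Int)

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    val k = SortKey(28, 99)                       // companion apply, no `new`
    println(k)                                    // SortKey(28,99)
    println(k == SortKey(28, 99))                 // true: structural equality
    println(k.isInstanceOf[java.io.Serializable]) // true: case classes are Serializable
  }
}
```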

 

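Method 4

Use an implicit Ordering. The case class no longer contains any comparison logic; the rule lives in a separate implicit object that is imported where sortBy is called.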

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CustomSort4 {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("CustomSort4").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val users = Array("laoduan 30 99", "laozhao 29 9999", "laozhang 28 98", "laoyang 28 99")
    val lines: RDD[String] = sc.parallelize(users)
    val tpRDD: RDD[(String, Int, Int)] = lines.map(line => {
      val fields = line.split(" ")
      val name = fields(0)
      val age = fields(1).toInt
      val fv = fields(2).toInt
      (name, age, fv)
    })
    // Bring the implicit Ordering[User] into scope so that sortBy can find it
    import SortRules.OrderingUser
    val sorted: RDD[(String, Int, Int)] = tpRDD.sortBy(tp => User(tp._2, tp._3))
    println(sorted.collect().toBuffer)
    sc.stop()
  }
}

// Plain data holder: the sort rule is defined outside the class
case class User(age: Int, fv: Int)

object SortRules {

  // fv descending; on equal fv, age ascending
  implicit object OrderingUser extends Ordering[User] {
    override def compare(x: User, y: User): Int = {
      if(x.fv == y.fv) {
        x.age - y.age
      } else {
        y.fv - x.fv
      }
    }
  }
}
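One consequence of keeping the rule outside the class is that several rules can coexist for the same type. The sketch below uses local collections and passes the Ordering explicitly instead of importing an implicit; `Person`, `PersonRules`, and `byAge` are hypothetical names for this illustration.

```scala
case class Person(age: Int, fv: Int)

object PersonRules {
  // fv descending, then age ascending (the rule used in this article).
  val byFvThenAge: Ordering[Person] = new Ordering[Person] {
    override def compare(x: Person, y: Person): Int =
      if (x.fv == y.fv) x.age - y.age else y.fv - x.fv
  }

  // A second, hypothetical rule: age ascending only.
  val byAge: Ordering[Person] = Ordering.by[Person, Int](_.age)
}

object OrderingChoiceDemo {
  val people: List[Person] =
    List(Person(30, 99), Person(29, 9999), Person(28, 98), Person(28, 99))

  def main(args: Array[String]): Unit = {
    // The implicit parameter of sorted can also be supplied by hand.
    println(people.sorted(PersonRules.byFvThenAge))
    // List(Person(29,9999), Person(28,99), Person(30,99), Person(28,98))
    println(people.sorted(PersonRules.byAge).map(_.age))
    // List(28, 28, 29, 30)
  }
}
```

With an Ordered class, by contrast, the type has exactly one built-in comparison.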

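Method 5

Rely on the built-in comparison rule for tuples: the first elements are compared first, and the second elements are compared only if the first are equal.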

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CustomSort5 {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("CustomSort5").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val users = Array("laoduan 30 99", "laozhao 29 9999", "laozhang 28 98", "laoyang 28 99")
    val lines: RDD[String] = sc.parallelize(users)
    val tpRDD: RDD[(String, Int, Int)] = lines.map(line => {
      val fields = line.split(" ")
      val name = fields(0)
      val age = fields(1).toInt
      val fv = fields(2).toInt
      (name, age, fv)
    })
    // Use the tuple ordering as the sort key; negating fv turns the
    // ascending sort into a descending one on the score
    val sorted: RDD[(String, Int, Int)] = tpRDD.sortBy(tp => (-tp._3, tp._2))
    println(sorted.collect().toBuffer)
    sc.stop()
  }
}
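Negating a field only works for numeric keys. As a possible alternative, the tuple Ordering can be assembled from per-component orderings, reversing just the first one. The sketch below runs on local collections; `TupleOrderingDemo` and its members are names invented for this illustration.

```scala
object TupleOrderingDemo {
  val users: List[(String, Int, Int)] =
    List(("laoduan", 30, 99), ("laozhao", 29, 9999), ("laozhang", 28, 98), ("laoyang", 28, 99))

  // Variant 1: negate the score so that ascending tuple order means descending score.
  def byNegation: List[String] =
    users.sortBy(t => (-t._3, t._2)).map(_._1)

  // Variant 2: keep the key as (fv, age) and reverse only the first component.
  val fvDescAgeAsc: Ordering[(Int, Int)] =
    Ordering.Tuple2(Ordering[Int].reverse, Ordering[Int])

  def byReversedComponent: List[String] =
    users.sortBy(t => (t._3, t._2))(fvDescAgeAsc).map(_._1)

  def main(args: Array[String]): Unit = {
    // Tuples compare element by element, left to right.
    println(Ordering[(Int, Int)].lt((1, 9), (2, 0))) // true: first elements differ
    println(Ordering[(Int, Int)].lt((1, 2), (1, 3))) // true: tie broken by second element
    println(byNegation)          // List(laozhao, laoyang, laoduan, laozhang)
    println(byReversedComponent) // List(laozhao, laoyang, laoduan, laozhang)
  }
}
```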

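Method 6

If the key function passed to tpRDD.sortBy should stay as the identity tp => tp, define the sort rule as an implicit Ordering on the whole tuple instead.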

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CustomSort6 {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("CustomSort6").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val users = Array("laoduan 30 99", "laozhao 29 9999", "laozhang 28 98", "laoyang 28 99")
    val lines: RDD[String] = sc.parallelize(users)
    val tpRDD: RDD[(String, Int, Int)] = lines.map(line => {
      val fields = line.split(" ")
      val name = fields(0)
      val age = fields(1).toInt
      val fv = fields(2).toInt
      (name, age, fv)
    })

    // Ordering[(Int, Int)]: the type that is actually compared
    // on[(String, Int, Int)]: the type of the data before conversion
    // (t => (-t._3, t._2)): how each element is converted into the compared type
    implicit val rules: Ordering[(String, Int, Int)] = Ordering[(Int, Int)].on[(String, Int, Int)](t => (-t._3, t._2))
    val sorted: RDD[(String, Int, Int)] = tpRDD.sortBy(tp => tp)
    println(sorted.collect().toBuffer)
    sc.stop()
  }
}
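The `on` call reads from the inside out, which can be confusing at first. The sketch below builds the same rule on local collections, once with `on` and once with `Ordering.by`, which states the element type first. `OrderingOnDemo`, `UserTuple`, `viaOn`, and `viaBy` are names invented for this illustration.

```scala
object OrderingOnDemo {
  type UserTuple = (String, Int, Int)

  val users: List[UserTuple] =
    List(("laoduan", 30, 99), ("laozhao", 29, 9999), ("laozhang", 28, 98), ("laoyang", 28, 99))

  // Start from the Ordering of the derived key (Int, Int) and adapt it to UserTuple.
  val viaOn: Ordering[UserTuple] =
    Ordering[(Int, Int)].on[UserTuple](t => (-t._3, t._2))

  // Ordering.by expresses the same rule, naming the element type first.
  val viaBy: Ordering[UserTuple] =
    Ordering.by[UserTuple, (Int, Int)](t => (-t._3, t._2))

  def main(args: Array[String]): Unit = {
    println(users.sorted(viaOn))
    // List((laozhao,29,9999), (laoyang,28,99), (laoduan,30,99), (laozhang,28,98))
    println(users.sorted(viaOn) == users.sorted(viaBy)) // true
  }
}
```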