mysql 中求top n

TopN:前几条数据
1.TopN age最大的前三个
select * from students order by age desc limit 0,3;
2.分组Top1 按sex分组后,求分组中年龄最大的一个
1.select * from students where age in (select max(age) m from students group by sex);
2.select * from students as stu1 where age=(select max(age) from students as stu2 where stu1.sex=stu2.sex);
2.分组TopN 按sex分组后,求分组中年龄最大的三个
select * from students as stu1 where 3>(select count(*) students as stu2 where stu1.sex=stu2.sex and stu1.age<stu2.age);
hive中求top n
分析需求得出步骤,先根据性别分组,然后根据年龄做降序,取前2条。
select id,age,name,gender,n
from
(
select id,age,name,gender,
ROW_NUMBER() over(PARTITION BY gender order by age desc) as n
from hive_topn
) temp where temp.n<=2;
spark中求topn 同hive
elect id,age,name,gender,n
from
(
select id,age,name,gender,
ROW_NUMBER() over(PARTITION BY gender order by age desc) as n
from hive_topn
) temp where temp.n<=2;
如何使用spark实现topn的获取(描述思路)
方法一:按照key对数据进行聚合(groupbykey)
2、将value转换成数组,利用scala的sortby或者sortwith进行排序
from、where、 group by 、having、order by、join、select 、limit