The Problem from the Previous Post

The second version took about 33 seconds.

With a small change on top of that version, performance improves by about a third.


select query_time,d,max(ts) ts from (
    select t2.query_time,ts,rn,round(rn/total,10) percent,
        case
            when 0.71>=round(rn/total,10) then 0.71
            when 0.81>=round(rn/total,10) then 0.81
            when 0.91>=round(rn/total,10) then 0.91
        end d
    from (
        select query_time,ts,
            case when @gid=query_time then @rn:=@rn+1 when @gid:=query_time then @rn:=1 end rn
        from (
            select * from t ,(select @gid:='',@rn:=0) vars order by query_time,ts
        ) t1
    ) t2 inner join (
        select query_time,count(*) total from t group by query_time
    ) t3 on(t2.query_time=t3.query_time)
    where round(rn/total,10)>=0.71
) t6
where d is not null
group by query_time,d



 

where round(rn/total,10)>=0.71


That is, filter by the smallest of the defined percentiles first, and only then do the group by.



With this filter, the query time drops to as low as 20.531 s.

Of course, this SQL still has room for further improvement.

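The position of a given percentile can be computed with the following formula:
loc = 1 + (n-1)*p, where n is the number of elements and p is the percentile expressed as a fraction. loc always lies between 1 and n.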
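To make the formula concrete, here is a minimal Python sketch (not part of the original SQL) that computes the position and rounds it down with floor. The function name `percentile_position` is hypothetical. The percentile is passed as a string and converted to `Decimal`, on the assumption that this mirrors how MySQL treats a literal such as 0.71 as an exact decimal value rather than a binary float.

```python
from decimal import Decimal
import math


def percentile_position(n, p):
    """Return the 1-based position floor(1 + (n - 1) * p) among n sorted elements.

    n is the number of elements; p is the percentile as a decimal string, e.g. "0.71".
    """
    p = Decimal(p)  # exact decimal arithmetic, avoids binary float rounding
    if n < 1:
        raise ValueError("n must be at least 1")
    if not (Decimal(0) <= p <= Decimal(1)):
        raise ValueError("p must be between 0 and 1")
    return math.floor(1 + (n - 1) * p)


if __name__ == "__main__":
    # With 10 elements: 1 + 9 * 0.71 = 7.39, which floors to position 7.
    for p in ("0.71", "0.81", "0.91"):
        print(p, percentile_position(10, p))
```

For n = 10 the three percentiles land on positions 7, 8 and 9. For p = 0 the position is 1 and for p = 1 it is n, which matches the statement that loc stays between 1 and n.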

With that, the SQL can be optimized as follows:


select t5.query_time,t5.ts,t2.v from (
     select query_time,total,v, floor(1+(total-1)*v) rn
     from (
          select query_time,count(*) total from t group by query_time
     ) t3, (select 0.71 v,1 seq union all select 0.81,2 union all select 0.91,3) t4
) t2 inner join (
     select 
     query_time,
     case when @gid=query_time then @rn:=@rn+1 when @gid:=query_time then @rn:=1 end rn,
     ts
     from (
         select * from t ,(select @gid:='',@rn:=0) vars order by query_time,ts
     ) t1
) t5 on (t2.query_time=t5.query_time and t2.rn=t5.rn )


Besides making the SQL itself simpler, this also brings the query time down to about 15 seconds.
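The idea behind the optimized query can be cross-checked outside the database. The following Python sketch is an illustration under assumptions, not the author's code: it takes rows as `(query_time, ts)` tuples, counts the rows per `query_time`, computes the target position for each percentile with floor(1 + (total - 1) * v), and picks the `ts` at that position after sorting. The function name `percentile_rows` and the in-memory row format are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal
import math

# Same three percentiles as the derived table t4 in the SQL.
PERCENTILES = ("0.71", "0.81", "0.91")


def percentile_rows(rows, percentiles=PERCENTILES):
    """Return (query_time, ts, v) for each group and each percentile v.

    rows is an iterable of (query_time, ts) tuples in any order.
    """
    groups = defaultdict(list)
    for query_time, ts in rows:
        groups[query_time].append(ts)

    result = []
    for query_time in sorted(groups):
        ts_sorted = sorted(groups[query_time])  # plays the role of "order by query_time, ts"
        total = len(ts_sorted)                  # plays the role of count(*) per group
        for v in percentiles:
            rn = math.floor(1 + (total - 1) * Decimal(v))  # 1-based row number
            result.append((query_time, ts_sorted[rn - 1], v))
    return result


if __name__ == "__main__":
    sample = [("a", ts) for ts in (3, 9, 1, 7, 5, 10, 2, 8, 6, 4)] + [("b", 5)]
    for row in percentile_rows(sample):
        print(row)
```

Because each group only needs one lookup per percentile, no case expression or final group by is required, which is the same simplification the join on `rn` achieves in the SQL.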