比较实用的阿里云SLS查询语句,通过该语句能掌握应用的运行情况、例如:状态码走势和分布、响应时间走势和分布、搜索引擎的分布情况。内外网请求情况等,对于故障分析、应用健康都能比较好的了解。

  1. 状态码分布走势:
(*)| SELECT sum(IF (status < 600 AND status > 499, 1, 0)) AS code_50x, sum(IF(status < 500 AND status > 399, 1, 0))  AS code_40x, sum(IF(status < 400 AND status > 299, 1, 0))  AS code_30x, sum(IF(status < 300, 1, 0))  AS code_20x, date_format( from_unixtime(__time__ -__time__%60),'%m-%d %H:%i' )  as time group by __time__ - __time__% 60 order by time limit 10000 

参考图:codeline.png

2.搜索引擎分布走势图:

(*)| SELECT sum(IF (http_user_agent LIKE '%pider%' OR http_user_agent LIKE '%bot%', 1, 0))  AS allSpider, sum(IF (http_user_agent LIKE '%Baiduspider%', 1, 0))  AS Baiduspider, sum(IF (http_user_agent LIKE '%Bytespider%', 1, 0))  AS Bytespider, sum(IF (http_user_agent LIKE '%bingbot%', 1, 0))  AS bingbot, sum(IF (http_user_agent LIKE '%Googlebot%', 1, 0))  AS Googlebot,  sum(IF (http_user_agent LIKE '%YisouSpider%', 1, 0))  AS YisouSpider, sum(IF (http_user_agent LIKE '%Applebot%', 1, 0))  AS Applebot,date_format( from_unixtime(__time__ -__time__%60),'%m-%d %H:%i' )  as time group by __time__ - __time__% 60 order by time limit 10000 

参考图:spiderline.png

3.请求时间分布走势:

(*)| SELECT sum(IF (request_time < 1, 1, 0)) AS less1, sum(IF(request_time < 3 AND request_time > 1, 1, 0))  AS Range1To3, sum(IF(request_time < 5 AND request_time > 3, 1, 0))  AS Range3To5, sum(IF(request_time < 10 AND request_time > 5, 1, 0))  AS Range5To10, sum(IF(request_time < 15 AND request_time > 10, 1, 0))  AS Range10To15, sum(IF(request_time > 15, 1, 0))  AS gt15, date_format( from_unixtime(__time__ -__time__%60),'%m-%d %H:%i' )  as time group by __time__ - __time__% 60 order by time limit 10000 

参考图:timedistline.png

4.请求时间分布:

(*)| SELECT  CASE    WHEN request_time < 1 then '<1s'    WHEN request_time < 3 then '<3s'    WHEN request_time < 5 then '<5s'    WHEN request_time < 10 then '<10s'    WHEN request_time < 20 then '<20s'    ELSE 'large'  END AS request_time,  count(*) AS pvGROUP BY  request_time 

参考图:timedist.png

5.内外网请求走势:

(*)| SELECT sum(IF ( ip_to_domain(remote_addr)='intranet', 1, 0))  AS lan,sum(IF( ip_to_domain(remote_addr)='internet', 1, 0))  AS wan,date_format( from_unixtime(__time__ -__time__%60),'%m-%d %H:%i' )  as time group by __time__ - __time__% 60 order by time limit 10000 

参考图:neiwaiwang.png

6.内网请求IPtop分布:

(*)| select count(1) as pv, remote_addr as client_ip where ip_to_domain(remote_addr)='intranet' group by client_ip order by pv desc limit 20 

参考图:neiwangIptop.png

7.公网IPtop分布:

(*)| select count(1) as pv, split_part(http_x_forwarded_for,',',1) as realip  where ip_to_domain(split_part(http_x_forwarded_for,',',1))='internet' AND request_method != 'POST' group by realip order by pv desc limit 20 

参考图:gwipTop.png

8.异常状态码的后端服务器:

(*)| select count(1) as pv,status, upstream_addr as backend_ip where status > 399 group by backend_ip,status order by pv desc limit 20 

参考图:exception_ups.png