现象:

      无线词条页访问失败

 

问题分析:

1、首先,查看日志

日志中非常显眼的是,从Redis读取词条的pv耗时很长,高达20多秒, 看样子是访问Redis速度很慢导致

 

2、查看Redis的慢日志

> SLOWLOG GET

可以得到如下结果:

25) 1) (integer) 160            #日志的唯一标识符(uid)
     2) (integer) 1543060013    #命令执行的时间点,UNIX时间戳
     3) (integer) 22679074      #命令执行的时长,单位是微妙
     4) 1) "ZREVRANGE"          #命令及其参数信息
        2) "baike_lemma_pv"
        3) "0"
        4) "20000000"
        5) "withscores"
26) 1) (integer) 159
     2) (integer) 1542973618
     3) (integer) 22829521
     4) 1) "ZREVRANGE"
        2) "baike_lemma_pv"
        3) "0"
        4) "20000000"
        5) "withscores"
27) 1) (integer) 158
     2) (integer) 1542888649
     3) (integer) 22396426
     4) 1) "ZREVRANGE"
        2) "baike_lemma_pv"
        3) "0"
        4) "20000000"
        5) "withscores"

可以看到有些请求耗时特别长,高达20s,这个时间似乎和日志中的时间比较吻合

 

3、再看下当前连接上Redis的连接数

> info

得到如下结果:

# Clients
connected_clients:626

目前有626个客户端和Redis服务端建立了连接,并排队等待被Redis的单线程处理, 这个数量说多也不多,说少也不少

 

4、解决问题

根据第二步中的慢日志, 可以定位到相关的命令详情如下:

./redis-cli -h 10.134.30.00 -p 12002 -a wen@sog.com ZREVRANGE baike_lemma_pv 0 20000000 withscores | awk '{if(NR % 2 == 1) a=$0; else {a=a"\t"$0; print a; a=""}}' >> ./topPvLemmaId.lst

这个命令是要一次性从ZSET结构中读取2000w的数据,这不耗时才怪呢?这一个命令耗时20s,我们知道Redis是单线程对外服务的,那么服务线程处理这个命令的时候,其他的命令都需要等待

对这个命令做了些改造,其实相关业务只需要score排序头部的十几万数据, 所以改造如下:

#!/bin/bash

save_file=$1
for (( start=1; start<=200000; start+=10000 ))
do
    end=$[$start+10000-1]
    ./redis-cli -h 10.134.30.00 -p 12002 -a wen@sog.com ZREVRANGE baike_lemma_pv $start $end withscores | awk  '{if(NR % 2 == 1) a=$0; else {a=a"\t"$0; if($0 > 70000) print a; a=""}}' >> $save_file
done

当然如果仍然要读取全部的ZSET中的数据, 可以用ZSCAN命令来不停的扫描, 每次扫描读取一部分

 

其实redis的info信息是非常详细的:

# Server
redis_version:2.8.19
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:bcc7511e1a094fe9
redis_mode:standalone
os:Linux 2.6.32.43-tlinux-1.0.22-default x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.6
process_id:17351
run_id:61f2d7ca5a7507914252a0cc53ef7cb96c1f98fb
tcp_port:10002
uptime_in_seconds:66262857
uptime_in_days:766
hz:10
lru_clock:926904
config_file:/data/redis-server/baike_database_master_10002/redis_10002.conf

# Clients
connected_clients:177
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:3870592296
used_memory_human:3.60G
used_memory_rss:4278693888
used_memory_peak:4569335968
used_memory_peak_human:4.26G
used_memory_lua:35840
mem_fragmentation_ratio:1.11
mem_allocator:jemalloc-3.6.0

# Persistence
loading:0
rdb_changes_since_last_save:20833780660
rdb_bgsave_in_progress:0
rdb_last_save_time:1495853683
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:17
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:35
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:1445634916
aof_base_size:1165444727
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

# Stats
total_connections_received:90906524
total_commands_processed:83533940789
instantaneous_ops_per_sec:1601
total_net_input_bytes:3031104314026
total_net_output_bytes:1957795787473
instantaneous_input_kbps:60.44
instantaneous_output_kbps:480.64
rejected_connections:0
sync_full:2
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:16227030786
keyspace_misses:995680173
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:79994

# Replication
role:master
connected_slaves:1
slave0:ip=10.241.00.00,port=1022,state=online,offset=1272093723257,lag=1
master_repl_offset:1272093741351
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1272092692776
repl_backlog_histlen:1048576

# CPU
used_cpu_sys:525197.25
used_cpu_user:548578.62
used_cpu_sys_children:2437.03
used_cpu_user_children:33747.64

# Keyspace
db0:keys=19935578,expires=0,avg_ttl=0

具体的每个字段的含义可参考:

http://www.huangdc.com/337