Original post: http://blog.51cto.com/john88wang/1745824
一 应用场景描述
线上业务使用RabbitMQ作为消息队列中间件,那么作为运维人员对RabbitMQ的监控就很重要,本文就针对如何从头到尾使用Zabbix来监控RabbitMQ进行说明。
二 RabbitMQ监控要点
RabbitMQ官方提供两种方法来管理和监控RabbitMQ。
1.使用rabbitmqctl管理和监控
Usage:
rabbitmqctl [-n <node>] [-q] <command> [<command options>]
List virtual hosts
# rabbitmqctl list_vhosts
List queues
# rabbitmqctl list_queues
List exchanges
# rabbitmqctl list_exchanges
List users
# rabbitmqctl list_users
List connections
# rabbitmqctl list_connections
List consumers
# rabbitmqctl list_consumers
Show the runtime environment settings
# rabbitmqctl environment
Show the number of unacknowledged messages in each queue
# rabbitmqctl list_queues name messages_unacknowledged
Show the memory usage of each queue
# rabbitmqctl list_queues name memory
Show the number of ready messages in each queue
# rabbitmqctl list_queues name messages_ready
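Any of these rabbitmqctl queries can also be consumed by a script. The following is a minimal sketch, not part of the original post (the parsing logic and output format are assumptions for illustration), showing how the per-queue unacknowledged message counts could be read in Python and later fed to Zabbix:

#!/usr/bin/env python
# Sketch: parse the output of "rabbitmqctl list_queues name messages_unacknowledged".
import subprocess

def unacked_per_queue():
    # -q suppresses the informational banner so only "name<TAB>count" lines remain
    output = subprocess.check_output(
        ['rabbitmqctl', '-q', 'list_queues', 'name', 'messages_unacknowledged'])
    result = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, count = parts
            result[name] = int(count)
    return result

if __name__ == '__main__':
    for name, count in unacked_per_queue().items():
        print '%s %s' % (name, count)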
2. Monitoring and managing with the RabbitMQ Management plugin
Enable the Management plugin
# rabbitmq-plugins enable rabbitmq_management
Once the plugin is enabled, the status of RabbitMQ can be viewed through the management web UI at http://172.28.2.157:15672/.
The rabbitmqadmin management tool can be downloaded from the same interface:
http://172.28.2.157:15672/cli/rabbitmqadmin
Get the list of vhosts
# curl -i -u guest:guest http://localhost:15672/api/vhosts
Get the list of channels, sorted by publish rate and limited to selected columns
# curl -i -u guest:guest "http://localhost:15672/api/channels?sort=message_stats.publish_details.rate&sort_reverse=true&columns=name,message_stats.publish_details.rate,message_stats.deliver_get_details.rate"
Show overview information
# curl -i -u guest:guest "http://localhost:15672/api/overview"
management_version: version of the management plugin
cluster_name: name of the whole RabbitMQ cluster, set with rabbitmqctl set_cluster_name
publish: total number of messages published
queue_totals: totals of messages ready, unacknowledged, uncommitted, and so on
statistics_db_event_queue: number of events not yet processed by the statistics database
consumers: number of consumers
queues: number of queues
exchanges: number of exchanges
connections: number of connections
channels: number of channels
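The same overview data is easy to pull programmatically, which is what the monitoring script in the last section does. As a quick, hedged sketch (assuming the default guest/guest account on localhost:15672), the object_totals and queue_totals sections can be read like this:

# Sketch: read object_totals and queue_totals from /api/overview.
import json
import urllib2

url = 'http://localhost:15672/api/overview'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, 'guest', 'guest')
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))
overview = json.loads(opener.open(url).read())

print overview['object_totals']['queues']                          # number of queues
print overview['object_totals']['connections']                     # number of connections
print overview['queue_totals'].get('messages_ready', 0)            # messages ready for delivery
print overview['queue_totals'].get('messages_unacknowledged', 0)   # messages delivered but not yet acknowledged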
Show node information
# curl -i -u guest:guest "http://localhost:15672/api/nodes"
disk_free: free disk space, in bytes
disk_free_limit: threshold at which the disk alarm is raised
fd_used: number of file descriptors in use
fd_total: number of file descriptors available
io_read_avg_time: average time per disk read operation, in milliseconds
io_read_bytes: total number of bytes read from disk
io_read_count: total number of read operations
io_seek_avg_time: average time per seek operation, in milliseconds
io_seek_count: total number of seek operations
io_sync_avg_time: average time per fsync operation, in milliseconds
io_sync_count: total number of fsync operations
io_write_avg_time: average time per disk write operation, in milliseconds
io_write_bytes: total number of bytes written to disk
io_write_count: total number of disk write operations
mem_used: memory in use, in bytes
mem_limit: memory alarm threshold, by default 40% of total physical memory
mnesia_disk_tx_count: number of Mnesia transactions that had to be written to disk
mnesia_ram_tx_count: number of Mnesia transactions that did not need to be written to disk
msg_store_write_count: number of messages written to the message store
msg_store_read_count: number of messages read from the message store
proc_used: number of Erlang processes in use
proc_total: maximum number of Erlang processes
queue_index_journal_write_count: number of records written to the queue index journal; each record represents a message being published to a queue, delivered from a queue, or acknowledged in a queue
queue_index_read_count: number of records read from the queue index
queue_index_write_count: number of records written to the queue index
sockets_used: number of file descriptors used as sockets
partitions: network partitions this node is currently seeing
uptime: time since the Erlang VM started, in milliseconds
run_queue: number of Erlang processes waiting to run
processors: number of CPU cores detected and usable by Erlang
net_ticktime: current net_kernel tick time setting
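Of the node fields above, fd_used versus fd_total and disk_free versus disk_free_limit are the most important to alert on: running out of file descriptors prevents new connections, and crossing disk_free_limit raises an alarm that blocks publishers. As a hedged sketch (default guest/guest account on localhost assumed; the output format is arbitrary), the headroom can be computed like this:

# Sketch: compare fd_used/fd_total and disk_free/disk_free_limit for every node.
import json
import urllib2

url = 'http://localhost:15672/api/nodes'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, 'guest', 'guest')
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))

for node in json.loads(opener.open(url).read()):
    fd_ratio = float(node['fd_used']) / node['fd_total']
    disk_headroom = node['disk_free'] - node['disk_free_limit']
    print '%s fd_used=%.0f%% disk_headroom=%d bytes' % (node['name'], fd_ratio * 100, disk_headroom)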
Show channel information
# curl -i -u guest:guest "http://localhost:15672/api/channels"
Show exchange information
# curl -i -u guest:guest "http://localhost:15672/api/exchanges"
Show queue information
# curl -i -u guest:guest "http://localhost:15672/api/queues"
Show vhost information
# curl -i -u guest:guest "http://localhost:15672/api/vhosts/?name=/"
III. Writing the Monitoring Script and Adding the Zabbix Configuration File
The monitoring script covers three parts: the overview, the node information of the current host, and the individual queues.
It is based on a script found online, modified to add many more monitored items and to remove the filter that the original script used.
As a side note, never use code found online as-is. Analyze it against your own requirements; doing so also improves your own coding ability. If you only ever copy and paste, you will never improve.
rabbitmq_status.py
#!/usr/bin/env /usr/bin/python
'''Python module to query the RabbitMQ Management Plugin REST API and get results that can then be used by Zabbix.
https://github.com/jasonmcintosh/rabbitmq-zabbix
'''
'''This script is tested on RabbitMQ 3.5.3'''
import json
import optparse
import socket
import urllib2
import subprocess
import tempfile
import os
import logging

logging.basicConfig(filename='/opt/logs/zabbix/rabbitmq_zabbix.log', level=logging.WARNING,
                    format='%(asctime)s %(levelname)s: %(message)s')


class RabbitMQAPI(object):
    '''Class for RabbitMQ Management API'''

    def __init__(self, user_name='guest', password='guest', host_name='', protocol='http', port=15672,
                 conf='/opt/app/zabbix/conf/zabbix_agentd.conf', senderhostname=None):
        self.user_name = user_name
        self.password = password
        self.host_name = host_name or socket.gethostname()
        self.protocol = protocol
        self.port = port
        self.conf = conf or '/opt/app/zabbix/conf/zabbix_agentd.conf'
        self.senderhostname = senderhostname if senderhostname else host_name

    def call_api(self, path):
        '''All URIs serve only resources of type application/json and require HTTP basic authentication.
        The default username and password is guest/guest.
        /%2f is the encoding of the default virtual host '/'.
        '''
        url = '{0}://{1}:{2}/api/{3}'.format(self.protocol, self.host_name, self.port, path)
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, url, self.user_name, self.password)
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        logging.debug('Issue a rabbit API call to get data on ' + path)
        # json.loads() converts JSON data into Python data
        # json.dumps() converts Python data into JSON data
        return json.loads(urllib2.build_opener(handler).open(url).read())

    def list_queues(self):
        '''curl -i -u guest:guest http://localhost:15672/api/queues returns a list'''
        queues = []
        for queue in self.call_api('queues'):
            logging.debug("Discovered queue " + queue['name'])
            element = {'{#VHOSTNAME}': queue['vhost'],
                       '{#QUEUENAME}': queue['name']}
            queues.append(element)
            logging.debug('Discovered queue ' + queue['vhost'] + '/' + queue['name'])
        return queues

    def list_nodes(self):
        '''Lists all rabbitMQ nodes in the cluster'''
        nodes = []
        for node in self.call_api('nodes'):
            # We need to return the node name, because Zabbix
            # does not support @ as an item parameter
            name = node['name'].split('@')[1]
            element = {'{#NODENAME}': name,
                       '{#NODETYPE}': node['type']}
            nodes.append(element)
            logging.debug('Discovered nodes ' + name + '/' + node['type'])
        return nodes

    def check_queue(self):
        '''Return the value for a specific item in a queue's details.'''
        return_code = 0
        # tempfile.NamedTemporaryFile(delete=False) creates a temporary file that is not
        # removed automatically when closed, so zabbix_sender can read it afterwards
        rdatafile = tempfile.NamedTemporaryFile(delete=False)
        for queue in self.call_api('queues'):
            self._get_queue_data(queue, rdatafile)
        rdatafile.close()
        return_code = self._send_queue_data(rdatafile)
        # os.unlink removes the temporary file once the data has been sent
        os.unlink(rdatafile.name)
        return return_code

    def _get_queue_data(self, queue, tmpfile):
        '''Prepare the queue data for sending.

        A single queue returned by "curl -i -u guest:guest http://localhost:15672/api/queues" looks like this:
        {"memory":32064,"message_stats":{"ack":3870,"ack_details":{"rate":0.0},"deliver":3871,"deliver_details":{"rate":0.0},"deliver_get":3871,"deliver_get_details":{"rate":0.0},"disk_writes":3870,"disk_writes_details":{"rate":0.0},"publish":3870,"publish_details":{"rate":0.0},"redeliver":1,"redeliver_details":{"rate":0.0}},"messages":0,"messages_details":{"rate":0.0},"messages_ready":0,"messages_ready_details":{"rate":0.0},"messages_unacknowledged":0,"messages_unacknowledged_details":{"rate":0.0},"idle_since":"2016-03-01 22:04:22","consumer_utilisation":"","policy":"","exclusive_consumer_tag":"","consumers":4,"recoverable_slaves":"","state":"running","messages_ram":0,"messages_ready_ram":0,"messages_unacknowledged_ram":0,"messages_persistent":0,"message_bytes":0,"message_bytes_ready":0,"message_bytes_unacknowledged":0,"message_bytes_ram":0,"message_bytes_persistent":0,"disk_reads":0,"disk_writes":3870,"backing_queue_status":{"q1":0,"q2":0,"delta":["delta",0,0,0],"q3":0,"q4":0,"len":0,"target_ram_count":"infinity","next_seq_id":3870,"avg_ingress_rate":0.060962064328682466,"avg_egress_rate":0.060962064328682466,"avg_ack_ingress_rate":0.060962064328682466,"avg_ack_egress_rate":0.060962064328682466},"name":"app000","vhost":"/","durable":true,"auto_delete":false,"arguments":{},"node":"rabbit@test2"}
        '''
        for item in ['memory', 'messages', 'messages_ready', 'messages_unacknowledged', 'consumers']:
            # key = rabbitmq.queues[/,queue_memory,queue.helloWorld]
            key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            # value = queue[item] if the item is present, otherwise 0
            value = queue.get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key, value))
            tmpfile.write("- %s %s\n" % (key, value))
        # This is a non-standard bit of information added after the standard items
        for item in ['deliver_get', 'publish']:
            key = '"rabbitmq.queues[{0},queue_message_stats_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            value = queue.get('message_stats', {}).get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key, value))
            tmpfile.write("- %s %s\n" % (key, value))

    def _send_queue_data(self, tmpfile):
        '''Send the queue data to Zabbix, reading the key/value pairs from the temp file.'''
        args = '/opt/app/zabbix/sbin/zabbix_sender -c {0} -i {1}'
        if self.senderhostname:
            args = args + " -s " + self.senderhostname
        return_code = 0
        process = subprocess.Popen(args.format(self.conf, tmpfile.name),
                                   shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = process.communicate()
        logging.debug("Finished sending data")
        return_code = process.wait()
        logging.info("Found return code of " + str(return_code))
        if return_code != 0:
            logging.warning(out)
            logging.warning(err)
        else:
            logging.debug(err)
            logging.debug(out)
        return return_code

    def check_aliveness(self):
        '''Check the aliveness status of a given vhost; the virtual host '/' is encoded as '%2f'.'''
        return self.call_api('aliveness-test/%2f')['status']

    def check_overview(self, item):
        '''Check the overview-specific items.

        curl -i -u guest:guest http://localhost:15672/api/overview
        '''
        # rabbitmq[overview,connections]
        if item in ['channels', 'connections', 'consumers', 'exchanges', 'queues']:
            return self.call_api('overview').get('object_totals').get(item, 0)
        # rabbitmq[overview,messages]
        elif item in ['messages', 'messages_ready', 'messages_unacknowledged']:
            return self.call_api('overview').get('queue_totals').get(item, 0)
        elif item == 'message_stats_deliver_get':
            return self.call_api('overview').get('message_stats', {}).get('deliver_get', 0)
        elif item == 'message_stats_publish':
            return self.call_api('overview').get('message_stats', {}).get('publish', 0)
        elif item == 'message_stats_ack':
            return self.call_api('overview').get('message_stats', {}).get('ack', 0)
        elif item == 'message_stats_redeliver':
            return self.call_api('overview').get('message_stats', {}).get('redeliver', 0)
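The listing is cut off here in the original post; since the script imports optparse, the remainder presumably handles option parsing and the command-line entry point. As a hedged illustration only (assuming the RabbitMQAPI class above is in scope), its discovery methods would typically be wrapped into the Zabbix low-level discovery JSON format like this:

# Sketch only: wrap list_queues()/list_nodes() output in the Zabbix low-level discovery format.
# How the real script exposes this on the command line is not shown in the truncated listing above.
api = RabbitMQAPI(user_name='guest', password='guest')
print json.dumps({'data': api.list_queues()})   # {"data": [{"{#VHOSTNAME}": "/", "{#QUEUENAME}": "app000"}, ...]}
print json.dumps({'data': api.list_nodes()})    # {"data": [{"{#NODENAME}": "test2", "{#NODETYPE}": "disc"}, ...]}

Zabbix then iterates over the {#VHOSTNAME}/{#QUEUENAME} and {#NODENAME}/{#NODETYPE} macros to create the per-queue and per-node items that check_queue() later populates through zabbix_sender.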