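This analysis is based on the nova-2011 code. Reading through it, the code is quite well organized.
Let's start with the directory tree.
The bin directory was covered earlier; it holds the scripts that start the services. The focus here is the nova directory. Almost every subdirectory under nova is one nova sub-service, and each of these sub-services builds, to some degree, on the module files that sit directly under nova.
service.py: defines the base class used to create and start service instances. All nova component services are instantiated through it.
rpc.py: the OpenStack components talk to each other through the RPC mechanism this module provides, which is built mainly on the RabbitMQ service. It defines a number of consumer and producer classes, for example: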
class AdapterConsumer(TopicConsumer):
    """Calls methods on a proxy object based on method and args"""
    def __init__(self, connection=None, topic="broadcast", proxy=None):
        LOG.debug(_('Initing the Adapter Consumer for %s') % topic)
        self.proxy = proxy
        super(AdapterConsumer, self).__init__(connection=connection,
                                              topic=topic)
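This is a topic-based message consumer. Besides the conn object for the MQ connection and the topic, its constructor also takes a proxy object. What is it? Judging from the code, what is passed here is an instantiated compute_manager object, which defines the actual lifecycle management operations for VMs, so the proxy is that instance and can be used to operate on VMs. The class has one method: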
    @exception.wrap_exception
    def receive(self, message_data, message):
        """Magically looks for a method on the proxy object and calls it
        Message data should be a dictionary with two keys:
            method: string representing the method to call
            args: dictionary of arg: value
        Example: {'method': 'echo', 'args': {'value': 42}}
        """
        LOG.debug(_('received %s') % message_data)
        msg_id = message_data.pop('_msg_id', None)
        ctxt = _unpack_context(message_data)
        method = message_data.get('method')
        args = message_data.get('args', {})
        message.ack()
        if not method:
            # NOTE(vish): we may not want to ack here, but that means that bad
            #             messages stay in the queue indefinitely, so for now
            #             we just log the message and send an error string
            #             back to the caller
            LOG.warn(_('no method for message: %s') % message_data)
            msg_reply(msg_id, _('No method for message: %s') % message_data)
            return
        node_func = getattr(self.proxy, str(method))
        node_args = dict((str(k), v) for k, v in args.iteritems())
        # NOTE(vish): magic is fun!
        try:
            rval = node_func(context=ctxt, **node_args)
            if msg_id:
                msg_reply(msg_id, rval, None)
        except Exception as e:
            logging.exception("Exception during message handling")
            if msg_id:
                msg_reply(msg_id, None, sys.exc_info())
        return
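When a message arrives from the MQ, its body carries the request context, the method to execute, and the arguments to pass to that method. The method named in the message has to be provided by some manager, hence the getattr(self.proxy, str(method)) call. A method is either a method of some class or a standalone module-level function, and here it is clearly an instance method. Judging from how the manager is wired up, however, getattr on the proxy cannot reach the method directly. Reading further, the service class in service.py turns out to override __getattr__: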
    def __getattr__(self, key):
        manager = self.__dict__.get('manager', None)
        return getattr(manager, key)
This is the trick that lets the rpc module resolve the instance method correctly.
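The lookup path described above can be sketched in a few lines. This is only a minimal illustration, not nova's code: FakeManager, FakeService, and dispatch are hypothetical names, and the sketch is written for Python 3 (so it uses items() where the quoted code uses iteritems()). It assumes the object handed over as proxy is a service wrapper that forwards unknown attributes to its manager.

```python
class FakeManager(object):
    """Hypothetical stand-in for a component manager."""

    def echo(self, context, value):
        return value


class FakeService(object):
    """Hypothetical stand-in for the service class in service.py."""

    def __init__(self, manager):
        self.manager = manager

    def __getattr__(self, key):
        # __getattr__ only runs when normal lookup fails, so 'manager'
        # itself is found in __dict__ and never recurses through here.
        manager = self.__dict__.get('manager', None)
        return getattr(manager, key)


def dispatch(proxy, message_data, context=None):
    """Look up message_data['method'] on proxy and call it with the args."""
    method = message_data.get('method')
    args = message_data.get('args', {})
    if not method:
        raise ValueError('no method for message: %r' % (message_data,))
    func = getattr(proxy, str(method))  # falls through to the manager
    kwargs = dict((str(k), v) for k, v in args.items())
    return func(context=context, **kwargs)


service = FakeService(FakeManager())
result = dispatch(service, {'method': 'echo', 'args': {'value': 42}})
print(result)  # 42
```

The service object has no echo attribute of its own, so the getattr in dispatch only succeeds because __getattr__ hands the lookup to the manager.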
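manager.py: the base class that every nova component manager inherits from. Each component implements and adds its own operations; this top-level module only defines a few simple unimplemented abstract methods, such as the periodic function periodic_tasks and the per-component initialization function init_host, plus the object used for db operations. Every component directory has its own manager.py that inherits from the manager.py one level up, so this is easy to follow.
flags.py: defines the environment and configuration settings that nova services need throughout their lifecycle, so that fixed configuration values are easy to use.
exception.py: defines the basic exception classes and helper functions, mostly a set of decorator functions.
log.py: defines the common logging facilities for nova.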
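To make the manager.py layering concrete, here is a rough sketch of a base manager with empty hooks and one component overriding them. The class bodies, the constructor parameters, and the events list are assumptions made for illustration; only the hook names periodic_tasks and init_host come from the description above.

```python
class Manager(object):
    """Hypothetical base manager: holds a db handle and defines empty hooks."""

    def __init__(self, host=None, db_driver=None):
        self.host = host
        self.db = db_driver  # placeholder for the db access object

    def periodic_tasks(self, context=None):
        """Called at a regular interval; components override this."""
        pass

    def init_host(self):
        """Called once when the service starts; components override this."""
        pass


class ComputeManager(Manager):
    """Hypothetical component manager that fills in the hooks."""

    def __init__(self, *args, **kwargs):
        super(ComputeManager, self).__init__(*args, **kwargs)
        self.events = []

    def init_host(self):
        self.events.append('init_host on %s' % self.host)

    def periodic_tasks(self, context=None):
        self.events.append('periodic_tasks')


manager = ComputeManager(host='node1')
manager.init_host()
manager.periodic_tasks()
print(manager.events)  # ['init_host on node1', 'periodic_tasks']
```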
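Next, message handling.
Messaging is not very complicated. Once the MQ connection settings and the handling for message publishers and subscribers are defined, the framework is basically in place.
Nearly all of this lives in the rpc.py module, as follows:
The MQ framework used is carrot.
The module starts by defining the MQ connection class: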
class Connection(carrot_connection.BrokerConnection):
    """Connection instance object"""
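This is a general connection class that connects to the MQ broker service.
Next come the definitions of the message consumer class and the message producer class: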
class Consumer(messaging.Consumer):
    """Consumer base class
    Contains methods for connecting the fetch method to async loops
    """
    def __init__(self, *args, **kwargs):
        for i in xrange(FLAGS.rabbit_max_retries):
            if i > 0:
                time.sleep(FLAGS.rabbit_retry_interval)
            try:
                super(Consumer, self).__init__(*args, **kwargs)
                self.failed_connection = False
                break
            except:  # Catching all because carrot sucks
                fl_host = FLAGS.rabbit_host
                fl_port = FLAGS.rabbit_port
                fl_intv = FLAGS.rabbit_retry_interval
                LOG.exception(_("AMQP server on %(fl_host)s:%(fl_port)d is"
                    " unreachable. Trying again in %(fl_intv)d seconds.")
                    % locals())
                self.failed_connection = True
        if self.failed_connection:
            LOG.exception(_("Unable to connect to AMQP server "
                          "after %d tries. Shutting down."),
                          FLAGS.rabbit_max_retries)
            sys.exit(1)
    def fetch(self, no_ack=None, auto_ack=None, enable_callbacks=False):
        """Wraps the parent fetch with some logic for failed connections"""
        # TODO(vish): the logic for failed connections and logging should be
        #             refactored into some sort of connection manager object
        ...
class Publisher(messaging.Publisher):
    """Publisher base class"""
    pass
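The Publisher class simply inherits from the library base class, while the Consumer class both inherits from it and modifies it, mainly to handle exceptions. The modules they inherit from are imported as follows: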
from carrot import connection as carrot_connection
from carrot import messaging
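The retry behaviour that Consumer adds in its constructor can be isolated as a small helper. The following is a sketch of the same idea only, not nova's implementation: connect_with_retries, ConnectionFailed, and the injectable sleep parameter are hypothetical, and unlike the quoted code it raises an exception instead of calling sys.exit(1) and catches Exception rather than using a bare except.

```python
import time


class ConnectionFailed(Exception):
    """Raised when every connection attempt has failed (hypothetical)."""


def connect_with_retries(connect, max_retries, retry_interval,
                         sleep=time.sleep):
    """Call connect() up to max_retries times, sleeping between attempts."""
    last_error = None
    for attempt in range(max_retries):
        if attempt > 0:
            sleep(retry_interval)  # no delay before the first attempt
        try:
            return connect()
        except Exception as exc:
            last_error = exc
    raise ConnectionFailed('gave up after %d tries: %s'
                           % (max_retries, last_error))


# Demo: a fake connect function that fails twice, then succeeds.
attempts = []
sleeps = []


def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError('broker unreachable')
    return 'connected'


result = connect_with_retries(flaky_connect, max_retries=5,
                              retry_interval=10, sleep=sleeps.append)
print(result, len(attempts), sleeps)  # connected 3 [10, 10]
```

Passing sleep as a parameter is only there so the demo does not actually wait; the quoted code calls time.sleep directly.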
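The Publisher and Consumer above are the two base classes; all other consumer and producer classes derive from them. Beyond these, there are two important functions:
call: a sender uses this function to send a message and must wait for the response before it returns. It is defined as follows: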
def call(context, topic, msg):
    """Sends a message on a topic and wait for a response"""
    LOG.debug(_("Making asynchronous call..."))
    msg_id = uuid.uuid4().hex
    msg.update({'_msg_id': msg_id})
    LOG.debug(_("MSG_ID is %s") % (msg_id))
    _pack_context(msg, context)
    class WaitMessage(object):
        def __call__(self, data, message):
            """Acks message and sets result."""
            message.ack()
            if data['failure']:
                self.result = RemoteError(*data['failure'])
            else:
                self.result = data['result']
    wait_msg = WaitMessage()
    conn = Connection.instance(True)
    consumer = DirectConsumer(connection=conn, msg_id=msg_id)
    consumer.register_callback(wait_msg)
    conn = Connection.instance()
    publisher = TopicPublisher(connection=conn, topic=topic)
    publisher.send(msg)
    publisher.close()
    try:
        consumer.wait(limit=1)
    except StopIteration:
        pass
    consumer.close()
    # NOTE(termie): this is a little bit of a change from the original
    #               non-eventlet code where returning a Failure
    #               instance from a deferred call is very similar to
    #               raising an exception
    if isinstance(wait_msg.result, Exception):
        raise wait_msg.result
    return wait_msg.result
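The essential pattern in call is: tag the request with a fresh msg_id, listen on a reply channel named after that id, publish on the topic, then block until the reply arrives. The sketch below reproduces that flow with in-memory queues instead of RabbitMQ. InMemoryBroker, serve_one, and the timeout parameter are hypothetical, the request context is left out, and the code targets Python 3 (the queue module).

```python
import queue
import threading
import uuid


class InMemoryBroker(object):
    """Hypothetical broker: one queue per topic or reply id."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def queue_for(self, name):
        with self._lock:
            return self._queues.setdefault(name, queue.Queue())


def call(broker, topic, msg, timeout=5):
    """Send msg on topic and block until the reply for its msg_id arrives."""
    msg_id = uuid.uuid4().hex
    msg = dict(msg, _msg_id=msg_id)
    reply_queue = broker.queue_for(msg_id)  # set up the reply side first
    broker.queue_for(topic).put(msg)
    data = reply_queue.get(timeout=timeout)
    if data['failure']:
        raise RuntimeError(data['failure'])
    return data['result']


def serve_one(broker, topic, handlers):
    """Consume one message from topic and reply on the msg_id queue."""
    msg = broker.queue_for(topic).get(timeout=5)
    msg_id = msg.pop('_msg_id', None)
    try:
        result = handlers[msg['method']](**msg.get('args', {}))
        reply = {'result': result, 'failure': None}
    except Exception as exc:
        reply = {'result': None, 'failure': str(exc)}
    if msg_id:
        broker.queue_for(msg_id).put(reply)


broker = InMemoryBroker()
worker = threading.Thread(target=serve_one,
                          args=(broker, 'compute',
                                {'echo': lambda value: value}))
worker.start()
result = call(broker, 'compute', {'method': 'echo', 'args': {'value': 42}})
worker.join()
print(result)  # 42
```

As in the quoted function, the reply consumer is created before the request is published, so a fast responder cannot answer before anyone is listening.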
cast: this function also sends a message but does not wait for a response; in other words, no response is needed. It is defined as follows:
def cast(context, topic, msg):
    """Sends a message on a topic without waiting for a response"""
    LOG.debug(_("Making asynchronous cast..."))
    _pack_context(msg, context)
    conn = Connection.instance()
    publisher = TopicPublisher(connection=conn, topic=topic)
    publisher.send(msg)
    publisher.close()
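For contrast with call, a fire-and-forget send can be sketched the same way: the sender publishes and returns at once, with no msg_id and no reply channel. Everything here is illustrative. topic_queue stands in for a topic on the broker, the method name run_instance and its argument are made up for the demo, and the None sentinel exists only to stop the demo consumer.

```python
import queue
import threading

topic_queue = queue.Queue()  # hypothetical stand-in for a broker topic
handled = []


def cast(msg):
    """Publish msg and return immediately; there is no reply channel."""
    topic_queue.put(msg)


def consume_until_stopped():
    """Demo consumer: record each message until the stop sentinel arrives."""
    while True:
        msg = topic_queue.get()
        if msg is None:  # sentinel used only to end this demo
            break
        handled.append((msg['method'], msg.get('args', {})))


worker = threading.Thread(target=consume_until_stopped)
worker.start()
returned = cast({'method': 'run_instance', 'args': {'instance_id': 1}})
topic_queue.put(None)
worker.join()
print(returned)  # None
print(handled)   # [('run_instance', {'instance_id': 1})]
```

The sender learns nothing about the outcome, which is why cast suits operations whose result is recorded elsewhere rather than returned to the caller.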
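That covers the basic definition of messages. How messages are actually sent, received, and processed will be covered in the next installment.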