Part 1 

The evacuate command line exposed to users

 

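nova evacuate [--password <password>] <server> [<host>]

Parameters:
  • <server>: the virtual machine (instance) on the failed compute node.

  • <host>: the name or ID of the target compute node. If no compute node is specified, nova-scheduler selects an available one.

  • --password <password>: the login password to set on the instance after evacuation.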
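To make the shape of the synopsis concrete, here is a minimal sketch that mirrors it with Python's standard argparse module. It is an illustration only, not the code novaclient uses; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring:
    nova evacuate [--password <password>] <server> [<host>]
    """
    parser = argparse.ArgumentParser(prog="nova evacuate")
    # Optional flag: password to set on the evacuated instance.
    parser.add_argument("--password", metavar="<password>", default=None,
                        help="login password to set after evacuation")
    # Required positional: the instance on the failed compute node.
    parser.add_argument("server", metavar="<server>",
                        help="instance on the failed compute node")
    # Optional positional: when omitted, the scheduler picks the target.
    parser.add_argument("host", metavar="<host>", nargs="?", default=None,
                        help="name or ID of the target compute node")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--password", "s3cret", "vm1", "node2"])
    print(args.server, args.host, args.password)
```

Leaving out <host> yields host=None, which corresponds to the case where the scheduler chooses the target node.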

Part 2

When nova evacuate applies

 

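nova evacuate is meant for the case where the nova-compute node hosting an instance has gone down. The instance can then be moved from the failed nova-compute node to another available compute node with nova evacuate. When the original compute node comes back up, it deletes its leftover copies of the instances that were evacuated away from it.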

Part 3

nova evacuate code walkthrough

 

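When nova receives a nova evacuate request from a user, the main flow through the nova services is:
  • 1) nova-api receives the request, validates the request parameters, and then sends an RPC request to nova-conductor, handing the rest of the processing over to it.

  • 2) nova-conductor receives the RPC message and branches on the parameters the user supplied. If no specific compute node was given, it calls nova-scheduler to select an available compute node.

  • 3) nova-conductor sends a cast-type RPC message to the selected nova-compute node, which does the actual work.

  • 4) The nova-compute node receives the RPC message and creates the instance.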

Part 4

The nova-api stage

 

nova-api: code under the api directory

 

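  • 1) When the user issues nova evacuate, an HTTP POST request is sent to nova-api, and the action in the HTTP body is evacuate.

  • 2) The handler reads the HTTP body to obtain the values of host, force, password and on_shared_storage.

  • 3) If host is given, the handler first checks that the host exists. If it does not, a "host not found" exception is raised; if it does, processing continues with step 4.

  • 4) If the given host is the same as the instance's current host, an exception is raised: the target compute node must differ from the instance's host.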

D:\tran_code\nova_v1\nova\api\openstack\compute\evacuate.py

    def _evacuate(self, req, id, body):
        """Permit admins to evacuate a server from a failed host
        to a new one.
        """
        context = req.environ["nova.context"]
        instance = common.get_instance(self.compute_api, context, id)
        context.can(evac_policies.BASE_POLICY_NAME,
                    target={'user_id': instance.user_id,
                            'project_id': instance.project_id})

        evacuate_body = body["evacuate"]
        host = evacuate_body.get("host")
        force = None
        ..........
        if host is not None:
            try:
                self.host_api.service_get_by_compute_host(context, host)
            except (exception.ComputeHostNotFound,
                    exception.HostMappingNotFound):
                msg = _("Compute host %s not found.") % host
                raise exc.HTTPNotFound(explanation=msg)

        if instance.host == host:
            msg = _("The target host can't be the same one.")
            raise exc.HTTPBadRequest(explanation=msg)

        try:
            self.compute_api.evacuate(context, instance, host,
                                      on_shared_storage, password, force)
        except exception.InstanceUnknownCell as e:
            raise exc.HTTPNotFound(explanation=e.format_message())
        .......
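For orientation, the request body that this handler parses could be assembled roughly as follows. This is a hedged sketch: the helper name build_evacuate_body is hypothetical, and the key name "adminPass" is an assumption, since the part of the handler that reads the password is elided in the excerpt above. Only "evacuate" and "host" are confirmed by the code shown.

```python
import json
from typing import Any, Dict, Optional


def build_evacuate_body(host: Optional[str] = None,
                        admin_pass: Optional[str] = None,
                        force: Optional[bool] = None) -> Dict[str, Any]:
    """Hypothetical helper: build the JSON body for the evacuate action."""
    evacuate: Dict[str, Any] = {}
    if host is not None:
        # Confirmed by the handler: evacuate_body.get("host")
        evacuate["host"] = host
    if admin_pass is not None:
        # Assumed key name; the handler code reading it is elided above.
        evacuate["adminPass"] = admin_pass
    if force is not None:
        evacuate["force"] = force
    # The action name is the single top-level key of the body.
    return {"evacuate": evacuate}


if __name__ == "__main__":
    print(json.dumps(build_evacuate_body(host="node2", force=False)))
```

Omitting every argument gives {"evacuate": {}}, which matches the case where the scheduler is left to choose the target.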

nova-api: code under the compute directory

 

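  • 1) Get the instance's host.

  • 2) Check whether the nova-compute service on that host is up. If it is up, raise an exception and stop, because evacuation is only performed when the node hosting the instance is down. If it is not up, continue with step 3 onward.

  • 3) Create a Migration record that stores information about this evacuation, so that later functions can look it up.

  • 4) If a specific host was given, store it in the migration's dest_compute field.

  • 5) Fetch the instance's request_spec by the instance UUID. In the T release, the request_spec content is stored in the spec column of the nova_api.request_specs table.

  • 6) If a target host was given but the evacuation is not forced, set host to None so that nova-scheduler selects an available compute node.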

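This function effectively uses only instance, host, on_shared_storage, admin_password and force, and it fixes recreate=True. The remaining arguments of nova-conductor's rebuild_instance method carry no information here and are passed as None. The code is as follows: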
D:\tran_code\nova_v1\nova\compute\api.py

    def evacuate(self, context, instance, host, on_shared_storage,
                 admin_password=None, force=None):
        LOG.debug('vm evacuation scheduled', instance=instance)
        # Get the host the instance currently lives on.
        inst_host = instance.host
        # Look up the nova-compute service record for that host.
        service = objects.Service.get_by_compute_host(context, inst_host)
        # Evacuation is only valid when the instance's nova-compute service
        # is down, so raise an exception if the service is still up.
        if self.servicegroup_api.service_is_up(service):
            LOG.error('Instance compute service state on %s '
                      'expected to be down, but it was up.', inst_host)
            raise exception.ComputeServiceInUse(host=inst_host)

        # Set the instance's task state to REBUILDING.
        instance.task_state = task_states.REBUILDING
        instance.save(expected_task_state=[None])
        self._record_action_start(context, instance, instance_actions.EVACUATE)

        # Create a migration record as a marker for the source compute node,
        # so that it can find the record and clean up later. It is not passed
        # down as an argument. The migration type is 'evacuation'.
        migration = objects.Migration(context,
                                      source_compute=instance.host,
                                      source_node=instance.node,
                                      instance_uuid=instance.uuid,
                                      status='accepted',
                                      migration_type='evacuation')
        # If a target host was specified, record it in the migration.
        if host:
            migration.dest_compute = host
        migration.create()

        compute_utils.notify_about_instance_usage(
            self.notifier, context, instance, "evacuate")

        try:
            request_spec = objects.RequestSpec.get_by_instance_uuid(
                context, instance.uuid)
        except exception.RequestSpecNotFound:
            # Some old instances can still have no RequestSpec object attached
            # to them, we need to support the old way
            request_spec = None

        # NOTE(sbauza): Force is a boolean by the new related API version
        # This branch is taken only when a target host is specified but the
        # evacuation is not forced; in every other case it is skipped.
        if force is False and host:
            nodes = objects.ComputeNodeList.get_all_by_host(context, host)
            # NOTE(sbauza): Unset the host to make sure we call the scheduler
            # Although the caller supplied a host, reset it to None here so
            # that the request goes through nova-scheduler.
            host = None
            # FIXME(sbauza): Since only Ironic driver uses more than one
            # compute per service but doesn't support evacuations,
            # let's provide the first one.
            target = nodes[0]
            if request_spec:
                destination = objects.Destination(
                    host=target.host,
                    node=target.hypervisor_hostname
                )
                request_spec.requested_destination = destination

        return self.compute_task_api.rebuild_instance(context,
                       instance=instance,
                       new_pass=admin_password,
                       injected_files=None,
                       image_ref=None,
                       orig_image_ref=None,
                       orig_sys_metadata=None,
                       bdms=None,
                       recreate=True,
                       on_shared_storage=on_shared_storage,
                       host=host,
                       request_spec=request_spec,
                       )
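The host/force handling in this function is easy to misread, so the sketch below isolates just that decision. It is a simplified model rather than nova code: the function name resolve_evacuate_target is hypothetical, and a plain string stands in for the objects.Destination that nova attaches to the request_spec.

```python
from typing import Optional, Tuple


def resolve_evacuate_target(host: Optional[str],
                            force: Optional[bool]
                            ) -> Tuple[Optional[str], Optional[str]]:
    """Return (host_sent_to_conductor, requested_destination).

    Models the 'if force is False and host' branch: an unforced target
    is not sent as host; it becomes a requested destination that the
    scheduler still has to validate.
    """
    if force is False and host:
        # Host is unset so the conductor calls the scheduler.
        return None, host
    # Forced target, no target, or force not supplied (None):
    # host is passed through unchanged.
    return host, None


if __name__ == "__main__":
    for case in [("node2", False), ("node2", True), (None, False)]:
        print(case, "->", resolve_evacuate_target(*case))
```

Note that the test is "force is False", not "not force": when force is None, a given host is passed straight through to the conductor.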
 

Part 5

The nova-conductor stage

 

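After nova-conductor receives the RPC request from nova-api, the work is handled in the conductor's manager.py as follows.

1) Look up the instance's migration record by the instance UUID.

2) Branch on the host value that was passed in.

3) host is set. There are two scenarios:
  • Scenario 1: the instance is rebuilt on its original host from its original image;

  • Scenario 2: a specific target host was given and the evacuation is forced.

In both scenarios the node parameter is empty.

4) host is not set. There are three scenarios:
  • Scenario 1: no target host was given for the evacuation;

  • Scenario 2: a target host was given, but the evacuation is not forced;

  • Scenario 3: the instance is rebuilt on its current host from a new image.

These cases go through nova-scheduler. For an evacuation, the instance's current host is excluded during scheduling so that the same host cannot be chosen. Once a target has been selected, host and node are both non-empty: host sets the routing target of the RPC message, and node is used by later functions.

5) Send the RPC request to the nova-compute service. The code is as follows: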
D:\tran_code\nova_v1\nova\conductor\manager.py

    def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
                         injected_files, new_pass, orig_sys_metadata,
                         bdms, recreate, on_shared_storage,
                         preserve_ephemeral=False, host=None,
                         request_spec=None):

        with compute_utils.EventReporter(context, 'rebuild_server', instance.uuid):
            node = limits = None
            try:
                # Look up the instance's migration record by UUID. If none
                # is found, log it and continue with migration = None.
                migration = objects.Migration.get_by_instance_and_status(
                    context, instance.uuid, 'accepted')
            except exception.MigrationNotFoundByStatus:
                LOG.debug("No migration record for the rebuild/evacuate "
                          "request.", instance=instance)
                migration = None

            # host is passed in two cases, and both skip nova-scheduler.
            # Case 1: the instance's own host is passed, to rebuild the
            # instance where it already lives. (A rebuild can use either the
            # original image or a different one.)
            # Case 2: a specific target host was given and the evacuation
            # is forced.
            if host:
                # We only create a new allocation on the specified host if
                # we're doing an evacuate since that is a move operation.
                if host != instance.host:
                    self._allocate_for_evacuate_dest_host(
                        context, instance, host, request_spec)
            else:
                # Reached when rebuilding on the same host with a new image,
                # or when evacuating to a specified host without force.
                # If no request_spec was supplied, build the image metadata
                # from the instance's image and construct a request_spec.
                if not request_spec:
                    filter_properties = {'ignore_hosts': [instance.host]}
                    # build_request_spec expects a primitive image dict
                    image_meta = nova_object.obj_to_primitive(
                        instance.image_meta)
                    request_spec = scheduler_utils.build_request_spec(
                            context, image_meta, [instance])
                    request_spec = objects.RequestSpec.from_primitives(
                        context, request_spec, filter_properties)
                elif recreate:
                    # Evacuation takes this branch: the source host is put
                    # on the RequestSpec ignore list so the scheduler will
                    # not pick it.
                    # NOTE(sbauza): Augment the RequestSpec object by excluding
                    # the source host for avoiding the scheduler to pick it
                    request_spec.ignore_hosts = [instance.host]  # exclude the instance's host
                    # NOTE(sbauza): Force_hosts/nodes needs to be reset
                    # if we want to make sure that the next destination
                    # is not forced to be the original host
                    request_spec.reset_forced_destinations()
                try:
                    request_spec.ensure_project_id(instance)
                    # nova-scheduler selects an available compute node
                    # based on the request_spec.
                    hosts = self._schedule_instances(context, request_spec,
                                                     [instance.uuid])
                    host_dict = hosts.pop(0)
                    host, node, limits = (host_dict['host'],
                                          host_dict['nodename'],
                                          host_dict['limits'])
                .......

            compute_utils.notify_about_instance_usage(
                self.notifier, context, instance, "rebuild.scheduled")

            instance.availability_zone = (
                availability_zones.get_host_availability_zone(
                    context, host))
            self.compute_rpcapi.rebuild_instance(context,
                    instance=instance,
                    new_pass=new_pass,
                    injected_files=injected_files,
                    image_ref=image_ref,
                    orig_image_ref=orig_image_ref,
                    orig_sys_metadata=orig_sys_metadata,
                    bdms=bdms,
                    recreate=recreate,
                    on_shared_storage=on_shared_storage,
                    preserve_ephemeral=preserve_ephemeral,
                    migration=migration,  # the migration object is passed along here
                    host=host, node=node, limits=limits)
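The branching on host can be summarised in a small model. This is a sketch under simplifying assumptions, not the conductor's implementation: choose_destination is a hypothetical name, a first-fit loop stands in for nova-scheduler's real host selection, and only the path where a request_spec already exists is modelled.

```python
from typing import List, Optional, Tuple


def choose_destination(host: Optional[str],
                       instance_host: str,
                       recreate: bool,
                       candidates: List[str]) -> Tuple[str, bool]:
    """Return (destination_host, scheduler_was_used)."""
    if host:
        # Forced evacuation or same-host rebuild: the scheduler is skipped.
        return host, False
    # For an evacuation (recreate=True) the source host is excluded,
    # mirroring request_spec.ignore_hosts = [instance.host].
    ignore_hosts = [instance_host] if recreate else []
    for candidate in candidates:
        # Stand-in for the scheduler: take the first non-excluded host.
        if candidate not in ignore_hosts:
            return candidate, True
    raise LookupError("no valid host found")


if __name__ == "__main__":
    nodes = ["node1", "node2"]
    print(choose_destination("node3", "node1", True, nodes))
    print(choose_destination(None, "node1", True, nodes))
    print(choose_destination(None, "node1", False, nodes))
```

The third call shows why a same-host rebuild with a new image can still land on the instance's own host: nothing is excluded when recreate is false.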
 

Part 6

The nova-compute stage on the target node

 

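nova-compute: manager.py
  • 1) The recreate value distinguishes a nova evacuate operation from a nova rebuild operation.

  • 2) If recreate is true, the operation is nova evacuate; if recreate is false, it is nova rebuild.

  • 3) Claim resources on the target node, based on the node chosen by the scheduler.

  • 4) Get the instance's image information.

  • 5) Read the block_device_mapping table by instance UUID to get the instance's block device information.

  • 6) Get the instance's network information.

  • 7) Detach the instance's block devices.

  • 8) Because the libvirt driver does not implement rebuild, the _rebuild_default_impl method is what actually performs both evacuation and rebuild.

  • 9) For a nova evacuate operation, the driver's spawn is called on the target node to create the instance.

  • 10) For a rebuild operation, the instance is first destroyed on the node and then the driver's spawn is called to create it again. An evacuate operation skips the destroy and recreates the instance directly.

The main call chain inside the nova-compute service is: rebuild_instance -> _do_rebuild_instance_with_claim -> _do_rebuild_instance -> _rebuild_default_impl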
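The difference between evacuate and rebuild in steps 9 and 10 comes down to whether destroy runs before spawn. The sketch below models only that ordering. RecordingDriver and rebuild_default are hypothetical stand-ins; the real _rebuild_default_impl also handles block devices, networking and more arguments.

```python
from typing import List


class RecordingDriver:
    """Hypothetical stand-in for a virt driver that records its calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def destroy(self, instance: str) -> None:
        self.calls.append("destroy")

    def spawn(self, instance: str) -> None:
        self.calls.append("spawn")


def rebuild_default(driver: RecordingDriver, instance: str,
                    recreate: bool) -> List[str]:
    """Model of the destroy/spawn ordering described in steps 9 and 10."""
    if not recreate:
        # Rebuild: the instance exists on this node, so remove it first.
        driver.destroy(instance)
    # Evacuate and rebuild both end by spawning the instance.
    driver.spawn(instance)
    return driver.calls


if __name__ == "__main__":
    print("evacuate:", rebuild_default(RecordingDriver(), "vm1", recreate=True))
    print("rebuild: ", rebuild_default(RecordingDriver(), "vm1", recreate=False))
```

An evacuation has nothing to destroy on the target node, because the instance never ran there; that is why recreate=True goes straight to spawn.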