xm migrate source code analysis
The command for live-migrating a virtual machine in Xen is: xm migrate --live <domain id> <destination machine>
How migration works
Xen live migration begins by sending a request, or reservation, to the target specifying the resources the migrating domain will need. If the target accepts the request, the source begins the iterative precopy phase of migration. During this step, Xen copies pages of memory over a TCP connection to the destination host. While this is happening, pages that change are marked as dirty and then recopied. The machine iterates this until only very frequently changed pages remain, at which point it begins the stop and copy phase. Now Xen stops the VM and copies over any pages that change too frequently to efficiently copy during the previous phase. In practice, our testing suggests that Xen usually reaches this point after four to eight iterations. Finally the VM starts executing on the new machine.
By default, Xen will iterate up to 29 times and stop if the number of dirty pages falls below a certain threshold. You can specify this threshold and the number of iterations at compile time, but the defaults should work fine.
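The iterative precopy loop described above can be illustrated with a toy simulation. This is only a sketch and not Xen's implementation: the function name precopy_rounds, the parameters dirty_percent and threshold, and the assumption that a fixed percentage of the pages sent in one round are dirtied again before the next round are all made up for illustration. Only the iteration cap of 29 comes from the text.

```python
def precopy_rounds(total_pages, dirty_percent, threshold, max_iters=29):
    """Simulate iterative precopy.

    Returns (rounds_used, pages_left_for_stop_and_copy).
    Hypothetical model: each round sends every currently dirty page, and
    while that happens dirty_percent of the pages just sent get dirty again.
    """
    dirty = total_pages              # the first round has to send every page
    rounds = 0
    while rounds < max_iters and dirty > threshold:
        sent = dirty
        dirty = sent * dirty_percent // 100   # pages re-dirtied during this round
        rounds += 1
    return rounds, dirty


if __name__ == "__main__":
    rounds, remaining = precopy_rounds(
        total_pages=262144,          # 1 GiB of 4 KiB pages
        dirty_percent=10,
        threshold=50,
    )
    print("precopy rounds:", rounds)
    print("pages left for stop-and-copy:", remaining)
```

With these made-up numbers the loop stops after 4 rounds with 26 pages left for the stop-and-copy phase. A guest that dirties pages as fast as they are sent never gets under the threshold and runs into the iteration cap instead.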
Xen live migration consists of two parts, save and restore. We look at the save part first.
tools/python/xen/xm/migrate.py
The code in this file mainly parses the command-line arguments and then calls
    server.xend.domain.migrate(dom, dst, opts.vals.live,
                               opts.vals.port,
                               opts.vals.node,
                               opts.vals.ssl)
-->XendDomain.domain_migrate()
tools/python/xen/xend/XendDomain.py
domain_migrate(self, domid, dst, live=False, port=0, node=-1, ssl=None):
1. Make sure the VM exists and is running
    dominfo = self.domain_lookup_nr(domid)  # returns a XendDomainInfo object
    if not dominfo:
        raise XendInvalidDomain(str(domid))
    if dominfo.getDomid() == DOM0_ID:
        raise XendError("Cannot migrate privileged domain %s" % domid)
    if dominfo._stateGet() != DOM_STATE_RUNNING:
        raise VMBadState("Domain is not running",
                         POWER_STATE_NAMES[DOM_STATE_RUNNING],
                         POWER_STATE_NAMES[dominfo._stateGet()])
2. Notify all devices of the intended migration
dominfo.testMigrateDevices(True, dst)
---> XendDomainInfo.migrateDevice(n, c, network, dst, DEV_MIGRATE_TEST, self.getName())
---> XendDomainInfo.getDeviceController(deviceClass).migrate(deviceConfig,
network, dst, step, domName)
---> DevController.migrate()
Because xoptions.get_external_migration_tool() returns an empty value, this call actually does nothing and simply returns 0.
This function is called for 4 steps:
If step == 0: Check whether the device is ready to be migrated
              or can at all be migrated; return a '-1' if
              the device is NOT ready, a '0' otherwise. If it is
              not ready ( = not possible to migrate this device),
              migration will not take place.
   step == 1: Called immediately after step 0; migration
              of the kernel has started;
   step == 2: Called after the suspend has been issued
              to the domain and the domain is not scheduled anymore.
              Synchronize with what was started in step 1, if necessary.
              Now the device should initiate its transfer to the
              given target. Since there might be more than just
              one device initiating a migration, this step should
              put the process performing the transfer into the
              background and return immediately to achieve as much
              concurrency as possible.
   step == 3: Synchronize with the migration of the device that
              was initiated in step 2.
              Make sure that the migration has finished and only
              then return from the call.
Here DEV_MIGRATE_TEST = 0, i.e. step 0.
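To make the four-step protocol more concrete, here is a minimal sketch of how a caller might drive it. The class DummyController and the function run_migration_steps are hypothetical and not part of xend. The constant values for steps 1 to 3 are assumed from the step numbers in the description above, and the sketch collapses the timing: in the real flow steps 2 and 3 only run after the domain has been suspended.

```python
DEV_MIGRATE_TEST = 0     # value stated in the text
DEV_MIGRATE_STEP1 = 1    # values 1..3 are assumed from the step numbers
DEV_MIGRATE_STEP2 = 2
DEV_MIGRATE_STEP3 = 3


class DummyController:
    """Hypothetical device controller that records the steps it is asked to run."""

    def __init__(self, ready=True):
        self.ready = ready
        self.seen = []

    def migrate(self, step):
        self.seen.append(step)
        if step == DEV_MIGRATE_TEST:
            # 0 means "can be migrated", -1 means "not ready"
            return 0 if self.ready else -1
        return 0


def run_migration_steps(controllers):
    """Return True if every device passed step 0 and steps 1..3 were run."""
    # Step 0: every device must agree, otherwise migration does not take place.
    for ctrl in controllers:
        if ctrl.migrate(DEV_MIGRATE_TEST) != 0:
            return False
    # Steps 1..3, run back to back here for simplicity.
    for step in (DEV_MIGRATE_STEP1, DEV_MIGRATE_STEP2, DEV_MIGRATE_STEP3):
        for ctrl in controllers:
            ctrl.migrate(step)
    return True


if __name__ == "__main__":
    disk = DummyController()
    print(run_migration_steps([disk]), disk.seen)
```

A single device that reports -1 in step 0 is enough to stop the whole migration before any other step is attempted.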
3. For live migration, make sure there's memory free for enabling shadow mode
dominfo.checkLiveMigrateMemory()
--->XendDomainInfo.checkLiveMigrateMemory()
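The memory needed for migration is 1 MiB per vCPU plus 4 KiB per MiB of RAM.
This memory is obtained through balloon.free(overhead_kb, self).
4. If the --ssl option is used, an SSL connection is established; otherwise a plain TCP connection is established
5. Start the migration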
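As a worked example of the rule "1 MiB per vCPU plus 4 KiB per MiB of RAM", the following sketch computes the overhead in KiB. The function name live_migrate_overhead_kib is hypothetical, and the sketch ignores any rounding or other adjustments the real checkLiveMigrateMemory() may apply.

```python
def live_migrate_overhead_kib(vcpus, ram_mib):
    """Shadow-mode overhead estimate: 1 MiB per vCPU plus 4 KiB per MiB of RAM."""
    per_vcpu_kib = vcpus * 1024      # 1 MiB = 1024 KiB for each vCPU
    per_ram_kib = ram_mib * 4        # 4 KiB for each MiB of guest RAM
    return per_vcpu_kib + per_ram_kib


if __name__ == "__main__":
    # A guest with 2 vCPUs and 4096 MiB of RAM
    overhead = live_migrate_overhead_kib(vcpus=2, ram_mib=4096)
    print("overhead: %d KiB (%d MiB)" % (overhead, overhead // 1024))
```

For a guest with 2 vCPUs and 4096 MiB of RAM this gives 2048 + 16384 = 18432 KiB, i.e. 18 MiB that has to be free before live migration can enable shadow mode.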
XendCheckpoint.save(sock.fileno(), dominfo, True, live, dst, node=node)
--->XendCheckpoint.save()
tools/python/xen/xend/XendCheckpoint.py
def save(fd, dominfo, network, live, dst, checkpoint=False, node=-1)
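1. First send the SIGNATURE, i.e. LinuxGuestRecord
2. To avoid a name clash when the VM is migrated to the local host, temporarily rename the original VM to migrating-domain_name
3. Send the configuration file
4. The actual migration is done in forkHelper(cmd, fd, saveInputHandler, False), which creates a child process to run xc_save while the main process continues with the steps below:
Here cmd is /usr/lib64/xen/bin/xc_save fd domid 0 0 str(int(live)|(int(hvm)<<2))
(the argument format of xc_save is /usr/lib64/xen/bin/xc_save iofd domid maxit maxf flags)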
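The flags expression int(live)|(int(hvm)<<2) packs two booleans into one integer: bit 0 is set for a live migration and bit 2 for an HVM guest. The sketch below shows how such a command line could be assembled. The helper names xc_save_flags and build_xc_save_cmd are hypothetical; the path and argument order are the ones quoted above.

```python
def xc_save_flags(live, hvm):
    """Bit 0: live migration, bit 2: HVM guest (expression taken from the text)."""
    return int(live) | (int(hvm) << 2)


def build_xc_save_cmd(fd, domid, live, hvm,
                      xc_save="/usr/lib64/xen/bin/xc_save"):
    """Build the argv list: xc_save iofd domid maxit maxf flags."""
    # maxit and maxf are passed as 0, as in the command quoted above
    return [xc_save, str(fd), str(domid), "0", "0",
            str(xc_save_flags(live, hvm))]


if __name__ == "__main__":
    for live in (False, True):
        for hvm in (False, True):
            print(live, hvm, xc_save_flags(live, hvm))
    print(" ".join(build_xc_save_cmd(fd=5, domid=7, live=True, hvm=True)))
```

A live migration of a PV guest therefore passes flags 1, and a live migration of an HVM guest passes flags 5.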
    def saveInputHandler(line, tochild):
        log.debug("In saveInputHandler %s", line)
        if line == "suspend":
            log.debug("Suspending %d ...", dominfo.getDomid())
            dominfo.shutdown('suspend')
            dominfo.waitForSuspend()
        if line in ('suspend', 'suspended'):
            dominfo.migrateDevices(network, dst, DEV_MIGRATE_STEP2,
                                   domain_name)
            log.info("Domain %d suspended.", dominfo.getDomid())
            dominfo.migrateDevices(network, dst, DEV_MIGRATE_STEP3,
                                   domain_name)
            if hvm:
                dominfo.image.saveDeviceModel()
        if line == "suspend":
            tochild.write("done\n")
            tochild.flush()
            log.debug('Written done')
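The handler above is one half of a small line-based protocol: the helper process prints "suspend", the parent does its work and answers "done". The sketch below reproduces just that handshake with a stand-in child written in Python. It is not how forkHelper is implemented; run_helper, the CHILD script and the use of the child's stdout are assumptions made for the example.

```python
import subprocess
import sys

# Stand-in for xc_save: asks the parent to suspend, then waits for the answer.
CHILD = r"""
import sys
print("suspend", flush=True)
reply = sys.stdin.readline().strip()
print("child got " + reply, flush=True)
"""


def save_input_handler(line, tochild):
    """Simplified handler: acknowledge a suspend request with 'done'."""
    if line == "suspend":
        # The real handler suspends the domain and migrates devices here.
        tochild.write("done\n")
        tochild.flush()


def run_helper(on_line):
    """Start the child, feed each line it prints to on_line, return all lines."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)
    seen = []
    for line in proc.stdout:
        line = line.rstrip("\n")
        seen.append(line)
        on_line(line, proc.stdin)
    proc.stdin.close()
    proc.wait()
    return seen


if __name__ == "__main__":
    print(run_helper(save_input_handler))
```

The important property is that the child blocks until it reads "done", so the memory copy of the final round cannot start before the parent has finished suspending the domain.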
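5. Send the state of the qemu device model (i.e. /var/lib/xen/qemu-save.7)
6. If the checkpoint argument is True, resume the suspended VM. Note that the domain_migrate() call shown above does not pass checkpoint, so it keeps its default value of False on the migration path and the source VM is not resumed there.
7. Finally, give the renamed VM its original name back
Next is the restore part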
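Steps 1 and 3 of save() put a fixed signature and the VM configuration at the start of the stream, which lets a receiver reject anything that is not a guest record. The sketch below shows one way to frame and check such a header. The signature string comes from the text; the 4-byte big-endian length prefix in front of the configuration and the function names write_header and read_header are assumptions made for this example.

```python
import io
import struct

SIGNATURE = b"LinuxGuestRecord"


def write_header(stream, config):
    """Write the signature, then the config with an assumed 4-byte length prefix."""
    stream.write(SIGNATURE)
    stream.write(struct.pack("!i", len(config)))   # assumed framing
    stream.write(config)


def read_header(stream):
    """Check the signature and return the config bytes that follow it."""
    sig = stream.read(len(SIGNATURE))
    if sig != SIGNATURE:
        raise ValueError("not a valid guest state stream: %r" % sig)
    (size,) = struct.unpack("!i", stream.read(4))
    return stream.read(size)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_header(buf, b"(domain (name demo))")
    buf.seek(0)
    print(read_header(buf))
```

Checking the signature first means a mismatched or truncated stream fails before any configuration is parsed.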
tools/python/xen/xend/server/SrvDaemon.py
When xend starts, it begins listening on port 8002:
Daemon()-->run()
-->relocate.listenRelocation()
-->tcp.TCPListener(RelocationProtocol, port, interface = interface, hosts_allow = hosts_allow)
The corresponding parameters are set in the xend configuration file:
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-hosts-allow '^localhost$ ^localhost\\.localdomain$')
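The xend-relocation-hosts-allow value is a space-separated list of regular expressions. A minimal sketch of how such a list could be checked against a connecting host name follows. The function names are hypothetical, treating an empty value as "allow everyone" is an assumption of this sketch, and the doubled backslash from the configuration file is written here as a single one in a Python raw string.

```python
import re


def parse_hosts_allow(value):
    """Split the space-separated setting into compiled regular expressions."""
    return [re.compile(pattern) for pattern in value.split()]


def host_allowed(host, patterns):
    """Return True if host matches any pattern (empty list: allow all, assumed)."""
    if not patterns:
        return True
    return any(p.match(host) for p in patterns)


if __name__ == "__main__":
    patterns = parse_hosts_allow(r"^localhost$ ^localhost\.localdomain$")
    for host in ("localhost", "localhost.localdomain", "example.com"):
        print(host, host_allowed(host, patterns))
```

With the setting shown above only the local machine can connect, so a real migration between two hosts needs the source host added to this list on the destination.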
After a connection is established, it is handled by RelocationProtocol, whose most important function is op_receive:
    def op_receive(self, name, _):
        if self.transport:
            self.send_reply(["ready", name])
            try:
                XendDomain.instance().domain_restore_fd(
                    self.transport.sock.fileno(), relocating=True)
            except:
                self.send_error()
                self.close()
        else:
            log.error(name + ": no transport")
            raise XendError(name + ": no transport")
XendDomain.domain_restore_fd() in turn does its work through XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating).
tools/python/xen/xend/XendCheckpoint.py
def restore(xd, fd, dominfo = None, paused = False, relocating = False):
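1. First receive and verify the SIGNATURE
2. Receive the configuration file
3. Make sure no VM on the local host has the same name or the same UUID as the VM being migrated
4. Build a XendDomainInfo structure from the received configuration file: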
dominfo = xd.restore_(vmconfig)-->XendDomainInfo.restore(config)
5. If node_to_cpu was set for the original VM, pin the vCPUs to the corresponding physical CPUs
6. Create the image and allocate memory (shadow_memory + memory_dynamic_max)
restore_image = image.create(dominfo, dominfo.info)
balloon.free(memory + shadow, dominfo)
7. The actual migration is done in forkHelper(cmd, fd, handler.handler, True), which creates a child process to run xc_restore while the main process continues with the steps below:
Here cmd is:
    cmd = map(str, [xen.util.auxbin.pathTo(XC_RESTORE),
                    fd, dominfo.getDomid(),
                    store_port, console_port, int(is_hvm), pae, apic])
(the argument format of xc_restore is /usr/lib64/xen/bin/xc_restore iofd domid store_evtchn console_evtchn hvm pae apic)
    def handler(self, line, _):
        m = re.match(r"^(store-mfn) (\d+)$", line)
        if m:
            self.store_mfn = int(m.group(2))
        else:
            m = re.match(r"^(console-mfn) (\d+)$", line)
            if m:
                self.console_mfn = int(m.group(2))
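The handler above picks two numbers out of the lines printed by xc_restore. The sketch below wraps the same two regular expressions in a small class so the parsing can be tried in isolation. The class name, the None defaults and the sample lines are assumptions for this example; the excerpt does not show the surrounding class.

```python
import re


class RestoreInputHandler:
    """Collects the store and console MFNs reported by the helper process."""

    def __init__(self):
        self.store_mfn = None
        self.console_mfn = None

    def handler(self, line, _):
        # Same patterns as in the excerpt above; unrelated lines are ignored.
        m = re.match(r"^(store-mfn) (\d+)$", line)
        if m:
            self.store_mfn = int(m.group(2))
        else:
            m = re.match(r"^(console-mfn) (\d+)$", line)
            if m:
                self.console_mfn = int(m.group(2))


if __name__ == "__main__":
    h = RestoreInputHandler()
    for line in ("some progress message", "store-mfn 12345", "console-mfn 67890"):
        h.handler(line, None)
    print(h.store_mfn, h.console_mfn)
```

These two values are what completeRestore() later receives as handler.store_mfn and handler.console_mfn.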
8. Receive the state of the qemu device model (i.e. /var/lib/xen/qemu-save.7)
9. Set the VM's cpuid and create the VM:
    restore_image.setCpuid()
    os.read(fd, 1)  # Wait for source to close connection
    dominfo.completeRestore(handler.store_mfn, handler.console_mfn):
        self._introduceDomain()
        self.image = image.create(self, self.info)
        if self.image:
            self.image.createDeviceModel(True)
        self._storeDomDetails()
        self._registerWatches()
        self.refreshShutdown()
10. Finally, start the VM:
    dominfo.waitForDevices()
    dominfo.unpause()
Source code of xc_save and xc_restore
I have not read it carefully yet.