Apart from also applying a new flavor, resize is almost identical to migration and goes through the same scheduling process. Below, only migration is covered; once migration works, resize works along with it.
I. Live migration requires shared storage, such as an NFS backend.
II. Migrating an instance (Dashboard -> Migrate Instance, # nova migrate instance_id)
== The Dashboard's Migrate Instance (cold migration; the instance must be shut down) simply runs nova migrate. It copies files over ssh and cannot target a specific host (nova-scheduler decides).
== The Dashboard's Live Migrate Instance (os-migrateLive) runs nova live-migration. It works over qemu+tcp and can target a specific host; --block-migrate defaults to false (step 7). Live migration needs shared storage, though block migration can be run without it. Also, on releases before Kilo, nova.conf must be changed to enable live migration.
The migration types are:
A.Non-live migration (sometimes referred to simply as ‘migration’).
The instance is shut down for a period of time to be moved to another hypervisor. In this case, the instance recognizes that it was rebooted.
B.Live migration (or ‘true live migration’).
Almost no instance downtime. Useful when the instances must be kept running during the migration. The different types of live migration are:
a.Shared storage-based live migration.
Both hypervisors have access to shared storage.
b.Block live migration.
No shared storage is required. Incompatible with read-only devices such as CD-ROMs and Configuration Drive (config_drive).
c.Volume-backed live migration.
Instances are backed by volumes rather than ephemeral disk, no shared storage is required, and migration is supported
(currently only available for libvirt-based hypervisors).
Enabling true live migration:
Prior to the Kilo release, the Compute service did not use the libvirt live migration function by default, because there is a risk that the migration process will never end if the guest operating system uses blocks on the disk faster than they can be migrated. To enable this function, add the following line to the [libvirt] section of the nova.conf file:
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED
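As a self-contained sketch of that edit, the snippet below applies the flag to a throwaway copy of the file, so it is safe to run anywhere; on a real compute node the file is /etc/nova/nova.conf and nova-compute needs a restart afterwards:

```shell
# Sketch: add the flag to the [libvirt] section of a temporary copy
# (real target: /etc/nova/nova.conf, followed by a nova-compute restart).
conf=$(mktemp)
printf '[DEFAULT]\n\n[libvirt]\nvirt_type=kvm\n' > "$conf"
# append the flag right after the [libvirt] section header
sed -i '/^\[libvirt\]/a live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED' "$conf"
grep '^live_migration_flag' "$conf"
```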
1. The Dashboard fails right away: nova cannot ssh to the other compute node to create the directory that will hold the instance being migrated:
Success: Scheduled migration (pending confirmation) of Instance: migration-test1
Error: Failed to launch instance "migration-test1": Please try again later [Error: Unexpected error while running command.
Command: ssh 192.168.1.123 mkdir -p /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834
Exit code: 255 Stdout: u'' Stderr: u'Host key verification failed.\r\n'].
2. The cause of this error: the nova account used for ssh has a nologin shell by default:
# cat /etc/passwd | grep nova
nova:x:162:162:OpenStack Nova Daemons:/var/lib/nova:/sbin/nologin
3. On every compute node: give the nova user a login shell, generate a key pair, and copy the public key to all the other compute nodes.
# usermod -s /bin/bash nova //change the nova user's shell
# passwd nova //set a password for nova
# su - nova //switch to nova
-bash-4.2$ ssh-keygen //generate a key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/nova/.ssh/id_rsa): //press Enter; the default file name id_rsa must be used
Created directory '/var/lib/nova/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/nova/.ssh/id_rsa.
Your public key has been saved in /var/lib/nova/.ssh/id_rsa.pub.
The key fingerprint is:
53:ef:8d:99:15:4f:d9:fd:a3:17:c8:e4:f8:89:44:b4 nova@compute1
The key's randomart image is:
+--[ RSA 2048]----+
| . |
| . . +|
| .E ...+|
| ...= .+.|
| S o.+.oo|
| ...o*o o|
| .=+.. |
| . |
| |
+-----------------+
-bash-4.2$ ssh-copy-id -i /var/lib/nova/.ssh/id_rsa.pub compute2 //append the public key to /var/lib/nova/.ssh/authorized_keys on the target compute node
nova@compute2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'compute2'"
and check to make sure that only the key(s) you wanted were added.
If you get ssh-copy-id: command not found, just copy the key over manually with scp:
-bash-4.2$ cat ~/.ssh/id_rsa.pub | ssh nova@compute2 'cat >> ~/.ssh/authorized_keys'
which is equivalent to running the following two commands:
① on the local machine: scp ~/.ssh/id_rsa.pub nova@compute2:~/
② on the remote machine: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
Also remember the permissions on the .ssh directory and the authorized_keys file (set the directory first; a recursive chmod 700 afterwards would override the 600 on authorized_keys):
-bash-4.2$ chmod 700 .ssh
-bash-4.2$ chmod 600 .ssh/authorized_keys
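Step 3's keygen-and-copy sequence can be sketched end to end against a throwaway directory; the temp path below is a stand-in for the real /var/lib/nova, and nothing here contacts another host:

```shell
# Simulate step 3 in a temp dir: generate a key pair non-interactively,
# append the public key the way ssh-copy-id does, and set permissions.
set -e
home=$(mktemp -d)                                  # stand-in for /var/lib/nova
mkdir -p "$home/.ssh"
ssh-keygen -t rsa -N '' -f "$home/.ssh/id_rsa" -q  # empty passphrase, no prompts
# ssh-copy-id equivalent: append the public key to authorized_keys
cat "$home/.ssh/id_rsa.pub" >> "$home/.ssh/authorized_keys"
chmod 700 "$home/.ssh"
chmod 600 "$home/.ssh/authorized_keys"
```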
With that done, running the migration from the Dashboard again fails with:
Error: Failed to launch instance "migration-test1": Please try again later [Error: Unexpected error while running command.
Command: ssh 192.168.1.123 mkdir -p /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834
Exit code: 255 Stdout: u'' Stderr: u'Permission denied, please try again.\r\nPermission denied, please try again.\r\nR].
ssh compute2 from the current compute node still asks for a password.
4. On each compute node, load nova's private key into the ssh-agent:
-bash-4.2$ eval `ssh-agent` //evaluate ssh-agent's output first, otherwise ssh-add reports: Could not open a connection to your authentication agent
Agent pid 40697
eval [arg ...]
The args are read and concatenated together into a single command. This command is then read and executed by the shell, and its exit status is
returned as the value of eval. If there are no args, or only null arguments, eval returns 0.
-bash-4.2$ ssh-add .ssh/id_rsa //add the private key to the ssh-agent
-bash-4.2$ ssh-add -l //list the keys that have been added, to verify
-bash-4.2$ ssh compute2 //test against compute node 2: passwordless login now works
5. (Optional) Stop nova's ssh from recording and checking hosts in known_hosts. The point of this: if the same IP is later used by a different system/machine, you won't have to delete the stale entry from known_hosts first.
-bash-4.2$ vim .ssh/config
Host *
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
a. StrictHostKeyChecking no: do not refuse the connection even if the host key has changed, i.e. no need to delete the matching known_hosts entry first.
b. UserKnownHostsFile=/dev/null: point known_hosts at an empty file, so host keys are never recorded.
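The effect of those two options can be checked without connecting anywhere, using ssh -G (which prints the options ssh would apply for a host) against a temporary config file; "compute2" below is just a name, nothing is contacted:

```shell
# Write the same two options to a throwaway config and let ssh -G show
# the effective (lowercased) values.
cfg=$(mktemp)
printf 'Host *\n  StrictHostKeyChecking no\n  UserKnownHostsFile /dev/null\n' > "$cfg"
ssh -G -F "$cfg" compute2 | grep -iE 'stricthostkeychecking|userknownhostsfile'
```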
6. Run the migration again from the Dashboard (Migrate Instance, # nova migrate instance_id).
The earlier failed migrations may have left the instance in the Error state, so reset it to Active first:
# nova reset-state --active d8db2011-217b-433d-aa80-06230203a834 //reset the instance state, from the controller node
# nova migrate --poll d8db2011-217b-433d-aa80-06230203a834 //run from the CLI; the Dashboard shows the same progress
Server migrating... 100% complete
Finished
# tail -f /var/log/nova/nova-compute.log //source compute node
2016-02-17 19:09:50.796 23958 INFO nova.compute.manager [req-e22d9e9a-96ce-4aa6-9027-cf29d0997fd4 None]
[instance: d8db2011-217b-433d-aa80-06230203a834] During sync_power_state the instance has a pending task (resize_migrating). Skip.
2016-02-17 19:09:51.030 23958 INFO nova.virt.libvirt.driver [req-019123e1-4f1d-4e3e-931a-ae528d193193 None]
[instance: d8db2011-217b-433d-aa80-06230203a834] Instance shutdown successfully after 3 seconds.
Note that the migration must be confirmed afterwards; the instance is already back up and running before confirmation, and a Revert option is offered so the migration can still be rolled back. With that, the migration is complete.
The cold migration flow: shut the instance down, rename the instance directory to instanceid_resize, copy the instance files over ssh, update the database, start the instance on the new compute node, show the confirmation prompt in the Dashboard, and on confirmation delete the original instance files, remove the old node's XML under qemu and define a new one on the new node (see the source code for the precise flow).
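The file-level part of this flow can be simulated locally; the temp directories below stand in for /var/lib/nova/instances on the two nodes, and the database/libvirt steps are left out:

```shell
# Local simulation of the cold-migrate file shuffle described above.
set -e
src=$(mktemp -d)     # stand-in for the source node's instances dir
dst=$(mktemp -d)     # stand-in for the target node's instances dir
inst=d8db2011-217b-433d-aa80-06230203a834
mkdir -p "$src/$inst"; echo fake-disk > "$src/$inst/disk"
# 1. rename the instance directory to <id>_resize on the source
mv "$src/$inst" "$src/${inst}_resize"
# 2. copy the files to the target (nova does this over ssh/scp)
mkdir -p "$dst/$inst"
cp "$src/${inst}_resize/"* "$dst/$inst/"
# 3. on "Confirm Resize/Migrate", the _resize dir on the source is removed
rm -r "$src/${inst}_resize"
```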
Note 1: Failed migrations from before the setup above can leave things out of sync: the instance is gone from the database, but the compute node still detects it. The compute node synchronizes instance power states every 10 minutes. The old node's nova-compute.log shows:
2016-02-18 10:26:39.111 89998 WARNING nova.compute.manager [-] While synchronizing instance power states,
found 2 instances in the database and 3 instances on the hypervisor.
Found that the instance's _resize directory is still on the old node:
drwxr-xr-x 2 nova nova   69 Feb 18 10:26 ca23a858-f0fc-406c-a14b-a25a47356361_resize
total 106G
-rw-rw---- 1 root root    0 Jan 21 19:33 console.log
-rw-r--r-- 1 root root 106G Feb 18 09:49 disk
-rw-r--r-- 1 nova nova   79 Nov 27 13:49 disk.info
-rw-r--r-- 1 nova nova 2.6K Jan 18 23:16 libvirt.xml
......
2016-02-18 16:35:34.071 89998 WARNING nova.compute.manager [-] While synchronizing instance power states,
found 0 instances in the database and 2 instances on the hypervisor.
# virsh list --all //list all instances the hypervisor knows about, including shut-off ones
 Id    Name                           State
----------------------------------------------------
 -     instance-0000008a              shut off
 -     instance-0000008c              shut off
If the instances have already been moved over, just undefine them on the old node:
# virsh undefine instance-0000008a //remove the domain definition, i.e. delete the instance; after this there were no more warnings about extra instances on the hypervisor. Done.
Note 2: Regarding the instance count on the (new node's) hypervisor: with the instance in the Error state and power state NOSTATE, the instance had not been started automatically on the new node, there was no libvirt.xml file, and the new node's nova-compute.log reported one instance missing:
2016-02-18 10:36:47.347 144832 WARNING nova.compute.manager [-] While synchronizing instance power states,
found 9 instances in the database and 8 instances on the hypervisor.
A hard reboot (which recreates the domain XML under /etc/libvirt/qemu/) then regenerates libvirt.xml, after which the database and hypervisor instance counts agree again.
III. Live migration (Dashboard -> Live Migrate Instance, # nova live-migration --block-migrate instance_id host)
On top of the cold-migration setup above (login shell for the nova user, key generation, public key distribution, private key added to the agent), do the following:
# vim /etc/nova/nova.conf //uncomment on the compute nodes:
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED
# systemctl restart openstack-nova-api
# systemctl restart openstack-nova-compute //restart nova-api and nova-compute
# nova live-migration --block-migrate d8db2011-217b-433d-aa80-06230203a834 compute2 //run the migration from the controller node
# tail -f /var/log/nova/nova-compute.log //source compute node log
2016-02-17 11:31:15.179 11959 ERROR nova.virt.libvirt.driver [-] [instance: d8db2011-217b-433d-aa80-06230203a834]
Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute2/system:
unable to connect to server at 'compute2:16509': Connection refused
# tail -f /var/log/nova/nova-compute.log //target compute node log
2016-02-17 11:31:14.250 46026 WARNING nova.virt.disk.vfs.guestfs [req-fa8c2e70-9679-493d-b0ba-170a9a0343d5 None]
Failed to close augeas aug_close: do_aug_close: you must call 'aug-init' first to initialize Augeas
2016-02-17 11:31:15.442 46026 WARNING nova.virt.libvirt.driver [-] [instance: d8db2011-217b-433d-aa80-06230203a834]
During wait destroy, instance disappeared.
2016-02-17 11:31:16.056 46026 INFO nova.virt.libvirt.driver [req-fa8c2e70-9679-493d-b0ba-170a9a0343d5 None]
[instance: d8db2011-217b-433d-aa80-06230203a834] Deleting instance files /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834_del
2016-02-17 11:31:16.057 46026 INFO nova.virt.libvirt.driver [req-fa8c2e70-9679-493d-b0ba-170a9a0343d5 None]
[instance: d8db2011-217b-433d-aa80-06230203a834] Deletion of /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834_del complete
# netstat -an | grep 16509 //on the target compute node this port is not open; libvirtd uses it to listen for TCP connections
# grep listen_ /etc/libvirt/libvirtd.conf //libvirtd config on the target node; the commented (#) lines are the defaults, the matching uncommented lines are the manual changes
# This is enabled by default, uncomment this to disable it
#listen_tls = 0
listen_tls = 0
# This is disabled by default, uncomment this to enable it.
#listen_tcp = 1
listen_tcp = 1
#listen_addr = "192.168.0.1"
listen_addr = "0.0.0.0"
......
# Override the port for accepting secure TLS connections
# This can be a port number, or service name
#
#tls_port = "16514"
# Override the port for accepting insecure TCP connections
# This can be a port number, or service name
#
#tcp_port = "16509"
......
#auth_tcp = "sasl"
auth_tcp = "none"
# grep LIBVIRTD_ARGS /etc/sysconfig/libvirtd //enable TCP listening; the commented line is the default
# Listen for TCP/IP connections
# NB. must setup TLS/SSL keys prior to using this
#LIBVIRTD_ARGS="--listen"
LIBVIRTD_ARGS="--listen"
# systemctl restart libvirtd //restart libvirtd
# netstat -an | grep 16509 //confirm the port is now open
tcp        0      0 0.0.0.0:16509           0.0.0.0:*               LISTEN
Run the migration from the command line again:
# tail -f /var/log/nova/nova-compute.log //当前计算节点日志
2016-02-17 15:35:34.402 11959 ERROR nova.virt.libvirt.driver [-] [instance: d8db2011-217b-433d-aa80-06230203a834]
Live Migration failure: internal error: unable to execute QEMU command 'migrate': this feature or command is not currently supported
# tail -f /var/log/nova/nova-compute.log //目标计算节点日志
2016-02-17 15:35:32.898 46026 WARNING nova.virt.disk.vfs.guestfs [req-d810867d-9880-48cb-8b24-3a9ed357178d None]
Failed to close augeas aug_close: do_aug_close: you must call 'aug-init' first to initialize Augeas
2016-02-17 15:35:34.183 46026 INFO nova.compute.manager [-] [instance: d8db2011-217b-433d-aa80-06230203a834]
VM Started (Lifecycle Event)
2016-02-17 15:35:34.302 46026 INFO nova.compute.manager [-] [instance: d8db2011-217b-433d-aa80-06230203a834]
During the sync_power process the instance has moved from host compute5 to host compute4
2016-02-17 15:35:34.400 46026 INFO nova.compute.manager [-] [instance: d8db2011-217b-433d-aa80-06230203a834]
VM Stopped (Lifecycle Event)
2016-02-17 15:35:34.519 46026 INFO nova.compute.manager [-] [instance: d8db2011-217b-433d-aa80-06230203a834]
During the sync_power process the instance has moved from host compute5 to host compute4
2016-02-17 15:35:34.658 46026 WARNING nova.virt.libvirt.driver [-] [instance: d8db2011-217b-433d-aa80-06230203a834]
During wait destroy, instance disappeared.
2016-02-17 15:35:35.232 46026 INFO nova.virt.libvirt.driver [req-d810867d-9880-48cb-8b24-3a9ed357178d None]
[instance: d8db2011-217b-433d-aa80-06230203a834] Deleting instance files /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834_del
2016-02-17 15:35:35.233 46026 INFO nova.virt.libvirt.driver [req-d810867d-9880-48cb-8b24-3a9ed357178d None]
[instance: d8db2011-217b-433d-aa80-06230203a834] Deletion of /var/lib/nova/instances/d8db2011-217b-433d-aa80-06230203a834_del complete
After some searching, it appeared that on CentOS 7 the needed package may be qemu-kvm-rhev rather than qemu-kvm, so configure the repo and install that package on both the controller and the compute nodes:
# vim /etc/yum.repos.d/qemu-kvm-rhev.repo
[qemu-kvm-rhev]
name=oVirt rebuilds of qemu-kvm-rhev
baseurl=http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/
mirrorlist=http://resources.ovirt.org/pub/yum-repo/mirrorlist-ovirt-3.5-el7Server
enabled=1
skip_if_unavailable=1
gpgcheck=0
# yum -y install qemu-kvm-rhev
Run the migration from the command line once more: same error. To be continued......