Environment
Version: RDO OpenStack Kilo
ceph version 0.94.7
Background
First, a look at OpenStack Nova's traditional instance snapshot flow (the implementation is the same regardless of what Nova's backend storage is):
virt/libvirt/driver.py -> def snapshot (the call eventually lands here)
1. Get the format of the instance's disk file;
   CONF.libvirt.snapshot_image_format can forcibly override it;
   if the image backend type is lvm or rbd, the image uploaded to Glance is always raw.
2. Create the snapshot metadata; to Nova, a snapshot is just an image.
3. Generate the snapshot name with uuid.uuid4().hex.
4. Get the instance's power state (a live snapshot cannot be taken while the instance is shut down; this feeds into the live-snapshot check below).
5. Determine whether a live snapshot is supported:
   both CONF.ephemeral_storage_encryption.enabled and CONF.workarounds.disable_libvirt_livesnapshot must be false.
6. Detach the instance's PCI devices and SR-IOV ports (only when the virt type is not lxc and the instance is running or paused).
7. Change the instance's task state to IMAGE_PENDING_UPLOAD.
8. Create the temporary directory used for the snapshot.
9. Run snapshot_extract (essentially a qemu-img convert; see the command sketch after this list).
10. Re-attach the instance's PCI devices and SR-IOV ports.
11. Upload to Glance.
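A rough shell-level equivalent of steps 9 and 11 for an RBD-backed instance (only a sketch: the paths, pool names, user and image names below are illustrative placeholders; Nova actually drives this through its images.convert_image helper and the Glance API rather than these exact commands):
# Step 9: pull the instance disk out of Ceph into the local snapshot directory as raw
qemu-img convert -O raw \
    rbd:vms/<instance_uuid>_disk:id=cinder:conf=/etc/ceph/ceph.conf \
    /var/lib/nova/instances/snapshots/tmpXXXXXX/<snapshot_name>
# Step 11: upload the local file to Glance, which then writes it back into Ceph
glance image-create --name <snapshot_name> --disk-format raw \
    --container-format bare --file /var/lib/nova/instances/snapshots/tmpXXXXXX/<snapshot_name>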
Doesn't this seem odd? If both Nova and Glance are backed by Ceph, a snapshot pulls the instance disk out of Ceph onto the local filesystem and then uploads it right back into Ceph. Could the whole thing be done directly inside the Ceph cluster instead?
Implementation
This has already been implemented on the master branch; see https://blueprints.launchpad.net/nova/+spec/rbd-instance-snapshots
The rough steps are as follows:
# Add the following to the glance conf file
[DEFAULT]
show_image_direct_url = True # expose the image's backend location URL
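With show_image_direct_url enabled, the v2 image API exposes the image's backend location, which for a Ceph-backed Glance is an rbd:// URL. It should show up roughly like this (image ID, fsid and pool are placeholders):
glance --os-image-api-version 2 image-show <image_id>
# ...
# | direct_url | rbd://<fsid>/images/<image_id>/snap |
Nova parses this URL to work out which cluster and pool the base image lives in.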
# Part of the relevant code
virt/libvirt/driver.py
try:
    update_task_state(task_state=task_states.IMAGE_UPLOADING,
                      expected_state=task_states.IMAGE_PENDING_UPLOAD)
    # Try the RBD direct-snapshot path first; if it fails, fall back to
    # the original snapshot logic in the except branch below
    metadata['location'] = snapshot_backend.direct_snapshot(
        context, snapshot_name, image_format, image_id,
        instance.image_ref)
    self._snapshot_domain(context, live_snapshot, virt_dom, state,
                          instance)
    self._image_api.update(context, image_id, metadata,
                           purge_props=False)
except (NotImplementedError, exception.ImageUnacceptable,
        exception.Forbidden) as e:
    if type(e) != NotImplementedError:
        LOG.warning(_LW('Performing standard snapshot because direct '
                        'snapshot failed: %(error)s'), {'error': e})
    failed_snap = metadata.pop('location', None)
    if failed_snap:
        failed_snap = {'url': str(failed_snap)}
    snapshot_backend.cleanup_direct_snapshot(failed_snap,
                                             also_destroy_volume=True,
                                             ignore_errors=True)
    update_task_state(task_state=task_states.IMAGE_PENDING_UPLOAD,
                      expected_state=task_states.IMAGE_UPLOADING)
    snapshot_directory = CONF.libvirt.snapshots_directory
    fileutils.ensure_tree(snapshot_directory)
# Use Ceph's own snapshot mechanism to implement the instance snapshot
virt/libvirt/imagebackend.py
def _get_parent_pool(self, context, base_image_id, fsid):
    parent_pool = None
    try:
        # The easy way -- the image is an RBD clone, so use the parent
        # images' storage pool
        parent_pool, _im, _snap = self.driver.parent_info(self.rbd_name)
    except exception.ImageUnacceptable:
        # The hard way -- the image is itself a parent, so ask Glance
        # where it came from
        LOG.debug('No parent info for %s; asking the Image API where its '
                  'store is', base_image_id)
        try:
            image_meta = IMAGE_API.get(context, base_image_id,
                                       include_locations=True)
        except Exception as e:
            LOG.debug('Unable to get image %(image_id)s; error: %(error)s',
                      {'image_id': base_image_id, 'error': e})
            image_meta = {}

        # Find the first location that is in the same RBD cluster
        for location in image_meta.get('locations', []):
            try:
                parent_fsid, parent_pool, _im, _snap = \
                    self.driver.parse_url(location['url'])
                if parent_fsid == fsid:
                    break
                else:
                    parent_pool = None
            except exception.ImageUnacceptable:
                continue

    if not parent_pool:
        raise exception.ImageUnacceptable(
            _('Cannot determine the parent storage pool for %s; '
              'cannot determine where to store images') %
            base_image_id)

    return parent_pool
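At the CLI level, the "easy way" above corresponds to reading the clone's parent out of rbd info; the pool and image names here assume the usual Nova RBD naming (a vms pool and <instance_uuid>_disk images), which may differ in your deployment:
rbd info vms/<instance_uuid>_disk
# ...
#         parent: images/<image_id>@snap
# the parent sits in the images pool, so that is where Glance stores its images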
def direct_snapshot(self, context, snapshot_name, image_format,
                    image_id, base_image_id):
    """Creates an RBD snapshot directly.
    """
    fsid = self.driver.get_fsid()
    # NOTE(nic): Nova has zero comprehension of how Glance's image store
    # is configured, but we can infer what storage pool Glance is using
    # by looking at the parent image. If using authx, write access should
    # be enabled on that pool for the Nova user
    parent_pool = self._get_parent_pool(context, base_image_id, fsid)

    # Snapshot the disk and clone it into Glance's storage pool. librbd
    # requires that snapshots be set to "protected" in order to clone them
    self.driver.create_snap(self.rbd_name, snapshot_name, protect=True)
    location = {'url': 'rbd://%(fsid)s/%(pool)s/%(image)s/%(snap)s' %
                       dict(fsid=fsid,
                            pool=self.pool,
                            image=self.rbd_name,
                            snap=snapshot_name)}
    try:
        self.driver.clone(location, image_id, dest_pool=parent_pool)
        # Flatten the image, which detaches it from the source snapshot;
        # after flattening it is effectively a complete, standalone copy
        self.driver.flatten(image_id, pool=parent_pool)
    finally:
        # all done with the source snapshot, clean it up
        self.cleanup_direct_snapshot(location)

    # Glance makes a protected snapshot called 'snap' on uploaded
    # images and hands it out, so we'll do that too. The name of
    # the snapshot doesn't really matter, this just uses what the
    # glance-store rbd backend sets (which is not configurable).
    self.driver.create_snap(image_id, 'snap', pool=parent_pool,
                            protect=True)
    return ('rbd://%(fsid)s/%(pool)s/%(image)s/snap' %
            dict(fsid=fsid, pool=parent_pool, image=image_id))
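By hand, the sequence in direct_snapshot corresponds roughly to the following rbd commands (Nova does all of this through librbd, not the CLI; the pool and image names are illustrative):
# snapshot the instance disk and protect it so it can be cloned
rbd snap create vms/<instance_uuid>_disk@<snapshot_name>
rbd snap protect vms/<instance_uuid>_disk@<snapshot_name>
# clone into Glance's pool under the new image ID, then flatten to detach it from the source snapshot
rbd clone vms/<instance_uuid>_disk@<snapshot_name> images/<image_id>
rbd flatten images/<image_id>
# the source snapshot is no longer needed
rbd snap unprotect vms/<instance_uuid>_disk@<snapshot_name>
rbd snap rm vms/<instance_uuid>_disk@<snapshot_name>
# mimic the protected 'snap' snapshot that glance-store creates on uploaded images
rbd snap create images/<image_id>@snap
rbd snap protect images/<image_id>@snap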
def cleanup_direct_snapshot(self, location, also_destroy_volume=False,
                            ignore_errors=False):
    """Unprotects and destroys the named snapshot.

    With also_destroy_volume=True, it will also cleanup/destroy the parent
    volume. This is useful for cleaning up when the target volume fails
    to snapshot properly.
    """
    if location:
        _fsid, _pool, _im, _snap = self.driver.parse_url(location['url'])
        self.driver.remove_snap(_im, _snap, pool=_pool, force=True,
                                ignore_errors=ignore_errors)
        if also_destroy_volume:
            self.driver.destroy_volume(_im, pool=_pool)
Possible errors
1. The Ceph user cinder that Nova uses has no write permission on Glance's Ceph pool (this is what you end up with if you followed the official Ceph docs).
Fix: grant the client.cinder user write access to the images pool on the fly (no restart needed):
ceph auth caps client.cinder \
    mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images'
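To confirm the change took effect, the caps can be checked afterwards (example output trimmed, key redacted):
ceph auth get client.cinder
# [client.cinder]
#         key = <redacted>
#         caps mon = "allow r"
#         caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images"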
Reposted from: https://blog.51cto.com/iceyao/1921487