GitLab在docker和Kubernetes之间折腾

1. 概述

最近用上了Kubernetes,刚好又要求Gitlab AutoDev配合Kubernetes,所以将旧的Gitlab升级下,并迁移成了helm版本。
但是在使用过程中,发现并不如docker版本稳定,特别是pod在重新分配后,在节点上pull image失败问题,即使配置了镜像加速,虽然有办法解决(我是多个节点去pull,某个成功后,tag,push到私有仓库,再到pod分配节点pull,tag)。
另外,helm版本的gitlab组件全部分离,相互依赖,非常复杂。在不熟悉gitlab内部原理和通信的情况下,排查问题非常困难。
于是,又将helm版本迁移到docker版本,反而是遇到了坑。本想放弃迁移回docker版本,谁想在2018年12月11日,git突然push失败了,而且问题排查半天,也没发现有用的信息。转而解决在docker版本下恢复helm版本的gitlab数据对象存储问题,好在解决了,虽然也有些问题,至少能用了。

2. Gitlab从docker迁移到Kubernetes

Gitlab docker: GitLab-CE 9.5.4 Gitlab Kubernetes: GitLab-CE 11.4.3

2.1 备份恢复过程

进入gitlab docker:

docker exec -it mygitlab /bin/bash

执行备份:


gitlab-rake gitlab:backup:create &  # 避免超时自动退出

helm安装gitlab:

helm upgrade --install gitlab --timeout 600 --namespace devops . \
--set gitlab.migrations.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ce \
--set gitlab.sidekiq.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce \
--set gitlab.unicorn.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-unicorn-ce \
--set gitlab.unicorn.workhorse.image=registry.gitlab.com/gitlab-org/build/cng/gitlab-workhorse-ce \
--set gitlab.task-runner.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-task-runner-ce \
--set certmanager-issuer.email=29ygq@sina.com \
--set gitlab.migrations.initialRootPassword.key="Git@domain.com" \
--set glabal.initialRootPassword.key="Git@domain.com" \
--set global.hosts.domain=domain.com \
--set global.smtp.password.secret=163mail \
--set global.time_zone="Asia/Shanghai" \
--set gitlab.gitaly.persistence.storageClass=ceph-rbd \
--set gitlab.gitaly.persistence.size=20Gi \
--set gitlab.gitaly.persistence.storageClass=ceph-rbd \
--set postgresql.persistence.size=5Gi \
--set postgresql.persistence.storageClass=ceph-rbd \
--set minio.persistence.size=5Gi \
--set minio.persistence.storageClass=ceph-rbd \
--set redis.persistence.size=1Gi \
--set redis.persistence.storageClass=ceph-rbd \
--set nginx-ingress.enabled=false \
--set prometheus.install=false \
--set certmanager.install=false \
--set gitlab.gitlab-shell.service.externalPort=2222 \
--set gitlab.gitlab-shell.service.internalPort=2222 \
--set gitlab.gitlab-runner.rbac.clusterWideAccess=true \
--set gitlab.gitlab-runner.rbac.create=true \
--set gitlab.gitlab-runner.runners.privileged=true 

将备份文件拷出来,并同步到Kubernetes master节点:

docker cp mygitlab:/var/opt/gitlab/backups/1541148206_2018_11_02_9.5.4_gitlab_backup.tar /tmp/
rsync -avzP /tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar root@master_ip:/tmp/

进入Kubernetes上的gitlab task-runner pod中:

kubectl exec <task-runner pod name> -it /bin/bash 

执行恢复:

backup-utility --restore -f file:///tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar &

2.2 恢复失败解决

恢复中断:

[root@lab1 tmp]# kubectl describe pod gitlab-task-runner-6cfc59d65d-zg64k  -n devops
Name:               gitlab-task-runner-6cfc59d65d-zg64k
Namespace:          devops
Priority:           0
PriorityClassName:  <none>
Node:               lab1/
Start Time:         Wed, 31 Oct 2018 17:04:30 +0800
Labels:             app=task-runner
                    pod-template-hash=6cfc59d65d
                    release=gitlab
Annotations:        checksum/config: dcbb3fa285b4af4cc7d76913713f826335d0826672270a02beeda74bdac8f528
                    cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status:             Failed
Reason:             Evicted
Message:            The node was low on resource: ephemeral-storage. Container task-runner was using 11874956Ki, which exceeds its request of 0. 

Gitlab上有issue说明,但是当前官方版本还没有修复。 https://gitlab.com/charts/gitlab/issues/705 https://github.com/kubernetes/enhancements/issues/361

我们按他的想法,给task-runner添加持久化存储。

pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-task-runner
  namespace: devops
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd 
  resources:
    requests:
      storage: 20Gi

kubectl apply -f pvc.yaml

kubectl edit deploy gitlab-task-runner -n devops

找到相关地方,添加挂载点和存储点:

        volumeMounts:
        - mountPath: /srv/gitlab/tmp
          name: data-storage-tmp
      volumes:
      - name: data-storage-tmp
        persistentVolumeClaim:
          claimName: gitlab-task-runner          

待POD重新启来,再将备份文件拷进去,重新恢复:

kubectl cp /tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar devops/gitlab-task-runner-5d859b8c8d-gbggx:/srv/gitlab/tmp/
kubectl exec -it gitlab-task-runner-5d859b8c8d-gbggx -n devops -- backup-utility --restore -f file:///srv/gitlab/tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar

恢复后,登录系统,发现仓库为空。

尝试将目录repositories同步过去恢复。

docker目录为/var/opt/gitlab/git-data/repositories k8s里目录为/home/git/repositories

恢复后数据可用。

3. Gitlab从Kubernetes迁移到docker

Gitlab docker: GitLab-CE 11.5.2 Gitlab chart: 1.3.2 GitLab-CE 11.5.2

3.1 备份恢复过程

因为有好几G的数据量了,为了加快备份速度,不备份repositories,而是手动打包repositories。

kubectl exec <task-runner pod name> -i /bin/bash backup-utility --skip repositories

备份好的文件类似下面文件名: 1544529772_2018_12_11_11.5.2_gitlab_backup.tar

提前安装docker版gitlab,当时我装的时候版本为11.5.2。

sudo docker run --detach \
  --hostname gitlab.domain.com \
  --publish 443:443 --publish 80:80 --publish 10022:22 \
  --name gitlab \
  --restart always \
  --volume /data/gitlab/config:/etc/gitlab \
  --volume /data/gitlab/logs:/var/log/gitlab \
  --volume /data/gitlab/data:/var/opt/gitlab \
  gitlab/gitlab-ce:latest

因为恢复过程中,它会寻找/data/gitlab/data/backups/目录下的备份打包文件,因此,我们可以手动准备已经解压好的备份文件,然后打包一个小文件为1544529772_2018_12_11_11.5.2_gitlab_backup.tar以减少恢复过程中的解压时间(特别对于多次恢复)。

恢复命令: docker exec -it gitlab gitlab-rake gitlab:backup:restore

直接将备份恢复后,在页面上随便都会报500错误,日志提示'Object Storage is not enabled',需要按下文开启lfs功能再恢复数据。

3.2 500错误

  • 情况一:

上文提到的'Object Storage is not enabled'gitlab.rb需要开启lfs功能。

### Job Artifacts
# gitlab_rails['artifacts_enabled'] = true
# gitlab_rails['artifacts_path'] = "/var/opt/gitlab/gitlab-rails/shared/artifacts"
####! Job artifacts Object Store
####! Docs: https://docs.gitlab.com/ee/administration/job_artifacts.html#using-object-storage
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_direct_upload'] = true
gitlab_rails['artifacts_object_store_background_upload'] = true
gitlab_rails['artifacts_object_store_proxy_download'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = "artifacts"
gitlab_rails['artifacts_object_store_connection'] = {
  'provider' => 'AWS',
  'region' => 'eu-west-1',
  'aws_access_key_id' => 'Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld',
  'aws_secret_access_key' => 'ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT',
  # # The below options configure an S3 compatible host instead of AWS
    'aws_signature_version' => 4, # For creation of signed URLs. Set to 2 if provider does not support v4.
    'endpoint' => 'http://minio地址:9000', # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces
    'host' => 'localhost',
    'path_style' => true # Use 'host/bucket_name/object' instead of 'bucket_name.host/object'
}

### Git LFS
gitlab_rails['lfs_enabled'] = true
gitlab_rails['lfs_storage_path'] = "/var/opt/gitlab/gitlab-rails/shared/lfs-objects"
gitlab_rails['lfs_object_store_enabled'] = true
gitlab_rails['lfs_object_store_direct_upload'] = true
gitlab_rails['lfs_object_store_background_upload'] = true
gitlab_rails['lfs_object_store_proxy_download'] = true
gitlab_rails['lfs_object_store_remote_directory'] = "lfs-objects"
gitlab_rails['lfs_object_store_connection'] = {
  'provider' => 'AWS',
  'region' => 'eu-west-1',
  'aws_access_key_id' => 'Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld',
  'aws_secret_access_key' => 'ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT#',
  # # The below options configure an S3 compatible host instead of AWS
   'aws_signature_version' => 4, # For creation of signed URLs. Set to 2 if provider does not support v4.
  # 'endpoint' => 'https://s3.amazonaws.com', # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces
    'host' => 'localhost',
    'endpoint' => 'http://minio地址:9000',
    'path_style' => true
#   # 'path_style' => false # Use 'host/bucket_name/object' instead of 'bucket_name.host/object'
}

### GitLab uploads
###! Docs: https://docs.gitlab.com/ee/administration/uploads.html
gitlab_rails['uploads_storage_path'] = "/var/opt/gitlab/gitlab-rails/public"
gitlab_rails['uploads_base_dir'] = "uploads/-/system"
gitlab_rails['uploads_object_store_enabled'] = true
gitlab_rails['uploads_object_store_direct_upload'] = true
gitlab_rails['uploads_object_store_background_upload'] = true
gitlab_rails['uploads_object_store_proxy_download'] = true
gitlab_rails['uploads_object_store_remote_directory'] = "uploads"
gitlab_rails['uploads_object_store_connection'] = {
   'provider' => 'AWS',
   'region' => 'eu-west-1',
   'aws_access_key_id' => 'Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld',
   'aws_secret_access_key' => 'ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT',
#   # # The below options configure an S3 compatible host instead of AWS
    'host' => 'localhost',
    'aws_signature_version' => 4, # For creation of signed URLs. Set to 2 if provider does not support v4.
    'endpoint' => 'http://minio地址:9000', # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces
    'path_style' => true # Use 'host/bucket_name/object' instead of 'bucket_name.host/object'
}

恢复备份数据后,gitlab的docker中执行以下命令迁移数据至对象存储:

 # Avatars
 gitlab-rake "gitlab:uploads:migrate[AvatarUploader, Project, :avatar]"
 gitlab-rake "gitlab:uploads:migrate[AvatarUploader, Group, :avatar]"
 gitlab-rake "gitlab:uploads:migrate[AvatarUploader, User, :avatar]"
 
 # Attachments
 gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Note, :attachment]"
 gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Appearance, :logo]"
 gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Appearance, :header_logo]"
 
 # Markdown
 gitlab-rake "gitlab:uploads:migrate[FileUploader, Project]"
 gitlab-rake "gitlab:uploads:migrate[PersonalFileUploader, Snippet]"
 gitlab-rake "gitlab:uploads:migrate[NamespaceFileUploader, Snippet]"
 gitlab-rake "gitlab:uploads:migrate[FileUploader, MergeRequest]"

然后再将repositories覆盖到/data/gitlab/data/git-data/repositories/,并进gitlab docker把目录属主改为git。 在gitlab UI界面,部分项目可能提示为空项目,可以用高权限的维护者,把仓库pull下来,修改下,再push,即可解决项目为空问题。

  • 情况二: 有可能为DB数据关系错误,需要升级数据库关系

输入以下指令查看数据升级状态

gitlab-rake db:migrate:status

果然发现有一些显示为Down,显示为Up即表示正常同,再执行数据库关系升级

gitlab-rake db:migrate

执行完成再重复重建、重启命令,问题解决。

4. helm版本问题记录

最后,记录下helm版本,push失败问题。日后再研究。

$ git push
warning: redirecting to https://gitlab.domain.com/devops/docs.git/
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 3.65 KiB | 1.22 MiB/s, done.
Total 7 (delta 2), reused 0 (delta 0)
remote: GitLab: Failed to authorize your Git request: internal API unreachable
To https://gitlab.domain.com/devops/docs
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://gitlab.domain.com/devops/docs'

参考资料: [1] https://docs.gitlab.com/omnibus/README.html [2] https://docs.gitlab.com/ce/raketasks/backup_restore.html [3] https://docs.gitlab.com/omnibus/settings/backups.html [4] https://docs.gitlab.com/ce/workflow/lfs/lfs_administration.html [5] https://docs.gitlab.com/ce/administration/uploads.html [6] https://docs.gitlab.com/ce/administration/raketasks/check.html