1、重新安装kubernetes
【注】参照我的安装Kubernetes入门篇[Kubernetes百分百安装]只需要安装到 [4-8]子节点加入主节点,后面的k8s官方可视化工具不用安装。
2、安装KubeSphere前置环境
1、nfs文件系统
1-1、安装nfs-server
# 在每个机器。
yum install -y nfs-utils
#暴露nfs目录 在master 执行以下命令
echo "/nfs/data/ *(insecure,rw,sync,no_root_squash)" > /etc/exports
# 执行以下命令,启动 nfs 服务;创建共享目录
mkdir -p /nfs/data
# 在master执行
systemctl enable rpcbind
systemctl enable nfs-server
systemctl start rpcbind
systemctl start nfs-server
# 使配置生效
exportfs -r
#检查配置是否生效
exportfs
1-2、配置nfs-client(选做)
让所有节点都能同步
#IP需要改成自己的master
showmount -e 172.31.0.3
#IP需要改成自己的nfs目录
mkdir -p /nfs/data
#IP需要改成自己的master,改成自己的nfs目录
mount -t nfs 172.31.0.3:/nfs/data /nfs/data
1-3、配置默认存储
配置动态供应的默认存储类
## 创建了一个存储类
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-storage
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
archiveOnDelete: "true" ## 删除pv的时候,pv的内容是否要备份
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-client-provisioner
labels:
app: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-client-provisioner
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/nfs-subdir-external-provisioner:v4.0.2
# resources:
# limits:
# cpu: 10m
# requests:
# cpu: 10m
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: k8s-sigs.io/nfs-subdir-external-provisioner
- name: NFS_SERVER
value: 172.31.0.3 # 指定自己master nfs服务器地址
- name: NFS_PATH
value: /nfs/data ## nfs服务器共享的目录
volumes:
- name: nfs-client-root
nfs:
server: 172.31.0.3 # 指定自己master nfs服务器地址
path: /nfs/data
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
roleRef:
kind: Role
name: leader-locking-nfs-client-provisioner
apiGroup: rbac.authorization.k8s.io
#查看存储类
kubectl get sc
1-4、创建PVC,绑定存储类
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: nginx-pvc #名称
spec:
accessModes:
- ReadWriteMany #申请权限
resources:
requests:
storage: 200Mi #申请大小
#storageClassName: nfs #申请是哪个PV池名称,不写就是默认存储类
2、metrics-server
集群指标监控组件
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
rbac.authorization.k8s.io/aggregate-to-admin: "true"
rbac.authorization.k8s.io/aggregate-to-edit: "true"
rbac.authorization.k8s.io/aggregate-to-view: "true"
name: system:aggregated-metrics-reader
rules:
- apiGroups:
- metrics.k8s.io
resources:
- pods
- nodes
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
rules:
- apiGroups:
- ""
resources:
- pods
- nodes
- nodes/stats
- namespaces
- configmaps
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server-auth-reader
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:metrics-server
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
ports:
- name: https
port: 443
protocol: TCP
targetPort: https
selector:
k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: metrics-server
strategy:
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --kubelet-insecure-tls
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
image: registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/metrics-server:v0.4.3
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /livez
port: https
scheme: HTTPS
periodSeconds: 10
name: metrics-server
ports:
- containerPort: 4443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /readyz
port: https
scheme: HTTPS
periodSeconds: 10
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-dir
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
serviceAccountName: metrics-server
volumes:
- emptyDir: {}
name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
version: v1beta1
versionPriority: 100
3、安装KubeSphere
3-1、下载核心文件
如果下载不到,请复制附录的内容
#没有wget先安装
yum install -y wget
#再安装vim,因为vi会有问题不能高亮
yum install -y vim
wget https://github.com/kubesphere/ks-installer/releases/download/v3.1.1/kubesphere-installer.yaml
wget https://github.com/kubesphere/ks-installer/releases/download/v3.1.1/cluster-configuration.yaml
3-2、修改cluster-configuration
在 cluster-configuration.yaml中指定我们需要开启的功能
参照官网“启用可插拔组件”
https://kubesphere.com.cn/docs/pluggable-components/overview/
打开所有功能,所有false都改成true,如:开启etcd监控,ip改成自己master的。
【注】 metrics_server不用打开,因为之前已经安装了,不然还会重新安装一遍,这是从官网下载很慢,所以前面就先安装了。
打开network,改成我们安装的calico
可以复制我下面的cluster-configuration.yaml
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
name: ks-installer
namespace: kubesphere-system
labels:
version: v3.2.0
spec:
persistence:
storageClass: "" # If there is no default StorageClass in your cluster, you need to specify an existing StorageClass here.
authentication:
local_registry: "" # Add your private registry address if it is needed.
# dev_tag: "" # Add your kubesphere image tag you want to install, by default it's same as ks-install release version.
etcd:
monitoring: ture # Enable or disable etcd monitoring dashboard installation. You have to create a Secret for etcd before you enable it.
endpointIps: 172.31.0.3 # etcd cluster EndpointIps. It can be a bunch of IPs here.
port: 2379 # etcd port.
tlsEnable: true
common:
core:
console:
enableMultiLogin: true # Enable or disable simultaneous logins. It allows different users to log in with the same account at the same time.
port: 30880
type: NodePort
# apiserver: # Enlarge the apiserver and controller manager's resource requests and limits for the large cluster
# resources: {}
# controllerManager:
# resources: {}
redis:
enabled: true
volumeSize: 2Gi # Redis PVC size.
openldap:
enabled: true
volumeSize: 2Gi # openldap PVC size.
minio:
volumeSize: 20Gi # Minio PVC size.
endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090 # Prometheus endpoint to get metrics data.
GPUMonitoring: # Enable or disable the GPU-related metrics. If you enable this switch but have no GPU resources, Kubesphere will set it to zero.
enabled: ture
gpu: # Install GPUKinds. The default GPU kind is nvidia.com/gpu. Other GPU kinds can be added here according to your needs.
kinds:
- resourceName: "nvidia.com/gpu"
resourceType: "GPU"
default: true
es: # Storage backend for logging, events and auditing.
type: NodePort
# apiserver: # Enlarge the apiserver and controller manager's resource requests and limits for the large cluster
# resources: {}
# controllerManager:
# resources: {}
redis:
enabled: true
volumeSize: 2Gi # Redis PVC size.
openldap:
enabled: true
volumeSize: 2Gi # openldap PVC size.
minio:
volumeSize: 20Gi # Minio PVC size.
endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090 # Prometheus endpoint to get metrics data.
GPUMonitoring: # Enable or disable the GPU-related metrics. If you enable this switch but have no GPU resources, Kubesphere will set it to zero.
enabled: ture
gpu: # Install GPUKinds. The default GPU kind is nvidia.com/gpu. Other GPU kinds can be added here according to your needs.
kinds:
- resourceName: "nvidia.com/gpu"
resourceType: "GPU"
default: true
es: # Storage backend for logging, events and auditing.
# master:
namespace: kubesphere-system
labels:
version: v3.2.0
spec:
persistence:
storageClass: "" # If there is no default StorageClass in your cluster, you need to specify an existing StorageClass here.
authentication:
local_registry: "" # Add your private registry address if it is needed.
# dev_tag: "" # Add your kubesphere image tag you want to install, by default it's same as ks-install release version.
etcd:
monitoring: ture # Enable or disable etcd monitoring dashboard installation. You have to create a Secret for etcd before you enable it.
endpointIps: 172.31.0.3 # etcd cluster EndpointIps. It can be a bunch of IPs here.
port: 2379 # etcd port.
tlsEnable: true
common:
core:
console:
enableMultiLogin: true # Enable or disable simultaneous logins. It allows different users to log in with the same account at the same time.
port: 30880
type: NodePort
# apiserver: # Enlarge the apiserver and controller manager's resource requests and limits for the large cluster
# resources: {}
# controllerManager:
# resources: {}
redis:
enabled: true
volumeSize: 2Gi # Redis PVC size.
openldap:
enabled: true
volumeSize: 2Gi # openldap PVC size.
minio:
volumeSize: 20Gi # Minio PVC size.
endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090 # Prometheus endpoint to get metrics data.
GPUMonitoring: # Enable or disable the GPU-related metrics. If you enable this switch but have no GPU resources, Kubesphere will set it to zero.
enabled: ture
gpu: # Install GPUKinds. The default GPU kind is nvidia.com/gpu. Other GPU kinds can be added here according to your needs.
kinds:
- resourceName: "nvidia.com/gpu"
resourceType: "GPU"
default: true
es: # Storage backend for logging, events and auditing.
# master:
# resources: {}
# data:
# volumeSize: 20Gi # The volume size of Elasticsearch data nodes.
# replicas: 1 # The total number of data nodes.
# resources: {}
logMaxAge: 7 # Log retention time in built-in Elasticsearch. It is 7 days by default.
elkPrefix: logstash # The string making up index names. The index name will be formatted as ks-<elk_prefix>-log.
basicAuth:
enabled: false
username: ""
password: ""
externalElasticsearchUrl: ""
externalElasticsearchPort: ""
enabled: true # Enable or disable the KubeSphere Alerting System.
# thanosruler:
# replicas: 1
enabled: true # Enable or disable the KubeSphere Auditing Log System.
# operator:
# resources: {}
# webhook:
# resources: {}
enabled: true # Enable or disable the KubeSphere DevOps System.
# resources: {}
jenkinsMemoryLim: 2Gi # Jenkins memory limit.
jenkinsMemoryReq: 1500Mi # Jenkins memory request.
jenkinsVolumeSize: 8Gi # Jenkins volume size.
jenkinsJavaOpts_Xms: 512m # The following three fields are JVM parameters.
jenkinsJavaOpts_Xmx: 512m
jenkinsJavaOpts_MaxRAM: 2g
events: # Provide a graphical web console for Kubernetes Events exporting, filtering and alerting in multi-tenant Kubernetes clusters.
enabled: true # Enable or disable the KubeSphere Events System.
# operator:
# resources: {}
# exporter:
# resources: {}
# ruler:
# enabled: true
# replicas: 2
# resources: {}
enabled: true # Enable or disable the KubeSphere Logging System.
containerruntime: docker
# resources: {}
# ruler:
# enabled: true
# replicas: 2
# resources: {}
enabled: true # Enable or disable the KubeSphere Logging System.
containerruntime: docker
logsidecar:
enabled: true
replicas: 2
# resources: {}
metrics_server: # (CPU: 56 m, Memory: 44.35 MiB) It enables HPA (Horizontal Pod Autoscaler).
enabled: false # Enable or disable metrics-server.
monitoring:
storageClass: "" # If there is an independent StorageClass you need for Prometheus, you can specify it here. The default StorageClass is used by default.
# kube_rbac_proxy:
# resources: {}
# kube_state_metrics:
# resources: {}
# volumeSize: 20Gi # Prometheus PVC size.
# resources: {}
# operator:
# resources: {}
# adapter:
# resources: {}
# node_exporter:
# resources: {}
# alertmanager:
# replicas: 1 # AlertManager Replicas.
# resources: {}
# notification_manager:
# resources: {}
# operator:
# resources: {}
# proxy:
# resources: {}
gpu: # GPU monitoring-related plugins installation.
nvidia_dcgm_exporter:
enabled: true
# resources: {}
multicluster:
clusterRole: none # host | member | none # You can install a solo cluster, or specify it as the Host or Member Cluster.
network:
networkpolicy: # Network policies allow network isolation within the same cluster, which means firewalls can be set up between certain instances (Pods).
enabled: true # Enable or disable network policies.
ippool: # Use Pod IP Pools to manage the Pod network address space. Pods to be created can be assigned IP addresses from a Pod IP Pool.
type: calico # Specify "calico" for this field if Calico is used as your CNI plugin. "none" means that Pod IP Pools are disabled.
topology: # Use Service Topology to view Service-to-Service communication based on Weave Scope.
type: none # Specify "weave-scope" for this field to enable Service Topology. "none" means that Service Topology is disabled.
openpitrix: # An App Store that is accessible to all platform tenants. You can use it to manage apps across their entire lifecycle.
store:
enabled: true # Enable or disable the KubeSphere App Store.
servicemesh: # (0.3 Core, 300 MiB) Provide fine-grained traffic management, observability and tracing, and visualized traffic topology.
enabled: true # Base component (pilot). Enable or disable KubeSphere Service Mesh (Istio-based).
kubeedge: # Add edge nodes to your cluster and deploy workloads on edge nodes.
enabled: true # Enable or disable KubeEdge.
cloudCore:
nodeSelector: {"node-role.kubernetes.io/worker": ""}
tolerations: []
cloudhubPort: "10000"
cloudhubQuicPort: "10001"
cloudhubHttpsPort: "10002"
cloudstreamPort: "10003"
tunnelPort: "10004"
cloudHub:
advertiseAddress: # At least a public IP address or an IP address which can be accessed by edge nodes must be provided.
- "" # Note that once KubeEdge is enabled, CloudCore will malfunction if the address is not provided.
nodeLimit: "100"
service:
cloudhubNodePort: "30000"
cloudhubQuicNodePort: "30001"
cloudhubHttpsNodePort: "30002"
cloudstreamNodePort: "30003"
tunnelNodePort: "30004"
edgeWatcher:
nodeSelector: {"node-role.kubernetes.io/worker": ""}
tolerations: []
edgeWatcherAgent:
nodeSelector: {"node-role.kubernetes.io/worker": ""}
tolerations: []
3-3、执行安装
kubectl apply -f kubesphere-installer.yaml
kubectl apply -f cluster-configuration.yaml
3-4、查看安装进度日志
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f
3-5、进入可视化界面
访问任意机器的 139.198.108.199:30880 端口进入可视化界面
默认账号 : admin
默认密码 : P@88w0rd(安装完成控制台会有密码出现,不记得了重新运行上面命令查看日志)
解决prometheus报错,etcd监控证书找不到问题,
kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs --from-file=etcd-client-ca.crt=/etc/kubernetes/pki/etcd/ca.crt --from-file=etcd-client.crt=/etc/kubernetes/pki/apiserver-etcd-client.crt --from-file=etcd-client.key=/etc/kubernetes/pki/apiserver-etcd-client.key
3-6、镜像问题排查
有些Pod的一直没有运行,我们可以使用下面的命令进行排查,在最后面查看事件。
#查看pod信息
kubectl get pod -A
#复制有问题pod的名称空间和Pod名称 描述这个pod,查看事件
kubectl describe pod -n kubesphere-system redis-8f84d9fdd-xjf6w
4、使用KubeSphere多租户功能
使用k8s官方的可视化界面,每次都要使用密钥登陆,而且不能多账号登录,而KubeSphere可以多账号登录并且能进行权限划分。
对照官方文档即可:https://kubesphere.com.cn/docs/quick-start/create-workspace-and-project/ 尚硅谷的按照企业结构进行权限划分,让我们更好理解。
第一部分账号图解
1、先创建多个用户,首先用admin创建users-manager权限的hr-zhang用户可以管理企业所有用户,让以后所有的用户创建都使用hr-zhang这个账号。、
2、之后使用hr-zhang账号创建workspaces-manager权限的boss-li账号,boss-li相当于企业大老板,能够管理所有的企业空间(相当于大老板能管理所有分公司)。
3、之后所有的用户都用hr-zhang来创建,并且之后创建的所有用户都是**platform-regular **普通用户权限,不管是开发人员账号还是分公司boss账号。
第二部分企业空间图解
1、使用boss-li来创建企业空间,先创建wuhan分公司,管理员分配给wuhan-boss。
2、使用wuhan-boss登录,可以自由管理企业的角色和企业的用户部门权限等。
3、在每个分公司(企业空间)都有自己的项目,各个公司做的项目都不同。
4、企业空间默认有四个权限,workspace-admin(分公司boss可以管理分公司所有)、workspace-self-provisioer(项目总监可以创建项目)、workspace-regular(开发人员不可以创建项目)、workspace-viwer(销售人员只能查看)。
5、使用wuhan-boss账号邀请普通账号到wuhan企业空间下,给一个zonjian账号(任意账号)分配workspace-self-provisioer权限。
第三部分项目空间图解
1、登录zonjian创建一个his项目,邀请普通用户进入项目。
2、每个项目又有3个权限admin(项目组长)、viewer(观察者,比如老板不懂技术可以给这个权限)、operator(普通开发成员)
5、项目菜单
每个项目其实就是一个个的名称空间,可以使用:kubectl get ns 查看。
使用拥有项目admin权限的账号登录。
菜单介绍:
应用: 一键部署Pod,不用下面的操作
服务: Service,集群服务器之间内网互通,可以用IP相互访问集群服务器内的Pod,但是外网不能访问
工作负载: Pod创建方式
任务: 定时任务
应用路由: Ingress域名统一路由Service,外网可以访问
存储: k8s的存储卷(PVC)
配置: 配置集ConfigMap
5-1、工作负载
1、什么是工作负载?
就是创建Pod的3种类型。
部署: 就是k8s的Deployment,用于部署微服务这些无状态应用,删除后不需要记住微服务的信息数据。
有状态副本集: 就是k8s的StatefulSet,用户部署mysql、redis等这些中间件,数据会被保留,删除了重新启动一份中间件Pod一样能访问到原来数据。
守护进程集: 就是K8s的DaemonSet,微服务需要收集日志,需要在所有Pod中都启动一个日志收集工具,有且只有一份。
5-2、应用仓库
使用企业空间管理员(wuhan-boss)登录,设置应用仓库
学习Helm即可,去helm的应用市场添加一个仓库地址,比如:charts.bitnami.com/bitnami
和dockerhub一样,都是镜像地址。
查询镜像地址:https://artifacthub.io/使用步骤:
5-3、存活探针
nacos在服务器重启后,比mysql先启动会报错停止工作,而通过存活探针来请求nacos服务,如果请求不通过则重新启动。
3种模式,一般用第一种就可以了。
会自动请求Pod的Ip+设置的请求路径和端口