Spark on kubernetes环境搭建
实验要求
键 | 值 |
kubernetes集群 | 版本1.15 |
spark | 版本 2.4.5 |
私建docker仓库 | 已经被kubernetes集群的docker信任 |
jdk | 1.8 |
scala | 2.12 |
构建kubernetes
- 单机测试的话,可以使用minikube
- 为每一台kubernetes节点,下载jdk和Scala,并配置JAVA_HOME和SCALA_HOME
搭建私有docker镜像仓库
- 准备一台单独的设备安装docker,或者,在kubernetes集群中寻找一台设备
- 配置insecure docker仓库,docker镜像仓库本身,以及每一台kubernetes集群中的节点都需要配置
# 编辑/etc/docker/daemon.json,内容如下
{
"insecure-registries": ["${设备的外网IP}:5000"]
}
- 重启docker,docker镜像仓库所在的设备和kubernetes的每一台设备都需要重启docker
systemctl restart docker.service
- 启动docker 私有仓库
docker run -d --restart=always --name registry-test -p 5000:5000 registry:2
构建spark docker images
- 在一台已经安装了docker的设备上下载spark
wget http://apache.communilink.net/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
- build spark docker镜像
tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz
cd spark-2.4.5-bin-hadoop2.7
# ./bin/docker-image-tool.sh -r ${你的镜像仓库地址} -t ${你的镜像tag} build
./bin/docker-image-tool.sh -r 10.211.55.24:5000 -t v1 build
- 将build好的image导入到自建的docker镜像仓库中
# 方式一
docker push ${你的镜像}
# 方式二
./bin/docker-image-tool.sh -r 10.211.55.24:5000 -t v1 push
# 方式三
docker save 10.211.55.24:5000/spark:v1 > tmp.tar
scp tmp.tar ${你的镜像仓库}/${path}
ssh ${你的镜像仓库}
docker load < 10.211.55.24:5000/spark:v1
docker push 10.211.55.24:5000/spark:v1
为spark程序创建serviceaccount和对应的rbac
- 为spark应用程序构建运行需要的角色和账户
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
运行spark的example sparkPi
- 在kubernetes master上运行spark提交脚本
- 在kubernetes master运行proxy
kubectl proxy
- spark运行脚本
bin/spark-submit \
--master k8s://http://127.0.0.1:8001 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=${你的镜像仓库IP地址}/apache/spark:v1 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
/opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
可能会遇到的问题
- spark无法与kubernetes apiserver连接,或者应该是java无法与apiserver连接
- 报错
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:212)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:221)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:215)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:134)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:350)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:759)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:738)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:69)
at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(KubernetesClientApplication.scala:140)
at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(KubernetesClientApplication.scala:140)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2542)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:328)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1614)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:987)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:319)
at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:283)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:168)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:112)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:200)
... 4 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1596)
... 39 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392)
... 45 more
- 原因
kubernetes集群是自建CA的,kubernetes的证书java是不认的 - 解决方案
- 目前有一种流传的解决方案:将kubernetes的ca证书导入到java的证书信任链中,我目前还没能成功
- 另一种解决方案是,使用kubectl proxy开一个http服务器,然后使用http服务器的url做spark-submit的–master
- 权限问题
- 报错
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-1573016872026-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-1573016872026-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:478)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:415)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:381)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:801)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:218)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:185)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
at scala.Option.map(Option.scala:146)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
... 20 more
- 解决方案
为spark应用程序创建serviceaccount和rbac
- 示例程序的java class找不到
- 原因
运行在kubernetes的spark程序,找local的jar包是在镜像内部找的 - 解决方案
将jar包放到http服务器或者hdfs中
目前尚未解决的问题
- 目前运行spark指定master只能用kubectl proxy运行一个http服务器,然后spark的–master指定http的url;尚不能直接在–master处指定kubernetes的https url
- 目前运行的sparkPi的jar包是直接构建到spark的镜像内部的,脚本中写的jar包的脚本是jar包在镜像内部的路径
- 目前尚不能从http server或者hdfs获取jar包
- 目前还在考虑如何让kubernetes访问到我的数据存储系统