Setting up Spark on Kubernetes

Prerequisites

  • Kubernetes cluster: version 1.15
  • Spark: version 2.4.5
  • Private Docker registry: already trusted by the Docker daemons in the Kubernetes cluster
  • JDK: 1.8
  • Scala: 2.12

Setting up Kubernetes

  • For single-machine testing, you can use minikube
  • On every Kubernetes node, download the JDK and Scala, and set JAVA_HOME and SCALA_HOME
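The environment variables can be set, for example, by appending to /etc/profile on each node. A minimal sketch; the install paths below are assumptions, substitute wherever you actually unpacked the archives:

```shell
# Append to /etc/profile (or a file under /etc/profile.d/) on every node.
# The install paths are hypothetical examples, not required locations.
export JAVA_HOME=/usr/local/jdk1.8.0_241
export SCALA_HOME=/usr/local/scala-2.12.10
export PATH="$JAVA_HOME/bin:$SCALA_HOME/bin:$PATH"
```

After `source /etc/profile`, `java -version` and `scala -version` should report 1.8 and 2.12 respectively.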

Setting up a private Docker registry

  • Prepare a separate machine with Docker installed, or pick a machine from the Kubernetes cluster
  • Configure the insecure registry; this is needed both on the registry host itself and on every node in the Kubernetes cluster
# Edit /etc/docker/daemon.json with the following content
{
	"insecure-registries": ["${host external IP}:5000"]
}
  • Restart Docker; both the registry host and every Kubernetes node need the restart
systemctl restart docker.service
  • Start the private Docker registry
docker run -d --restart=always --name registry-test -p 5000:5000 registry:2
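Once the container is up, the registry's HTTP API makes a quick smoke test possible. A sketch, with 10.211.55.24:5000 standing in for your registry address:

```shell
# Query the Registry v2 catalog endpoint; a freshly started, empty
# registry should answer with: {"repositories":[]}
REGISTRY=10.211.55.24:5000
curl -s "http://${REGISTRY}/v2/_catalog"
```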

Building the Spark Docker images

  • Download Spark on a machine that already has Docker installed
wget http://apache.communilink.net/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
  • Build the Spark Docker image
tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz
cd spark-2.4.5-bin-hadoop2.7
# ./bin/docker-image-tool.sh -r ${your registry address} -t ${your image tag} build
./bin/docker-image-tool.sh -r 10.211.55.24:5000 -t v1 build
  • Push the built image into the private Docker registry
# Option 1
docker push ${your image}
# Option 2
./bin/docker-image-tool.sh -r 10.211.55.24:5000 -t v1 push
# Option 3: save the image, copy it to the registry host, then load and push it there
docker save 10.211.55.24:5000/spark:v1 > tmp.tar
scp tmp.tar ${registry host}:${path}
ssh ${registry host}
docker load < tmp.tar
docker push 10.211.55.24:5000/spark:v1
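Whichever option you use, you can confirm the push landed by asking the registry for the image's tag list. A sketch against the example registry address:

```shell
# List the tags of the pushed "spark" repository; "v1" should appear,
# e.g. {"name":"spark","tags":["v1"]}
REGISTRY=10.211.55.24:5000
curl -s "http://${REGISTRY}/v2/spark/tags/list"
```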

Creating a serviceaccount and RBAC for the Spark application

  • Create the service account and role binding the Spark application needs at runtime
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
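Whether the binding grants enough access can be checked with kubectl's impersonation support before submitting anything. A sketch, assuming a working kubeconfig:

```shell
# Each of these should print "yes"; the Spark driver needs to get,
# create, and watch pods in its namespace.
kubectl auth can-i get pods --as=system:serviceaccount:default:spark
kubectl auth can-i create pods --as=system:serviceaccount:default:spark
kubectl auth can-i watch pods --as=system:serviceaccount:default:spark
```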

Running the SparkPi example

  • Run the spark-submit script on the Kubernetes master
  • Run a proxy on the Kubernetes master
kubectl proxy
  • The spark-submit script
bin/spark-submit \
  --master k8s://http://127.0.0.1:8001 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=${your registry address}/spark:v1 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  /opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
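In cluster mode the driver runs as a pod, so progress and the computed value of Pi can be observed with kubectl. A sketch, relying on the spark-role label that Spark puts on its pods:

```shell
# The driver pod carries the label spark-role=driver
kubectl get pods -l spark-role=driver
# Follow the driver log; SparkPi prints a line like "Pi is roughly 3.14..."
kubectl logs -f "$(kubectl get pods -l spark-role=driver -o jsonpath='{.items[0].metadata.name}')"
```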

Problems you may run into

  • Spark cannot connect to the Kubernetes apiserver (more precisely, Java cannot connect to the apiserver)
  • Error
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
  	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:212)
  	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
  	at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:221)
  	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:215)
  	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  	at java.lang.Thread.run(Thread.java:748)
  	Suppressed: java.lang.Throwable: waiting here
  		at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:134)
  		at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:350)
  		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:759)
  		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:738)
  		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:69)
  		at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(KubernetesClientApplication.scala:140)
  		at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(KubernetesClientApplication.scala:140)
  		at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2542)
  		at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
  		at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
  		at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
  		at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
  		at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
  		at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
  		at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
  		at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
  		at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
  		at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
  		at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
  		at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
  		at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
  	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
  	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959)
  	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:328)
  	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322)
  	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1614)
  	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
  	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052)
  	at sun.security.ssl.Handshaker.process_record(Handshaker.java:987)
  	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072)
  	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
  	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
  	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
  	at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:319)
  	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:283)
  	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:168)
  	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
  	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
  	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
  	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  	at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:112)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
  	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:200)
  	... 4 more
  Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
  	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397)
  	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302)
  	at sun.security.validator.Validator.validate(Validator.java:260)
  	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
  	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
  	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
  	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1596)
  	... 39 more
  Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
  	at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
  	at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
  	at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
  	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392)
  	... 45 more
  • Cause
    The Kubernetes cluster uses a self-signed CA, and Java does not trust the cluster's certificates
  • Solutions
  • One commonly circulated fix is to import the Kubernetes CA certificate into Java's certificate trust chain; I have not managed to make this work yet
  • Another fix is to use kubectl proxy to start an HTTP server, then use that HTTP server's URL as the --master for spark-submit
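For reference, the circulated truststore import usually looks like the sketch below. The paths are assumptions (a kubeadm cluster, which keeps the cluster CA at /etc/kubernetes/pki/ca.crt, and a JDK 8 layout with the truststore under jre/lib/security); as said above, this has not been verified to work here:

```shell
# Import the cluster CA into the JDK's default truststore.
# "changeit" is the JDK's default truststore password.
keytool -importcert -noprompt \
  -alias kubernetes-ca \
  -file /etc/kubernetes/pki/ca.crt \
  -keystore "$JAVA_HOME/jre/lib/security/cacerts" \
  -storepass changeit
```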
  • Permission problem
  • Error
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
   	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
   	at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
   	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
   	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
   	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
   	at scala.Option.getOrElse(Option.scala:121)
   	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
   	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
   	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
   	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
   	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
   	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
   	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-1573016872026-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-1573016872026-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".
   	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:478)
   	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:415)
   	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:381)
   	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
   	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313)
   	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296)
   	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:801)
   	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:218)
   	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:185)
   	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
   	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
   	at scala.Option.map(Option.scala:146)
   	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
   	at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
   	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
   	... 20 more
  • Solution
    Create a serviceaccount and RBAC for the Spark application
  • The example program's Java class cannot be found
  • Cause
    A Spark program running on Kubernetes resolves local jar paths inside the image
  • Solution
    Put the jar on an HTTP server or in HDFS
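A sketch of the HTTP variant, using Python's built-in server purely for illustration; the jar would then be referenced in spark-submit by URL instead of by its in-image path (as noted at the end of these notes, fetching jars this way has not worked in this environment yet):

```shell
# On the machine holding the jar, serve its directory over HTTP:
cd spark-2.4.5-bin-hadoop2.7/examples/jars
python3 -m http.server 8000
# The last spark-submit argument would then become a URL such as:
#   http://<submit-host-ip>:8000/spark-examples_2.11-2.4.5.jar
```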

Problems not yet solved

  • So far, specifying the master only works by running an HTTP server with kubectl proxy and pointing Spark's --master at that HTTP URL; pointing --master directly at the Kubernetes HTTPS URL does not work yet
  • The SparkPi jar run here is built straight into the Spark image, and the jar path written in the submit script is the jar's path inside the image
  • Fetching the jar from an HTTP server or from HDFS does not work yet
  • Still working out how to let Kubernetes reach my data storage system