1. Problem Description

Running a PySpark script produced the following messages:

Warning: Ignoring non-Spark config property: deploy-mode
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/08/02 11:35:34 WARN [Thread-4] Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/08/02 11:35:34 WARN [Thread-4] Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
22/08/02 11:35:34 WARN [Thread-4] Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
22/08/02 11:35:34 WARN [Thread-4] Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
22/08/02 11:35:34 WARN [Thread-4] Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
22/08/02 11:35:34 WARN [Thread-4] Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
22/08/02 11:35:34 WARN [Thread-4] Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
22/08/02 11:35:34 WARN [Thread-4] Utils: Service 'SparkUI' could not bind on port 4047. Attempting port 4048.
22/08/02 11:35:35 WARN [Thread-4] Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

2. Solution

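The port messages are warnings rather than errors. Each Spark driver on a host binds its own SparkUI port, starting from 4040 and moving to the next port when one is taken, so the output above means that other Spark applications are already running on this machine. When the job is submitted to YARN, those applications may also be holding cluster resources, leaving the new job without enough to run. Find the processes that own the ports, kill the corresponding jobs, and try again: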

lsof -i:4041
lsof -i:4042
kill -9 <PID>
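
To see at a glance which SparkUI ports are taken before reaching for lsof, a small check like the following may help. It is only a sketch: it assumes the drivers run on the local machine and that their UI accepts TCP connections on 127.0.0.1, and the helper names `is_port_in_use` and `busy_ports` are made up for illustration. The range of 16 ports mirrors Spark's default retry count (`spark.port.maxRetries`), which may be set differently on your cluster.

```python
import socket


def is_port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def busy_ports(start=4040, count=16, host="127.0.0.1"):
    """List the ports in [start, start + count) that are already taken."""
    return [p for p in range(start, start + count) if is_port_in_use(p, host)]


if __name__ == "__main__":
    # 4040 is Spark's default UI port; each extra driver takes the next free one.
    print("SparkUI ports in use:", busy_ports())
```

Each port it reports can then be passed to lsof -i:<port> to find the owning process.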

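I also came across another explanation: the code run inside spark-shell sets up its own Spark context, but spark-shell already provides a context object, so the newly created one (and anything built on it) cannot be used.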
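
If that second explanation applies, one common way around it is to reuse the context the shell already created instead of constructing a new one. The following is a minimal sketch for the pyspark shell (the Python counterpart of spark-shell), assuming PySpark is installed; the function name `get_session` and the app name `demo` are hypothetical.

```python
def get_session(app_name="demo"):
    """Return the active SparkSession, creating one only if none exists."""
    # Imported here so that this module still loads on machines without PySpark.
    from pyspark.sql import SparkSession

    # getOrCreate() hands back the session the shell already started, if any,
    # instead of trying to build a second SparkContext in the same process.
    return SparkSession.builder.appName(app_name).getOrCreate()


# Usage inside the pyspark shell or a script:
#   spark = get_session()
#   sc = spark.sparkContext   # the context that belongs to that session
```

When a session already exists, `getOrCreate()` returns it as is, so the app name passed here only takes effect if a new session has to be created.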