Add the following parameters:
--conf spark.executor.heartbeatInterval=13000s
--conf spark.storage.blockManagerSlaveTimeoutMs=13100s
--conf spark.network.timeout=13200s

Note that these values must be strictly increasing in this order (heartbeat interval < block manager timeout < network timeout); Spark enforces this relationship internally, and in particular spark.network.timeout must be larger than spark.executor.heartbeatInterval.
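For context, a minimal spark-submit invocation carrying these flags might look like the following (the script name and other arguments are placeholders, not from the original note):

```shell
# Hypothetical submission command; only the three --conf flags
# come from the note above, everything else is illustrative.
spark-submit \
  --conf spark.executor.heartbeatInterval=13000s \
  --conf spark.storage.blockManagerSlaveTimeoutMs=13100s \
  --conf spark.network.timeout=13200s \
  your_job.py
```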

Also, convert to an RDD first and then repartition:
df.rdd.repartition(100000).mapPartitions(...)
rather than repartitioning first and then converting:
df.repartition(100000).rdd.mapPartitions(...)
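The difference can be sketched as follows (assuming an existing DataFrame `df`; `process_partition` is a hypothetical per-partition function, and the explanation in the comments is one common reading of why the ordering matters, not something the note itself states):

```python
# Preferred: convert to an RDD first, then repartition at the RDD level.
# The shuffle then moves already-converted Row objects, without adding
# an extra Exchange (and, for round-robin repartitioning, a possible
# local sort) to the DataFrame's physical plan.
result = (
    df.rdd                       # DataFrame -> RDD[Row]
      .repartition(100000)       # RDD-level shuffle
      .mapPartitions(process_partition)
)

# Avoid: repartitioning the DataFrame first inserts a full shuffle into
# the Catalyst physical plan before the .rdd conversion, which tends to
# be noticeably more expensive at this partition count.
# result = df.repartition(100000).rdd.mapPartitions(process_partition)
```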