I'm new to this and still fuzzy on the underlying principles, but I stumbled into a fix, so I'm writing it up here in case it helps someone else.


While writing a Flink WordCount job in Scala in IDEA and trying to write the result to HDFS, I hit an error. Part of the stack trace is shown below; the full trace is at the end of this post.

Caused by: java.io.IOException: Cannot instantiate file system for URI: hdfs://usr/d0316/1.output
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: usr
Caused by: java.net.UnknownHostException: usr

usr/d0316/1.output is the path on HDFS I was trying to write to.

The problem

  1. The problem is in the HDFS URI used for the sink. Here is how my code wrote the data:
aggDs.writeAsText("hdfs://usr/d0316/1.output", FileSystem.WriteMode.OVERWRITE)

When I wrote the output to the local filesystem instead, the job ran fine and I could find the output file locally.

The local version, writing straight to the E: drive:

aggDs.writeAsText("E:", FileSystem.WriteMode.OVERWRITE)
  2. Both Flink and HDFS are installed on a VM; locally I run the job from IDEA with the dependencies pulled in via Maven. I first started HDFS on the VM with ./start-dfs.sh. After that, the Hadoop overview page was reachable from my local browser through its port, as in the screenshot below.

PS: when viewing the page locally, note the part circled in red in the screenshot. Sometimes, after you click through, the address bar shows the VM's hostname; mine, for example, is localhost-node1. Since my local machine has no mapping for that name, the page fails to load. I was too lazy to set up the mapping, so I simply replaced the hostname with its IP address (localhost-node1 maps to 192.168.87.133 for me) directly in the address bar, as in the screenshot, and the HDFS UI loaded fine.
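As an alternative to swapping in the IP by hand each time (a sketch of the approach I was too lazy to set up; hostname and IP are the ones from my setup), you can map the VM hostname to its IP in your local hosts file:

```
# Linux/macOS: /etc/hosts   Windows: C:\Windows\System32\drivers\etc\hosts
192.168.87.133  localhost-node1
```

After that, the browser can resolve localhost-node1 directly.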

Solution

Make sure HDFS is running on the VM, then fix the HDFS path by adding the NameNode's host and port to the URI:

aggDs.writeAsText("hdfs://192.168.87.133:9000/usr/d0316/1.output", FileSystem.WriteMode.OVERWRITE)
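The root cause becomes visible if you parse both URIs with java.net.URI (a standalone sketch, independent of Flink): in the broken URI, usr sits in the authority position, so it is treated as the NameNode hostname, which is exactly the "UnknownHostException: usr" in the stack trace.

```scala
import java.net.URI

// Broken URI: the first segment after "hdfs://" is parsed as the host,
// so the HDFS client tries to resolve a NameNode named "usr".
val badUri = new URI("hdfs://usr/d0316/1.output")

// Fixed URI: the authority carries the NameNode address, and the
// path correctly starts at /usr.
val goodUri = new URI("hdfs://192.168.87.133:9000/usr/d0316/1.output")

println(s"bad:  host=${badUri.getHost}, port=${badUri.getPort}, path=${badUri.getPath}")
println(s"good: host=${goodUri.getHost}, port=${goodUri.getPort}, path=${goodUri.getPath}")
```

Note that in the broken version the intended top-level directory /usr has silently vanished from the path as well.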

Notes:

  • The 192.168.87.133:9000 here matches the fs.defaultFS I configured in Hadoop's core-site.xml (the part underlined in red in the screenshot; my localhost-node1 resolves to 192.168.87.133:9000).
    The screenshot below shows the fs.defaultFS setting from my core-site.xml.
  • There are probably other ways to configure this, but I don't know them. With this change the job runs, and the output file shows up in HDFS.
  • While searching for a fix I came across these blog posts, which may also help you:
    I found this URI format in a 51CTO blog post titled "hadoop加载fs.hdfs.impl"; after changing my code to match and rerunning it, the problem was solved.
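For reference, the fs.defaultFS entry in core-site.xml looks roughly like this (a sketch based on my setup; your hostname and port will differ):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost-node1:9000</value>
  </property>
</configuration>
```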

My source code

It's just a simple WordCount in Scala, copied from a tutorial…

The output URI in this code is the unfixed one, so running it as-is reproduces the error.

package cn.itcast.flink.batch

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem

/*
Word count using the Flink batch API.
 */
object WordCountDemo {
  def main(args: Array[String]): Unit = {
    /*
    1. Obtain an execution environment,
    2. load/create the initial data,
    3. specify transformations on the data,
    4. specify where to put the results,
    5. trigger program execution.
     */
    // 1. Obtain an execution environment, the entry point of a batch program
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    // Set the global parallelism to 1
    env.setParallelism(1)
    // 2. Load/create the initial data
    val sourceDs: DataSet[String] = env.fromElements("Apache Flink is an open source platform for " +
      "distributed stream and batch data processing",
      "Flink’s core is a streaming dataflow engine that provides data distribution")
    // Overall idea: split each line on spaces, map each word to a (word, 1) tuple,
    // group by word, and sum the counts
    // 3. Specify transformations on the data
    val wordsDs: DataSet[String] = sourceDs.flatMap(_.split(" "))
    // (word, 1)
    val wordAndOneDs: DataSet[(String, Int)] = wordsDs.map((_, 1))
    val groupDs: GroupedDataSet[(String, Int)] = wordAndOneDs.groupBy(0)
    // Aggregate
    val aggDs: AggregateDataSet[(String, Int)] = groupDs.sum(1)
    // 4. Specify where to put the results
    System.setProperty("HADOOP_USER_NAME", "root")
    aggDs.writeAsText("hdfs://usr/d0316/1.output", FileSystem.WriteMode.OVERWRITE)
    // About default parallelism: without setParallelism(1) it defaults to the number of
    // CPU cores on this machine (8 for me), which produces 8 result files
    // 5. Trigger program execution
    env.execute()
  }
}

Full stack trace

Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Could not retrieve JobResult.
	at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:643)
	at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:223)
	at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:817)
	at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:525)
	at cn.itcast.flink.batch.WordCountDemo$.main(WordCountDemo.scala:40)
	at cn.itcast.flink.batch.WordCountDemo.main(WordCountDemo.scala)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$submitJob$2(Dispatcher.java:267)
	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:753)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
	at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
	at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
	at akka.actor.ActorCell.invoke(ActorCell.scala:495)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
	at akka.dispatch.Mailbox.run(Mailbox.scala:224)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
	... 4 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
	at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
	at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:308)
	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
	... 7 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 'DataSink (TextOutputFormat (hdfs://usr/d0316/1.output) - UTF-8)': Cannot instantiate file system for URI: hdfs://usr/d0316/1.output
	at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:220)
	at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:100)
	at org.apache.flink.runtime.jobmaster.JobMaster.createExecutionGraph(JobMaster.java:1173)
	at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1153)
	at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:296)
	at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:157)
	... 10 more
Caused by: java.io.IOException: Cannot instantiate file system for URI: hdfs://usr/d0316/1.output
	at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:187)
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:399)
	at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:318)
	at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
	at org.apache.flink.api.common.io.FileOutputFormat.initializeGlobal(FileOutputFormat.java:275)
	at org.apache.flink.runtime.jobgraph.OutputFormatVertex.initializeOnMaster(OutputFormatVertex.java:89)
	at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:216)
	... 15 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: usr
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
	at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:159)
	... 21 more
Caused by: java.net.UnknownHostException: usr
	... 28 more

Process finished with exit code 1