I'm a beginner and still pretty clueless about the underlying principles, but I stumbled onto a fix for this problem, so I'm posting my solution here in case it helps someone.
I was writing Flink WordCount code in Scala in IDEA, and got an error when I tried to write the result to HDFS. Here is the key part of the error message (the full stack trace is at the end of this post):
Caused by: java.io.IOException: Cannot instantiate file system for URI: hdfs://usr/d0316/1.output
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: usr
Caused by: java.net.UnknownHostException: usr
Here, usr/d0316/1.output is the file path I was trying to write to on HDFS.
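A quick way to see why Hadoop complains about a host named usr: in a URI, whatever follows the // is parsed as the authority (host[:port]). This little sketch uses only java.net.URI from the standard library (nothing here is Flink-specific):

```scala
import java.net.URI

object UriDemo {
  def main(args: Array[String]): Unit = {
    // In "hdfs://usr/d0316/1.output", "usr" is parsed as the authority (host),
    // not as the first path segment
    val bad = new URI("hdfs://usr/d0316/1.output")
    println(bad.getHost) // usr  -> Hadoop tries to resolve this as the NameNode host
    println(bad.getPath) // /d0316/1.output -> "usr" has vanished from the path

    // With an explicit host:port, the path survives intact
    val good = new URI("hdfs://192.168.87.133:9000/usr/d0316/1.output")
    println(good.getHost) // 192.168.87.133
    println(good.getPort) // 9000
    println(good.getPath) // /usr/d0316/1.output
  }
}
```

So with hdfs://usr/d0316/1.output, Flink asks Hadoop for a NameNode called usr, which explains the UnknownHostException, and the intended top-level directory silently disappears from the path.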
The problem
- The problem is in the HDFS URI used for the write. My code wrote the data like this:
aggDs.writeAsText("hdfs://usr/d0316/1.output", FileSystem.WriteMode.OVERWRITE)
When I wrote the output to the local file system instead, the job ran fine and I could find the output file locally.
The code for writing locally, straight to the E: drive:
aggDs.writeAsText("E:", FileSystem.WriteMode.OVERWRITE)
- My Flink and HDFS are both installed on a virtual machine; locally I run the job from IDEA after installing the dependencies with Maven. On the VM I first started HDFS with
./start-dfs.sh
Once it was up, I could reach the Hadoop overview page from my local browser via its port, as in the screenshot below.
ps: When opening the page locally, watch the spot marked with the red box in the screenshot. Sometimes the address there uses the VM's hostname; mine, for example, is localhost-node1, and since I never set up a local mapping for that name, the page won't load. Being lazy, instead of configuring the mapping I simply replaced the hostname with its IP address in the URL: my hostname localhost-node1 corresponds to 192.168.87.133, so after editing the address as shown in the screenshot, the local browser could reach the HDFS UI.
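For the record, the non-lazy alternative would be to add the mapping to the local hosts file (on Windows, C:\Windows\System32\drivers\etc\hosts) so the browser can resolve the hostname directly. The hostname and IP below are the ones from my setup; substitute your own:

```
# hosts file entry mapping the VM hostname to its IP
192.168.87.133  localhost-node1
```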
Solution
Make sure HDFS is running on the VM, then fix the HDFS file path by adding the NameNode's host and port:
aggDs.writeAsText("hdfs://192.168.87.133:9000/usr/d0316/1.output", FileSystem.WriteMode.OVERWRITE)
Notes:
- The 192.168.87.133:9000 here matches the fs.defaultFS I set in Hadoop's core-site.xml (the part underlined in red in the screenshot below; my localhost-node1 is 192.168.87.133:9000). The screenshot below shows the fs.defaultFS setting in my core-site.xml.
- There are apparently other ways to change this setting, but I don't know them. With this change the program runs, and the output file is visible on HDFS.
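For reference, the relevant core-site.xml fragment (shown only as a screenshot in the original post) would look roughly like this; the hostname and port are assumptions based on my setup described above, so use your own values:

```xml
<configuration>
  <!-- fs.defaultFS: the default file system URI; clients use this host:port to reach the NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost-node1:9000</value>
  </property>
</configuration>
```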
- While searching for a solution I came across these posts, which may also help you.
I found this URI format in the following post, tried it, and the problem was solved: hadoop加载fs.hdfs.impl (51CTO blog)
My source code
Just a simple WordCount in Scala, copied from a tutorial…
The output URI in this code has not been fixed, so running it as-is will reproduce the error.
package cn.itcast.flink.batch

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem

/*
Word count with the Flink batch API
*/
object WordCountDemo {
  def main(args: Array[String]): Unit = {
    /*
    1. Obtain an execution environment
    2. Load/create the initial data
    3. Specify transformations on the data
    4. Specify where to put the results
    5. Trigger program execution
    */
    // 1. Obtain an execution environment: the entry point for batch programs
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    // Set the global parallelism to 1
    env.setParallelism(1)
    // 2. Load/create the initial data
    val sourceDs: DataSet[String] = env.fromElements("Apache Flink is an open source platform for " +
      "distributed stream and batch data processing",
      "Flink’s core is a streaming dataflow engine that provides data distribution")
    // Approach: split each line on spaces, map each word to a (word, 1) tuple,
    // group by word, then aggregate
    // 3. Specify transformations on the data
    val wordsDs: DataSet[String] = sourceDs.flatMap(_.split(" "))
    // (word, 1)
    val wordAndOneDs: DataSet[(String, Int)] = wordsDs.map((_, 1))
    val groupDs: GroupedDataSet[(String, Int)] = wordAndOneDs.groupBy(0)
    // Aggregate
    val aggDs: AggregateDataSet[(String, Int)] = groupDs.sum(1)
    // 4. Specify where to put the results
    System.setProperty("HADOOP_USER_NAME", "root")
    aggDs.writeAsText("hdfs://usr/d0316/1.output", FileSystem.WriteMode.OVERWRITE)
    // Note on the default parallelism: without setParallelism(1) it defaults to the
    // machine's CPU core count (8 here), producing 8 result files
    // 5. Trigger program execution
    env.execute()
  }
}
Full error message
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Could not retrieve JobResult.
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:643)
at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:223)
at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:817)
at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:525)
at cn.itcast.flink.batch.WordCountDemo$.main(WordCountDemo.scala:40)
at cn.itcast.flink.batch.WordCountDemo.main(WordCountDemo.scala)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$submitJob$2(Dispatcher.java:267)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:753)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
... 4 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:308)
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
... 7 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 'DataSink (TextOutputFormat (hdfs://usr/d0316/1.output) - UTF-8)': Cannot instantiate file system for URI: hdfs://usr/d0316/1.output
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:220)
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:100)
at org.apache.flink.runtime.jobmaster.JobMaster.createExecutionGraph(JobMaster.java:1173)
at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1153)
at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:296)
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:157)
... 10 more
Caused by: java.io.IOException: Cannot instantiate file system for URI: hdfs://usr/d0316/1.output
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:187)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:399)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:318)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at org.apache.flink.api.common.io.FileOutputFormat.initializeGlobal(FileOutputFormat.java:275)
at org.apache.flink.runtime.jobgraph.OutputFormatVertex.initializeOnMaster(OutputFormatVertex.java:89)
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:216)
... 15 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: usr
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:159)
... 21 more
Caused by: java.net.UnknownHostException: usr
... 28 more
Process finished with exit code 1