Xiamen University Hadoop

mob64ca12f63d4f 2023-10-12 04:36:42

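© Copyright belongs to the author: this is an original work by 51CTO blog author mob64ca12f63d4f. Contact the author for permission to reprint; unauthorized reproduction may result in legal action.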

Workflow for Implementing "Xiamen University Hadoop"

flowchart TD;
    A(Preparation) --> B(Install Hadoop);
    B --> C(Configure Hadoop);
    C --> D(Start Hadoop);
    D --> E(Upload data);
    E --> F(Run the Hadoop job);
    F --> G(Retrieve the results);
    G --> H(Analyze and present the results);

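Preparation

Before implementing "Xiamen University Hadoop", a few prerequisites need to be in place.

Install Java: Hadoop is written in Java, so a Java runtime must be installed first. Make sure a JDK is installed and that its environment variables (such as JAVA_HOME) are configured.
Download Hadoop: from the official website (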

Install Hadoop

The steps to install Hadoop are as follows:

Extract the Hadoop archive: use the following command to extract the downloaded Hadoop archive.

tar -xvf hadoop-X.X.X.tar.gz

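Move the Hadoop directory: move the extracted Hadoop directory to the desired location, set the HADOOP_HOME environment variable, and add the Hadoop bin and sbin directories to PATH so that commands such as hdfs and start-dfs.sh can be found.

# sudo is usually needed to write to /usr/local
sudo mv hadoop-X.X.X /usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin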
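
To confirm that the environment variable took effect, a small check can help. The following is a minimal sketch, assuming only that a Hadoop distribution contains a bin/hadoop launcher under HADOOP_HOME; the class name HadoopHomeCheck and its messages are hypothetical and not part of Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Hadoop): sanity-checks a HADOOP_HOME value.
final class HadoopHomeCheck {

    /** Returns a short diagnostic for the given HADOOP_HOME value (may be null). */
    static String describe(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return "HADOOP_HOME is not set";
        }
        // Assumption: the distribution ships its launcher script as bin/hadoop.
        Path launcher = Paths.get(hadoopHome, "bin", "hadoop");
        if (!Files.isRegularFile(launcher)) {
            return "HADOOP_HOME is set, but " + launcher + " was not found";
        }
        return "HADOOP_HOME looks valid: " + hadoopHome;
    }

    public static void main(String[] args) {
        // Reads the variable from the environment of the current process.
        System.out.println(describe(System.getenv("HADOOP_HOME")));
    }
}
```

Run it from the same shell in which the export was executed; a variable set in one terminal session is not visible in another unless it is also added to a shell profile.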

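Configure Hadoop

The steps to configure Hadoop are as follows:

Edit the Hadoop configuration files: go to the Hadoop configuration directory (typically $HADOOP_HOME/etc/hadoop in Hadoop 2.x and later), edit core-site.xml, and add the following configuration. It sets the default file system to the HDFS NameNode at localhost:9000.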

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

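Edit the Hadoop configuration files: edit hdfs-site.xml and add the following configuration. A replication factor of 1 suits a single-node setup, where only one DataNode is available to hold each block.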

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
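
Both files share the same shape: a configuration element containing property entries, each with a name and a value. As a rough illustration of that structure, the sketch below reads such XML into a map using only the JDK's DOM parser. It is not how Hadoop itself loads configuration (Hadoop uses its own Configuration class); the class name SiteXmlReader is hypothetical, and the sketch assumes every property has both a name and a value element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Hadoop): lists name/value pairs in a *-site.xml document.
final class SiteXmlReader {

    /** Parses Hadoop-style configuration XML and returns its properties in document order. */
    static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            // Assumption: each <property> has exactly one <name> and one <value>.
            String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration><property><name>dfs.replication</name>"
                + "<value>1</value></property></configuration>";
        System.out.println(readProperties(xml));
    }
}
```

A check like this can catch a malformed file (an unclosed tag, for example) before the daemons are started, since the parser throws on XML that is not well-formed.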

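Start Hadoop

The steps to start Hadoop are as follows:

Format the Hadoop file system: use the following command to format the NameNode. This is only needed once, before the first start; formatting again erases the existing HDFS metadata.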

hdfs namenode -format

Start the Hadoop cluster: use the following commands to start HDFS and then YARN.

start-dfs.sh
start-yarn.sh
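
Once the scripts return, it is worth confirming that the NameNode is actually reachable. The sketch below simply tests whether something accepts TCP connections on the host and port configured in fs.defaultFS (localhost:9000 in this article). The class name PortCheck and the 2-second timeout are assumptions for illustration; an open port does not prove the daemon is healthy, only that it is listening.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Hadoop): checks whether a TCP port accepts connections.
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unresolved: treat as not listening.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port taken from the fs.defaultFS value used in core-site.xml above.
        System.out.println("NameNode port open: " + isListening("localhost", 9000, 2000));
    }
}
```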

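Upload Data

Before running the Hadoop job, the data needs to be uploaded to the Hadoop file system.

Create the input directory: use the following command to create an input directory in HDFS.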

hdfs dfs -mkdir /input

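Upload the data file: use the following command to upload the data file to the input directory, replacing local_file_path with the path of the file on the local disk.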

hdfs dfs -put local_file_path /input

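Run the Hadoop Job

The steps to run the Hadoop job are as follows:

Write the Hadoop program: create a Java project and write the Hadoop program. The program extends the Mapper and Reducer classes and overrides their map and reduce methods.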
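
Before looking at the Hadoop version, it may help to see the same word-count logic without any cluster. The following is a plain-Java sketch of the three phases (map, group by key, reduce); it uses no Hadoop classes, and the name LocalWordCount and its methods are hypothetical. Real MapReduce distributes these phases across machines, which this sketch does not attempt.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of word count; not Hadoop code.
final class LocalWordCount {

    /** Map phase: emit a (word, 1) pair for every whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    /** Reduce phase: sum all counts emitted for one word. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs map on every line, groups the pairs by word, then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

The Hadoop program for this step follows the same shape, with the framework handling the grouping between map and reduce.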

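import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HadoopJob {
    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class MapClass extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            // Split on runs of whitespace and skip empty tokens.
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                word.set(token);
                context.write(word, one);
            }
        }
    }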

    // Reducer: sums the counts emitted for each word.
    public static class ReduceClass extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        // Use this class to locate the job JAR.
        job.setJarByClass(HadoopJob.class);
        job.setMapperClass(MapClass.class);
        // Summation is associative, so the reducer can also serve as the combiner.
        job.setCombinerClass(ReduceClass.class);
        job.setReducerClass(ReduceClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/input"));
        // The output directory must not exist before the job runs.
        FileOutputFormat.setOutputPath(job, new Path("/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }