Hadoop, HBase, and ZooKeeper: A Comprehensive Introduction

In the world of big data, Hadoop, HBase, and ZooKeeper are three foundational tools: Hadoop provides distributed storage and batch processing, HBase adds real-time random access on top of it, and ZooKeeper coordinates the distributed services both depend on. This article gives an overview of each tool, discusses its functionality, and provides code examples to illustrate its usage.

Hadoop

Hadoop is an open-source framework for the distributed processing of large data sets across clusters of commodity machines using simple programming models. Its two core components are the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing; since Hadoop 2, a third component, YARN, handles cluster resource management.

Code Example:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable>{

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every token in the input line.
        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        // Sum the counts emitted for each word; also reused as the combiner.
        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
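Between the map and reduce phases, the framework performs a shuffle step that groups all values emitted for the same key and delivers keys to reducers in sorted order. As a rough illustration of that data flow, here is a plain-Java sketch with no Hadoop dependency (the class and method names are invented for this example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceFlow {

    // "Map" phase: emit (word, 1) for every token, as TokenizerMapper does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" + "reduce": group the emitted pairs by key and sum each group,
    // mirroring what the framework does between Mapper and Reducer.
    static Map<String, Integer> shuffleAndReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // keys reach reducers sorted
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : new String[] {"to be or not", "to be"}) {
            emitted.addAll(map(line));
        }
        System.out.println(shuffleAndReduce(emitted)); // {be=2, not=1, or=1, to=2}
    }
}
```

This is only a single-process model; the point of real MapReduce is that the map and reduce calls run in parallel on many machines, with the shuffle moving data between them.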

HBase

HBase is a distributed, scalable NoSQL database built on top of HDFS and modeled after Google's Bigtable. It provides strongly consistent, real-time read/write access to very large tables and is well-suited for applications that need high availability and low-latency random access.

Code Example:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {

    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();

        // Connections are heavyweight; create one and share it across tables.
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("myTable"))) {

            // Write "value1" into column family "cf", qualifier "col1" of row "row1".
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
            table.put(put);
        }
    }
}
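Everything in HBase is stored as raw bytes, and rows are kept physically sorted by the unsigned byte order of their row keys, which makes row-key design the central schema decision. The following plain-Java sketch (not the HBase API; the class is invented for illustration) uses a TreeMap with the same ordering to show a common pitfall with numeric keys:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

public class RowKeyOrder {

    // HBase keeps rows sorted by the unsigned byte order of their row keys;
    // a TreeMap with an equivalent comparator makes the effect visible.
    static SortedMap<byte[], String> newSortedStore() {
        return new TreeMap<>(Arrays::compareUnsigned);
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SortedMap<byte[], String> store = newSortedStore();
        // Zero-padded keys sort numerically; unpadded keys do not.
        store.put(key("row-10"), "v10");
        store.put(key("row-2"), "v2");
        store.put(key("row-02"), "v02");

        for (byte[] k : store.keySet()) {
            System.out.println(new String(k, StandardCharsets.UTF_8));
        }
        // Prints row-02, then row-10, then row-2: "row-10" scans before "row-2".
    }
}
```

Because scans walk keys in this byte order, zero-padding numeric components (row-02 rather than row-2) is the usual way to keep range scans numerically meaningful.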

ZooKeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, and providing distributed synchronization and group services. It acts as a coordination service for distributed systems, and both Hadoop's high-availability setup and HBase rely on it for tasks such as leader election and server discovery.

Code Example:

import java.io.IOException;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZookeeperExample {

    private static final String ZOOKEEPER_ADDRESS = "localhost:2181";
    
    public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
        ZooKeeper zooKeeper = new ZooKeeper(ZOOKEEPER_ADDRESS, 3000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                System.out.println("Event received: " + event);
            }
        });
        
        String path = zooKeeper.create("/test", "data".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println("Created znode: " + path);
        
        byte[] data = zooKeeper.getData("/test", false, null);
        System.out.println("Data: " + new String(data));
        
        zooKeeper.close();
    }
}
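A common ZooKeeper pattern is leader election with EPHEMERAL_SEQUENTIAL znodes: ZooKeeper appends a monotonically increasing, zero-padded counter to each created path, and the client whose znode has the lowest number is the leader. The following plain-Java sketch mimics only that naming-and-comparison logic; it is not the ZooKeeper API, and the class name is invented for this example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SequentialNodeSketch {

    // ZooKeeper appends a zero-padded, monotonically increasing counter to
    // sequential znodes; this sketch reproduces that naming scheme locally.
    private final AtomicLong counter = new AtomicLong();
    private final List<String> nodes = new ArrayList<>();

    String create(String prefix) {
        String path = String.format("%s%010d", prefix, counter.getAndIncrement());
        nodes.add(path);
        return path;
    }

    // Leader election convention: the lowest sequence number wins.
    String leader() {
        return Collections.min(nodes);
    }

    public static void main(String[] args) {
        SequentialNodeSketch zk = new SequentialNodeSketch();
        zk.create("/election/node-");
        zk.create("/election/node-");
        zk.create("/election/node-");
        System.out.println("Leader: " + zk.leader());
        // Prints Leader: /election/node-0000000000
    }
}
```

In the real recipe the znodes are ephemeral, so a crashed leader's znode disappears and the client with the next-lowest number takes over automatically.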

Conclusion

In this article, we have surveyed Hadoop, HBase, and ZooKeeper, three essential tools in the big data ecosystem. Each plays a distinct role: Hadoop handles batch storage and processing, HBase provides low-latency random access, and ZooKeeper coordinates the distributed services around them. With an understanding of these roles and the examples above, developers can combine the three to build robust, scalable big data applications.