Our project is going to use Hadoop for data analysis, so I needed to get familiar with the relevant topics first. This is my first contact with it, and nobody at the company has experience in this area, so I have to feel my way forward. The theory will have to be picked up and deepened gradually through actual use; for now the goal is simply to get the project running.

The environment itself was set up by our ops team, running Hadoop 2.6.0. One thing to note: have ops open the relevant ports in advance. I spent half a day tracking down a problem that turned out to be caused by a single port we did not have access to, which was a complete waste of time.

Configuration

pom.xml


The listing below is not complete; it mainly shows the dependencies needed for HBase. Add the rest according to your own needs.


  <properties>
    <spring.version>4.1.6.RELEASE</spring.version>
    <spring.hadoop.version>2.4.0.RELEASE</spring.hadoop.version>
    <hadoop.version>2.6.0</hadoop.version>
    <hbase.version>1.3.1</hbase.version>
  </properties>

    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-tx</artifactId>
      <version>${spring.version}</version>
    </dependency>

    <dependency>
      <groupId>org.springframework.data</groupId>
      <artifactId>spring-data-hadoop</artifactId>
      <version>${spring.hadoop.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.springframework</groupId>
          <artifactId>spring-context-support</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <!-- hbase -->
    <dependency>
      <groupId>com.yammer.metrics</groupId>
      <artifactId>metrics-core</artifactId>
      <version>2.2.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
      <scope>compile</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>${hbase.version}</version>
      <scope>compile</scope>
      <exclusions>
        <exclusion>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

web.xml

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="3.0"
         xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
         http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd">
  <display-name>${maven.project.name}</display-name>

  <listener>
    <listener-class>org.springframework.web.util.IntrospectorCleanupListener</listener-class>
  </listener>

  <!-- ========================================================= -->
  <!-- Spring configuration -->
  <!-- ========================================================= -->
  <listener>
    <description>Spring context loader listener</description>
    <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
  </listener>
  <context-param>
    <param-name>contextConfigLocation</param-name>
    <param-value>classpath*:/spring/*.xml</param-value>
  </context-param>

  <!-- ========================================== -->
  <!-- Character encoding filter for request and response -->
  <!-- ========================================== -->
  <filter>
    <description>Character encoding filter</description>
    <filter-name>characterEncodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
      <description>Character encoding</description>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
      <param-name>forceEncoding</param-name>
      <param-value>true</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>characterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

  <!-- ====================== -->
  <!-- Spring MVC configuration -->
  <!-- ====================== -->
  <servlet>
    <servlet-name>springmvc</servlet-name>
    <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
    <init-param>
      <description>Spring MVC configuration file</description>
      <param-name>contextConfigLocation</param-name>
      <param-value>classpath*:/servlet-context.xml</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>springmvc</servlet-name>
    <url-pattern>/</url-pattern>
  </servlet-mapping>
  <welcome-file-list>
    <welcome-file>/index.jsp</welcome-file>
  </welcome-file-list>
</web-app>

servlet-context.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	   xmlns:context="http://www.springframework.org/schema/context"
	   xmlns:aop="http://www.springframework.org/schema/aop"
	   xmlns:mvc="http://www.springframework.org/schema/mvc"
	   xmlns:tx="http://www.springframework.org/schema/tx"
	   xmlns:p="http://www.springframework.org/schema/p"
	   xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.1.xsd
	http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-4.1.xsd
	http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-4.1.xsd
	http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-4.1.xsd
	http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-4.1.xsd"
	   default-autowire="byName">
	<description>Spring MVC configuration</description>
	<import resource="classpath:spring/hbase-config.xml" />
	<!-- ======================================= -->
	<!-- Spring component scan -->
	<!-- ======================================= -->
	<context:component-scan base-package="com.jikefriend.test.hdfs" />

	<!-- Access to static resource files -->
	<!--<mvc:resources mapping="/resource/**" location="/resource/" />-->

	<mvc:annotation-driven>
		<mvc:message-converters>
			<bean id="mappingJacksonHttpMessageConverter" class="org.springframework.http.converter.json.MappingJackson2HttpMessageConverter">
				<property name="supportedMediaTypes">
					<list>
						<value>application/json;charset=UTF-8</value>
						<value>application/x-www-form-urlencoded;charset=UTF-8</value>
					</list>
				</property>
			</bean>
			<bean class="org.springframework.http.converter.StringHttpMessageConverter" />
			<bean class="org.springframework.http.converter.FormHttpMessageConverter" />
			<bean class="org.springframework.http.converter.BufferedImageHttpMessageConverter" />
			<bean class="org.springframework.http.converter.ByteArrayHttpMessageConverter" />
			<bean class="org.springframework.http.converter.ResourceHttpMessageConverter" />
		</mvc:message-converters>
	</mvc:annotation-driven>
</beans>

HBase-related configuration

hbase-config.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:p="http://www.springframework.org/schema/p"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
	http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
	http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <context:property-placeholder location="classpath*:**/*.properties"/>

    <context:component-scan base-package="com.jikefriend.test.hdfs.hbase"/>

    <hdp:configuration id="hadoopConfiguration">
    </hdp:configuration>

    <hdp:hbase-configuration configuration-ref="hadoopConfiguration" zk-quorum="${hbase.zk.host}" zk-port="${hbase.zk.port}" >
        zookeeper.znode.parent=/hbase
    </hdp:hbase-configuration>

    <bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
        <property name="configuration" ref="hbaseConfiguration"/>
    </bean>

</beans>

hbase.properties

hbase.zk.host=zk1.com,zk2.com
hbase.zk.port=2181
fs.defaultFS=hdfs://hd.host:8020

Port 8020 is the port on which the NameNode serves the HDFS protocol; combined with the NameNode's hostname it forms the HDFS address (it can be seen on the cluster's admin page).


At this point the basic configuration is done.
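
One thing to keep in mind: the code below assumes that a users table with a cfInfo column family already exists in HBase (the names match the repository class further down). If it has not been created yet, it can be created from the hbase shell, for example:

# run inside the hbase shell; table and column family names match the code below
create 'users', 'cfInfo'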

Writing the code

Entity class User.java

package com.jikefriend.test.hdfs.hbase;

public class User {

    private String name;
    private String email;
    private String password;

    public User(String name, String email, String password) {
        super();
        this.name = name;
        this.email = email;
        this.password = password;
    }

    public String getName() {
        return name;
    }

    public String getEmail() {
        return email;
    }

    public String getPassword() {
        return password;
    }

    @Override
    public String toString() {
        return "User [name=" + name + ", email=" + email + ", password="
                + password + "]";
    }
}

UserRepository.java, which interacts with HBase

package com.jikefriend.test.hdfs.hbase;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.hadoop.hbase.HbaseTemplate;
import org.springframework.data.hadoop.hbase.RowMapper;
import org.springframework.data.hadoop.hbase.TableCallback;
import org.springframework.stereotype.Component;

import java.util.List;


@Component
public class UserRepository {

    @Autowired
    private HbaseTemplate hbaseTemplate;

    private String tableName = "users";

    // column family and qualifiers of the "users" table
    public static byte[] CF_INFO = Bytes.toBytes("cfInfo");

    private byte[] qUser = Bytes.toBytes("user");
    private byte[] qEmail = Bytes.toBytes("email");
    private byte[] qPassword = Bytes.toBytes("password");

    public List<User> findAll() {
        return hbaseTemplate.find(tableName, "cfInfo", new RowMapper<User>() {
            public User mapRow(Result result, int rowNum) throws Exception {
                return new User(Bytes.toString(result.getValue(CF_INFO, qUser)),
                        Bytes.toString(result.getValue(CF_INFO, qEmail)),
                        Bytes.toString(result.getValue(CF_INFO, qPassword)));
            }
        });

    }

    public User save(final String userName, final String email,
                     final String password) {
        return hbaseTemplate.execute(tableName, new TableCallback<User>() {
            public User doInTable(HTableInterface table) throws Throwable {
                User user = new User(userName, email, password);
                // the user name doubles as the row key
                Put p = new Put(Bytes.toBytes(user.getName()));
                p.add(CF_INFO, qUser, Bytes.toBytes(user.getName()));
                p.add(CF_INFO, qEmail, Bytes.toBytes(user.getEmail()));
                p.add(CF_INFO, qPassword, Bytes.toBytes(user.getPassword()));
                table.put(p);
                return user;
            }
        });
    }

}

Finally, UserController.java, the controller used for verification

package com.jikefriend.test.hdfs.hbase;

import com.alibaba.fastjson.JSON;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

import javax.annotation.Resource;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.util.List;

@Controller
public class UserController {

    @Resource
    private UserRepository userRepository;

    @ResponseBody
    @RequestMapping("/hadoop/hbase/find/all")
    public String findAll(HttpServletRequest request, HttpServletResponse response) {
        try {
            List<User> list = userRepository.findAll();
            System.out.println(JSON.toJSON(list));
        } catch (Exception e) {
            e.printStackTrace();
        }

        return "ok";
    }

    @ResponseBody
    @RequestMapping("/hadoop/hbase/find/save")
    public String save(HttpServletRequest request, HttpServletResponse response,
                       @RequestParam("userName") String userName,
                       @RequestParam("email") String email,
                       @RequestParam("password") String password) {
        try {
            User user = userRepository.save(userName, email, password);
            System.out.println(JSON.toJSON(user));
        } catch (Exception e) {
            e.printStackTrace();
        }

        return "ok";
    }
}
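
Once the application is deployed, the two endpoints can be exercised with curl. The host, port, and context path below (localhost:8080/app) are placeholders for whatever your deployment actually uses:

# host, port and context path are placeholders; the request paths match the @RequestMapping values above
curl "http://localhost:8080/app/hadoop/hbase/find/save?userName=tom&email=tom@example.com&password=123456"
curl "http://localhost:8080/app/hadoop/hbase/find/all"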



That wraps things up; the data written through the endpoints above can be verified in the hbase shell.
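
For example, assuming the users table created earlier and the sample user saved through the curl call above:

# run inside the hbase shell; 'tom' is the row key (the user name) written by the save endpoint
scan 'users'
get 'users', 'tom'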


Importing data into HBase

Sqoop is a top-level Apache project whose main purpose is to move data between Hadoop and relational databases. With Sqoop we can easily import data from a relational database into HDFS, or export data from HDFS back to a relational database.


Use the following command to import a table from MySQL into HBase (a filled-in example follows the option list below):

sqoop import --connect jdbc:mysql://mysql.host:mysqlport/databasename \
--table [mysqltable] --hbase-table [hbasetable] --column-family testfamily  \
--hbase-row-key id --hbase-create-table --username [mysqluname] --password [mysqlpwd]



  • --connect jdbc:mysql://mysql.host:mysqlport/databasename specifies the JDBC URL, where mysql.host is the MySQL host, mysqlport is the MySQL listening port, and databasename is the database name.
  • --table [mysqltable] names the MySQL table whose data will be exported.
  • --hbase-table [hbasetable] names the table to write to in HBase.
  • --column-family testfamily creates the column family testfamily in hbasetable.
  • --hbase-row-key id uses the id column of the MySQL table as the HBase row key.
  • --hbase-create-table tells Sqoop to create the HBase table if it does not already exist.
  • --username [mysqluname] is the user name for connecting to MySQL.
  • --password [mysqlpwd] is the password for that MySQL user.
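
As a concrete illustration, assuming the MySQL table has an id column to use as the row key, and with every host, name, and credential below being a made-up placeholder, importing a MySQL user table into the users HBase table from earlier could look like this:

# all values here (host, port, database, table, credentials) are example placeholders
sqoop import --connect jdbc:mysql://mysql.host:3306/testdb \
--table user --hbase-table users --column-family cfInfo \
--hbase-row-key id --hbase-create-table --username root --password secret
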
References

Spring for Apache Hadoop: http://projects.spring.io/spring-hadoop/

Sample code: https://github.com/spring-projects/spring-hadoop-samples/tree/master/hbase

Sqoop data import: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html