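Our project recently needed Hadoop for data analysis, so I was asked to get familiar with it first. This was my first contact with Hadoop and nobody at the company had experience with it, so I had to feel my way forward. The theory will have to be accumulated and studied in depth later, through actual use; for now the goal is simply to get the project running.
The environment was set up by our ops team, with Hadoop 2.6.0. One thing to watch: ops needs to grant you access to the relevant ports in advance. I spent half a day chasing a problem that turned out to be caused by a single port I had no access to.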
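Since a blocked port cost me half a day, here is a minimal sketch of a reachability check that can be run from the application host before debugging anything else. It only uses the JDK. The host names and ports in `main` are placeholders taken from the configuration later in this post (ZooKeeper on 2181, NameNode on 8020); your cluster will likely need more entries, for example the HBase master and region server ports, so treat the list as an assumption.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, filtered by a firewall, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder endpoints (assumptions); replace with the hosts and ports of your cluster.
        String[][] endpoints = {
            {"zk1.com", "2181"}, // ZooKeeper client port
            {"hd.host", "8020"}, // NameNode port
        };
        for (String[] endpoint : endpoints) {
            boolean ok = isReachable(endpoint[0], Integer.parseInt(endpoint[1]), 2000);
            System.out.println(endpoint[0] + ":" + endpoint[1] + " -> "
                    + (ok ? "reachable" : "NOT reachable"));
        }
    }
}
```

A successful TCP connect only shows that the port is open from this machine; it says nothing about whether the service behind it is healthy.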
Configuration
The pom file
The file is not shown in full; it mainly lists what HBase needs. Add anything else according to your own needs.
<properties>
<spring.version>4.1.6.RELEASE</spring.version>
<spring.hadoop.version>2.4.0.RELEASE</spring.hadoop.version>
<hadoop.version>2.6.0</hadoop.version>
<hbase.version>1.3.1</hbase.version>
</properties>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-tx</artifactId>
<version>${spring.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-hadoop</artifactId>
<version>${spring.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>org.springframework</groupId>
<artifactId>spring-context-support</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- hbase -->
<dependency>
<groupId>com.yammer.metrics</groupId>
<artifactId>metrics-core</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
<scope>compile</scope>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
web.xml
<?xml version="1.0" encoding="UTF-8"?>
<web-app version="3.0"
xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd">
<display-name>${maven.project.name}</display-name>
<listener>
<listener-class>org.springframework.web.util.IntrospectorCleanupListener</listener-class>
</listener>
<!-- ========================================================= -->
<!-- Spring configuration -->
<!-- ========================================================= -->
<listener>
<description>Spring listener</description>
<listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
</listener>
<context-param>
<param-name>contextConfigLocation</param-name>
<param-value>classpath*:/spring/*.xml</param-value>
</context-param>
<!-- ========================================== -->
<!-- Character encoding filter: sets the encoding of request and response -->
<!-- ========================================== -->
<filter>
<description>Character encoding filter</description>
<filter-name>characterEncodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<description>Character encoding</description>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>characterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
<!-- ====================== -->
<!-- Spring MVC configuration -->
<!-- ====================== -->
<servlet>
<servlet-name>springmvc</servlet-name>
<servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
<init-param>
<description>Spring MVC configuration file</description>
<param-name>contextConfigLocation</param-name>
<param-value>classpath*:/servlet-context.xml</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
<servlet-name>springmvc</servlet-name>
<url-pattern>/</url-pattern>
</servlet-mapping>
<welcome-file-list>
<welcome-file>/index.jsp</welcome-file>
</welcome-file-list>
</web-app>
servlet-context.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:aop="http://www.springframework.org/schema/aop"
xmlns:mvc="http://www.springframework.org/schema/mvc"
xmlns:tx="http://www.springframework.org/schema/tx"
xmlns:p="http://www.springframework.org/schema/p"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.1.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-4.1.xsd
http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-4.1.xsd
http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-4.1.xsd
http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-4.1.xsd"
default-autowire="byName">
<description>Spring MVC view resolution configuration</description>
<import resource="classpath:spring/hbase-config.xml" />
<!-- ======================================= -->
<!-- Spring component scanning -->
<!-- ======================================= -->
<context:component-scan base-package="com.jikefriend.test.hdfs" />
<!-- Access to static resource files -->
<!--<mvc:resources mapping="/resource/**" location="/resource/" />-->
<mvc:annotation-driven>
<mvc:message-converters>
<bean id="mappingJacksonHttpMessageConverter" class="org.springframework.http.converter.json.MappingJackson2HttpMessageConverter">
<property name="supportedMediaTypes">
<list>
<value>application/json;charset=UTF-8</value>
<value>application/x-www-form-urlencoded;charset=UTF-8</value>
</list>
</property>
</bean>
<bean class="org.springframework.http.converter.StringHttpMessageConverter" />
<bean class="org.springframework.http.converter.FormHttpMessageConverter" />
<bean class="org.springframework.http.converter.BufferedImageHttpMessageConverter" />
<bean class="org.springframework.http.converter.ByteArrayHttpMessageConverter" />
<bean class="org.springframework.http.converter.ResourceHttpMessageConverter" />
</mvc:message-converters>
</mvc:annotation-driven>
</beans>
HBase configuration
hbase-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:hdp="http://www.springframework.org/schema/hadoop"
xmlns:p="http://www.springframework.org/schema/p"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">
<context:property-placeholder location="classpath*:**/*.properties"/>
<context:component-scan base-package="com.jikefriend.test.hdfs.hbase"/>
<hdp:configuration id="hadoopConfiguration">
</hdp:configuration>
<hdp:hbase-configuration configuration-ref="hadoopConfiguration" zk-quorum="${hbase.zk.host}" zk-port="${hbase.zk.port}" >
zookeeper.znode.parent=/hbase
</hdp:hbase-configuration>
<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
<property name="configuration" ref="hbaseConfiguration"/>
</bean>
</beans>
hbase.properties
hbase.zk.host=zk1.com,zk2.com
hbase.zk.port=2181
fs.defaultFS=hdfs://hd.host:8020
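Port 8020 is the port on which the NameNode serves the HDFS protocol; combined with the NameNode's hostname it forms the NameNode address (you can see it on the admin page).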
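As a quick sanity check on hbase.properties, the sketch below reads the two ZooKeeper keys and prints the host:port pairs they describe. It uses only `java.util.Properties`; the class and method names (`ZkQuorum`, `connectString`, `parse`) are my own, and falling back to 2181 when the port is missing is an assumption of this sketch, not something Spring for Apache Hadoop is known to do.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ZkQuorum {

    /** Builds "host1:port,host2:port" from hbase.zk.host and hbase.zk.port. */
    static String connectString(Properties props) {
        String hosts = props.getProperty("hbase.zk.host");
        // Assumption of this sketch: fall back to 2181 if no port is configured.
        String port = props.getProperty("hbase.zk.port", "2181").trim();
        if (hosts == null || hosts.trim().isEmpty()) {
            throw new IllegalArgumentException("hbase.zk.host is not set");
        }
        List<String> parts = new ArrayList<>();
        for (String host : hosts.split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed + ":" + port);
            }
        }
        return String.join(",", parts);
    }

    /** Parses properties from a string, in the same format as hbase.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse("hbase.zk.host=zk1.com,zk2.com\nhbase.zk.port=2181\n");
        System.out.println(connectString(props)); // prints zk1.com:2181,zk2.com:2181
    }
}
```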
That completes the basic configuration.
Writing the code
The entity class, User.java
package com.jikefriend.test.hdfs.hbase;
public class User {
private String name;
private String email;
private String password;
public User(String name, String email, String password) {
super();
this.name = name;
this.email = email;
this.password = password;
}
public String getName() {
return name;
}
public String getEmail() {
return email;
}
public String getPassword() {
return password;
}
@Override
public String toString() {
return "User [name=" + name + ", email=" + email + ", password="
+ password + "]";
}
}
UserRepository.java, which talks to HBase
package com.jikefriend.test.hdfs.hbase;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.hadoop.hbase.HbaseTemplate;
import org.springframework.data.hadoop.hbase.RowMapper;
import org.springframework.data.hadoop.hbase.TableCallback;
import org.springframework.stereotype.Component;
import java.util.List;
@Component
public class UserRepository {
@Autowired
private HbaseTemplate hbaseTemplate;
private String tableName = "users";
// Column family and column qualifiers, pre-encoded as bytes.
public static final byte[] CF_INFO = Bytes.toBytes("cfInfo");
private final byte[] qUser = Bytes.toBytes("user");
private final byte[] qEmail = Bytes.toBytes("email");
private final byte[] qPassword = Bytes.toBytes("password");
public List<User> findAll() {
return hbaseTemplate.find(tableName, "cfInfo", new RowMapper<User>() {
public User mapRow(Result result, int rowNum) throws Exception {
return new User(Bytes.toString(result.getValue(CF_INFO, qUser)),
Bytes.toString(result.getValue(CF_INFO, qEmail)),
Bytes.toString(result.getValue(CF_INFO, qPassword)));
}
});
}
public User save(final String userName, final String email,
final String password) {
return hbaseTemplate.execute(tableName, new TableCallback<User>() {
public User doInTable(HTableInterface table) throws Throwable {
User user = new User(userName, email, password);
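// The user name doubles as the row key.
Put p = new Put(Bytes.toBytes(user.getName()));
// addColumn replaces Put.add, which is deprecated in HBase 1.x.
p.addColumn(CF_INFO, qUser, Bytes.toBytes(user.getName()));
p.addColumn(CF_INFO, qEmail, Bytes.toBytes(user.getEmail()));
p.addColumn(CF_INFO, qPassword, Bytes.toBytes(user.getPassword()));
table.put(p);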
return user;
}
});
}
}
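To make the storage layout used by UserRepository concrete: every user is one row whose row key is the user name, with three cells in the cfInfo column family (cfInfo:user, cfInfo:email, cfInfo:password). The sketch below imitates that layout with a plain `TreeMap`. It is not HBase API, just an in-memory stand-in I wrote for illustration, and it ignores cell versions and the fact that HBase orders row keys by raw bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class UserRowLayout {

    // Stand-in for the "users" table: row key -> (family:qualifier -> value), sorted by row key.
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    /** Mirrors UserRepository.save: the user name is both the row key and a cell value. */
    void save(String name, String email, String password) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("cfInfo:user", name);
        cells.put("cfInfo:email", email);
        cells.put("cfInfo:password", password);
        // Same row key means same row: a second save for the same name replaces what a reader sees.
        rows.put(name, cells);
    }

    /** Mirrors UserRepository.findAll: rows come back ordered by row key. */
    Map<String, Map<String, String>> scan() {
        return rows;
    }

    public static void main(String[] args) {
        UserRowLayout table = new UserRowLayout();
        table.save("bob", "bob@old.example", "pw1");
        table.save("alice", "alice@example.com", "pw2");
        table.save("bob", "bob@new.example", "pw3"); // overwrites the first "bob" row
        for (Map.Entry<String, Map<String, String>> row : table.scan().entrySet()) {
            System.out.println(row.getKey() + " => " + row.getValue());
        }
    }
}
```

The practical consequence is that user names must be unique: saving a second user with an existing name does not add a row, it updates the existing one.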
Finally, UserController.java, which is used for verification
package com.jikefriend.test.hdfs.hbase;
import com.alibaba.fastjson.JSON;
import org.springframework.data.hadoop.hbase.HbaseTemplate;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;
import javax.annotation.Resource;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.util.List;
@Controller
public class UserController {
@Resource
private UserRepository userRepository;
@ResponseBody
@RequestMapping("/hadoop/hbase/find/all")
public String findAll(HttpServletRequest request, HttpServletResponse response) {
try {
List<User> list = userRepository.findAll();
System.out.println(JSON.toJSON(list));
} catch (Exception e) {
e.printStackTrace();
}
return "ok";
}
@ResponseBody
@RequestMapping("/hadoop/hbase/find/save")
public String save(HttpServletRequest request, HttpServletResponse response,
@RequestParam("userName") String userName,
@RequestParam("email") String email,
@RequestParam("password") String password) {
try {
User user = userRepository.save(userName, email, password);
System.out.println(JSON.toJSON(user));
} catch (Exception e) {
e.printStackTrace();
}
return "ok";
}
}
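That completes the example. You can verify the result in the hbase shell, for example with scan 'users'. Note that the users table with the cfInfo column family must already exist, because the code above does not create it; running create 'users', 'cfInfo' in the hbase shell takes care of that.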
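Importing data into HBase
Sqoop is an Apache project used mainly to transfer data between Hadoop and relational databases. With Sqoop it is easy to import data from a relational database into HDFS, or to export data from HDFS back into a relational database.
The following command imports the data of a MySQL table into HBase: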
sqoop import --connect jdbc:mysql://mysql.host:mysqlport/databasename \
--table [mysqltable] --hbase-table [hbasetable] --column-family testfamily \
--hbase-row-key id --hbase-create-table --username [mysqluname] --password [mysqlpwd]
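- --connect jdbc:mysql://mysql.host:mysqlport/databasename: mysql.host is the host of the MySQL server, mysqlport is the port MySQL listens on, and databasename is the name of the database.
- --table [mysqltable]: mysqltable is the MySQL table whose data is imported.
- --hbase-table [hbasetable]: hbasetable is the target table in HBase.
- --column-family testfamily: the imported columns are written to the column family testfamily of hbasetable.
- --hbase-row-key id: the id column of the MySQL table is used as the row key of hbasetable.
- --hbase-create-table: creates the target table in HBase if it does not exist yet.
- --username [mysqluname]: mysqluname is the user name for connecting to MySQL.
- --password [mysqlpwd]: mysqlpwd is the password of that MySQL user.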
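If the import has to be triggered from Java rather than typed by hand, the option list above can be assembled as an argument list, for instance to hand to `ProcessBuilder`. This is only a sketch of that idea: the class `SqoopImportCommand` and its `build` method are hypothetical names of mine, the values in `main` are placeholders, and the sketch only builds and prints the command without running Sqoop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SqoopImportCommand {

    /** Assembles the argument list for the sqoop import command shown above. */
    static List<String> build(String mysqlHost, int mysqlPort, String database,
                              String mysqlTable, String hbaseTable, String columnFamily,
                              String rowKeyColumn, String username, String password) {
        List<String> cmd = new ArrayList<>();
        Collections.addAll(cmd,
                "sqoop", "import",
                "--connect", "jdbc:mysql://" + mysqlHost + ":" + mysqlPort + "/" + database,
                "--table", mysqlTable,
                "--hbase-table", hbaseTable,
                "--column-family", columnFamily,
                "--hbase-row-key", rowKeyColumn,
                "--hbase-create-table",
                "--username", username,
                "--password", password);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder values (assumptions); substitute your own connection details.
        List<String> cmd = build("mysql.host", 3306, "databasename",
                "mysqltable", "hbasetable", "testfamily", "id", "mysqluname", "mysqlpwd");
        System.out.println(String.join(" ", cmd));
        // To actually run it, the list could be passed to new ProcessBuilder(cmd).
    }
}
```

Passing each option as a separate list element avoids shell quoting problems. A password on the command line is visible to other users on the machine; Sqoop also accepts -P to prompt for it instead.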
Spring for Apache Hadoop, http://projects.spring.io/spring-hadoop/
Code reference, https://github.com/spring-projects/spring-hadoop-samples/tree/master/hbase
Importing data with Sqoop, http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html