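When interacting with an HBase cluster through the Java API, you first create an HTable instance and then use the methods it provides to insert, delete, and query data.
To create an HTable object, you first need a Configuration instance, conf, that carries the HBase cluster information. It is typically created as follows: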
Configuration conf = HBaseConfiguration.create();
// Set the address and client port of the ZooKeeper quorum used by the HBase cluster
conf.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
conf.set("hbase.zookeeper.property.clientPort", "2181");
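With conf in hand, an HTable instance can be created through either of the following two HTable constructors:
Method 1: create the HTable instance directly from conf
The corresponding constructor is: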
public HTable(Configuration conf, final TableName tableName)
    throws IOException {
  this.tableName = tableName;
  this.cleanupPoolOnClose = this.cleanupConnectionOnClose = true;
  if (conf == null) {
    this.connection = null;
    return;
  }
  this.connection = HConnectionManager.getConnection(conf);
  this.configuration = conf;
  this.pool = getDefaultExecutor(conf);
  this.finishSetup();
}
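Note the line this.connection = HConnectionManager.getConnection(conf). This constructor calls the getConnection function of HConnectionManager, which takes conf as its input and returns an HConnection instance, connection. Anyone familiar with ODBC or JDBC will know that database access from Java normally goes through a connection or connection pool that maintains the links between the client and the database. In HBase, HConnection plays that role. One difference from a database such as Oracle is that HConnection's first point of contact is not the HBase cluster itself but the ZooKeeper (ZK) ensemble that maintains the cluster's key metadata. ZK is not covered in detail here; if you are unfamiliar with it, simply think of it as an independent metadata manager. Returning to getConnection, its implementation is as follows: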
public static HConnection getConnection(final Configuration conf)
    throws IOException {
  HConnectionKey connectionKey = new HConnectionKey(conf);
  synchronized (CONNECTION_INSTANCES) {
    HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
    if (connection == null) {
      connection = (HConnectionImplementation)createConnection(conf, true);
      CONNECTION_INSTANCES.put(connectionKey, connection);
    } else if (connection.isClosed()) {
      HConnectionManager.deleteConnection(connectionKey, true);
      connection = (HConnectionImplementation)createConnection(conf, true);
      CONNECTION_INSTANCES.put(connectionKey, connection);
    }
    connection.incCount();
    return connection;
  }
}
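Here, CONNECTION_INSTANCES is a LinkedHashMap keyed by HConnectionKey. The HConnectionKey constructor and its hashCode method are as follows: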
HConnectionKey(Configuration conf) {
  Map<String, String> m = new HashMap<String, String>();
  if (conf != null) {
    for (String property : CONNECTION_PROPERTIES) {
      String value = conf.get(property);
      if (value != null) {
        m.put(property, value);
      }
    }
  }
  this.properties = Collections.unmodifiableMap(m);
  try {
    UserProvider provider = UserProvider.instantiate(conf);
    User currentUser = provider.getCurrent();
    if (currentUser != null) {
      username = currentUser.getName();
    }
  } catch (IOException ioe) {
    HConnectionManager.LOG.warn("Error obtaining current user, skipping username in HConnectionKey", ioe);
  }
}

public int hashCode() {
  final int prime = 31;
  int result = 1;
  if (username != null) {
    result = username.hashCode();
  }
  for (String property : CONNECTION_PROPERTIES) {
    String value = properties.get(property);
    if (value != null) {
      result = prime * result + value.hashCode();
    }
  }
  return result;
}
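As the code shows, the overridden hashCode starts from the hash of username and then folds in the hash of every non-null value listed in CONNECTION_PROPERTIES. The username comes from currentUser, which comes from provider, which is created from conf; the property values are read from conf as well. Two conf objects with the same connection-related settings, used by the same user, therefore produce HConnectionKey objects with the same hash code. CONNECTION_INSTANCES is a LinkedHashMap, and its get function uses the key's hashCode (together with equals, not shown here) to decide whether an entry already exists. In essence, getConnection returns a connection object for the given conf, and for all conf objects with the same content it returns one and the same connection.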
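The lookup behavior described above can be modeled without HBase at all. The following is a minimal JDK-only sketch, not HBase code: Key stands in for HConnectionKey, plain Map<String, String> objects stand in for Configuration, and the class names, the KEY_PROPERTIES list, the user name, and the host names are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Simplified, JDK-only model of a per-configuration connection cache (not HBase code). */
public class ConnectionCacheSketch {

    // Hypothetical subset of the settings that identify a cluster.
    static final String[] KEY_PROPERTIES = {
        "hbase.zookeeper.quorum",
        "hbase.zookeeper.property.clientPort"
    };

    /** Stand-in for HConnectionKey: identity = user name + selected property values. */
    static final class Key {
        private final String username;
        private final Map<String, String> properties;

        Key(Map<String, String> conf, String username) {
            Map<String, String> m = new HashMap<>();
            for (String property : KEY_PROPERTIES) {
                String value = conf.get(property);
                if (value != null) {
                    m.put(property, value);
                }
            }
            this.properties = Collections.unmodifiableMap(m);
            this.username = username;
        }

        @Override
        public int hashCode() {
            // Seed with the user name hash, then fold in each property value.
            int result = (username != null) ? username.hashCode() : 1;
            for (String property : KEY_PROPERTIES) {
                String value = properties.get(property);
                if (value != null) {
                    result = 31 * result + value.hashCode();
                }
            }
            return result;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return Objects.equals(username, other.username)
                && properties.equals(other.properties);
        }
    }

    /** Stand-in for a shared connection with a reference count. */
    static final class Connection {
        int refCount;
    }

    private static final Map<Key, Connection> INSTANCES = new LinkedHashMap<>();

    /** Returns the cached connection for an equal key, creating it on first use. */
    public static Connection getConnection(Map<String, String> conf, String username) {
        Key key = new Key(conf, username);
        synchronized (INSTANCES) {
            Connection connection = INSTANCES.get(key);
            if (connection == null) {
                connection = new Connection();
                INSTANCES.put(key, connection);
            }
            connection.refCount++;
            return connection;
        }
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        a.put("hbase.zookeeper.quorum", "zk1.example");
        a.put("hbase.zookeeper.property.clientPort", "2181");
        Map<String, String> b = new HashMap<>(a);   // different object, same content
        Map<String, String> c = new HashMap<>(a);
        c.put("hbase.zookeeper.quorum", "zk2.example");

        System.out.println(getConnection(a, "alice") == getConnection(b, "alice")); // true
        System.out.println(getConnection(a, "alice") == getConnection(c, "alice")); // false
    }
}
```

The point of the sketch is that the map is keyed by value, not by object identity: a and b are distinct objects but map to the same entry, while changing one identifying setting yields a separate connection.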
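Method 2: call createConnection to explicitly create an HConnection instance, then pass it as an argument when creating the HTable instance
The createConnection method and the corresponding HTable constructor are as follows: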
public static HConnection createConnection(Configuration conf) throws IOException {
  UserProvider provider = UserProvider.instantiate(conf);
  return createConnection(conf, false, null, provider.getCurrent());
}

static HConnection createConnection(final Configuration conf, final boolean managed,
    final ExecutorService pool, final User user)
    throws IOException {
  String className = conf.get("hbase.client.connection.impl",
      HConnectionManager.HConnectionImplementation.class.getName());
  Class<?> clazz = null;
  try {
    clazz = Class.forName(className);
  } catch (ClassNotFoundException e) {
    throw new IOException(e);
  }
  try {
    // Default HCM#HCI is not accessible; make it so before invoking.
    Constructor<?> constructor =
        clazz.getDeclaredConstructor(Configuration.class,
            boolean.class, ExecutorService.class, User.class);
    constructor.setAccessible(true);
    return (HConnection) constructor.newInstance(conf, managed, pool, user);
  } catch (Exception e) {
    throw new IOException(e);
  }
}

public HTable(TableName tableName, HConnection connection) throws IOException {
  this.tableName = tableName;
  this.cleanupPoolOnClose = true;
  this.cleanupConnectionOnClose = false;
  this.connection = connection;
  this.configuration = connection.getConfiguration();
  this.pool = getDefaultExecutor(this.configuration);
  this.finishSetup();
}
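As the code shows, with this approach the HConnection handed to HTable is a new instance that createConnection builds through reflection, rather than the single shared HConnection instance used in Method 1.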
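The reflective step in createConnection (read a class name from the configuration, look up a non-public constructor, make it accessible, invoke it) can be illustrated in isolation. This is a minimal JDK-only sketch, not HBase code: the Connection interface, DefaultConnection, and the "connection.impl" key are hypothetical names, and a plain Map stands in for Configuration.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

/** JDK-only sketch of creating an instance from a configured class name (not HBase code). */
public class ReflectiveFactorySketch {

    /** Hypothetical connection contract. */
    interface Connection {
        boolean isManaged();
    }

    /** Default implementation whose constructor is not publicly accessible. */
    static class DefaultConnection implements Connection {
        private final boolean managed;

        private DefaultConnection(Map<String, String> conf, boolean managed) {
            this.managed = managed;
        }

        @Override
        public boolean isManaged() {
            return managed;
        }
    }

    /** Builds a new instance on every call; nothing is cached here. */
    public static Connection createConnection(Map<String, String> conf, boolean managed)
            throws ReflectiveOperationException {
        // Fall back to the default implementation when no class name is configured.
        String className = conf.getOrDefault("connection.impl", DefaultConnection.class.getName());
        Class<?> clazz = Class.forName(className);
        Constructor<?> constructor = clazz.getDeclaredConstructor(Map.class, boolean.class);
        constructor.setAccessible(true); // the constructor is private
        return (Connection) constructor.newInstance(conf, managed);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> conf = new HashMap<>();
        Connection first = createConnection(conf, false);
        Connection second = createConnection(conf, false);
        System.out.println(first != second);     // true: each call yields a new instance
        System.out.println(first.isManaged());   // false
    }
}
```

Because the factory keeps no map of existing instances, two calls with the same configuration return two separate objects, which is the behavior that distinguishes this path from the cached lookup in Method 1.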
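Note that an HConnection created this way must be released by explicitly calling its close method once it is no longer needed. Because this HTable constructor sets cleanupConnectionOnClose to false, closing the HTable does not close the connection, and a connection that is never closed can easily lead to problems such as ports remaining occupied.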
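One way to make sure a caller-owned resource is always released is Java's try-with-resources statement, which works with any Closeable. The following is a minimal JDK-only sketch of that pattern, not HBase code: UnmanagedConnection and the OPEN counter are hypothetical stand-ins used only to make the effect of close observable.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

/** JDK-only sketch of caller-owned cleanup with try-with-resources (not HBase code). */
public class ExplicitCloseSketch {

    /** Number of connections currently open; stands in for held sockets or ports. */
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Hypothetical connection that holds a resource until close() is called. */
    static final class UnmanagedConnection implements Closeable {
        private boolean closed;

        UnmanagedConnection() {
            OPEN.incrementAndGet();
        }

        void use() {
            if (closed) {
                throw new IllegalStateException("connection already closed");
            }
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    /** Opens a connection, uses it, and releases it even if the work fails. */
    public static void doWork(boolean fail) {
        try (UnmanagedConnection connection = new UnmanagedConnection()) {
            connection.use();
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        }
    }

    public static void main(String[] args) {
        doWork(false);
        try {
            doWork(true);
        } catch (IllegalStateException expected) {
            // The connection was still closed before the exception propagated.
        }
        System.out.println(OPEN.get()); // 0
    }
}
```

The same effect can be achieved with an explicit try/finally block that calls close in the finally clause; either way, the key point is that the code that created the connection is responsible for closing it.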