Source: http://aperise.iteye.com/blog/2372505

Source code walkthrough -- (1) HBase client source code

http://aperise.iteye.com/blog/2372350

Source code walkthrough -- (2) hbase-examples BufferedMutator Example

http://aperise.iteye.com/blog/2372505

Source code walkthrough -- (3) hbase-examples MultiThreadedClientExample

http://aperise.iteye.com/blog/2372534

1.Skipping HTable and using the BufferedMutator inside it directly is entirely feasible

    In the earlier analysis of the HBase client source code, we created the client like this:





// The default Connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation
Connection connection = ConnectionFactory.createConnection(configuration);
// The default Table implementation is org.apache.hadoop.hbase.client.HTable
Table table = connection.getTable(TableName.valueOf("tableName"));


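  1. By default we get the Connection implementation org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation. The point to note is that its setupRegistry() method installs org.apache.hadoop.hbase.client.ZookeeperRegistry, the class that handles all later interaction with ZooKeeper.
  2. From the connection we then get the Table implementation org.apache.hadoop.hbase.client.HTable.
  3. Finally, it turns out that org.apache.hadoop.hbase.client.HTable ultimately holds a field named mutator of type BufferedMutatorImpl, and subsequent writes (puts) go through this mutator.

    So when writing to HBase from the client we can skip the HTable object entirely, build a BufferedMutator directly, and write through it. Sure enough, the hbase-examples module in the HBase source tree demonstrates exactly this usage. The key code is: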





Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.client.write.buffer", "2097152");
configuration.set("hbase.zookeeper.quorum", "192.168.199.31,192.168.199.32,192.168.199.33,192.168.199.34,192.168.199.35");

BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tableName"));

// 3177 is not made up: it is computed as 2 * hbase.client.write.buffer / put.heapSize()
int bestBatchPutSize = 3177;

// Java 7 try-with-resources: every resource declared in try(...) must implement
// java.lang.AutoCloseable (java.io.Closeable extends it). On exit the resources are
// closed automatically in reverse order, as if a finally block called
// mutator.close() and then conn.close().
try (
  // The default Connection implementation is org.apache.hadoop.hbase.client.ConnectionManager.HConnectionImplementation
  Connection conn = ConnectionFactory.createConnection(configuration);
  // The default BufferedMutator implementation is org.apache.hadoop.hbase.client.BufferedMutatorImpl
  BufferedMutator mutator = conn.getBufferedMutator(params);
) {
  List<Put> putLists = new ArrayList<Put>();
  for (int count = 0; count < 100000; count++) {
    // The row key expression was lost from the original listing; "rowkey" + count is a placeholder.
    Put put = new Put(Bytes.toBytes("rowkey" + count));
    put.addColumn("columnFamily1".getBytes(), "columnName1".getBytes(), "columnValue1".getBytes());
    put.addColumn("columnFamily1".getBytes(), "columnName2".getBytes(), "columnValue2".getBytes());
    put.addColumn("columnFamily1".getBytes(), "columnName3".getBytes(), "columnValue3".getBytes());
    // SKIP_WAL trades durability for speed: unflushed edits can be lost if a region server fails.
    put.setDurability(Durability.SKIP_WAL);
    putLists.add(put);

    if (putLists.size() == bestBatchPutSize) {
      // The batch has reached the target size, so submit it right away.
      mutator.mutate(putLists);
      mutator.flush();
      putLists.clear();
    }
  }
  // Submit whatever is left over in one final batch.
  mutator.mutate(putLists);
  mutator.flush();
} catch (IOException e) {
  LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
}
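
The comment in the snippet says the batch size 3177 comes from 2 * hbase.client.write.buffer / put.heapSize(). Here is a minimal sketch of that arithmetic in plain Java, with no HBase dependency. The class name BatchSizeSketch and the per-Put heap size of 1320 bytes are assumptions made for illustration (the article does not state the measured heapSize()); in real code you would call put.heapSize() on a representative Put.

```java
final class BatchSizeSketch {

    /** Number of Puts that fit into twice the write buffer, rounded down. */
    static long bestBatchSize(long writeBufferBytes, long putHeapSizeBytes) {
        if (writeBufferBytes <= 0 || putHeapSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (2 * writeBufferBytes) / putHeapSizeBytes;
    }

    public static void main(String[] args) {
        long writeBuffer = 2097152L;     // hbase.client.write.buffer from the snippet above
        long assumedPutHeapSize = 1320L; // ASSUMPTION: illustrative value, not a measurement
        System.out.println(bestBatchSize(writeBuffer, assumedPutHeapSize)); // prints 3177
    }
}
```

If the shape of your Puts changes (more columns, longer values), the heap size changes too, so the batch size is best recomputed rather than hard-coded.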


 

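2.BufferedMutatorParams

BufferedMutatorParams collects the parameters needed to construct a BufferedMutator: the HBase table name, the client write buffer size, the maximum KeyValue size, the thread pool, and a callback listener for failed operations (for example, failed writes).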





package org.apache.hadoop.hbase.client;

import java.util.concurrent.ExecutorService;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;

/**
 * BufferedMutatorParams: the parameters used to construct a BufferedMutator
 */
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class BufferedMutatorParams {

  static final int UNSET = -1;

  private final TableName tableName; // HBase table
  private long writeBufferSize = UNSET; // client write buffer size
  private int maxKeyValueSize = UNSET; // maximum size of a single KeyValue (cell), not of the row key
  private ExecutorService pool = null; // thread pool
  // Callback listener for failed operations, e.g. failed writes; the default simply rethrows
  private BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {
    @Override
    public void onException(RetriesExhaustedWithDetailsException exception,
        BufferedMutator bufferedMutator)
        throws RetriesExhaustedWithDetailsException {
      throw exception;
    }
  };

  public BufferedMutatorParams(TableName tableName) { // constructor
    this.tableName = tableName;
  }

  public TableName getTableName() { // table name
    return tableName;
  }

  public long getWriteBufferSize() { // write buffer size
    return writeBufferSize;
  }

  /**
   * Override the write buffer size that would otherwise come from the configuration
   */
  public BufferedMutatorParams writeBufferSize(long writeBufferSize) {
    this.writeBufferSize = writeBufferSize;
    return this;
  }

  public int getMaxKeyValueSize() { // maximum KeyValue size
    return maxKeyValueSize;
  }

  /**
   * Override the maximum KeyValue size that would otherwise come from the configuration
   */
  public BufferedMutatorParams maxKeyValueSize(int maxKeyValueSize) {
    this.maxKeyValueSize = maxKeyValueSize;
    return this;
  }

  public ExecutorService getPool() { // thread pool
    return pool;
  }

  public BufferedMutatorParams pool(ExecutorService pool) { // fluent setter, not a constructor
    this.pool = pool;
    return this;
  }

  public BufferedMutator.ExceptionListener getListener() { // listener
    return listener;
  }

  public BufferedMutatorParams listener(BufferedMutator.ExceptionListener listener) { // fluent setter, not a constructor
    this.listener = listener;
    return this;
  }
}
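
The class above is a small fluent builder: every setter returns this, and fields left at UNSET are later replaced by configuration defaults in the BufferedMutatorImpl constructor. The sketch below illustrates that pattern in plain Java. SketchParams and its effective*() methods are hypothetical names for illustration only; they are not part of the HBase API, where the fallback lives in the BufferedMutatorImpl constructor.

```java
final class SketchParams {
    static final int UNSET = -1; // sentinel meaning "caller did not override this value"

    private final String tableName;
    private long writeBufferSize = UNSET;
    private int maxKeyValueSize = UNSET;

    SketchParams(String tableName) {
        this.tableName = tableName;
    }

    // Fluent setters: each returns this so calls can be chained.
    SketchParams writeBufferSize(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
        return this;
    }

    SketchParams maxKeyValueSize(int maxKeyValueSize) {
        this.maxKeyValueSize = maxKeyValueSize;
        return this;
    }

    String tableName() {
        return tableName;
    }

    // Hypothetical helpers: use the explicit value if set, else the configured default.
    long effectiveWriteBufferSize(long configuredDefault) {
        return writeBufferSize != UNSET ? writeBufferSize : configuredDefault;
    }

    int effectiveMaxKeyValueSize(int configuredDefault) {
        return maxKeyValueSize != UNSET ? maxKeyValueSize : configuredDefault;
    }

    public static void main(String[] args) {
        SketchParams defaults = new SketchParams("tableName");
        SketchParams tuned = new SketchParams("tableName").writeBufferSize(4096L);
        System.out.println(defaults.effectiveWriteBufferSize(2097152L)); // prints 2097152
        System.out.println(tuned.effectiveWriteBufferSize(2097152L));    // prints 4096
    }
}
```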


3.BufferedMutator

    BufferedMutator is an interface that declares the following methods:





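public interface BufferedMutator extends Closeable {
  TableName getName(); // table name
  Configuration getConfiguration(); // the Hadoop Configuration object
  void mutate(Mutation mutation) throws IOException; // add one mutation to the buffer
  void mutate(List<? extends Mutation> mutations) throws IOException; // add a batch of mutations to the buffer
  @Override
  void close() throws IOException; // from Closeable, so try-with-resources (Java 7) can close it without a finally block
  void flush() throws IOException; // send the buffered mutations to the HBase servers
  long getWriteBufferSize(); // write buffer size
  @InterfaceAudience.Public
  @InterfaceStability.Evolving
  interface ExceptionListener { // listener for failed writes
    public void onException(RetriesExhaustedWithDetailsException exception,
        BufferedMutator mutator) throws RetriesExhaustedWithDetailsException;
  }
}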


4.BufferedMutatorImpl





package org.apache.hadoop.hbase.client;

import com.google.common.annotations.VisibleForTesting;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;
import org.apache.hadoop.hbase.ipc.RpcControllerFactory;

import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * BufferedMutatorImpl was introduced in HBase 1.0.0.
 * It is mainly used to write to a single table from multiple threads.
 * Note that when threads share one BufferedMutator, a failure caused by one
 * thread's writes can surface in the other threads as well.
 */
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class BufferedMutatorImpl implements BufferedMutator {

  private static final Log LOG = LogFactory.getLog(BufferedMutatorImpl.class);

  private final ExceptionListener listener; // callback invoked when a write fails

  protected ClusterConnection connection; // the connection this mutator uses
  private final TableName tableName; // HBase table
  private volatile Configuration conf; // Hadoop Configuration
  @VisibleForTesting
  final ConcurrentLinkedQueue<Mutation> writeAsyncBuffer = new ConcurrentLinkedQueue<Mutation>(); // the buffer queue
  @VisibleForTesting
  AtomicLong currentWriteBufferSize = new AtomicLong(0); // thread-safe running total of the heap size of the buffered mutations

  private long writeBufferSize; // client write buffer size
  private final int maxKeyValueSize; // maximum size of a single KeyValue (cell)
  private boolean closed = false; // whether this mutator has been closed
  private final ExecutorService pool; // thread pool used by this mutator

  @VisibleForTesting
  protected AsyncProcess ap; // performs the asynchronous submission

  BufferedMutatorImpl(ClusterConnection conn, RpcRetryingCallerFactory rpcCallerFactory,
      RpcControllerFactory rpcFactory, BufferedMutatorParams params) {
    if (conn == null || conn.isClosed()) {
      throw new IllegalArgumentException("Connection is null or closed.");
    }

    this.tableName = params.getTableName();
    this.connection = conn;
    this.conf = connection.getConfiguration();
    this.pool = params.getPool();
    this.listener = params.getListener();

    // Build a ConnectionConfiguration from conf; settings the client did not set fall back to defaults
    ConnectionConfiguration tableConf = new ConnectionConfiguration(conf);
    // Write buffer size: the explicit parameter wins, otherwise the configured value
    this.writeBufferSize = params.getWriteBufferSize() != BufferedMutatorParams.UNSET ? params.getWriteBufferSize() : tableConf.getWriteBufferSize();
    // Maximum KeyValue size: the explicit parameter wins, otherwise the configured value
    this.maxKeyValueSize = params.getMaxKeyValueSize() != BufferedMutatorParams.UNSET ? params.getMaxKeyValueSize() : tableConf.getMaxKeyValueSize();

    // The object that performs the asynchronous submission
    ap = new AsyncProcess(connection, conf, pool, rpcCallerFactory, true, rpcFactory);
  }

  @Override
  public TableName getName() { // table name
    return tableName;
  }

  @Override
  public Configuration getConfiguration() { // the Hadoop Configuration, i.e. the conf passed in by the client
    return conf;
  }

  @Override
  public void mutate(Mutation m) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    // Add a single mutation to the buffer
    mutate(Arrays.asList(m));
  }

  @Override
  public void mutate(List<? extends Mutation> ms) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    // Fail fast if this BufferedMutatorImpl has already been closed
    if (closed) {
      throw new IllegalStateException("Cannot put when the BufferedMutator is closed.");
    }

    // Validate each Put and accumulate the heap size of the submitted mutations in toAddSize
    long toAddSize = 0;
    for (Mutation m : ms) {
      if (m instanceof Put) {
        validatePut((Put) m);
      }
      toAddSize += m.heapSize();
    }

    // This behavior is highly non-intuitive... it does not protect us against
    // 94-incompatible behavior, which is a timing issue because hasError, the below code
    // and setter of hasError are not synchronized. Perhaps it should be removed.
    if (ap.hasError()) {
      // Add toAddSize to the running total of buffered bytes
      currentWriteBufferSize.addAndGet(toAddSize);
      // Queue the mutations in writeAsyncBuffer; they stay there until they are submitted
      writeAsyncBuffer.addAll(ms);
      // An earlier asynchronous write has failed, so flush synchronously to surface the error
      backgroundFlushCommits(true);
    } else {
      // Add toAddSize to the running total of buffered bytes
      currentWriteBufferSize.addAndGet(toAddSize);
      // Queue the mutations in writeAsyncBuffer; they stay there until they are submitted
      writeAsyncBuffer.addAll(ms);
    }

    // Now try and queue what needs to be queued.
    // While the total buffered size exceeds hbase.client.write.buffer (2 MB by default),
    // call backgroundFlushCommits(false); otherwise return without sending anything.
    while (currentWriteBufferSize.get() > writeBufferSize) {
      backgroundFlushCommits(false);
    }
  }

  // Validate a Put (including the maximum KeyValue size)
  public void validatePut(final Put put) throws IllegalArgumentException {
    HTable.validatePut(put, maxKeyValueSize);
  }

  @Override
  public synchronized void close() throws IOException {
    try {
      if (this.closed) { // already closed, nothing to do
        return;
      }

      // Do one last synchronous flush before closing
      backgroundFlushCommits(true);
      this.pool.shutdown(); // shut down the thread pool
      boolean terminated;
      int loopCnt = 0;
      do {
        // wait until the pool has terminated
        terminated = this.pool.awaitTermination(60, TimeUnit.SECONDS);
        loopCnt += 1;
        if (loopCnt >= 10) {
          LOG.warn("close() failed to terminate pool after 10 minutes. Abandoning pool.");
          break;
        }
      } while (!terminated);

    } catch (InterruptedException e) {
      LOG.warn("waitForTermination interrupted");

    } finally {
      this.closed = true;
    }
  }

  @Override
  public synchronized void flush() throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    // Explicit flush: send the buffered data to the HBase servers and wait for the result
    backgroundFlushCommits(true);
  }

  private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    LinkedList<Mutation> buffer = new LinkedList<>();
    // Keep track of the size so that this thread doesn't spin forever
    long dequeuedSize = 0;

    try {
      // Drain the queued mutations (Put is one Mutation implementation)
      Mutation m;
      // While (writeBufferSize <= 0 || dequeuedSize < writeBufferSize * 2 || synchronous)
      // and writeAsyncBuffer still has mutations, move them into the local buffer:
      // dequeuedSize grows and currentWriteBufferSize shrinks by each mutation's heap size.
      while ((writeBufferSize <= 0 || dequeuedSize < (writeBufferSize * 2) || synchronous) && (m = writeAsyncBuffer.poll()) != null) {
        buffer.add(m);
        long size = m.heapSize();
        dequeuedSize += size;
        currentWriteBufferSize.addAndGet(-size);
      }

      // Asynchronous call with nothing dequeued: there is nothing to send
      if (!synchronous && dequeuedSize == 0) {
        return;
      }

      // backgroundFlushCommits(false): submit without waiting for the result
      if (!synchronous) {
        // does not wait for the result
        ap.submit(tableName, buffer, true, null, false);
        if (ap.hasError()) {
          LOG.debug(tableName + ": One or more of the operations have failed -"
              + " waiting for all operation in progress to finish (successfully or not)");
        }
      }
      // backgroundFlushCommits(true), or an error was recorded: submit everything and wait
      if (synchronous || ap.hasError()) {
        while (!buffer.isEmpty()) {
          ap.submit(tableName, buffer, true, null, false);
        }
        // waits for the results of all previous operations
        RetriesExhaustedWithDetailsException error = ap.waitForAllPreviousOpsAndReset(null);
        if (error != null) {
          if (listener == null) {
            throw error;
          } else {
            this.listener.onException(error, this);
          }
        }
      }
    } finally {
      // Anything still left in the local buffer goes back into the queue to be submitted later
      for (Mutation mut : buffer) {
        long size = mut.heapSize();
        currentWriteBufferSize.addAndGet(size);
        dequeuedSize -= size;
        writeAsyncBuffer.add(mut);
      }
    }
  }

  /**
   * Set the client write buffer size; flushes if the buffer already exceeds the new size
   */
  @Deprecated
  public void setWriteBufferSize(long writeBufferSize) throws RetriesExhaustedWithDetailsException,
      InterruptedIOException {
    this.writeBufferSize = writeBufferSize;
    if (currentWriteBufferSize.get() > writeBufferSize) {
      flush();
    }
  }

  /**
   * Get the write buffer size
   */
  @Override
  public long getWriteBufferSize() {
    return this.writeBufferSize;
  }


  @Deprecated
  public List<Row> getWriteBuffer() {
    return Arrays.asList(writeAsyncBuffer.toArray(new Row[0]));
  }
}
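
The heart of BufferedMutatorImpl is the bookkeeping between the queue, the byte counter, and the write buffer threshold. The following is a simplified, single-threaded model of that logic in plain Java so it can be run without a cluster. It is only a sketch: WriteBufferSketch is a made-up name, byte arrays stand in for Mutation objects, array length stands in for heapSize(), and "sending" a batch just records its size instead of calling AsyncProcess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

final class WriteBufferSketch {
    private final ConcurrentLinkedQueue<byte[]> queue = new ConcurrentLinkedQueue<>();
    private final AtomicLong bufferedBytes = new AtomicLong(0);
    private final long writeBufferSize;
    // Sizes of the batches that were "sent"; a stand-in for the real RPC submission.
    private final List<Integer> flushedBatchSizes = new ArrayList<>();

    WriteBufferSketch(long writeBufferSize) {
        if (writeBufferSize <= 0) {
            throw new IllegalArgumentException("writeBufferSize must be positive in this sketch");
        }
        this.writeBufferSize = writeBufferSize;
    }

    /** Buffer the items; drain only once the buffered bytes exceed the threshold. */
    void mutate(List<byte[]> items) {
        long toAdd = 0;
        for (byte[] item : items) {
            toAdd += item.length;
        }
        bufferedBytes.addAndGet(toAdd);
        queue.addAll(items);
        while (bufferedBytes.get() > writeBufferSize) {
            drain(false);
        }
    }

    /** Explicit flush: drain everything regardless of the threshold. */
    void flush() {
        drain(true);
    }

    private void drain(boolean synchronous) {
        List<byte[]> batch = new ArrayList<>();
        long dequeued = 0;
        byte[] item;
        // Asynchronous drains stop after roughly twice the buffer size.
        while ((dequeued < writeBufferSize * 2 || synchronous) && (item = queue.poll()) != null) {
            batch.add(item);
            dequeued += item.length;
            bufferedBytes.addAndGet(-item.length);
        }
        if (!batch.isEmpty()) {
            flushedBatchSizes.add(batch.size());
        }
    }

    long bufferedBytes() {
        return bufferedBytes.get();
    }

    List<Integer> flushedBatchSizes() {
        return flushedBatchSizes;
    }

    public static void main(String[] args) {
        WriteBufferSketch buffer = new WriteBufferSketch(10);
        buffer.mutate(Arrays.asList(new byte[4], new byte[4]));  // 8 bytes: below the threshold
        System.out.println(buffer.flushedBatchSizes());          // prints []
        buffer.mutate(Collections.singletonList(new byte[4]));   // 12 bytes: triggers a drain
        System.out.println(buffer.flushedBatchSizes());          // prints [3]
    }
}
```

The model shows why a single mutate() call usually returns immediately: nothing is sent until the accumulated size crosses the threshold or flush() is called.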


  

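5.BufferedMutatorExample

    The hbase-examples module of the HBase source tree provides examples of using the HBase client. One of them, the Java class BufferedMutatorExample, shows another way to write to HBase from the client. Its code is as follows: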

 





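import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
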
/**
 * An example of using the {@link BufferedMutator} interface.
 */
public class BufferedMutatorExample extends Configured implements Tool {

  private static final Log LOG = LogFactory.getLog(BufferedMutatorExample.class);

  private static final int POOL_SIZE = 10; // thread pool size
  private static final int TASK_COUNT = 100; // number of tasks
  private static final TableName TABLE = TableName.valueOf("foo"); // HBase table foo
  private static final byte[] FAMILY = Bytes.toBytes("f"); // column family f of table foo

  /**
   * Implements Tool.run(String[] args); receives the String[] args passed to main
   */
  @Override
  public int run(String[] args) throws InterruptedException, ExecutionException, TimeoutException {

    /** An asynchronous callback listener, triggered when an HBase write fails. */
    final BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {
      @Override
      public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator) {
        for (int i = 0; i < e.getNumExceptions(); i++) {
          LOG.info("Failed to sent put " + e.getRow(i) + ".");
        }
      }
    };
    /**
     * BufferedMutatorParams holds the construction parameters of BufferedMutator:
     *     TableName tableName
     *     long writeBufferSize
     *     int maxKeyValueSize
     *     ExecutorService pool
     *     BufferedMutator.ExceptionListener listener
     * Only tableName and listener are set here.
     */
    BufferedMutatorParams params = new BufferedMutatorParams(TABLE).listener(listener);

    /**
     * step 1: create one Connection and one BufferedMutator, shared by all threads in the pool.
     *     This uses Java 7 try-with-resources: resources declared in try(...) must implement
     *     java.lang.AutoCloseable (java.io.Closeable extends it) and are closed automatically
     *     in reverse order when the block exits, which is equivalent to
     *     finally {
     *         mutator.close();
     *         conn.close();
     *     }
     */
    try (
      final Connection conn = ConnectionFactory.createConnection(getConf());
      final BufferedMutator mutator = conn.getBufferedMutator(params)
    ) {
      /** Worker thread pool (size 10) that operates on the BufferedMutator */
      final ExecutorService workerPool = Executors.newFixedThreadPool(POOL_SIZE);
      List<Future<Void>> futures = new ArrayList<>(TASK_COUNT);

      /** Create the tasks (100 in total) and submit them to the pool */
      for (int i = 0; i < TASK_COUNT; i++) {
        futures.add(workerPool.submit(new Callable<Void>() {
          @Override
          public Void call() throws Exception {
            /**
             * step 2: all tasks send data to the same BufferedMutator. They share its
             *     buffer (hbase.client.write.buffer), the callback listener and the
             *     thread pool.
             */

            /** Build the Put */
            Put p = new Put(Bytes.toBytes("someRow"));
            p.addColumn(FAMILY, Bytes.toBytes("someQualifier"), Bytes.toBytes("some value"));
            /**
             * Add the Put to the BufferedMutator's buffer. Nothing is sent to the HBase
             * servers yet; data is only sent automatically once the buffered size
             * exceeds hbase.client.write.buffer.
             */
            mutator.mutate(p);

            /**
             * TODO
             * You may call mutator.flush() here before the task exits to make sure
             * this task's edits have been sent to the HBase servers:
             * mutator.flush();
             */
            return null;
          }
        }));
      }

      /**
       * step 3: iterate over each task's Future and wait up to 5 minutes for each to finish
       */
      for (Future<Void> f : futures) {
        f.get(5, TimeUnit.MINUTES);
      }
      /**
       * Finally, shut down the thread pool
       */
      workerPool.shutdown();
    } catch (IOException e) {
      // exception while creating/destroying Connection or BufferedMutator
      LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
    }
    /**
     * There is no finally block because try-with-resources (Java 7) closes the resources
     * automatically when the block exits: mutator.close() is called first, then conn.close().
     */
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner calls run() on BufferedMutatorExample (which implements Tool) and passes String[] args through to it
    ToolRunner.run(new BufferedMutatorExample(), args);
  }
}
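
Both examples rely on try-with-resources instead of an explicit finally block, and the order in which the resources are closed matters: the BufferedMutator has to flush before the Connection goes away. A minimal sketch of the language rule, using stand-in resources rather than real HBase objects (CloseOrderSketch and Named are hypothetical names for illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOrderSketch {

    /** A stand-in resource that records its name when it is closed. */
    static final class Named implements AutoCloseable {
        private final String name;
        private final List<String> closeLog;

        Named(String name, List<String> closeLog) {
            this.name = name;
            this.closeLog = closeLog;
        }

        @Override
        public void close() {
            closeLog.add(name);
        }
    }

    /** Opens "conn" then "mutator" and returns the order in which they were closed. */
    static List<String> run() {
        List<String> closed = new ArrayList<>();
        try (Named conn = new Named("conn", closed);
             Named mutator = new Named("mutator", closed)) {
            // work with both resources here
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [mutator, conn]
    }
}
```

Resources are closed in the reverse of their declaration order, so declaring the Connection first and the BufferedMutator second gives the desired sequence.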


 

6.Takeaways from the source

  •     A BufferedMutator on its own is enough to write to HBase from the client;
  •     A single BufferedMutator can be shared by multiple threads;
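
The sharing pattern behind the second takeaway is ordinary java.util.concurrent code: a fixed pool, one shared thread-safe sink, and a Future per task. The sketch below reproduces that structure without HBase. A ConcurrentLinkedQueue stands in for the shared BufferedMutator, and SharedSinkSketch is a hypothetical name; it only illustrates the threading pattern, not the HBase client itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SharedSinkSketch {

    /** Runs taskCount tasks on poolSize threads; returns how many items reached the shared sink. */
    static int run(int poolSize, int taskCount) {
        // Stand-in for the shared BufferedMutator: one thread-safe buffer used by every task.
        final ConcurrentLinkedQueue<String> sharedBuffer = new ConcurrentLinkedQueue<>();
        ExecutorService workerPool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Void>> futures = new ArrayList<>(taskCount);
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                futures.add(workerPool.submit(new Callable<Void>() {
                    @Override
                    public Void call() {
                        sharedBuffer.add("row-" + id); // analogous to mutator.mutate(put)
                        return null;
                    }
                }));
            }
            // Wait for every task, with an upper bound per task.
            for (Future<Void> f : futures) {
                f.get(5, TimeUnit.MINUTES);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            workerPool.shutdown();
        }
        return sharedBuffer.size();
    }

    public static void main(String[] args) {
        System.out.println(run(10, 100)); // prints 100
    }
}
```

Waiting on every Future before the shared object is closed is the important part: in the HBase example it guarantees that all tasks have called mutate() before try-with-resources closes the mutator.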