The open-source framework Apache Gora provides an in-memory data model and persistence layer for big data. Gora supports column stores, key-value stores, document stores and relational databases, and comes with extensive Apache Hadoop MapReduce support for analyzing the stored data.
Steps for using Gora:
1. Configure gora.properties and specify the default datastore implementation:

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true
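
As a side note, switching to another backend only requires pointing gora.datastore.default at that store's implementation class (with the matching Gora module on the classpath). A hedged example, assuming the gora-cassandra module is available:

# hypothetical alternative: use Cassandra instead of HBase
gora.datastore.default=org.apache.gora.cassandra.store.CassandraStore
gora.datastore.autocreateschema=true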
2. Define the data bean in JSON (as an Avro schema). Create a JSON file with the following content:
{
  "type": "record",
  "name": "Pageview",
  "namespace": "org.apache.gora.tutorial.log.generated",
  "fields" : [
    {"name": "url", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "ip", "type": "string"},
    {"name": "httpMethod", "type": "string"},
    {"name": "httpStatusCode", "type": "int"},
    {"name": "responseSize", "type": "int"},
    {"name": "referrer", "type": "string"},
    {"name": "userAgent", "type": "string"}
  ]
}
3. Apache Gora uses the Avro framework for its ORM entities. You can use Gora's built-in compiler to compile the JSON file and generate the entity class you need:

$ bin/gora goracompiler
The compiler's usage is as follows:

Usage: GoraCompiler <schema file> <output dir> [-license <id>]
  <schema file> - individual avsc file to be compiled or a directory path containing avsc files
  <output dir> - output directory for generated Java files
  [-license <id>] - the preferred license header to add to the
                    generated Java file. Current options include;
    ASLv2 (Apache Software License v2.0)
    AGPLv3 (GNU Affero General Public License)
    CDDLv1 (Common Development and Distribution License v1.0)
    FDLv13 (GNU Free Documentation License v1.3)
    GPLv1 (GNU General Public License v1.0)
    GPLv2 (GNU General Public License v2.0)
    GPLv3 (GNU General Public License v3.0)
    LGPLv21 (GNU Lesser General Public License v2.1)
    LGPLv3 (GNU Lesser General Public License v3.0)
Example:

$ bin/gora goracompiler gora-tutorial/src/main/avro/pageview.json gora-tutorial/src/main/java/
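
Compilation produces a Pageview class under the org.apache.gora.tutorial.log.generated package. As a rough illustration of what working with it looks like (the setter names are inferred from the Avro fields above and from the getUrl()/getTimestamp() calls used later in this article; the field values are made up):

import org.apache.avro.util.Utf8;
import org.apache.gora.tutorial.log.generated.Pageview;

public class PageviewExample {
  /** Builds a sample Pageview; the values are made up for illustration. */
  static Pageview samplePageview() {
    Pageview pageview = new Pageview();
    pageview.setUrl(new Utf8("/index.html"));   // Avro "string" fields use Utf8 in the generated class
    pageview.setTimestamp(System.currentTimeMillis());
    pageview.setIp(new Utf8("10.0.0.1"));
    pageview.setHttpMethod(new Utf8("GET"));
    pageview.setHttpStatusCode(200);
    pageview.setResponseSize(1024);
    return pageview;
  }
}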
4. Define the datastore mapping: gora-hbase-mapping.xml
After the three steps above, the next task is to configure the mapping between the entity and the table. For example:
<!-- This is gora-sql-mapping.xml -->

<gora-orm>
  <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
    <primarykey column="line"/>
    <field name="url" column="url" length="512" primarykey="true"/>
    <field name="timestamp" column="timestamp"/>
    <field name="ip" column="ip" length="16"/>
    <field name="httpMethod" column="httpMethod" length="6"/>
    <field name="httpStatusCode" column="httpStatusCode"/>
    <field name="responseSize" column="responseSize"/>
    <field name="referrer" column="referrer" length="512"/>
    <field name="userAgent" column="userAgent" length="512"/>
  </class>

  ...

</gora-orm>

<!-- And this is gora-hbase-mapping.xml -->

<gora-orm>
  <table name="Pageview"> <!-- optional descriptors for tables -->
    <family name="common"/> <!-- This can also have params like compression, bloom filters -->
    <family name="http"/>
    <family name="misc"/>
  </table>

  <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
    <field name="url" family="common" qualifier="url"/>
    <field name="timestamp" family="common" qualifier="timestamp"/>
    <field name="ip" family="common" qualifier="ip"/>
    <field name="httpMethod" family="http" qualifier="httpMethod"/>
    <field name="httpStatusCode" family="http" qualifier="httpStatusCode"/>
    <field name="responseSize" family="http" qualifier="responseSize"/>
    <field name="referrer" family="misc" qualifier="referrer"/>
    <field name="userAgent" family="misc" qualifier="userAgent"/>
  </class>

  ...

</gora-orm>
5. API
1) Initialization: create the HBaseStore object
private void init() throws IOException {
  dataStore = DataStoreFactory.getDataStore(Long.class, Pageview.class);
}

Here, based on the entity class you compiled above and gora-hbase-mapping.xml, Gora creates the corresponding HBase table for you.
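
For completeness, here is a rough sketch of the surrounding class; the dataStore field and the imports are not shown in the original snippets, so treat this as an assumption about the context rather than the exact tutorial code:

import java.io.IOException;

import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.gora.tutorial.log.generated.Pageview;

public class LogManager {
  // keyed by the log line number (Long), storing Pageview beans
  private DataStore<Long, Pageview> dataStore;

  private void init() throws IOException {
    dataStore = DataStoreFactory.getDataStore(Long.class, Pageview.class);
  }
}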
2) Storing data
/** Stores the pageview object with the given key */
private void storePageview(long key, Pageview pageview) throws IOException {
  dataStore.put(key, pageview);
}
3) Reading data
/** Fetches a single pageview object and prints it */
private void get(long key) throws IOException {
  Pageview pageview = dataStore.get(key);
  printPageview(pageview);
}
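
The printPageview() helper called above is not shown in the original; a minimal sketch of what it might look like (the choice of fields to print is arbitrary):

private void printPageview(Pageview pageview) {
  if (pageview == null) {
    System.out.println("No pageview found");
  } else {
    System.out.println(pageview.getUrl() + "\t" + pageview.getIp() + "\t" + pageview.getTimestamp());
  }
}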
4) Querying
/** Queries and prints pageview objects that have keys between startKey and endKey */
private void query(long startKey, long endKey) throws IOException {
  Query<Long, Pageview> query = dataStore.newQuery();
  //set the properties of query
  query.setStartKey(startKey);
  query.setEndKey(endKey);

  Result<Long, Pageview> result = query.execute();

  printResult(result);
}
Iterating over the result:
private void printResult(Result<Long, Pageview> result) throws IOException {

  while(result.next()) { //advances the Result object and breaks if at end
    long resultKey = result.getKey(); //obtain current key
    Pageview resultPageview = result.get(); //obtain current value object

    //print the results
    System.out.println(resultKey + ":");
    printPageview(resultPageview);
  }

  System.out.println("Number of pageviews from the query:" + result.getOffset());
}
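
The query above only constrains the key range. Gora queries can also project a subset of fields and cap the number of rows; a hedged sketch using the same dataStore (the setFields()/setLimit() calls are assumptions about the Query API, not taken from the listing above):

/** Queries at most 10 pageviews starting from startKey, fetching only a few columns. */
private void queryWithProjection(long startKey) throws Exception {
  Query<Long, Pageview> query = dataStore.newQuery();
  query.setStartKey(startKey);
  query.setFields("url", "ip", "timestamp"); // fetch only these fields
  query.setLimit(10);                        // cap the number of returned rows

  Result<Long, Pageview> result = query.execute();
  printResult(result);
}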
5) Deleting data
/** Deletes the pageview with the given line number */
private void delete(long lineNum) throws Exception {
  dataStore.delete(lineNum);
  dataStore.flush(); //write changes may need to be flushed before they are committed
}

/** This method illustrates delete by query call */
private void deleteByQuery(long startKey, long endKey) throws IOException {
  //Constructs a query from the dataStore. The matching rows to this query will be deleted
  Query<Long, Pageview> query = dataStore.newQuery();
  //set the properties of query
  query.setStartKey(startKey);
  query.setEndKey(endKey);

  dataStore.deleteByQuery(query);
}
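
When the application is done with the store, pending writes should be flushed and the connection released; a minimal sketch (not part of the original listing):

/** Flushes buffered writes and releases the underlying HBase connection. */
private void close() throws IOException {
  dataStore.flush();
  dataStore.close();
}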
6) MapReduce support
Job:
public Job createJob(DataStore<Long, Pageview> inStore,
    DataStore<String, MetricDatum> outStore, int numReducer) throws IOException {
  Job job = new Job(getConf());

  job.setJobName("Log Analytics");
  job.setNumReduceTasks(numReducer);
  job.setJarByClass(getClass());

  /* Mappers are initialized with GoraMapper.initMapper() or
   * GoraInputFormat.setInput() */
  GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class,
      LogAnalyticsMapper.class, true);

  /* Reducers are initialized with GoraReducer#initReducer().
   * If the output is not to be persisted via Gora, any reducer
   * can be used instead. */
  GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);

  return job;
}
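
A hedged sketch of how this job might be wired up and launched, following the Hadoop Tool pattern implied by getConf(); the run() method below and its datastore setup are assumptions, not part of the listing above:

public int run(String[] args) throws Exception {
  // input: the pageviews stored by the LogManager; output: aggregated metrics
  DataStore<Long, Pageview> inStore =
      DataStoreFactory.getDataStore(Long.class, Pageview.class);
  DataStore<String, MetricDatum> outStore =
      DataStoreFactory.getDataStore(String.class, MetricDatum.class);

  Job job = createJob(inStore, outStore, 3); // 3 reducers, chosen arbitrarily here
  boolean success = job.waitForCompletion(true);

  inStore.close();
  outStore.close();
  return success ? 0 : 1;
}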
Mapper:
private TextLong tuple;

protected void map(Long key, Pageview pageview, Context context)
    throws IOException, InterruptedException {

  Utf8 url = pageview.getUrl();
  long day = getDay(pageview.getTimestamp());

  tuple.getKey().set(url.toString());
  tuple.getValue().set(day);

  context.write(tuple, one);
}
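
For context, a hedged sketch of the Mapper class around the map() method above; the class name, the one counter, the TextLong tuple initialization and the getDay() rounding are assumptions based on how they are used in the listing, not code from it:

public static class LogAnalyticsMapper
    extends GoraMapper<Long, Pageview, TextLong, LongWritable> {

  private LongWritable one = new LongWritable(1L); // the constant "1" emitted for every pageview
  private TextLong tuple = new TextLong();         // reused (url, day) output key; assumed to
                                                   // initialize its inner Text/LongWritable

  /** Rounds a timestamp down to the beginning of its day (assumed helper). */
  private long getDay(long timestamp) {
    long dayMillis = 24L * 60 * 60 * 1000;
    return (timestamp / dayMillis) * dayMillis;
  }

  // ... the map() method shown above goes here ...
}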
Reducer:
protected void reduce(TextLong tuple, Iterable<LongWritable> values, Context context)
    throws IOException, InterruptedException {

  long sum = 0L; //sum up the values
  for(LongWritable value: values) {
    sum += value.get();
  }

  String dimension = tuple.getKey().toString();
  long timestamp = tuple.getValue().get();

  metricDatum.setMetricDimension(new Utf8(dimension));
  metricDatum.setTimestamp(timestamp);

  String key = metricDatum.getMetricDimension().toString();
  metricDatum.setMetric(sum);

  context.write(key, metricDatum);
}
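
Similarly, a hedged sketch of the Reducer class around the reduce() method above; the class name and the metricDatum field are assumptions grounded only in how they are used in the listing:

public static class LogAnalyticsReducer
    extends GoraReducer<TextLong, LongWritable, String, MetricDatum> {

  // reused output bean; MetricDatum is another Avro-generated class from the tutorial's schemas
  private MetricDatum metricDatum = new MetricDatum();

  // ... the reduce() method shown above goes here ...
}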
Besides HBase, Gora also supports other datastores such as DynamoDB, Cassandra and Accumulo. Feel free to try the other backends if you need them; they are used in much the same way as shown above.