cdh5.14.2中集成安装kylin与使用测试

标签(空格分隔): 大数据平台构建


  • 一:kylin 简介
  • 二:安装配置kylin
  • 三:kylin 运行实例

一:kylin 简介

Apache Kylin™是一个开源的分布式分析引擎,提供Hadoop/Spark之上的SQL查询接口及多维分析(OLAP)能力以支持超大规模数据,最初由eBay Inc. 开发并贡献至开源社区。它能在亚秒内查询巨大的Hive表。

image_1cfmo0npv15od1epg1e1e1qq216rlp.png-381.6kB

kylin 软件下载:

社区版kylin下载地址:https://archive.apache.org/dist/kylin/ ,本次测试使用apache-kylin-2.3.1.tar.gz 

image_1cfmo369bv731o2u119d1h8gcc316.png-294.4kB

二:在cdh5.14.2 上面配置安装kylin

2.1: kylin 安装的环境配置

login: node-01.flyfish

tar -zxvf apache-kylin-2.3.1-cdh57-bin.tar.gz -C /usr/local/

cd /usr/local/

mv apache-kylin-2.3.1-bin/ kylin

vim /etc/profile
----
### kylin #### 
export KYLIN_HOME=/usr/local/kylin
PATH=$PATH:$HOME/bin:$KYLIN_HOME/bin
---
source /etc/profile

image_1cfmpltko7qi8c8188e49tgk51j.png-161.4kB

2.2:kylin的启动验证

cd /usr/local/kylin/
./check-env.sh

image_1cfmpog7616c3dudcii6kmvib2g.png-104.8kB

su - hdfs 

hdfs dfs -chmod -R 777 /

image_1cfmpq8t9shuen6ki2t4f13403d.png-77.6kB

从新检测处理
cd /usr/local/kylin/
./check-env.sh

image_1cfmps8015mil3c6ot1atf1r8k3q.png-47.5kB

启动kylin
./kylin.sh start 

image_1cfmpuc6f1i5h3f7mjp1ueg15lg47.png-332.5kB

image_1cfmq37il17q7jsa58un651n374.png-499.3kB

image_1cfmq47lo14i018851dj2lj81gms7h.png-226kB

默认用户名:ADMIN 
密码:KYLIN

image_1cfmq8foa5i91ip8gahrlijjk8e.png-273.2kB

三:kylin 运行实例使用测试

cd /usr/local/kylin/bin

./sample.sh

image_1cfmqtpc06731qdc1f9d7gd1at18r.png-238.1kB

image_1cfmr0lso1rf211dspl7c7r1ign9o.png-330.1kB

image_1cfmr194n14pg1jt51heh19asqvja5.png-421.3kB

image_1cfmr2vji2709151a2maaj18fqc2.png-481.6kB

从启kylin

cd /usr/local/kylin/bin
./kylin.sh stop 
./kylin.sh start 

image_1cfmr8h441m2616d4nj517go1nllcf.png-164.5kB

重新刷新kylin的元数据

image_1cfmrdelj1fe517vehn71ja1kesdp.png-651.3kB

image_1cfmreathjtuqh65i4hjc1k47em.png-527.5kB

image_1cfmrg3mq185q427155q1i35a5ug6.png-551.5kB

image_1cfmrh9jq11v61hbo5b11r011ulsh3.png-353.6kB

impala的加载表:
impala-shell -i "INVALIDATE METADATA" 
单独刷新一张表:
refrash  + 表名

hive 的default库 当中多了几张kylin的表

image_1cfmrq46tp1l1akn1cjjjio1qn7hg.png-90.3kB

构建cube 

image_1cfn261lbr821q281oqq1ko0bnpp.png-381.7kB

image_1cfn2b040mtqogs1qa31rhk67i30.png-304.3kB

这个地方如果机器配置不够的话,尽量日期间隔选小一点。

image_1cfn2brd115871vl94gb26a1u33d.png-335.5kB

image_1cfn2ke301r9gine1jfi187itap.png-380.1kB

这一步会比较耗时,因为这步会进行预计算,默认是MapReduce作业。

image_1cfn2t8r81v8k1empaqc15gf1pm816.png-566kB

image_1cfn3vt7p1lrs1nuv1bc81vg6g5u1j.png-395.4kB

kylin的数据查询

image_1cfn4bquj1m1tnod8cloq15u33g.png-402.1kB

image_1cfn4gajtmp8gj6117717fj1er73t.png-409.9kB

image_1cfn4hcpt11caaesia5p9a21k4a.png-459.8kB

查询构建完成的cube
先运行简单的count,可以看到耗时4.12s,再次执行基本在0.5s级,基本是毫秒级别
就可以查询出来,这是因为kylin 支持缓存的功能

image_1cfn4ihlvhi71sv21s6u1f79qmg4n.png-371kB

复杂的查询:

select sum(KYLIN_SALES.PRICE) 
as price_sum,KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME,KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME 
from KYLIN_SALES inner join KYLIN_CATEGORY_GROUPINGS
on KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID and 
KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
group by KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME,KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME
order by KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME asc,KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME desc

image_1cfn4p19jfsn1jeu10ag1a82156s54.png-518kB

image_1cfn4phdbkqf1cdt9111f8k1g475h.png-242.7kB

四:kylin 运行实例二

4.1 数据文件准备

create_table.sql  department.csv  employee.csv

image_1cfngq2ae1su2ge7do01d0ohuv9.png-39.6kB

4.2 在hdfs 上面创建文件上传

在hdfs 上面创建上传目录

hdfs dfs -mkdir /kylin-test

hdfs dfs -put department.csv  employee.csv /kylin-test

image_1cfnh06sufhoh8i1amp65c1k78m.png-161.4kB

4.3 在hive 上面执行sql 脚本加载数据与验证

执行create_table.sql 

create_table 内容如下
---
DROP TABLE IF EXISTS employee;

CREATE TABLE employee(
id int,
name string,
deptId int,
age int,
salary float
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

DROP TABLE IF EXISTS department;

CREATE TABLE department(
id int,
name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA INPATH '/kylin-test/employee.csv' OVERWRITE INTO TABLE employee;
LOAD DATA INPATH '/kylin-test/department.csv' OVERWRITE INTO TABLE department;
---

在hive 中执行create_table.sql 

hive -f create_table.sql 

image_1cfnhidm943d1vh110ovnd91nou13.png-223.1kB

hive -e "use default;select * from employee"

image_1cfnhkkns1u9k18uv1a303631v0s1g.png-141.9kB

hive -e "use default;select * from department" 

image_1cfnhmr67k5i1gjba4r2gn1aoe1t.png-209.9kB

4.4 在kylin 上面创建project

image_1cfni3tric3n132v17kk2541ego2q.png-260.7kB

image_1cfni5b081pnhrkl1hvsep21qt037.png-285.4kB

image_1cfni5sqq1b2dogl4jd1kuhtad3k.png-275.7kB

加载hive数据到kylin当中

image_1cfni9i60t5qo6d1hhjdka1a2141.png-209.2kB

image_1cfnicpf615th5iuugvfv71g6q4e.png-224.5kB

image_1cfnidhovqhilo69eb1lpag6v4r.png-276.1kB

image_1cfnie73a1a621lsqd861riia2058.png-179.6kB

创建model,入project的名称和描述:

image_1cfnijehl1iq511881lu96b51gm365.png-211.9kB

image_1cfnil81q1gma1t5mphovr21ql86i.png-283.2kB

image_1cfninc7d50sthreqgv771pci6v.png-290.6kB

image_1cfnip2pmv2n1dfd70bg67hpo7c.png-254.3kB

image_1cfnisd321ibr1fr1fs9ffj8f57p.png-343.5kB

选择事实表,并点击add Lookup Table查询表

image_1cfnitmt37ttnvm123f50olup8m.png-335.2kB

选择维度字段

image_1cfnj2b9f1e6mjbs1b4f16pgh4793.png-319.3kB

image_1cfnj41e6ek8mo512i04a7al9g.png-294.3kB

image_1cfnj521def8p05q0kj0ru599t.png-292kB

image_1cfnj721t1bph1vuu704q0t1chbaa.png-299.2kB

image_1cfnj81mpgrj14no28uhqjmhcan.png-323.3kB

image_1cfnj8htijqgkr15jcaoualkb4.png-281.4kB

image_1cfnj9pmka9sh3u1rbr1gvn58bh.png-307.8kB

创建cube

image_1cfnjcsjs11eh1hft1avj1kaf1vtdbu.png-222kB

image_1cfnjeapc9gu1l8rh441gqpm8mcb.png-314.3kB

image_1cfnjfcqr8j21fnu1l8719sc1hpgdo.png-270.4kB

image_1cfnjh93a187q1aru15mr1oq710jfei.png-317.3kB

image_1cfnji6ba162i9r51daehiq24dev.png-364.6kB

image_1cfnjpdju1d5imu91nk1g8s1npugc.png-358.2kB

image_1cfnjqdnf6aj1jb317veipirfagp.png-244.7kB

image_1cfnjrn3semh12s9uocjcatcjh6.png-312.8kB

image_1cfnjsqdaq4bm2c1lba3i41svehj.png-292.2kB

image_1cfnjtc3mi2i286163q1on61kpji0.png-348.6kB

image_1cfnjtr4619501nloc2q17vi1sq8id.png-270.2kB

image_1cfnjub0210a1rmn13fggck1liq.png-351.8kB

cube创建完成

image_1cfnjusi3to41cvq64okmk15erj7.png-328.7kB

构建cube

image_1cfnk0m8p1t7l1gumcu6167k1hc3jk.png-316.4kB

image_1cfnk1bds9onebq69dbp61ba1kh.png-248.3kB

image_1cfnk1qls39an6l995bbgc2oku.png-267.4kB

image_1cfnk3ctn1v6t17jr3b16sjkdllb.png-277.3kB

image_1cfnk58gt12c7opf1p3l12fs1dsflo.png-642.7kB

image_1cfnk64dh1pem1hl712rbo511cc9m5.png-263.8kB

image_1cfnk8da0k8b5mhk481k5m1jm8mi.png-313kB

image_1cfnkcvdm188jsocaf513r16j0mv.png-317.9kB

查询测试:
 select count(*) from department;

 select max(salary) from EMPLOYEE;

image_1cfnkguufpmjo71nv318dkdqunc.png-345.6kB

image_1cfnkjoa547t1f3j1dk31e4g1o0lo9.png-331.9kB

image_1cfnkl7fa10eiob516os1bq1fkpq6.png-238.1kB

统计各部门员工薪资总和:

select d.ID,sum(e.SALARY) as salary from EMPLOYEE as e left join DEPARTMENT as d on e.DEPTID=d.id group by d.ID order by salary desc

image_1cfnkncqonm91i0qttc1sb21qrlqj.png-373.2kB

image_1cfnko11c19ho93e1q7u1ocmu7rr0.png-353.8kB