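Tez is an Apache open-source computing engine for DAG jobs, proposed as a way to reduce the latency of Hive jobs. Hortonworks has used Tez to optimize Hive's execution engine and reported performance improvements of roughly 100x in its tests. Hive on Tez still follows the MapReduce processing model, but it prunes the dependencies in the DAG and merges multiple small jobs into one larger job. This reduces both the number of jobs and the number of writes to HDFS.

Tez has the following features:
(1) A rich dataflow programming interface (dataflow, NOT streaming!);
(2) An extensible "Input-Processor-Output" runtime model;
(3) Simplified deployment: Tez makes full use of the YARN framework and is itself only a client-side programming library, so no services need to be deployed in advance;
(4) Better performance than MapReduce;
(5) Optimized resource management (it runs directly on top of the YARN resource manager);
(6) Dynamic generation of the physical dataflow.

The differences between Tez and MapReduce are shown in the figure below:

1. Installing from Source

1.1 Dependencies

The operating system used in this article is Oracle Linux 7.4. Install the following dependency packages: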

[root@hdp01 ~]# yum -y install git bzip2 redhat-lsb
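The yum line covers only part of what the build needs: the Maven build in 1.3 also assumes that mvn and a JDK are on the PATH. A minimal bash sketch for checking this up front; check_cmds is a hypothetical helper, and the command list in the example is an assumption rather than an official requirements list.

```shell
# Report every command from the argument list that is not on PATH.
# Returns 0 only if all of them were found.
check_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumed tool list for this build):
#   check_cmds git bzip2 mvn java && echo "all prerequisites found"
```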

1.2 Installing protobuf

[root@hdp01 src]# wget https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.tar.gz
[root@hdp01 software]# tar -xzf /u02/software/src/protobuf-all-3.5.1.tar.gz
[root@hdp01 software]# cd protobuf-3.5.1 && ./configure && make && make install
-- After the build and installation finish, run protoc. Output like the following means the installation succeeded:
[root@hdp01 src]# protoc --version
libprotoc 3.5.1 
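To script the check above, the version number can be pulled out of the protoc --version output. A minimal bash sketch; parse_protoc_version and protoc_is_version are hypothetical helpers, not part of protobuf.

```shell
# Extract the version from `protoc --version` output, e.g. "libprotoc 3.5.1" -> "3.5.1".
# awk splits on whitespace, so a trailing space in the output is ignored.
parse_protoc_version() {
  printf '%s\n' "$1" | awk '{print $2}'
}

# Succeed only if the protoc on PATH reports the expected version.
protoc_is_version() {
  [ "$(parse_protoc_version "$(protoc --version 2>/dev/null)")" = "$1" ]
}

# Example:
#   protoc_is_version 3.5.1 && echo "protoc 3.5.1 is installed"
```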

1.3 Building and installing Tez

[hadoop@hdp01 src]$ wget http://mirrors.hust.edu.cn/apache/tez/0.9.0/apache-tez-0.9.0-src.tar.gz
[hadoop@hdp01 software]$ tar -xzf /u02/software/src/apache-tez-0.9.0-src.tar.gz
[hadoop@hdp01 software]$ cd apache-tez-0.9.0-src
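-- If protoc is not version 2.5.0, you must edit pom.xml in the source directory and change the protobuf version (the protobuf.version property) to the version currently installed on the system.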
[hadoop@hdp01 apache-tez-0.9.0-src]$ mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
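The comment in the transcript above asks you to compare the installed protoc with the protobuf version that pom.xml expects. A minimal bash sketch, assuming the pom declares it in a single-line <protobuf.version> element (check your copy of pom.xml); pom_protobuf_version is a hypothetical helper.

```shell
# Print the value of the first single-line <protobuf.version> element in a pom file.
pom_protobuf_version() {
  sed -n 's:.*<protobuf\.version>\([^<]*\)</protobuf\.version>.*:\1:p' "${1:-pom.xml}" | head -n 1
}

# Example, run from apache-tez-0.9.0-src:
#   want=$(pom_protobuf_version pom.xml)
#   have=$(protoc --version | awk '{print $2}')
#   [ "$want" = "$have" ] || echo "pom.xml expects protobuf $want, protoc is $have"
```

As an alternative to editing the file, Maven lets a -D property on the command line override a pom property of the same name (for example adding -Dprotobuf.version=3.5.1 to the mvn command). This assumes the property is named protobuf.version in your source tree.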

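The Maven build takes a long time. When it succeeds, the output looks like the figure below:

1.4 Configuring Tez

The compiled tez-dist/target/tez-0.9.0.tar.gz is the binary package we need.

1.4.1 Uploading the binary package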

[hadoop@hdp01 ~]$ hdfs dfs -mkdir /user/tez
[hadoop@hdp01 ~]$ hdfs dfs -put /u02/software/apache-tez-0.9.0-src/tez-dist/target/tez-0.9.0.tar.gz /user/tez
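The path uploaded here must match tez.lib.uris in tez-site.xml (1.4.3). Note that the job log in 2.2 reports tez.lib.uris as .../user/tez/tez.tar.gz while 1.4.3 configures .../user/tez/tez-0.9.0.tar.gz, so it is worth confirming that the configured path really exists. A minimal bash sketch; tez_tarball_uploaded is a hypothetical helper built on hdfs dfs -test -e.

```shell
# Check that a path exists in HDFS; `hdfs dfs -test -e` exits 0 if it does.
# The default path mirrors tez.lib.uris in this article.
tez_tarball_uploaded() {
  local path="${1:-/user/tez/tez-0.9.0.tar.gz}"
  if hdfs dfs -test -e "$path"; then
    echo "found: $path"
  else
    echo "not found: $path (compare with tez.lib.uris in tez-site.xml)" >&2
    return 1
  fi
}

# Example:
#   tez_tarball_uploaded /user/tez/tez-0.9.0.tar.gz
```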

1.4.2 Extracting the package

[hadoop@hdp01 u01]$ tar -xzf /u02/software/apache-tez-0.9.0-src/tez-dist/target/tez-0.9.0.tar.gz
[hadoop@hdp01 u01]$ mv tez-0.9.0 tez
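The HADOOP_CLASSPATH line in 1.4.5 expects the Tez jars at ${TEZ_JARS}/* and ${TEZ_JARS}/lib/*, so after extracting and renaming it is worth confirming that /u01/tez has that layout. A minimal bash sketch; check_tez_jars_dir is hypothetical, and treating lib/ plus a tez-api-*.jar file as the markers of a correct layout is an assumption.

```shell
# Check that a directory looks like an extracted Tez tarball for TEZ_JARS:
# a lib/ subdirectory plus a tez-api-*.jar at the top level (assumed markers).
check_tez_jars_dir() {
  local dir="${1:-/u01/tez}"
  if [ ! -d "$dir/lib" ]; then
    echo "no lib/ directory under $dir" >&2
    return 1
  fi
  if ! ls "$dir"/tez-api-*.jar >/dev/null 2>&1; then
    echo "no tez-api-*.jar in $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}

# Example:
#   check_tez_jars_dir /u01/tez
```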

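1.4.3 Creating tez-site.xml

On the Hadoop master node, create tez-site.xml in $HADOOP_HOME/etc/hadoop/ with the following content. Creating it on the master node is enough; it is copied to the other nodes in 1.4.6.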

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- HDFS location of the Tez tarball uploaded in 1.4.1 -->
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/user/tez/tez-0.9.0.tar.gz</value>
    </property>
    <!-- Fraction of each container's memory used for the Java heap (-Xmx) -->
    <property>
        <name>tez.container.max.java.heap.fraction</name>
        <value>0.3</value>
    </property>
</configuration>
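When several clusters need the same file, it can be generated from a script. A minimal bash sketch using a heredoc; write_tez_site and its default arguments are hypothetical and simply mirror the values above. ${fs.defaultFS} has to reach the file literally, because Hadoop's Configuration expands it when the file is loaded, so its $ is escaped inside the heredoc.

```shell
# Write tez-site.xml into a config directory.
# $1: target directory, $2: HDFS path of the Tez tarball (appended to ${fs.defaultFS}).
write_tez_site() {
  local conf_dir="${1:-/u01/hadoop/etc/hadoop}"
  local tarball="${2:-/user/tez/tez-0.9.0.tar.gz}"
  # Unquoted EOF: ${tarball} is expanded, \${fs.defaultFS} is written literally.
  cat > "$conf_dir/tez-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>\${fs.defaultFS}${tarball}</value>
    </property>
    <property>
        <name>tez.container.max.java.heap.fraction</name>
        <value>0.3</value>
    </property>
</configuration>
EOF
}

# Example:
#   write_tez_site /u01/hadoop/etc/hadoop /user/tez/tez-0.9.0.tar.gz
```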

1.4.4 Editing mapred-site.xml

Change the value of mapreduce.framework.name from yarn to yarn-tez.

1.4.5 Editing hadoop-env.sh

Append the following:

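# Directory that contains tez-site.xml (the directory, not the file itself)
export TEZ_CONF_DIR=/u01/hadoop/etc/hadoop
# Local directory holding the extracted Tez jars (see 1.4.2)
export TEZ_JARS=/u01/tez
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*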
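To confirm that the new HADOOP_CLASSPATH is picked up, the output of hadoop classpath can be searched for the Tez entries. A minimal bash sketch; tez_classpath_entries is a hypothetical helper that only splits a classpath string.

```shell
# Print the classpath entries that mention "tez", one per line.
# Returns non-zero (grep's status) when there are none.
tez_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'tez'
}

# Example, after editing hadoop-env.sh:
#   tez_classpath_entries "$(hadoop classpath)" || echo "Tez jars are not on HADOOP_CLASSPATH"
```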

1.4.6 Synchronizing files

Copy tez-site.xml, mapred-site.xml, hadoop-env.sh and the /u01/tez directory to the other nodes in the cluster:

[hadoop@hdp01 hadoop]$ for i in {2..4};do scp hadoop-env.sh hdp0$i:/u01/hadoop/etc/hadoop/;done
[hadoop@hdp01 hadoop]$ for i in {2..4};do scp mapred-site.xml hdp0$i:/u01/hadoop/etc/hadoop/;done
[hadoop@hdp01 hadoop]$ for i in {2..4};do scp tez-site.xml hdp0$i:/u01/hadoop/etc/hadoop/;done
[hadoop@hdp01 hadoop]$ for i in {2..4};do scp -r /u01/tez hdp0$i:/u01;done
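The four loops above can also be folded into one function that takes the host names as arguments. A minimal bash sketch; sync_tez_files and run_or_echo are hypothetical helpers, the paths follow this article, and DRY_RUN=1 prints the scp commands instead of running them.

```shell
# Run a command, or only print it when DRY_RUN=1.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the three config files and the Tez directory to each host given as an argument.
sync_tez_files() {
  local conf=/u01/hadoop/etc/hadoop host f
  for host in "$@"; do
    for f in hadoop-env.sh mapred-site.xml tez-site.xml; do
      run_or_echo scp "$conf/$f" "$host:$conf/"
    done
    run_or_echo scp -r /u01/tez "$host:/u01"
  done
}

# Example (prints the commands for hdp02..hdp04 without copying anything):
#   DRY_RUN=1 sync_tez_files hdp02 hdp03 hdp04
```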

1.4.7 Restarting the Hadoop cluster

[hadoop@hdp01 hadoop]$ stop-yarn.sh;stop-dfs.sh
[hadoop@hdp01 hadoop]$ start-dfs.sh;start-yarn.sh

At this point, the Tez installation is complete.

2. Testing and Verification

2.1 Preparing test files

[hadoop@hdp01 ~]$ echo "Hello World Hello Tez" > file01  
[hadoop@hdp01 ~]$ echo "Hello World Goodbye Tez" > file02
[hadoop@hdp01 ~]$ hdfs dfs -mkdir /user/tez/input
[hadoop@hdp01 ~]$ hdfs dfs -mkdir /user/tez/output
[hadoop@hdp01 ~]$ hdfs dfs -put file0*  /user/tez/input
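Before running the job, the expected result can be computed locally so the Tez output can be cross-checked. A minimal bash sketch of a word count with standard text tools; local_wordcount is a hypothetical helper, and it sorts by count and then by word.

```shell
# Count words across the given files and print "word count" lines,
# sorted by count (ascending) and then by word.
local_wordcount() {
  cat "$@" \
    | tr -s ' ' '\n' \
    | grep -v '^$' \
    | sort \
    | uniq -c \
    | awk '{print $2, $1}' \
    | sort -k2,2n -k1,1
}

# Example:
#   local_wordcount file01 file02
```

For the two files created above this prints Goodbye 1, Tez 2, World 2 and Hello 3, the same counts the Tez job writes to /user/tez/output.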

2.2 Verifying with the following command

[hadoop@hdp01 ~]$ cd /u01/tez
[hadoop@hdp01 tez]$ hadoop jar tez-examples-0.9.0.jar orderedwordcount /user/tez/input /user/tez/output
17/12/26 11:49:47 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.4, majorVersion=2, minorVersion=7
17/12/26 11:49:47 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim27, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.4, majorVersion=2, minorVersion=7
17/12/26 11:49:47 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.9.0, revision=0873a0118a895ca84cbdd221d8ef56fedc4b43d0, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2017-07-18T05:41:23Z ]
17/12/26 11:49:48 INFO examples.OrderedWordCount: Running OrderedWordCount
17/12/26 11:49:48 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
17/12/26 11:49:48 INFO client.TezClient: Submitting DAG application with id: application_1513929521869_0023
17/12/26 11:49:48 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://hdp01:9000/user/tez/tez.tar.gz
17/12/26 11:49:48 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
17/12/26 11:49:48 INFO client.TezClient: Tez system stage directory hdfs://hdp01:9000/tmp/hadoop/tez/staging/.tez/application_1513929521869_0023 doesn't exist and is created
17/12/26 11:49:49 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1513929521869_0023, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
17/12/26 11:49:49 INFO impl.YarnClientImpl: Submitted application application_1513929521869_0023
17/12/26 11:49:49 INFO client.TezClient: The url to track the Tez AM: http://hdp04:8088/proxy/application_1513929521869_0023/
17/12/26 11:49:53 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
17/12/26 11:49:53 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:53 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:53 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:53 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 33.33% TotalTasks: 3 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress: 100% TotalTasks: 3 Succeeded: 3 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED

After the job succeeds, list the files under output:

[hadoop@hdp02 ~]$ hdfs dfs -ls /user/tez/output
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2017-12-26 11:49 /user/tez/output/_SUCCESS
-rw-r--r--   3 hadoop supergroup         32 2017-12-26 11:49 /user/tez/output/part-v002-o000-r-00000
[hadoop@hdp02 ~]$ hdfs dfs -text /user/tez/output/part-v002-o000-r-00000
Goodbye 1
Tez     2
World   2
Hello   3

3. Verifying with Hive

In the Hive console, set the execution engine to tez. The default is mr (MapReduce).

hive> set hive.execution.engine=tez;
hive> use hivedb;
hive> select count(*) from xj_student;
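The same switch can be made for a single non-interactive invocation with the Hive CLI's --hiveconf option, which leaves the console default and hive-site.xml untouched. A minimal bash sketch; run_on_tez is a hypothetical wrapper around hive -e.

```shell
# Run a HiveQL string with the Tez engine for this invocation only;
# --hiveconf overrides hive.execution.engine without touching hive-site.xml.
run_on_tez() {
  hive --hiveconf hive.execution.engine=tez -e "$1"
}

# Example, using the database and table from the session above:
#   run_on_tez "use hivedb; select count(*) from xj_student;"
```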

To make tez the default, edit hive-site.xml, set hive.execution.engine to tez, and restart the Hive service.