I. Introduction to Tez

Tez website: http://tez.apache.org

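Before using Tez as a compute engine, a word about tez-ui. tez-ui is the web interface for viewing the execution logs of Tez jobs, and it depends on the YARN timeline service. Tez 0.8.3 also adds tez-ui2.

The timeline service was added as a YARN sub-service as of Apache Hadoop 2.6.0. The JobHistoryServer can only store MapReduce history logs and does not give access to the history logs of other engines such as Tez or Spark, which is why the timeline service was introduced. The timeline server provides access to history logs for jobs from MapReduce, Tez, Spark on YARN, and other engines when they run in non-local mode; the JobHistoryServer can of course still be used alongside it.

Apache Hadoop 2.6.4+ or Apache Hadoop 2.7.2+ is recommended, because earlier releases have more timeline service bugs.

For the detailed release improvements and bug fixes, see http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-yarn/CHANGES.txt.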

 


II. Build Process

1. Install JDK 1.7, Maven 3.3.*, and protobuf 2.5.0
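Before starting the build it can help to confirm that the three tools are actually on PATH. The following is a minimal sketch; the function name missing_tools is made up for illustration, and it only checks that the commands exist, not that their versions match the ones listed above.

```shell
#!/bin/sh
# missing_tools prints the name of every argument that is not found on PATH.
# (Hypothetical helper, not part of Tez or Hadoop.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Typical use before the build (prints nothing when all tools are present):
#   missing_tools java mvn protoc
# The versions can then be compared by eye with the ones listed above:
#   java -version; mvn -version; protoc --version
```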

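2. Download the source from https://mirrors.tuna.tsinghua.edu.cn/apache/tez/ (tez-ui is supported from 0.6.* onward, so 0.7.* or 0.8.* is recommended; for releases after 0.8.4 the bin package can be downloaded directly).

Extract it to the following directory: ${project_home}/apache-tez-0.8.3-src

3. Modify the parameters in pom.xml

Specify the Hadoop version: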

<hadoop.version>2.6.4</hadoop.version>

Location of the protoc command after protobuf is installed:

<protoc.path>/usr/local/protobuf-2.5.0/bin/protoc</protoc.path>
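The two pom.xml changes can also be scripted. The sketch below is one possible way to do it with sed; set_pom_property is a hypothetical helper, and it performs a plain text substitution that assumes each element sits on a single line.

```shell
#!/bin/sh
# set_pom_property FILE NAME VALUE
# Replaces the text of every <NAME>...</NAME> element in FILE with VALUE.
# Plain text substitution, not an XML parser (hypothetical helper).
set_pom_property() {
  file=$1 name=$2 value=$3
  tmp=$(mktemp) || return 1
  # '|' is the sed delimiter so that values containing '/' need no escaping.
  sed "s|<$name>[^<]*</$name>|<$name>$value</$name>|" "$file" >"$tmp" &&
    cat "$tmp" >"$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example (run from the source directory):
#   set_pom_property pom.xml hadoop.version 2.6.4
#   set_pom_property pom.xml protoc.path /usr/local/protobuf-2.5.0/bin/protoc
```

Maven user properties passed as -Dname=value on the command line generally take precedence over the values in the pom's properties section, so passing -Dhadoop.version=2.6.4 to mvn may be an alternative to editing the file.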

4. With the configuration changed, run the build command in the source directory (you can also build in Eclipse or IntelliJ IDEA):

mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

The console then scrolls for a while and ends with a list of SUCCESS lines (or possibly a FAILURE).

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] tez ............................................... SUCCESS [1.626s]
[INFO] hadoop-shim ....................................... SUCCESS [1.720s]
[INFO] tez-api ........................................... SUCCESS [5.598s]
[INFO] tez-common ........................................ SUCCESS [0.425s]
[INFO] tez-runtime-internals ............................. SUCCESS [0.612s]
[INFO] tez-runtime-library ............................... SUCCESS [1.688s]
[INFO] tez-mapreduce ..................................... SUCCESS [0.988s]
[INFO] tez-examples ...................................... SUCCESS [0.202s]
[INFO] tez-dag ........................................... SUCCESS [2.407s]
[INFO] tez-tests ......................................... SUCCESS [0.572s]
[INFO] tez-ext-service-tests ............................. SUCCESS [0.487s]
[INFO] tez-ui ............................................ SUCCESS [10.163s]
[INFO] tez-ui2 ........................................... SUCCESS [1:51.654s]
[INFO] tez-plugins ....................................... SUCCESS [0.023s]
[INFO] tez-yarn-timeline-history ......................... SUCCESS [0.383s]
[INFO] tez-yarn-timeline-history-with-acls ............... SUCCESS [0.254s]
[INFO] tez-history-parser ................................ SUCCESS [7.432s]
[INFO] tez-tools ......................................... SUCCESS [0.022s]
[INFO] tez-perf-analyzer ................................. SUCCESS [0.022s]
[INFO] tez-job-analyzer .................................. SUCCESS [0.272s]
[INFO] tez-javadoc-tools ................................. SUCCESS [0.095s]
[INFO] hadoop-shim-impls ................................. SUCCESS [0.021s]
[INFO] hadoop-shim-2.6 ................................... SUCCESS [0.118s]
[INFO] tez-dist .......................................... SUCCESS [9.088s]
[INFO] Tez ............................................... SUCCESS [0.052s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:36.974s
[INFO] Finished at: Sun Apr 24 19:10:56 CST 2016
[INFO] Final Memory: 94M/1298M
[INFO] ------------------------------------------------------------------------

Process finished with exit code 0

The build generates the following packages:

${project_home}/apache-tez-0.8.3-src/tez-dist/target/tez-0.8.3-minimal.tar.gz 
${project_home}/apache-tez-0.8.3-src/tez-dist/target/tez-0.8.3.tar.gz

 

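If phantomjs is not installed, the source build fails. Install it and add it to the PATH environment variable.

For other tez-ui build problems, see the official documentation at https://cwiki.apache.org//confluence/display/TEZ/Build+errors+and+solutions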

 



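III. Using the Engine

1. Configuration changes

1.1. tez-site.xml

Add a tez-site.xml file under $HADOOP_HOME/etc/hadoop with the following content (there are many more performance parameters; add them according to your own environment):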

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
   <name>tez.lib.uris</name>
   <value>hdfs://beh/engine/tez/tez.tar.gz</value>
   <!--<value>file:///opt/beh/core/hadoop/lib/tez.tar.gz</value>-->
</property>

<property>
   <name>tez.history.logging.service.class</name>
   <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>

<property>
  <description>Publish configuration information to Timeline server.</description>
  <name>tez.runtime.convert.user-payload.to.history-text</name>
  <value>true</value>
</property>

<property>
  <description>URL for where the Tez UI is hosted</description>
  <name>tez.tez-ui.history-url.base</name>
  <value>http://hadoop001:8280/tez-ui/</value>
</property>

<property> 
<name>tez.allow.disabled.timeline-domains</name> 
<value>true</value> 
</property> 

</configuration>

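Notes:

#This parameter points to the built Tez package. It is recommended to upload the tarball to HDFS rather than keep it on local storage. Either the minimal package or the full package can be used here.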
<property>
   <name>tez.lib.uris</name>
   <value>hdfs://beh/engine/tez/tez.tar.gz</value>
   <!--<value>file:///opt/beh/core/hadoop/lib/tez.tar.gz</value>-->
</property>
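#This parameter is the address of the web service that hosts tez-ui. A hostname or an IP address can be used, and the port is your choice. Because tez-ui is a web app, it needs a web server; I use Tomcat here, and its setup is covered later.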
<property>
  <description>URL for where the Tez UI is hosted</description>
  <name>tez.tez-ui.history-url.base</name>
  <value>http://hadoop001:8280/tez-ui/</value>
</property>
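The upload mentioned in the first note might look like the sketch below. The helper names run and upload_tez_tarball are made up, and the target directory /engine/tez assumes that the cluster's default file system is the hdfs://beh nameservice used in tez.lib.uris. Setting DRY_RUN=1 only prints the commands, which is handy for checking them first.

```shell
#!/bin/sh
# run executes its arguments, or only prints them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# upload_tez_tarball LOCAL_TARBALL HDFS_DIR
# Copies the tarball to HDFS_DIR/tez.tar.gz, the file name used in tez.lib.uris.
# (Hypothetical helper; -f overwrites an existing copy.)
upload_tez_tarball() {
  run hdfs dfs -mkdir -p "$2" &&
    run hdfs dfs -put -f "$1" "$2/tez.tar.gz"
}

# Example:
#   upload_tez_tarball tez-dist/target/tez-0.8.3.tar.gz /engine/tez
```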

 



1.2. hadoop-env.sh

Add the Tez environment variables to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

 

##tez
export BEH_HOME=/opt/beh
export TEZ_HOME=${BEH_HOME}/core/tez
export TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${TEZ_CONF_DIR}:${TEZ_HOME}/*:${TEZ_HOME}/lib/*

TEZ_HOME is the directory where you extracted the Tez package.
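To confirm that the Tez entries really end up on the Hadoop classpath, the output of the hadoop classpath command can be filtered. This is a small sketch; tez_classpath_entries is a hypothetical helper that simply splits a classpath string and keeps the entries mentioning "tez".

```shell
#!/bin/sh
# tez_classpath_entries CLASSPATH
# Prints the entries of a ':'-separated classpath that mention "tez".
# (Hypothetical helper; exits non-zero when no entry matches.)
tez_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'tez'
}

# Example, after hadoop-env.sh has been updated:
#   tez_classpath_entries "$(hadoop classpath)"
```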

 



1.3. mapred-site.xml

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn-tez</value>
</property>

Optional: to run existing MapReduce jobs on Tez, modify mapred-site.xml to change the “mapreduce.framework.name” property from its default value of “yarn” to “yarn-tez”.

 



1.4. yarn-site.xml

Configure the timeline service in $HADOOP_HOME/etc/hadoop/yarn-site.xml.

For the relevant settings, refer to the YARN and Tez documentation.
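Once the timeline service is configured and started, a quick reachability check can save some debugging later. The sketch below assumes the service listens on hadoop001 at its default web port 8188 and queries its REST root /ws/v1/timeline/; the function names are made up for illustration.

```shell
#!/bin/sh
# timeline_url HOST [PORT]
# Builds the REST root URL of the timeline service (8188 is the default web port).
timeline_url() {
  printf 'http://%s:%s/ws/v1/timeline/\n' "$1" "${2:-8188}"
}

# timeline_is_up HOST [PORT]
# Succeeds when the REST root answers with a successful HTTP status.
timeline_is_up() {
  curl -sf -o /dev/null "$(timeline_url "$@")"
}

# Example:
#   timeline_is_up hadoop001 && echo "timeline service reachable"
```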

 



1.5. hive-site.xml

Modify $HIVE_HOME/conf/hive-site.xml and add the following settings:

<!--tez Start-->
<property>
   <name>hive.execution.engine</name>
   <value>tez</value>
</property>

<property>
   <name>hive.tez.container.size</name>
   <value>4096</value>
</property>

<property>
   <name>hive.tez.java.opts</name>
   <!-- heap kept below hive.tez.container.size to leave room for non-heap JVM memory -->
   <value>-server -Xmx3276m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC</value>
</property>

<property>
   <name>hive.server2.tez.initialize.default.sessions</name>
   <value>false</value>
</property>

<property>
   <name>hive.server2.tez.default.queues</name>
   <value>default</value>
</property>


<property>
   <name>hive.tez.input.format</name>
   <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
</property>

<property>
   <name>hive.server2.tez.sessions.per.default.queue</name>
   <value>1</value>
</property>
<!--tez End-->
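When choosing the -Xmx value in hive.tez.java.opts, a commonly used rule of thumb (an assumption here, not something stated by the settings above) is to keep the heap at roughly 80% of hive.tez.container.size, so that non-heap JVM memory still fits inside the YARN container. A small sketch of that arithmetic, with a hypothetical helper name:

```shell
#!/bin/sh
# heap_for_container CONTAINER_MB [PERCENT]
# Prints a heap size in MB that is PERCENT (default 80) of the container size,
# rounded down. The 80% default is a rule of thumb, not a Hive requirement.
heap_for_container() {
  echo $(( $1 * ${2:-80} / 100 ))
}

# Example: build a -Xmx flag for a 4096 MB container
#   echo "-Xmx$(heap_for_container 4096)m"
```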

 



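2. tez-ui service setup

Extracting the Tez package yields tez-ui-0.8.3.war (or another version, depending on what you built). In configs.js under the scripts directory inside this war, set the address and port of the ResourceManager service and of the timeline service.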

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

App.setConfigs({

  /* Environment configurations */
  envDefaults: {
    version: "0.8.3",
    /*
     * By default TEZ UI looks for timeline server at http://localhost:8188, uncomment and change
     * the following value for pointing to a different domain.
     */
    // timelineBaseUrl: 'http://localhost:8188',
       timelineBaseUrl: 'http://hadoop001:8188',
    /*
     * By default RM web interface is expected to be at http://localhost:8088, uncomment and change
     * the following value to point to a different domain.
     */
    // RMWebUrl: 'http://localhost:8088',
       RMWebUrl: 'http://hadoop001:23188',
    /*
     * Ensures that some of the UI features work with old versions of Tez
     */
    compatibilityMode: false,

    /*
     * Default time zone for UI display. Set to undefined for local timezone
     * For configuration see http://momentjs.com/timezone/docs/
     */
    //timezone: "UTC",
  },

  /*
   * Visibility of table columns can be controlled using the column selector. Also an optional set of
   * file system counters can be enabled as columns for most of the tables. For adding more counters
   * as columns edit the following 'tables' object. Counters must be added as configuration objects
   * of the following format.
   *    {
   *      counterName: '<Counter ID>',
   *      counterGroupName: '<Group ID>',
   *    }
   *
   * Note: Till 0.6.0 the properties were counterId and groupId, their use is deprecated now.
   */
  tables: {
    /*
     * Entity specific columns must be added into the respective array.
     */
    entity: {
      dag: [
        // { // Following is a sample configuration object.
        //   counterName: 'FILE_BYTES_READ',
        //   counterGroupName: 'org.apache.tez.common.counters.FileSystemCounter',
        // }
      ],
      vertex: [],
      task: [],
      taskAttempt: [],
      tezApp: [],
    },
    /*
     * User sharedColumns to add counters that must be displayed in all tables.
     */
    sharedColumns:[]
  }

});

 

Then copy the war into the webapps directory of the Tomcat installation. After that you can start Tomcat and open the tez-ui URL.

The URL used in this article is:
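The configs.js edit can be scripted as well. The sketch below assumes the war has been unpacked into a directory (for example with unzip) and that configs.js is still in its shipped state, with timelineBaseUrl and RMWebUrl commented out. set_ui_url is a hypothetical helper that uncomments the given key and sets its URL while keeping the indentation.

```shell
#!/bin/sh
# set_ui_url FILE KEY URL
# Rewrites the "KEY: '...'" line of configs.js, whether or not it is still
# commented out, so that it reads  KEY: 'URL',  with the original indentation.
# (Hypothetical helper based on plain text substitution.)
set_ui_url() {
  file=$1 key=$2 url=$3
  tmp=$(mktemp) || return 1
  sed "s|^\([[:space:]]*\)\(//[[:space:]]*\)\{0,1\}$key:.*|\1$key: '$url',|" "$file" >"$tmp" &&
    cat "$tmp" >"$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example, assuming the war was unpacked into a directory named tez-ui:
#   set_ui_url tez-ui/scripts/configs.js timelineBaseUrl http://hadoop001:8188
#   set_ui_url tez-ui/scripts/configs.js RMWebUrl http://hadoop001:23188
```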

http://hadoop001:8280/tez-ui/