This setup uses jmx_exporter and Prometheus to collect Hadoop metrics, and Grafana to display them and send alerts.

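1. Install jmx_exporter and create its configuration files

1. Download the agent from the Aliyun Maven mirror: https://maven.aliyun.com/mvn/search

Search for jmx_prometheus_javaagent and download it (the commands below use version 0.3.1).

2. Create a configuration file, xxxx.yml (name it after its purpose; the file to use is specified at startup). This setup uses one file per HDFS daemon.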

namenode.yaml:

startDelaySeconds: 0
hostPort: localhost:1234  # 1234 is the JMX port to use (any unused port will do)
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false

datanode.yaml:

startDelaySeconds: 0
hostPort: localhost:1235  # 1235 is the JMX port to use (any unused port will do)
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false

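3. Put these 3 files (the agent jar and the two configuration files) in /usr/local/prometheus_jmx_export_0.3.1

 Then run chown -R hadoop:root /usr/local/prometheus_jmx_export_0.3.1

4. Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh (note: ports 1234 and 1235 must match the JMX ports set in the configuration files)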

export HADOOP_NAMENODE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1234 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9222:/usr/local/prometheus_jmx_export_0.3.1/namenode.yaml"
 export HADOOP_DATANODE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1235 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9322:/usr/local/prometheus_jmx_export_0.3.1/datanode.yaml"
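
The two option strings above differ only in the JMX port, the HTTP port the agent listens on, and the configuration file name. As a minimal sketch, the shared pattern could be factored into a small bash helper. The function name jmx_exporter_opts and the two JMX_EXPORTER_* variables are hypothetical and not part of Hadoop or jmx_exporter; the paths are the ones used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop or jmx_exporter.
JMX_EXPORTER_HOME=/usr/local/prometheus_jmx_export_0.3.1
JMX_EXPORTER_JAR="$JMX_EXPORTER_HOME/jmx_prometheus_javaagent-0.3.1.jar"

# Usage: jmx_exporter_opts <jmx_port> <http_port> <config_file_name>
# Prints the JVM options on one line, separated by single spaces.
jmx_exporter_opts() {
  local jmx_port="$1" http_port="$2" config="$3"
  echo "-Dcom.sun.management.jmxremote.authenticate=false" \
       "-Dcom.sun.management.jmxremote.ssl=false" \
       "-Dcom.sun.management.jmxremote.local.only=false" \
       "-Dcom.sun.management.jmxremote.port=${jmx_port}" \
       "-javaagent:${JMX_EXPORTER_JAR}=${http_port}:${JMX_EXPORTER_HOME}/${config}"
}

# Example (left commented out so that sourcing this file has no side effects):
# export HADOOP_NAMENODE_JMX_OPTS="$(jmx_exporter_opts 1234 9222 namenode.yaml)"
# export HADOOP_DATANODE_JMX_OPTS="$(jmx_exporter_opts 1235 9322 datanode.yaml)"
```

The -javaagent value has the form jar=httpPort:configFile, so each daemon on a host needs its own HTTP port as well as its own JMX port.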

5. Edit $HADOOP_HOME/bin/hdfs and change the namenode and datanode startup options as follows

if [ "$COMMAND" = "namenode" ] ; then
   CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_JMX_OPTS $HADOOP_NAMENODE_OPTS"
 .......
 elif [ "$COMMAND" = "datanode" ] ; then
   CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_JMX_OPTS"
   if [ "$starting_secure_dn" = "true" ]; then
     HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
   else
     HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
   fi

6. Restart the Hadoop DFS cluster. The metrics are then available at http://xxx:9222/metrics on the namenode machine and at http://xxx:9322/metrics on the datanode machines.
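
A quick way to confirm that an endpoint returns real samples, rather than an empty page, is to count the non-comment lines in its output. The sketch below assumes the usual Prometheus text format, where lines starting with # carry HELP and TYPE metadata; the function name count_samples is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the number of sample lines (everything except comments and blanks).
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d '[:space:]'
}

# Example (requires a running exporter; xxx is the host placeholder used above):
# curl -s http://xxx:9222/metrics | count_samples
```

A result of 0 would suggest that the agent started but the configuration file is not exposing any MBeans.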

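2. Install Prometheus and edit its configuration file

1. Download the Linux build of Prometheus from https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz to /usr/local/,

extract it, and run chown -R hadoop:root prometheus-2.3.2.linux-amd64 on the extracted directory

2. Edit the configuration file prometheus.yml and add the jobs below under scrape_configs (note: this configuration was only run in a test environment; every monitored machine must be listed as a target, and several machines can share one job; mind the indentation)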

- job_name: hadoop-namenode
  static_configs:
  - targets: ['binamenode01:9222']
- job_name: hadoop-datanode
  static_configs:
  - targets: ['bidatanode01:9322']
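
When there are many datanodes, all of them can be listed as targets of a single job. As a rough sketch, a stanza in the same shape as the ones above can be generated from a job name and a list of host:port pairs. The function name scrape_job is hypothetical, and the host bidatanode02 in the example is an assumed name for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one scrape job stanza for prometheus.yml.
# Usage: scrape_job <job_name> <host:port> [<host:port> ...]
scrape_job() {
  local job="$1"; shift
  local targets="" t
  for t in "$@"; do
    # Append each target in single quotes, comma-separated.
    targets="${targets:+$targets, }'$t'"
  done
  printf -- '- job_name: %s\n  static_configs:\n  - targets: [%s]\n' "$job" "$targets"
}

# Example (bidatanode02 is an assumed host name):
# scrape_job hadoop-datanode bidatanode01:9322 bidatanode02:9322
```

The generated text still has to be pasted under scrape_configs with the correct indentation; promtool check config prometheus.yml, shipped in the Prometheus tarball, can validate the result before a restart.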

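3. Start Prometheus as user hadoop (startPromethous.sh is a custom wrapper script, not part of the Prometheus release; the binary itself is started with ./prometheus --config.file=prometheus.yml)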

cd /usr/local/prometheus-2.3.2.linux-amd64
./startPromethous.sh

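4. Open http://master:9090/targets to check that the targets were added (Prometheus listens on port 9090 by default)

Clicking a target's endpoint link, for example http://binamenode01:9222/metrics, shows its metrics data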

3. Install Grafana and edit its configuration file

1. Download Grafana and extract it

cd /usr/local
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2.linux-amd64.tar.gz 
tar -zxvf grafana-5.2.2.linux-amd64.tar.gz 
chown -R hadoop:root grafana-5.2.2

2. Start Grafana as user hadoop

cd /usr/local/grafana-5.2.2/bin/
nohup ./grafana-server start &

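3. Once started, Grafana is reachable at http://master:3000/ (the default login is admin/admin; Grafana's default port is 3000)

4. Connect Grafana to Prometheus

Click Data Sources

Click Add data source, fill in the fields, and save

4. Configure Grafana alert emails

1. Check whether mailx is installed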

rpm -qa | grep mailx

If it is not installed, install it with the following command

yum -y install mailx

2. Edit /usr/local/grafana-5.2.2/conf/defaults.ini

...
  
 #################################### SMTP / Emailing #####################
 [smtp]
 enabled = true
 host = smtp.xx.com:587
 user = sys_sender@xx.com
 # If the password contains # or ; wrap it in triple double quotes, e.g. """QWER123;4!@#$"""
 # The mailbox password goes here
 password = xxxxxxx
 cert_file =
 key_file =
 skip_verify = true
 from_address = sys_sender@xx.com
 from_name = sys_sender
 ehlo_identity =
 [emails]
 welcome_email_on_sign_up = false
 templates_pattern = emails/*.html
  
 ...
  
 #################################### Alerting ############################
 [alerting]
 # Disable alerting engine & UI features
 enabled = true
 # Makes it possible to turn off alert rule execution but alerting UI is visible
 execute_alerts = true
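
The quoting rule for the SMTP password is easy to get wrong by hand. A minimal sketch of it as a bash function follows; the name ini_quote is hypothetical and the function only covers the rule stated in the comment above (values containing # or ; get wrapped in triple double quotes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a value ready to paste after "password = ".
# Values containing # or ; are wrapped in triple double quotes; others are
# printed unchanged.
ini_quote() {
  case "$1" in
    *'#'*|*';'*) printf '"""%s"""\n' "$1" ;;
    *)           printf '%s\n' "$1" ;;
  esac
}

# Example:
# echo "password = $(ini_quote 'QWER123;4!@#$')"
```

Single quotes around the argument keep the shell from interpreting characters such as $ and ! in the password.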

3. Test Grafana email sending

Set up the email notification in Grafana and click the test button; the test should report OK

=======================================================================================

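Added 2018-08-27:

Hooking up YARN works much the same way.

In ${HADOOP_HOME}/etc/hadoop/yarn-env.sh, add the options that enable metrics and specify the ports. The ResourceManager and the NodeManager both pick up these options, so if both run on the same host each needs its own pair of ports.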

export YARN_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1236 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9422:/usr/local/prometheus_jmx_export_0.3.1/yarn.yaml"
Then edit ${HADOOP_HOME}/bin/yarn
elif [ "$COMMAND" = "resourcemanager" ] ; then
   CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/rm-config/log4j.properties
   CLASS='org.apache.hadoop.yarn.server.resourcemanager.ResourceManager'
   YARN_OPTS="$YARN_OPTS $YARN_JMX_OPTS $YARN_RESOURCEMANAGER_OPTS"
   if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then
     JAVA_HEAP_MAX="-Xmx""$YARN_RESOURCEMANAGER_HEAPSIZE""m"
   fi
 ......
 elif [ "$COMMAND" = "nodemanager" ] ; then
   CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/nm-config/log4j.properties
   CLASS='org.apache.hadoop.yarn.server.nodemanager.NodeManager'
   YARN_OPTS="$YARN_OPTS $YARN_JMX_OPTS -server $YARN_NODEMANAGER_OPTS"
   if [ "$YARN_NODEMANAGER_HEAPSIZE" != "" ]; then
     JAVA_HEAP_MAX="-Xmx""$YARN_NODEMANAGER_HEAPSIZE""m"
   fi


 

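Add the yarn.yaml file under prometheus_jmx_export (the agent reads it when the JVM starts, so it must exist before the restart)

Restart YARN

Edit the configuration file prometheus.yml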

- job_name: yarn
  static_configs:
  - targets: ['binamenode01:9422']

Restart Prometheus and the setup is complete

=======================================================================================

Added 2018-08-29

Monitoring HBase:

Edit $HBASE_HOME/bin/hbase

At the line

# figure out which class to run

add:

#prometheus jmx export start
HBASE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1237 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9522:/usr/local/prometheus_jmx_export_0.3.1/hbase.yaml"
#prometheus jmx export end
 ......
 elif [ "$COMMAND" = "master" ] ; then
   CLASS='org.apache.hadoop.hbase.master.HMaster'
   if [ "$1" != "stop" ] && [ "$1" != "clear" ] ; then
     HBASE_OPTS="$HBASE_OPTS $HBASE_JMX_OPTS $HBASE_MASTER_OPTS"
   fi
 elif [ "$COMMAND" = "regionserver" ] ; then
   CLASS='org.apache.hadoop.hbase.regionserver.HRegionServer'
   if [ "$1" != "stop" ] ; then
     HBASE_OPTS="$HBASE_OPTS $HBASE_JMX_OPTS $HBASE_REGIONSERVER_OPTS"
   fi

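Add the hbase.yaml file under prometheus_jmx_export (the agent reads it when the JVM starts, so it must exist before the restart)

Restart HBase

Edit the configuration file prometheus.yml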

- job_name: hbase
  static_configs:
  - targets: ['binamenode01:9522']

Restart Prometheus and the setup is complete

=======================================================================================

Added 2018-09-01

Adding Kylin monitoring

Edit kylin.sh and add the following options to its startup command

-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1239 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9722:/usr/local/prometheus_jmx_export_0.3.1/kylin.yaml \

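Add the kylin.yaml file under prometheus_jmx_export (the agent reads it when the JVM starts, so it must exist before the restart)

Restart Kylin

Edit the configuration file prometheus.yml (each job_name must be unique, so this job is named kylin)

- job_name: kylin
  static_configs:
  - targets: ['binamenode01:9722']

Restart Prometheus and the setup is complete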

=======================================================================================

Added 2018-09-01

Adding Hive monitoring

Edit ${HIVE_HOME}/conf/hive-env.sh and add the following code

if [ "$SERVICE" = "hiveserver2" ] ; then
         HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1240 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9822:/usr/local/prometheus_jmx_export_0.3.1/hive_hiveserver2.yaml"
 fi
 if [ "$SERVICE" = "metastore" ] ; then
         HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1241 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9922:/usr/local/prometheus_jmx_export_0.3.1/hive_metastore.yaml"
 fi

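Add the hive_metastore.yaml and hive_hiveserver2.yaml files under prometheus_jmx_export

 Restart the Hive metastore and hiveserver2

Edit the configuration file prometheus.yml (each job_name must be unique, so this job is named hive)

- job_name: hive
  static_configs:
  - targets: ['binamenode01:9822','binamenode01:9922']

 Restart Prometheus and the setup is complete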
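
With seven exporters spread over several addenda, it may help to keep the port assignments in one place. The lookup below simply restates the HTTP ports used throughout this guide; the function name exporter_port and the short service names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a service name to the jmx_exporter HTTP port
# used in this guide. Unknown names print an error and return status 1.
exporter_port() {
  case "$1" in
    namenode)    echo 9222 ;;
    datanode)    echo 9322 ;;
    yarn)        echo 9422 ;;
    hbase)       echo 9522 ;;
    kylin)       echo 9722 ;;
    hiveserver2) echo 9822 ;;
    metastore)   echo 9922 ;;
    *)           echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Example: probe every exporter on one host (requires the services to be
# running there; binamenode01 is the host name used in this guide):
# for s in namenode yarn hbase kylin hiveserver2 metastore; do
#   curl -s -o /dev/null -w "$s %{http_code}\n" "http://binamenode01:$(exporter_port "$s")/metrics"
# done
```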