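1. Airflow installation and deployment
### --- Prerequisites
~~~ CentOS 7.x
~~~ Python 3.5 or later (recommended)
~~~ MySQL 5.7.x
~~~ Apache-Airflow 1.10.11
~~~ The virtual machine needs internet access, because packages are installed online
~~~ # Note: the three tools installed from here on (Airflow, Atlas, Griffin) are all harder to install than Hadoop
~~~ Take a snapshot of the virtual machine before installing anything
~~~ Install the software versions specified in the course notes
~~~ Run the commands in the order given in the notes; skipping a command will break later steps
~~~ The video was recorded with what was then the latest Airflow release. Airflow has since been upgraded; the steps in these notes apply to Airflow 1.10.11.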
### --- Pay attention to the following two commands:
~~~ # Pin the airflow version when installing
pip install apache-airflow==1.10.11 -i https://pypi.douban.com/simple
~~~ # Pin the mysqlclient version when installing
pip install mysqlclient==1.4.6
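Because both pins matter for the rest of the walkthrough, it can help to check them after installation. The following is a minimal sketch, not part of the original notes: it parses text in the format printed by pip freeze and reports any pin that is not satisfied. The function name find_pin_mismatches and the sample text are hypothetical.

```python
def find_pin_mismatches(freeze_text, pins):
    """Return {package: installed_version_or_None} for every pin that is not satisfied."""
    installed = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        # pip freeze prints one "name==version" entry per line
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            installed[name.lower()] = version
    return {
        name: installed.get(name.lower())
        for name, wanted in pins.items()
        if installed.get(name.lower()) != wanted
    }


# The two versions pinned in this walkthrough
PINS = {"apache-airflow": "1.10.11", "mysqlclient": "1.4.6"}

# Hypothetical pip freeze output in which mysqlclient has drifted
sample = "apache-airflow==1.10.11\nmysqlclient==2.0.3\n"
print(find_pin_mismatches(sample, PINS))
```

In practice the text would come from running pip freeze inside the virtual environment; an empty result means both pins are in place.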
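2. Dependency: Python
### --- Check the Python version that ships with the Linux system
~~~ This system Python 2 stays in place; inside the virtual environment created below, the python command resolves to the new Python 3 instead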
[root@hadoop02 ~]# python -V
Python 2.7.5
### --- Prepare the MySQL environment
~~~ Note: download Python-3.6.6.tgz in advance
~~~ Note: install on hadoop02
~~~ # Remove mariadb
[root@hadoop02 ~]# rpm -qa | grep mariadb
mariadb-libs-5.5.65-1.el7.x86_64
mariadb-5.5.65-1.el7.x86_64
mariadb-devel-5.5.65-1.el7.x86_64
[root@hadoop02 ~]# yum remove mariadb
[root@hadoop02 ~]# yum remove mariadb-libs
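### --- Install the build dependencies for Python
~~~ # Install dependencies (the first command registers the MySQL 5.7 community yum repository)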
[root@hadoop02 ~]# rpm -ivh mysql57-community-release-el7-11.noarch.rpm
[root@hadoop02 ~]# yum install readline readline-devel -y
[root@hadoop02 ~]# yum install gcc -y
[root@hadoop02 ~]# yum install 'zlib*' -y
[root@hadoop02 ~]# yum install openssl openssl-devel -y
[root@hadoop02 ~]# yum install sqlite-devel -y
[root@hadoop02 ~]# yum install python-devel mysql-devel -y
### --- Install Python 3
~~~ # Download the source tarball from the official Python site in advance
[root@hadoop02 ~]# ll /opt/yanqi/software/Python-3.6.6.tgz
-rw-r--r-- 1 root root 22930752 Aug 14 2020 /opt/yanqi/software/Python-3.6.6.tgz
~~~ # Extract the Python tarball
[root@hadoop02 ~]# cd /opt/yanqi/software/
[root@hadoop02 software]# tar -zxvf Python-3.6.6.tgz
~~~ # Build and install the Python 3 runtime
[root@hadoop02 ~]# cd /opt/yanqi/software/Python-3.6.6/
~~~ # configure is an executable script. With --prefix set, all installed files go under the given directory
[root@hadoop02 Python-3.6.6]# ./configure --prefix=/opt/yanqi/servers/python3.6
[root@hadoop02 Python-3.6.6]# make && make install
~~~ # Install virtualenv for Python 3
[root@hadoop02 ~]# /opt/yanqi/servers/python3.6/bin/pip3 install virtualenv
~~~ # Create and activate the Python 3 virtual environment
[root@hadoop02 ~]# cd /opt/yanqi/servers/python3.6/bin/
[root@hadoop02 bin]# ./virtualenv env
[root@hadoop02 bin]# . env/bin/activate
~~~ # Check the Python version inside the virtual environment
(env) [root@hadoop02 bin]# python -V
Python 3.6.6
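The same two checks (interpreter version and active virtual environment) can be made from Python itself, which is handy in a start-up script. This is a minimal sketch using only the standard library; the helper names in_virtualenv and python_ok are hypothetical.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def python_ok(version_info=None, minimum=(3, 5)):
    """True when the interpreter version meets the minimum (3.5 in these notes)."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= minimum


print("python ok:", python_ok(), "| virtualenv active:", in_virtualenv())
```

Run with the system interpreter from the start of this section (Python 2.7.5), python_ok() would report False; run after activating env, both values should be True.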
3. Create the database environment in MySQL
### --- Lower the MySQL password policy level
~~~ # Change the MySQL password policy level
mysql> SHOW VARIABLES LIKE 'validate_password%';
| validate_password_policy | MEDIUM |
mysql> set global validate_password_policy=LOW;
mysql> SHOW VARIABLES LIKE 'validate_password%';
| validate_password_policy | LOW |
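### --- Create the database and user, and grant privileges
~~~ # Create the database
mysql> create database airflowhadoop02;
~~~ # Create the airflow user, reachable from any IP ('%') and from localhost
mysql> create user 'airflow'@'%' identified by '12345678';
mysql> create user 'airflow'@'localhost' identified by '12345678';
~~~ # Grant both airflow accounts all privileges on the airflowhadoop02 database
mysql> grant all on airflowhadoop02.* to 'airflow'@'%';
mysql> grant all on airflowhadoop02.* to 'airflow'@'localhost';
~~~ # Airflow requires explicit_defaults_for_timestamp to be on for MySQL
mysql> SET GLOBAL explicit_defaults_for_timestamp = 1;
mysql> flush privileges;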
4. Install Airflow
### --- Download and install the Airflow package
~~~ # Set the Airflow home directory
~~~ Add it to /etc/profile. When unset, it defaults to ~/airflow
(env) [root@hadoop02 ~]# vim /etc/profile
##AIRFLOW_HOME
export AIRFLOW_HOME=/opt/yanqi/servers/airflow
~~~ # Apply the environment variable
(env) [root@hadoop02 ~]# source /etc/profile
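To confirm which directory will be used after sourcing the profile, the lookup can be reproduced in a few lines. This is a minimal sketch that mirrors the default described above (AIRFLOW_HOME if set, otherwise ~/airflow); it is not Airflow's own code, and resolve_airflow_home is a hypothetical name.

```python
import os


def resolve_airflow_home(environ=None):
    """Return the directory that would serve as the Airflow home."""
    env = os.environ if environ is None else environ
    # Fall back to ~/airflow when AIRFLOW_HOME is not exported
    raw = env.get("AIRFLOW_HOME", "~/airflow")
    return os.path.abspath(os.path.expanduser(raw))


print(resolve_airflow_home())
```

If the printed path is still under the home directory of the current user, the export in /etc/profile has not taken effect in the current shell.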
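~~~ # The Douban mirror is very fast. -i: the package index to install from (optional)
~~~ pip install apache-airflow==1.10.11 on its own downloads from the default PyPI index, which is hosted overseas
~~~ Adding a domestic mirror URL makes pip download from that mirror instead;
~~~ this setup uses the Douban mirror.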
(env) [root@hadoop02 ~]# pip install apache-airflow==1.10.11 -i https://pypi.douban.com/simple
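~~~ # Notes:
~~~ apache-airflow==1.10.11: the version must be pinned. Important!
~~~ Airflow's working directory is $AIRFLOW_HOME (default ~/airflow); it does not exist yet at this point
~~~ The version installed is 1.10.11; without a mirror the download is very slow
### --- Install Airflow's dependencies under Python 3
~~~ # Fix for the following error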
ModuleNotFoundError: No module named 'sqlalchemy.ext.declarative.clsregistry'
(env) [root@hadoop02 ~]# pip install SQLAlchemy==1.3.23
~~~ # Install the MySQL client library
(env) [root@hadoop02 ~]# pip install mysqlclient==1.4.6
### --- Initialize the Airflow database
~~~ # Run inside the Python 3 virtual environment
(env) [root@hadoop02 ~]# airflow initdb
~~~ # Notes:
~~~ mysqlclient==1.4.6: the version must be pinned. Important!
~~~ $AIRFLOW_HOME/airflow.cfg may be missing right after Airflow is installed;
~~~ the file is only created at that location once airflow initdb has run.
5. Configure the Airflow database connection
### --- Change the Airflow DB settings
~~~ # Edit $AIRFLOW_HOME/airflow.cfg (sql_alchemy_conn is at around line 75):
(env) [root@hadoop02 ~]# vim /opt/yanqi/servers/airflow/airflow.cfg
sql_alchemy_conn = mysql://airflow:12345678@hadoop05:3306/airflowhadoop02
~~~ # Run the initialization again
(env) [root@hadoop02 ~]# airflow initdb
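The sql_alchemy_conn value is a URL, so a password containing characters such as @, : or / has to be percent-encoded before it is pasted into airflow.cfg. The following is a minimal sketch of assembling the value with the standard library; build_conn_url is a hypothetical helper, and the credentials are the ones used in this walkthrough.

```python
from urllib.parse import quote_plus, urlsplit


def build_conn_url(user, password, host, port, database, scheme="mysql"):
    """Assemble a SQLAlchemy-style URL, percent-encoding user and password."""
    return "{}://{}:{}@{}:{}/{}".format(
        scheme, quote_plus(user), quote_plus(password), host, port, database
    )


url = build_conn_url("airflow", "12345678", "hadoop05", 3306, "airflowhadoop02")
print(url)

# Round-trip check: the URL splits back into the expected host, port and database
parts = urlsplit(url)
print(parts.hostname, parts.port, parts.path.lstrip("/"))
```

With the simple password used here the result is identical to the line shown above; the encoding only changes the output when the password contains reserved characters.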
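### --- Troubleshooting
~~~ # Symptom:
Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
~~~ # Fix:
mysql> SET GLOBAL explicit_defaults_for_timestamp = 1;
mysql> FLUSH PRIVILEGES;
~~~ SET GLOBAL does not survive a MySQL restart; to make the setting permanent, also add explicit_defaults_for_timestamp=1 under [mysqld] in my.cnf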
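6. Install the password module
### --- Install the password component:
~~~ The package is small, so the default PyPI index is fine
~~~ For a large package, pass the Douban mirror URL with -i
~~~ Keep the version pin so that pip does not upgrade Airflow itself
(env) [root@hadoop02 ~]# pip install 'apache-airflow[password]==1.10.11'
### --- Edit airflow.cfg (change the first setting, add the second):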
(env) [root@hadoop02 ~]# vim /opt/yanqi/servers/airflow/airflow.cfg
~~~ # The [webserver] section starts at around line 281; authenticate is at around line 353
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
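After editing a file of several hundred lines it is easy to leave the setting in the wrong section. A quick way to double-check is to read the two options back with the standard-library configparser. This is a minimal sketch, not Airflow's own configuration loader; read_auth_settings and SAMPLE_CFG are hypothetical names, and the sample text stands in for the real airflow.cfg.

```python
import configparser

# Stand-in for the relevant part of airflow.cfg
SAMPLE_CFG = """
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
"""


def read_auth_settings(cfg_text):
    """Return (authenticate, auth_backend) from airflow.cfg-style text."""
    # interpolation=None keeps % characters in values from being
    # treated as interpolation syntax
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    enabled = parser.getboolean("webserver", "authenticate", fallback=False)
    backend = parser.get("webserver", "auth_backend", fallback=None)
    return enabled, backend


print(read_auth_settings(SAMPLE_CFG))
```

To check the real file, the same function can be given the contents of /opt/yanqi/servers/airflow/airflow.cfg read with open(...).read().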
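### --- Create the login user: run the following Python commands once to add a user and set its password
~~~ # Start the Python interpreter
(env) [root@hadoop02 ~]# python
Python 3.6.6 (default, Oct 10 2021, 17:38:16)
~~~ Run the following commands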
# Run inside the interpreter started above
import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser
# Wrap a new user record so that the password is stored hashed
user = PasswordUser(models.User())
user.username = 'airflow'
user.email = 'yanqi_vip@yeah.net'
user.password = 'airflow123'
# Persist the user in the Airflow metadata database
session = settings.Session()
session.add(user)
session.commit()
session.close()
exit()
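7. Start the Airflow services
### --- Enter the Python 3 environment
~~~ # Command to leave the virtual environment
(env) [root@hadoop02 ~]# deactivate
~~~ # Note: activate the Python 3 environment before starting Airflow
[root@hadoop02 ~]# cd /opt/yanqi/servers/python3.6/bin/
~~~ # ./virtualenv env is only needed when the env directory does not exist yet
[root@hadoop02 bin]# ./virtualenv env
[root@hadoop02 bin]# . env/bin/activate
### --- Start Airflow
~~~ # Start the scheduler: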
(env) [root@hadoop02 ~]# airflow scheduler -D
~~~ Output
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
~~~ # After startup, the scheduler's log and pid files have been created
(env) [root@hadoop02 ~]# ll /opt/yanqi/servers/airflow/
-rw-r--r-- 1 root root 37765 Oct 10 17:59 airflow.cfg
-rw-r--r-- 1 root root 83968 Oct 10 17:50 airflow.db
-rw-r--r-- 1 root root 0 Oct 10 18:07 airflow-scheduler.err
-rw-r--r-- 1 root root 613 Oct 10 18:07 airflow-scheduler.log
-rw-r--r-- 1 root root 1022 Oct 10 18:07 airflow-scheduler.out
-rw-r--r-- 1 root root 6 Oct 10 18:07 airflow-scheduler.pid
drwxr-xr-x 5 root root 69 Oct 10 18:07 logs
-rw-r--r-- 1 root root 2554 Oct 10 17:50 unittests.cfg
~~~ # Start the web server:
(env) [root@hadoop02 ~]# airflow webserver -D
~~~ # Two pid files are created
(env) [root@hadoop02 ~]# ll /opt/yanqi/servers/airflow/
-rw-r--r-- 1 root root 0 Oct 10 18:09 airflow-webserver.err
-rw-r--r-- 1 root root 0 Oct 10 18:09 airflow-webserver.log
-rw-r--r-- 1 root root 6 Oct 10 18:09 airflow-webserver-monitor.pid
-rw-r--r-- 1 root root 0 Oct 10 18:09 airflow-webserver.out
-rw-r--r-- 1 root root 6 Oct 10 18:09 airflow-webserver.pid
~~~ Location of the airflow command: /opt/yanqi/servers/python3.6/bin/env/bin/airflow
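A pid file only proves that a daemon was started, not that it is still running. The following is a minimal sketch of checking the pid files listed above with the standard library; pid_file_is_alive is a hypothetical helper, and the directory is the AIRFLOW_HOME used in this walkthrough.

```python
import os


def pid_file_is_alive(pid_file):
    """True when pid_file exists and names a process that is currently running."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed pid file
    if pid <= 0:
        return False  # not a valid single-process id
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, it sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


home = "/opt/yanqi/servers/airflow"  # AIRFLOW_HOME used in this walkthrough
for name in ("airflow-scheduler.pid", "airflow-webserver.pid"):
    print(name, pid_file_is_alive(os.path.join(home, name)))
```

A False result for a file that exists usually means the daemon exited after writing its pid; the matching .err and .log files are the next place to look.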
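### --- Access from Chrome: installation is complete and you can log in from a browser
~~~ Open hadoop02:8080 and enter the username and password: airflow / airflow123
~~~ The Airflow admin console is displayed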
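If the page does not load, a first thing to rule out is that nothing is listening on the port. This is a minimal sketch of a TCP reachability check with the standard library; port_is_open is a hypothetical helper, and hadoop02:8080 is the address used in this walkthrough.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """True when a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False


if __name__ == "__main__":
    print("webserver reachable:", port_is_open("hadoop02", 8080))
```

A successful connection only shows that the port accepts connections; it does not confirm that the login page itself is served correctly.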
Appendix 1: Troubleshooting, error 1:
### --- Symptom:
(env) [root@hadoop02 ~]# airflow initdb
Traceback (most recent call last):
File "/opt/yanqi/servers/python3.6/bin/env/bin/airflow", line 26, in <module>
from airflow.bin.cli import CLIFactory
File "/opt/yanqi/servers/python3.6/bin/env/lib/python3.6/site-packages/airflow/bin/cli.py", line 80, in <module>
from airflow.www.app import (cached_app, create_app)
File "/opt/yanqi/servers/python3.6/bin/env/lib/python3.6/site-packages/airflow/www/app.py", line 38, in <module>
from airflow.www.blueprints import routes
File "/opt/yanqi/servers/python3.6/bin/env/lib/python3.6/site-packages/airflow/www/blueprints.py", line 25, in <module>
from airflow.www import utils as wwwutils
File "/opt/yanqi/servers/python3.6/bin/env/lib/python3.6/site-packages/airflow/www/utils.py", line 36, in <module>
import flask_admin.contrib.sqla.filters as sqlafilters
File "/opt/yanqi/servers/python3.6/bin/env/lib/python3.6/site-packages/flask_admin/contrib/sqla/__init__.py", line 2, in <module>
from .view import ModelView
File "/opt/yanqi/servers/python3.6/bin/env/lib/python3.6/site-packages/flask_admin/contrib/sqla/view.py", line 18, in <module>
from flask_admin.contrib.sqla.tools import is_relationship
File "/opt/yanqi/servers/python3.6/bin/env/lib/python3.6/site-packages/flask_admin/contrib/sqla/tools.py", line 4, in <module>
from sqlalchemy.ext.declarative.clsregistry import _class_resolver
ModuleNotFoundError: No module named 'sqlalchemy.ext.declarative.clsregistry'
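### --- Analysis
~~~ The error occurs because the installed SQLAlchemy is too new: SQLAlchemy 1.4 removed the sqlalchemy.ext.declarative.clsregistry module that flask_admin imports.
~~~ Pin SQLAlchemy to 1.3.23 with the command below, then run airflow initdb again.
### --- Fix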
(env) [root@hadoop02 ~]# pip install SQLAlchemy==1.3.23
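Whether the pin is needed can be decided from the version string alone. This is a minimal sketch, assuming the break starts at SQLAlchemy 1.4 as the traceback suggests; needs_sqlalchemy_downgrade is a hypothetical helper.

```python
import re


def needs_sqlalchemy_downgrade(version):
    """True when the given SQLAlchemy version is 1.4 or newer."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version: {!r}".format(version))
    return (int(match.group(1)), int(match.group(2))) >= (1, 4)


for candidate in ("1.3.23", "1.4.0", "2.0.1"):
    print(candidate, needs_sqlalchemy_downgrade(candidate))
```

Inside the virtual environment the installed version is available as sqlalchemy.__version__ and can be passed to the function directly.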