### Read through everything first, then decide whether to set it up ###
Airflow
Installing Python 3
# Download the Python 3 source package
wget https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
tar -zxvf Python-3.6.5.tgz
cd Python-3.6.5
# Install build dependencies
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
yum install gcc -y
yum install epel-release
# If the zlib compression packages are missing
yum -y install zlib*
# Configure and build
./configure --prefix=/usr/local/python3
make && make install
# Add the new Python to the PATH (this matches the --prefix used above)
vim /etc/profile
export PATH=$PATH:/usr/local/python3/bin
source /etc/profile
# Keep pip3 on the latest version
pip3 install --upgrade pip
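To confirm the new interpreter is the one being picked up, a quick check from a python3 shell (the expected paths assume the /usr/local/python3 prefix used above):
# sanity-check the freshly installed interpreter
import sys
print(sys.version)     # should report 3.6.5
print(sys.executable)  # should point under /usr/local/python3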
Installing Airflow
Switch to the root user and run:
pip3 install apache-airflow
Output ending with Successfully installed…… indicates success.
# Verify the installation
airflow version
# Output like the following means the setup succeeded
[root@master bin]# airflow version
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Airflow 1.10 will be the last release series to support Python 2
1.10.15
Web UI
Modify the configuration (skip this if it is your first run)
# Start the web server
airflow webserver
Airflow Demo
Creating the first Airflow DAG
Create a Hello World workflow that does nothing except write "Hello world" to the log. First create the dags_folder, the directory where DAG definition files are stored: $AIRFLOW_HOME/dags. Then create a file named hello_world.py in that directory and add the following code:
# -*- coding: utf-8 -*-
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import timedelta
#-------------------------------------------------------------------------------
# these args will get passed on to each operator
# you can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'jifeng.si',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['1203745031@qq.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}
#-------------------------------------------------------------------------------
# dag
dag = DAG(
    'example_hello_world_dag',
    default_args=default_args,
    description='my first DAG',
    schedule_interval=timedelta(days=1))
#-------------------------------------------------------------------------------
# first operator
date_operator = BashOperator(
    task_id='date_task',
    bash_command='date',
    dag=dag)
#-------------------------------------------------------------------------------
# second operator
sleep_operator = BashOperator(
    task_id='sleep_task',
    depends_on_past=False,
    bash_command='sleep 5',
    dag=dag)
#-------------------------------------------------------------------------------
# third operator
def print_hello():
    return 'Hello world!'

hello_operator = PythonOperator(
    task_id='hello_task',
    python_callable=print_hello,
    dag=dag)
#-------------------------------------------------------------------------------
# dependencies
sleep_operator.set_upstream(date_operator)
hello_operator.set_upstream(date_operator)
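The two set_upstream calls make date_task run before the other tasks. The same dependencies can also be expressed with Airflow's bitshift operators, which read in execution order:
# equivalent to the set_upstream calls above
date_operator >> sleep_operator
date_operator >> hello_operator
# or in a single line:
date_operator >> [sleep_operator, hello_operator]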
# Run the demo
python ~/airflow/dags/hello_world.py
If no exception is raised, the environment is healthy.
To test the tasks in this DAG, list the tasks under the example_hello_world_dag DAG with:
airflow list_tasks example_hello_world_dag
# Expected output
date_task
hello_task
sleep_task
Test each task individually:
# Test date_task
airflow test example_hello_world_dag date_task 20170803
# Test hello_task
airflow test example_hello_world_dag hello_task 20170803
If everything passes, the DAG is ready to run.
Running the DAG
Open another terminal and start the Airflow scheduler:
airflow scheduler
Now open the Web UI and example_hello_world_dag should appear in the DAG list.
To start a DAG run, first switch the workflow on (the Off toggle), then click the Trigger Dag button (the first button under Links), and finally click the Graph View button (the third button under Links) to watch the run's progress:
Reload the Graph View until all the tasks reach the success state. Once they have, click hello_task and then View Log to inspect the log. If everything worked as expected, the log should contain some lines like these:
[2021-12-17 09:46:43,236] {base_task_runner.py:95} INFO - Subtask: [2017-08-03 09:46:43,235] {python_operator.py:81} INFO - Done. Returned value was: Hello world!
[2021-12-17 09:46:47,378] {jobs.py:2083} INFO - Task exited with return code 0
This completes the demo.
Connecting Airflow to MySQL
Stop all Airflow processes
Shut Airflow down completely. If you run airflow webserver while an instance is still up, it will report that the port is already running; find that process and kill it.
Create the airflow user in MySQL
Enter the MySQL shell:
mysql> create database airflow;
mysql> create user 'airflow'@'%' identified by '';
Query OK, 0 rows affected (0.00 sec)
mysql> create user 'airflow'@'localhost' identified by '';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on airflow.* to 'airflow'@'%';
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
Update the Airflow configuration
[root@node-01 airflow]# vim airflow.cfg
web_server_port = 8082
sql_alchemy_conn = mysql://account:pwd@localhost:3306/airflow
# format: mysql://user:password@host:port/db
default_timezone = Asia/Shanghai
load_examples = False
min_file_process_interval = 10
Initialize the Airflow metadata database
airflow initdb
Inspect the airflow database in MySQL
The airflow database now contains the Airflow metadata tables.
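To double-check from Python, a quick SQLAlchemy sketch (it assumes the mysqlclient driver used by mysql:// URLs is installed, e.g. via pip3 install mysqlclient, and reuses the connection values from airflow.cfg above; replace pwd with the real password):
# list the tables Airflow created in the metadata database
from sqlalchemy import create_engine, inspect

engine = create_engine('mysql://airflow:pwd@localhost:3306/airflow')  # match sql_alchemy_conn
print(inspect(engine).get_table_names())  # expect tables such as dag, dag_run, task_instance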
Scheduling a shell script with Airflow
Writing the shell script
#!/bin/bash
time=$(date)
echo "$time"
Requirement: run the shell command every 5 seconds and save the result to answer.txt.
1. Write the DAG file; 2. schedule it to run every 5 seconds; 3. store the result.
Write test.py.
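A minimal sketch of its content, assuming the script above is saved as /root/test.sh and the output is appended to /root/answer.txt (both paths and the DAG id are assumptions; adjust them to your environment):
# -*- coding: utf-8 -*-
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1)
}

dag = DAG(
    'shell_every_5s_dag',  # assumed DAG id
    default_args=default_args,
    schedule_interval=timedelta(seconds=5))

# run the script and append its output to answer.txt
run_shell = BashOperator(
    task_id='run_shell_task',
    bash_command='bash /root/test.sh >> /root/answer.txt',
    dag=dag)
With min_file_process_interval = 10 set above, the scheduler should pick up the new DAG file within roughly ten seconds of saving it.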
The DAG shows up in the Web UI.
While it runs, refresh answer.txt to check the result: the DAG runs normally and writes the current time every 5 seconds.