### Read through the whole guide first, then decide whether to set it up ###

Airflow

Installing Python 3


# Download the Python 3 source package
wget https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
tar -zxvf Python-3.6.5.tgz
cd Python-3.6.5

# Install build dependencies
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
yum install -y gcc
yum install -y epel-release
# In case compression libraries are missing
yum -y install zlib*


# Compile and install
./configure --prefix=/usr/local/python3
make && make install

# Add to the environment variables
vim /etc/profile

PATH=$PATH:/usr/local/python3/bin
export PATH
source /etc/profile

# Keep pip3 at the latest version
pip3 install --upgrade pip
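Once the PATH change has taken effect, a quick sanity check can confirm that the python3 on the PATH is really the build just installed. This is an optional check, not part of the installation itself:

```python
# Sanity check: confirm the interpreter version and its location.
import sys

print(sys.version_info[:3])   # e.g. (3, 6, 5) for the build above
print(sys.executable)         # path of the interpreter actually running

assert sys.version_info >= (3,), "still running Python 2"
```

If sys.executable does not point under /usr/local/python3, the shell is still picking up another interpreter earlier on the PATH.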


Installing Airflow

Switch to the root user and run:


pip3 install apache-airflow


If the output ends with "Successfully installed ...", the installation succeeded.


# Verify
airflow version
# Output like the following means the configuration succeeded
[root@master bin]# airflow version
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Airflow 1.10 will be the last release series to
support Python 2
1.10.15


Web UI

Modify the configuration (can be skipped on a first run)

# Start the web server
airflow webserver


Airflow Demo

Create the first Airflow DAG

Create a Hello World workflow that does nothing except write "Hello world" to the log.

Create the dags_folder, the directory where DAG definition files are stored: $AIRFLOW_HOME/dags. In that directory, create a file named hello_world.py.

Add the following code:

# -*- coding: utf-8 -*-
 
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import timedelta
 
#-------------------------------------------------------------------------------
# these args will get passed on to each operator
# you can override them on a per-task basis during operator initialization
 
default_args = {
    'owner': 'jifeng.si',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['1203745031@qq.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}
 
#-------------------------------------------------------------------------------
# dag
 
dag = DAG(
    'example_hello_world_dag',
    default_args=default_args,
    description='my first DAG',
    schedule_interval=timedelta(days=1))
 
#-------------------------------------------------------------------------------
# first operator
 
date_operator = BashOperator(
    task_id='date_task',
    bash_command='date',
    dag=dag)
 
#-------------------------------------------------------------------------------
# second operator
 
sleep_operator = BashOperator(
    task_id='sleep_task',
    depends_on_past=False,
    bash_command='sleep 5',
    dag=dag)
 
#-------------------------------------------------------------------------------
# third operator
 
def print_hello():
    return 'Hello world!'
 
hello_operator = PythonOperator(
    task_id='hello_task',
    python_callable=print_hello,
    dag=dag)
 
#-------------------------------------------------------------------------------
# dependencies
 
sleep_operator.set_upstream(date_operator)
hello_operator.set_upstream(date_operator)
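The set_upstream calls above can also be written with Airflow's bitshift syntax: date_operator >> sleep_operator means "date_operator runs first", equivalent to sleep_operator.set_upstream(date_operator). The mechanism is ordinary Python operator overloading; the stub below is a stripped-down illustration of the idea, not real Airflow code:

```python
# Minimal illustration of Airflow's ">>" dependency syntax.
# __rshift__ records "self runs before other". Stub class only --
# the real BaseOperator does much more.
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = []      # tasks that must finish before this one

    def __rshift__(self, other):
        other.upstream.append(self)   # self >> other: self runs first
        return other                  # return other to allow a >> b >> c

date_task = Task('date_task')
sleep_task = Task('sleep_task')
hello_task = Task('hello_task')

# Same dependency graph as the set_upstream calls above
date_task >> sleep_task
date_task >> hello_task

print([t.task_id for t in hello_task.upstream])  # ['date_task']
```

Either spelling works in Airflow 1.10; the bitshift form is usually easier to read when a DAG has many edges.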

# Run the demo
python ~/airflow/dags/hello_world.py


If no exception is raised, the environment is set up correctly.

Test the tasks in the DAG. Use the following command to see which tasks the example_hello_world_dag DAG contains:
 
airflow list_tasks example_hello_world_dag
 
# Output
date_task
hello_task
sleep_task
 
Test each task individually
 
# Test date_task
airflow test example_hello_world_dag date_task 20170803
# Test hello_task
airflow test example_hello_world_dag hello_task 20170803


If there are no problems, the DAG can be run.

Running the DAG

Open another terminal and start the Airflow scheduler with:


airflow scheduler


Now open the Web UI; the example_hello_world_dag DAG appears in the list.

To start a DAG Run, first switch the workflow on (the Off toggle), then click the Trigger Dag button (the first button under Links), and finally click the Graph View button (the third button under Links) to watch the run's progress:

Reload the Graph View until both tasks reach the success state. When they do, click hello_task and then View Log to inspect the log. If everything worked as expected, the log should contain several lines, one of which looks like this:


[2021-12-17 09:46:43,236] {base_task_runner.py:95} INFO - Subtask: [2017-08-03 09:46:43,235] {python_operator.py:81} INFO - Done. Returned value was: Hello world!
 
[2021-12-17 09:46:47,378] {jobs.py:2083} INFO - Task exited with return code 0


That completes the demo.

Integrating Airflow with MySQL

Stop all Airflow processes

Shut Airflow down completely. Running airflow webserver is a quick check: it will report an "already running" port; kill that process.

Create a user in MySQL


Log in to MySQL:


mysql> create database airflow;
 
mysql> create user 'airflow'@'%' identified by '';
Query OK, 0 rows affected (0.00 sec)
 
mysql> create user 'airflow'@'localhost' identified by '';
Query OK, 0 rows affected (0.00 sec)
 
mysql> grant all on airflow.* to 'airflow'@'%';
Query OK, 0 rows affected (0.01 sec)
 
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
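One thing worth checking: the grant above covers 'airflow'@'%' only, while a user 'airflow'@'localhost' was also created. If Airflow connects through localhost (as the sql_alchemy_conn later in this guide does), that user may need the same privileges. A hedged addition, assuming localhost connections are used:

```sql
-- Assumption: Airflow connects via localhost, so the localhost
-- user needs the same privileges as 'airflow'@'%'.
grant all on airflow.* to 'airflow'@'localhost';
flush privileges;
```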


Modify the Airflow configuration


[root@node-01 airflow]# vim airflow.cfg
 
web_server_port = 8082
 
sql_alchemy_conn = mysql://account:pwd@localhost:3306/airflow
# format: mysql://user:password@host:port/db
 
default_timezone = Asia/Shanghai
 
load_examples = False
 
min_file_process_interval = 10


Initialize the Airflow database


airflow initdb
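If airflow initdb fails against MySQL with an error mentioning explicit_defaults_for_timestamp, that MySQL option has to be enabled for Airflow's schema to be created. A sketch of the change, assuming the server config lives at /etc/my.cnf:

```ini
# /etc/my.cnf (assumed location) -- restart MySQL after the change
[mysqld]
explicit_defaults_for_timestamp = 1
```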


Check the airflow database in MySQL

The relevant tables now appear in the airflow database.

Scheduling a shell script with Airflow

Write the shell script


#!/bin/bash
time=$(date)
echo "$time"


Requirement: execute the shell command every 5 seconds and save the result to answer.txt.


1. Edit the DAG file   2. Schedule it to run every 5 seconds   3. Store the result


Write test.py with the following content:

Shown in the Web UI

While it runs, refresh answer.txt to check the result: the DAG runs normally and writes the current time every 5 seconds.