

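This article introduces the basics of Azkaban and walks through its three deployment modes, with verification and testing for each. It has four parts: an introduction, followed by deployment and verification of the solo-server, two-server, and multiple-executor modes.

I. Azkaban Overview

1. About Azkaban

Azkaban is a batch workflow job scheduler released by LinkedIn. It runs a set of jobs and processes in a specific order within a workflow. Azkaban uses job configuration files to define the dependencies between jobs, and provides an easy-to-use web UI for maintaining and tracking workflows.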

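2. Features

  • A clear, easy-to-use web UI
  • Job configuration files for quickly defining jobs and the dependencies between them
  • A modular, pluggable plugin mechanism with native support for command, Java, Hive, Pig, and Hadoop
  • Written in Java with a clear code structure, which makes custom development easy

3. Services

  • MySQL server: stores metadata such as project names, project descriptions, project permissions, job status, and SLA rules
  • AzkabanWebServer: serves the web UI through which users manage Azkaban. Its responsibilities include project management, authorization, job scheduling, and monitoring executors
  • AzkabanExecutorServer: submits and executes the actual workflows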

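4. The three Azkaban deployment modes

1) solo server mode

In this mode the web server and executor server run in the same process, named AzkabanSingleServer, and use the bundled H2 database. The mode includes all Azkaban features but is generally used for learning and testing.

2) two-server mode

This mode uses a MySQL database, and the Web Server and Executor Server run in separate processes.

3) multiple-executor mode

This mode uses a MySQL database, the Web Server and Executor Servers run on different machines, and there are multiple Executor Servers. It is suited to large-scale use.

II. Deploying and Verifying solo-server Mode

This mode uses a single node, with the Web Server and Executor Server in the same process.

1. Extract and configure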

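mkdir -p /usr/local/bigdata/azkaban3.51.0/solo
# Upload the compiled azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz to /usr/local/bigdata/azkaban3.51.0/solo
cd /usr/local/bigdata/azkaban3.51.0/solo
tar -zxvf azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz

# Edit the configuration (paths are relative to azkaban-solo-server-0.1.0-SNAPSHOT/)
vim conf/azkaban.properties
# Change the time zone (keep the comment on its own line; a trailing comment would become part of the value)
default.timezone.id=Asia/Shanghai

vim plugins/jobtypes/commonprivate.properties
# Add: memCheck.enabled=false
# By default Azkaban requires 3 GB of memory and reports an error when free memory is insufficient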
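Whether memCheck.enabled=false is needed depends on how much memory the host has to spare. The sketch below is one rough way to check that on Linux before starting Azkaban. It reads the MemAvailable field of /proc/meminfo, which is an assumption made for illustration and not necessarily the figure Azkaban itself uses; the function names are hypothetical, and only the 3 GB threshold comes from the text above.

```python
import re

THRESHOLD_KB = 3 * 1024 * 1024  # 3 GB, the default requirement stated above


def available_kb(meminfo_text):
    """Return the MemAvailable value (in kB) from /proc/meminfo-style text, or None."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def needs_memcheck_disabled(meminfo_text, threshold_kb=THRESHOLD_KB):
    """True when less memory than the threshold is available."""
    kb = available_kb(meminfo_text)
    return kb is not None and kb < threshold_kb


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not a Linux host; nothing to report
    print("consider memCheck.enabled=false:", needs_memcheck_disabled(text))
```

On a test machine with little memory this prints True, which matches the situation the setting is meant for; on a production host it is usually better to add memory than to disable the check.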

2. Start

cd azkaban-solo-server-0.1.0-SNAPSHOT/
bin/start-solo.sh
# Note: start and stop commands must be run from inside the azkaban-solo-server-0.1.0-SNAPSHOT/ directory.

3. Verify

  • After a successful start, check for the AzkabanSingleServer process (in solo-server mode the Exec Server and Web Server run in the same process)
  • Log in to the web UI at http://192.168.10.41:8081/ ; the default username and password are both azkaban
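The login check can also be scripted instead of done in a browser. The sketch below assumes this build exposes Azkaban's AJAX login endpoint, that is, a POST of action=login with username and password to the web server's root URL, answered with JSON that carries a session.id. The helper names build_login_request and login are illustrative and not part of Azkaban.

```python
import json
import urllib.parse
import urllib.request


def build_login_request(base_url, username, password):
    """Build the POST request for the assumed AJAX login endpoint."""
    body = urllib.parse.urlencode(
        {"action": "login", "username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(base_url, data=body, method="POST")


def login(base_url, username="azkaban", password="azkaban", timeout=10):
    """Send the login request and return the session id, or None if absent."""
    request = build_login_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload.get("session.id")
```

Calling login("http://192.168.10.41:8081/") against a running server should return a session id when the credentials are accepted; a None result suggests the login was rejected or the response had a different shape.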

4. Test

Open http://192.168.10.41:8081/index and work through the steps in order: log in => Create Project => Upload the zip package => Execute Flow.

1) Create two files, one.job and two.job, with the following contents:

#one.job
type=command
command=echo "this is job one"
#two.job
type=command
dependencies=one
command=echo "this is job two"

Once both files are created, package one.job and two.job into a single file, az-solo-job.zip.
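The packaging step can be done with any zip tool. As a minimal sketch, the following writes the two job definitions shown above into az-solo-job.zip using Python's standard zipfile module; the function name build_job_zip and the choice of the system temp directory as the output location are illustrative assumptions.

```python
import os
import tempfile
import zipfile

# Job definitions copied from the listing above.
JOBS = {
    "one.job": 'type=command\ncommand=echo "this is job one"\n',
    "two.job": 'type=command\ndependencies=one\ncommand=echo "this is job two"\n',
}


def build_job_zip(zip_path, jobs=JOBS):
    """Write each job definition into the root of a zip archive and return its path."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in jobs.items():
            archive.writestr(name, content)
    return zip_path


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "az-solo-job.zip")
    print("wrote", build_job_zip(target))
```

The job files sit at the root of the archive, which mirrors zipping the two files directly and not a folder that contains them.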

2) Create the project

3) Upload az-solo-job.zip

After a successful upload, the dependency between the two jobs is visible.
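The dependency shown in the UI comes from the dependencies line in two.job. To illustrate how such declarations imply a run order, the sketch below parses the dependencies key from job file text and sorts the jobs topologically with graphlib (Python 3.9 or later). It is an illustration of the idea, not Azkaban's own scheduler code, and the function names are hypothetical.

```python
from graphlib import TopologicalSorter


def parse_dependencies(job_text):
    """Return the comma-separated job names listed under the dependencies key."""
    for line in job_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dependencies":
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


def execution_order(jobs):
    """Order job names so that every job comes after the jobs it depends on."""
    graph = {name: parse_dependencies(text) for name, text in jobs.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    jobs = {
        "two": 'type=command\ndependencies=one\ncommand=echo "this is job two"',
        "one": 'type=command\ncommand=echo "this is job one"',
    }
    print(execution_order(jobs))
```

For the two jobs in this article the result places one before two, the same order the flow graph shows.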

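4) Execute

Open the execution page. The dialog that pops up lets you either set a schedule or execute the flow directly. Clicking the Schedule button opens the scheduling dialog. Clicking the Execute button opens a prompt; click Continue there.

5) View results

The result of each job can be viewed under Details, and other information is available in the remaining tabs. The execution logs of one.job and two.job can be viewed there as well.

III. Deploying and Verifying two-server Mode

1. Nodes

server1 (192.168.10.41): web-server
server2 (192.168.10.42): exec-server

2. MySQL setup and initialization

This step obtains the initialization SQL file and uses it to initialize the database.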

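# Create the Azkaban directory and extract the archive into it
mkdir -p /usr/local/bigdata/azkaban
tar -zxvf azkaban-db-0.1.0-SNAPSHOT.tar.gz -C /usr/local/bigdata/azkaban

# Initialize Azkaban's metadata storage
# In MySQL: create the database, grant privileges, and create the tables
# The database name must match mysql.database in azkaban.properties
mysql> CREATE DATABASE azkaban_two_server; # create the database
mysql> use azkaban_two_server;
mysql> source /usr/local/bigdata/azkaban/azkaban-db-0.1.0-SNAPSHOT/create-all-sql-0.1.0-SNAPSHOT.sql;

# Or run the SQL from the file directly (database name: azkaban), as follows
#create-all-sql-0.1.0-SNAPSHOT.sql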
CREATE TABLE active_executing_flows (
  exec_id     INT,
  update_time BIGINT,
  PRIMARY KEY (exec_id)
);
CREATE TABLE active_sla (
  exec_id    INT          NOT NULL,
  job_name   VARCHAR(128) NOT NULL,
  check_time BIGINT       NOT NULL,
  rule       TINYINT      NOT NULL,
  enc_type   TINYINT,
  options    LONGBLOB     NOT NULL,
  PRIMARY KEY (exec_id, job_name)
);
CREATE TABLE execution_dependencies(
  trigger_instance_id varchar(64),
  dep_name varchar(128),
  starttime bigint(20) not null,
  endtime bigint(20),
  dep_status tinyint not null,
  cancelleation_cause tinyint not null,

  project_id INT not null,
  project_version INT not null,
  flow_id varchar(128) not null,
  flow_version INT not null,
  flow_exec_id INT not null,
  primary key(trigger_instance_id, dep_name)
);

CREATE INDEX ex_end_time
  ON execution_dependencies (endtime);
CREATE TABLE execution_flows (
  exec_id     INT          NOT NULL AUTO_INCREMENT,
  project_id  INT          NOT NULL,
  version     INT          NOT NULL,
  flow_id     VARCHAR(128) NOT NULL,
  status      TINYINT,
  submit_user VARCHAR(64),
  submit_time BIGINT,
  update_time BIGINT,
  start_time  BIGINT,
  end_time    BIGINT,
  enc_type    TINYINT,
  flow_data   LONGBLOB,
  executor_id INT                   DEFAULT NULL,
  PRIMARY KEY (exec_id)
);

CREATE INDEX ex_flows_start_time
  ON execution_flows (start_time);
CREATE INDEX ex_flows_end_time
  ON execution_flows (end_time);
CREATE INDEX ex_flows_time_range
  ON execution_flows (start_time, end_time);
CREATE INDEX ex_flows_flows
  ON execution_flows (project_id, flow_id);
CREATE INDEX executor_id
  ON execution_flows (executor_id);
CREATE INDEX ex_flows_staus
  ON execution_flows (status);
CREATE TABLE execution_jobs (
  exec_id       INT          NOT NULL,
  project_id    INT          NOT NULL,
  version       INT          NOT NULL,
  flow_id       VARCHAR(128) NOT NULL,
  job_id        VARCHAR(512) NOT NULL,
  attempt       INT,
  start_time    BIGINT,
  end_time      BIGINT,
  status        TINYINT,
  input_params  LONGBLOB,
  output_params LONGBLOB,
  attachments   LONGBLOB,
  PRIMARY KEY (exec_id, job_id, flow_id, attempt)
);

CREATE INDEX ex_job_id
  ON execution_jobs (project_id, job_id);
-- In table execution_logs, name is the combination of flow_id and job_id
--
-- prefix support and lengths of prefixes (where supported) are storage engine dependent.
-- By default, the index key prefix length limit is 767 bytes for innoDB.
-- from: https://dev.mysql.com/doc/refman/5.7/en/create-index.html

CREATE TABLE execution_logs (
  exec_id     INT NOT NULL,
  name        VARCHAR(640),
  attempt     INT,
  enc_type    TINYINT,
  start_byte  INT,
  end_byte    INT,
  log         LONGBLOB,
  upload_time BIGINT,
  PRIMARY KEY (exec_id, name, attempt, start_byte)
);

CREATE INDEX ex_log_attempt
  ON execution_logs (exec_id, name, attempt);
CREATE INDEX ex_log_index
  ON execution_logs (exec_id, name);
CREATE INDEX ex_log_upload_time
  ON execution_logs (upload_time);
CREATE TABLE executor_events (
  executor_id INT      NOT NULL,
  event_type  TINYINT  NOT NULL,
  event_time  DATETIME NOT NULL,
  username    VARCHAR(64),
  message     VARCHAR(512)
);

CREATE INDEX executor_log
  ON executor_events (executor_id, event_time);
CREATE TABLE executors (
  id     INT         NOT NULL PRIMARY KEY AUTO_INCREMENT,
  host   VARCHAR(64) NOT NULL,
  port   INT         NOT NULL,
  active BOOLEAN                          DEFAULT FALSE,
  UNIQUE (host, port),
  UNIQUE INDEX executor_id (id)
);

CREATE INDEX executor_connection
  ON executors (host, port);
CREATE TABLE project_events (
  project_id INT     NOT NULL,
  event_type TINYINT NOT NULL,
  event_time BIGINT  NOT NULL,
  username   VARCHAR(64),
  message    VARCHAR(512)
);

CREATE INDEX log
  ON project_events (project_id, event_time);
CREATE TABLE project_files (
  project_id INT NOT NULL,
  version    INT NOT NULL,
  chunk      INT,
  size       INT,
  file       LONGBLOB,
  PRIMARY KEY (project_id, version, chunk)
);

CREATE INDEX file_version
  ON project_files (project_id, version);
CREATE TABLE project_flow_files (
  project_id        INT          NOT NULL,
  project_version   INT          NOT NULL,
  flow_name         VARCHAR(128) NOT NULL,
  flow_version      INT          NOT NULL,
  modified_time     BIGINT       NOT NULL,
  flow_file         LONGBLOB,
  PRIMARY KEY (project_id, project_version, flow_name, flow_version)
);
CREATE TABLE project_flows (
  project_id    INT    NOT NULL,
  version       INT    NOT NULL,
  flow_id       VARCHAR(128),
  modified_time BIGINT NOT NULL,
  encoding_type TINYINT,
  json          BLOB,
  PRIMARY KEY (project_id, version, flow_id)
);

CREATE INDEX flow_index
  ON project_flows (project_id, version);
CREATE TABLE project_permissions (
  project_id    VARCHAR(64) NOT NULL,
  modified_time BIGINT      NOT NULL,
  name          VARCHAR(64) NOT NULL,
  permissions   INT         NOT NULL,
  isGroup       BOOLEAN     NOT NULL,
  PRIMARY KEY (project_id, name)
);

CREATE INDEX permission_index
  ON project_permissions (project_id);
CREATE TABLE project_properties (
  project_id    INT    NOT NULL,
  version       INT    NOT NULL,
  name          VARCHAR(255),
  modified_time BIGINT NOT NULL,
  encoding_type TINYINT,
  property      BLOB,
  PRIMARY KEY (project_id, version, name)
);

CREATE INDEX properties_index
  ON project_properties (project_id, version);
CREATE TABLE project_versions (
  project_id  INT         NOT NULL,
  version     INT         NOT NULL,
  upload_time BIGINT      NOT NULL,
  uploader    VARCHAR(64) NOT NULL,
  file_type   VARCHAR(16),
  file_name   VARCHAR(128),
  md5         BINARY(16),
  num_chunks  INT,
  resource_id VARCHAR(512) DEFAULT NULL,
  PRIMARY KEY (project_id, version)
);

CREATE INDEX version_index
  ON project_versions (project_id);
CREATE TABLE projects (
  id               INT         NOT NULL PRIMARY KEY AUTO_INCREMENT,
  name             VARCHAR(64) NOT NULL,
  active           BOOLEAN,
  modified_time    BIGINT      NOT NULL,
  create_time      BIGINT      NOT NULL,
  version          INT,
  last_modified_by VARCHAR(64) NOT NULL,
  description      VARCHAR(2048),
  enc_type         TINYINT,
  settings_blob    LONGBLOB,
  UNIQUE INDEX project_id (id)
);

CREATE INDEX project_name
  ON projects (name);
CREATE TABLE properties (
  name          VARCHAR(64) NOT NULL,
  type          INT         NOT NULL,
  modified_time BIGINT      NOT NULL,
  value         VARCHAR(256),
  PRIMARY KEY (name, type)
);
-- This file collects all quartz table create statement required for quartz 2.2.1
--
-- We are using Quartz 2.2.1 tables, the original place of which can be found at
-- https://github.com/quartz-scheduler/quartz/blob/quartz-2.2.1/distribution/src/main/assembly/root/docs/dbTables/tables_mysql.sql


DROP TABLE IF EXISTS QRTZ_FIRED_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_PAUSED_TRIGGER_GRPS;
DROP TABLE IF EXISTS QRTZ_SCHEDULER_STATE;
DROP TABLE IF EXISTS QRTZ_LOCKS;
DROP TABLE IF EXISTS QRTZ_SIMPLE_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_SIMPROP_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_CRON_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_BLOB_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_TRIGGERS;
DROP TABLE IF EXISTS QRTZ_JOB_DETAILS;
DROP TABLE IF EXISTS QRTZ_CALENDARS;


CREATE TABLE QRTZ_JOB_DETAILS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    JOB_NAME  VARCHAR(200) NOT NULL,
    JOB_GROUP VARCHAR(200) NOT NULL,
    DESCRIPTION VARCHAR(250) NULL,
    JOB_CLASS_NAME   VARCHAR(250) NOT NULL,
    IS_DURABLE VARCHAR(1) NOT NULL,
    IS_NONCONCURRENT VARCHAR(1) NOT NULL,
    IS_UPDATE_DATA VARCHAR(1) NOT NULL,
    REQUESTS_RECOVERY VARCHAR(1) NOT NULL,
    JOB_DATA BLOB NULL,
    PRIMARY KEY (SCHED_NAME,JOB_NAME,JOB_GROUP)
);

CREATE TABLE QRTZ_TRIGGERS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    JOB_NAME  VARCHAR(200) NOT NULL,
    JOB_GROUP VARCHAR(200) NOT NULL,
    DESCRIPTION VARCHAR(250) NULL,
    NEXT_FIRE_TIME BIGINT(13) NULL,
    PREV_FIRE_TIME BIGINT(13) NULL,
    PRIORITY INTEGER NULL,
    TRIGGER_STATE VARCHAR(16) NOT NULL,
    TRIGGER_TYPE VARCHAR(8) NOT NULL,
    START_TIME BIGINT(13) NOT NULL,
    END_TIME BIGINT(13) NULL,
    CALENDAR_NAME VARCHAR(200) NULL,
    MISFIRE_INSTR SMALLINT(2) NULL,
    JOB_DATA BLOB NULL,
    PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (SCHED_NAME,JOB_NAME,JOB_GROUP)
        REFERENCES QRTZ_JOB_DETAILS(SCHED_NAME,JOB_NAME,JOB_GROUP)
);

CREATE TABLE QRTZ_SIMPLE_TRIGGERS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    REPEAT_COUNT BIGINT(7) NOT NULL,
    REPEAT_INTERVAL BIGINT(12) NOT NULL,
    TIMES_TRIGGERED BIGINT(10) NOT NULL,
    PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
        REFERENCES QRTZ_TRIGGERS(SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ_CRON_TRIGGERS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    CRON_EXPRESSION VARCHAR(200) NOT NULL,
    TIME_ZONE_ID VARCHAR(80),
    PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
        REFERENCES QRTZ_TRIGGERS(SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ_SIMPROP_TRIGGERS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    STR_PROP_1 VARCHAR(512) NULL,
    STR_PROP_2 VARCHAR(512) NULL,
    STR_PROP_3 VARCHAR(512) NULL,
    INT_PROP_1 INT NULL,
    INT_PROP_2 INT NULL,
    LONG_PROP_1 BIGINT NULL,
    LONG_PROP_2 BIGINT NULL,
    DEC_PROP_1 NUMERIC(13,4) NULL,
    DEC_PROP_2 NUMERIC(13,4) NULL,
    BOOL_PROP_1 VARCHAR(1) NULL,
    BOOL_PROP_2 VARCHAR(1) NULL,
    PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
    REFERENCES QRTZ_TRIGGERS(SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ_BLOB_TRIGGERS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    BLOB_DATA BLOB NULL,
    PRIMARY KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
        REFERENCES QRTZ_TRIGGERS(SCHED_NAME,TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ_CALENDARS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    CALENDAR_NAME  VARCHAR(200) NOT NULL,
    CALENDAR BLOB NOT NULL,
    PRIMARY KEY (SCHED_NAME,CALENDAR_NAME)
);

CREATE TABLE QRTZ_PAUSED_TRIGGER_GRPS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    TRIGGER_GROUP  VARCHAR(200) NOT NULL,
    PRIMARY KEY (SCHED_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ_FIRED_TRIGGERS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    ENTRY_ID VARCHAR(95) NOT NULL,
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    INSTANCE_NAME VARCHAR(200) NOT NULL,
    FIRED_TIME BIGINT(13) NOT NULL,
    SCHED_TIME BIGINT(13) NOT NULL,
    PRIORITY INTEGER NOT NULL,
    STATE VARCHAR(16) NOT NULL,
    JOB_NAME VARCHAR(200) NULL,
    JOB_GROUP VARCHAR(200) NULL,
    IS_NONCONCURRENT VARCHAR(1) NULL,
    REQUESTS_RECOVERY VARCHAR(1) NULL,
    PRIMARY KEY (SCHED_NAME,ENTRY_ID)
);

CREATE TABLE QRTZ_SCHEDULER_STATE
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    INSTANCE_NAME VARCHAR(200) NOT NULL,
    LAST_CHECKIN_TIME BIGINT(13) NOT NULL,
    CHECKIN_INTERVAL BIGINT(13) NOT NULL,
    PRIMARY KEY (SCHED_NAME,INSTANCE_NAME)
);

CREATE TABLE QRTZ_LOCKS
  (
    SCHED_NAME VARCHAR(120) NOT NULL,
    LOCK_NAME  VARCHAR(40) NOT NULL,
    PRIMARY KEY (SCHED_NAME,LOCK_NAME)
);


commit;
CREATE TABLE triggers (
  trigger_id     INT    NOT NULL AUTO_INCREMENT,
  trigger_source VARCHAR(128),
  modify_time    BIGINT NOT NULL,
  enc_type       TINYINT,
  data           LONGBLOB,
  PRIMARY KEY (trigger_id)
);
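After the script has run, it is worth confirming that every table it defines actually exists. The sketch below pulls the table names out of the script text with a regular expression and compares them with a list of existing tables, for example the output of SHOW TABLES pasted into a list. The function names are hypothetical, and the comparison ignores case, which is a simplification because MySQL table-name case sensitivity depends on the platform.

```python
import re

CREATE_TABLE = re.compile(
    r"^\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?",
    re.IGNORECASE | re.MULTILINE,
)


def table_names(sql_text):
    """Return the table names of all CREATE TABLE statements, in file order."""
    return CREATE_TABLE.findall(sql_text)


def missing_tables(sql_text, existing):
    """Return tables defined in the script that are absent from `existing`."""
    have = {name.lower() for name in existing}
    return [name for name in table_names(sql_text) if name.lower() not in have]


if __name__ == "__main__":
    sample = (
        "CREATE TABLE triggers (\n  trigger_id INT NOT NULL\n);\n"
        "CREATE TABLE QRTZ_LOCKS\n  (\n    SCHED_NAME VARCHAR(120) NOT NULL\n);\n"
    )
    print(missing_tables(sample, ["triggers"]))
```

An empty result from missing_tables means every table in the script was found; CREATE INDEX statements and commented lines are ignored by the pattern.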

3. Extract web-server

Run on server1 (192.168.10.41):

mkdir -p /usr/local/bigdata/azkaban3.51.0/web-server
# Place azkaban-web-server-0.1.0-SNAPSHOT.tar.gz in /usr/local/bigdata/azkaban3.51.0/web-server first
cd /usr/local/bigdata/azkaban3.51.0/web-server
tar -zxvf azkaban-web-server-0.1.0-SNAPSHOT.tar.gz

4. Extract exec-server

Run on server2 (192.168.10.42):

mkdir -p /usr/local/bigdata/azkaban3.51.0/exec-server
# Place azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz in /usr/local/bigdata/azkaban3.51.0/exec-server first
cd /usr/local/bigdata/azkaban3.51.0/exec-server
tar -zxvf azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz

5. web-server configuration

Configure conf/azkaban.properties:

# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=../web/
default.timezone.id=Asia/Shanghai

# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=../conf/azkaban-users.xml

# Loader for projects
executor.global.properties=../conf/global.properties
azkaban.project.dir=projects

# Velocity dev mode
velocity.dev.mode=false

# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.ssl.port=8443
jetty.maxThreads=25
jetty.port=8081

# Azkaban Executor settings: address of the executor
executor.host=192.168.10.42
executor.port=12321

# KeyStore for SSL
jetty.keystore=keystore
jetty.password=123456
jetty.keypassword=123456
jetty.truststore=keystore
jetty.trustpassword=123456

# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081

job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache

# JMX stats
jetty.connector.stats=true
executor.connector.stats=true

# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=192.168.10.44
mysql.database=azkaban
mysql.user=root
mysql.password=rootroot
mysql.numconnections=100

#Multiple Executor
azkaban.use.multiple.executors=false
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1

Add the azkaban.native.lib=false and execute.as.user=false properties:

mkdir -p plugins/jobtypes
vim plugins/jobtypes/commonprivate.properties

azkaban.native.lib=false
execute.as.user=false
memCheck.enabled=false

6. exec-server configuration

1) Configure conf/azkaban.properties

# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=../web/
default.timezone.id=Asia/Shanghai

# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=../conf/azkaban-users.xml

# Loader for projects
executor.global.properties=../conf/global.properties
azkaban.project.dir=projects

# Velocity dev mode
velocity.dev.mode=false

# Where the Azkaban web server is located
azkaban.webserver.url=http://192.168.10.41:8081

# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache

# JMX stats
jetty.connector.stats=true
executor.connector.stats=true

# Azkaban plugin settings
azkaban.jobtype.plugin.dir=../plugins/jobtypes


# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=192.168.10.44
mysql.database=azkaban
mysql.user=root
mysql.password=6666666
mysql.numconnections=100

# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30
executor.port=12321
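The web server and the executor server share one MySQL database, so their database settings need to agree. The two listings above show different mysql.password values for the same host and user, which is worth double-checking in a real deployment. The sketch below is a small, hypothetical helper that parses two azkaban.properties texts and reports which database keys differ; it handles only plain key=value lines and is not a full Java properties parser.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


SHARED_KEYS = (
    "database.type",
    "mysql.host",
    "mysql.port",
    "mysql.database",
    "mysql.user",
    "mysql.password",
)


def mismatched_keys(web_text, exec_text, keys=SHARED_KEYS):
    """Return the keys whose values differ between the two configurations."""
    web = parse_properties(web_text)
    exe = parse_properties(exec_text)
    return [key for key in keys if web.get(key) != exe.get(key)]


if __name__ == "__main__":
    web_conf = "mysql.host=192.168.10.44\nmysql.user=root\nmysql.password=rootroot\n"
    exec_conf = "mysql.host=192.168.10.44\nmysql.user=root\nmysql.password=6666666\n"
    print(mismatched_keys(web_conf, exec_conf))
```

Reading both conf/azkaban.properties files and passing their contents to mismatched_keys gives a quick list of settings to reconcile before starting the cluster.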

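2) commonprivate.properties settings

  • Configure the file that azkaban.native.lib points to
# This is the workaround for a single-machine Azkaban deployment (or when web-server and exec-server run on the same machine). It does not suit a cluster deployment, where azkaban.native.lib must be configured and cannot be set to false.
execute.as.user=false
azkaban.native.lib=false

# The key to configuring azkaban.native.lib is the execute-as-user file: the azkaban.native.lib path must lead to this file

# Recompile execute-as-user
# Location of execute-as-user.c: azkaban-3.51.0/az-exec-util/src/main/c/execute-as-user.c
# Copy this file into /usr/local/bigdata/azkaban3.51.0/exec-server/azkaban-exec-server-0.1.0-SNAPSHOT
# (any location works, but it must then be specified in commonprivate.properties)
# Run the following three commands in the current directory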
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# gcc execute-as-user.c -o execute-as-user
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# ll
total 36
drwxr-xr-x 5 root root  4096 Aug 17 13:32 bin
drwxr-xr-x 2 root root  4096 Aug 17 10:00 conf
-rwxr-xr-x 1 root root 10225 Aug 17 13:32 execute-as-user
-rw-r--r-- 1 root root  3976 Aug 17 13:32 execute-as-user.c
drwxr-xr-x 2 root root  4096 May 29  2019 lib
drwxr-xr-x 2 root root  4096 Aug 17 11:13 logs
drwxr-xr-x 3 root root  4096 May 29  2019 plugins
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# chown root execute-as-user
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# 
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# chmod 6050 execute-as-user
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# ll
total 36
drwxr-xr-x 5 root root  4096 Aug 17 13:32 bin
drwxr-xr-x 2 root root  4096 Aug 17 10:00 conf
---Sr-s--- 1 root root 10225 Aug 17 13:32 execute-as-user
-rw-r--r-- 1 root root  3976 Aug 17 13:32 execute-as-user.c
drwxr-xr-x 2 root root  4096 May 29  2019 lib
drwxr-xr-x 2 root root  4096 Aug 17 11:13 logs
drwxr-xr-x 3 root root  4096 May 29  2019 plugins

# Explanation of the commands:
# 1. Compile with gcc execute-as-user.c -o execute-as-user
# 2. Then set ownership and permissions with chown root execute-as-user and chmod 6050 execute-as-user
# After setting the permissions, ls -l shows the file with these attributes:
---Sr-s--- 1 root root 10225 Aug 17 13:32 execute-as-user
# If the file is copied to another directory it becomes a regular file again, so run chmod 6050 execute-as-user once more after copying
#https://www.jianshu.com/p/bff11c87565b?utm_campaign=maleskine
#https://blog.csdn.net/u011487470/article/details/115941582
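  • commonprivate.properties configuration

Create commonprivate.properties in the /usr/local/bigdata/azkaban3.51.0/exec-server/azkaban-exec-server-0.1.0-SNAPSHOT/plugins/jobtypes directory.

To turn off the execute-as-user feature, write execute.as.user=false in this file (the default is true).

If it is set to true, jobs cannot run successfully in this mode and fail with the following exception: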

java.lang.RuntimeException: Not permitted to proxy as 'azkaban' through Azkaban
	at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:240)
	at azkaban.execapp.JobRunner.runJob(JobRunner.java:784)
	at azkaban.execapp.JobRunner.doRun(JobRunner.java:600)
	at azkaban.execapp.JobRunner.run(JobRunner.java:561)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
17-08-2022 14:29:04 CST one ERROR - Not permitted to proxy as 'azkaban' through Azkaban cause: null
17-08-2022 14:29:04 CST one INFO - Finishing job one at 1660717744956 with status FAILED

If it is set to false, jobs run normally in this mode. Then add the following configuration:

# set execute-as-user
execute.as.user=false 
azkaban.native.lib=/usr/local/bigdata/azkaban3.51.0/exec-server/azkaban-exec-server-0.1.0-SNAPSHOT
azkaban.group.name=root
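Because azkaban.native.lib must point at the directory holding execute-as-user, and that binary loses its special permission bits when copied, a quick check of the file can save a failed job run. The sketch below verifies that the file exists, has mode 6050, and is owned by root. The function names are illustrative; the expected mode and owner are taken from the chown and chmod commands shown earlier.

```python
import os
import stat

EXPECTED_MODE = 0o6050  # setuid + setgid, read/execute for the group only


def mode_string(mode_bits):
    """Render permission bits the way `ls -l` does for a regular file."""
    return stat.filemode(stat.S_IFREG | mode_bits)


def check_execute_as_user(native_lib_dir):
    """Return a list of problems found with <native_lib_dir>/execute-as-user."""
    path = os.path.join(native_lib_dir, "execute-as-user")
    if not os.path.isfile(path):
        return ["missing: " + path]
    info = os.stat(path)
    actual = stat.S_IMODE(info.st_mode)
    problems = []
    if actual != EXPECTED_MODE:
        problems.append(
            "mode is %s, expected %s" % (mode_string(actual), mode_string(EXPECTED_MODE))
        )
    if info.st_uid != 0:
        problems.append("owner uid is %d, expected 0 (root)" % info.st_uid)
    return problems


if __name__ == "__main__":
    print(mode_string(EXPECTED_MODE))
```

mode_string(0o6050) yields ---Sr-s---, the same string that ls -l shows after chmod 6050. An empty list from check_execute_as_user means the binary looks as expected.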

7. Start the cluster

  • Start exec-server first
[root@localhost bin]# ll
total 12
drwxrwxr-x 2 root root 4096 Aug 16 15:24 internal
-rwxr-xr-x 1 root root  214 Aug  9  2018 shutdown-exec.sh
-rwxr-xr-x 1 root root  207 Aug  9  2018 start-exec.sh
[root@localhost bin]# ./start-exec.sh 
[root@localhost bin]# jps
14136 AzkabanExecutorServer
22120 Jps
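# Startup note: the executor has to be activated manually after every successful start. Alternatively, after startup, set the active column to 1 directly in the database.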
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# pwd
/usr/local/bigdata/azkaban3.51.0/exec-server/azkaban-exec-server-0.1.0-SNAPSHOT
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# curl -G "server2:$(<./bin/executor.port)/executor?action=activate" && echo
{"status":"success"}
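Since activation is needed after every start, it is convenient to script it. The sketch below mirrors the curl command above: it takes the port written to bin/executor.port, builds the /executor?action=activate URL, and treats a JSON status of success as confirmation. The helper names are hypothetical, and the response shape is taken from the output shown above.

```python
import json
import urllib.request


def parse_port(port_text):
    """Parse the contents of bin/executor.port into an integer port."""
    return int(port_text.strip())


def read_port(port_file):
    """Read the executor port from the file the executor writes at startup."""
    with open(port_file) as f:
        return parse_port(f.read())


def activation_url(host, port):
    """Build the activation URL used by the curl command above."""
    return "http://%s:%d/executor?action=activate" % (host, port)


def activate(host, port, timeout=10):
    """Call the activation endpoint and return True when it reports success."""
    with urllib.request.urlopen(activation_url(host, port), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload.get("status") == "success"
```

A call such as activate("server2", read_port("./bin/executor.port")) could be appended to a startup wrapper script so the step is not forgotten.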

After activation, the active value of the executor in the database changes from 0 to 1.

  • Then start web-server
[root@server bin]# ll
total 12
drwxrwxr-x 2 root root 4096 Aug  9  2018 internal
-rwxr-xr-x 1 root root  211 Aug  9  2018 shutdown-web.sh
-rwxr-xr-x 1 root root  123 Aug  9  2018 start-web.sh
[root@server bin]# ./start-web.sh 
[root@server bin]# jps
17779 Jps
14251 AzkabanWebServer

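8. Verify

Log in from a browser at http://192.168.10.41:8081/ ; the default username and password are both azkaban.

9. Test

Testing works the same way as for the solo-server deployment above and is not repeated here.

IV. Deploying and Verifying multiple-executor Mode

In multiple-executor mode, several Executor Servers are distributed across different servers. Copying the azkaban-exec-server package to the different machines is all it takes to form a distributed setup.

1. web-server deployment

Follow the web-server deployment in Part III of this article. Change the configuration file as follows and leave everything else unchanged.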

# Azkaban Executor settings
# The specific executor IP has been removed; the explanation from the source code is quoted below
# // The property is used for the web server to get the host name of the executor when running in SOLO mode.
# // public static final String EXECUTOR_HOST = "executor.host";
# // The property is used for the web server to get the port of the executor when running in SOLO mode.
# // public static final String EXECUTOR_PORT = "executor.port";
executor.port=12321

#Multiple Executor
azkaban.use.multiple.executors=true

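2. exec-server deployment

Follow the exec-server deployment in Part III of this article. This deployment builds on the two-server setup, so the configuration of server2 stays unchanged. The steps below deploy the exec-server role on server3.

1) Copy the deployment files from server2 to the same directory on server3

/usr/local/bigdata/azkaban3.51.0/exec-server/azkaban-exec-server-0.1.0-SNAPSHOT
# Create the directory on server3 if it does not exist
# Going through extract, configure, and start is still recommended; otherwise folder permission problems may appear

2) Modify the configuration

  • Run chmod 6050 execute-as-user in the /usr/local/bigdata/azkaban3.51.0/exec-server/azkaban-exec-server-0.1.0-SNAPSHOT directory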
[root@localhost ~]# cd /usr/local/bigdata/azkaban3.51.0/exec-server/azkaban-exec-server-0.1.0-SNAPSHOT
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# ll
total 40
drwxr-xr-x 3 root root  4096 Aug 17 15:08 bin
drwxr-xr-x 2 root root  4096 Aug 17 15:07 conf
-rw-r--r-- 1 root root     6 Aug 17 15:07 currentpid
-rw-r--r-- 1 root root 10225 Aug 17 15:07 execute-as-user
-rw-r--r-- 1 root root  3976 Aug 17 15:07 execute-as-user.c
drwxr-xr-x 2 root root  4096 Aug 17 15:07 lib
drwxr-xr-x 2 root root  4096 Aug 17 15:08 logs
drwxr-xr-x 3 root root  4096 Aug 17 15:07 plugins
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# chmod 6050 execute-as-user
[root@localhost azkaban-exec-server-0.1.0-SNAPSHOT]# ll
total 40
drwxr-xr-x 3 root root  4096 Aug 17 15:08 bin
drwxr-xr-x 2 root root  4096 Aug 17 15:07 conf
-rw-r--r-- 1 root root     6 Aug 17 15:07 currentpid
---Sr-s--- 1 root root 10225 Aug 17 15:07 execute-as-user
-rw-r--r-- 1 root root  3976 Aug 17 15:07 execute-as-user.c
drwxr-xr-x 2 root root  4096 Aug 17 15:07 lib
drwxr-xr-x 2 root root  4096 Aug 17 15:08 logs
drwxr-xr-x 3 root root  4096 Aug 17 15:07 plugins
  • Add the exec-server in MySQL
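How the row is added is not spelled out here. One reading is a manual INSERT into the executors table created by the initialization script, which has host, port, and active columns. The sketch below illustrates that idea against an in-memory SQLite stand-in with a simplified schema, not against MySQL itself. The function names are hypothetical, and the server3 host name used in the example is a placeholder because its address is not given in this article.

```python
import sqlite3

# Simplified stand-in for the executors table from the initialization script.
SCHEMA = """
CREATE TABLE executors (
  id     INTEGER PRIMARY KEY AUTOINCREMENT,
  host   VARCHAR(64) NOT NULL,
  port   INT         NOT NULL,
  active BOOLEAN     DEFAULT 0,
  UNIQUE (host, port)
);
"""


def add_executor(conn, host, port, active=True):
    """Insert one executor row; (host, port) must be unique."""
    conn.execute(
        "INSERT INTO executors (host, port, active) VALUES (?, ?, ?)",
        (host, port, int(active)),
    )
    conn.commit()


def active_executors(conn):
    """Return (host, port) pairs of executors marked active."""
    rows = conn.execute(
        "SELECT host, port FROM executors WHERE active = 1 ORDER BY id"
    )
    return [(host, port) for host, port in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_executor(conn, "192.168.10.42", 12321)
    add_executor(conn, "server3.example", 12321)  # placeholder host for server3
    print(active_executors(conn))
```

Against the real database, the equivalent statement would be an INSERT into executors with the host and port of the new exec-server, followed by checking that active is 1 for every executor expected to receive flows.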

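3. Start the cluster

The procedure is the same as for the two-server deployment above; refer to that section.

4. Verification and testing

Verification and testing work the same way as described above; refer to those sections.

This completes the introduction to Azkaban and the deployment and verification of its three modes.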