Recommended Cluster Hosts and Role Distribution

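Important: This topic describes suggested role allocations for a CDH cluster managed by Cloudera Manager. The actual allocations you choose for your deployment can vary depending on the types and volume of workloads, the services deployed in the cluster, hardware resources, configuration, and other factors.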

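When you install CDH using the Cloudera Manager installation wizard, Cloudera Manager attempts to spread the roles among cluster hosts (except for roles assigned to gateway hosts) based on the resources available in the hosts. You can change these assignments on the Customize Role Assignments page that appears in the wizard. You can also change and add roles at a later time using Cloudera Manager. See Role Instances.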

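If your cluster uses data-at-rest encryption, see Allocating Hosts for Key Trustee Server and Key Trustee KMS.

For information about where to locate the various databases that are required for Cloudera Manager and other services, see Step 4: Install and Configure Databases.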

Continue reading:

  • CDH Cluster Hosts and Role Assignments
  • Allocating Hosts for Key Trustee Server and Key Trustee KMS

 

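CDH Cluster Hosts and Role Assignments

Cluster hosts can be broadly described as the following types:

  • Master hosts run Hadoop master processes such as the HDFS NameNode and YARN ResourceManager.
  • Utility hosts run other cluster processes that are not master processes, such as Cloudera Manager and the Hive Metastore.
  • Gateway hosts are client access points for launching jobs in the cluster. The number of gateway hosts required varies depending on the type and size of the workloads.
  • Worker hosts primarily run DataNodes and other distributed processes such as Impalad.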
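
The four host types can be captured in a small lookup structure, which is sometimes handy when scripting a deployment plan. The following is a minimal sketch rather than anything Cloudera Manager provides: the names `HostType`, `EXAMPLE_ROLES`, and `host_type_for_role` are hypothetical, and the mapping holds only the example roles named in the list above, not a complete role catalog.

```python
from enum import Enum


class HostType(Enum):
    MASTER = "master"
    UTILITY = "utility"
    GATEWAY = "gateway"
    WORKER = "worker"


# Only the example roles mentioned in the list above (illustrative, not exhaustive).
# Gateway hosts are client access points, so no server role is listed for them.
EXAMPLE_ROLES = {
    HostType.MASTER: {"NameNode", "YARN ResourceManager"},
    HostType.UTILITY: {"Cloudera Manager", "Hive Metastore"},
    HostType.GATEWAY: set(),
    HostType.WORKER: {"DataNode", "Impalad"},
}


def host_type_for_role(role):
    """Return the host type whose example roles include `role`, or None if unknown."""
    for host_type, roles in EXAMPLE_ROLES.items():
        if role in roles:
            return host_type
    return None
```

A role that is not in this small mapping returns `None`, which signals that the placement has to be looked up in the tables for the relevant cluster size.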

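Important: Cloudera recommends that you always enable high availability when CDH is used in a production environment.

The following tables describe the recommended role allocations for different cluster sizes:

  • 3 - 10 Worker Hosts without High Availability
  • 3 - 20 Worker Hosts with High Availability
  • 20 - 80 Worker Hosts with High Availability
  • 80 - 200 Worker Hosts with High Availability
  • 200 - 500 Worker Hosts with High Availability
  • 500 - 1000 Worker Hosts with High Availability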
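
Choosing which table applies is a simple range lookup on the worker count and the high availability setting. The sketch below is illustrative only: `TIERS` and `pick_tier` are hypothetical names, and because adjacent ranges share a boundary value (20, 80, 200, 500), it assumes that a boundary count resolves to the smaller tier.

```python
# (min_workers, max_workers, high_availability), in the order the tables appear.
TIERS = [
    (3, 10, False),
    (3, 20, True),
    (20, 80, True),
    (80, 200, True),
    (200, 500, True),
    (500, 1000, True),
]


def pick_tier(workers, high_availability=True):
    """Return the first matching (min, max, ha) tier, or None if no table applies.

    Assumption: a worker count that sits on a shared boundary (for example 20)
    is matched to the smaller tier, because tiers are scanned in ascending order.
    """
    for low, high, ha in TIERS:
        if ha == high_availability and low <= workers <= high:
            return (low, high, ha)
    return None
```

For example, `pick_tier(150)` selects the 80 - 200 table, while `pick_tier(15, high_availability=False)` returns `None` because the only table without high availability stops at 10 worker hosts.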

3 - 10 Worker Hosts without High Availability

Master Hosts

Utility Hosts

Gateway Hosts

Worker Hosts

Master Host 1:

  • NameNode
  • YARN ResourceManager
  • JobHistory Server
  • ZooKeeper
  • Kudu master
  • Spark History Server

One host for all Utility and Gateway roles:

  • Secondary NameNode
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • HiveServer2
  • Impala Catalog Server
  • Impala StateStore
  • Hue
  • Oozie
  • Flume
  • Gateway configuration

3 - 10 Worker Hosts:

  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server

3 - 20 Worker Hosts with High Availability

Master Hosts

Utility Hosts

Gateway Hosts

Worker Hosts

Master Host 1:

  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • JobHistory Server
  • Spark History Server
  • Kudu master


Master Host 2:

  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master


Master Host 3:

  • Kudu master (Kudu requires an odd number of masters for HA.)

Utility Host 1:

  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
  • ZooKeeper (requires dedicated disk)
  • JournalNode (requires dedicated disk)

One or more Gateway Hosts:

  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration

3 - 20 Worker Hosts:

  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
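
To see how the master and utility roles in this table add up, the layout can be written down as plain data and counted. This is a minimal sketch with hypothetical names (`LAYOUT_3_20_HA`, `role_counts`, `kudu_master_count_ok`). It encodes only the master and utility hosts listed above, and it assumes that a layout with no Kudu masters at all is acceptable.

```python
from collections import Counter

# Master and utility hosts from the 3 - 20 worker table above.
LAYOUT_3_20_HA = {
    "Master Host 1": [
        "NameNode", "JournalNode", "FailoverController", "YARN ResourceManager",
        "ZooKeeper", "JobHistory Server", "Spark History Server", "Kudu master",
    ],
    "Master Host 2": [
        "NameNode", "JournalNode", "FailoverController", "YARN ResourceManager",
        "ZooKeeper", "Kudu master",
    ],
    "Master Host 3": ["Kudu master"],
    "Utility Host 1": [
        "Cloudera Manager", "Cloudera Manager Management Service", "Hive Metastore",
        "Impala Catalog Server", "Impala StateStore", "Oozie", "ZooKeeper", "JournalNode",
    ],
}


def role_counts(layout):
    """Count how many hosts in the layout run each role."""
    return Counter(role for roles in layout.values() for role in roles)


def kudu_master_count_ok(layout):
    """Check the table's note that Kudu needs an odd number of masters for HA.

    Assumption: zero Kudu masters (Kudu not deployed) also passes.
    """
    masters = role_counts(layout)["Kudu master"]
    return masters == 0 or masters % 2 == 1
```

Counting this layout gives three ZooKeeper instances, three JournalNodes, and three Kudu masters, with the third ZooKeeper and JournalNode placed on the utility host.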

20 - 80 Worker Hosts with High Availability

Master Hosts

Utility Hosts

Gateway Hosts

Worker Hosts

Master Host 1:

  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master


Master Host 2:

  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master


Master Host 3:

  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Spark History Server
  • Kudu master


Utility Host 1:

  • Cloudera Manager


Utility Host 2:

  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Oozie


One or more Gateway Hosts:

  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration

20 - 80 Worker Hosts:

  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server

80 - 200 Worker Hosts with High Availability

Master Hosts

Utility Hosts

Gateway Hosts

Worker Hosts

Master Host 1:

  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master


Master Host 2:

  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master


Master Host 3:

  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Spark History Server
  • Kudu master


Utility Host 1:

  • Cloudera Manager


Utility Host 2:

  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie


Utility Host 3:

  • Activity Monitor


Utility Host 4:

  • Host Monitor


Utility Host 5:

  • Navigator Audit Server


Utility Host 6:

  • Navigator Metadata Server


Utility Host 7:

  • Reports Manager


Utility Host 8:

  • Service Monitor


One or more Gateway Hosts:

  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration

80 - 200 Worker Hosts:

  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)

200 - 500 Worker Hosts with High Availability

Master Hosts

Utility Hosts

Gateway Hosts

Worker Hosts

Master Host 1:

  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master


Master Host 2:

  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master


Master Host 3:

  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master


Master Host 4:

  • YARN ResourceManager
  • ZooKeeper
  • JournalNode


Master Host 5:

  • JobHistory Server
  • Spark History Server
  • ZooKeeper
  • JournalNode


We recommend no more than three Kudu masters.

Utility Host 1:

  • Cloudera Manager


Utility Host 2:

  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie


Utility Host 3:

  • Activity Monitor


Utility Host 4:

  • Host Monitor


Utility Host 5:

  • Navigator Audit Server


Utility Host 6:

  • Navigator Metadata Server


Utility Host 7:

  • Reports Manager


Utility Host 8:

  • Service Monitor


One or more Gateway Hosts:

  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration

200 - 500 Worker Hosts:

  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)
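
The two Kudu limits stated in this table (no more than three masters, and a recommended maximum of 100 tablet servers) can be checked before a plan is finalized. The following is a sketch only; `kudu_sizing_warnings` and the two constants are hypothetical names, and the thresholds are taken directly from the notes above.

```python
MAX_KUDU_MASTERS = 3      # "We recommend no more than three Kudu masters."
MAX_TABLET_SERVERS = 100  # "Recommended maximum number of tablet servers is 100."


def kudu_sizing_warnings(masters, tablet_servers):
    """Return a list of human-readable warnings; an empty list means no limit is exceeded."""
    warnings = []
    if masters > MAX_KUDU_MASTERS:
        warnings.append(
            f"{masters} Kudu masters planned; recommended maximum is {MAX_KUDU_MASTERS}"
        )
    if tablet_servers > MAX_TABLET_SERVERS:
        warnings.append(
            f"{tablet_servers} tablet servers planned; recommended maximum is {MAX_TABLET_SERVERS}"
        )
    return warnings
```

In a cluster of this size, the tablet server limit means that only a subset of the worker hosts would run a Kudu tablet server.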

500 - 1000 Worker Hosts with High Availability

Master Hosts

Utility Hosts

Gateway Hosts

Worker Hosts

Master Host 1:

  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master


Master Host 2:

  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master


Master Host 3:

  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master


Master Host 4:

  • YARN ResourceManager
  • ZooKeeper
  • JournalNode


Master Host 5:

  • JobHistory Server
  • Spark History Server
  • ZooKeeper
  • JournalNode


We recommend no more than three Kudu masters.

Utility Host 1:

  • Cloudera Manager


Utility Host 2:

  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie


Utility Host 3:

  • Activity Monitor


Utility Host 4:

  • Host Monitor


Utility Host 5:

  • Navigator Audit Server


Utility Host 6:

  • Navigator Metadata Server


Utility Host 7:

  • Reports Manager


Utility Host 8:

  • Service Monitor


One or more Gateway Hosts:

  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration

500 - 1000 Worker Hosts:

  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)

Allocating Hosts for Key Trustee Server and Key Trustee KMS

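If you are enabling data-at-rest encryption for a CDH cluster, Cloudera recommends that you isolate the Key Trustee Server from other Enterprise Data Hub (EDH) services by deploying the Key Trustee Server on dedicated hosts in a separate cluster managed by Cloudera Manager. Cloudera also recommends deploying Key Trustee KMS on dedicated hosts in the same cluster as the EDH services that require access to Key Trustee Server. This architecture lets multiple clusters share the same Key Trustee Server and avoids restarting the Key Trustee Server when a cluster is restarted.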
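
The placement rules in the paragraph above can be expressed as a small check over a planned topology. This is a hedged sketch, not a Cloudera tool: `placement_issues` is a hypothetical helper, and the input format (cluster name mapped to host name mapped to a list of role names) is an assumption made for illustration.

```python
KTS = "Key Trustee Server"
KMS = "Key Trustee KMS"


def placement_issues(clusters):
    """Return a list of rule violations for a planned topology.

    `clusters` maps cluster name -> {host name -> list of role names}
    (a hypothetical format used only for this sketch).
    """
    issues = []
    for cluster, hosts in clusters.items():
        all_roles = {role for roles in hosts.values() for role in roles}
        # Key Trustee Server belongs in a cluster of its own.
        if KTS in all_roles and all_roles != {KTS}:
            issues.append(f"{cluster}: {KTS} shares a cluster with other services")
        # Both Key Trustee roles belong on dedicated hosts.
        for host, roles in hosts.items():
            for special in (KTS, KMS):
                if special in roles and len(roles) > 1:
                    issues.append(f"{cluster}/{host}: {special} host is not dedicated")
    return issues
```

A plan with a separate Key Trustee Server cluster and a dedicated Key Trustee KMS host inside the EDH cluster produces an empty list, while placing Key Trustee Server on a NameNode host is reported twice: once for the shared cluster and once for the shared host.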

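For more information about encrypting data at rest in an EDH, see Encrypting Data at Rest.

For production environments in general, or if you have enabled high availability for HDFS and are using data-at-rest encryption, Cloudera recommends that you enable high availability for Key Trustee Server and Key Trustee KMS.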

See:

  • Cloudera Navigator Key Trustee Server High Availability
  • Enabling Key Trustee KMS High Availability