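Environment:

OS: 统信有岳 1060A

Cluster: 统信有雀, bare-metal deployment

Origin of the problem

DNS configuration on the base node (bastion) of the 有雀 cluster: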

[root@bastion ~]# cat /etc/coredns/Corefile 
.:53 {
    template IN A apps.utccp.example.com {
    match .*apps\.utccp\.example\.com
    answer "{{ .Name }} 60 IN A 10.12.24.125"
    fallthrough
    }
    hosts {
        10.12.24.125 api.utccp.example.com
        10.12.24.125 api-int.utccp.example.com
        10.12.24.125 bastion.utccp.example.com
        10.12.24.127 master1.utccp.example.com
        10.12.24.128 master2.utccp.example.com
        10.12.24.129 master3.utccp.example.com
        fallthrough
    }
    prometheus
    cache 160
    forward . 114.114.114.114
    log
}
Problems found:

Environment: default 有雀 configuration, data collected over 48 hours of running. Total requests: 660,100

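| Type | Count | Issue | Ratio |
|---|---|---|---|
| IPv4 requests | 398826 | Normal requests | 0.60419 |
| IPv6 requests | 206733 | The cluster defaults to IPv4 and has no IPv6 network | 0.31318 |
| Duplicated base domain requests | 54541 | A duplicated cluster base domain results in NXDOMAIN; the name cannot be resolved | 0.08262 |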

The data shows that abnormal requests make up about 40% of the total (0.31318 + 0.08262 ≈ 0.396).
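A breakdown like the one above can be reproduced from the CoreDNS query log. The function below is only a sketch: classify_queries is a name I made up, it assumes the lines come from the log plugin in its default format (the quoted part begins with the query type, class and name), and it assumes a name counts as "duplicated base domain" when it ends with the base domain twice, regardless of query type.

```shell
# classify_queries BASE_DOMAIN < coredns.log
# Assumption: each line carries the default CoreDNS `log` format, whose quoted
# part starts with "<TYPE> <CLASS> <NAME> ...".
classify_queries() {
    awk -v base="$1" '
        function ends_with(s, suffix) {
            return length(s) >= length(suffix) &&
                   substr(s, length(s) - length(suffix) + 1) == suffix
        }
        {
            # Splitting on double quotes puts the query description in q[2].
            if (split($0, q, "\"") < 3) next
            split(q[2], f, " ")
            type = f[1]; name = f[3]
            sub(/\.$/, "", name)            # drop the trailing root dot
            total++
            if (ends_with(name, "." base "." base)) dup++
            else if (type == "AAAA")                v6++
            else if (type == "A")                   v4++
            else                                    other++
        }
        END {
            if (total == 0) exit
            printf "ipv4\t%d\t%.5f\n",     v4,    v4 / total
            printf "ipv6\t%d\t%.5f\n",     v6,    v6 / total
            printf "dup-base\t%d\t%.5f\n", dup,   dup / total
            printf "other\t%d\t%.5f\n",    other, other / total
        }
    '
}
```

Usage would look like `classify_queries utccp.example.com < coredns.log`, where coredns.log is a placeholder for wherever the log output ends up on the base node.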

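Problem analysis

  1. First, track down where the IPv6 (AAAA) requests come from.
    In an offline cluster in its default state, DNS requests come almost entirely from the containers of the various components; DNS requests from system services are negligible.
  2. Find out why the base domain gets duplicated.
    By default every node's hostname includes the base domain. The hostname is set with hostnamectl set-hostname, which may be where the problem comes from.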
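To narrow down who is sending the AAAA requests, the same CoreDNS log can be grouped by client address. This is a rough sketch under two assumptions: top_aaaa_clients is a hypothetical helper name, and the client ip:port is the second whitespace-separated field of each log line (true for the default log format without an extra timestamp prefix).

```shell
# top_aaaa_clients [N] < coredns.log
# Prints "<count> <client-ip>" for the N clients sending the most AAAA queries.
top_aaaa_clients() {
    awk '
        {
            if (split($0, q, "\"") < 3) next
            split(q[2], f, " ")
            if (f[1] != "AAAA") next
            # q[1] looks like "[INFO] 10.12.24.127:40002 - 2 "; field 2 is ip:port.
            split(q[1], h, " ")
            ip = h[2]
            sub(/:[0-9]+$/, "", ip)         # strip the source port
            count[ip]++
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -k1,1nr -k2,2 | head -n "${1:-10}"
}
```

The addresses it prints are node or pod egress addresses as seen by the base node, so mapping them back to a specific container still has to be done on the node itself.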

Disabling IPv6 DNS queries (client side)

  1. Disable IPv6 in NetworkManager.
# nmcli connection modify enp1s0 ipv6.method disabled
# systemctl restart NetworkManager

IPv6 DNS queries still reach the base node.

  2. Disable IPv6 through kernel parameters; run on every node:
# sysctl -w net.ipv6.conf.all.disable_ipv6=1
# sysctl -w net.ipv6.conf.all.disable_policy=1

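After these two steps, IPv6 DNS queries still reach the base node. (Note that disable_policy only turns off IPsec policy handling on the interface; the sysctl usually paired with all.disable_ipv6 is net.ipv6.conf.default.disable_ipv6.)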

  3. Disable IPv6 in avahi-daemon
# vim /etc/avahi/avahi-daemon.conf
Set:
use-ipv6=no

# systemctl restart avahi-daemon

Still no effect.

  4. Set the following option in /etc/resolv.conf:
options single-request-reopen

# systemctl restart NetworkManager

Still no effect.
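One thing worth checking with this step: NetworkManager usually regenerates /etc/resolv.conf when it restarts, so a hand-edited options line may simply be gone afterwards. The helper below (has_resolv_option is a name I picked, not an existing tool) is a small sketch that checks whether an option is actually present in the file.

```shell
# has_resolv_option OPTION [FILE]
# Succeeds when OPTION appears on an "options" line of FILE
# (FILE defaults to /etc/resolv.conf).
has_resolv_option() {
    awk -v opt="$1" '
        $1 == "options" {
            for (i = 2; i <= NF; i++) if ($i == opt) found = 1
        }
        END { exit !found }
    ' "${2:-/etc/resolv.conf}"
}
```

For example, `has_resolv_option single-request-reopen && echo present` after the restart. If the option keeps disappearing, setting it on the connection with `nmcli connection modify enp1s0 ipv4.dns-options single-request-reopen` should make NetworkManager write it out itself. Either way, as far as I can tell from the resolv.conf man page, single-request-reopen only changes how the socket is reused for the A and AAAA queries and does not suppress the AAAA query.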

  5. Modify the OVS configuration
# vim /etc/openvswitch/ovs-vswitchd.conf.db
other_config:ipv6_prefix=[]
# systemctl restart openvswitch.service

Still no effect.

  6. Comment out the IPv6 localhost entry in /etc/hosts
[root@worker1 ~]# cat /etc/hosts 
 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
 #::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

Still no effect.

  7. Make IPv4 take precedence in /etc/gai.conf
precedence ::ffff:0:0/96  100

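Still no effect.

  8. Remove the kernel module
modprobe -r ipv6
 # Fails: a built-in module cannot be unloaded
  9. After searching the documentation, I found this description in RFC 4472, Section 5.1: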
5.1.  DNS Lookups May Query IPv6 Records Prematurely

The system library that implements the getaddrinfo() function for
looking up names is a critical piece when considering the robustness
of enabling IPv6; it may come in basically three flavors:

1.  The system library does not know whether IPv6 has been enabled in
    the kernel of the operating system: it may start looking up AAAA
    records with getaddrinfo() and AF_UNSPEC hint when the system is
    upgraded to a system library version that supports IPv6.

2.  The system library might start to perform IPv6 queries with
    getaddrinfo() only when IPv6 has been enabled in the kernel.
    However, this does not guarantee that there exists any useful
    IPv6 connectivity (e.g., the node could be isolated from the
    other IPv6 networks, only having link-local addresses).

3.  The system library might implement a toggle that would apply some
    heuristics to the "IPv6-readiness" of the node before starting to
    perform queries; for example, it could check whether only link-
    local IPv6 address(es) exists, or if at least one global IPv6
    address exists.

First, let us consider generic implications of unnecessary queries
for AAAA records: when looking up all the records in the DNS, AAAA
records are typically tried first, and then A records.  These are
done in serial, and the A query is not performed until a response is
received to the AAAA query.  Considering the misbehavior of DNS
servers and load-balancers, as described in Section 3.1, the lookup
delay for AAAA may incur additional unnecessary latency, and
introduce a component of unreliability.
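The difference the RFC describes can be observed from a shell with getent, which goes through getaddrinfo(): as far as I know, `getent ahosts` passes an AF_UNSPEC hint (so AAAA may be queried), while `getent ahostsv4` passes AF_INET (A only). The two wrapper names below are my own; this is a sketch for comparison, not a fix.

```shell
# lookup_unspec NAME: resolve via getaddrinfo() with an AF_UNSPEC hint.
# On a glibc system this is the path that may emit A and AAAA queries.
lookup_unspec() {
    getent ahosts "$1" | awk '{ print $1 }' | sort -u
}

# lookup_v4_only NAME: resolve via getaddrinfo() with an AF_INET hint,
# so only IPv4 addresses are requested.
lookup_v4_only() {
    getent ahostsv4 "$1" | awk '{ print $1 }' | sort -u
}
```

Running each of them against a cluster name such as api.utccp.example.com while watching the CoreDNS log shows which call produces the AAAA lines. Applications that call getaddrinfo() with AF_UNSPEC behave like the first one, no matter what the kernel settings say.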
  10. Add a GRUB kernel parameter to disable IPv6
ipv6.disable=1
[root@worker1 ~]# ss -tunlp
 Netid             State              Recv-Q             Send-Q                         Local Address:Port                          Peer Address:Port             Process                                             
 udp               UNCONN             0                  0                                    0.0.0.0:111                                0.0.0.0:*                 users:(("rpcbind",pid=793,fd=6))                   
 udp               UNCONN             0                  0                                    0.0.0.0:33062                              0.0.0.0:*                 users:(("avahi-daemon",pid=799,fd=16))             
 udp               UNCONN             0                  0                                  127.0.0.1:323                                0.0.0.0:*                 users:(("chronyd",pid=817,fd=6))                   
 udp               UNCONN             0                  0                                    0.0.0.0:4789                               0.0.0.0:*                                                                    
 udp               UNCONN             0                  0                                    0.0.0.0:5353                               0.0.0.0:*                 users:(("avahi-daemon",pid=799,fd=15))             
 udp               UNCONN             0                  0                                    0.0.0.0:55162                              0.0.0.0:*                 users:(("rpcbind",pid=793,fd=7))                   
 tcp               LISTEN             0                  128                                  0.0.0.0:111                                0.0.0.0:*                 users:(("rpcbind",pid=793,fd=8))                   
 tcp               LISTEN             0                  128                                  0.0.0.0:22                                 0.0.0.0:*                 users:(("sshd",pid=1208,fd=3))                     
 tcp               LISTEN             0                  5                                  127.0.0.1:631                                0.0.0.0:*                 users:(("cupsd",pid=1327,fd=10))

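Even though every IPv6 address and listener is now gone, the IPv6 DNS queries still cannot be stopped.

  11. A Google search showed that glibc 2.36 addresses this by adding the no-aaaa resolver option, set as options no-aaaa in /etc/resolv.conf. From the glibc 2.36 release notes: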
* The “no-aaaa” DNS stub resolver option has been added.  System
   administrators can use it to suppress AAAA queries made by the stub
   resolver, including AAAA lookups triggered by NSS-based interfaces
   such as getaddrinfo.  Only DNS lookups are affected: IPv6 data in
   /etc/hosts is still used, getaddrinfo with AI_PASSIVE will still
   produce IPv6 addresses, and configured IPv6 name servers are still
   used.  To produce correct Name Error (NXDOMAIN) results, AAAA queries
   are translated to A queries.  The new resolver option is intended
   primarily for diagnostic purposes, to rule out that AAAA DNS queries
   have adverse impact.  It is incompatible with EDNS0 usage and DNSSEC
   validation by applications.

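The glibc on the system I am using is not new enough, and I cannot casually upgrade it. So, to work around the timeout caused by A and AAAA being queried at the same time (according to the /etc/resolv.conf documentation, when A and AAAA records are requested together and the A reply arrives but the AAAA reply does not, the resolver waits out a 5-second timeout, which is scary), I ended up with what follows.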
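A quick way to see whether a node's glibc is at least 2.36 is to compare the version reported by getconf. The sketch below uses two helper names of my own (version_ge, glibc_supports_no_aaaa) and relies on GNU sort -V. It is only a first hint, since distributions sometimes backport resolver options to older glibc versions.

```shell
# version_ge A B: succeed when version A >= version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# glibc_supports_no_aaaa [VERSION]
# Without an argument, the running glibc version is read from getconf,
# which prints something like "glibc 2.28".
glibc_supports_no_aaaa() {
    local ver="${1:-$(getconf GNU_LIBC_VERSION | awk '{ print $2 }')}"
    version_ge "$ver" 2.36
}
```

For example: `glibc_supports_no_aaaa && echo "options no-aaaa should be available"`.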

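Server side (workaround)

Since I cannot fix the glibc side, I go after CoreDNS on the base node instead.

Explanation: by default a DNS lookup sends the A and AAAA queries at the same time. If the server answers the AAAA query immediately, possibly even faster than the A query, the client never runs into the 5-second timeout.

Two approaches come to mind so far:

  1. Use rewrite to turn AAAA queries into A queries, so no AAAA lookup is ever processed (safe, recommended)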
.:53 {
     rewrite stop type AAAA A

     template IN A apps.utccp.example.com {
     match .*apps\.utccp\.example\.com
     answer "{{ .Name }} 60 IN A 10.12.24.125"
     fallthrough
     }
     hosts {
         10.12.24.125 api.utccp.example.com
         10.12.24.125 api-int.utccp.example.com
         10.12.24.125 bastion.utccp.example.com
         10.12.24.127 master1.utccp.example.com
         10.12.24.128 master2.utccp.example.com
         10.12.24.129 master3.utccp.example.com
         fallthrough
     }
     prometheus
     cache 160
     forward . 114.114.114.114
     log
 }
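  2. Answer AAAA queries with NXDOMAIN (not really recommended: every AAAA lookup then gets an error response, and NXDOMAIN claims that the name itself does not exist)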
.:53 {
     template IN AAAA {
       rcode NXDOMAIN
     }

     template IN A apps.utccp.example.com {
     match .*apps\.utccp\.example\.com
     answer "{{ .Name }} 60 IN A 10.12.24.125"
     fallthrough
     }
     hosts {
         10.12.24.125 api.utccp.example.com
         10.12.24.125 api-int.utccp.example.com
         10.12.24.125 bastion.utccp.example.com
         10.12.24.127 master1.utccp.example.com
         10.12.24.128 master2.utccp.example.com
         10.12.24.129 master3.utccp.example.com
         fallthrough
     }
     prometheus
     cache 160
     forward . 114.114.114.114
     log
 }
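To check what the base node now returns for AAAA queries, the response code can be read from the header line of dig's output. dig_status below is a hypothetical helper that only parses that header; it is a sketch and assumes dig's usual ";; ->>HEADER<<- ... status: ..., id: ..." line.

```shell
# dig_status: read `dig` output on stdin and print the response code
# (NOERROR, NXDOMAIN, ...) taken from the header line.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n1
}
```

Example use against the base node from the Corefile above: `dig AAAA api.utccp.example.com @10.12.24.125 | dig_status`. With the second variant this should print NXDOMAIN, since the template answers every AAAA query with that rcode. Comparing `time getent hosts api.utccp.example.com` before and after the change gives a rough idea of whether the lookup delay is gone.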

For lack of time, the duplicated base domain problem is left for the next installment.

References:

  1. resolv.conf manual: https://man7.org/linux/man-pages/man5/resolv.conf.5.html
  2. DNS standard (RFC 4472): https://www.rfc-editor.org/rfc/rfc4472.html
  3. OpenShift documentation: https://docs.openshift.com/container-platform/4.13/rest_api/operator_apis/dns-operator-openshift-io-v1.html#spec-upstreamresolvers
  4. Bug filed by a kind soul: https://bugzilla.redhat.com/show_bug.cgi?id=1027452
  5. glibc release announcement: https://lists.gnu.org/archive/html/info-gnu/2022-08/msg00000.html
  6. glibc source mirror cloned by a kind soul: https://github.com/bminor/glibc/tree/ibm/2.28/master
  7. glibc patch submitted by a kind soul: https://sourceware.org/pipermail/libc-alpha/2022-June/139341.html
  8. 有道翻译 (Youdao Translate): https://fanyi.youdao.com