前些天,给同事 review 一个 MR。MR 本身没什么问题,merge 完之后突发奇想跑了一下 golangci-lint 看看有没有啥问题。看到一个 issue 如下所示:

main.go:102:16: SA1017: the channel used with signal.Notify should be buffered (staticcheck)
	signal.Notify(ch, syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGINT)

很好奇,以前从来没见过这个 issue。于是查看了一下源码发现了问题。

虽然以前看网上的代码 signal.Notify 也注意到别人都有分配了带 buffer 的 channel,但是也没有细想。查看 signal.Notify 的源码,在 signal.go 中:

// Notify causes package signal to relay incoming signals to c.
// If no signals are provided, all incoming signals will be relayed to c.
// Otherwise, just the provided signals will.
//
// Package signal will not block sending to c: the caller must ensure
// that c has sufficient buffer space to keep up with the expected
// signal rate. For a channel used for notification of just one signal value,
// a buffer of size 1 is sufficient.
//
// It is allowed to call Notify multiple times with the same channel:
// each call expands the set of signals sent to that channel.
// The only way to remove signals from the set is to call Stop.
//
// It is allowed to call Notify multiple times with different channels
// and the same signals: each channel receives copies of incoming
// signals independently.
func Notify(c chan<- os.Signal, sig ...os.Signal) {
	if c == nil {
		panic("os/signal: Notify using nil channel")
	}

	handlers.Lock()
	defer handlers.Unlock()

	h := handlers.m[c]
	if h == nil {
		if handlers.m == nil {
			handlers.m = make(map[chan<- os.Signal]*handler)
		}
		h = new(handler)
		handlers.m[c] = h
	}

	add := func(n int) {
		if n < 0 {
			return
		}
		if !h.want(n) {
			h.set(n)
			if handlers.ref[n] == 0 {
				enableSignal(n)

				// The runtime requires that we enable a
				// signal before starting the watcher.
				watchSignalLoopOnce.Do(func() {
					if watchSignalLoop != nil {
						go watchSignalLoop()
					}
				})
			}
			handlers.ref[n]++
		}
	}

	if len(sig) == 0 {
		for n := 0; n < numSig; n++ {
			add(n)
		}
	} else {
		for _, s := range sig {
			add(signum(s))
		}
	}
}

注释中明确说明了需要传递带 buffer 的 channel。关注其中的 go watchSignalLoop(),在 signal_unix.go 中:

func loop() {
	for {
		process(syscall.Signal(signal_recv()))
	}
}

func init() {
	watchSignalLoop = loop
}

process(sig os.Signal) 函数定义又在 signal.go 中:

func process(sig os.Signal) {
	n := signum(sig)
	if n < 0 {
		return
	}

	handlers.Lock()
	defer handlers.Unlock()

	for c, h := range handlers.m {
		if h.want(n) {
			// send but do not block for it
			select {
			case c <- sig:
			default:
			}
		}
	}

	// Avoid the race mentioned in Stop.
	for _, d := range handlers.stopping {
		if d.h.want(n) {
			select {
			case d.c <- sig:
			default:
			}
		}
	}
}

注意中段的 select 代码块和注释,发现 sig 并不会阻塞发送给 c,如果 c 当前没有被 recv,则 sig 会被丢弃。这就造成了 sig 可能丢失的情况产生,也就是 golangci-lint 中提示的问题。


看 os.signal 的代码还是设计的相当精巧和高效的。

var handlers struct {
	sync.Mutex
	// Map a channel to the signals that should be sent to it.
	m map[chan<- os.Signal]*handler
	// Map a signal to the number of channels receiving it.
	ref [numSig]int64
	// Map channels to signals while the channel is being stopped.
	// Not a map because entries live here only very briefly.
	// We need a separate container because we need m to correspond to ref
	// at all times, and we also need to keep track of the *handler
	// value for a channel being stopped. See the Stop function.
	stopping []stopping
}

用一个 handlers 来存储关系。m 映射接收 channel 到相关 signal 的关系,ref 映射每一类 signal 有几个 channel 需要接收。其中 handler 结构体定义:

type handler struct {
	mask [(numSig + 31) / 32]uint32
}

func (h *handler) want(sig int) bool {
	return (h.mask[sig/32]>>uint(sig&31))&1 != 0
}

func (h *handler) set(sig int) {
	h.mask[sig/32] |= 1 << uint(sig&31)
}

func (h *handler) clear(sig int) {
	h.mask[sig/32] &^= 1 << uint(sig&31)
}

用三个长度的 uint32 来存储所有的 signal。每个 signal 占 1 个 bit 位。

还是不得不感叹大师级别的程序员写的东西,连一个字节都不舍得浪费。