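Tuning a Go service mostly comes down to three areas: CPU, heap, and GC.
CPU profiling samples the running program and reports how much time each function takes; the unit is ms.
Heap profiling samples the program and reports how much memory each function allocates; the unit is MB.
GC analysis records the state at each collection: how many GCs have run, the heap size at the start and end of each GC, the live heap, and so on.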

The sample program:

package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

type JR struct {
	Name string          `json:"name"`
	Data json.RawMessage `json:"data"`
}

func main() {
	var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")
	var memprofile = flag.String("memprofile", "", "write memory profile to `file`")
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal("could not create CPU profile: ", err)
		}
		if err := pprof.StartCPUProfile(f); err != nil {
			log.Fatal("could not start CPU profile: ", err)
		}
		defer pprof.StopCPUProfile()
	}

	// workload: build a map with 100000 entries
	var cm = make(map[string]JR)
	var tmp string
	for i := 0; i < 100000; i++ {
		tmp = fmt.Sprintf("user_%d", i)
		cm[tmp] = JR{Name: tmp, Data: []byte(`{"kk":9}`)}
	}
	fmt.Println(len(cm))

	if *memprofile != "" {
		f, err := os.Create(*memprofile)
		if err != nil {
			log.Fatal("could not create memory profile: ", err)
		}
		runtime.GC() // get up-to-date statistics
		if err := pprof.WriteHeapProfile(f); err != nil {
			log.Fatal("could not write memory profile: ", err)
		}
		f.Close()
	}
}

I. GC

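Run
GODEBUG=gctrace=1 go run main.go 2>xx.log
The gctrace output goes to standard error, so the 2> redirect writes it to xx.log in the current directory. To print it to the console instead, drop the redirect:
GODEBUG=gctrace=1 go run main.go
go run also builds the program, and the GODEBUG setting is inherited by the go command and the build tools it starts. As a result, the log below mixes their GC lines with the program's own, and each "gc 1" marks the start of a new process's numbering. To trace only the program, build it first and run the binary.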

gc 1 @0.802s 0%: 0+0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc 2 @2.229s 0%: 0+0.97+0 ms clock, 0+0/0.97/1.9+0 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc 3 @3.462s 0%: 0+0+0.99 ms clock, 0+0/0/0+3.9 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc 4 @4.528s 0%: 0+1.0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 5 @5.179s 0%: 0+0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 6 @5.190s 0%: 0+1.0+1.0 ms clock, 0+1.0/1.0/2.0+4.0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 7 @5.268s 0%: 0+0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 8 @5.314s 0%: 0+0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 9 @5.440s 0%: 0+0.87+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
# command-line-arguments
gc 1 @0.005s 0%: 0+1.9+1.0 ms clock, 0+0.99/0.99/1.9+4.1 ms cpu, 4->5->4 MB, 5 MB goal, 4 P
gc 2 @0.010s 0%: 0+2.0+0 ms clock, 0+0.99/0/3.0+0 ms cpu, 6->6->4 MB, 8 MB goal, 4 P
# command-line-arguments
gc 1 @0.002s 0%: 0+1.9+0 ms clock, 0+0/1.9/0+0 ms cpu, 4->5->4 MB, 5 MB goal, 4 P
gc 2 @0.010s 0%: 0+2.9+0 ms clock, 0+0/1.9/0+0 ms cpu, 7->8->7 MB, 9 MB goal, 4 P
gc 3 @0.016s 0%: 0+5.0+0 ms clock, 0+0/5.0/0+0 ms cpu, 13->14->13 MB, 15 MB goal, 4 P
gc 4 @0.099s 0%: 0+5.8+0 ms clock, 0+5.8/4.8/0+0 ms cpu, 26->26->24 MB, 27 MB goal, 4 P
gc 5 @0.297s 0%: 0+13+0 ms clock, 0+1.9/13/6.9+0 ms cpu, 46->48->42 MB, 48 MB goal, 4 P
gc 1 @0.006s 0%: 0+1.9+0 ms clock, 0+0/1.9/0+0 ms cpu, 5->5->3 MB, 6 MB goal, 4 P
gc 2 @0.012s 0%: 0+1.9+1.0 ms clock, 0+0/0.99/2.9+4.0 ms cpu, 8->8->6 MB, 9 MB goal, 4 P
gc 3 @0.023s 0%: 0+6.9+0 ms clock, 0+0/6.9/0+0 ms cpu, 16->16->13 MB, 17 MB goal, 4 P

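Explanation, using this line as an example:

gc 1 @0.038s 1%: 0.55+0.12+0.081 ms clock, 2.2+0/0.42/1.1+0.32 ms cpu, 4->4->0 MB, 5 MB goal, 4 P

gc 1: the GC number. This is the first collection, and the counter increments at each GC.

@0.038s: the time since program start at which this GC ran

1%: the percentage of time spent in GC since program start

0.55+0.12+0.081 ms clock: wall-clock time of the three GC phases, namely STW (stop-the-world) sweep termination, concurrent mark and scan, and STW mark termination

2.2+0/0.42/1.1+0.32 ms cpu: CPU time of the same phases. The middle mark/scan figure is split into assist time (GC work done by goroutines as they allocate), background GC time, and idle GC time.

4->4->0 MB: heap size at GC start, heap size at GC end, and live heap

5 MB goal: the goal heap size for this cycle

4 P: the number of processors (Ps) used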
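To make the field list concrete, here is a sketch of a parser for one gctrace line. The `gcLine` type, the regular expression, and `parseGCLine` are hypothetical. The runtime documentation says this format is subject to change, so the sketch targets the format shown above. It also tolerates the extra "MB stacks" / "MB globals" fields that newer Go versions print before the P count.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// gcLine holds the fields of one GODEBUG=gctrace=1 line.
type gcLine struct {
	Num         int        // GC number, incremented at each GC
	AtSec       float64    // seconds since program start
	Percent     int        // % of time spent in GC since program start
	ClockMs     [3]float64 // wall clock: STW sweep termination, concurrent mark/scan, STW mark termination
	HeapStartMB int        // heap size at GC start
	HeapEndMB   int        // heap size at GC end
	LiveMB      int        // live heap
	GoalMB      int        // goal heap size
	Procs       int        // number of Ps used
}

// gcRe skips the "ms cpu" breakdown and anything between "MB goal," and "P".
var gcRe = regexp.MustCompile(`^gc (\d+) @([\d.]+)s (\d+)%: ` +
	`([\d.]+)\+([\d.]+)\+([\d.]+) ms clock, \S+ ms cpu, ` +
	`(\d+)->(\d+)->(\d+) MB, (\d+) MB goal, .*?(\d+) P`)

func parseGCLine(line string) (gcLine, error) {
	m := gcRe.FindStringSubmatch(line)
	if m == nil {
		return gcLine{}, fmt.Errorf("not a gctrace line: %q", line)
	}
	// All 11 captured groups are numbers; parse them once as float64.
	var v [11]float64
	for i := range v {
		f, err := strconv.ParseFloat(m[i+1], 64)
		if err != nil {
			return gcLine{}, err
		}
		v[i] = f
	}
	return gcLine{
		Num: int(v[0]), AtSec: v[1], Percent: int(v[2]),
		ClockMs:     [3]float64{v[3], v[4], v[5]},
		HeapStartMB: int(v[6]), HeapEndMB: int(v[7]), LiveMB: int(v[8]),
		GoalMB: int(v[9]), Procs: int(v[10]),
	}, nil
}

func main() {
	l, err := parseGCLine("gc 1 @0.038s 1%: 0.55+0.12+0.081 ms clock, " +
		"2.2+0/0.42/1.1+0.32 ms cpu, 4->4->0 MB, 5 MB goal, 4 P")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", l)
}
```

Feeding every line of xx.log through such a parser makes it easy to plot heap size or GC frequency over time. Lines like "# command-line-arguments" simply return an error and can be skipped.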

II. CPU profiling

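Run
go run main.go --cpuprofile=cpu.prof
This writes cpu.prof to the current directory. Then run
go tool pprof cpu.prof
to enter the interactive pprof shell. pprof's optional first argument is the compiled binary, not main.go. Profiles written by recent Go versions already include symbol information, so the profile file alone is enough.
top10: show the top 10 consumers
web: show the call graph in a browser (requires Graphviz)
Other commands are not covered here.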

flat  flat%   sum%        cum   cum%
10ms 100% 100% 10ms 100% runtime.cgocall
....
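flat: time spent in the function itself, excluding the functions it calls
cum: cumulative time, i.e. the function itself plus everything it calls further down the stack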
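To see the flat/cum difference on something smaller than the sample program, here is a sketch with two hypothetical functions. `hot` does the arithmetic; `outer` only loops and calls `hot`. The sketch profiles into memory with the real `runtime/pprof` calls `StartCPUProfile` and `StopCPUProfile`, and the `//go:noinline` directives keep both functions visible as separate frames. Exact numbers depend on the machine. In `top`, `hot` should carry most of the flat time, while `outer` should show little flat time and a cum value at least as large as `hot`'s.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// hot does the actual work, so pprof should attribute most flat time to it.
//
//go:noinline
func hot(n int) uint64 {
	var h uint64 = 14695981039346656037 // FNV-1a offset basis, used only as busy work
	for i := 0; i < n; i++ {
		h ^= uint64(i)
		h *= 1099511628211
	}
	return h
}

// outer does almost nothing itself: low flat, but its cum includes all of hot.
//
//go:noinline
func outer(rounds, n int) uint64 {
	var acc uint64
	for r := 0; r < rounds; r++ {
		acc += hot(n)
	}
	return acc
}

// profile runs fn under the CPU profiler and returns the encoded profile.
func profile(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	fn()
	pprof.StopCPUProfile() // waits until the profile has been fully written
	return buf.Bytes(), nil
}

func main() {
	var result uint64
	data, err := profile(func() { result = outer(200, 1_000_000) })
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("cpu.prof", data, 0o644); err != nil {
		panic(err)
	}
	fmt.Println("result:", result, "wrote cpu.prof; inspect with: go tool pprof cpu.prof")
}
```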

III. Heap profiling

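Run
go run main.go --memprofile=mem.prof
go tool pprof mem.prof
and then use top10 and web inside pprof.

Everything works the same as for CPU profiling, except that the unit changes from ms to MB. By default the heap profile shows memory still in use as of the most recent GC. Because cm is no longer used when the sample program writes its profile, its memory may already have been freed. In that case go tool pprof -sample_index=alloc_space mem.prof (total bytes allocated) is often more informative.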

IV. Escape analysis

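Run
go build -gcflags '-m -l' main.go
or
go build -gcflags '-m -m' main.go
-m prints the compiler's optimization decisions, including escape analysis, and -m -m makes that output more verbose. The first command also disables inlining, so the report is not mixed with inlining decisions. The number of -l flags sets the inlining level:
go build -gcflags '-m -l' main.go: one -l disables inlining
go build -gcflags '-m -l -l' main.go: two -l flags make inlining more aggressive than the default
go build -gcflags '-m -l -l -l' main.go: three -l flags give aggressive inlining and a larger binary, but this level is unstable and may be buggy
Four is not discussed here. The recommendation is zero -l flags (the default) or two (more aggressive inlining). Note, though, that the compiler source describes every level beyond a single -l as unsupported and possibly buggy.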

$ go build -gcflags '-m -l -l' main.go
# command-line-arguments
.\main.go:25:14: "could not create CPU profile: " escapes to heap
.\main.go:25:14: err escapes to heap
.\main.go:27:34: f escapes to heap
.\main.go:28:14: "could not start CPU profile: " escapes to heap
.\main.go:28:14: err escapes to heap
.\main.go:37:21: i escapes to heap
.\main.go:38:39: ([]byte)("{\"kk\":9}") escapes to heap
.\main.go:40:17: len(cm) escapes to heap
.\main.go:45:14: "could not create memory profile: " escapes to heap
.\main.go:45:14: err escapes to heap
.\main.go:48:35: f escapes to heap
.\main.go:49:14: "could not write memory profile: " escapes to heap
.\main.go:49:14: err escapes to heap
.\main.go:25:13: main ... argument does not escape
.\main.go:28:13: main ... argument does not escape
.\main.go:34:15: main make(map[string]JR) does not escape
.\main.go:37:20: main ... argument does not escape
.\main.go:40:13: main ... argument does not escape
.\main.go:45:13: main ... argument does not escape
.\main.go:49:13: main ... argument does not escape
<autogenerated>:1: os.(*File).close .this does not escape
<autogenerated>:1: os.(*File).isdir .this does not escape

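A few conclusions from this output:
1. String literals passed to log or fmt, such as "could not create CPU profile: ", are reported as escaping. They are converted to interface{} for the variadic ...interface{} parameter.
2. Other arguments to log and fmt functions escape for the same reason (err, i and len(cm) above). A local variable whose address is returned from a function also escapes.
3. Passing a pointer down to a callee does not by itself cause an escape, as long as the callee does not keep it beyond the call. f escapes above because pprof receives it as an io.Writer, and StartCPUProfile even hands it to a background goroutine.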

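Escaping means an object is allocated on the heap instead of the stack, so it lives longer and the number of heap objects stays high. That makes GC run more often and increases GC work, including stop-the-world time.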

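These four areas (CPU, heap, GC, and escape analysis) are the main directions for optimization.
If the program still hits a bottleneck after these code-level optimizations, some system-level tuning measures remain.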
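On Linux:
cat /proc/cpuinfo| grep "processor"| wc -l
counts the logical cores. Set the worker count of proxies such as nginx or haproxy to this number.
ps aux|head -1;ps aux|grep -v PID|sort -rn -k +3|head
lists the processes using the most CPU (column 3 is %CPU; sort on column 4 to rank by %MEM instead). Stop the ones that are not needed.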