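Table of contents

  • Preface
  • Flow control
  • Rejecting requests above the QPS limit
  • Uniform-rate requests (queue when too fast, drop on timeout)
  • Request warm-up / cold start
  • Running multiple rules at once
  • Modifying flow rules dynamically
  • Circuit breaking
  • Error-count circuit breaking
  • Error-ratio circuit breaking
  • Slow-response-ratio circuit breaking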

Preface

The mainstream open-source rate-limiting frameworks today include Sentinel, Hystrix and resilience4j; Sentinel was open-sourced by Alibaba.


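This article uses Sentinel with Go to implement rate limiting. Official documentation:

https://sentinelguard.io/zh-cn/docs/golang/quick-start.html

The documentation explains flow control at the service level, the corresponding flow control rules and strategies, and how circuit breaking works and is implemented.

It is very detailed, so I won't simply copy it here; this post mainly records demos of flow control built with Go and Sentinel.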

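Flow control

Basic fields of a flow rule

  • Resource: the resource name, i.e. the target the rule applies to.
  • TokenCalculateStrategy: how the flow controller calculates tokens. Direct uses the Threshold field directly as the threshold; WarmUp calculates the token threshold with a warm-up.
  • ControlBehavior: the control behavior of the flow controller. Reject rejects requests as soon as the threshold is exceeded; Throttling queues requests and lets them through at a uniform rate.
  • Threshold: the flow control threshold. If StatIntervalInMs is 1000 (1 second), Threshold is a QPS value and the controller limits the resource by its QPS.
  • RelationStrategy: the strategy for limiting by call relationship. CurrentResource limits by this rule's own resource; AssociatedResource limits by an associated resource, which is defined in RefResource.
  • RefResource: the associated resource.
  • WarmUpPeriodSec: the length of the warm-up period in seconds; only takes effect with the WarmUp TokenCalculateStrategy.
  • WarmUpColdFactor: the warm-up (cold) factor, 3 by default; it affects how fast the warm-up proceeds. Only takes effect with the WarmUp TokenCalculateStrategy.
  • MaxQueueingTimeMs: the maximum time a request may wait in the queue; only takes effect with the Throttling ControlBehavior.
  • StatIntervalInMs: the statistic period of the independent statistics structure used by the rule's flow controller. A StatIntervalInMs of 1000 means the controller counts QPS.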

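Flow control strategies

Sentinel's flow control strategy is determined by two fields of a rule: TokenCalculateStrategy and ControlBehavior. ControlBehavior is either Reject or Throttling, as described above; TokenCalculateStrategy is how the flow controller calculates tokens, and Sentinel currently supports two:

  1. Direct uses the rule's Threshold directly as the maximum number of tokens in the current statistic period.
  2. WarmUp calculates the maximum number of tokens in the current statistic period with a warm-up; the warm-up curve is determined by the rule's WarmUpPeriodSec and WarmUpColdFactor fields.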

Rejecting requests above the QPS limit

  • The rule allows at most 2 QPS; requests beyond that are rejected directly.
package main

import (
	"fmt"
	"log"
	"sync"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
)

func main() {
	// Initialize Sentinel first
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("failed to initialize sentinel: %v", err)
	}

	// Configure the flow rule: at most 2 requests per 1000 ms, reject the rest
	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Reject,
			Threshold:              2,
			StatIntervalInMs:       1000,
		},
	})
	if err != nil {
		log.Fatalf("failed to load rules: %v", err)
	}

	// Simulate 10 concurrent requests
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Every call guarded by flow control must first obtain an Entry for the resource.
			// Requests to "some-test" pass while QPS <= 2; otherwise Entry returns a BlockError.
			e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
			if b != nil {
				fmt.Println("限流了") // "blocked"
			} else {
				fmt.Println("检查通过") // "passed"
				// Exit the entry once the guarded logic is done
				e.Exit()
			}
		}()
	}
	// Wait for all requests instead of blocking forever
	wg.Wait()
}

Output ("检查通过" = passed, "限流了" = blocked):

检查通过
检查通过
限流了
限流了
检查通过
限流了
限流了
限流了
限流了
限流了

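Uniform-rate requests (queue when too fast, drop on timeout)

  • You set a request count threshold (Threshold) and a statistic period (StatIntervalInMs); Sentinel derives the minimum interval between requests and lets requests through at that fixed interval.
  • A request that arrives before the interval has elapsed is queued. You also set a maximum queueing time (MaxQueueingTimeMs); a request whose wait would exceed it is dropped.

This mode is mainly for handling intermittent bursts of traffic, such as message queues. Imagine that a large number of requests arrive in one second and the next few seconds are idle: we want the system to work through those requests gradually during the idle period instead of rejecting the excess in the first second.

The following rule lets at most one request through every 100 ms; excess requests wait in the queue, and a request is rejected directly if its expected queueing time exceeds 500 ms: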

{
	Resource:               "some-test",
	TokenCalculateStrategy: flow.Direct,
	ControlBehavior:        flow.Throttling, // uniform-rate queueing
	Threshold:              10,              // requests are spaced 1000/10 = 100 ms apart
	MaxQueueingTimeMs:      500,             // maximum queueing time
}

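Here Threshold is 10 and Sentinel uses 1 s as the control period by default, so 10 requests per second are let through at a uniform rate and the interval between them is 1000 ms / 10 = 100 ms.

In particular, setting MaxQueueingTimeMs to 0 disables queueing: only the interval between requests is enforced, and excess requests are rejected directly.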

package main

import (
	"fmt"
	"log"
	"sync"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
)

func main() {
	// Initialize Sentinel first
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("failed to initialize sentinel: %v", err)
	}

	// Configure the flow rule: uniform-rate queueing
	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Throttling,
			Threshold:              10, // requests are spaced 1000/10 = 100 ms apart
			StatIntervalInMs:       1000,
			MaxQueueingTimeMs:      500, // maximum queueing time
		},
	})
	if err != nil {
		log.Fatalf("failed to load rules: %v", err)
	}

	// Simulate 10 concurrent requests
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Every call guarded by flow control must first obtain an Entry for the resource;
			// a queued request waits inside Entry until its turn comes
			e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
			if b != nil {
				fmt.Println("限流了") // "blocked"
			} else {
				fmt.Println("检查通过") // "passed"
				e.Exit()
			}
		}()
	}
	// Wait for all requests, including queued ones, to finish
	wg.Wait()
}

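Output ("检查通过" = passed, "限流了" = blocked). The first request passes immediately, five more wait in the queue (up to 500 ms) and pass later, and the remaining four are rejected right away, which is why the rejections are printed before the queued passes: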

检查通过
限流了
限流了
限流了
限流了
检查通过
检查通过
检查通过
检查通过
检查通过

Change the maximum queueing time to 5000 ms:

_, err = flow.LoadRules([]*flow.Rule{
	{
		Resource:               "some-test",
		TokenCalculateStrategy: flow.Direct,
		ControlBehavior:        flow.Throttling,
		Threshold:              10, // requests are spaced 1000/10 = 100 ms apart
		StatIntervalInMs:       1000,
		MaxQueueingTimeMs:      5000, // maximum queueing time
	},
})

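Output ("检查通过" = passed); this time every request passes, since even the tenth request only has to wait about 900 ms: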

检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
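The numbers in the two outputs above can be reproduced with a small simulation of the queueing arithmetic. This is a sketch, not Sentinel's code: throttle, Admit and simulate are hypothetical names, all requests are assumed to arrive at the same instant, times are whole milliseconds, and a request is assumed to be admitted when its wait is at most MaxQueueingTimeMs (consistent with the six passes seen with a 500 ms limit). Sentinel's real implementation is concurrency-safe and differs in detail.

```go
package main

import "fmt"

// throttle is a hypothetical model of a Direct + Throttling rule:
// requests are spaced statIntervalMs/threshold apart, and a request whose
// wait would exceed maxQueueingMs is rejected. Times are in milliseconds.
type throttle struct {
	intervalMs    int64
	maxQueueingMs int64
	lastPassMs    int64 // scheduled pass time of the previously admitted request
	started       bool
}

// newThrottle assumes threshold > 0.
func newThrottle(threshold, statIntervalMs, maxQueueingMs int64) *throttle {
	return &throttle{intervalMs: statIntervalMs / threshold, maxQueueingMs: maxQueueingMs}
}

// Admit returns how long a request arriving at nowMs must wait,
// and whether it is admitted at all.
func (t *throttle) Admit(nowMs int64) (waitMs int64, ok bool) {
	if !t.started || nowMs >= t.lastPassMs+t.intervalMs {
		// Enough time has passed since the last slot: go through immediately.
		t.started = true
		t.lastPassMs = nowMs
		return 0, true
	}
	expected := t.lastPassMs + t.intervalMs
	waitMs = expected - nowMs
	if waitMs > t.maxQueueingMs {
		return 0, false // the wait is too long: reject
	}
	t.lastPassMs = expected // reserve the next slot
	return waitMs, true
}

// simulate sends n simultaneous requests through a 10 QPS rule.
func simulate(n int, maxQueueingMs int64) (passed, rejected int) {
	t := newThrottle(10, 1000, maxQueueingMs) // 1000/10 = 100 ms spacing
	for i := 0; i < n; i++ {
		if _, ok := t.Admit(0); ok {
			passed++
		} else {
			rejected++
		}
	}
	return passed, rejected
}

func main() {
	for _, q := range []int64{0, 500, 5000} {
		p, r := simulate(10, q)
		fmt.Printf("MaxQueueingTimeMs=%d: passed=%d rejected=%d\n", q, p, r)
	}
}
```

With a 500 ms limit the waits are 0, 100, ..., 500 ms, so six requests pass and four are rejected; with 5000 ms all ten fit; with 0 only the first request passes, which is the "no queueing" case described earlier.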

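Request warm-up / cold start

  • Set the TokenCalculateStrategy to WarmUp. With a "cold start", the amount of traffic let through increases slowly and reaches the threshold over a certain period, giving a cold system time to warm up so that it is not overwhelmed.
  • Set a warm-up period (WarmUpPeriodSec), the time you expect the system to take to reach a stable state, and a cold factor (WarmUpColdFactor). The default cold factor is 3, meaning QPS starts at threshold / 3 and rises to the configured QPS threshold over the warm-up period.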
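A rough way to picture the warm-up is a ramp from threshold / WarmUpColdFactor up to Threshold over WarmUpPeriodSec. The sketch below assumes a straight-line ramp purely for illustration; Sentinel derives its curve from a token-bucket model (in the spirit of Guava's SmoothWarmingUp), so the real per-second numbers will not follow this line exactly. warmUpThreshold is a hypothetical helper.

```go
package main

import "fmt"

// warmUpThreshold returns the allowed QPS elapsedSec seconds into a warm-up,
// assuming a straight line from threshold/coldFactor to threshold over
// warmUpPeriodSec. The linear shape is an illustrative assumption only.
func warmUpThreshold(threshold, coldFactor, warmUpPeriodSec, elapsedSec float64) float64 {
	if coldFactor <= 1 {
		coldFactor = 3 // this sketch falls back to the documented default of 3
	}
	cold := threshold / coldFactor // starting QPS of a cold system
	if elapsedSec >= warmUpPeriodSec {
		return threshold // warm-up finished: full threshold
	}
	if elapsedSec <= 0 {
		return cold
	}
	return cold + (threshold-cold)*elapsedSec/warmUpPeriodSec
}

func main() {
	for _, factor := range []float64{3, 50} {
		for _, t := range []float64{0, 5, 10} {
			fmt.Printf("coldFactor=%v t=%vs -> %.1f QPS\n",
				factor, t, warmUpThreshold(100, factor, 10, t))
		}
	}
}
```

With the default factor of 3 the ramp starts at about 33 QPS; with a factor of 50 it starts at 2 QPS, which is why a larger cold factor lets so few requests through early in the warm-up.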


10 s warm-up with the Reject behavior

package main

import (
	"fmt"
	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/flow"
	"github.com/alibaba/sentinel-golang/util"
	"log"
	"math/rand"
	"sync/atomic"
	"time"
)

type Counter struct {
	pass  *int64
	block *int64
	total *int64
}

var routineCount = 30

func main() {
	// Initialize Sentinel first
	err := sentinel.InitDefault()
	if err != nil {
		log.Fatalf("failed to initialize sentinel: %v", err)
	}

	// Configure the flow rule
	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               "some-test",
			TokenCalculateStrategy: flow.WarmUp,
			ControlBehavior:        flow.Reject,
			WarmUpPeriodSec:        10,  // 10 s warm-up, after which traffic is in a steady state
			WarmUpColdFactor:       3,   // cold factor, shapes the warm-up curve
			Threshold:              100, // QPS of 100
			StatIntervalInMs:       1000,
		},
	})
	if err != nil {
		log.Fatalf("failed to load rules: %v", err)
	}

	counter := Counter{pass: new(int64), block: new(int64), total: new(int64)}
	go timerTask(&counter)

	// Simulate traffic; nothing is ever sent on ch, so main blocks forever
	ch := make(chan struct{})
	// A few goroutines generate light traffic while the system is still cold
	for i := 0; i < 3; i++ {
		go Task(&counter)
	}
	time.Sleep(3 * time.Second)
	// After 3 s, routineCount more goroutines generate heavy traffic
	for i := 0; i < routineCount; i++ {
		go Task(&counter)
	}
	<-ch
}

func Task(counter *Counter) {
	for {
		atomic.AddInt64(counter.total, 1)
		e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
		if b != nil {
			atomic.AddInt64(counter.block, 1)
		} else {
			// Be sure the entry is exited finally.
			e.Exit()
			atomic.AddInt64(counter.pass, 1)
		}
		time.Sleep(time.Duration(rand.Uint64()%50) * time.Millisecond)
	}
}

func timerTask(counter *Counter) {
	fmt.Println("begin to statistic!!!")
	var (
		oldTotal, oldPass, oldBlock int64
	)
	for {
		time.Sleep(1 * time.Second)
		globalTotal := atomic.LoadInt64(counter.total)
		oneSecondTotal := globalTotal - oldTotal
		oldTotal = globalTotal

		globalPass := atomic.LoadInt64(counter.pass)
		oneSecondPass := globalPass - oldPass
		oldPass = globalPass

		globalBlock := atomic.LoadInt64(counter.block)
		oneSecondBlock := globalBlock - oldBlock
		oldBlock = globalBlock
		fmt.Println(util.CurrentTimeMillis()/1000, "total:", oneSecondTotal, " pass:", oneSecondPass, " block:", oneSecondBlock)
	}
}

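Output: during the first 10 s the number of requests passed per second gradually rises to the configured QPS of 100 and then stays stable: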

1661851798 total: 121  pass: 54  block: 67
1661851799 total: 124  pass: 45  block: 79
1661851800 total: 145  pass: 42  block: 103
1661851801 total: 1314  pass: 41  block: 1273
1661851802 total: 1327  pass: 44  block: 1283
1661851803 total: 1342  pass: 47  block: 1295
1661851804 total: 1324  pass: 52  block: 1272
1661851805 total: 1322  pass: 58  block: 1264
1661851806 total: 1355  pass: 68  block: 1287
1661851807 total: 1322  pass: 83  block: 1239
1661851808 total: 1288  pass: 100  block: 1188
1661851809 total: 1315  pass: 100  block: 1215
1661851810 total: 1341  pass: 100  block: 1241
1661851811 total: 1347  pass: 100  block: 1246

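10 s warm-up with the Throttling behavior

  • With uniform-rate queueing you also need to set the maximum queueing time. The rule below is a 10 s warm-up at 100 QPS with a maximum queueing time of 5 s:
_, err = flow.LoadRules([]*flow.Rule{
	{
		Resource:               "some-test",
		TokenCalculateStrategy: flow.WarmUp,
		ControlBehavior:        flow.Throttling,
		WarmUpPeriodSec:        10,  // 10 s warm-up, after which traffic is in a steady state
		WarmUpColdFactor:       3,   // cold factor, shapes the warm-up curve
		Threshold:              100, // QPS of 100
		StatIntervalInMs:       1000,
		MaxQueueingTimeMs:      5000, // maximum queueing time
	},
})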

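Output: no request is dropped. Because queued requests wait inside Entry, the goroutines send far fewer requests than in the Reject case, and a second's pass count can exceed its total when requests counted in the previous second are released in this one: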

1661855369 total: 37  pass: 34  block: 0
1661855370 total: 34  pass: 35  block: 0
1661855371 total: 67  pass: 36  block: 0
1661855372 total: 37  pass: 37  block: 0
1661855373 total: 38  pass: 40  block: 0
1661855374 total: 42  pass: 42  block: 0
1661855375 total: 46  pass: 45  block: 0
1661855376 total: 49  pass: 49  block: 0
1661855377 total: 52  pass: 54  block: 0
1661855378 total: 63  pass: 62  block: 0
1661855379 total: 74  pass: 73  block: 0
1661855380 total: 92  pass: 91  block: 0
1661855381 total: 98  pass: 100  block: 0
1661855382 total: 101  pass: 101  block: 0
1661855383 total: 98  pass: 100  block: 0
1661855384 total: 102  pass: 100  block: 0
1661855385 total: 99  pass: 100  block: 0
1661855386 total: 102  pass: 100  block: 0
1661855387 total: 98  pass: 100  block: 0
1661855388 total: 100  pass: 100  block: 0

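Increasing the cold factor

  • Set the cold factor to 50: WarmUpColdFactor: 50

The larger the cold factor, the fewer requests can pass during the warm-up period, because the starting QPS is threshold / cold factor, here 100 / 50 = 2. Output of the Reject example above with WarmUpColdFactor: 50: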

1661855000 total: 121  pass: 4  block: 117
1661855001 total: 124  pass: 2  block: 122
1661855002 total: 143  pass: 2  block: 141
1661855003 total: 1315  pass: 2  block: 1313
1661855004 total: 1327  pass: 2  block: 1324
1661855005 total: 1344  pass: 2  block: 1343
1661855006 total: 1330  pass: 4  block: 1326
1661855007 total: 1319  pass: 3  block: 1316
1661855008 total: 1353  pass: 4  block: 1347
1661855009 total: 1325  pass: 5  block: 1321
1661855010 total: 1287  pass: 8  block: 1280
1661855011 total: 1315  pass: 44  block: 1271
1661855012 total: 1340  pass: 101  block: 1237
1661855013 total: 1349  pass: 100  block: 1251
1661855014 total: 1320  pass: 100  block: 1220
1661855015 total: 1345  pass: 100  block: 1245
1661855016 total: 1329  pass: 100  block: 1229

Running multiple rules at once

  • You can load several rules at once, so that different resources use different flow control strategies:
_, err = flow.LoadRules([]*flow.Rule{
	{
		Resource:               "some-test",
		TokenCalculateStrategy: flow.Direct,
		ControlBehavior:        flow.Throttling, // uniform-rate pacing
		Threshold:              100,             // requests spaced 1000/100 = 10 ms apart; MaxQueueingTimeMs is 0, so no queueing
		StatIntervalInMs:       1000,
	},
	{
		Resource:               "some-test2",
		TokenCalculateStrategy: flow.Direct,
		ControlBehavior:        flow.Reject, // reject directly
		Threshold:              10,
		StatIntervalInMs:       1000,
	},
})

Then, when calling each resource, obtain a Sentinel Entry for that resource:

for i := 0; i < 12; i++ {
    e, b := sentinel.Entry("some-test", sentinel.WithTrafficType(base.Inbound))
    if b != nil {
        fmt.Println("限流了")
    } else {
        fmt.Println("检查通过")
        e.Exit()
    }
    time.Sleep(11 * time.Millisecond)
}
for i := 0; i < 12; i++ {
    e2, b2 := sentinel.Entry("some-test2", sentinel.WithTrafficType(base.Inbound))
    if b2 != nil {
        fmt.Println("限流了")
    } else {
        fmt.Println("检查通过")
        e2.Exit()
    }
    time.Sleep(11 * time.Millisecond)
}

Modifying flow rules dynamically

  • A separate goroutine loads a new flow rule after 2 s, raising the threshold from 10 to 80:
package main

import (
	"fmt"
	"log"
	"math/rand"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/base"
	"github.com/alibaba/sentinel-golang/core/config"
	"github.com/alibaba/sentinel-golang/core/flow"
	"github.com/alibaba/sentinel-golang/logging"
)

const resName = "example-flow-qps-resource"

func main() {
	// We should initialize Sentinel first.
	conf := config.NewDefaultConfig()
	// for testing, logging output to console
	conf.Sentinel.Log.Logger = logging.NewConsoleLogger()
	err := sentinel.InitWithConfig(conf)
	if err != nil {
		log.Fatal(err)
	}

	_, err = flow.LoadRules([]*flow.Rule{
		{
			Resource:               resName,
			TokenCalculateStrategy: flow.Direct,
			ControlBehavior:        flow.Reject,
			Threshold:              10,
			StatIntervalInMs:       1000,
		},
	})
	if err != nil {
		log.Fatalf("Unexpected error: %+v", err)
	}

	ch := make(chan struct{})
	for i := 0; i < 3; i++ {
		go func() {
			for {
				e, b := sentinel.Entry(resName, sentinel.WithTrafficType(base.Inbound))
				if b != nil {
					// Blocked. We could get the block reason from the BlockError.
					fmt.Println("限流了")
					time.Sleep(time.Duration(rand.Uint64()%10) * time.Millisecond)
				} else {
					// Passed, wrap the logic here.
					time.Sleep(time.Duration(rand.Uint64()%10) * time.Millisecond)
					fmt.Println("检查通过")
					// Be sure the entry is exited finally.
					e.Exit()
				}
				time.Sleep(100 * time.Millisecond)
			}
		}()
	}

	// Simulate a scenario in which flow rules are updated concurrently:
	// after 2 s, reload the rule with a threshold of 80.
	go func() {
		time.Sleep(time.Second * 2)
		// Use a goroutine-local err instead of writing main's err from another goroutine.
		_, err := flow.LoadRules([]*flow.Rule{
			{
				Resource:               resName,
				TokenCalculateStrategy: flow.Direct,
				ControlBehavior:        flow.Reject,
				Threshold:              80,
				StatIntervalInMs:       1000,
			},
		})
		if err != nil {
			log.Fatalf("Unexpected error: %+v", err)
		}
	}()
	<-ch
}

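Output ("检查通过" = passed, "限流了" = blocked). The three goroutines together send roughly 27-30 requests per second, since each loop sleeps about 100-110 ms. With the initial threshold of 10 some requests are rejected; after 2 s the threshold becomes 80 and all requests pass: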

{"timestamp":"2022-08-31 08:07:08.8310","caller":"config.go:78","logLevel":"INFO","msg":"[Config] Print effective global config","globalConfig":{"Version":"v1","Sentinel":{"App":{"Name":"unknown_go_service","Type":0},"Exporter":{"Metric":{"HttpAddr":"","HttpPath":""}},"Log":{"Logger":{},"Dir":"/Users/zhouzhiyong/logs/csp","UsePid":false,"Metric":{"SingleFileMaxSize":52428800,"MaxFileCount":8,"FlushIntervalSec":1}},"Stat":{"GlobalStatisticSampleCountTotal":20,"GlobalStatisticIntervalMsTotal":10000,"MetricStatisticSampleCount":2,"MetricStatisticIntervalMs":1000,"System":{"CollectIntervalMs":1000,"CollectLoadIntervalMs":1000,"CollectCpuIntervalMs":1000,"CollectMemoryIntervalMs":150}},"UseCacheTime":false}}}
{"timestamp":"2022-08-31 08:07:08.8310","caller":"writer.go:189","logLevel":"INFO","msg":"[MetricWriter] Metric log file removed in DefaultMetricLogWriter.removeDeprecatedFiles()","filename":"/Users/zhouzhiyong/logs/csp/unknown_go_service-metrics.log.2022-08-30.34"}
{"timestamp":"2022-08-31 08:07:08.8310","caller":"writer.go:196","logLevel":"INFO","msg":"[MetricWriter] Metric index file removed","idxFilename":"/Users/zhouzhiyong/logs/csp/unknown_go_service-metrics.log.2022-08-30.34.idx"}
{"timestamp":"2022-08-31 08:07:08.8310","caller":"writer.go:247","logLevel":"INFO","msg":"[MetricWriter] New metric log file created","filename":"/Users/zhouzhiyong/logs/csp/unknown_go_service-metrics.log.2022-08-31.7"}
{"timestamp":"2022-08-31 08:07:08.8310","caller":"writer.go:254","logLevel":"INFO","msg":"[MetricWriter] New metric log index file created","idxFile":"/Users/zhouzhiyong/logs/csp/unknown_go_service-metrics.log.2022-08-31.7.idx"}
{"timestamp":"2022-08-31 08:07:08.8310","caller":"rule_manager.go:173","logLevel":"INFO","msg":"[FlowRuleManager] Flow rules were loaded","rules":[{"resource":"example-flow-qps-resource","tokenCalculateStrategy":0,"controlBehavior":0,"threshold":10,"relationStrategy":0,"refResource":"","maxQueueingTimeMs":0,"warmUpPeriodSec":0,"warmUpColdFactor":0,"statIntervalInMs":1000,"lowMemUsageThreshold":0,"highMemUsageThreshold":0,"memLowWaterMarkBytes":0,"memHighWaterMarkBytes":0}]}
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
限流了
限流了
检查通过
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
限流了
限流了
限流了
限流了
限流了
限流了
检查通过
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
限流了
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
{"timestamp":"2022-08-31 08:07:10.10310","caller":"rule_manager.go:173","logLevel":"INFO","msg":"[FlowRuleManager] Flow rules were loaded","rules":[{"resource":"example-flow-qps-resource","tokenCalculateStrategy":0,"controlBehavior":0,"threshold":80,"relationStrategy":0,"refResource":"","maxQueueingTimeMs":0,"warmUpPeriodSec":0,"warmUpColdFactor":0,"statIntervalInMs":1000,"lowMemUsageThreshold":0,"highMemUsageThreshold":0,"memLowWaterMarkBytes":0,"memHighWaterMarkBytes":0}]}
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
检查通过
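Reloading rules while requests are in flight, as the goroutine above does with flow.LoadRules, boils down to publishing a new rule set that request handlers can pick up safely. The sketch below shows that general pattern with the standard library's atomic.Value; flowRule and ruleStore are hypothetical types, and Sentinel's own rule manager does considerably more.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// flowRule is a hypothetical rule type; only the threshold matters here.
type flowRule struct {
	Resource  string
	Threshold float64
}

// ruleStore publishes an immutable rule map that readers load without locks.
type ruleStore struct {
	v atomic.Value // always holds a map[string]flowRule
}

// Load replaces the whole rule set at once.
func (s *ruleStore) Load(rules []flowRule) {
	m := make(map[string]flowRule, len(rules))
	for _, r := range rules {
		m[r.Resource] = r
	}
	s.v.Store(m) // readers see either the old map or the new one, never a mix
}

// Threshold looks up the current threshold for a resource.
func (s *ruleStore) Threshold(resource string) (float64, bool) {
	m, _ := s.v.Load().(map[string]flowRule) // nil map if nothing was loaded yet
	r, ok := m[resource]
	return r.Threshold, ok
}

func main() {
	var store ruleStore
	store.Load([]flowRule{{Resource: "example-flow-qps-resource", Threshold: 10}})

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // the delayed update from the demo, minus the sleep
		defer wg.Done()
		store.Load([]flowRule{{Resource: "example-flow-qps-resource", Threshold: 80}})
	}()
	wg.Wait()

	t, _ := store.Threshold("example-flow-qps-resource")
	fmt.Println("threshold now:", t)
}
```

Replacing the whole map rather than mutating it in place is what lets readers skip locking.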

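Circuit breaking

Flow control is a protective strategy for when others call you: it keeps your service from being overwhelmed when it runs short of resources while callers keep retrying, which could otherwise cause a cascading failure (service avalanche).

Circuit breaking is a protective strategy for when you call others: it keeps you from repeatedly retrying a dependency that is short of resources, which could likewise cause a cascading failure.

Before using circuit breaking, you need some basic knowledge of how circuit breakers work.

The circuit breaker model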


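A circuit breaker has three states:

  1. Closed: the initial state. The breaker stays closed, and requests to the resource pass its check directly.
  2. Open: the breaker has tripped, and access to the resource is cut off.
  3. Half-Open: apart from probe traffic, access to the resource is still cut off. In this state the breaker periodically lets a certain number of probe requests through; if a probe returns normally, the probe succeeds and the breaker resets to Closed, ending the break; if the probe fails, the breaker goes back to Open.

Transitions between the three states:

  1. Initially the breaker is Closed. If its statistics show that the resource has hit the configured threshold, it switches to Open;
  2. Open means the circuit is broken and all requests are rejected directly. The breaker rule configures a retry timeout; once it has elapsed, the breaker switches to Half-Open and starts probing;
  3. A Half-Open breaker probes periodically: a successful probe moves it to Closed, a failed one moves it back to Open.

Circuit breaker design

  1. Whether a resource may be accessed is decided by the breaker's state machine;
  2. Resources that cannot be accessed are probed, which lets access recover elastically;
  3. The breaker updates its statistics when a request to the resource completes, then updates its state machine according to the breaking rule.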

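Circuit breaking strategies

  • Silent period: a minimum request amount (MinRequestAmount); while a statistic period has seen fewer requests than this, the breaker does not trip.

For example, with the slow-call strategy, if the very first request happens to be slow, the slow-call ratio is already 100%, which is no reasonable basis for tripping; the silent period therefore makes circuit breaking more accurate.

  • Three circuit breaking strategies:
    • Slow request ratio (SlowRequestRatio): outside the silent period, a request whose response time exceeds the configured RT limit (MaxAllowedRtMs) counts as slow; when the ratio of slow requests reaches the configured threshold, the breaker trips.
    • Error ratio (ErrorRatio): outside the silent period, if the ratio of failed requests within the statistic period reaches the configured threshold, access to the resource is automatically broken for the following break period (RetryTimeoutMs).
    • Error count (ErrorCount): outside the silent period, if the number of failed requests within the statistic period reaches the configured threshold, access to the resource is automatically broken for the following break period (RetryTimeoutMs).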
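The silent period and the three strategies boil down to one decision per statistic window. In the sketch below, strategy, windowStats and shouldTrip are hypothetical names, and the breaker is assumed to trip once the metric reaches the threshold (>=). That matches the error-count demo later in this post, whose log shows "snapshot: 10" tripping a threshold of 10; for the two ratio strategies the >= comparison is an assumption.

```go
package main

import "fmt"

type strategy int

const (
	slowRequestRatio strategy = iota
	errorRatio
	errorCount
)

// windowStats is a hypothetical snapshot of one statistic window.
type windowStats struct {
	total, errors, slow int64
}

// shouldTrip returns whether the breaker would trip, plus the metric value
// (the "snapshot") it compared against the threshold.
func shouldTrip(s strategy, st windowStats, minRequestAmount int64, threshold float64) (bool, float64) {
	if st.total == 0 || st.total < minRequestAmount {
		return false, 0 // silent period: too few requests to judge
	}
	var metric float64
	switch s {
	case slowRequestRatio:
		metric = float64(st.slow) / float64(st.total)
	case errorRatio:
		metric = float64(st.errors) / float64(st.total)
	case errorCount:
		metric = float64(st.errors)
	}
	return metric >= threshold, metric
}

func main() {
	// One slow request: the ratio is 100%, but it is still inside the silent period.
	fmt.Println(shouldTrip(slowRequestRatio, windowStats{total: 1, slow: 1}, 10, 0.5))
	// 10 errors out of 40 requests against an error-count threshold of 10.
	fmt.Println(shouldTrip(errorCount, windowStats{total: 40, errors: 10}, 3, 10))
	// 4 errors out of 20 requests: 20% is below a 20.1% error-ratio threshold.
	fmt.Println(shouldTrip(errorRatio, windowStats{total: 20, errors: 4}, 10, 0.201))
}
```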

Error-count circuit breaking

  • When a call to a resource fails, you must report the error to Sentinel:
sentinel.TraceError(e, errors.New("biz error"))
  • Sentinel provides listeners for the transitions between the breaker's three states, so users can plug in custom extensions:
// StateChangeListener listens on the circuit breaker state change event.
type StateChangeListener interface {
	// Called when the breaker switches to Closed; prev is the previous state, rule is the breaker's rule
	OnTransformToClosed(prev State, rule Rule)
	// Called when the breaker switches to Open; prev is the previous state, rule is the breaker's rule,
	// snapshot is the value that triggered the break
	OnTransformToOpen(prev State, rule Rule, snapshot interface{})
	// Called when the breaker switches to HalfOpen; prev is the previous state, rule is the breaker's rule
	OnTransformToHalfOpen(prev State, rule Rule)
}

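With these three hooks, users can easily receive every state transition event of a breaker, together with the breaker's Rule.

Note 1: the rule passed to a listener hook is a copy, so changing the Rule inside a listener does not affect the breaker. Copying also has a performance cost, so avoid registering unnecessary listeners.

Note 2: registering and clearing breaker listeners is not thread-safe. Register the listeners when configuring Sentinel at service startup; do not change the state machine's listeners while the application is running.

Simulating a breaker that trips once 10 errors occur within a 10 s statistic period: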

package main

import (
	"errors"
	"fmt"
	"log"
	"math/rand"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/circuitbreaker"
	"github.com/alibaba/sentinel-golang/core/config"
	"github.com/alibaba/sentinel-golang/logging"
	"github.com/alibaba/sentinel-golang/util"
)

type stateChangeTestListener struct {
}

func (s *stateChangeTestListener) OnTransformToClosed(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.steategy: %+v, From %s to Closed, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToOpen(prev circuitbreaker.State, rule circuitbreaker.Rule, snapshot interface{}) {
	fmt.Printf("rule.steategy: %+v, From %s to Open, snapshot: %d, time: %d\n", rule.Strategy, prev.String(), snapshot, util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToHalfOpen(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.steategy: %+v, From %s to Half-Open, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func main() {
	conf := config.NewDefaultConfig()
	// for testing, logging output to console
	conf.Sentinel.Log.Logger = logging.NewConsoleLogger()
	err := sentinel.InitWithConfig(conf)
	if err != nil {
		log.Fatal(err)
	}
	ch := make(chan struct{})
	// Register a state change listener so that we can observe the state changes of the internal circuit breaker.
	circuitbreaker.RegisterStateChangeListeners(&stateChangeTestListener{})

	_, err = circuitbreaker.LoadRules([]*circuitbreaker.Rule{
		// Statistic time span=10s, recoveryTimeout=1s, maxErrorCount=10
		{
			Resource: "abc",
			// Circuit breaking strategy: error count
			Strategy: circuitbreaker.ErrorCount,
			// How long the breaker stays Open after tripping before it probes (ms)
			RetryTimeoutMs: 1000,
			// Silent amount: while the statistic window has fewer requests than this, the breaker does not trip
			MinRequestAmount: 3,
			// Statistic period of the breaker in ms; around 10 s is fine in most cases
			StatIntervalMs: 10000,
			// Number of sliding-window buckets in the statistic period, 1 by default.
			// More buckets give more precise statistics but use more memory.
			// StatIntervalMs % StatSlidingWindowBucketCount == 0 must hold,
			// otherwise StatSlidingWindowBucketCount is replaced with 1.
			StatSlidingWindowBucketCount: 10,
			// Error count threshold
			Threshold: 10,
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	logging.Info("[CircuitBreaker ErrorCount] Sentinel Go circuit breaking demo is running. You may see the pass/block metric in the metric log.")
	go func() {
		for {
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g1 blocked by the breaker
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g1 passed; simulate a failed call about half of the time
				if rand.Uint64()%20 > 9 {
					fmt.Println("访问资源失败") // "resource access failed"
					// Record current invocation as error.
					sentinel.TraceError(e, errors.New("biz error"))
				}
				time.Sleep(time.Duration(rand.Uint64()%80+10) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	go func() {
		for {
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g2 blocked by the breaker
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g2 passed; its calls always succeed
				time.Sleep(time.Duration(rand.Uint64()%80) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	<-ch
}

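Output ("访问资源失败" = resource access failed). Each time 10 errors accumulate, the breaker opens, switches to Half-Open after the 1 s retry timeout, and closes again when the probe succeeds: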

访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
rule.steategy: ErrorCount, From Closed to Open, snapshot: 10, time: 1661908499422
rule.steategy: ErrorCount, From Open to Half-Open, time: 1661908500426
rule.steategy: ErrorCount, From HalfOpen to Closed, time: 1661908500453
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
rule.steategy: ErrorCount, From Closed to Open, snapshot: 10, time: 1661908502326
rule.steategy: ErrorCount, From Open to Half-Open, time: 1661908503330
rule.steategy: ErrorCount, From HalfOpen to Closed, time: 1661908503381
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
rule.steategy: ErrorCount, From Closed to Open, snapshot: 10, time: 1661908504531
rule.steategy: ErrorCount, From Open to Half-Open, time: 1661908505534
rule.steategy: ErrorCount, From HalfOpen to Closed, time: 1661908505593
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
访问资源失败
rule.steategy: ErrorCount, From Closed to Open, snapshot: 10, time: 1661908506462

Error-ratio circuit breaking

package main

import (
	"errors"
	"fmt"
	"log"
	"math/rand"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/circuitbreaker"
	"github.com/alibaba/sentinel-golang/core/config"
	"github.com/alibaba/sentinel-golang/logging"
	"github.com/alibaba/sentinel-golang/util"
)

type stateChangeTestListener struct {
}

func (s *stateChangeTestListener) OnTransformToClosed(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.steategy: %+v, From %s to Closed, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToOpen(prev circuitbreaker.State, rule circuitbreaker.Rule, snapshot interface{}) {
	fmt.Printf("rule.steategy: %+v, From %s to Open, snapshot: %.2f, time: %d\n", rule.Strategy, prev.String(), snapshot, util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToHalfOpen(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.steategy: %+v, From %s to Half-Open, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func main() {
	conf := config.NewDefaultConfig()
	// for testing, logging output to console
	conf.Sentinel.Log.Logger = logging.NewConsoleLogger()
	err := sentinel.InitWithConfig(conf)
	if err != nil {
		log.Fatal(err)
	}
	ch := make(chan struct{})
	// Register a state change listener so that we can observe the state changes of the internal circuit breaker.
	circuitbreaker.RegisterStateChangeListeners(&stateChangeTestListener{})

	_, err = circuitbreaker.LoadRules([]*circuitbreaker.Rule{
		// Statistic time span=10s, recoveryTimeout=3s, maxErrorRatio=20.1%
		{
			Resource: "abc",
			// Circuit breaking strategy: error ratio
			Strategy: circuitbreaker.ErrorRatio,
			// How long the breaker stays Open after tripping before it probes (ms)
			RetryTimeoutMs: 3000,
			// Silent amount: while the statistic window has fewer requests than this, the breaker does not trip
			MinRequestAmount: 10,
			// Statistic period of the breaker in ms; around 10 s is fine in most cases
			StatIntervalMs: 10000,
			// Number of sliding-window buckets in the statistic period, 1 by default.
			// More buckets give more precise statistics but use more memory.
			// StatIntervalMs % StatSlidingWindowBucketCount == 0 must hold,
			// otherwise StatSlidingWindowBucketCount is replaced with 1.
			StatSlidingWindowBucketCount: 10,
			// Error ratio threshold: 20.1%
			Threshold: 0.201,
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	logging.Info("[CircuitBreaker ErrorRatio] Sentinel Go circuit breaking demo is running. You may see the pass/block metric in the metric log.")
	go func() {
		for {
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g1 blocked
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g1 passed; simulate a failed call about 35% of the time (7 of 20 values)
				if rand.Uint64()%20 > 12 {
					// Record current invocation as error.
					sentinel.TraceError(e, errors.New("biz error"))
				}
				time.Sleep(time.Duration(rand.Uint64()%80+20) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	go func() {
		for {
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g2 blocked
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g2 passed
				time.Sleep(time.Duration(rand.Uint64()%80+40) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	<-ch
}

Output: each Open period lasts about 3 s (RetryTimeoutMs) before the breaker probes in Half-Open:

rule.steategy: ErrorRatio, From Open to Half-Open, time: 1661910601762
rule.steategy: ErrorRatio, From Closed to Open, snapshot: 0.27, time: 1661910601762
rule.steategy: ErrorRatio, From HalfOpen to Closed, time: 1661910601873
rule.steategy: ErrorRatio, From Closed to Open, snapshot: 0.23, time: 1661910602414
rule.steategy: ErrorRatio, From Open to Half-Open, time: 1661910605428
rule.steategy: ErrorRatio, From HalfOpen to Closed, time: 1661910605470
rule.steategy: ErrorRatio, From Closed to Open, snapshot: 0.23, time: 1661910606195
rule.steategy: ErrorRatio, From Open to Half-Open, time: 1661910609195
rule.steategy: ErrorRatio, From HalfOpen to Closed, time: 1661910609298
rule.steategy: ErrorRatio, From Closed to Open, snapshot: 0.25, time: 1661910609707
rule.steategy: ErrorRatio, From Open to Half-Open, time: 1661910612709
rule.steategy: ErrorRatio, From HalfOpen to Closed, time: 1661910612764
rule.steategy: ErrorRatio, From Closed to Open, snapshot: 0.30, time: 1661910613089
rule.steategy: ErrorRatio, From Open to Half-Open, time: 1661910616091
rule.steategy: ErrorRatio, From HalfOpen to Closed, time: 1661910616159
rule.steategy: ErrorRatio, From Closed to Open, snapshot: 0.30, time: 1661910616541
rule.steategy: ErrorRatio, From Open to Half-Open, time: 1661910619547
rule.steategy: ErrorRatio, From HalfOpen to Closed, time: 1661910619569
rule.steategy: ErrorRatio, From Closed to Open, snapshot: 0.21, time: 1661910620221

Slow-response-ratio circuit breaking

package main

import (
	"errors"
	"fmt"
	"log"
	"math/rand"
	"time"

	sentinel "github.com/alibaba/sentinel-golang/api"
	"github.com/alibaba/sentinel-golang/core/circuitbreaker"
	"github.com/alibaba/sentinel-golang/core/config"
	"github.com/alibaba/sentinel-golang/logging"
	"github.com/alibaba/sentinel-golang/util"
)

type stateChangeTestListener struct {
}

func (s *stateChangeTestListener) OnTransformToClosed(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.steategy: %+v, From %s to Closed, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToOpen(prev circuitbreaker.State, rule circuitbreaker.Rule, snapshot interface{}) {
	fmt.Printf("rule.steategy: %+v, From %s to Open, snapshot: %.2f, time: %d\n", rule.Strategy, prev.String(), snapshot, util.CurrentTimeMillis())
}

func (s *stateChangeTestListener) OnTransformToHalfOpen(prev circuitbreaker.State, rule circuitbreaker.Rule) {
	fmt.Printf("rule.steategy: %+v, From %s to Half-Open, time: %d\n", rule.Strategy, prev.String(), util.CurrentTimeMillis())
}

func main() {
	conf := config.NewDefaultConfig()
	// for testing, logging output to console
	conf.Sentinel.Log.Logger = logging.NewConsoleLogger()
	err := sentinel.InitWithConfig(conf)
	if err != nil {
		log.Fatal(err)
	}
	ch := make(chan struct{})
	// Register a state change listener so that we can observe the state changes of the internal circuit breaker.
	circuitbreaker.RegisterStateChangeListeners(&stateChangeTestListener{})

	_, err = circuitbreaker.LoadRules([]*circuitbreaker.Rule{
		// Statistic time span=5s, recoveryTimeout=3s, slowRtUpperBound=50ms, maxSlowRequestRatio=50%
		{
			Resource: "abc",
			// Circuit breaking strategy: slow request ratio
			Strategy: circuitbreaker.SlowRequestRatio,
			// How long the breaker stays Open after tripping before it probes (ms)
			RetryTimeoutMs: 3000,
			// Silent amount: while the statistic window has fewer requests than this, the breaker does not trip
			MinRequestAmount: 10,
			// Statistic period of the breaker in ms; around 10 s is fine in most cases
			StatIntervalMs: 5000,
			// Slow-call RT threshold: a request whose RT exceeds this value counts as slow
			MaxAllowedRtMs: 50,
			// Slow-call ratio threshold as a fraction, e.g. 0.1 means 10%
			Threshold: 0.5,
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	logging.Info("[CircuitBreaker SlowRtRatio] Sentinel Go circuit breaking demo is running. You may see the pass/block metric in the metric log.")
	go func() {
		for {
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g1 blocked
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				if rand.Uint64()%20 > 9 {
					// Record current invocation as error (this does not affect the slow-ratio strategy).
					sentinel.TraceError(e, errors.New("biz error"))
				}
				// g1 passed
				// Sleep to simulate a randomly slow response of 10-89 ms
				time.Sleep(time.Duration(rand.Uint64()%80+10) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	go func() {
		for {
			e, b := sentinel.Entry("abc")
			if b != nil {
				// g2 blocked
				time.Sleep(time.Duration(rand.Uint64()%20) * time.Millisecond)
			} else {
				// g2 passed
				time.Sleep(time.Duration(rand.Uint64()%80) * time.Millisecond)
				e.Exit()
			}
		}
	}()
	<-ch
}

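Output: several probes in a row are slow, so the breaker goes from Half-Open back to Open (snapshot 1.00) until a probe finally succeeds and it closes: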

rule.steategy: SlowRequestRatio, From Closed to Open, snapshot: 0.50, time: 1661910990905
rule.steategy: SlowRequestRatio, From Open to Half-Open, time: 1661910993908
rule.steategy: SlowRequestRatio, From HalfOpen to Open, snapshot: 1.00, time: 1661910993962
rule.steategy: SlowRequestRatio, From Open to Half-Open, time: 1661910996964
rule.steategy: SlowRequestRatio, From HalfOpen to Open, snapshot: 1.00, time: 1661910997052
rule.steategy: SlowRequestRatio, From Open to Half-Open, time: 1661911000055
rule.steategy: SlowRequestRatio, From HalfOpen to Open, snapshot: 1.00, time: 1661911000110
rule.steategy: SlowRequestRatio, From Open to Half-Open, time: 1661911003119
rule.steategy: SlowRequestRatio, From HalfOpen to Open, snapshot: 1.00, time: 1661911003187
rule.steategy: SlowRequestRatio, From Open to Half-Open, time: 1661911006193
rule.steategy: SlowRequestRatio, From HalfOpen to Closed, time: 1661911006227
rule.steategy: SlowRequestRatio, From Closed to Open, snapshot: 0.50, time: 1661911015189

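References:

  • Sentinel official documentation
  • Choosing rate-limiting and circuit-breaking technology: from Hystrix to Sentinel - 开发者头条
  • What is TCP congestion control, and Google's BBR algorithm - 51CTO.COM
  • Rate-limiting algorithms and an analysis of Guava RateLimiter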