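Preface
One scenario in the Spring Cloud based full-link gray and blue-green release feature is full-link gray routing driven by header propagation. Routing policies are configured in a configuration center and mapped onto the gateway or the services. User-defined headers can be combined with those policies, the result is converted into routing header information, and that information is passed along to every service in the call chain. This is a very common requirement. However, if the business side calls between services asynchronously, the header stored in a ThreadLocal is lost, and full-link gray and blue-green routing stops working.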
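To make the failure concrete, here is a minimal, self-contained sketch of the problem in plain Java. The class name HeaderLossDemo, the ROUTE_HEADER field and the sample header value are made up for illustration and are not part of Nepxion Discovery. A single-thread executor stands in for any asynchronous call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; all names here are illustrative only.
class HeaderLossDemo {
    // Stands in for the per-request routing header kept in a ThreadLocal.
    static final ThreadLocal<String> ROUTE_HEADER = new ThreadLocal<>();

    // Sets the header on the calling thread, then reads it from a worker thread.
    static String readFromWorker(String header) {
        ROUTE_HEADER.set(header);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = ROUTE_HEADER::get; // executed on the worker thread
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            ROUTE_HEADER.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker sees: " + readFromWorker("{\"version\":\"1.0\"}"));
    }
}
```

Running main prints `worker sees: null`. The worker thread has its own, empty ThreadLocal slot, so the routing header set on the calling thread is not visible there.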
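Evaluating the options
Decorating thread pools, Hystrix style
This approach is fairly simple and the code is not complicated, but it is highly invasive to business code: every thread involved in a Java asynchronous scenario that loses thread context has to be decorated by hand, one by one. It was therefore abandoned.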
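The manual decoration approach can be sketched roughly as follows. This illustrates the general technique under assumed names (ManualDecorationDemo, ROUTE_HEADER, decorate); it is not the code that Hystrix or Nepxion Discovery uses. It also shows why the approach is invasive: every task handed to another thread has to be wrapped explicitly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; all names here are illustrative only.
class ManualDecorationDemo {
    static final ThreadLocal<String> ROUTE_HEADER = new ThreadLocal<>();

    // Captures the caller's header when the task is wrapped and
    // restores it inside whichever thread eventually runs the task.
    static <T> Callable<T> decorate(Callable<T> task) {
        String captured = ROUTE_HEADER.get();
        return () -> {
            String previous = ROUTE_HEADER.get();
            ROUTE_HEADER.set(captured);
            try {
                return task.call();
            } finally {
                // Put the worker back the way it was so pooled threads do not leak context.
                if (previous == null) {
                    ROUTE_HEADER.remove();
                } else {
                    ROUTE_HEADER.set(previous);
                }
            }
        };
    }

    static String readFromWorker(String header) {
        ROUTE_HEADER.set(header);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = decorate(ROUTE_HEADER::get);
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            ROUTE_HEADER.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker sees: " + readFromWorker("gray-v1"));
    }
}
```

With the wrapper in place the worker sees the header, but only because the call site remembered to call decorate. Forgetting it at a single call site silently brings the original problem back.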
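Using Alibaba's open-source TTL (transmittable-thread-local)
After reading the documentation and source code and consulting the author, it still seemed difficult to meet my requirements. See the following link:
https://github.com/alibaba/transmittable-thread-local/issues/171
In addition, the performance test data on the official site left me somewhat worried: the number of Full GCs per minute with TTL is more than 300 times that of the standard Java ThreadLocal. See the following link:
https://github.com/alibaba/transmittable-thread-local/blob/master/docs/performance-test.md
So this approach was abandoned as well.
Using Java Agent technology
Could bytecode enhancement through a Java Agent solve the problem? I am not very experienced with Java Agent technology, so I invited an expert in this area, @zifeihan. After several months of work the job was done, and DiscoveryAgent is now open source on the official Nepxion GitHub.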
DiscoveryAgent
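The gray routing header and the tracing span are lost across thread context switches under Hystrix thread pool isolation mode, or when Feign or RestTemplate is called asynchronously from threads, thread pools, the @Async annotation and so on. The steps below solve this, and they apply to both the gateway side and the service side. The solution can replace the existing solution for Hystrix thread pool isolation mode, and it also suits other base frameworks and business scenarios with the same usage pattern, for example Dubbo.
It covers the asynchronous scenarios of all Java frameworks and solves the loss of thread context in the following 7 asynchronous scenarios: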
@Async
Hystrix Thread Pool Isolation
Runnable
Callable
Single Thread
Thread Pool
SLF4J MDC
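Obtaining the plugin
Build https://github.com/Nepxion/DiscoveryAgent to produce the discovery-agent directory.
Using the plugin
discovery-agent-starter-${discovery.version}.jar is the Agent bootstrap program and is loaded when the JVM starts. The discovery-agent/plugin directory contains discovery-agent-starter-plugin-strategy-${discovery.version}.jar, the implementation that ships with Nepxion Discovery. Business systems can define their own plugins to propagate business-defined context across threads.
Start the JVM with -javaagent as follows: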
-javaagent:/discovery-agent/discovery-agent-starter-${discovery.agent.version}.jar -Dthread.scan.packages=com.abc;com.xyz
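Parameter description
/discovery-agent: the directory where the Agent is located. It must be mapped to the actual directory.
-Dthread.scan.packages: the packages to scan for Runnable and Callable objects. All Runnable and Callable objects under these packages are decorated. Keep the packages as narrow and precise as possible, which reduces the number of decorated objects and improves performance. Separate multiple packages with ";".
-Dthread.request.decorator.enabled: decorates the Request on the server side in asynchronous call scenarios. When the main thread finishes before the child thread, the Request is destroyed and the header can no longer be read. Enabling the decoration ensures it can. Enabled by default; in practice, most scenarios need this switch on.
-Dthread.mdc.enabled: passes the SLF4J MDC log context on to asynchronous child threads. Disabled by default; turn it on if needed.
Defining the scan packages for thread.scan.packages. This parameter only takes effect on the service side; the gateway side does not need it.
1. For @Async, the scan package is org.springframework.aop.interceptor
2. For Hystrix thread pool isolation, the scan package is com.netflix.hystrix
3. For threads and thread pools, the scan package is the package of the classes containing your own Runnable and Callable objects
See the startup parameters of the asynchronous service in the guide example. Configure the three package names in the scan list as needed for your scenario.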
-javaagent:C:/opt/discovery-agent/discovery-agent-starter-${discovery.agent.version}.jar -Dthread.scan.packages=org.springframework.aop.interceptor;com.netflix.hystrix;com.nepxion.discovery.guide.service.feign
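The agent's actual matching logic is not shown in this article. The sketch below only illustrates, under the assumption of simple package-prefix matching, how a semicolon-separated thread.scan.packages value could be parsed and why a narrower list decorates fewer classes. ScanPackagesSketch, parse and isScanned are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; DiscoveryAgent's own implementation may differ.
class ScanPackagesSketch {
    // Splits a value such as "com.abc;com.xyz" into package names.
    static List<String> parse(String property) {
        List<String> packages = new ArrayList<>();
        if (property == null) {
            return packages;
        }
        for (String item : property.split(";")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                packages.add(trimmed);
            }
        }
        return packages;
    }

    // Assumed rule: a class is scanned when it lives in or below a listed package.
    static boolean isScanned(String className, List<String> packages) {
        for (String pkg : packages) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> packages = parse(System.getProperty("thread.scan.packages", "com.abc;com.xyz"));
        System.out.println(isScanned("com.abc.task.MyTask", packages));
        System.out.println(isScanned("org.other.MyTask", packages));
    }
}
```

One practical caveat: in most Unix shells an unquoted ";" ends the command, so the whole -Dthread.scan.packages argument usually has to be quoted when more than one package is listed.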
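Extending the plugin
Develop a plugin according to the specification. The plugin provides hook functions: when a given class is loaded, an event can be registered with the thread context switch events, which enables cross-thread propagation of a business-defined ThreadLocal.
The plugin directory holds the custom plugins whose ThreadLocal needs to be propagated on thread switches. Once a business-defined plugin has been developed, just put it in the plugin directory.
The detailed steps are as follows.
① SDK side
Create a ThreadLocal context class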
import java.util.HashMap;
import java.util.Map;

public class MyContext {
    // One MyContext per thread, created lazily on first access
    private static final ThreadLocal<MyContext> THREAD_LOCAL = new ThreadLocal<MyContext>() {
        @Override
        protected MyContext initialValue() {
            return new MyContext();
        }
    };

    public static MyContext getCurrentContext() {
        return THREAD_LOCAL.get();
    }

    public static void clearCurrentContext() {
        THREAD_LOCAL.remove();
    }

    private Map<String, String> attributes = new HashMap<>();

    public Map<String, String> getAttributes() {
        return attributes;
    }

    public void setAttributes(Map<String, String> attributes) {
        this.attributes = attributes;
    }
}
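② Agent side
Create a new module and add the following dependency
<dependency>
    <groupId>com.nepxion</groupId>
    <artifactId>discovery-agent-starter</artifactId>
    <version>${discovery.agent.version}</version>
    <scope>provided</scope>
</dependency>
Create a ThreadLocalHook class that extends AbstractThreadLocalHook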
import java.util.Map;

public class MyContextHook extends AbstractThreadLocalHook {
    @Override
    public Object create() {
        // Read the context object from the main thread's ThreadLocal and return it
        return MyContext.getCurrentContext().getAttributes();
    }

    @Override
    @SuppressWarnings("unchecked")
    public void before(Object object) {
        // Put the context object obtained in create() into the child thread's ThreadLocal
        if (object instanceof Map) {
            MyContext.getCurrentContext().setAttributes((Map<String, String>) object);
        }
    }

    @Override
    public void after() {
        // The thread has finished; destroy the context object
        MyContext.clearCurrentContext();
    }
}
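How the three callbacks fit together can be pictured with the following self-contained sketch. It assumes, as the comments in the hook suggest, that create() runs on the submitting thread and that before() and after() run on the worker thread around the task. The Hook interface, the CONTEXT field and decorate are hypothetical stand-ins; according to this article the real agent achieves the effect through bytecode enhancement rather than an explicit wrapper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation of the hook lifecycle; not DiscoveryAgent code.
class HookLifecycleSketch {
    // Stand-in for the three callbacks of a thread-local hook.
    interface Hook {
        Object create();
        void before(Object captured);
        void after();
    }

    static final ThreadLocal<Map<String, String>> CONTEXT = ThreadLocal.withInitial(HashMap::new);

    static final Hook HOOK = new Hook() {
        @Override
        public Object create() {
            return CONTEXT.get();
        }

        @Override
        @SuppressWarnings("unchecked")
        public void before(Object captured) {
            CONTEXT.set((Map<String, String>) captured);
        }

        @Override
        public void after() {
            CONTEXT.remove();
        }
    };

    // create() runs here, on the submitting thread; before()/after() run on the worker.
    static Runnable decorate(Runnable task, Hook hook) {
        Object captured = hook.create();
        return () -> {
            hook.before(captured);
            try {
                task.run();
            } finally {
                hook.after();
            }
        };
    }

    // Returns what a decorated task and a later undecorated task see on the same worker.
    static String[] run() {
        String[] seen = new String[2];
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CONTEXT.get().put("version", "1.0");
            Runnable decorated = decorate(() -> { seen[0] = CONTEXT.get().get("version"); }, HOOK);
            Runnable plain = () -> { seen[1] = CONTEXT.get().get("version"); };
            pool.submit(decorated).get();
            pool.submit(plain).get();
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("decorated task sees: " + seen[0]);
        System.out.println("later plain task sees: " + seen[1]);
    }
}
```

The decorated task sees the value 1.0, while the later undecorated task on the same pooled thread sees null. That second result is the reason after() clears the context: without it, a reused pool thread would keep the previous request's data.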
Create a Plugin class that extends AbstractPlugin
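public class MyContextPlugin extends AbstractPlugin {
    // Read once from -Dthread.myplugin.enabled; false when the property is not set
    private Boolean threadMyPluginEnabled = Boolean.valueOf(System.getProperty("thread.myplugin.enabled", "false"));

    @Override
    protected String getMatcherClassName() {
        // Return the name of the class that holds the ThreadLocal object. Because plugins are pluggable,
        // this must be a string; referencing the class explicitly is not allowed
        return "com.nepxion.discovery.guide.sdk.MyContext";
    }

    @Override
    protected String getHookClassName() {
        // Return the name of the ThreadLocalHook class
        return MyContextHook.class.getName();
    }

    @Override
    protected boolean isEnabled() {
        // Whether this Plugin takes effect is controlled by the runtime parameter -Dthread.myplugin.enabled=true/false.
        // The parent class implementation returns true, i.e. plugins are enabled by default;
        // this override is disabled unless the property is set to true
        return threadMyPluginEnabled;
    }
}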
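Define the SPI extension by creating an SPI file under src/main/resources/META-INF/services
The file name is fixed, as follows
com.nepxion.discovery.agent.plugin.Plugin
The content is the fully qualified name of the Plugin class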
com.nepxion.discovery.guide.agent.MyContextPlugin
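This file layout matches the convention of the JDK's java.util.ServiceLoader. Whether DiscoveryAgent uses ServiceLoader itself or its own loader is an assumption here; the sketch only shows how such a lookup works in general, with DemoPlugin as a hypothetical stand-in for com.nepxion.discovery.agent.plugin.Plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical illustration of an SPI lookup; not DiscoveryAgent code.
class SpiLookupSketch {
    // Stand-in for the plugin interface named by the SPI file.
    public interface DemoPlugin {
        String name();
    }

    // Instantiates every implementation listed in a matching META-INF/services file
    // on the classpath and collects the names they report.
    static List<String> discover() {
        List<String> names = new ArrayList<>();
        for (DemoPlugin plugin : ServiceLoader.load(DemoPlugin.class)) {
            names.add(plugin.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // Without a provider file for DemoPlugin on the classpath, nothing is found.
        System.out.println("discovered plugins: " + discover());
    }
}
```

The design benefit is that the host never references the implementation class at compile time. Dropping a jar that carries its own META-INF/services entry into the scanned location is enough for it to be picked up.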
Run the Maven build and put the resulting jar in the discovery-agent/plugin directory
Add the startup parameters to the service and start it, as follows
-javaagent:C:/opt/discovery-agent/discovery-agent-starter-${discovery.agent.version}.jar -Dthread.scan.packages=com.nepxion.discovery.guide.application -Dthread.myplugin.enabled=true
③ Application side
Run MyApplication. It simulates putting Map data into the main thread's ThreadLocal; the child thread obtains that Map data through DiscoveryAgent and prints it.
import java.util.HashMap;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@SpringBootApplication
@RestController
public class MyApplication {
    private static final Logger LOG = LoggerFactory.getLogger(MyApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);
        invoke();
    }

    // Calls the endpoint below 10 times so that each request runs on a container thread
    public static void invoke() {
        RestTemplate restTemplate = new RestTemplate();
        for (int i = 1; i <= 10; i++) {
            restTemplate.getForEntity("http://localhost:8080/index/" + i, String.class).getBody();
        }
    }

    @GetMapping("/index/{value}")
    public String index(@PathVariable(value = "value") String value) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put(value, "MyContext");
        MyContext.getCurrentContext().setAttributes(attributes);
        LOG.info("[Main] thread ThreadLocal: {}", MyContext.getCurrentContext().getAttributes());
        new Thread(new Runnable() {
            @Override
            public void run() {
                LOG.info("[Child] thread ThreadLocal: {}", MyContext.getCurrentContext().getAttributes());
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                // By now the request thread has returned; the context must still be readable
                LOG.info("After sleeping 5 seconds, [Child] thread ThreadLocal: {}", MyContext.getCurrentContext().getAttributes());
            }
        }).start();
        return "";
    }
}
The output is as follows
2020-10-18 18:38:22.670 INFO 3780 --- [nio-8080-exec-1] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {1=MyContext}
2020-10-18 18:38:22.738 INFO 3780 --- [ Thread-8] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {1=MyContext}
2020-10-18 18:38:22.759 INFO 3780 --- [nio-8080-exec-2] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {2=MyContext}
2020-10-18 18:38:22.760 INFO 3780 --- [ Thread-9] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {2=MyContext}
2020-10-18 18:38:22.763 INFO 3780 --- [nio-8080-exec-3] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {3=MyContext}
2020-10-18 18:38:22.764 INFO 3780 --- [ Thread-10] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {3=MyContext}
2020-10-18 18:38:22.772 INFO 3780 --- [nio-8080-exec-4] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {4=MyContext}
2020-10-18 18:38:22.773 INFO 3780 --- [ Thread-11] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {4=MyContext}
2020-10-18 18:38:22.775 INFO 3780 --- [nio-8080-exec-5] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {5=MyContext}
2020-10-18 18:38:22.776 INFO 3780 --- [ Thread-12] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {5=MyContext}
2020-10-18 18:38:22.778 INFO 3780 --- [nio-8080-exec-6] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {6=MyContext}
2020-10-18 18:38:22.779 INFO 3780 --- [ Thread-13] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {6=MyContext}
2020-10-18 18:38:22.782 INFO 3780 --- [nio-8080-exec-7] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {7=MyContext}
2020-10-18 18:38:22.783 INFO 3780 --- [ Thread-14] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {7=MyContext}
2020-10-18 18:38:22.785 INFO 3780 --- [nio-8080-exec-8] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {8=MyContext}
2020-10-18 18:38:22.786 INFO 3780 --- [ Thread-15] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {8=MyContext}
2020-10-18 18:38:22.788 INFO 3780 --- [nio-8080-exec-9] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {9=MyContext}
2020-10-18 18:38:22.789 INFO 3780 --- [ Thread-16] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {9=MyContext}
2020-10-18 18:38:22.791 INFO 3780 --- [io-8080-exec-10] c.n.d.guide.application.MyApplication : [Main] thread ThreadLocal: {10=MyContext}
2020-10-18 18:38:22.792 INFO 3780 --- [ Thread-17] c.n.d.guide.application.MyApplication : [Child] thread ThreadLocal: {10=MyContext}
2020-10-18 18:38:27.738 INFO 3780 --- [ Thread-8] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {1=MyContext}
2020-10-18 18:38:27.761 INFO 3780 --- [ Thread-9] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {2=MyContext}
2020-10-18 18:38:27.764 INFO 3780 --- [ Thread-10] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {3=MyContext}
2020-10-18 18:38:27.773 INFO 3780 --- [ Thread-11] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {4=MyContext}
2020-10-18 18:38:27.776 INFO 3780 --- [ Thread-12] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {5=MyContext}
2020-10-18 18:38:27.780 INFO 3780 --- [ Thread-13] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {6=MyContext}
2020-10-18 18:38:27.783 INFO 3780 --- [ Thread-14] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {7=MyContext}
2020-10-18 18:38:27.787 INFO 3780 --- [ Thread-15] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {8=MyContext}
2020-10-18 18:38:27.789 INFO 3780 --- [ Thread-16] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {9=MyContext}
2020-10-18 18:38:27.792 INFO 3780 --- [ Thread-17] c.n.d.guide.application.MyApplication : After sleeping 5 seconds, [Child] thread ThreadLocal: {10=MyContext}
For the complete example, see https://github.com/Nepxion/DiscoveryAgentGuide. A custom plugin built as described above solves the loss of ThreadLocal context when execution switches threads.
Appendix