1. 问题:nacos 启动报错,启动失败,全部报错下面贴出。
2. 结论:排查结果为服务器带宽不够,mysql 查询的数量量太大,传输时间损坏在网络io上!
3. 下面开始回溯事故起因:
前期config_info 表数据量少的时候,并没有出现问题,当nacos 的配置中心的配置文件在逐渐增多之后,nacos 从偶尔启动失败,到一直启动失败。下图为错误堆栈信息:
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'memoryMonitor' defined in URL [jar:file:/E:/gitlab/partner-cloud-plus/partner-visual/partner-nacos/src/main/resources/lib/nacos-config-2.1.2.jar!/com/alibaba/nacos/config/server/monitor/MemoryMonitor.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'asyncNotifyService': Unsatisfied dependency expressed through field 'dumpService'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'externalDumpService': Invocation of init method failed; nested exception is ErrCode:500, ErrMsg:Nacos Server did not start because dumpservice bean construction failure :
PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:800)
at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:229)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1372)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1222)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:953)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:745)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:420)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:307)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1317)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1306)
at com.alibaba.nacos.Nacos.main(Nacos.java:41)
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'asyncNotifyService': Unsatisfied dependency expressed through field 'dumpService'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'externalDumpService': Invocation of init method failed; nested exception is ErrCode:500, ErrMsg:Nacos Server did not start because dumpservice bean construction failure :
PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:659)
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:639)
at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:119)
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:399)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1431)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:619)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1389)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1309)
at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:887)
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:791)
... 19 common frames omitted
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'externalDumpService': Invocation of init method failed; nested exception is ErrCode:500, ErrMsg:Nacos Server did not start because dumpservice bean construction failure :
PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:160)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:440)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1389)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1309)
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:656)
... 34 common frames omitted
Caused by: com.alibaba.nacos.api.exception.NacosException: Nacos Server did not start because dumpservice bean construction failure :
PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
at com.alibaba.nacos.config.server.service.dump.DumpService.dumpOperate(DumpService.java:225)
at com.alibaba.nacos.config.server.service.dump.ExternalDumpService.init(ExternalDumpService.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:389)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:333)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:157)
... 46 common frames omitted
Caused by: org.springframework.dao.QueryTimeoutException: PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
at org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:76)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:70)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:79)
at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1541)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:667)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:713)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:744)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:757)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:810)
at com.alibaba.nacos.config.server.service.repository.extrnal.ExternalStoragePaginationHelperImpl.fetchPageLimit(ExternalStoragePaginationHelperImpl.java:200)
at com.alibaba.nacos.config.server.service.repository.extrnal.ExternalStoragePersistServiceImpl.findAllConfigInfoFragment(ExternalStoragePersistServiceImpl.java:1446)
at com.alibaba.nacos.config.server.service.dump.processor.DumpAllProcessor.process(DumpAllProcessor.java:54)
at com.alibaba.nacos.config.server.service.dump.DumpService.dumpConfigInfo(DumpService.java:282)
at com.alibaba.nacos.config.server.service.dump.DumpService.dumpOperate(DumpService.java:195)
... 54 common frames omitted
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:113)
at com.mysql.cj.jdbc.StatementImpl.checkCancelTimeout(StatementImpl.java:2167)
at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:1133)
at com.mysql.cj.NativeSession.execSQL(NativeSession.java:665)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:893)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:972)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:722)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:651)
... 63 common frames omitted
2023-01-20 08:05:05.655 WARN 9964 --- [ main] c.a.nacos.sys.file.WatchFileCenter : [WatchFileCenter] start close
2023-01-20 08:05:05.655 WARN 9964 --- [ main] c.a.nacos.sys.file.WatchFileCenter : [WatchFileCenter] start to shutdown this watcher which is watch : C:\Users\Administrator\nacos\conf
2023-01-20 08:05:05.655 WARN 9964 --- [ main] c.a.nacos.sys.file.WatchFileCenter : [WatchFileCenter] start to shutdown this watcher which is watch : C:\Users\Administrator\nacos\data\loader
2023-01-20 08:05:05.656 WARN 9964 --- [ main] c.a.nacos.sys.file.WatchFileCenter : [WatchFileCenter] start to shutdown this watcher which is watch : C:\Users\Administrator\nacos\data\tps
2023-01-20 08:05:05.656 WARN 9964 --- [ main] c.a.nacos.sys.file.WatchFileCenter : [WatchFileCenter] already closed
2023-01-20 08:05:05.656 WARN 9964 --- [ main] c.a.nacos.common.notify.NotifyCenter : [NotifyCenter] Start destroying Publisher
2023-01-20 08:05:05.656 WARN 9964 --- [ main] c.a.nacos.common.notify.NotifyCenter : [NotifyCenter] Destruction of the end
2023-01-20 08:05:05.656 ERROR 9964 --- [ main] c.a.n.c.l.StartingApplicationListener : Nacos failed to start, please see C:\Users\Administrator\nacos\logs\nacos.log for more details.
2023-01-20 08:05:05.668 INFO 9964 --- [ main] ConditionEvaluationReportLoggingListener :
2. 排查经过:
a. 开始进行了搜索, 一些博客指出在jdbc 连接加上queryTimeout=2400 试了之后并没有用
又去github 的 nacos官网 issue 也进行查看,并没有解决办法,反而issue说是mysql问题,我对此事疑惑的因为我查看的报错的sql ,是他慢呀,(心想是不是你sql 设计的有问题呀,索引没加? 最终是我错了...)怎么就是mysql问题了 ==!
SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?
b. debug跟踪nacos 的逻辑,看看是在什么地方调用了这条sql 缺省参数又是什么?
1. 这块有个启动方法 跟进去
2. 就是DumpAllProcessor 的循环。在全量扫描config_info 表。这个方法找到 sql 的参数
SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > 0 ORDER BY id ASC LIMIT 0,1000
3. 查看mysql 索引,判断是否是索引问题,进行新增,删减索引。nacos依旧启动失败!
4. 执行计划: 也没看出来哪里有问题
explain SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > 0 ORDER BY id ASC LIMIT 0,1000
5. 查看config_info 表 第一页数据时,加载就很慢。使用sql ;select * from config_info 4s左右的查询时间。nacos连接肯定慢呀,但是又找不到加大nacos 连接时间的参数。使用驱动连接,进行查询
这条sql 查询时间是4s 多。现在思路就是要不加大 限制时间(queryTimeout 并没有用,也没找到..) 要不mysql调优将sql 查询时间降下来。
6. 查看my.cnf 配置文件,增加配置,重启之后 错误依旧。查询了很多配置优化也并没有什么效果,依旧查询超时。
7. 现在的sql 如果将content 的字段 删除。 navicat 查询就很快,sql 将content 去掉 也很快。
然后将content字段设置为varchar 8099 并没有用,字段数据量太大,设置失败。设置为blob, text 设置成功,nacos依旧启动失败.
8. 单位发的电脑。配置不行,我将这个表复制给同事,跑在本地环境,navicat 是很快的,貌似是配置不行,去服务器,在本地查询这条sql。 发现很快就返回结构。此时思路有点清晰,但是还是有一层窗户纸,只知道这个中间传输环节出错了
9. 由于服务器的mysql ,和nacos 在相同地方,接下来,我又将 地址换为localhost 进行更换,nacos启动成功 ,暂时先用着。换为ip 地址就启动失败
10. 晚上,我将config_info 表 复制到买的小水管机器 也是很慢。但是复制到另外一台8核心32g的 腾讯云的机器上。速度竟然明显快。而且是快好几倍。产生了咨询下阿里云的想法。提交了下工单。进行问题的反馈
11. 最终确定问题,服务器带宽不够,这条sql 查询出来的数据量比较大,从本地查询出来,浪费在网络传输,5m 传的慢。 阿里客服那边有不同带宽可以测试,10 m 是两秒 100m 是0. 2s 意思
sql 的剖析得话, executing 有点浪费时间,也不知道executing 代表啥
总结:
1. 时间耗费:两天
2. 客观条件: 公司网络不好,给配的机器配置低,导致不能快速排查问题,中间时断时续,搞不清,到底是哪里慢 = _=!(准备买台笔记本)
3. 我的问题: 对于网络io,网络排查经验,排查工具不足。这块还要进行学习。补下 tcp/ip netty等
4. 最后:学到varchar 设置最大数据 , github issue 提问题,现在已经到了问题都搜索不到的阶段, 需要研究源码,研究底层..
2023年祝大家宏图大展
写于腊月29