1. 问题:nacos 启动报错,启动失败,全部报错下面贴出。

2. 结论:排查结果为服务器带宽不够,mysql 查询的数量量太大,传输时间损坏在网络io上!

nacos 对接istio nacos connect timeout_nacos

 

3. 下面开始回溯事故起因:

     前期config_info 表数据量少的时候,并没有出现问题,当nacos 的配置中心的配置文件在逐渐增多之后,nacos 从偶尔启动失败,到一直启动失败。下图为错误堆栈信息:

org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'memoryMonitor' defined in URL [jar:file:/E:/gitlab/partner-cloud-plus/partner-visual/partner-nacos/src/main/resources/lib/nacos-config-2.1.2.jar!/com/alibaba/nacos/config/server/monitor/MemoryMonitor.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'asyncNotifyService': Unsatisfied dependency expressed through field 'dumpService'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'externalDumpService': Invocation of init method failed; nested exception is ErrCode:500, ErrMsg:Nacos Server did not start because dumpservice bean construction failure :
PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:800)
	at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:229)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1372)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1222)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
	at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:953)
	at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
	at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145)
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:745)
	at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:420)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:307)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1317)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1306)
	at com.alibaba.nacos.Nacos.main(Nacos.java:41)
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'asyncNotifyService': Unsatisfied dependency expressed through field 'dumpService'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'externalDumpService': Invocation of init method failed; nested exception is ErrCode:500, ErrMsg:Nacos Server did not start because dumpservice bean construction failure :
PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
	at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:659)
	at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:639)
	at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:119)
	at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:399)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1431)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:619)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
	at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
	at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1389)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1309)
	at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:887)
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:791)
	... 19 common frames omitted
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'externalDumpService': Invocation of init method failed; nested exception is ErrCode:500, ErrMsg:Nacos Server did not start because dumpservice bean construction failure :
PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
	at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:160)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:440)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
	at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
	at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1389)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1309)
	at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:656)
	... 34 common frames omitted
Caused by: com.alibaba.nacos.api.exception.NacosException: Nacos Server did not start because dumpservice bean construction failure :
PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
	at com.alibaba.nacos.config.server.service.dump.DumpService.dumpOperate(DumpService.java:225)
	at com.alibaba.nacos.config.server.service.dump.ExternalDumpService.init(ExternalDumpService.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:389)
	at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:333)
	at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:157)
	... 46 common frames omitted
Caused by: org.springframework.dao.QueryTimeoutException: PreparedStatementCallback; SQL [SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?]; Statement cancelled due to timeout or client request; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
	at org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:76)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:70)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:79)
	at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1541)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:667)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:713)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:744)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:757)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:810)
	at com.alibaba.nacos.config.server.service.repository.extrnal.ExternalStoragePaginationHelperImpl.fetchPageLimit(ExternalStoragePaginationHelperImpl.java:200)
	at com.alibaba.nacos.config.server.service.repository.extrnal.ExternalStoragePersistServiceImpl.findAllConfigInfoFragment(ExternalStoragePersistServiceImpl.java:1446)
	at com.alibaba.nacos.config.server.service.dump.processor.DumpAllProcessor.process(DumpAllProcessor.java:54)
	at com.alibaba.nacos.config.server.service.dump.DumpService.dumpConfigInfo(DumpService.java:282)
	at com.alibaba.nacos.config.server.service.dump.DumpService.dumpOperate(DumpService.java:195)
	... 54 common frames omitted
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:113)
	at com.mysql.cj.jdbc.StatementImpl.checkCancelTimeout(StatementImpl.java:2167)
	at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:1133)
	at com.mysql.cj.NativeSession.execSQL(NativeSession.java:665)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:893)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:972)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
	at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:722)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:651)
	... 63 common frames omitted

2023-01-20 08:05:05.655  WARN 9964 --- [           main] c.a.nacos.sys.file.WatchFileCenter       : [WatchFileCenter] start close
2023-01-20 08:05:05.655  WARN 9964 --- [           main] c.a.nacos.sys.file.WatchFileCenter       : [WatchFileCenter] start to shutdown this watcher which is watch : C:\Users\Administrator\nacos\conf
2023-01-20 08:05:05.655  WARN 9964 --- [           main] c.a.nacos.sys.file.WatchFileCenter       : [WatchFileCenter] start to shutdown this watcher which is watch : C:\Users\Administrator\nacos\data\loader
2023-01-20 08:05:05.656  WARN 9964 --- [           main] c.a.nacos.sys.file.WatchFileCenter       : [WatchFileCenter] start to shutdown this watcher which is watch : C:\Users\Administrator\nacos\data\tps
2023-01-20 08:05:05.656  WARN 9964 --- [           main] c.a.nacos.sys.file.WatchFileCenter       : [WatchFileCenter] already closed
2023-01-20 08:05:05.656  WARN 9964 --- [           main] c.a.nacos.common.notify.NotifyCenter     : [NotifyCenter] Start destroying Publisher
2023-01-20 08:05:05.656  WARN 9964 --- [           main] c.a.nacos.common.notify.NotifyCenter     : [NotifyCenter] Destruction of the end
2023-01-20 08:05:05.656 ERROR 9964 --- [           main] c.a.n.c.l.StartingApplicationListener    : Nacos failed to start, please see C:\Users\Administrator\nacos\logs\nacos.log for more details.
2023-01-20 08:05:05.668  INFO 9964 --- [           main] ConditionEvaluationReportLoggingListener :

2. 排查经过:

     a. 开始进行了搜索, 一些博客指出在jdbc 连接加上queryTimeout=2400 试了之后并没有用

        又去github 的 nacos官网 issue 也进行查看,并没有解决办法,反而issue说是mysql问题,我对此事疑惑的因为我查看的报错的sql  ,是他慢呀,(心想是不是你sql 设计的有问题呀,索引没加? 最终是我错了...)怎么就是mysql问题了  ==!

SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > ? ORDER BY id ASC LIMIT ?,?

     b. debug跟踪nacos 的逻辑,看看是在什么地方调用了这条sql  缺省参数又是什么?

       

nacos 对接istio nacos connect timeout_服务器_02

nacos 对接istio nacos connect timeout_nacos 对接istio_03

1. 这块有个启动方法 跟进去

nacos 对接istio nacos connect timeout_nacos_04

nacos 对接istio nacos connect timeout_nacos 对接istio_05

nacos 对接istio nacos connect timeout_nacos_06

 2. 就是DumpAllProcessor 的循环。在全量扫描config_info 表。这个方法找到 sql 的参数



 SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > 0 ORDER BY id ASC LIMIT 0,1000

 3. 查看mysql 索引,判断是否是索引问题,进行新增,删减索引。nacos依旧启动失败!

 4.  执行计划: 也没看出来哪里有问题

explain  SELECT id,data_id,group_id,tenant_id,app_name,content,md5,gmt_modified,type,encrypted_data_key FROM config_info WHERE id > 0 ORDER BY id ASC LIMIT 0,1000

nacos 对接istio nacos connect timeout_服务器_07

 5. 查看config_info 表 第一页数据时,加载就很慢。使用sql ;select * from config_info 4s左右的查询时间。nacos连接肯定慢呀,但是又找不到加大nacos 连接时间的参数。使用驱动连接,进行查询

nacos 对接istio nacos connect timeout_java_08

 

nacos 对接istio nacos connect timeout_nacos 对接istio_09

   这条sql 查询时间是4s 多。现在思路就是要不加大 限制时间(queryTimeout 并没有用,也没找到..) 要不mysql调优将sql 查询时间降下来。

  6. 查看my.cnf 配置文件,增加配置,重启之后 错误依旧。查询了很多配置优化也并没有什么效果,依旧查询超时。

nacos 对接istio nacos connect timeout_服务器_10

 7. 现在的sql 如果将content 的字段 删除。 navicat 查询就很快,sql 将content 去掉 也很快。

nacos 对接istio nacos connect timeout_nacos_11

   然后将content字段设置为varchar 8099 并没有用,字段数据量太大,设置失败。设置为blob, text 设置成功,nacos依旧启动失败.

8. 单位发的电脑。配置不行,我将这个表复制给同事,跑在本地环境,navicat 是很快的,貌似是配置不行,去服务器,在本地查询这条sql。 发现很快就返回结构。此时思路有点清晰,但是还是有一层窗户纸,只知道这个中间传输环节出错了

9. 由于服务器的mysql ,和nacos 在相同地方,接下来,我又将 地址换为localhost 进行更换,nacos启动成功 ,暂时先用着。换为ip 地址就启动失败

nacos 对接istio nacos connect timeout_nacos_12

 10. 晚上,我将config_info 表 复制到买的小水管机器 也是很慢。但是复制到另外一台8核心32g的 腾讯云的机器上。速度竟然明显快。而且是快好几倍。产生了咨询下阿里云的想法。提交了下工单。进行问题的反馈

nacos 对接istio nacos connect timeout_mysql_13

 11. 最终确定问题,服务器带宽不够,这条sql 查询出来的数据量比较大,从本地查询出来,浪费在网络传输,5m 传的慢。 阿里客服那边有不同带宽可以测试,10 m 是两秒 100m 是0. 2s 意思

sql 的剖析得话, executing 有点浪费时间,也不知道executing 代表啥

nacos 对接istio nacos connect timeout_服务器_14

总结:

               1. 时间耗费:两天

               2. 客观条件:  公司网络不好,给配的机器配置低,导致不能快速排查问题,中间时断时续,搞不清,到底是哪里慢 = _=!(准备买台笔记本)

               3. 我的问题: 对于网络io,网络排查经验,排查工具不足。这块还要进行学习。补下 tcp/ip netty等

               4.  最后:学到varchar 设置最大数据 , github issue 提问题,现在已经到了问题都搜索不到的阶段, 需要研究源码,研究底层..

                                                                                                             

                                                                                                                    2023年祝大家宏图大展

                                                                                                                           写于腊月29