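Background
Production runs on PolarDB, based on MySQL (5.7), with a Spring Boot + MyBatis-Plus + Druid connection pool stack. Some business scenarios involve updates and queries on large tables. When a slow SQL statement runs longer than a certain time, a "Communications link failure" exception is thrown. It shows up mainly when UI queries, or scheduled jobs that process large batches of data, execute database updates and queries.
Symptoms
Regardless of whether it is peak business hours, as soon as a slow SQL statement exceeds a certain duration (10 seconds in this case), the exception persists no matter how the connection pool parameters or the PolarDB instance parameters are adjusted.
The exception is as follows: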
### Cause: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet successfully received from the server was 10,011 milliseconds ago. The last packet sent successfully to the server was 10,011 milliseconds ago.
; Communications link failure
The last packet successfully received from the server was 10,011 milliseconds ago. The last packet sent successfully to the server was 10,011 milliseconds ago.; nested exception is com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet successfully received from the server was 10,011 milliseconds ago. The last packet sent successfully to the server was 10,011 milliseconds ago.
	at org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:100)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
	at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:91)
	at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:441)
	at com.sun.proxy.$Proxy152.selectList(Unknown Source)
	at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:224)
	at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.executeForIPage(MybatisMapperMethod.java:121)
	at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:85)
	at com.baomidou.mybatisplus.core.override.MybatisMapperProxy$PlainMethodInvoker.invoke(MybatisMapperProxy.java:148)
	at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:89)
	at com.sun.proxy.$Proxy535.selectPage(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor2431.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
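Solution notes
Following solutions suggested by ChatGPT and found through web searches, I tuned the connection pool size, the time-related parameters, and the connection eviction settings, including timeout settings such as max-wait and connect-timeout, as follows:
# MySQL configuration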
spring.datasource.type = com.alibaba.druid.pool.DruidDataSource
#spring.datasource.druid.driverClassName = net.sf.log4jdbc.sql.jdbcapi.DriverSpy
spring.datasource.druid.driverClassName = com.mysql.cj.jdbc.Driver
# For MySQL 8.x, use spring.datasource.druid.url
spring.datasource.druid.url = jdbc:mysql://xxx:3306/xxx?serverTimezone=GMT%2B8&characterEncoding=utf8&autoReconnect=true
spring.datasource.druid.username = xxx
spring.datasource.druid.password = xxx
spring.datasource.druid.initial-size = 5
spring.datasource.druid.min-idle = 10
spring.datasource.druid.max-active = 500
spring.datasource.druid.max-wait = 30000
spring.datasource.druid.connect-timeout = 30000
spring.datasource.druid.query-timeout = 30000
spring.datasource.druid.transaction-query-timeout = 30000
spring.datasource.druid.time-between-eviction-runs-millis = 60000
spring.datasource.druid.min-evictable-idle-time-millis = 300000
spring.datasource.druid.max-evictable-idle-time-millis = 900000
spring.datasource.druid.test-while-idle = true
spring.datasource.druid.test-on-borrow = false
spring.datasource.druid.test-on-return = false
spring.datasource.druid.validation-query = select 1
spring.datasource.druid.webStatFilter.enabled = true
spring.datasource.druid.stat-view-servlet.enabled = true
spring.datasource.druid.stat-view-servlet.url-pattern = /druid/*
spring.datasource.druid.stat-view-servlet.reset-enable = false
spring.datasource.druid.filter.stat.enabled = true
spring.datasource.druid.filter.stat.log-slow-sql = true
spring.datasource.druid.filter.stat.slow-sql-millis = 1000
spring.datasource.druid.filter.stat.merge-sql = true
spring.datasource.druid.filter.wall.config.multi-statement-allow = true
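On the PolarDB/MySQL instance, the connect_timeout parameter was also set to more than 10 seconds.
None of this solved the problem. At the business layer, indexes were optimized and forced indexes were applied, and some data was migrated to ES/IoTDB, but the root issue is that unavoidable slow SQL on large tables still exists in MySQL, so the exception kept occurring.
Finally, I found a highly similar problem in the Druid GitHub issues, with the following solution:
Add the socketTimeout parameter to the connection URL, as follows: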
spring.datasource.druid.url = jdbc:mysql://xxx:3306/xxx?serverTimezone=Asia/Shanghai&characterEncoding=utf8&useSSL=false&autoReconnect=true&socketTimeout=30000
Restarting the service resolved the problem.
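If the URL is assembled in code rather than written by hand, the same change can be sketched in Java. This is only an illustration under assumptions: the class JdbcUrlTimeouts and the method withSocketTimeout are hypothetical names, not part of Druid or Connector/J, and the helper only handles the simple `?key=value&key=value` URL form used in this article. socketTimeout is the Connector/J URL property used above, expressed in milliseconds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Druid or Connector/J): sets socketTimeout on a JDBC URL.
final class JdbcUrlTimeouts {
    // Matches an existing socketTimeout parameter together with its leading '?' or '&'.
    private static final Pattern SOCKET_TIMEOUT = Pattern.compile("([?&]socketTimeout=)\\d+");

    /** Returns the URL with socketTimeout set to millis, replacing an existing value or appending one. */
    static String withSocketTimeout(String jdbcUrl, int millis) {
        Matcher m = SOCKET_TIMEOUT.matcher(jdbcUrl);
        if (m.find()) {
            // Keep everything around the old value and swap in the new one.
            return jdbcUrl.substring(0, m.start()) + m.group(1) + millis + jdbcUrl.substring(m.end());
        }
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "socketTimeout=" + millis;
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://xxx:3306/xxx?serverTimezone=Asia/Shanghai&characterEncoding=utf8"
                + "&useSSL=false&autoReconnect=true";
        System.out.println(withSocketTimeout(url, 30000));
    }
}
```

The value should be chosen larger than the slowest statement that is expected to finish legitimately; 30000 here simply mirrors the value used in this article.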
Note that this approach only addresses exceptions whose message resembles "The last packet successfully received from the server was 10,011 milliseconds ago. The last packet sent successfully to the server was 10,011 milliseconds ago."
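To check whether a logged failure belongs to this category, one option is to extract the elapsed time from the message and compare it with the timeout you suspect. The sketch below is a rough diagnostic aid, not a definitive classifier: the class LinkFailureMessage, its method names, and the tolerance parameter are all hypothetical, and the 10-second threshold is just the value observed in this article.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper: reads the elapsed time out of a "Communications link failure" message.
final class LinkFailureMessage {
    private static final Pattern LAST_RECEIVED = Pattern.compile(
            "last packet successfully received from the server was ([\\d,]+) milliseconds ago");

    /** Returns the milliseconds since the last received packet, or empty if the message has no such text. */
    static OptionalLong millisSinceLastReceived(String message) {
        Matcher m = LAST_RECEIVED.matcher(message);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        // The driver prints the number with thousands separators, e.g. "10,011".
        return OptionalLong.of(Long.parseLong(m.group(1).replace(",", "")));
    }

    /** True if the elapsed time is at or slightly above the suspected timeout (within toleranceMillis). */
    static boolean looksLikeSocketTimeout(String message, long timeoutMillis, long toleranceMillis) {
        OptionalLong elapsed = millisSinceLastReceived(message);
        return elapsed.isPresent()
                && elapsed.getAsLong() >= timeoutMillis
                && elapsed.getAsLong() - timeoutMillis <= toleranceMillis;
    }

    public static void main(String[] args) {
        String msg = "Communications link failure The last packet successfully received from the server "
                + "was 10,011 milliseconds ago.";
        System.out.println(looksLikeSocketTimeout(msg, 10_000, 500));
    }
}
```

An elapsed time that sits just above a round value such as 10,000 ms on every occurrence suggests a fixed client-side timeout rather than a random network drop.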
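Tracing through the stack trace and the source code of Druid's DruidDataSource shows that the spring.datasource.druid.socket-timeout setting suffers from a parameter-override problem, and that setting connect-timeout does not resolve this exception either.
As for why some queries bypass their indexes and turn into slow SQL that times out, this depends heavily on index column order and structure, the query conditions, and the query optimizer. Optimizing the code, optimizing the indexes, or upgrading to a database version that fixes the optimizer's index selection are also things developers need to evaluate and work on.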