Background:
At my company, Double 11 (Singles' Day) traffic is heavy, and the existing database version had known bugs and performance problems. Before Double 11 we therefore migrated to MySQL 5.7 and significantly upgraded the server specs; the production cutover went smoothly.
Problem:
After the database switch, the application's SQL monitoring picked up a new high-frequency statement:
select @@session.tx_read_only
At the same time, database monitoring showed an unusually high number of transaction rollbacks. (The monitoring charts are omitted because the data is sensitive.)
Investigation:
Issue 1: the new high-frequency SQL
The official documentation shows that this system variable was introduced in MySQL 5.6.5:
https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_tx_read_only
It was deprecated in 5.7.20 (replaced by transaction_read_only) and removed entirely in MySQL 8.0:
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_tx_read_only
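Since the deprecation only renames the variable (transaction_read_only replaces tx_read_only), the check itself does not change. A minimal, hedged JDBC sketch (not part of the original troubleshooting) that tries the new name first and falls back to the old one:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TxReadOnlyCheck {

    // Sketch: the variable is transaction_read_only from 5.7.20 / 8.0 on,
    // and tx_read_only on older servers; the value is the same 0/1 flag.
    public static boolean isSessionReadOnly(Connection conn) throws SQLException {
        try {
            return queryFlag(conn, "select @@session.transaction_read_only");
        } catch (SQLException e) {
            // pre-5.7.20 servers only know the old name
            return queryFlag(conn, "select @@session.tx_read_only");
        }
    }

    private static boolean queryFlag(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            return rs.next() && rs.getInt(1) != 0;
        }
    }
}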
Checking the corresponding implementation in the JDBC driver, the ConnectionImpl class contains the following:
/**
* Tests to see if the connection is in Read Only Mode. Note that prior to 5.6,
* we cannot really put the database in read only mode, but we pretend we can by
* returning the value of the readOnly flag
*
* @param useSessionStatus
* in some cases, for example when restoring connection with autoReconnect=true,
* we can rely only on saved readOnly state, so use useSessionStatus=false in that case
*
* @return true if the connection is read only
* @exception SQLException
* if a database access error occurs
*/
public boolean isReadOnly(boolean useSessionStatus) throws SQLException {
if (useSessionStatus && !this.isClosed && versionMeetsMinimum(5, 6, 5) && !getUseLocalSessionState() && getReadOnlyPropagatesToServer()) {
java.sql.Statement stmt = null;
java.sql.ResultSet rs = null;
try {
try {
stmt = getMetadataSafeStatement();
rs = stmt.executeQuery("select @@session.tx_read_only");
if (rs.next()) {
return rs.getInt(1) != 0; // mysql has a habit of tri+ state booleans
}
} catch (SQLException ex1) {
if (ex1.getErrorCode() != MysqlErrorNumbers.ER_MUST_CHANGE_PASSWORD || getDisconnectOnExpiredPasswords()) {
throw SQLError.createSQLException("Could not retrieve transation read-only status server", SQLError.SQL_STATE_GENERAL_ERROR, ex1,
getExceptionInterceptor());
}
}
} finally {
if (rs != null) {
try {
rs.close();
} catch (Exception ex) {
// ignore
}
rs = null;
}
if (stmt != null) {
try {
stmt.close();
} catch (Exception ex) {
// ignore
}
stmt = null;
}
}
}
return this.readOnly;
}
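Note the guard at the top of isReadOnly(): the query is only sent when useLocalSessionState is false and readOnlyPropagatesToServer is true, both of which are standard Connector/J connection properties. So, assuming you can accept the driver trusting its client-side session state, the probe can also be suppressed from the JDBC URL; a sketch with placeholder host and credentials:

import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcUrlExample {
    // Sketch only: useLocalSessionState / readOnlyPropagatesToServer are Connector/J 5.1
    // connection properties; verify the behavior against the driver version you run.
    public static Connection open() throws Exception {
        String url = "jdbc:mysql://127.0.0.1:3306/testdb"
                + "?useLocalSessionState=true"          // trust the client-side session state
                + "&readOnlyPropagatesToServer=false";  // skip the @@session.tx_read_only probe
        return DriverManager.getConnection(url, "testdb", "123456");
    }
}

Whether that trade-off is acceptable depends on whether anything relies on read-only state actually propagating to the server.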
Exactly when this method gets called is covered further down. If you are not familiar with JDBC, you can refer to this article:
Issue 2: the large number of transaction rollbacks. Two hypotheses:
Hypothesis 1: the application's transaction code is flawed and a large share of the transactions it opens end up rolled back. (The system was written over a decade ago; transaction management is hand-coded rather than handled by a framework.)
Analyzing the application code, however, showed that the number of transactions that actually roll back is far lower than the number of transactions opened, so this hypothesis was ruled out.
Hypothesis 2: since the application code itself is fine, the problem is hard to pin down there; a likely culprit is the database connection pool (we use DBCP 1.4).
How do we verify hypothesis 2?
Verifying hypothesis 2
1. Set up a local MySQL 5.7.
Use Docker to spin up a MySQL 5.7 instance quickly (Docker syntax not covered here):
docker run --name mysql5.7 -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7
Also enable MySQL's general query log to capture the statements the server receives
(see the official documentation: https://dev.mysql.com/doc/refman/5.7/en/query-log.html; a minimal way to switch it on is sketched below).
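For reference, the general log can also be switched on from a client session; a minimal sketch using JDBC (the root credentials are simply the ones from the Docker command above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EnableGeneralLog {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://127.0.0.1:3306/mysql", "root", "123456");
             Statement stmt = conn.createStatement()) {
            stmt.execute("SET GLOBAL log_output = 'TABLE'");  // log into the mysql.general_log table
            stmt.execute("SET GLOBAL general_log = 'ON'");
            // inspect later with: SELECT event_time, argument FROM mysql.general_log ORDER BY event_time
        }
    }
}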
2. Write a simple local transaction test:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.annotation.Resource;
import javax.sql.DataSource;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
/**
*
* @author hzwumsh
* Created 2018-03-16 14:04
*
*/
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "classpath:spring-test.xml")
public class TestDb
{
@Resource
private DataSource dataSource;
@Test
public void test1() throws SQLException
{
for(int i = 1;i < 10;i++)
{
Connection connection = dataSource.getConnection();
connection.setTransactionIsolation(2); // isolation level 2 = READ_COMMITTED
PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)");
preparedStatement.setString(1, "1");
preparedStatement.executeUpdate();
preparedStatement.close();
connection.close();
}
for(int i = 1;i < 10;i++)
{
Connection connection = dataSource.getConnection();
connection.setTransactionIsolation(2); // isolation level 2 = READ_COMMITTED
connection.setAutoCommit(false);
PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)");
preparedStatement.setString(1, "1");
preparedStatement.executeUpdate();
connection.commit();
preparedStatement.close();
connection.close();
}
for(int i = 1;i < 10;i++)
{
Connection connection = dataSource.getConnection();
connection.setTransactionIsolation(2); // isolation level 2 = READ_COMMITTED
PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO user (test1) VALUE (?)");
preparedStatement.setString(1, "1");
preparedStatement.executeUpdate();
preparedStatement.close();
connection.close();
}
}
}
Then run the test and watch the general log; below is the complete log of one operation:
event_time user_host thread_id server_id command_type argument
2018-03-16 06:34:59.121870 testdb[testdb] @ [172.17.0.1] 14 0 Query SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED
2018-03-16 06:34:59.122509 testdb[testdb] @ [172.17.0.1] 14 0 Query SET autocommit=0
2018-03-16 06:34:59.123312 testdb[testdb] @ [172.17.0.1] 14 0 Query INSERT INTO user (test1) VALUE ('1')
2018-03-16 06:34:59.123858 testdb[testdb] @ [172.17.0.1] 14 0 Query commit
2018-03-16 06:34:59.126554 testdb[testdb] @ [172.17.0.1] 14 0 Query select @@session.tx_read_only
2018-03-16 06:34:59.127432 testdb[testdb] @ [172.17.0.1] 14 0 Query rollback
2018-03-16 06:34:59.128008 testdb[testdb] @ [172.17.0.1] 14 0 Query SET autocommit=1
Every time an explicit transaction completes, there is one query asking the server whether the session is read-only, then a rollback, then autocommit is turned back on. (This reproduces the behavior seen in production.)
Stepping through the code in a debugger while watching the database log shows:
1. In the first loop, which does not open an explicit transaction (autocommit stays on), the last three statements do not appear.
2. In the second loop, which opens an explicit transaction (autocommit off), the extra statements above do appear.
The relevant code path is:
PoolableConnection's close method:
/**
* Returns me to my pool.
*/
public synchronized void close() throws SQLException {
if (_closed) {
// already closed
return;
}
boolean isUnderlyingConectionClosed;
try {
isUnderlyingConectionClosed = _conn.isClosed();
} catch (SQLException e) {
try {
_pool.invalidateObject(this); // XXX should be guarded to happen at most once
} catch(IllegalStateException ise) {
// pool is closed, so close the connection
passivate();
getInnermostDelegate().close();
} catch (Exception ie) {
// DO NOTHING the original exception will be rethrown
}
throw (SQLException) new SQLException("Cannot close connection (isClosed check failed)").initCause(e);
}
if (!isUnderlyingConectionClosed) {
// Normal close: underlying connection is still open, so we
// simply need to return this proxy to the pool
try {
_pool.returnObject(this); // XXX should be guarded to happen at most once
} catch(IllegalStateException e) {
// pool is closed, so close the connection
passivate();
getInnermostDelegate().close();
} catch(SQLException e) {
throw e;
} catch(RuntimeException e) {
throw e;
} catch(Exception e) {
throw (SQLException) new SQLException("Cannot close connection (return to pool failed)").initCause(e);
}
} else {
// Abnormal close: underlying connection closed unexpectedly, so we
// must destroy this proxy
try {
_pool.invalidateObject(this); // XXX should be guarded to happen at most once
} catch(IllegalStateException e) {
// pool is closed, so close the connection
passivate();
getInnermostDelegate().close();
} catch (Exception ie) {
// DO NOTHING, "Already closed" exception thrown below
}
throw new SQLException("Already closed.");
}
}
close() returns the connection to the pool; the key call is:
_pool.returnObject(this);
DBCP's default pool is:
GenericObjectPool, or
AbandonedObjectPool (which extends GenericObjectPool).
The pool's returnObject is implemented as follows:
/**
* <p>Returns an object instance to the pool.</p>
*
* <p>If {@link #getMaxIdle() maxIdle} is set to a positive value and the number of idle instances
* has reached this value, the returning instance is destroyed.</p>
*
* <p>If {@link #getTestOnReturn() testOnReturn} == true, the returning instance is validated before being returned
* to the idle instance pool. In this case, if validation fails, the instance is destroyed.</p>
*
* <p><strong>Note: </strong> There is no guard to prevent an object
* being returned to the pool multiple times. Clients are expected to
* discard references to returned objects and ensure that an object is not
* returned to the pool multiple times in sequence (i.e., without being
* borrowed again between returns). Violating this contract will result in
* the same object appearing multiple times in the pool and pool counters
* (numActive, numIdle) returning incorrect values.</p>
*
* @param obj instance to return to the pool
*/
public void returnObject(Object obj) throws Exception {
try {
addObjectToPool(obj, true);
} catch (Exception e) {
if (_factory != null) {
try {
_factory.destroyObject(obj);
} catch (Exception e2) {
// swallowed
}
// TODO: Correctness here depends on control in addObjectToPool.
// These two methods should be refactored, removing the
// "behavior flag", decrementNumActive, from addObjectToPool.
synchronized(this) {
_numActive--;
allocate();
}
}
}
}
/**
* <p>Adds an object to the pool.</p>
*
* <p>Validates the object if testOnReturn == true and passivates it before returning it to the pool.
* if validation or passivation fails, or maxIdle is set and there is no room in the pool, the instance
* is destroyed.</p>
*
* <p>Calls {@link #allocate()} on successful completion</p>
*
* @param obj instance to add to the pool
* @param decrementNumActive whether or not to decrement the active count
* @throws Exception
*/
private void addObjectToPool(Object obj, boolean decrementNumActive) throws Exception {
boolean success = true;
if(_testOnReturn && !(_factory.validateObject(obj))) {
success = false;
} else {
_factory.passivateObject(obj);
}
boolean shouldDestroy = !success;
// Add instance to pool if there is room and it has passed validation
// (if testOnreturn is set)
synchronized (this) {
if (isClosed()) {
shouldDestroy = true;
} else {
if((_maxIdle >= 0) && (_pool.size() >= _maxIdle)) {
shouldDestroy = true;
} else if(success) {
// borrowObject always takes the first element from the queue,
// so for LIFO, push on top, FIFO add to end
if (_lifo) {
_pool.addFirst(new ObjectTimestampPair(obj));
} else {
_pool.addLast(new ObjectTimestampPair(obj));
}
if (decrementNumActive) {
_numActive--;
}
allocate();
}
}
}
// Destroy the instance if necessary
if(shouldDestroy) {
try {
_factory.destroyObject(obj);
} catch(Exception e) {
// ignored
}
// Decrement active count *after* destroy if applicable
if (decrementNumActive) {
synchronized(this) {
_numActive--;
allocate();
}
}
}
}
The pool's default factory is PoolableConnectionFactory. When a connection is returned to the pool, the factory's passivateObject method is called:
public void passivateObject(Object obj) throws Exception {
if(obj instanceof Connection) {
Connection conn = (Connection)obj;
if(!conn.getAutoCommit() && !conn.isReadOnly()) {
conn.rollback();
}
conn.clearWarnings();
if(!conn.getAutoCommit()) {
conn.setAutoCommit(true);
}
}
if(obj instanceof DelegatingConnection) {
((DelegatingConnection)obj).passivate();
}
}
So when the outermost wrapper is closed, the connection is handed back through the factory and runs through the checks above. With an explicit transaction, autoCommit is false, and because the server is 5.6.5 or later, isReadOnly() issues select @@session.tx_read_only and returns false, so a rollback is always executed here, followed by turning autocommit back on.
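Given those checks in passivateObject, one application-level mitigation that needs no pool change is to restore autocommit yourself before close(), so getAutoCommit() is already true when the connection is passivated and neither the read-only probe nor the rollback fires. A sketch under that assumption (dataSource being the same DBCP 1.4 pool as in the test above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class ResetAutoCommitBeforeClose {
    public static void insertInTx(DataSource dataSource) throws SQLException {
        Connection connection = dataSource.getConnection();
        boolean committed = false;
        try {
            connection.setAutoCommit(false);
            try (PreparedStatement ps =
                         connection.prepareStatement("INSERT INTO user (test1) VALUE (?)")) {
                ps.setString(1, "1");
                ps.executeUpdate();
            }
            connection.commit();
            committed = true;
        } finally {
            try {
                if (!committed) {
                    connection.rollback();      // explicit rollback on failure paths
                }
                connection.setAutoCommit(true); // SET autocommit=1 happens here, not in passivate()
            } finally {
                connection.close();
            }
        }
    }
}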
At this point the root cause is essentially found. Can the problem be fixed by upgrading the connection pool?
Comparing versions shows that DBCP 2.x adds a flag (rollbackOnReturn) that controls whether this rollback is performed at all:
public void passivateObject(final PooledObject<PoolableConnection> p)
throws Exception {
validateLifetime(p);
final PoolableConnection conn = p.getObject();
Boolean connAutoCommit = null;
if (rollbackOnReturn) {
connAutoCommit = Boolean.valueOf(conn.getAutoCommit());
if(!connAutoCommit.booleanValue() && !conn.isReadOnly()) {
conn.rollback();
}
}
conn.clearWarnings();
// DBCP-97 / DBCP-399 / DBCP-351 Idle connections in the pool should
// have autoCommit enabled
if (enableAutoCommitOnReturn) {
if (connAutoCommit == null) {
connAutoCommit = Boolean.valueOf(conn.getAutoCommit());
}
if(!connAutoCommit.booleanValue()) {
conn.setAutoCommit(true);
}
}
conn.passivate();
}
Code:
However, turning this rollback off carries its own risk: if the application opens a transaction but, because of an exception or some other edge case, never commits it, the uncommitted work stays on the connection; once the connection is reused, it can end up being committed together with the next transaction.
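For completeness, a hedged sketch of what the DBCP 2.x configuration could look like; the setters below are from commons-dbcp2's BasicDataSource, so double-check the names against the exact 2.x version in use:

import org.apache.commons.dbcp2.BasicDataSource;

public class Dbcp2Config {
    // Sketch: rollbackOnReturn controls the rollback shown above. Leaving it true is
    // the safe default; false avoids the extra round trips but relies on the
    // application never returning a connection with an open transaction.
    public static BasicDataSource create() {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:mysql://127.0.0.1:3306/testdb");
        ds.setUsername("testdb");
        ds.setPassword("123456");
        ds.setRollbackOnReturn(false);          // skip the rollback when returning connections
        ds.setEnableAutoCommitOnReturn(true);   // still restore autocommit on return
        return ds;
    }
}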
Investigating Alibaba's open-source connection pool, Druid, shows that it can solve this problem. Its pooled connection's close() is implemented as follows:
@Override
public void close() throws SQLException {
if (this.disable) {
return;
}
DruidConnectionHolder holder = this.holder;
if (holder == null) {
if (dupCloseLogEnable) {
LOG.error("dup close");
}
return;
}
DruidAbstractDataSource dataSource = holder.getDataSource();
boolean isSameThread = this.getOwnerThread() == Thread.currentThread();
if (!isSameThread) {
dataSource.setAsyncCloseConnectionEnable(true);
}
if (dataSource.isAsyncCloseConnectionEnable()) {
syncClose();
return;
}
for (ConnectionEventListener listener : holder.getConnectionEventListeners()) {
listener.connectionClosed(new ConnectionEvent(this));
}
List<Filter> filters = dataSource.getProxyFilters();
if (filters.size() > 0) {
FilterChainImpl filterChain = new FilterChainImpl(dataSource);
filterChain.dataSource_recycle(this);
} else {
recycle();
}
this.disable = true;
}
close() ends up calling recycle():
public void recycle() throws SQLException {
if (this.disable) {
return;
}
DruidConnectionHolder holder = this.holder;
if (holder == null) {
if (dupCloseLogEnable) {
LOG.error("dup close");
}
return;
}
if (!this.abandoned) {
DruidAbstractDataSource dataSource = holder.getDataSource();
dataSource.recycle(this);
}
this.holder = null;
conn = null;
transactionInfo = null;
closed = true;
}
recycle() in turn calls the data source's recycle method:
/**
* Recycle a connection.
*/
protected void recycle(DruidPooledConnection pooledConnection) throws SQLException {
final DruidConnectionHolder holder = pooledConnection.holder;
if (holder == null) {
LOG.warn("connectionHolder is null");
return;
}
if (logDifferentThread //
&& (!isAsyncCloseConnectionEnable()) //
&& pooledConnection.ownerThread != Thread.currentThread()//
) {
LOG.warn("get/close not same thread");
}
final Connection physicalConnection = holder.conn;
if (pooledConnection.traceEnable) {
Object oldInfo = null;
activeConnectionLock.lock();
try {
if (pooledConnection.traceEnable) {
oldInfo = activeConnections.remove(pooledConnection);
pooledConnection.traceEnable = false;
}
} finally {
activeConnectionLock.unlock();
}
if (oldInfo == null) {
if (LOG.isWarnEnabled()) {
LOG.warn("remove abandonded failed. activeConnections.size " + activeConnections.size());
}
}
}
final boolean isAutoCommit = holder.underlyingAutoCommit;
final boolean isReadOnly = holder.underlyingReadOnly;
final boolean testOnReturn = this.testOnReturn;
try {
// check need to rollback?
if ((!isAutoCommit) && (!isReadOnly)) {
pooledConnection.rollback();
}
// reset holder, restore default settings, clear warnings
boolean isSameThread = pooledConnection.ownerThread == Thread.currentThread();
if (!isSameThread) {
final ReentrantLock lock = pooledConnection.lock;
lock.lock();
try {
holder.reset();
} finally {
lock.unlock();
}
} else {
holder.reset();
}
if (holder.discard) {
return;
}
if (physicalConnection.isClosed()) {
lock.lock();
try {
activeCount--;
closeCount++;
} finally {
lock.unlock();
}
return;
}
if (testOnReturn) {
boolean validate = testConnectionInternal(holder, physicalConnection);
if (!validate) {
JdbcUtils.close(physicalConnection);
destroyCount.incrementAndGet();
lock.lock();
try {
activeCount--;
closeCount++;
} finally {
lock.unlock();
}
return;
}
}
if (!enable) {
discardConnection(holder.conn);
return;
}
boolean result;
final long lastActiveTimeMillis = System.currentTimeMillis();
lock.lock();
try {
activeCount--;
closeCount++;
result = putLast(holder, lastActiveTimeMillis);
recycleCount++;
} finally {
lock.unlock();
}
if (!result) {
JdbcUtils.close(holder.conn);
LOG.info("connection recyle failed.");
}
} catch (Throwable e) {
holder.clearStatementCache();
if (!holder.discard) {
this.discardConnection(physicalConnection);
holder.discard = true;
}
LOG.error("recyle error", e);
recycleErrorCountUpdater.incrementAndGet(this);
}
}
Druid, like DBCP, checks whether autocommit is off and whether the connection is read-only before deciding to roll back; note, though, that it reads those flags from the locally cached holder fields (underlyingAutoCommit, underlyingReadOnly) instead of querying the server. So why does Druid avoid the needless rollback itself?
The reason is in the pooled connection's rollback() implementation:
@Override
public void rollback() throws SQLException {
if (transactionInfo == null) {
return;
}
if (holder == null) {
return;
}
DruidAbstractDataSource dataSource = holder.getDataSource();
dataSource.incrementRollbackCount();
try {
conn.rollback();
} catch (SQLException ex) {
handleException(ex, null);
} finally {
handleEndTransaction(dataSource, null);
}
}
Druid's pooled connection keeps track of the current transaction in transactionInfo. Once the transaction has been committed, that transaction info is cleared, so when deciding whether to roll back, the connection already knows there is nothing left to roll back. This is where it is smarter than DBCP.
That concludes the analysis; I solved the problem by switching to Druid.
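For reference, a minimal sketch of a Druid data source replacing the DBCP 1.4 pool (all values are placeholders to be tuned for the actual workload):

import javax.sql.DataSource;
import com.alibaba.druid.pool.DruidDataSource;

public class DruidConfig {
    public static DataSource create() {
        DruidDataSource ds = new DruidDataSource();
        ds.setUrl("jdbc:mysql://127.0.0.1:3306/testdb");
        ds.setUsername("testdb");
        ds.setPassword("123456");
        ds.setInitialSize(5);
        ds.setMaxActive(20);
        ds.setValidationQuery("SELECT 1");  // used when testOnBorrow / testWhileIdle are enabled
        return ds;
    }
}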