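This article grew out of my MySQL bug report, which is why it is written entirely in English. Some time ago I found that under certain conditions the MySQL server can leave a client process blocked forever during network congestion. I fixed the problem and reported the bug, along with the patch that fixes it, to the official MySQL team.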
In this article I share some of my findings about how the MySQL server handles the net write timeout, and how I located and fixed a bug that could cause a MySQL client process to block forever under certain conditions. I've filed a bug here and contributed my patch.
MySQL has a variable ‘net_write_timeout’, which the MySQL documentation describes as ‘The number of seconds to wait for a block to be written to a connection before aborting the write.’ There is also a variable ‘net_read_timeout’, described as ‘The number of seconds to wait for more data from a connection before aborting the read.’ The documentation adds: ‘When the server is reading from the client, net_read_timeout is the timeout value controlling when to abort. When the server is writing to the client, net_write_timeout is the timeout value controlling when to abort.’
However, I recently found that under certain conditions a client can be blocked permanently because of how the mysqld server handles a write that times out, or because of how a client reads from the mysqld server. To illustrate the issue, I’ll first describe the implementation of the relevant features.
Implementation
Network communication between client and server is implemented in the VIO module. On the server side, mysqld does buffered network writes: each client connection has a net write buffer into which results for the client are written, and when the buffer is full, or when there are no more results to write, the ‘net’ module sends the buffered bytes to the client in 16KB packets. On the client side, after a statement is sent to the server by a function such as mysql_real_query(), the client calls mysql_store_result(), which simply does a blocking read (recv()) from the server. So the recv() syscall returns only if the server sends more data or closes the socket connection.
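To make the client-side behavior concrete, here is a minimal C sketch of a read loop built on a blocking recv() with no timeout. It is not libmysqlclient code: the function name drain_until_eof() is made up for illustration, and where a real client would stop at the protocol's end-of-result marker, this sketch simply reads until end of stream. What it shows is that such a loop can only make progress when data arrives or the peer closes the connection.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for a client's result-reading loop (not MySQL code).
 * Reads until the peer closes the connection; returns the number of bytes
 * read, or -1 on a socket error. */
long drain_until_eof(int fd)
{
    char buf[4096];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        /* Blocking call with no timeout: it returns only when data
         * arrives, the peer closes the connection, or an error occurs. */
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;
        } else if (n == 0) {
            return total;   /* orderly close by the peer */
        } else if (errno != EINTR) {
            return -1;      /* real socket error */
        }
    }
}
```

If the peer neither sends more data nor closes the socket, the recv() call inside this loop never returns, which is the state the client ends up in when the bug is triggered.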
The core functions are vio_write(), net_write_raw_loop() and vio_socket_io_wait(). net_write_raw_loop() calls vio_write() to write the connection’s buffered result to the client packet by packet. vio_write() does a non-blocking send() to send a packet, and if send() would block, it calls vio_socket_io_wait()->vio_io_wait() to wait for the socket to become writable (i.e. for space to free up in the OS kernel’s socket buffer once the buffered data has been written to the network). vio_io_wait() calls poll() to do a timed wait, and if the socket is found writable, vio_write() tries send() again. However, if poll() times out, which can happen if ‘net_write_timeout’ is set small (e.g. 1) and a short network congestion occurs, net_write_raw_loop() returns an error and the execution of the SQL statement completes. The timeout error is simply ignored, and this is wrong! The server should have closed the socket connection so that the client can return from its blocked recv() syscall.
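The send-then-poll pattern described above can be sketched in a few lines of C. This is a simplified illustration, not the MySQL source: write_all_with_timeout() and its parameters are hypothetical names, the timeout is given in milliseconds rather than seconds, and MSG_DONTWAIT is used in place of a socket put into non-blocking mode.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not MySQL code): write `len` bytes to `fd`, waiting
 * at most `timeout_ms` each time the kernel send buffer is full.
 * Returns 0 on success, -1 on error; errno is ETIMEDOUT on a timeout. */
int write_all_with_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL);
    while (len > 0) {
        /* Non-blocking attempt, playing the role of vio_write()'s send(). */
        ssize_t n = send(fd, buf, len, MSG_DONTWAIT);
        if (n > 0) {                    /* partial or full write: advance */
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted: just retry */
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* hard socket error */

        /* send() would block: timed wait for the socket to become
         * writable, playing the role of vio_io_wait()'s poll(). */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;          /* the write timeout fired */
            return -1;
        }
        if (rc < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

The -1 return on timeout is the point where the caller has to decide what to do with the connection. If it just carries on, the peer has no way to learn that the rest of the data will never arrive.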
How to Reproduce
On the server side, prepare a table my_big_table with a huge amount of data, and set the global and session net_write_timeout to 1. To imitate network congestion reliably, we have to use gdb to block the execution of mysqld and the mysql client at the right places. Use gdb to attach to the mysqld process and set breakpoints at the functions vio_write() and vio_io_wait(). Then, in the client, issue ‘select * from my_big_table’ and almost immediately use gdb to attach to the mysql client process. You will most probably catch it at such a call stack; keep it stopped there.
Then, in the gdb session attached to the mysqld process, you will see many breakpoint hits in vio_write() (the server sending result packets to the client). Finally you will see vio_io_wait() being called (send() would block because the OS socket write buffer is full), and in vio_io_wait() this statement is executed, in the call stack below:
errno= SOCKET_ETIMEDOUT;
Then the stack unwinds and the statement execution finishes successfully, but only partial results have been sent to the client. If you execute ‘show processlist’ you will see something like the following:
On the client side, however, the query statement blocks forever at the call stack below:
Problem Analysis and Fix
Below is the code of net_write_raw_loop(). If vio_write() fails because of a timeout, it returns VIO_SOCKET_ERROR and the while loop breaks out; in the red box below, the ER_NET_WRITE_INTERRUPTED error is reported and net->error is set to 2. However, neither net->error nor the ER_NET_WRITE_INTERRUPTED error is ever checked, and no action is taken on them anywhere in the entire code base.
The right measure is to close the connection if thd->net.error is non-zero, which is what my patch does. Note that thd->net.error can be set to 1, 2 or 3 for different types of errors, all of which stop the sending of results to the client. In some of these cases my_error() is called to report an error and in others it is not called at all, so checking for specific errors reported by my_error() is neither reliable nor convenient.
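The idea behind the fix can be illustrated with a small C sketch. This is not the actual patch: struct conn and close_if_write_failed() are hypothetical stand-ins for thd->net and for whatever check runs after a statement finishes. It only shows the principle of testing the error flag for any non-zero value and dropping the connection.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-connection state (thd->net in mysqld). */
struct conn {
    int fd;      /* connected socket, or -1 once closed */
    int error;   /* 0 = ok; non-zero = sending results failed */
};

/* Called once the statement has finished executing. If any write error was
 * recorded, drop the connection so the peer's blocked recv() returns.
 * Returns 1 if the connection was closed, 0 if it was left open. */
int close_if_write_failed(struct conn *c)
{
    assert(c != NULL);
    if (c->error == 0 || c->fd < 0)
        return 0;                   /* nothing went wrong, or already closed */
    shutdown(c->fd, SHUT_RDWR);     /* peer's recv() now sees end of stream */
    close(c->fd);
    c->fd = -1;
    return 1;
}
```

Testing the flag for any non-zero value, rather than for one particular error code, mirrors the reasoning above: every value of the flag means the client will not receive a complete result.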
I've filed a bug here and contributed my patch.