Environment

  • Red Hat Enterprise Linux 4
  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7

Issue

  • I am seeing packet loss due to socket buffer overrun.
  • netstat -s reports packet loss due to a low socket buffer.

Resolution

  • The TCP socket buffer sizes are set by the following files:

Raw

$ cat /proc/sys/net/ipv4/tcp_rmem 
4096 87380 4194304
$ cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304

  • The TCP socket buffer is a flexible buffer that handles incoming and outgoing packets at the kernel level, and it changes size based on the current load. The three numbers in each file above are the size limits of the buffer: the first is the minimum size the buffer can shrink to, the middle is the default size a socket is opened with, and the last is the maximum size the buffer can grow to.
  • These values can be changed by adding the following parameters to the /etc/sysctl.conf file:

Raw

net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 87380 4194304

For machines with sufficient memory and 1 GbE or 10 GbE network cards, the maximum can be increased to 16 MB if buffer pruning is occurring. It is recommended that the machine be tested under a similar load with these values before they are put into production.

Raw

net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

  • These changes will need to be loaded by running:

Raw

# sysctl -p
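
  • The same values can also be applied to the running kernel with sysctl -w and then read back to confirm they took effect, as in the sketch below (values set this way do not persist across a reboot, so they still need to be present in /etc/sysctl.conf):

Raw

# sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
# sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
# sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem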

  • Be careful not to set the buffers too large, as oversized buffers can also hurt performance. These buffers use physical memory on the system: each TCP connection has its own socket buffers taking that amount of memory, so if the buffers are set very large and there are many connections, the system can run short of memory (see the worked example after this list). Also, each time data is read from or written to the buffers, the entire socket buffer must be processed; this means that if the buffer is larger than the typical packet that is sent or received, there is extra overhead, because the whole buffer is handled by the application regardless of how much data the socket actually contains.
  • For more information on the implications of increasing socket buffer sizes, please see the article: What are the implications of changing socket buffer sizes?
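
As a rough, purely illustrative worked example (the figures are assumptions, not measurements from this article): if a server holds 10,000 established TCP connections and every receive buffer has grown to the 16 MB maximum, the receive buffers alone could in the worst case account for roughly 10,000 × 16 MB ≈ 160 GB of kernel memory, before any send buffers are counted. In practice most buffers stay far below the maximum, but the worst case grows with the connection count, so the maximum should be chosen with the expected number of connections in mind.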

  • These settings depend on whether the application calls setsockopt(SO_RCVBUF) to set the buffer size manually.

If the sysctl memory settings above do not provide any increase in throughput, it is recommended to either increase the buffer size requested in the setsockopt(SO_RCVBUF) call or remove the setsockopt(SO_RCVBUF) call from the application code. Removing the setsockopt(SO_RCVBUF) call allows the kernel to auto-tune the buffer sizes.

  • Also note that when setsockopt(SO_RCVBUF) is in use, the buffer size set by the application is limited by the values of net.core.rmem_default and net.core.rmem_max. For example, if the application requests a 1 MB buffer but net.core.rmem_max is only 256 KB, the application socket buffer will be limited to that value instead.

Raw

SO_RCVBUF
Sets or gets the maximum socket receive buffer in bytes. The kernel doubles this value (to allow
space for bookkeeping overhead) when it is set using setsockopt(2), and this doubled value is
returned by getsockopt(2). The default value is set by the /proc/sys/net/core/rmem_default file, and
the maximum allowed value is set by the /proc/sys/net/core/rmem_max file. The minimum (doubled)
value for this option is 256.
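
  • To see which ceiling applies when the application sets the buffer itself, the core limits can be checked with sysctl; the output below is only an example, and actual values vary by release and configuration:

Raw

# sysctl net.core.rmem_default net.core.rmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 212992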

Diagnostic Steps

  • You can check whether the socket buffer is being overrun by looking at the output of netstat -s.

Raw

$ netstat -s | grep socket
1617 packets pruned from receive queue because of socket buffer overrun
798 TCP sockets finished time wait in fast timer
29 delayed acks further delayed because of locked socket
346941 packets collapsed in receive queue due to low socket buffer
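
  • Because these counters are cumulative since boot, what matters is whether they keep growing while the system is under load. A simple way to watch them over time is a loop such as the sketch below (the 10-second interval is arbitrary):

Raw

# while true; do date; netstat -s | grep -E 'pruned|collapsed'; sleep 10; done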

  • If there are packets being "pruned" or "collapsed" in the socket buffer, some tuning may be needed.
  • Please note that on servers where there is not much activity, you may not see any change in load or memory pressure.
    The sysctl changes will have more of an impact on a busy server with thousands of connections.
    Tests should be done accordingly.

Raw

BEFORE CHANGE (on a non-busy server)
10:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
10:50:01 AM 611988 15515156 96.21 1368 8560820 10050932 62.30 8101972 6045188 4604
11:00:01 AM 224920 15902224 98.61 1284 8917416 10182208 63.11 8204204 6331076 4952

AFTER CHANGE (on a non-busy server)
10:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
10:53:39 AM 752268 15374876 95.34 1168 8388744 10207324 63.26 8826552 5067064 768
10:53:40 AM 752832 15374312 95.33 1168 8388744 10206940 63.26 8826848 5067064 768
10:53:41 AM 752052 15375092 95.34 1168 8388744 10207068 63.26 8827452 5067064 880