我们都学过TCP,HTTP的相关概念,本文借助协议分析工具Wireshark,让大家对一些概念眼见为实,权当温故而知新。
场景:
在Client(10.239.196.211)上通过web browser访问另一台Server(10.239.9.22)上的web server.
步骤:
0. 首先配置Wireshark -> Edit -> Preference -> Protocol:
如下配置的HTTP包显示效果与TCP segment的时序比较一致,便于理解。
1. 用浏览器访问 http://10.239.9.22:8080/,同时打开Wireshark, 抓包如下:
2. 抓到的内容太多,先用Wireshark filter expression过滤一下: http
3. 然后我们找到了关心的packet, 右键 Follow -> TCP Stream,找到所在的TCP Stream
于是得到了如下比较清爽的结果,呈现的即是"HTTP请求所用到的那个TCP connection"(在HTTP 1.1的实现中,同一个TCP Connection可以被多个HTTP通信复用。)
4. 一些分析
HTTP 层把下层的TCP的PDU (也就是TCP segment)重新组装成自己的PDU(也就是HTTP message)。在各个协议层,有不同的PDU,每个协议层要做2件事情:(i)把自己待发送的PDU交给下层去发送 (ii)把下层的收到PDU拿过来组装成自己的PDU。
TCP只负责发送自己的PDU,也就是TCP segment,TCP会把收到的数据放到自己的buffer里,那么何时这些数据会传送给上层的HTTP呢?或者换句话说,HTTP如何知道该用哪些 TCP segment来拼成一个HTTP message呢?这就是PSH flag的作用。观察下图,我们会看到PSH flag总是出现在一个TCP传输最后一部分的HTTP message时。
关于PSH的权威解释,可以看这个帖子:https://ask.wireshark.org/questions/20423/pshack-wireshark-capture
引用如下:
PSH is an indication by the sender that, if the receiving machine's TCP implementation has not yet provided the data it's received to the code that's reading the data (program, or library used by a program), it should do so at that point. To quote RFC 793, the official specification for TCP:
The data that flows on a connection may be thought of as a stream of octets. The sending user indicates in each SEND call whether the data in that call (and any preceeding calls) should be immediately pushed through to the receiving user by the setting of the PUSH flag.
A sending TCP is allowed to collect data from the sending user and to send that data in segments at its own convenience, until the push function is signaled, then it must send all unsent data. When a receiving TCP sees the PUSH flag, it must not wait for more data from the sending TCP before passing the data to the receiving process.
There is no necessary relationship between push functions and segment boundaries. The data in any particular segment may be the result of a single SEND call, in whole or part, or of multiple SEND calls.
The purpose of push function and the PUSH flag is to push data through from the sending user to the receiving user. It does not provide a record service.
我们实验中的web server端的HTTP层就相当于引文中的sending user,实验中的web browser端的HTTP层就相当于引文中的receving process。
PSH flag的作用有2方面:
- 对于发送方,就是把所有尚未发送的数据立刻发送出去
- 对于接收方,就是不要再等未到达的数据,而把当前收到的数据全部交给上层协议。
所以,web server所在机器的TCP协议在发送一个HTTP reponse时,当它意识到这是最后一个TCP segment时,就会加上PSH flag,然后所有buffer中尚未发送的数据都会被发送出去;而web browser所在机器的TCP协议在看到一个TCP segment带有PSH flag时,就会意识到已经收到足够信息了,应当立即把到目前为止已经收到的信息交给上层应用(即HTTP协议)。
简单的说,PSH flag有点像我们写文件时的flush操作。
还有几点,稍微提一下:
- TCP segment (PDU)的数量不一定等于TCP ACK的数量
- 数据的完整性和次序,通过SEQ/ACK number保证,ACK表示了"当前本方从对方已经收到的数据量",SEQ表示"当前本方已经向对方发送的数据量"。所以,如果通信得以顺利完成,A方发给B方的ACK,最终应该等于B方最后一次的SEQ+LENGTH。