Jason4Zhu: TCP

Monday, May 28, 2018

HTTP部分时候请求长时间卡死超时问题排查思路

部分业务方用户反映有些时候hue刷新页面会卡死2-5min无响应。个人也遇到过几次，在页面访问卡住的同时，通过curl url:8890 -I -L一样会卡住无法返回页面内容。

通过记录事故时间点反查linux TCP相关信息，发现在卡死的时刻，TCP connection数量骤增到2-3k (ss -s)，同时TCcpExt的ListenDrops和TcpExt.ListenOverflows会相对较高：

此时感觉是因为sysctl中关于TCP连接的部分参数没有优化导致，所以添加了如下参数并重启了hue服务，但问题依旧。


net.ipv4.tcp_tw_reuse = 1

net.ipv4.tcp_tw_recycle = 1

net.ipv4.tcp_fin_timeout = 10

net.ipv4.tcp_keepalive_time = 1200

net.ipv4.tcp_max_syn_backlog = 65535

net.ipv4.tcp_max_tw_buckets = 400000

net.ipv4.tcp_orphan_retries = 1

net.ipv4.tcp_retries2 = 5

net.core.netdev_max_backlog = 512000

net.ipv4.tcp_syn_retries = 3

net.ipv4.tcp_synack_retries = 3

net.ipv4.tcp_slow_start_after_idle = 0

vm.min_free_kbytes = 360448

vm.zone_reclaim_mode = 1

net.core.somaxconn = 65535

#net.netfilter.nf_conntrack_max = 1655350

#net.netfilter.nf_conntrack_tcp_timeout_established = 7200

#net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10

kernel.shmall = 4294967296

kernel.shmmax = 68719476736

kernel.msgmax = 65536

kernel.msgmnb = 65536

此时想关注下到底是哪个进程导致了TCP连接数骤增，ss -n后发现主要为:50010端口。netstat -nlp | grep :50010后发现是DataNode服务。

此时通过dmesg发现如下报错内容：


[12522087.589828] TCP: request_sock_TCP: Possible SYN flooding on port 50010. Sending cookies.  Check SNMP counters.

[12547933.126955] TCP: request_sock_TCP: Possible SYN flooding on port 50010. Sending cookies.  Check SNMP counters.

也即TCP连接已经全被50010吃满了，导致新来的hue请求无法正常建立connection导致用户角度的卡死和超时（参考：https://help.marklogic.com/Knowledgebase/Article/View/182/0/possible-syn-flooding-messages-in-system-logs）。

同时，通过sar -n DEV 3发现下行带宽已经占了1.2GB/s, 已经达到机器对应的最大下行带宽，所以数据传输不出去也是可能的因素之一。

最终，将混部的服务拆分到独立的节点，问题解决。

Saturday, November 22, 2014

An Explanation On Too Many Socket Connections In CLOSE_WAIT State When Mapreduce Applications Are Running In Hadoop

Occasionally, we found that there are too many TCP connections with state CLOSE_WAIT on DataNode when our daily MapReduce tasks are running in Hadoop. At some time, the amount can be approximately up to 20000.

/usr/sbin/lsof -i tcp | grep -E 'CLOSE_WAIT|COMMAND'

Firstly, we should be familiar with the process of TCP three-way handshake as well as four-way termination, which is well illustrated in the following graph.

Let's explain more detail on four-way termination process which is related to our problem.

When data transfer between client and server is done, whoever calls close method first, it, as the 'active' end, will send a 'FIN M' to the other one, and transit to FIN_WAIT_1 state.

The 'passive' end sends a 'ACK M+1', after receiving which the 'active' end goes into state 'FIN_WAIT_2', back to the 'active' one and transits into state CLOSE_WAIT right after it receives the 'FIN M' signal.

When the 'passive' end invokes close method and sends 'FIN N' to the 'active' one again, the 'active' one transits to 'TIME_WAIT' state. After waiting for tcp_time_wait_interval amount of time to ensure that there are no leftover segments, the 'active' end send 'ACK N+1' back to the 'passive' one, and a TCP connection terminates.

After getting some general understanding of TCP four-way termination, I did an experiment to simulate the above process by reading file from HDFS. The code snippet is as below:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Created by jasonzhu on 20/11/14.
 */
public class HdfsCloseWait {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path pt=new Path("hdfs://ns1/user/supertool/zhudi/input/blk_1073752163");
        FileSystem fs = FileSystem.get(new Configuration());
        BufferedReader br=new BufferedReader(new InputStreamReader(fs.open(pt)));
        String line;
        line=br.readLine();
        System.out.println("sleep at phase 1");
        Thread.sleep(1*30*1000);
        while (line != null) {
            System.out.println(line);
            line = br.readLine();

        }
        System.out.println("sleep at phase 2");
        Thread.sleep(1*30*1000);
        IOUtils.closeStream(br);
        fs.close();
        System.out.println("sleep at phase 3");
        Thread.sleep(10*30*1000);
        System.out.println("done=["+pt.toString()+"]");
    }
}

Run it with command:

hadoop jar com.judking.mr-1.0-SNAPSHOT-jar-with-dependencies.jar HdfsCloseWait

At the same time, monitor TCP connection for this java process on every phase, which is printed out in my code, via command:

#get the PID of this java process
/usr/java/jdk1.7.0_11/bin/jps

#either of the following command will work for monitoring TCP connection
/usr/sbin/lsof -i tcp | grep PID
netstat -anp | grep PID

Result:

Phase	TCP connection state
phase 1	ESTABLISHED
phase 2	CLOSE_WAIT
phase 3	CLOSE_WAIT

The result shows that it will transit to CLOSE_WAIT state right after it has read to EOF. In this scenario, the HDFS server is the 'active' end.

It will stay at CLOSE_WAIT state no matter we call close() method or not. However, the TCP connection in CLOSE_WAIT will disappear after some time, which is before the java process dies, whereas it will last to the time the java process dies if we do not call close() method after reading the file.

Thus, it is just a normal TCP connection no matter how many CLOSE_WAIT TCP connections there are provided we remember to call close() method every time.

© 2014-2017 jason4zhu.blogspot.com All Rights Reserved
If transfering, please annotate the origin: Jason4Zhu