SocketException:Connection reset 异常排查

问题描述 上一期的需求上线之后,线上多了一个异常:Connection reset。如下:
代码语言:javascript
复制
[2017-03-22 00:45:00 ERROR] [creativeAuditTaskScheduler_Worker-9] (cn.com.ServiceImpl:169) - getAuditResult exception, call adx api failed. msg:I/O error on GET request for "https://biz/getAuditInfo?dspId=13":Connection reset; nested exception is java.net.SocketException: Connection reset
org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://biz/getAuditInfo?dspId=13":Connection reset; nested exception is java.net.SocketException: Connection reset
	at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:558)
	at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:511)
	at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:248)
	at cn.com.ServiceImpl.java:136)
	at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:98)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:262)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
	at com.sun.proxy.$Proxy28.getAuditResult(Unknown Source)
	at cn.com.QueryAuditResultTask.queryAuditResult(QueryAuditResultTask.java:117)
	at cn.com.QueryAuditResultTask.execute(QueryAuditResultTask.java:88)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:196)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
	at sun.security.ssl.InputRecord.read(InputRecord.java:480)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
	at org.apache.http.impl.conn.LoggingInputStream.read(LoggingInputStream.java:87)
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
	at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
	at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:52)
	at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:542)
	... 19 more

这里使用Spring RestTemplate调外部接口查询结果。Spring RestTemplate 配置如下:

代码语言:javascript
复制
    <bean id="ky.pollingConnectionManager" class="org.apache.http.impl.conn.PoolingHttpClientConnectionManager">
        <!--整个连接池的并发-->
        <property name="maxTotal" value="1000" />
        <!--每个路由的并发-->
        <property name="defaultMaxPerRoute" value="32" />
    </bean>
&lt;bean id=&#34;ky.httpClientBuilder&#34; class=&#34;org.apache.http.impl.client.HttpClientBuilder&#34; factory-method=&#34;create&#34;&gt;
    &lt;property name=&#34;connectionManager&#34; ref=&#34;ky.pollingConnectionManager&#34; /&gt;
    &lt;!--开启重试--&gt;
    &lt;property name=&#34;retryHandler&#34;&gt;
    &lt;!--新加的异常处理,只处理ConnectTimeoutException和UnknownHostException异常--&gt;
    &lt;!--上一版本使用的是org.apache.http.impl.client.DefaultHttpRequestRetryHandler--&gt;
    &lt;!--默认处理InterruptedIOException、UnknownHostException、ConnectException、SSLException 4种异常--&gt;
        &lt;bean class=&#34;cn.com.autodx.common.http.CustomRequestRetryHandler&#34;&gt;
            &lt;constructor-arg value=&#34;3&#34;/&gt;
            &lt;constructor-arg value=&#34;true&#34;/&gt;
            &lt;constructor-arg value=&#34;200&#34;/&gt;
        &lt;/bean&gt;
    &lt;/property&gt;
    &lt;!--开启keep-Alive--&gt;
    &lt;property name=&#34;keepAliveStrategy&#34;&gt;
        &lt;bean class=&#34;org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy&#34; /&gt;
    &lt;/property&gt;
    &lt;property name=&#34;defaultHeaders&#34;&gt;
        &lt;list&gt;
            &lt;bean class=&#34;org.apache.http.message.BasicHeader&#34;&gt;
                &lt;constructor-arg value=&#34;User-Agent&#34;/&gt;
                &lt;constructor-arg value=&#34;Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36&#34;/&gt;
            &lt;/bean&gt;
            &lt;bean class=&#34;org.apache.http.message.BasicHeader&#34;&gt;
                &lt;constructor-arg value=&#34;Accept-Encoding&#34;/&gt;
                &lt;constructor-arg value=&#34;gzip,deflate&#34;/&gt;
            &lt;/bean&gt;
            &lt;bean class=&#34;org.apache.http.message.BasicHeader&#34;&gt;
                &lt;constructor-arg value=&#34;Accept-Language&#34;/&gt;
                &lt;constructor-arg value=&#34;zh-CN&#34;/&gt;
            &lt;/bean&gt;
            &lt;bean class=&#34;org.apache.http.message.BasicHeader&#34;&gt;
                &lt;constructor-arg value=&#34;Connection&#34;/&gt;
                &lt;constructor-arg value=&#34;keep-alive&#34;/&gt;
            &lt;/bean&gt;
        &lt;/list&gt;
    &lt;/property&gt;
&lt;/bean&gt;

&lt;bean id=&#34;ky.httpClient&#34; factory-bean=&#34;ky.httpClientBuilder&#34; factory-method=&#34;build&#34; /&gt;

&lt;bean id=&#34;ky.clientHttpRequestFactory&#34; class=&#34;org.springframework.http.client.HttpComponentsClientHttpRequestFactory&#34;&gt;
    &lt;constructor-arg ref=&#34;ky.httpClient&#34;/&gt;
    &lt;!--连接超时时间,毫秒--&gt;
    &lt;property name=&#34;connectTimeout&#34; value=&#34;2000&#34;/&gt;
    &lt;!--读写超时时间,毫秒--&gt;
    &lt;property name=&#34;readTimeout&#34; value=&#34;5000&#34;/&gt;
&lt;/bean&gt;

&lt;bean id=&#34;restTemplate&#34; class=&#34;org.springframework.web.client.RestTemplate&#34;&gt;
    &lt;constructor-arg ref=&#34;ky.clientHttpRequestFactory&#34;/&gt;
    &lt;property name=&#34;errorHandler&#34;&gt;
        &lt;bean class=&#34;org.springframework.web.client.DefaultResponseErrorHandler&#34;/&gt;
    &lt;/property&gt;
    &lt;property name=&#34;messageConverters&#34;&gt;
        &lt;list&gt;
            &lt;bean class=&#34;org.springframework.http.converter.FormHttpMessageConverter&#34;/&gt;
            &lt;bean class=&#34;cn.com.autodx.common.jsonView.ViewAwareJsonMessageConverter&#34;/&gt;
            &lt;bean class=&#34;org.springframework.http.converter.StringHttpMessageConverter&#34;&gt;
                &lt;property name=&#34;supportedMediaTypes&#34;&gt;
                    &lt;list&gt;
                        &lt;value&gt;text/html;charset=UTF-8&lt;/value&gt;
                        &lt;value&gt;application/json&lt;/value&gt;
                    &lt;/list&gt;
                &lt;/property&gt;
            &lt;/bean&gt;
        &lt;/list&gt;
    &lt;/property&gt;
&lt;/bean&gt;</code></pre></div></div><p>本次需求,并没有修改逻辑,为什么会出现这种情况呢?只是网络关系,还是跟代码有关呢。我有几个疑问:</p><ol class="ol-level-0"><li>什么情况下会产生Connection reset?</li><li>长连接中,向server发请求,是先发送数据的,如果连接断开,应该是写数据异常,为什么是读数据异常呢?请求是否发送成功?发送之前有校验连接是否可用吗?</li><li>http连接池defaultMaxPerRoute什么意思?每个并发都建立一个长连接吗?</li><li>Connection reset之后,如何重新建立连接,继而继续进行业务交互?</li><li>RestTemplate中配置了重试,为什么没有重新发起连接?</li></ol><p>我们便来解答上面的问题。</p><h5 id="2q8ve" name="%E9%97%AE%E9%A2%981%EF%BC%8C%E4%BB%80%E4%B9%88%E6%83%85%E5%86%B5%E4%B8%8B%E4%BC%9A%E4%BA%A7%E7%94%9FConnection-reset%EF%BC%9F">问题1,什么情况下会产生Connection reset?</h5><p>网上搜一下,很多这样的打包附送的答案,如下:</p><blockquote><p> 第1个异常是java.net.BindException:Address already in use: JVM_Bind。该异常发生在服务器端进行new ServerSocket(port)(port是一个0,65536的整型值)操作时。异常的原因是以为与port一样的一个端口已经被启动,并进行监听。此时用netstat –an命令,可以看到一个Listending状态的端口。只需要找一个没有被占用的端口就能解决这个问题。

第2个异常是java.net.ConnectException: Connection refused: connect。该异常发生在客户端进行new Socket(ip, port)操作时,该异常发生的原因是或者具有ip地址的机器不能找到(也就是说从当前机器不存在到指定ip路由),或者是该ip存在,但找不到指定的端口进行监听。出现该问题,首先检查客户端的ip和port是否写错了,如果正确则从客户端ping一下服务器看是否能ping通,如果能ping通(服务服务器端把ping禁掉则需要另外的办法),则看在服务器端的监听指定端口的程序是否启动,这个肯定能解决这个问题。

第3个异常是java.net.SocketException: Socket is closed,该异常在客户端和服务器均可能发生。异常的原因是己方主动关闭了连接后(调用了Socket的close方法)再对网络连接进行读写操作。

第4个异常是java.net.SocketException: (Connection reset或者Connect reset by peer:Socket write error)。该异常在客户端和服务器端均有可能发生,引起该异常的原因有两个,第一个就是如果一端的Socket被关闭(或主动关闭或者因为异常退出而引起的关闭),另一端仍发送数据,发送的第一个数据包引发该异常(Connect reset by peer)。另一个是一端退出,但退出时并未关闭该连接,另一端如果在从连接中读数据则抛出该异常(Connection reset)。简单的说就是在连接断开后的读和写操作引起的。

第5个异常是java.net.SocketException: Broken pipe。该异常在客户端和服务器均有可能发生。在第4个异常的第一种情况中(也就是抛出SocketExcepton:Connect reset by peer:Socket write error后),如果再继续写数据则抛出该异常。前两个异常的解决方法是首先确保程序退出前关闭所有的网络连接,其次是要检测对方的关闭连接操作,发现对方关闭连接后自己也要关闭该连接。

这里我们关心的是第四个异常,即server已经关闭了连接,client仍然在从连接中读数据。

细节剖析
正常流程(成功日志)剖析

接下来,先逐步debug,分析调用成功的日志。

org.springframework.http.client.support.HttpAccessor.createRequest(URI url, HttpMethod method)

代码语言:javascript
复制
(org.springframework.web.client.RestTemplate:?) - Created GET request for "https://domainName/creative/getAuditInfo"
(org.springframework.web.client.RestTemplate:?) - Setting request Accept header to [application/json, application/*+json, text/html, application/json]
(org.apache.http.client.protocol.RequestAddCookies:?) - CookieSpec selected: default
(org.apache.http.client.protocol.RequestAuthCache:?) - Auth cache not set in the context

org.apache.http.impl.execchain.MainClientExec.execute

  1. 获取ConnectionRequest connManager.requestConnection(route, userToken); 调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection
代码语言:javascript
复制
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection request: [route: {s}->https://domainName:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]
  1. 根据ConnectionRequest,获取HttpClientConnection [调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection] 获取连接后,校验连接不可用,则关闭连接connection。后面判断connection没有打开,就会重新建立连接
代码语言:javascript
复制
(org.apache.http.wire:?) - http-outgoing-798 << "end of stream" 
 //isStale返回true     
(org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-798: Close connection    
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection leased: [id: 799][route: {s}->https://domainName:443][total kept alive: 0; route allocated: 1 of 32; total allocated: 1 of 1000]

3.如果managedConn没有打开, 则建立路由 establishRoute

代码语言:javascript
复制
(org.apache.http.impl.execchain.MainClientExec:?) - Opening connection {s}->https://domainName:443

调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect 调用 org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect 遍历地址集,新建socket,并与connection绑定,成功即返回。 以下是建立连接的过程

代码语言:javascript
复制
(org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:?) - Connecting to domainName/0.0.0.0:443

调用 sock = sf.connectSocket(connectTimeout, sock, host, remoteAddress, localAddress, context); //建立socket conn.bind(sock); //绑定socket

代码语言:javascript
复制
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Connecting socket to domainName/0.0.0.0:443 with timeout 2000
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Enabled protocols: [TLSv1]
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Enabled cipher suites:[]  //session.getCipherSuite()

调用 org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket sslsock.startHandshake();

代码语言:javascript
复制
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Starting handshake

调用 org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket.verifyHostname(sslsock, host.getHostName()); session可用,打印信息

代码语言:javascript
复制
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) - Secure session established
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  negotiated protocol: TLSv1  
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  negotiated cipher suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  peer principal: CN=*.home.com.cn, OU=IT, O="Beijing Autohome Information Technology Co., Ltd.", L=Beijing, ST=Beijing, C=CN
(org.apache.http.conn.ssl.SSLConnectionSocketFactory:?) -  issuer principal: CN=Symantec Class 3 Secure Server CA - G4, OU=Symantec Trust Network, O=Symantec Corporation, C=US

调用 org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect

代码语言:javascript
复制
(org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:?) - Connection established localhost:4357<->0.0.0.0:443
  1. 设置socket超时时间 managedConn.setSocketTimeout(timeout);
代码语言:javascript
复制
(org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-799: set socket timeout to 5000
  1. 开始处理请求 根据request头部参数,作相应操作
代码语言:javascript
复制
(org.apache.http.impl.execchain.MainClientExec:?) - Executing request GET /creative/getAuditInfo HTTP/1.1
(org.apache.http.impl.execchain.MainClientExec:?) - Target auth state: UNCHALLENGED
(org.apache.http.impl.execchain.MainClientExec:?) - Proxy auth state: UNCHALLENGED
  1. 处理请求 requestExecutor.execute(request, managedConn, context); 调用:org.apache.http.protocol.HttpRequestExecutor.execute 主要功能:
代码语言:javascript
复制
  HttpResponse response = doSendRequest(request, conn, context);
  if (response == null) {
      response = doReceiveResponse(request, conn, context);
  }

6.1 doSendRequest 6.1.1 发送请求头部 调用org.apache.http.impl.DefaultBHttpClientConnection.sendRequestHeader (确保socket有效,然后向缓存写请求头部,写完后打印以下日志)

代码语言:javascript
复制
(org.apache.http.headers:?) - http-outgoing-799 >> GET /creative/getAuditInfo HTTP/1.1
(org.apache.http.headers:?) - http-outgoing-799 >> Accept: application/json, application/*+json, text/html, application/json
(org.apache.http.headers:?) - http-outgoing-799 >> User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
(org.apache.http.headers:?) - http-outgoing-799 >> Accept-Encoding: gzip,deflate
(org.apache.http.headers:?) - http-outgoing-799 >> Accept-Language: zh-CN
(org.apache.http.headers:?) - http-outgoing-799 >> Connection: keep-alive
(org.apache.http.headers:?) - http-outgoing-799 >> Host: domainName

6.1.2 发送请求头部之后,如果请求带有entity,则继续发送entity,即conn.sendRequestEntity。这里对HTTP 1.0协议做了兼容判断

6.1.3 通过connection将所有缓存数据发送到服务端,并记录日志如下: (headers和wire的日志稍有不同。)

代码语言:javascript
复制
(org.apache.http.wire:?) - http-outgoing-799 >> "GET /creative/getAuditInfo HTTP/1.1[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Accept: application/json, application/*+json, text/html, application/json[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Accept-Encoding: gzip,deflate[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Accept-Language: zh-CN[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Connection: keep-alive[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "Host: domainName[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 >> "[\r][\n]"

6.2 doReceiveResponse 如果response为空,则接收服务端的响应

代码语言:javascript
复制
 response = conn.receiveResponseHeader();
 if (canResponseHaveBody(request, response)) {
     conn.receiveResponseEntity(response);
 }

6.2.1 接收响应头 receiveResponseHeader — 读入数据,头部和数据体,第一次读取内容【这里数据包读了两次】 << 表示接收数据,从输入流读入

代码语言:javascript
复制
(org.apache.http.wire:?) - http-outgoing-799 << "HTTP/1.1 200 OK[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Server: 10.29[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Date: Wed, 22 Mar 2017 01:50:00 GMT[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Content-Type: application/json;charset=UTF-8[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Transfer-Encoding: chunked[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Connection: keep-alive[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Pragma: no-cache[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Cache-Control: no-cache, no-store, max-age=0[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Expires: Thu, 01 Jan 1970 00:00:00 GMT[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "Content-Language: zh-CN[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "6f7[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "{"data":{......... data1

— 接收完之后,打印日志

代码语言:javascript
复制
(org.apache.http.headers:?) - http-outgoing-799 << HTTP/1.1 200 OK
(org.apache.http.headers:?) - http-outgoing-799 << Server: 10.29
(org.apache.http.headers:?) - http-outgoing-799 << Date: Wed, 22 Mar 2017 01:50:00 GMT
(org.apache.http.headers:?) - http-outgoing-799 << Content-Type: application/json;charset=UTF-8
(org.apache.http.headers:?) - http-outgoing-799 << Transfer-Encoding: chunked
(org.apache.http.headers:?) - http-outgoing-799 << Connection: keep-alive
(org.apache.http.headers:?) - http-outgoing-799 << Pragma: no-cache
(org.apache.http.headers:?) - http-outgoing-799 << Cache-Control: no-cache, no-store, max-age=0
(org.apache.http.headers:?) - http-outgoing-799 << Expires: Thu, 01 Jan 1970 00:00:00 GMT
(org.apache.http.headers:?) - http-outgoing-799 << Content-Language: zh-CN

6.2.2 如果响应含有消息体,则接收消息体

  1. 检查reuse策略和keepAlive策略,设置connection属性。 这里永久保留 (org.apache.http.impl.execchain.MainClientExec:?) - Connection can be kept alive indefinitely
org.springframework.web.client.RestTemplate
  1. 处理请求response = request.execute(); 日志如上 2.如果响应没有出错,则打印以下日志
代码语言:javascript
复制
(org.springframework.web.client.RestTemplate:?) - GET request for "https://domainName/creative/getAuditInfo?" resulted in 200 (OK)
  1. 读取扩展字段 extractData

3.1使用HttpMessageConverter读取并解析数据 调用org.springframework.http.converter.GenericHttpMessageConverter.read方法,从输入流中读取数据

代码语言:javascript
复制
(org.springframework.web.client.RestTemplate:?) - Reading [java.lang.String] as "application/json;charset=UTF-8" using [cn.com.autodx.common.jsonView.ViewAwareJsonMessageConverter@56a00a64]

3.2如果上一次的数据没有读全,这里会接着从输入流读取数据

代码语言:javascript
复制
(org.apache.http.wire:?) - http-outgoing-799 << "......  data2"
(org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "0[\r][\n]"
(org.apache.http.wire:?) - http-outgoing-799 << "[\r][\n]"

3.3 数据读取完毕,就释放连接 调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.releaseConnection

代码语言:javascript
复制
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection [id: 799][route: {s}->https://domainName:443] can be kept alive indefinitely
(org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection released: [id: 799][route: {s}->https://domainName:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]

RestTemplate调用完毕

代码语言:javascript
复制
(cn.com.autodx.creative.task.service.autohomeAdx.AutoHomeAdxServiceImpl:?) - getAuditResult .....

总结一下流程:

  • 创建连接请求
  • 根据连接请求的参数,从连接池中获取一个连接
  • 如果连接没有打开,则创建一个底层的socket连接。
  • 设置socket超时时间
  • 发送请求头部(如果请求中带有entity,则发送)
  • 接收响应(先接收头部,如果有主体,则接收)
  • 读取扩展数据(使用HttpMessageConverter读取并解析数据,读取完成后,关闭输入流及释放连接池中的连接)
  • 调用完毕,返回数据
本次异常日志剖析

接下来,查看失败日志: org.springframework.http.client.support.HttpAccessor.createRequest(URI url, HttpMethod method)

代码语言:javascript
复制
 (org.springframework.web.client.RestTemplate:?) - Created GET request for "https://domain/creative/getAuditInfo"
 (org.springframework.web.client.RestTemplate:?) - Setting request Accept header to [application/json, application/*+json, text/html, application/json]
 (org.apache.http.client.protocol.RequestAddCookies:?) - CookieSpec selected: default
 (org.apache.http.client.protocol.RequestAuthCache:?) - Auth cache not set in the context

org.apache.http.impl.execchain.MainClientExec.execute

  1. 获取ConnectionRequest connManager.requestConnection(route, userToken); 调用org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection
代码语言:javascript
复制
 (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection request: [route: {s}->https://domain:443][total kept alive: 1; route allocated: 1 of 32; total allocated: 1 of 1000]

2.根据ConnectionRequest,获取HttpClientConnection 调用 org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection 其间获取entry时,校验connection().isStale()。

代码语言:javascript
复制
    public boolean isStale() {
        if (!isOpen()) {
            return true;
        }
        try {
        	//如果测试结果返回-1说明不可用
            final int bytesRead = fillInputBuffer(1);
            return bytesRead < 0;
        } catch (final SocketTimeoutException ex) {
            //注意这里SocketTimeoutException时,认为是可用的
            return false;
        } catch (final IOException ex) {
            //有I/O异常,不可用
            return true;
        }
    }

这里读超时,返回连接可用。于是后面,就没有关闭连接。也就没有重新建立新连接。具体参见这篇文章。

代码语言:javascript
复制
 (org.apache.http.wire:?) - http-outgoing-766 << "[read] I/O error: Read timed out"
 (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection leased: [id: 766][route: {s}->https://domain:443][total kept alive: 0; route allocated: 1 of 32; total allocated: 1 of 1000]

设置socket超时时间 managedConn.setSocketTimeout(timeout);

代码语言:javascript
复制
 (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: set socket timeout to 5000

处理请求,并发送请求数据。处理过程同成功日志中的流程。

代码语言:javascript
复制
 (org.apache.http.impl.execchain.MainClientExec:?) - Executing request GET /creative/getAuditInfo HTTP/1.1
 (org.apache.http.impl.execchain.MainClientExec:?) - Target auth state: UNCHALLENGED
 (org.apache.http.impl.execchain.MainClientExec:?) - Proxy auth state: UNCHALLENGED
 (org.apache.http.headers:?) - http-outgoing-766 >> GET /creative/getAuditInfo HTTP/1.1
 (org.apache.http.headers:?) - http-outgoing-766 >> Accept: application/json, application/*+json, text/html, application/json
 (org.apache.http.headers:?) - http-outgoing-766 >> User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
 (org.apache.http.headers:?) - http-outgoing-766 >> Accept-Encoding: gzip,deflate
 (org.apache.http.headers:?) - http-outgoing-766 >> Accept-Language: zh-CN
 (org.apache.http.headers:?) - http-outgoing-766 >> Connection: keep-alive
 (org.apache.http.headers:?) - http-outgoing-766 >> Host: domain
 (org.apache.http.wire:?) - http-outgoing-766 >> "GET /creative/getAuditInfo HTTP/1.1[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Accept: application/json, application/*+json, text/html, application/json[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Accept-Encoding: gzip,deflate[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Accept-Language: zh-CN[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Connection: keep-alive[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "Host: domain[\r][\n]"
 (org.apache.http.wire:?) - http-outgoing-766 >> "[\r][\n]"

接收相应数据时,出现I/O异常,关闭连接,并向上抛出异常。

代码语言:javascript
复制
 (org.apache.http.wire:?) - http-outgoing-766 << "[read] I/O error: Connection reset"
 (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: Close connection
 (org.apache.http.impl.conn.DefaultManagedHttpClientConnection:?) - http-outgoing-766: Shutdown connection
 (org.apache.http.impl.execchain.MainClientExec:?) - Connection discarded
 (org.apache.http.impl.conn.PoolingHttpClientConnectionManager:?) - Connection released: [id: 766][route: {s}->https://domain:443][total kept alive: 0; route allocated: 0 of 32; total allocated: 0 of 1000]
 (cn.com.autodx.creative.task.service.autohomeAdx.AutoHomeAdxServiceImpl:169) - getAuditResult exception, call adx api failed. msg:I/O error on GET request for "https://domain/creative/getAuditInfo":Connection reset; nested exception is java.net.SocketException: Connection reset

对比失败与成功的日志,发现问题在于获取connection的时候,校验connection是否可用的操作上。此时服务器因为不可知的原因断开了连接(服务端不可以向客户端发数据),这里应该是没有按照正常流程进行四次挥手,所以客户端还保持着连接(可以向服务端发数据,但收不到数据)。测试连接时,客户端读超时(必然的),但此时认为连接可用,实际上不可用(不知道这里是不是认为给的1ms探测时间太短了,允许读超时?),然后就没有重新建立连接。将错误操作延迟到读取请求这一步。

现在可以回答前面的问题

  • 长连接中,向server发请求,是先发送数据的,如果连接断开,应该是写数据异常,为什么是读数据异常呢?请求是否发送成功?发送之前有校验连接是否可用吗? 本次异常发生在发送完请求,读取response的时候,所以是read异常。既然服务端连接断掉,请求应该是没有发送成功。发送之前有检查连接是否可用,然而检查认为连接可用。
  • http连接池defaultMaxPerRoute什么意思?每个并发都建立一个长连接吗? 为每一个路由建立一个连接池,连接数最大为defaultMaxPerRoute。每一个连接都是一个socket连接。如果配置为长连接,则是长连接。
  • Connection reset之后,如何重新建立连接,继而继续进行业务交互? 上面的分析,connection reset之后,把有问题的连接关闭掉了,所以,后面不会再使用这个连接,只要重试,一般是可以成功的。
  • RestTemplate中配置了重试,为什么没有重新发起连接? 在org.apache.http.impl.execchain.RetryExec的execute(调用MainClientExec的execute,但是被RestTemplate的doExecute调用)中,有如下重试处理:
代码语言:javascript
复制
    public CloseableHttpResponse execute(final HttpRoute route,
            final HttpRequestWrapper request, final HttpClientContext context,
            final HttpExecutionAware execAware) throws IOException, HttpException {
        ......
    //不停重试,停下的判断在retryHandler.retryRequest
    for (int execCount = 1;; execCount++) {
        try {
            //执行操作
            return this.requestExecutor.execute(route, request, context, execAware);
        } catch (final IOException ex) {
            ......

            //判断是否符合重试的条件   
            if (retryHandler.retryRequest(ex, execCount, context)) {
                //重试的话就打印日志
                if (this.log.isInfoEnabled()) {
                    this.log.info(&#34;I/O exception (&#34;+ ex.getClass().getName() +
                            &#34;) caught when processing request to &#34;
                            + route + &#34;: &#34; + ex.getMessage());
                }
                //是否可以重试
                if (!RequestEntityProxy.isRepeatable(request)) {
                    this.log.debug(&#34;Cannot retry non-repeatable request&#34;);
                    throw new NonRepeatableRequestException(&#34;Cannot retry request &#34; +
                            &#34;with a non-repeatable request entity&#34;, ex);
                }
                request.setHeaders(origheaders);
            } else {
                //不符合重试条件,就抛出异常
                if (ex instanceof NoHttpResponseException) {
                    final NoHttpResponseException updatedex = new NoHttpResponseException(
                            route.getTargetHost().toHostString() + &#34; failed to respond&#34;);
                    updatedex.setStackTrace(ex.getStackTrace());
                    throw updatedex;
                } else {
                    throw ex;
                }
            }
        }
    }
}</code></pre></div></div><p>上一个版本中,没有connection reset的Error日志,这一个版本就有了。如果说新版本上线之后网络才有问题,可能性小。所以,以前和现在都会偶尔有网络问题产生。

如果之前版本有网络问题,为什么没有Error日志呢?是否有重试呢?是否通过重试补救了呢?
查看一下以前的info日志,发现的确有重试,也的确补救成功(没有类似的Connection reset报出)。

代码语言:javascript
复制
//3.11
(org.apache.http.impl.execchain.RetryExec:?) - I/O exception (java.net.SocketException) caught when processing request to {s}->https://domainName:443: Connection reset

//3.12
(org.apache.http.impl.execchain.RetryExec:?) - I/O exception (java.net.SocketException) caught when processing request to {s}->https://domainName:443: Connection reset

然后查看新版本上线之后的日志,没有发现重试日志,说明SocketException能被DefaultHttpRequestRetryHandler处理,而不能被CustomRequestRetryHandler处理。在对比两者的异常处理类型,发现DefaultHttpRequestRetryHandler处理的ConnectException extends SocketException,所以能处理。因为,我们只要在CustomRequestRetryHandler中,加入SocketException或者ConnectException,就可以通过重试解决网络问题。

至此,我们搞清楚了这个Connection reset的问题,并且找到了解决方案,开心ing~~

补充与参考

几个重要的类说明:

  • MainClientExec
    执行流程核心类,execute方法。这里有源码分析,还不错。
  • PoolingHttpClientConnectionManager
    用来管理connection,方法如:requestConnection(从连接池中lease一个连接,连接的状态可能是关闭的),connect(真正建立底层socket连接),releaseConnection(释放连接池中的连接)等。这里有一篇分析。
  • 如何检查httpConnection连接是否有效
方法一,org.apache.http.HttpConnection的isStale()方法
代码语言:javascript
复制
    public boolean isStale() {
//没有打开,即socket为空,则不可用
if (!isOpen()) {
return true;
}
try {
//socket链路有了,测试链路是否可用
//这里的测试方法是查看很短的时间内(这里是1ms),是否可以从输入流中读到数据
//如果测试结果返回-1说明不可用
final int bytesRead = fillInputBuffer(1);
return bytesRead < 0;
} catch (final SocketTimeoutException ex) {
//注意这里SocketTimeoutException时,认为是可用的
return false;
} catch (final IOException ex) {
//有I/O异常,不可用
return true;
}
}
方法二、org.apache.http.HttpClientConnection的isResponseAvailable方法

有几种实现方法,但大都被弃用。查看org.apache.http.impl.DefaultBHttpClientConnection的实现。

代码语言:javascript
复制
    @Override
public boolean isResponseAvailable(final int timeout) throws IOException {
ensureOpen();
try {
return awaitInput(timeout);
} catch (final SocketTimeoutException ex) {
//这里与isStale不同,SocketTimeoutException时认为不可用
return false;
}
}

protected boolean awaitInput(final int timeout) throws IOException {
    if (this.inbuffer.hasBufferedData()) {
        return true;
    }
    //BHttpConnectionBase 中的fillInputBuffer方法,与isStale的测试方法一致
    fillInputBuffer(timeout);
    return this.inbuffer.hasBufferedData();
}</code></pre></div></div><p>比较两种方法:isResponseAvailable暂时没有找到使用的地方;isStale在MainClientExec.execute中有直接使用。使用的话需要配置staleConnectionCheckEnabled,即每次请求都要去检测(最高耗时30ms)一次,看起来有一点悲观锁的意思,对性能影响比较大。所以4.4版本开始,默认值为false,且功能被标识为过时。这种方法不推荐。</p><div class="rno-markdown-code"><div class="rno-markdown-code-toolbar"><div class="rno-markdown-code-toolbar-info"><div class="rno-markdown-code-toolbar-item is-type"><span class="is-m-hidden">代码语言:</span>javascript</div></div><div class="rno-markdown-code-toolbar-opt"><div class="rno-markdown-code-toolbar-copy"><i class="icon-copy"></i><span class="is-m-hidden">复制</span></div></div></div><div class="developer-code-block"><pre class="prism-token token line-numbers language-javascript"><code class="language-javascript" style="margin-left:0">if (config.isStaleConnectionCheckEnabled()) {
// validate connection
if (managedConn.isOpen()) {
    this.log.debug(&#34;Stale connection check&#34;);
    if (managedConn.isStale()) {
        this.log.debug(&#34;Stale connection detected&#34;);
        managedConn.close();
    }
}

}

方法三:官方推荐的org.apache.http.impl.conn.PoolingHttpClientConnectionManager#getValidateAfterInactivity()

对非活动的永久连接,每个validateAfterInactivity毫秒(默认2s)做一次链路检查,尽量确保在使用的时候是可用的。为什么说尽量呢?本文就是一个例子,2s检查没有问题,但在使用之前的2s内网络出了问题,这就没有办法了。

validateAfterInactivity的使用,在org.apache.http.pool.AbstractConnPool中,getPoolEntryBlocking方法。

代码语言:javascript
复制
if (entry.isExpired(System.currentTimeMillis())) {
entry.close();
} else if (this.validateAfterInactivity > 0) {
if (entry.getUpdated() + this.validateAfterInactivity <= System.currentTimeMillis()) {
if (!validate(entry)) {
entry.close();
}
}
}

结语

如果这篇文章对您有所帮助,或者有所启发的话,求一键三连:点赞、评论、收藏➕关注,您的支持是我坚持写作最大的动力。