WCF timeout exception: detailed investigation

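We have an application with a WCF service (*.svc) running on IIS7 and various clients querying the service. The server is running Windows Server 2008. The clients are running either Windows Server 2008 or Windows Server 2003. I am getting the following exception, which I have seen can in fact be related to a large number of potential WCF issues.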

System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:59.9320000. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The HTTP request to 'http://www.domain.com/WebServices/myservice.svc/gzip' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout.

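I have increased the timeout to 30 minutes and the error still occurred. This tells me that something else is at play, because the quantity of data could never take 30 minutes to upload or download.

The error comes and goes. At the moment, it is more frequent. It does not seem to matter whether I have 3 clients running simultaneously or 100; it still occurs once in a while. Most of the time there are no timeouts, but I still get a few per hour. The error comes from any of the methods that are invoked. One of these methods takes no parameters and returns a small amount of data. Another takes a large amount of data as a parameter but executes asynchronously. The errors always originate from the client and never reference any code on the server in the stack trace. The stack trace always ends with: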

at System.Net.HttpWebRequest.GetResponse()
at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)

On the server: I have tried (and currently have) the following binding settings:

maxBufferSize="2147483647" maxReceivedMessageSize="2147483647" maxBufferPoolSize="2147483647"

It does not seem to have any impact.

I have tried (and currently have) the following throttling settings:

<serviceThrottling maxConcurrentCalls="1500" maxConcurrentInstances="1500" maxConcurrentSessions="1500"/>

It does not seem to have any impact.

I currently have the following settings for the WCF service.

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Single)]

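I ran with ConcurrencyMode.Multiple for a while, and the error still occurred.

I have tried restarting IIS, restarting the underlying SQL Server, and restarting the machine. None of these seems to have any impact.

I have tried disabling the Windows firewall. It does not seem to have any impact.

On the client, I have the following settings: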

maxReceivedMessageSize="2147483647"

<system.net>
    <connectionManagement>
        <add address="*" maxconnection="16"/>
    </connectionManagement>
</system.net>

My client closes its connection:

var client = new MyClient();

try
{
    return client.GetConfigurationOptions();
}
finally
{
    client.Close();
}
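A side note on the snippet above: if the channel has already faulted (for example after a timeout), Close() can itself throw and hide the original exception. The following is a minimal sketch of the commonly used close-or-abort pattern, assuming MyClient is a generated ClientBase<T> proxy. The names ClientCleanup and CloseOrAbort are hypothetical and do not come from the original code.

```csharp
using System;
using System.ServiceModel;

public static class ClientCleanup
{
    // Hypothetical helper: closes the proxy gracefully and falls back to
    // Abort() when the channel is faulted or the close handshake fails.
    public static void CloseOrAbort(ICommunicationObject client)
    {
        if (client == null) return;
        try
        {
            if (client.State == CommunicationState.Faulted)
                client.Abort();
            else
                client.Close();
        }
        catch (CommunicationException)
        {
            client.Abort();
        }
        catch (TimeoutException)
        {
            client.Abort();
        }
    }
}
```

In the finally block, ClientCleanup.CloseOrAbort(client) would replace the direct client.Close() call. This does not explain the timeout by itself, but it keeps a failing Close() from masking the first error.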

I have changed the registry settings to allow more outgoing connections:

MaxConnectionsPerServer=24, MaxConnectionsPer1_0Server=32.

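I recently tried SvcTraceViewer.exe. I managed to catch one exception on the client side. I see that its duration is 1 minute. Looking at the server-side trace, I can see that the server is not aware of this exception. The maximum duration I can see there is 10 seconds.

I looked at the active database connections on the server using exec sp_who. I only have a few (2-3). I looked at the TCP connections from one client using TCPView. There are usually around 2-3, and I have seen up to 5 or 6.

Simply put, I am stumped. I have tried everything I could find, and I must be missing something very simple that a WCF expert would be able to see. My gut feeling is that something is blocking my clients at a low level (TCP) before the server actually receives the message, and/or that something is queuing the messages at the server level and never letting them be processed.

If there are any performance counters I should look at, please let me know. (Please indicate which values are bad, as some of these counters are hard to decipher.) Also, how can I log the WCF message size? Finally, are there any tools that would let me test how many connections I can establish between my client and server (independently of my application)?

Thanks for your time!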

Extra information added June 20th:

My WCF application does something similar to the following.

while (true)
{
    Step1GetConfigurationSettingsFromServerViaWCF(); // can change between calls
    Step2GetWorkUnitFromServerViaWCF();
    DoWorkLocally(); // takes 5-15 minutes
    Step3SendBackResultsToServerViaWCF();
}

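Using Wireshark, I did see that when the error occurs there are five TCP retransmissions followed by a TCP reset. My guess is that the RST comes from WCF killing the connection. The exception report I get is from Step 3 timing out.

I discovered this by looking at the TCP stream with the filter "tcp.stream eq 192". I then expanded the filter to "tcp.stream eq 192 and http and http.request.method eq POST" and saw 6 POSTs in that stream. This seemed odd, so I checked another stream, such as tcp.stream eq 100. There I had three POSTs, which seems more normal because I am making three calls. However, I close my connection after every WCF call, so I would have expected one call per stream (but I don't know much about TCP).

Investigating further, I dumped the HTTP packet payloads to disk to see what these six calls were.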

1) Step3
2) Step1
3) Step2
4) Step3 - corrupted
5) Step1
6) Step2

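My guess is that two concurrent clients are using the same connection, which is why I saw duplicates. However, I still have a few issues that I cannot make sense of:

A) Why is the packet corrupted? A random network fluke, perhaps? The payload is gzipped using this sample code: http://msdn.microsoft.com/en-us/library/ms751458.aspx. Could that code occasionally misbehave when used concurrently? I should test without the gzip library.

B) Why would I see Step 1 and Step 2 running after the corrupted operation timed out? It seems to me that these operations should not have occurred. Maybe I am not looking at the right stream, because my understanding of TCP is flawed. I have other streams that occur at the same time, and I should investigate them: a quick glance at streams 190-194 shows that the Step 3 POSTs there have proper payload data (not corrupted). That pushes me to look at the gzip library again.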


from: http://www.codeproject.com/KB/WCF/WCF_Operation_Timeout_.aspx

To avoid this timeout error, we need to configure the OperationTimeout property for Proxy in the WCF client code. This configuration is something new unlike other configurations such as Send Timeout, Receive Timeout etc., which I discussed early in the article. To set this operation timeout property configuration, we have to cast our proxy to IContextChannel in WCF client application before calling the operation contract methods.
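As a minimal sketch of the cast described in that article, assuming a generated ClientBase<T> proxy whose InnerChannel is available: the class and method names (ProxyTimeouts, SetOperationTimeout) are hypothetical, while IContextChannel.OperationTimeout is the property the article refers to.

```csharp
using System;
using System.ServiceModel;

public static class ProxyTimeouts
{
    // Hypothetical helper: sets the per-operation timeout on a WCF channel.
    // For a generated ClientBase<T> proxy, pass proxy.InnerChannel.
    public static void SetOperationTimeout(IContextChannel channel, TimeSpan timeout)
    {
        if (channel == null) throw new ArgumentNullException("channel");
        channel.OperationTimeout = timeout;
    }
}
```

A call would look like ProxyTimeouts.SetOperationTimeout(client.InnerChannel, TimeSpan.FromMinutes(5)), made before invoking any operation on the proxy. The five-minute value is only an illustration.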

Did you try using clientVia to see the message sent, using SOAP toolkit or something like that? This could help to see if the error is coming from the client itself or from somewhere else.

I'm not a WCF expert but I'm wondering if you aren't running into a DDOS protection on IIS. I know from experience that if you run a bunch of simultaneous connections from a single client to a server at some point the server stops responding to the calls as it suspects a DDOS attack. It will also hold the connections open until they time-out in order to slow the client down in his attacks.

Multiple connections coming from different machines/IPs should not be a problem, however.

There's more info in this MSDN post:

http://msdn.microsoft.com/en-us/library/bb463275.aspx

Check out the MaxConcurrentSessions property.

Did you check the WCF traces? WCF has a tendency to swallow exceptions and only return the last exception, which is the timeout that you're getting, since the end point didn't return anything meaningful.

If you haven't tried it already, wrap your server-side WCF operations in try/finally blocks and add logging to ensure they are actually returning.
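One way to do that without repeating the try/finally in every operation is a small wrapper. This is only a sketch: OperationLogger and Run are hypothetical names, and it writes to System.Diagnostics.Trace on the assumption that a trace listener is configured. Any other logger could be substituted.

```csharp
using System;
using System.Diagnostics;

public static class OperationLogger
{
    // Hypothetical helper: runs a service operation and logs its start,
    // any failure, and the elapsed time, even when the operation throws.
    public static T Run<T>(string operationName, Func<T> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        Trace.TraceInformation("{0} started", operationName);
        try
        {
            return operation();
        }
        catch (Exception ex)
        {
            Trace.TraceError("{0} failed: {1}", operationName, ex);
            throw;
        }
        finally
        {
            Trace.TraceInformation("{0} finished after {1} ms",
                operationName, stopwatch.ElapsedMilliseconds);
        }
    }
}
```

Inside a service method, the body would become something like return OperationLogger.Run("GetConfigurationOptions", () => LoadOptions()); where LoadOptions stands in for the existing implementation. If every "started" line has a matching "finished" line, the operations are returning and the problem lies further down the stack.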

If those show that the Operations are completing, then my next step would be to go to a lower level, and look at the actual transport layer.

Wireshark or another similar packet capturing tool can be quite helpful at this point. I'm assuming this is running over HTTP on standard port 80.

Run Wireshark on the client. In the Options when you start the capture, set the capture filter to tcp port 80 and host service.example.com (replace the host with your server). This will reduce the amount of irrelevant traffic.

If you can, modify your client to notify you the exact start time of the call, and the time when the timeout occurred. Or just monitor it closely.

When you get an error, then you can trawl through the Wireshark logs to find the start of the call. Right click on the first packet that has your client calling out on it (Should be something like GET /service.svc or POST /service.svc) and select Follow TCP Stream.

Wireshark will decode the entire HTTP Conversation, so you can ensure that WCF is actually sending back responses.

I'm having a very similar problem. In the past, this has been related to serialization problems. If you are still having this problem, can you verify that you can correctly serialize the objects you are returning. Specifically, if you are using Linq-To-Sql objects that have relationships, there are known serialization problems if you put a back reference on a child object to the parent object and mark that back reference as a DataMember.

You can verify serialization by writing a console app that serializes and deserializes your objects using the DataContractSerializer on the server side and whatever serialization method your client uses. For example, in our current application, we have both WPF and Compact Framework clients. I wrote a console app to verify that I can serialize using a DataContractSerializer and deserialize using an XmlSerializer. You might try that.
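A minimal sketch of such a check follows. It uses DataContractSerializer for both directions, so it is a simplification of the mixed-serializer setup described above. SerializationCheck and RoundTrip are hypothetical names.

```csharp
using System.IO;
using System.Runtime.Serialization;

public static class SerializationCheck
{
    // Hypothetical helper: writes the object to memory with
    // DataContractSerializer and reads it back. A SerializationException
    // here points at a type that would also fail on the wire.
    public static T RoundTrip<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            stream.Position = 0;
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

In a console app, call RoundTrip on a representative instance of each type your operations return, ideally one loaded from real data so that child collections are populated.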

Also, if you are returning Linq-To-Sql objects that have child collections, you might try to ensure that you have eagerly loaded them on the server side. Sometimes, because of lazy loading, the objects being returned are not populated and may cause the behavior you are seeing where the request is sent to the service method multiple times.

If you have solved this problem, I'd love to hear how because I'm stuck with it too. I have verified that my issue is not serialization so I'm at a loss.

UPDATE: I'm not sure if it will help you any but the Service Trace Viewer Tool just solved my problem after 5 days of very similar experience to yours. By setting up tracing and then looking at the raw XML, I found the exceptions that were causing my serialization problems. It was related to Linq-to-SQL objects that occasionally had more child objects than could be successfully serialized. Adding the following to your web.config file should enable tracing:

<system.diagnostics>
    <sharedListeners>
        <add name="sharedListener"
             type="System.Diagnostics.XmlWriterTraceListener"
             initializeData="c:\Temp\servicetrace.svclog" />
    </sharedListeners>
    <sources>
        <source name="System.ServiceModel" switchValue="Verbose, ActivityTracing">
            <listeners>
                <add name="sharedListener" />
            </listeners>
        </source>
        <source name="System.ServiceModel.MessageLogging" switchValue="Verbose">
            <listeners>
                <add name="sharedListener" />
            </listeners>
        </source>
    </sources>
</system.diagnostics>

The resulting file can be opened with the Service Trace Viewer Tool or just in IE to examine the results.

If you are using a .NET client, you may not have set the connection limit:

//This says how many outgoing connection you can make to a single endpoint. Default Value is 2
System.Net.ServicePointManager.DefaultConnectionLimit = 200;

Here is the original question and answer: WCF Service Throttling

Update:

This setting goes in the .NET client application, at startup or at any later point, as long as it runs before you start your tests.

You can also put it in the app.config file, as follows:

<system.net>
<connectionManagement>
<add maxconnection = "200" address ="*" />
</connectionManagement>
</system.net>

You will also receive this error if you pass an object back to the client that contains a property of an enum type that has not been explicitly set, and that enum has no member that maps to 0, e.g. enum MyEnum { a = 1, b = 2 };
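The following sketch is one way to reproduce that case outside WCF, assuming the service uses the default DataContractSerializer. The Result class and the EnumDefaultCheck helper are hypothetical. MyEnum mirrors the example above.

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public enum MyEnum
{
    [EnumMember] a = 1,
    [EnumMember] b = 2
}

[DataContract]
public class Result
{
    // Left unset, this property holds (MyEnum)0, which matches no member.
    [DataMember]
    public MyEnum Status { get; set; }
}

public static class EnumDefaultCheck
{
    // Hypothetical helper: reports whether DataContractSerializer
    // accepts the object or rejects it with a SerializationException.
    public static bool CanSerialize(object value)
    {
        var serializer = new DataContractSerializer(value.GetType());
        try
        {
            using (var stream = new MemoryStream())
            {
                serializer.WriteObject(stream, value);
            }
            return true;
        }
        catch (SerializationException)
        {
            return false;
        }
    }
}
```

With this setup, CanSerialize(new Result()) is expected to fail while CanSerialize(new Result { Status = MyEnum.a }) succeeds. Adding a member that maps to 0, or always assigning the property, avoids the problem.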

Are you closing the connection to the WCF service in between requests? If you don't, you'll see this exact timeout (eventually).

I've just solved the problem. I found that the nodes in the App.config file were configured incorrectly.

<client>
    <endpoint name="WCF_QtrwiseSalesService" binding="wsHttpBinding" bindingConfiguration="ws" address="http://cntgbs1131:9005/MyService/TGE.ISupplierClientManager" contract="*">
    </endpoint>
</client>

<bindings>
    <wsHttpBinding>
        <binding name="ws" maxBufferPoolSize="2147483647" maxReceivedMessageSize="2147483647" messageEncoding="Text">
            <readerQuotas maxDepth="2147483647" maxStringContentLength="2147483647" maxArrayLength="2147483647" maxBytesPerRead="2147483647" maxNameTableCharCount="2147483647"/>
            <security mode="None">
                <transport clientCredentialType="None"></transport>
            </security>
        </binding>
    </wsHttpBinding>
</bindings>

Confirm that in the <security> node the "mode" attribute is set to "None". If the value is "Transport", the error occurs.

Looks like this exception message is quite generic and can be received due to a variety of reasons. We ran into this while deploying the client on Windows 8.1 machines. Our WCF client runs inside of a windows service and continuously polls the WCF service. The windows service runs under a non-admin user. The issue was fixed by setting the clientCredentialType to "Windows" in the WCF configuration to allow the authentication to pass-through, as in the following:

<security mode="None">
    <transport clientCredentialType="Windows" proxyCredentialType="None"
               realm="" />
    <message clientCredentialType="UserName" algorithmSuite="Default" />
</security>