TCP Offloading

I try not to get too techy in my blog here most of the time, so I apologize in advance for the density of the material.

I've got a few aging production servers that I'm responsible for (Windows 2008-era). Over the past year and a half, I've been having intermittent connectivity problems. Tons of exceptions are being thrown that indicate the server is just...dropping connections randomly. Dropping connections to the client, to the SQL Server, or really to anything that involves more than a tiny blip of network traffic. It's not constant enough to warrant a "drop everything and figure this out" response, so it's just been this constantly irritating background noise to more pressing software development concerns.

To add to the confusion, the error message read "The client disconnected. Invalid viewstate." We're not even using .NET's viewstate that much, so I was understandably puzzled. I made a few half-hearted attempts to track down the issue, but kept coming up empty.

After finally getting tired of the error and reading several articles on the subject (I'm glossing over quite a bit of poking at the code here), I found that there seemed to be an issue with Broadcom gigabit NICs using the TCP Offload Engine. This nifty feature is supposed to offload checksum calculation and some other TCP stack heavy lifting from the operating system to the network card in order to improve networking performance. The recommendation was to disable it entirely.
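For a sense of what "checksum calculation" means here, below is a rough Python sketch of the 16-bit ones' complement checksum described in RFC 1071, which is the kind of per-packet arithmetic an offload engine takes over. The function name `internet_checksum` is my own, and a real TCP checksum also covers a pseudo-header that this sketch leaves out, so treat it as an illustration rather than what any particular driver does.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF  # ones' complement of the folded sum


# Sample bytes from the worked example in RFC 1071
print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
```

Doing this for every packet on a busy link adds up, which is why pushing it down to the network card sounds attractive on paper.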

All our production servers have Broadcom ethernet cards, so I disabled it on all of them, and our dropped connection issues stopped completely. Fast-forward to several months later, and I'm debugging a problem wherein an application crashes when trying to download a large block of data from a remote SQL Server system, with the error: "A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)" Pretty obscure.

This system in particular is a virtual machine running on a Hyper-V server with a physical Intel card, so I dismissed TCP offloading as a possible cause and focused on the code. After losing a day running down that rabbit hole, I discovered from the "Troubleshooting Common Hyper-V Errors Part 3" article that you can, indeed, be using a TCP offloading engine without a custom driver.

The article recommends disabling only Large Send Offload, but I'm really sick of this issue. Rather than taking a stepwise approach to solving the problem, I've just gone ahead and disabled offloading for all servers that are pre-2012 since Microsoft disabled the feature in 2008 R2 onward for gigabit adapters (presumably because it's so problematic). If networking speed is compromised as a result, I'll upgrade the hardware in question.
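For anyone who wants to script the global part of this, here's a minimal Python sketch that wraps the two `netsh` commands I know of for turning off TCP Chimney Offload and IP task offload on 2008-era Windows. The function name `disable_offloading` and the `--apply` flag are my own inventions, it defaults to a dry run that only prints the commands, and it assumes an elevated prompt when actually applied. Large Send Offload is a per-adapter setting in the driver's advanced properties and isn't covered here.

```python
import subprocess
import sys

# Global offload settings, each paired with the value that turns it off.
OFFLOAD_COMMANDS = [
    ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
    ["netsh", "int", "ip", "set", "global", "taskoffload=disabled"],
]


def disable_offloading(dry_run=True):
    """Print the netsh commands, or run them when dry_run is False."""
    for cmd in OFFLOAD_COMMANDS:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if netsh reports a failure
            subprocess.run(cmd, check=True)
    return [list(cmd) for cmd in OFFLOAD_COMMANDS]


if __name__ == "__main__":
    # Hypothetical flag: pass --apply to actually change the settings
    disable_offloading(dry_run="--apply" not in sys.argv)
```

Running `netsh int tcp show global` afterwards should show whether the Chimney setting took effect.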

I hope I can save someone else the time figuring this issue out.
