While working this past week, I encountered a problem that I first solved in 1997, while working in Norcross for a small startup. The problem back then was a client/server connection going through a firewall and being shut down after 5 minutes of no activity. I was amazed that a firewall would shut down an idle connection after only 5 minutes, but that’s what was happening. To make matters worse, the firewall (Cyberguard) was hard-coded to shut down idle TCP sessions after 5 minutes, and you couldn’t modify that in any way.

Well, we replaced that firewall with a different product, but later that same year we hit the problem again, this time with the CheckPoint default session timeout of 60 minutes. We considered raising the setting to 2 or more hours, but realized the risks of doing so: leaving TCP connections open for long periods of time invites session hijacking. Since we were a security-conscious company, we decided to look for alternate solutions. We went back to the RFCs and really dug into TCP/IP settings and TCP tuning.

We looked at how the TCP stack is implemented in Windows and found several documents on how to modify the systems we were running. In fact, I used the TechNet article so often that I have its number memorized: Q120642. We made several registry modifications, and I even used this knowledge to write a document for CheckPoint FireWall-1 on how to tune the TCP stack on a Windows host that runs FireWall-1.

Several of the settings were modified to allow a high connection load; others we made on the servers sitting on the different segments behind the firewalls.

To improve Connection Load we modified these two settings:

ForwardBufferMemory – the default was enough for fifty 1480-byte packets, rounded to a multiple of 256 (ONLY 50!!!). We increased this to 5000. (Note: if you change this, you have to change NumForwardPackets as well.)

NumForwardPackets – the default here was enough for fifty packet headers. We increased this to 5000 as well. (Note: if you change this, also change ForwardBufferMemory.)

(For Windows servers running Internet-facing sites, where connection counts may be higher and you need to transmit more data, you may also want to modify the parameters listed above.)
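As a rough sanity check on the "fifty 1480-byte packets, rounded to a multiple of 256" description, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the value is expressed in bytes and that the rounding goes up to the next multiple of 256, and `forward_buffer_bytes` is a made-up helper name, not anything that ships with Windows.

```python
def forward_buffer_bytes(packets, packet_size=1480, multiple=256):
    """Bytes needed to buffer `packets` packets, rounded UP to a multiple.

    Assumption: the registry value is in bytes and rounding is upward.
    """
    raw = packets * packet_size
    # Ceiling division, then scale back up to the multiple.
    return -(-raw // multiple) * multiple


# Fifty packets: 50 * 1480 = 74000, rounded up to 74240.
default_size = forward_buffer_bytes(50)

# Five thousand packets: 5000 * 1480 = 7400000, rounded up to 7400192.
tuned_size = forward_buffer_bytes(5000)
```

If you size the buffer this way, check the result against the vendor documentation for your Windows version before writing anything to the registry.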

We modified much more, but this isn’t the forum for it. However, Microsoft has actually written a very nice document on how to tune your 2008 servers for many different scenarios. You can find it here: http://www.microsoft.com/whdc/system/sysperf/Perf_tun_srv.mspx (read it… it’s actually well written!)

Now for you Linux guys, I know… you want to know how to tune your stacks too! Well, all I can say is learn your distro.  Use Google.  Or better yet, let me do that for you…Click Here for Linux TCP Tuning Tips

BUT I DIGRESS….

This is really about trying to get people to OPEN their minds and think outside the box.  No, wait… No it’s not.  It’s about getting people to open their minds and listen to reason.  Here are some interesting facts about TCP keep-alives.

  • RFC 1122 states “A “keep-alive” mechanism periodically probes the other end of a connection when the connection is otherwise idle, even when there is no data to be sent. The TCP specification does not include a keep-alive mechanism because it could: (1) cause perfectly good connections to break during transient Internet failures; (2) consume unnecessary bandwidth (“if no one is using the connection, who cares if it is still good?”); and (3) cost money for an Internet path that charges for packets.”
  • It goes on to state, “To confirm that an idle connection is still active, these implementations send a probe segment designed to elicit a response from the peer TCP. Such a segment generally contains SEG.SEQ = SND.NXT-1 and may or may not contain one garbage octet of data. Note that on a quiet connection SND.NXT = RCV.NXT, so that this SEG.SEQ will be outside the window. Therefore, the probe causes the receiver to return an acknowledgment segment, confirming that the connection is still live. If the peer has dropped the connection due to a network partition or a crash, it will respond with a RST instead of an acknowledgment segment.”
  • This RFC was written in 1989!!!
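The "outside the window" remark in the RFC text is easy to miss, so here is a small Python sketch of it. It is a simplification, assuming the basic receive-window test for a segment's first sequence number (RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND, modulo 2^32); the function names are made up for illustration and this is not how any particular stack is coded.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def keepalive_probe_seq(snd_nxt):
    """Sequence number used by a keep-alive probe: SND.NXT - 1."""
    return (snd_nxt - 1) % SEQ_SPACE


def in_receive_window(seg_seq, rcv_nxt, rcv_wnd):
    """True if seg_seq falls inside [RCV.NXT, RCV.NXT + RCV.WND)."""
    return (seg_seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


# On a quiet connection the sender's SND.NXT equals the peer's RCV.NXT.
snd_nxt = rcv_nxt = 1000
probe = keepalive_probe_seq(snd_nxt)

# The probe sits one byte BEFORE the window, so the peer cannot accept it
# as new data and answers with an ACK, which proves it is still there.
probe_is_outside = not in_receive_window(probe, rcv_nxt, rcv_wnd=65535)
```

Because the probe carries an already-acknowledged sequence number, it never delivers new data to the application; it only provokes the ACK (or the RST, if the peer has forgotten the connection).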

I was asked what the “down side” of enabling keep-alives is today, and there really is ONLY one: BANDWIDTH.  In 1989, bandwidth was expensive.  Note that the section above lists the reasons why the specification for TCP doesn’t REQUIRE a keep-alive mechanism.  The first is that it could cause a good connection to fail during transient Internet failures.  Wow… that doesn’t really happen in $20mil data centers…. does it? The others are that you’re putting a packet on the wire, and that it may cost more $$ on a path that charges for packets… Do you really pay more for two packets totaling less than 256 bytes every n minutes?  On your internal 10Gig network? (I don’t think so)…

So, the downside is ≤ 256 bytes every n minutes. The alternative is that some intermediary security device will time out your idle “ESTABLISHED” connections after 30 or 60 minutes (depending on your security products).
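To put a number on that, here is a back-of-the-envelope calculation in Python. The 256-byte figure comes from the paragraph above; the 5-minute interval, the 10,000-connection count, and the function names are assumptions picked purely for illustration.

```python
def keepalive_bits_per_second(bytes_per_exchange=256, interval_minutes=5):
    """Average load of one connection's keep-alive exchange, in bits/s."""
    return bytes_per_exchange * 8 / (interval_minutes * 60)


def share_of_link(connections, link_bits_per_second=10e9, **kwargs):
    """Fraction of a link consumed by keep-alives on `connections` sockets."""
    return connections * keepalive_bits_per_second(**kwargs) / link_bits_per_second


# One connection, 256 bytes every 5 minutes: a little under 7 bits/s.
per_connection = keepalive_bits_per_second()

# Ten thousand such connections on a 10 Gbit/s network: well under
# one hundred-thousandth of the link.
fraction = share_of_link(10_000)
```

Even with generous assumptions, the keep-alive traffic is lost in the noise on a modern internal network.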

Product / Default Timeout
Juniper SRX / 30 minutes
CheckPoint FW-1 / 60 minutes
Cisco PIX-ASA / 60 minutes
TCP keep-alive default / 120 minutes
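The table makes the mismatch obvious, but a quick Python sketch spells it out: any device whose idle timeout is shorter than the keep-alive interval will drop the connection before the first probe is ever sent. The timeouts are the ones listed above; the function name and the 15-minute example value are my own assumptions.

```python
# Idle-session timeouts from the table above, in minutes.
IDLE_TIMEOUT_MINUTES = {
    "Juniper SRX": 30,
    "CheckPoint FW-1": 60,
    "Cisco PIX-ASA": 60,
}

TCP_KEEPALIVE_DEFAULT_MINUTES = 120


def devices_that_cut_first(keepalive_minutes, timeouts=IDLE_TIMEOUT_MINUTES):
    """Devices that time out an idle session before the first probe."""
    return sorted(
        name for name, timeout in timeouts.items()
        if timeout <= keepalive_minutes
    )


# With the 120-minute default, every device in the table wins the race.
losers_at_default = devices_that_cut_first(TCP_KEEPALIVE_DEFAULT_MINUTES)

# With a 15-minute keep-alive (half the shortest timeout), none of them do.
losers_at_15 = devices_that_cut_first(15)
```

Picking a keep-alive interval of roughly half the shortest timeout in the path leaves room for a lost probe or two.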

So, if you’re a platform operations person, and you’re presented with this problem, should you:

A) Tell everyone to modify every protocol on every security device in the network to keep Applications that don’t support Application Level Keep-Alives connected?

B) Enable tcp keep alives on the server hosts that are running these broken applications?

BIG HINT, the answer is B!
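For anyone who can touch the application code, here is a minimal Python sketch of what turning on keep-alives for a single socket looks like. SO_KEEPALIVE is widely available; the three timing options (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) are platform-specific, so the sketch only sets them where Python exposes them. The `enable_keepalive` name and the 900/60/5 values are assumptions for illustration, not a recommendation from any vendor.

```python
import socket


def enable_keepalive(sock, idle_seconds=900, interval_seconds=60, probes=5):
    """Turn on TCP keep-alives for one socket.

    idle_seconds:     idle time before the first probe (15 minutes here,
                      an assumed value below a 30-minute firewall timeout)
    interval_seconds: gap between unanswered probes
    probes:           unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket timers are not available on every platform, so only set
    # the ones this Python build knows about.
    for option, value in (
        ("TCP_KEEPIDLE", idle_seconds),
        ("TCP_KEEPINTVL", interval_seconds),
        ("TCP_KEEPCNT", probes),
    ):
        if hasattr(socket, option):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, option), value)
    return sock
```

A typical use would be `enable_keepalive(socket.create_connection((host, port)))`, where `host` and `port` are whatever your application already connects to. Where the per-socket timers are not exposed, the host-wide defaults apply, which is where the operating system tuning discussed earlier comes in.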

Epilogue:

TCP settings are not specific to one product, one operating system or one device.  The TCP/IP stack is implemented as a standard by “most” vendors, but your settings and capabilities are most likely going to vary.  If you are looking for the specifics of your operating system, hardware, vendor or other product, PLEASE GOOGLE IT, or contact your vendor directly.  If they don’t know the TCP/IP tuning parameters, stop buying their equipment, they’re too stupid to deserve your money.  As always, this is my 2¢, YMMV.  All rights reserved for those products that I’ve mentioned by name.

© 2008-2010 dc0de’s notes... & dc0de.com All Rights Reserved