F5 – TCP Connection Management: Sessions, Pool-Level & Node-Level Actions

Understanding the nuances of connections between servers and clients is crucial in the networking world. We often use the term “session” to refer to multiple transactions over an established Transmission Control Protocol (TCP) connection between a client and a server. Frequently, we see numerous TCP connections within the same session, necessitating session persistence to maintain this multi-transaction communication.

This post unravels some critical configurations involving session management, particularly on the F5 Networks BIG-IP system. We will delve into the pool-level setting “Action on Service Down” and the node-level settings “Disable” and “Force Offline.”

Pool Level Actions: Interpreting the “Action on Service Down”

Within F5 BIG-IP system settings, the pool-level category includes “Action on Service Down”. This setting provides four possible actions: Reject, Drop, Reselect, and None (the default), each with distinct behavior.

1. Reject: In this mode, the F5 terminates any active connection immediately after a pool member transitions to a ‘DOWN’ state. It does so by dispatching a reset (RST) to both the server and the client, removing the connection from the Local Traffic Manager (LTM) connection table. Essentially, it forces both ends of the connection to shut down, offering a rapid option to cease sessions.

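2. Drop: In this case, the F5 silently removes the connection from the LTM connection table without sending a reset to either side. Because the client receives no feedback, it may keep sending packets until its own timeout fires, so this option fits short-lived, connectionless traffic such as UDP better than long-lived TCP sessions.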

3. Reselect: The F5 chooses another available pool member and reestablishes a connection. This option is valuable when the client can seamlessly proceed with a new server. However, remember that any request made during the switchover could be lost.

4. None (Default): By default, the F5 continues transmitting data over established connections as long as the client and the server interact. This option is preferable when there’s no need for the F5 to intervene with either end of the connection.
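
The four actions above can also be set from the command line. The following is a minimal Python sketch that only builds the tmsh command string; it does not run anything. It assumes the tmsh attribute is `service-down-action` and that the GUI's “Reject” corresponds to the tmsh value `reset`, so check both against your BIG-IP version. The helper name `service_down_command` and the pool name `web_pool` are hypothetical.

```python
# Hypothetical helper: maps the GUI labels discussed above to the values
# assumed for the pool attribute "service-down-action" in tmsh.
GUI_TO_TMSH = {
    "None": "none",
    "Reject": "reset",  # assumption: the GUI's "Reject" is "reset" in tmsh
    "Drop": "drop",
    "Reselect": "reselect",
}


def service_down_command(pool_name: str, gui_action: str) -> str:
    """Build (but do not execute) the tmsh command for a pool."""
    try:
        value = GUI_TO_TMSH[gui_action]
    except KeyError:
        raise ValueError(f"unknown action: {gui_action!r}") from None
    return f"tmsh modify ltm pool {pool_name} service-down-action {value}"


if __name__ == "__main__":
    # "web_pool" is a placeholder pool name.
    print(service_down_command("web_pool", "Reject"))
```

Keeping the mapping in one dictionary makes the GUI-versus-CLI naming difference explicit, which is the part most likely to trip people up.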

Node Level Settings: The Power of “Disable” and “Force Offline”

Another intriguing aspect relates to a setting often used when administrators temporarily “remove” a node from a cluster—maybe for maintenance or another task. The term “remove” is used loosely here; it doesn’t mean physical removal but instead stopping the node from receiving traffic and participating in the cluster. 

The settings in question, “Disable” and “Force Offline,” are found under the Nodes tab. Performing the action at the node level is usually more efficient, because a single node can belong to multiple pools and one change covers all of them. You can apply the same actions at the pool level instead, but repeating them for every pool takes longer.

But what do these settings entail?

1. Disable: Under this setting, established connections continue to process, and everything in the persistence table remains unaffected. New connections will only be accepted if they belong to an existing persistence session.

2. Force Offline: In this mode, F5 only manages already established connections. No new connections are permitted; existing ones remain connected until they time out.

In both scenarios, F5 will eventually remove the connections. However, the “Force Offline” mode expedites this process, because it also refuses new connections that belong to an existing persistence session. Understanding these intricacies allows network administrators to make informed decisions regarding traffic management and node maintenance strategies.
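
To make the difference between the two states concrete, here is a small Python model of the admission decision described above. It is only an illustration of the rules in this post, not F5 code; the names `NodeState` and `accepts_new_connection` are made up for the example.

```python
from enum import Enum


class NodeState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    FORCED_OFFLINE = "forced offline"


def accepts_new_connection(state: NodeState, has_persistence_record: bool) -> bool:
    """Return True if a NEW connection may be sent to the node.

    Established connections are not modeled here: in all three states
    they keep running until they close or time out.
    """
    if state is NodeState.ENABLED:
        return True
    if state is NodeState.DISABLED:
        # Only clients already tied to this node by persistence get in.
        return has_persistence_record
    # Forced offline: no new connections, persistence or not.
    return False
```

The only input that separates “Disable” from “Force Offline” is whether the client already has a persistence record pointing at the node.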

F5 Networks’ Recommendations for Node Maintenance

The recommended process for removing a node for maintenance purposes involves a few crucial steps that ensure smooth operation and minimize disruptions. The guidance provided by F5 Networks for maintenance is as follows:

1. Disable the Node: First, navigate to the Nodes tab and select the “Disable” option. As discussed, established connections continue to be processed, the persistence table remains unaffected, and new connections are accepted only if they belong to an existing persistence session. The node can finish the sessions it already owns, but it is no longer assigned new ones.

2. Monitor the Node: Once the node is disabled, you should monitor it to ensure all existing connections are complete and the traffic levels drop to zero. This step is crucial because it helps prevent abruptly terminating ongoing sessions, which could lead to data loss or corruption.

3. Force Offline the Node: After all connections have completed and the node is no longer processing traffic, switch the node to “Force Offline” mode. This ensures the node will not accept any new connections, even those tied to existing persistence sessions, and effectively declares that the node is no longer participating in the cluster.

4. Maintenance: You can now safely carry out any necessary maintenance tasks on the node without worrying about it handling traffic or being interrupted by incoming connections. 

5. Re-enable the Node: After completing maintenance tasks, return the node to its operational status by enabling it again. This action will allow the node to participate in the cluster and begin accepting new connections.

Remember, these actions should ideally be taken at the node level rather than the pool level to ensure efficiency, especially if the node is part of multiple pools.

By following these recommendations, you can effectively remove a node from the cluster for maintenance while minimizing disruptions and maintaining the integrity of the existing sessions.
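
The disable, monitor, and force-offline steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not an official F5 procedure: the tmsh strings (`session user-disabled`, `state user-down`, `state user-up`, `session user-enabled`) are assumed to match your BIG-IP version, and `run` and `active_connections` are hypothetical callables you would supply yourself, for example a wrapper that executes a command on the BIG-IP and a function that returns the node's current connection count.

```python
import time
from typing import Callable


def drain_and_offline(
    node: str,
    run: Callable[[str], object],
    active_connections: Callable[[str], int],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> bool:
    """Disable a node, wait for it to drain, then force it offline.

    Returns True if the node drained and was forced offline, or False if
    the timeout expired first (the node is then left in the disabled state).
    """
    # Step 1: Disable. Existing and persistence-bound traffic continues.
    run(f"tmsh modify ltm node {node} session user-disabled")

    # Step 2: Monitor until the connection count reaches zero.
    deadline = time.monotonic() + timeout_seconds
    while active_connections(node) > 0:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)

    # Step 3: Force Offline. No new connections of any kind.
    run(f"tmsh modify ltm node {node} state user-down")
    return True


def re_enable(node: str, run: Callable[[str], object]) -> None:
    """Step 5: return the node to service after maintenance."""
    run(f"tmsh modify ltm node {node} state user-up")
    run(f"tmsh modify ltm node {node} session user-enabled")
```

Passing `run` and `active_connections` in as parameters keeps the sketch independent of how you reach the BIG-IP, and lets you rehearse the sequence with stand-in functions before touching a production node.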

Why not just use “Force Offline” and call it a day?

You might be asking yourself why you can’t simply “Force Offline” a node and skip the rest. It’s a valid question. While the “Force Offline” option seems faster and more straightforward, it is not always the ideal choice, for the following reasons:

1. Graceful Connection Termination: Using the “Disable” option first allows the active sessions on the node to complete naturally, including sessions that still need to open further connections. This avoids cutting sessions off partway through, which can cause an unexpected disruption of service for users or, potentially, data corruption.

2. Persistence Sessions: If any persistence sessions are active on that node, directly going to “Force Offline” could interrupt these sessions, which would be undesirable in most cases. Using “Disable” first allows existing persistence sessions to conclude gracefully.

3. Traffic Monitoring: By disabling the node first, you can monitor it to ensure all existing connections are complete and the traffic levels drop to zero. This assures you that it’s safe to proceed with maintenance without the risk of interrupting ongoing connections.

The “Force Offline” option can be suitable in emergencies, where you need to stop sending new traffic to a node immediately. For regular maintenance, following the recommended process of disabling the node first provides a more controlled and smoother way to manage the node’s connections and sessions.