SSH Start to Finish Architecture – Dealing with laggy networks

One thing that servers and clients talking to each other over a network sometimes need to handle is keeping the connection open. Sometimes the network is flaky, for lack of a better term, and that means packets can get dropped. OpenSSH is no exception, and it has several options that can be set in the sshd_config, ssh_config, and local ~/.ssh/config files to determine how to handle such cases.

Both the server and client share the “TCPKeepAlive” option. This option determines whether or not to send TCP keepalive messages. These help terminate a connection if a client or server crashes, but they can be problematic on a network that has considerable lag, because a temporary outage can kill a session that would otherwise have recovered. If it’s turned off, though, crashed clients can leave lots of ghost users in the connection pool for the server, so it’s recommended to leave this on unless you have a very bad network. The options are “yes” (the default) or “no,” depending on whether to turn it on or off.
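As a quick illustration, here is a minimal sketch of what that looks like in a config file. The same line works in sshd_config on the server and in ssh_config or ~/.ssh/config on the client; the value shown simply spells out the default described above.

```text
# sshd_config (server), or ssh_config / ~/.ssh/config (client)
# Send TCP keepalives so a crashed peer is eventually noticed.
TCPKeepAlive yes
```

On a really bad network you could set this to “no” instead, at the cost of possibly leaving dead sessions hanging around.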

There are also the “ClientAliveCountMax” (sshd_config) and “ServerAliveCountMax” (ssh_config and ~/.ssh/config) options. These set the number of alive messages that may be sent without the server or client receiving a response back from the other end. These messages are different from TCPKeepAlive messages. The alive messages are sent via the encrypted channel, and thus are not able to be spoofed. TCP keepalive messages, on the other hand, can be spoofed, so the alive messages may be a better option in a more (potentially) hostile environment. This setting is a number, and defaults to “3.” Remember that this is the COUNT, so it represents the number of unanswered messages that will be sent before a session is terminated due to lack of reply from the other end.

The other settings needed to make the above work are “ClientAliveInterval” (sshd_config) and “ServerAliveInterval” (ssh_config and ~/.ssh/config). These set the number of seconds to wait without hearing from the other end before an alive message is sent. The default is “0,” which means nothing is sent, so you must give this a value greater than 0 to turn this option on. If this value is set to “5” on either end, and the alive count max is left at its default of “3,” then the connection would be terminated after roughly 15 seconds (5 seconds times 3 messages) if the messages don’t get a response.
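To make the interval and count settings concrete, here is a small sketch using the numbers from the example above. The values are only illustrative, and the “Host *” pattern on the client side is an assumption chosen so the settings apply to every host; narrow it to suit your own setup.

```text
# sshd_config (server side)
# Probe a silent client every 5 seconds, give up after 3 unanswered probes.
# 5 seconds x 3 probes = roughly 15 seconds until the session is dropped.
ClientAliveInterval 5
ClientAliveCountMax 3
```

```text
# ~/.ssh/config (client side)
# "Host *" is an illustrative pattern that matches every host.
Host *
    ServerAliveInterval 5
    ServerAliveCountMax 3
```

On a laggy link you would probably want something more forgiving than 15 seconds, for example a larger interval, so that a short stall doesn’t get your session dropped.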

Outside of TCPKeepAlive and the *AliveCountMax and *AliveInterval settings, there is also a server-side setting, “LoginGraceTime” (sshd_config), for determining how long to wait for a user to successfully log in. It keeps people from making the initial ssh handshake and then tying up a socket without ever logging in: a user who has not authenticated within the set amount of time is dropped. The default value is 120 seconds, but if you don’t want this at all, you can change it to 0 to turn it off, just like the *AliveInterval options.
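The login timeout described above is the LoginGraceTime option in sshd_config. A minimal sketch follows; the 60-second value is just an illustrative choice and not a recommendation.

```text
# sshd_config (server side)
# Drop connections that have not finished logging in within 60 seconds.
# The default is 120; a value of 0 means no time limit at all.
LoginGraceTime 60
```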

That’s all we’re going to cover for today. We’re getting closer to the certificate authority stuff, which makes maintaining keys so much better.

Thanks for reading!