Incidents happen, and the enterprise SSH surface can be a high-risk target.
So how can you respond appropriately? Sooner or later, every security operations organization is confronted with an incident that, if not taken care of, can become a data breach. When that first event is discovered, both accuracy and time are of the essence. A compromised SSH deployment often clears the path for adversaries to the crown jewels. Security events touching an enterprise SSH component (client or server) are therefore a poster child for the value of a laser-focused, lightning-fast response.
To illustrate the value of speed and agility in your response, let’s look at an example of what an accurate SSH-related response might look like:
Just moments ago, the security operations team received an event notification that cryptomining malware such as Skidmap has been found on one of its cloud-hosted machines. The malware is known for disabling Linux security policies and creating a backdoor by installing SSH authorized keys. Beyond immediately taking the server offline, here are three Rs you should include in your immediate response:
R #1: Rotate credentials immediately
While shutting down the affected server is of course the most logical action, it won’t necessarily stop an adversary who may have already reached other machines by pivoting to other accounts or systems. To interrupt this movement, you should immediately rotate all SSH keys, including removing old, orphaned or shared keys. This is a solid first step in the eradication process: once every trusted key has been replaced, a client that still presents an old or unknown key can no longer authenticate.
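As a rough illustration of the "find the keys that should not be there" part of rotation, here is a minimal Python sketch that compares the entries of an authorized_keys file against a set of approved fingerprints. The function names, the approved set and the simplified line parsing are assumptions made for this example, not part of any particular product. The fingerprint format follows the SHA256 style printed by `ssh-keygen -l`.

```python
import base64
import hashlib

# Key type names that can appear in an authorized_keys entry (not exhaustive).
KEY_TYPES = {
    "ssh-rsa", "ssh-dss", "ssh-ed25519",
    "ecdsa-sha2-nistp256", "ecdsa-sha2-nistp384", "ecdsa-sha2-nistp521",
    "sk-ssh-ed25519@openssh.com", "sk-ecdsa-sha2-nistp256@openssh.com",
}


def fingerprint(key_blob_b64: str) -> str:
    """Return a SHA256 fingerprint in the style printed by `ssh-keygen -l`."""
    digest = hashlib.sha256(base64.b64decode(key_blob_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def find_unapproved_keys(authorized_keys_text: str, approved: set) -> list:
    """Return (line_number, fingerprint) for every key not in `approved`.

    `approved` is a hypothetical allowlist of fingerprints that your own
    inventory says belong on this account.
    """
    findings = []
    for lineno, line in enumerate(authorized_keys_text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        # Simplification: the first token that is exactly a key type name is
        # treated as the key type, and the next token as the base64 key blob.
        # Quoted options that contain spaces are not parsed properly here.
        for i, tok in enumerate(tokens[:-1]):
            if tok in KEY_TYPES:
                fp = fingerprint(tokens[i + 1])
                if fp not in approved:
                    findings.append((lineno, fp))
                break
    return findings
```

Anything this returns is a candidate for removal during rotation. In a real incident you would run a check like this across every account and host, not just the machine that raised the alert.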
R #2: Repave surface to known good SSH state
Now you should address the question of how the malware got in and whether there are any holes in the SSH configuration. First, investigate specific SSH weaknesses, such as use of the deprecated SSHv1 protocol, weak configurations or duplicated host keys that are supposed to be unique. Duplicated host keys are not all that uncommon, especially when a VM is copied and the copies are used in identical ways. The assessment and scoping step involves more than just a scan. It also requires an intelligent configuration method for installing a new, clean SSH agent and host state across your environment.
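To make the assessment step more concrete, the following Python sketch shows two of the checks mentioned above: spotting host keys shared by more than one machine, and flagging a few risky sshd_config directives. The input shapes (a host-to-fingerprint mapping, the raw config text) and the RISKY_SETTINGS baseline are assumptions for illustration, so adapt them to your own policy. The parser is deliberately simple: it keeps only the first value for each keyword, which is how sshd resolves repeated keywords, and it stops at the first Match block.

```python
from collections import defaultdict


def find_duplicate_host_keys(host_keys: dict) -> dict:
    """Map each fingerprint seen on more than one host to the hosts using it."""
    by_key = defaultdict(list)
    for host, fp in host_keys.items():
        by_key[fp].append(host)
    return {fp: sorted(hosts) for fp, hosts in by_key.items() if len(hosts) > 1}


# Hypothetical baseline of directive values to flag; tune to your own policy.
RISKY_SETTINGS = {
    "permitrootlogin": {"yes"},
    "passwordauthentication": {"yes"},
    "permitemptypasswords": {"yes"},
}


def audit_sshd_config(config_text: str) -> list:
    """Return a list of findings for the global part of an sshd_config."""
    findings = []
    seen = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        if keyword == "match":
            break  # simplification: conditional Match blocks are not audited
        if keyword in seen:
            continue  # sshd uses the first value it finds for a keyword
        seen.add(keyword)
        if keyword == "protocol" and "1" in [v.strip() for v in value.split(",")]:
            findings.append("Protocol allows SSHv1: " + value)
        elif value in RISKY_SETTINGS.get(keyword, set()):
            findings.append(f"{parts[0]} {value}")
    return findings
```

A scan like this only tells you where the known-good state has drifted. Repaving still means regenerating host keys and pushing a clean configuration to every affected machine.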
R #3: Repair or clean the access between zones
SSH is, of course, an access protocol used by machines and humans between various environments. Once adversaries get in, they will likely try to jump across the environment and change access paths to their advantage. It’s critical that you identify which systems should be accessible and eliminate unnecessary access between test, development and production environments, not just after an incident but also as an ongoing compliance control. SSH keys can help with this control and are especially useful when applications are sharing the same network environment.
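One way to picture this zone check is as a comparison between the access that keys actually grant and the access your policy allows. The Python sketch below assumes you already have a list of (source host, target host) grants, for example derived from which client keys are authorized on which servers, plus a host-to-zone mapping and a set of permitted zone pairs. All of these names and data shapes are hypothetical.

```python
def find_zone_violations(grants, zone_of, allowed_pairs):
    """Return grants whose (source zone, target zone) pair is not permitted.

    grants        -- iterable of (source_host, target_host) tuples
    zone_of       -- dict mapping host name to zone name
    allowed_pairs -- set of (source_zone, target_zone) tuples allowed by policy
    """
    violations = []
    for source, target in grants:
        # Hosts missing from the inventory fall into an "unknown" zone, which
        # is never allowed unless the policy lists it explicitly.
        src_zone = zone_of.get(source, "unknown")
        dst_zone = zone_of.get(target, "unknown")
        if (src_zone, dst_zone) not in allowed_pairs:
            violations.append((source, target, src_zone, dst_zone))
    return violations


if __name__ == "__main__":
    zone_of = {"dev-1": "dev", "test-1": "test", "prod-1": "prod"}
    allowed_pairs = {("dev", "dev"), ("dev", "test"), ("prod", "prod")}
    grants = [("dev-1", "test-1"), ("dev-1", "prod-1")]
    for violation in find_zone_violations(grants, zone_of, allowed_pairs):
        print("unexpected access path:", violation)
```

Every violation is either an access path to remove or a policy entry that was never written down, and both are worth knowing about before an adversary finds them.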
Prepare to respond and become more proactive
While the example above is hypothetical, it helps us understand that a response is more than just reimaging a machine. This is especially true for security events that touch SSH clients or keys that create a mesh of highly privileged, unattended access; these require an in-depth, laser-focused response. Executing this response manually also gives the attacker an advantage, because it adds time and complexity for the responder.
My recommendation is that security teams prepare for incidents with an ongoing program that applies the 3-R approach (rotate, repave and repair) proactively and at frequent intervals (30 days?). This type of regular maintenance and health check will help keep the SSH surface in your organization from slowly weakening.
How well are your SSH keys protected?