Incident Response in the Cloud – Is Your Security Team Ready?
Incident Response (IR) is the umbrella term for activities where an organization recognizes and responds to an event. It applies to anything from your corporate website going down, to the loss of a database server, or even security incidents such as a user workstation compromised by malware. The purpose of Incident Response is to gather the information required to make educated decisions about how to deal with a specific event, and act upon the information gathered.
I’ve spent the last 6 years doing security incident response. My job has often entailed analyzing compromised machines for forensic artifacts, reviewing large volumes of logs of various types, and correlating event data related to the incident across a multitude of platforms and systems. I spent most of the early part of my career focusing on very traditional IR within Windows-based enterprise environments, but I’ve found the general approach to IR involving the cloud to be similar. Incidents tend to be pretty formulaic in nature. They usually go something like this: a workstation is compromised and becomes patient zero, attackers conduct some reconnaissance to orient themselves, escalate privileges and gain control of user accounts (usually by harvesting credentials using tools like Mimikatz), move laterally toward their objective, and then complete it. This follows the generally accepted attacker lifecycle frameworks (such as the Mandiant Attacker Lifecycle, MITRE ATT&CK, Lockheed Martin Kill Chain, etc.).
From an Incident Response perspective, our response also follows a pattern. We identify a compromised asset, determine what happened to that asset, identify the source of compromise, pivot to that source of compromise, and then repeat until we’ve scoped the incident and identified all compromised assets. We then contain the threat, remove access, and do a damage assessment.
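The identify, pivot, and repeat pattern described above is essentially a graph traversal. Here is a minimal Python sketch of that loop, assuming a hypothetical `find_sources` callback that stands in for the forensic work of determining where an asset's compromise came from. The asset names and findings are invented for illustration.

```python
from collections import deque


def scope_incident(first_asset, find_sources):
    """Return every asset reached by repeatedly pivoting to the source of compromise.

    find_sources is a hypothetical callback: given an asset, it returns the
    assets that analysis identified as the source of that asset's compromise.
    """
    compromised = []
    seen = {first_asset}
    queue = deque([first_asset])
    while queue:
        asset = queue.popleft()
        compromised.append(asset)
        # Pivot: every newly identified source becomes an asset to examine.
        for source in find_sources(asset):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return compromised


# Invented findings: the database server was reached from an admin
# workstation, which was in turn reached from patient zero.
findings = {
    "db-server": ["admin-workstation"],
    "admin-workstation": ["patient-zero"],
    "patient-zero": [],
}

scoped = scope_incident("db-server", lambda asset: findings.get(asset, []))
print(scoped)
```

The loop ends when no new sources turn up, which corresponds to the point where the incident has been scoped and containment can begin.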
Identity and Incident Response
Identity is intrinsically tied to most security controls, from badge readers at the door to usernames and passwords. Identity can also be a key part of IR. One common element of nearly every incident I’ve worked has been some sort of credential harvesting on compromised endpoints. Credentials (passwords, keys, certificates) are how systems verify identity: how they know you are who you say you are, and what you are allowed to do. Attackers seek out credentials associated with highly privileged identities (sysadmins, DBAs, SREs, etc.), which allow them to bypass security controls to achieve their objectives.
Clearly, identities are important to both the defenders and the attackers. Defensive (or blue) teams rely on identities to control who can access systems or take actions within those systems. Offensive (or red) teams want to control identities for that same reason. This is why it's important that identities are protected using multi-factor authentication.
The majority of organizations rely on Windows Active Directory to manage the identities of their users. Logging in to workstations, managing servers, and accessing email and file storage services all use Active Directory identities rather than locally stored ones. Companies favor a directory service because it allows them to centrally manage privileges, system access, and configuration, as well as audit activity. From a security Incident Response standpoint, once you identify a compromised identity you can easily determine which systems were accessed using that identity.
Incident Response at Scale
When you begin to do security in enterprises with a large cloud presence, things can get a bit more interesting. Not all cloud SaaS applications are capable of integrating with a central directory or SSO system. Think briefly about the number of “identities” you personally have, in the form of accounts: Netflix, Facebook, Google, Apple ID, Amazon… chances are, you can name at least a dozen different cloud identities that you have without too much thought. Each of these identities is probably protected by some sort of password credential, and each one could have access to sensitive systems and data.
From a Security Operations perspective, having to monitor dozens of identities per user makes identifying incidents much more difficult. Having a directory system means that each of your users has only one identity, which is much easier to protect and monitor.
From an IR perspective, having a central directory system makes figuring out what happened easy. When you start using things that aren't backed by a central identity directory, the amount of work required to respond to an incident goes way up.
The Pain of Incident Response in the Cloud
Security teams prevent identity compromise by doing things like requiring long passwords with diverse character sets, and requiring a second authentication factor. They control access to systems using role-based access controls (RBAC) built around identity. They track how identities are used through audit logging, respond to compromises by reviewing those audit logs, and remove access by disabling the affected identity. These are the controls and tools that we rely on today for effective, successful incident response.
The unfortunate truth is that when you involve cloud SaaS you lose access to many of these controls. You’re stuck with the tools and methods the vendor or service provides unless you utilize some sort of central identity system that enforces these controls on your behalf.
Let’s walk through a hypothetical example to illustrate this: Imagine you are doing incident response for a user identity you suspect has been compromised. As a responder you have a few initial questions to answer:
- What systems does this identity have access to?
- Within those systems, what kind of privileges does the identity have?
- Has the identity been used to access the system recently?
- What has that identity done within the system?
This is typically where things start to break down. You start realizing that your cloud SaaS vendors thought the audit log would be a great add-on feature, that application passwords are only required to be 8 characters long, that they don't support multi-factor authentication or support only SMS as a second factor, and that they don't do RBAC at all, so your only role options are admin or normal user. In order to respond, you now need to be given admin-level access to all the cloud SaaS applications that the compromised identity could access, figure out what kind of user account it was, attempt to determine if each system does audit logging, figure out each logging schema, repeat for all in-scope applications, and then aggregate all the individual system data to build a timeline.
For context, in 2015 alone, Okta customers used on average 11-16 applications, and in the latest Okta Businesses at Work Report (2018) Okta found that the number of cloud apps our customers use has gone up ~25% in the last year. So when each application has its own associated identity, your team is stuck doing incident response for each one. This process does not scale.
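To make the aggregation work concrete, here is a rough Python sketch of what building a timeline from per-application audit logs can look like. Everything in it is an assumption for illustration: the two applications, their log schemas, the field names, and the sample records are invented and do not describe any real vendor's format. The point is that each application needs its own normalizer before events can be merged and sorted.

```python
from datetime import datetime, timezone


def from_app_a(record):
    # Hypothetical schema A: ISO 8601 timestamp, "user" and "event" fields.
    return {
        "time": datetime.fromisoformat(record["ts"]),
        "identity": record["user"],
        "app": "app_a",
        "action": record["event"],
    }


def from_app_b(record):
    # Hypothetical schema B: Unix epoch seconds, "actor_email" and "action" fields.
    return {
        "time": datetime.fromtimestamp(record["epoch"], tz=timezone.utc),
        "identity": record["actor_email"],
        "app": "app_b",
        "action": record["action"],
    }


def build_timeline(sources, identity):
    """Normalize each app's records, keep one identity's events, sort by time."""
    events = []
    for normalize, records in sources:
        for record in records:
            event = normalize(record)
            if event["identity"] == identity:
                events.append(event)
    return sorted(events, key=lambda event: event["time"])


# Invented sample records for two applications.
app_a_logs = [
    {"user": "alice@example.com", "ts": "2018-03-01T10:15:00+00:00", "event": "login"},
    {"user": "bob@example.com", "ts": "2018-03-01T10:20:00+00:00", "event": "login"},
]
app_b_logs = [
    {"actor_email": "alice@example.com", "epoch": 1519898400, "action": "export_report"},
]

timeline = build_timeline(
    [(from_app_a, app_a_logs), (from_app_b, app_b_logs)],
    "alice@example.com",
)
for event in timeline:
    print(event["time"].isoformat(), event["app"], event["action"])
```

Every additional application means another normalizer to write and another export to obtain, and that per-application cost is what makes this approach hard to scale.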
Do Cloud Identity Right
Integrating all your cloud SaaS user identities into one main directory-backed identity has a lot of benefits for your organization. From the perspective of an incident responder, having a single identity for users is critical. As we saw in our hypothetical scenario, when a user’s workstation is compromised by an attacker, the incident responder must start with the assumption that everything that user touched since the time of infection is also compromised, including any and all credentials used to access cloud applications. Then that responder must determine whether those cloud apps are in scope for the incident one at a time, in whatever way each vendor supports.
With a single sign-on (SSO) solution, your incident response team can start with one identity for the user. That identity is protected by your strong password policies and multi-factor authentication rather than whatever the vendor allows. You have detailed audit logs around all logins and application access, and you can completely quarantine that identity and stop an attacker in their tracks with just a few clicks. Identifying the scope of the compromise, determining what the attacker accessed, and containing the threat by removing access are all things an SSO solution makes easy. As we saw, the alternative of running separate incident response actions for each compromised identity is both ineffective and non-scalable.