Case Study: Incident Response Automation Through IRP Implementation

Author: Katie Teitler and Aleksandr Kuznetcov, PH.D., CISM, CISSP
Date Published: 1 September 2023

A large European managed security services provider (MSSP) and systems integrator (The Enterprise) runs multiple operations centers throughout the region. The Enterprise provides managed security services for more than 850 commercial and public organizations, from small businesses through major enterprises, across diverse industries and federal authorities.

The Enterprise must operate 365 days a year on a 24/7 basis. The Enterprise’s mission is to protect its clients from all types of attacks and respond if a cyberevent occurs. To support this mission, The Enterprise must operate best-in-class technology to identify, detect and respond to incidents. These systems include numerous technologies:

Secure email (through which The Enterprise communicates with its clients and partners)
Network-centric detection and response tools
Security information and event management (SIEM)
Security orchestration, automation and response (SOAR)
Incident response platforms (IRPs)
Threat intelligence
Other expected technologies found in a traditional SOC

The Enterprise’s joint SOC has been in operation since 2012 and currently employs more than 400 cybersecurity experts who provide managed security services (MSSs) and managed detection and response (MDR) services, and build and operate cybersecurity systems for clients.

The Enterprise serves a diverse set of clients, including a major energy company that manages a range of assets across the Commonwealth of Independent States (CIS), Europe and Russia. This energy provider maintains various business units (BUs) including those that oversee the production and sale of electric and thermal energy; the engineering, design and construction of energy facilities; the governance of thermal and hydroelectric power plants, and the maintenance of electric grid and energy trading companies in the CIS and Europe. The customer operates more than 50 branches across different time zones, hundreds of information systems, internal and external IT services (for citizens, energy buyers, and government agencies), and supervisory control and data acquisition (SCADA) systems. This energy company contracted with The Enterprise to serve as its MDR provider, allowing its employees to focus on their core competencies while securely facilitating digital transformation.

Challenge

The Enterprise’s client (The Client), as with most energy companies worldwide, is in the middle of a digital transformation, taking old systems historically used to run energy facilities and modernizing them to serve today’s digital economy. The challenges of modernizing energy infrastructure are well known¹ and beyond the scope of this discussion. Suffice it to say, the exploitation of vulnerabilities in energy systems could have dire consequences, not the least of which is loss of human life.

In addition, cyberattackers are increasingly taking advantage of the vulnerabilities in energy sector hardware and software, and of the commingling of information technology/operational technology (IT/OT), to cause damage.², ³ These facts necessitate an increase in staff, monitoring and cybersecurity governance of these systems. The Client employs internal staff (employees) who interact daily with digital systems. These employees have received training and certifications to ensure that they possess the latest knowledge about these systems. However, most of the training and certifications earned by The Client’s employees are related to IT systems and not cybersecurity explicitly, leaving gaps in coverage and knowledge, while increasing cyberrisk for the organization.

A logical solution to this problem is to simply hire more experienced staff to oversee the cybersecurity function. However, the worldwide cybersecurity staffing crisis means organizations across every sector are unable to hire an adequate number of trained and skilled security staff. In the case of The Client, to cover its 24/7 operational needs, it would need to hire more than 50 qualified security staff, the majority of whom would have expertise in incident response with a subspecialization in the energy sector. This is not possible given the circumstances.

After a careful assessment, The Enterprise concluded that The Client would need to centralize incident response functions and services at its headquarters or major service locations, making hiring even more challenging due to geographic restrictions.

Manual Processes and Not Enough Staff
Prior to working with The Enterprise, The Client’s incident response processes were long and laborious. Reacting to simple alerts or incidents took days and weeks instead of hours because everything was done manually, and teams were not aligned on priorities. Necessary key performance indicators (KPIs) had not been defined to address the most pressing issues first.

In addition to addressing staffing concerns, The Enterprise wanted to ensure that incident response functions were equipped with the right processes and technologies to support a modern-day incident response program able to fend off cyberattacks against the energy sector. Incorporating automation and repeatable tasks was a primary consideration for The Enterprise.

Solution

When The Client hired The Enterprise, its main goals were to gain assistance with overseeing security operations and to help fill the security gaps related to:

Staffing—The Enterprise supported The Client with the appropriate number of technical staff and staff member expertise.
Infrastructure—The Client maintained a heterogeneous and extended IT infrastructure, which introduced management complexity. The Enterprise deployed and managed tools to provide the right level of monitoring and control over The Client’s environments.
Network access—The Client maintained limited network access from its headquarters to its branch locations, thereby restricting the type of work that could be done and hindering visibility into normal and anomalous operations and activity on the network. The Enterprise set up secure network access to ensure that network governance and control were managed.

The Client contracted with The Enterprise to assist with incident management and response, in particular by improving detection and response capabilities. At the start of the engagement, all cyberincident work at The Client’s site was being done manually, which wasted significant time and effort, was error-prone, and did not lend itself to timely or appropriate response actions that could meaningfully reduce risk.

Further, when The Enterprise was onboarded as a provider, the main incident response support tool between the organizations was email. This meant that many of the follow-up actions recommended by The Enterprise (to be executed by The Client) fell into a black hole of communication. The Enterprise could not know whether an active response was being undertaken by The Client or if the recommendations were being ignored or deprioritized. This lack of visibility increased both risk and frustration.

To improve incident response and, thus, cybersecurity and risk management for The Client, The Enterprise implemented functionality in three main areas.

IT Asset Management
To begin any functional cybersecurity or risk program, organizations must uncover and understand the scope of assets and the assets’ related operational and security states. Without basic visibility, it is highly challenging and time-intensive to uncover vulnerabilities within systems. Further, due to the time it takes to conduct a manual asset inventory, inventories conducted without automation are highly inaccurate, making it impossible for organizations to effectively triage or remediate any event, incident or active exploit.

The Client leveraged approximately 20 different data sources and services that were already deployed in its environment. The data sources included the SIEM, Internet Protocol Address Management (IPAM), agent-based endpoint management, the antivirus software, various local databases and more. The data sources also included external services such as VirusTotal, GeoIP and others. The Enterprise stitched together data using a general data model to create an inventory that was manageable and fed the data into an incident response platform (IRP) for further analysis.

In addition to pulling together data from The Client’s environment to create an asset inventory, The Enterprise created a data model that would be fed into the IRP to enumerate various key points. The Enterprise felt it was critical for the model to be part of any implemented solution because a commercial off-the-shelf solution, without any customization, would not be sufficient for The Client’s needs.

As such, the data model required an information-gathering step that would allow The Enterprise’s staff to understand which data sources and data (primary keys) were present. These data were necessary to achieve the desired outcomes, which included faster response times and risk reduction.
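The idea of stitching sources together on a shared primary key can be illustrated with a minimal Python sketch. The source names, field names and the choice of IP address as the key are assumptions made for illustration; the article does not describe the actual schema of the general data model.

```python
from collections import defaultdict

# Hypothetical records, as three of the data sources might export them.
# Field names and the use of "ip" as the primary key are assumptions.
siem_records = [{"ip": "10.0.0.5", "last_alert": "2023-05-01"}]
ipam_records = [
    {"ip": "10.0.0.5", "subnet": "10.0.0.0/24", "branch": "HQ"},
    {"ip": "10.0.1.9", "subnet": "10.0.1.0/24", "branch": "Branch-07"},
]
endpoint_records = [{"ip": "10.0.0.5", "hostname": "ws-0042", "os": "Windows 10"}]


def build_inventory(sources, key="ip"):
    """Merge records from named sources into one entry per primary key."""
    inventory = defaultdict(lambda: {"sources": []})
    for source_name, records in sources.items():
        for record in records:
            if key not in record:
                continue  # cannot be matched to an asset without the key
            asset = inventory[record[key]]
            asset["sources"].append(source_name)
            for field_name, value in record.items():
                # The first source to supply a field wins; later ones do not overwrite it.
                asset.setdefault(field_name, value)
    return dict(inventory)


inventory = build_inventory(
    {"siem": siem_records, "ipam": ipam_records, "endpoint": endpoint_records}
)
```

Recording which sources contributed to each asset makes coverage gaps visible: in this made-up data, the asset at 10.0.1.9 is known only to IPAM, which would suggest it lacks an endpoint agent.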

Further, for proper and complete asset management, the data collection process (as defined by the data model) needed to clearly identify and display all connectivity and data transport mechanisms between data sources and data flows.

In addition to tools The Client had already deployed, two tools were added to The Client’s environment to ensure that the most accurate and actionable data could be consumed by the IRP: vulnerability management (active security scanners) and an IT asset module within the IRP.

Localization Automation
Within the IRP, team members at The Enterprise created automated incident localization tasks. In incident response, localization is a technique used to block or isolate suspicious hosts or activities. Localization technology "can facilitate internal process, streamline workflows, increase efficiency, and boost quality"⁴ for otherwise repetitive tasks, speeding up time to delivery, increasing accuracy and ensuring scalability.

Three criteria for localization automation tasks were used to determine how The Client’s systems would autorespond to various alerts. Two were low-level designations: one for issues such as the validation of false positives, and one for cases in which the impact of business process interruption was minimal. The third, a high-level designation, indicated a system compromise. These three criteria (two low and one high) would help automate workflows for response, whether that meant something as simple as ignoring the alert or something more impactful such as locking accounts, blocking devices (i.e., network isolation) or terminating suspicious processes.
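A minimal Python sketch of how three such criteria could drive an automated response follows. The alert fields (`confirmed_compromise`, `business_impact`) and the mapping of criteria to actions are hypothetical; the article names the criteria and the range of possible actions but not which action belongs to which criterion.

```python
from enum import Enum


class Criterion(Enum):
    FALSE_POSITIVE_VALIDATION = "false_positive_validation"  # low level
    MINIMAL_BUSINESS_IMPACT = "minimal_business_impact"      # low level
    SYSTEM_COMPROMISE = "system_compromise"                  # high level


LEVEL = {
    Criterion.FALSE_POSITIVE_VALIDATION: "low",
    Criterion.MINIMAL_BUSINESS_IMPACT: "low",
    Criterion.SYSTEM_COMPROMISE: "high",
}

# Assumed mapping for illustration only.
ACTIONS = {
    Criterion.FALSE_POSITIVE_VALIDATION: ["ignore_alert"],
    Criterion.MINIMAL_BUSINESS_IMPACT: ["isolate_host"],
    Criterion.SYSTEM_COMPROMISE: ["lock_account", "isolate_host", "stop_process"],
}


def classify(alert):
    """Pick a criterion from hypothetical alert fields."""
    if alert.get("confirmed_compromise"):
        return Criterion.SYSTEM_COMPROMISE
    if alert.get("business_impact") == "minimal":
        return Criterion.MINIMAL_BUSINESS_IMPACT
    # Anything else is treated as still needing validation.
    return Criterion.FALSE_POSITIVE_VALIDATION


def plan_response(alert):
    """Return the criterion, its level and the actions to automate."""
    criterion = classify(alert)
    return {
        "criterion": criterion.value,
        "level": LEVEL[criterion],
        "actions": list(ACTIONS[criterion]),
    }


plan = plan_response({"host": "ws-0042", "confirmed_compromise": True})
```

Keeping the criterion-to-action table separate from the classification logic means the response policy can be reviewed and changed without touching code that inspects alerts.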

IRP
Before The Enterprise could begin its work, The Client had to select and implement a SOAR IRP. The Enterprise felt that any chosen technology must include case and incident management, workflow management and the building of an incident knowledge base.

To choose the right incident response capability, The Enterprise used three criteria to select the most suitable system for The Client:

Feature/functionality comparisons
Analyses by leading research analyst groups
Internal incident management process assessments (to determine an appropriate level of automation needed and identify security coverage gaps)

The third point was the most important for The Enterprise; it is not a standard approach, but the team felt it was the most accurate and appropriate for this circumstance. Further, The Enterprise was able to customize the solution to meet The Client’s exact needs using an individual assessment rather than standard industry approaches (i.e., merely feature/functionality comparisons and analyses by leading research analyst groups).

Results

The technical solution to improve The Client’s incident response program centered around choosing and deploying the best commercial off-the-shelf incident response solution and then customizing it to The Client’s needs. The Enterprise executed several steps to customize the IRP:

Existing parameters were estimated for every step or decision made within incident response.
Every response team action was dictated by a process step, as determined by the data model. Every action was designated as "sufficient," "insufficient" or "not applicable" prior to an automated action.
Criticality was assigned to every issue (IRP function) related to any response team action: block (critical), high, medium or low.
Response team workflows were automated via IRP customization, depending on the priority and criteria.
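One way to read the steps above is as a prioritized automation backlog: steps assessed as insufficient are candidates for automation, ordered by criticality. The Python sketch below illustrates that reading only. The step names and the ordering rule are assumptions, while the assessment labels and criticality levels come from the list above.

```python
from dataclasses import dataclass

# Criticality levels from the list above, most urgent first.
CRITICALITY_ORDER = {"block": 0, "high": 1, "medium": 2, "low": 3}
ASSESSMENTS = {"sufficient", "insufficient", "not applicable"}


@dataclass(frozen=True)
class ProcessStep:
    name: str
    assessment: str
    criticality: str

    def __post_init__(self):
        # Reject labels outside the two fixed vocabularies.
        if self.assessment not in ASSESSMENTS:
            raise ValueError(f"unknown assessment: {self.assessment}")
        if self.criticality not in CRITICALITY_ORDER:
            raise ValueError(f"unknown criticality: {self.criticality}")


def automation_backlog(steps):
    """Return insufficient steps, most critical first (assumed rule)."""
    gaps = [step for step in steps if step.assessment == "insufficient"]
    return sorted(gaps, key=lambda step: CRITICALITY_ORDER[step.criticality])


# Hypothetical process steps for illustration.
steps = [
    ProcessStep("notify asset owner", "insufficient", "medium"),
    ProcessStep("isolate infected host", "insufficient", "block"),
    ProcessStep("archive closed case", "sufficient", "low"),
    ProcessStep("lock compromised account", "insufficient", "high"),
]
backlog = [step.name for step in automation_backlog(steps)]
```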

Process
It was important to The Enterprise and The Client to create a step-by-step process for both selection and implementation. Just as important, The Enterprise wanted to ensure that the tool could offer ongoing support throughout The Client’s entire incident response journey.

The Enterprise created a data model for incident response workflows based on a general data model (figure 1). The goal of the model was to build a repeatable process by which The Client could run an ongoing incident response program that would allow it to handle incidents with a prioritization mechanism and thus drive down cyberrisk and organizational risk. The Enterprise also included integrated systems that would capture data from the IRP to assist with decision-making. Automations are being added so that The Client can easily and efficiently execute incident response playbooks.

Playbooks
One obstacle that arose was the realization that traditional physical playbooks were insufficient for modern-day incident response and modern computing. The Enterprise knew it needed to modernize incident response playbook workflows (figure 2).

The Enterprise wanted to ensure that The Client was fully embracing digital transformation and so provided a list of requirements for paper playbook content. Recommendations included:

Assignment of a procedure ID
Assignment of a procedure administrator (admin)
Listing of involved participants
Duration of the procedure
Input data
Output data
Action algorithms

All paper playbooks were redesigned according to these specifications.
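The seven required items map naturally onto a structured record, which is also a convenient starting point for later automation. The following Python sketch is one possible representation; the class name, field names, types and the sample playbook are assumptions rather than the format The Enterprise actually used.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Playbook:
    procedure_id: str
    administrator: str
    participants: list
    duration: timedelta
    input_data: list
    output_data: list
    action_algorithm: list  # ordered response steps

    def missing_fields(self):
        """Return the names of required items that were left empty."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical playbook content for illustration.
phishing = Playbook(
    procedure_id="PB-001",
    administrator="SOC shift lead",
    participants=["SOC analyst", "mail administrator"],
    duration=timedelta(hours=2),
    input_data=["reported email", "mail gateway logs"],
    output_data=["incident record", "blocked sender list"],
    action_algorithm=["validate report", "block sender", "notify recipients"],
)
incomplete = Playbook("PB-002", "SOC shift lead", [], timedelta(0), [], [], [])
```

A check such as `missing_fields()` makes it easy to confirm that every redesigned playbook meets the specification before it is loaded into a platform.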

Automation
In addition to the incorporation of playbooks and workflows, The Enterprise was able to initiate some automation for The Client. Not all playbook content was able to be automated at once; however, the primary areas of automation focus were:

Network isolation—Reduces malicious lateral movement
Disabling Universal Serial Bus (USB) device ports—Reduces the risk that malicious content will be uploaded to enterprise systems
Domain account lockouts—Locks accounts when suspicious access attempts are made
Automatic file deletion—Deletes unknown or suspicious files to prevent malicious payloads
Automatic disabling of anomalous or out-of-band operating system (OS) processes or services—Prevents malicious execution

Related to cybersecurity and risk management functionality, The Enterprise was able to accomplish:

IT asset management—Using data collected from various technologies in The Client’s networking environment, The Enterprise was able to leverage the IRP to accomplish a basic IT asset inventory, understand The Client’s digital asset ecosystem, and scan assets to learn about their security state. Using this preliminary information, The Enterprise was able to notify The Client’s administrators of any necessary response actions so that The Client could act on any incidents or triage issues that might impact the environment.

Using the asset management functionality, The Client was able to set up correct routing for notifications about incidents via email and chatbots. Information about IT assets gleaned from the IRP was also instrumental in providing the necessary enrichment for decision-making and tactical enforcement actions around risky assets or assets compromised by tampering.
Incident localization automation—The Enterprise used automated incident localization tasks via integration between deployed tools. Tasks were also used to automate scripting functionality that would identify when malicious or suspicious sources (e.g., hosts or accounts) were trying to obtain system access and set rules for blocking those potentially malicious sources before they could cause system damage.
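The kind of scripted rule described above can be sketched as a sliding-window count of failed access attempts per source. This Python example is illustrative only: the event format, the threshold of five failures and the ten-minute window are assumptions, not values reported in the case study.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def sources_to_block(events, threshold=5, window=timedelta(minutes=10)):
    """Return sources with `threshold` failed attempts inside `window`.

    `events` is an iterable of (timestamp, source, success) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent failures
    blocked = []
    for timestamp, source, success in events:
        if success or source in blocked:
            continue
        attempts = recent[source]
        attempts.append(timestamp)
        # Drop failures that have aged out of the window.
        while attempts and timestamp - attempts[0] > window:
            attempts.popleft()
        if len(attempts) >= threshold:
            blocked.append(source)
    return blocked


# Hypothetical events: one host fails five times in four minutes.
start = datetime(2023, 5, 1, 9, 0)
events = [(start + timedelta(minutes=i), "10.0.0.8", False) for i in range(5)]
events += [
    (start + timedelta(minutes=5), "jsmith", False),
    (start + timedelta(minutes=6), "jsmith", True),
]
blocked = sources_to_block(events)
```

In a real deployment the returned list would be handed to whatever enforcement tool is integrated with the platform, such as a firewall or directory service.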

The mean time to respond to incidents was reduced as a result of the new process, as was the total number of cybersecurity incidents that resulted in some form of damage or disruption to The Client.
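Mean time to respond is simply the average interval between detection and response across incidents. The short Python sketch below shows the calculation; the timestamps are invented for illustration and are not The Client's data.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_respond(incidents):
    """Return the mean response time in hours.

    `incidents` is a list of (detected_at, responded_at) datetime pairs.
    """
    hours = [
        (responded - detected).total_seconds() / 3600
        for detected, responded in incidents
    ]
    return mean(hours)


# Made-up incidents: manual handling versus automated handling.
manual = [
    (datetime(2023, 5, 1, 8, 0), datetime(2023, 5, 3, 8, 0)),   # 48 hours
    (datetime(2023, 5, 2, 10, 0), datetime(2023, 5, 5, 10, 0)),  # 72 hours
]
automated = [
    (datetime(2023, 6, 1, 8, 0), datetime(2023, 6, 1, 8, 30)),   # 0.5 hours
    (datetime(2023, 6, 2, 10, 0), datetime(2023, 6, 2, 11, 30)), # 1.5 hours
]
mttr_manual = mean_time_to_respond(manual)
mttr_automated = mean_time_to_respond(automated)
```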

The Enterprise is looking to make further improvements to the IRP, including greater use of automated localization tasks, the use of automation and machine learning (ML) to reduce the number of false positives, and automated enforcement actions for remediation.

Benefits

Both The Enterprise and The Client experienced numerous cybersecurity benefits as a result of the described work. The primary benefits include:

Time savings—The time needed for security incident localization was reduced from days to seconds for some incident types. For the remaining incident types, a service level agreement (SLA) was created to ensure that there would be no black hole of communication in regard to incident response and reporting.
Risk reduction—The number of security incidents that had the potential to inflict real harm to The Client’s organization was reduced.

Another positive business outcome was cost savings. Due to the IRP implementation and automated workflows built into the technical solution, The Client was able to reduce the number of internal employees required to manage processes by approximately 10 percent.

In addition to cost savings, The Client was able to allocate more human resources to other IT projects. Since low-level tasks were automated, the current staff had more time to focus on higher-level activities and more strategic decisions, and contribute to more positive business outcomes for the organization.

Finally, the difficulty of hiring qualified cybersecurity staff was mitigated significantly as a result of the project and the aforementioned benefits. The Client’s human resources (HR) and security teams no longer had to spend time, effort and excess budget looking for hard-to-find cybersecurity talent and could therefore spend that time recruiting for other necessary positions within the organization.

Endnotes

¹ Teitler, K.; "Critical Infrastructure Attack Reveals Why Access Should be the Nexus of Your Security Program," HMG Strategy, 19 February 2021, http://hmgstrategy.com/resource-center/articles/2021/02/19/critical-infrastructure-attack-reveals-why-access-should-be-the-nexus-of-your-security-program
² Bailey, T.; A. Maruyama; D. Wallance; "The Energy-Sector Threat: How to Address Cybersecurity Vulnerabilities," McKinsey and Company, 3 November 2020, http://www.mckinsey.com/capabilities/risk-and-resilience/our-insights/the-energy-sector-threat-how-to-address-cybersecurity-vulnerabilities
³ US Department of Energy, CESER Blueprint, USA, 2021, http://www.energy.gov/sites/prod/files/2021/01/f82/CESER%20Blueprint%202021.pdf
⁴ Phrase, "How Global Businesses Benefit From Localization Automation," 15 November 2022, http://phrase.com/blog/posts/top-benefits-localization-automation/

KATIE TEITLER

Is a senior product marketing manager at Axonius, where she is responsible for the company’s cybersecurity asset management product messaging. She is also a co-host on the popular podcast Enterprise Security Weekly. Prior to her current roles, Teitler was a senior analyst at a small cybersecurity analyst firm, advising security vendors and end-user organizations and authoring custom content. In previous roles, she managed, wrote and published content for various research firms including MISTI (now part of the CyberRisk Alliance), and a cybersecurity events company. She was also the director of content at Edgewise Networks, now part of Zscaler.

ALEKSANDR KUZNETCOV | PH.D., CISM, CISSP

Is a security operations center (SOC) architecture team leader and an independent cybersecurity expert. He has more than 15 years of experience in cybersecurity projects within Asia, the Commonwealth of Independent States and Russia. Currently he teaches students about cybersecurity.
