Writing a Charter for an Enterprise Security Operations Center

Larry Wlosinski
Author: Larry G. Wlosinski, CISA, CISM, CRISC, CDPSE, CISSP, CCSP, CAP, PMP, CBCP, CIPM, CDP, ITIL v3
Date Published: 30 June 2022

The security operations center (SOC) is the heart of information security for medium- to large-sized organizations. It ensures organizational cyber well-being by monitoring the infrastructure and managing its cyberhealth. The SOC uses sensors to monitor the status of software and hardware devices and to check for weaknesses and areas of risk. These findings are reported to management, which directs remedial actions as part of everyday operations.

The mission of the SOC is often understated or not defined at all. The primary purpose of the SOC is to perform the core functions associated with cyber incidents, but there are also secondary functions to be performed (i.e., management and ongoing operations).

The SOC should be the central point of escalation, support and guidance for the organization’s security program. It helps develop road maps for information security and privacy prevention, protection, and response; identify future priorities; and define architecture design requirements. The SOC should also be responsible for reviewing and controlling the quality of the incident information it compiles and for leading the continuous improvement of all SOC capabilities. Additionally, the SOC supports security advisory capabilities within the organization and manages the actions required to resolve problems noted in monitoring reports and associated metrics.

To establish a strong SOC and support its requirements, a charter should be written that identifies the core operational and management activities that must be performed.

Outline Core SOC Functions

The core functions of the SOC include the full list of requirements of an information security framework. A number of attributes should be outlined in the SOC charter, including:

  • Prevention activities—Proactively gathering, analyzing, and disseminating relevant threat intelligence to prevent, prepare for, and communicate information about upcoming attacks. Prevention also involves developing, updating and conducting research for cybersecurity tools. Training security staff is an additional responsibility that should not be overlooked, particularly when staffing changes occur or new software tools are adopted.
  • Protection—Conducting threat hunting; monitoring infrastructure, systems, and applications; and ensuring that the infrastructure has backups and other necessary recovery components
  • Detection—Constantly performing infrastructure security monitoring and detection. Techniques include conducting vulnerability assessments, supporting audits and incident ticket handling, and providing reports on findings.
  • Triage (i.e., determining urgency)—SOC teams categorize, correlate, and prioritize events, and create assignments for further investigation and possible response.
  • Investigation—Conducting digital forensics and correlation activities to determine an incident’s scope, impact and root cause
  • Incident response—Developing incident response plans, coordinating activities and performing response duties to address cybersecurity and privacy incidents. Incident response duties include traffic and event monitoring of attack vectors (for malware and attacker targets and activity) and informing management of the scope, volume and speed of intrusions and compromises. Playbooks and exercises are key to quick and effective responses.
  • Containment and restoration—Containing cybersecurity and privacy incidents, and supporting system and data restoration services. Containment could be as simple as disconnecting compromised devices or a subnet, or it could escalate to organizational alerts. Restoration could involve reinstalling the operating system (OS) and/or software products or simply rebooting the affected devices. The scale of the incident and amount of work varies by the effectiveness of the attack, time to detection, time to investigate and decide course of action, and time to implement the response.
  • Recovery—Supporting (and sometimes directing) recovery efforts. When servers and storage areas are affected, recovery requires reinstalling backed-up software, data files, and transaction files. Recreating the production environment’s last optimal state may be difficult if backups are not performed frequently or have not been protected, contingency plan[1] documentation and procedures are out-of-date, and/or staff have not been trained and tested.[2] If recovery is precipitated by a ransomware attack,[3] management may become involved in negotiations with the attacker, the contingency plan may be needed (because this may be considered a disaster event), or extensive work may be required to recreate the systems, data, and production environment.
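
The triage step in the list above (categorize, correlate, prioritize) can be illustrated with a short sketch. It is not a prescribed method: the severity weights, the 1-to-5 asset criticality scale, and the names `Event`, `priority` and `triage` are all assumptions made for this example, and grouping by source address is only a simple stand-in for real correlation logic.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical severity weights; each SOC would define its own scale.
SEVERITY_WEIGHT = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Event:
    source_ip: str
    category: str           # e.g., "malware", "phishing" (illustrative labels)
    severity: str           # one of the SEVERITY_WEIGHT keys
    asset_criticality: int  # assumed scale: 1 (low) to 5 (mission critical)


def priority(event):
    """Score an event as severity weight times asset criticality."""
    return SEVERITY_WEIGHT[event.severity] * event.asset_criticality


def triage(events):
    """Group events by source IP and rank the groups, highest score first.

    Returns a list of (source_ip, highest_score, event_count) tuples.
    """
    groups = defaultdict(list)
    for event in events:
        groups[event.source_ip].append(event)
    summary = [
        (ip, max(priority(e) for e in grouped), len(grouped))
        for ip, grouped in groups.items()
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)


events = [
    Event("10.0.0.9", "phishing", "medium", 2),
    Event("10.0.0.5", "malware", "critical", 5),
    Event("10.0.0.5", "scan", "low", 1),
]
queue = triage(events)  # groups to assign for investigation, most urgent first
```

In this sketch, the group from 10.0.0.5 is ranked first because it contains a critical event on a highly critical asset. The scoring formula is deliberately simple; an organization's charter or playbooks would define the real criteria.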

Identify Management Functions

In addition to the core SOC functions, the charter should identify management and coordination requirements. Management should make it a priority to act as the official point of contact for law enforcement agencies with regard to information security incidents and lead the response to information security incidents in a professional, effective and timely manner.

Support Ongoing Operational Functions

Another secondary function of the SOC that should be detailed in the charter is supporting ongoing operations. These activities may include providing internal policy and oversight compliance support and information, guidance, and assistance to reduce the risk of information security incidents. Supporting other security teams is necessary to challenge, adjust, improve or introduce security controls in a timely manner. The SOC is intended to protect the organization from cyberattacks by overseeing network operations and cybersecurity processes and capabilities. Providing support for SOC architecture design and engineering, conducting security administration and managing the organization’s penetration testing program are also crucial.

In large organizations, the operational area of concern could easily become complicated if cloud service providers (CSPs) are used, systems are contracted to external data centers, or multiple data centers are in place to support regions across the country. Coordination with multiple sites, service providers, and vendors requires knowledge of the many points of contact (i.e., managers, vendors, response teams, support staff), architectures involved, system configuration weaknesses and vulnerabilities, and coordinated and regular testing. In-house support and Security-as-a-Service (SecaaS) can also complicate attack awareness, monitoring, containment, response and recovery.

Capitalize on Metrics

An often overlooked resource for operational and managerial functions is metrics. Collecting metrics provides a means of measuring and tracking performance, incident activity (i.e., when an incident is opened or closed) and trends. A metrics tool can be implemented in the areas of management, prevention, protection, detection, containment and restoration, recovery and operations.

Metrics can help management track the volume of incidents, their impact and cost. Metrics may include the number of cybersecurity incidents over time, the number of privacy incidents and number of individuals affected by them, and the number of high-impact incidents. Determining the total cost of the incident (i.e., cost of labor, travel expenses, equipment and third-party organizations required to resolve the incident) is an important business metric. These metrics not only help justify the cost of the software, staff, testing, and training, but could also underline the need for more and better tools and programs (e.g., artificial intelligence [AI],[4] threat hunting). Investments and expenses must be tracked because they are important to the organization’s bottom line and overall success. Failing to prevent or quickly address a ransomware attack could threaten the organization’s survival.
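
The total cost of an incident, as described above, is the sum of its labor, travel, equipment and third-party costs. A minimal sketch of that arithmetic follows. The `IncidentCost` class, its field names and the sample figures are hypothetical; a real organization would draw these values from its ticketing and finance systems.

```python
from dataclasses import dataclass


@dataclass
class IncidentCost:
    labor_hours: float
    hourly_rate: float        # assumed blended labor rate
    travel: float = 0.0
    equipment: float = 0.0
    third_party: float = 0.0  # e.g., outside forensics or legal support

    def total(self):
        """Labor cost plus travel, equipment and third-party expenses."""
        labor = self.labor_hours * self.hourly_rate
        return labor + self.travel + self.equipment + self.third_party


def total_cost(incidents):
    """Sum the total cost across a collection of incidents."""
    return sum(incident.total() for incident in incidents)


# Illustrative figures only.
incidents = [
    IncidentCost(labor_hours=40, hourly_rate=100.0, travel=500.0,
                 equipment=1200.0, third_party=3000.0),
    IncidentCost(labor_hours=8, hourly_rate=100.0),
]
period_cost = total_cost(incidents)
```

Tracking the same fields for every incident makes the totals comparable across reporting periods.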

Metrics associated with prevention measure the effectiveness of an organization’s remediation efforts. Examples include:

  • Number of devices with critical and high vulnerabilities
  • Number of critical and high vulnerabilities by device and subnet
  • Number of threats uncovered by month
  • Number of penetration tests performed and the number of vulnerabilities by severity and type
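
The first two metrics in the list above can be derived from a vulnerability scan export. The sketch below assumes each finding is a `(device, ip_address, severity)` tuple and that subnets are /24 networks; both are assumptions for illustration, not features of any particular scanner.

```python
import ipaddress
from collections import Counter

COUNTED_SEVERITIES = {"critical", "high"}


def summarize(findings, prefix=24):
    """Count critical and high findings per device and per subnet.

    findings: iterable of (device, ip_address, severity) tuples.
    prefix:   assumed subnet prefix length used to group addresses.
    """
    per_device = Counter()
    per_subnet = Counter()
    for device, ip, severity in findings:
        if severity not in COUNTED_SEVERITIES:
            continue
        per_device[device] += 1
        # strict=False lets a host address be mapped to its enclosing network.
        subnet = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        per_subnet[str(subnet)] += 1
    return per_device, per_subnet


findings = [
    ("web01", "10.1.2.10", "critical"),
    ("web01", "10.1.2.10", "high"),
    ("db01", "10.1.2.20", "high"),
    ("hr-laptop", "10.1.3.7", "low"),
]
per_device, per_subnet = summarize(findings)
devices_at_risk = len(per_device)  # devices with critical or high findings
```

Comparing these counts from one scan to the next gives a rough view of whether remediation is keeping pace with new findings.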

Knowledge of vulnerabilities drawn from global threat intelligence[5] can improve management’s awareness of the external threat environment. Information from vendors and threat monitoring organizations can aid in the prevention of (and recovery from) new attacks and possibly make the threat a nonissue.

Useful protection metrics include the number of alerts distributed by time period, the number of devices with updated antivirus software, the number of devices with the latest security patches and the number of devices without them. Metrics associated with configuration weaknesses are also key to identifying where remedial efforts (e.g., staffing, funding, time) are needed. Understanding such metrics helps organizations allocate resources more efficiently.

Common metrics associated with incident detection, containment and restoration that are operational in nature include the time to detect a cybersecurity threat, the mean time to detect or discover, the time to respond to a cybersecurity threat, the time to contain a cybersecurity threat and the number of threats detected by cybersecurity tools over time. These metrics provide information about how the effectiveness of the SOC has improved or weakened over time. Change could be required in terms of tools used, staffing (i.e., replacing existing employees, acquiring new ones), training, exercises, and possible vendor (or contractor) support. Cross-training with regional SOCs can also be helpful.
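
As one possible illustration, the mean time to detect and the mean time to contain can be computed from incident timestamps. The dictionary keys used here (`occurred`, `detected`, `contained`) and the choice to measure containment from the moment of detection are assumptions; organizations define these intervals differently.

```python
from datetime import datetime
from statistics import mean


def mean_hours(incidents, start_key, end_key):
    """Mean elapsed hours between two timestamps across incidents.

    Incidents missing either timestamp are skipped; returns None if
    no incident has both.
    """
    durations = [
        (i[end_key] - i[start_key]).total_seconds() / 3600
        for i in incidents
        if i.get(start_key) and i.get(end_key)
    ]
    return mean(durations) if durations else None


# Illustrative records only.
incidents = [
    {"occurred": datetime(2022, 1, 1, 0, 0),
     "detected": datetime(2022, 1, 1, 4, 0),
     "contained": datetime(2022, 1, 1, 10, 0)},
    {"occurred": datetime(2022, 2, 1, 0, 0),
     "detected": datetime(2022, 2, 1, 2, 0),
     "contained": datetime(2022, 2, 1, 6, 0)},
]
mean_time_to_detect = mean_hours(incidents, "occurred", "detected")
mean_time_to_contain = mean_hours(incidents, "detected", "contained")
```

Reporting the same calculation each period is what shows whether the SOC's effectiveness is improving or weakening.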

Incident recovery metrics provide insight into the effectiveness of the incident response team. Metrics may include the time to cybersecurity threat resolution, the mean time to recovery or repair and the downtime to recovery (i.e., duration of business outage). Trends over time will drive the need for changes in resources, staffing, processes, procedures and possibly the entire oversight approach and recovery program.
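
To see the trend over time rather than a single figure, downtime can be averaged by month. The following sketch assumes each outage is recorded as a pair of timestamps (outage start, service restored) and groups outages by the month in which they began; the function name and data are illustrative.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_mean_downtime(outages):
    """Return {"YYYY-MM": mean downtime in hours}, ordered by month.

    outages: iterable of (outage_start, service_restored) datetime pairs.
    """
    by_month = defaultdict(list)
    for start, restored in outages:
        hours = (restored - start).total_seconds() / 3600
        by_month[start.strftime("%Y-%m")].append(hours)
    return {month: mean(hours) for month, hours in sorted(by_month.items())}


# Illustrative records only.
outages = [
    (datetime(2022, 1, 3, 8, 0), datetime(2022, 1, 3, 12, 0)),
    (datetime(2022, 1, 20, 0, 0), datetime(2022, 1, 20, 8, 0)),
    (datetime(2022, 2, 5, 9, 0), datetime(2022, 2, 5, 11, 0)),
]
trend = monthly_mean_downtime(outages)
```

A rising monthly average would be one signal that resources, staffing or recovery procedures need to change.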

Conclusion

A charter for an SOC helps define minimum management and operational requirements and provide a secure operational environment that keeps management informed of current events, information security weaknesses (and volume) and trends. The benefits of an SOC include continuous protection, quick and effective response, decreased costs of breaches and operations, threat prevention, security expertise, communication and collaboration, regulatory compliance and improved business reputation. The benefits of informed decisions via metrics include improved management of the information security posture and program, prevention of future cybersecurity incidents, avoidance of significant financial losses, prevention of damage to reputation, and the ability to respond quickly to government data protection and compliance regulations and/or board oversight queries. A center of excellence can serve as a model for other SOCs under the control of the organization. A charter defines what is needed for a particular organization.

Endnotes

1 Wlosinski, L.; “Information System Contingency Planning Guidance,” ISACA® Journal, vol. 3, 2021
2 Wlosinski, L.; “Cybersecurity Incident Response Exercise Guidance,” ISACA Journal, vol. 1, 2022
3 Wlosinski, L.; “Ransomware Response, Safeguards and Countermeasures,” ISACA Journal, vol. 5, 2020
4 Wlosinski, L.; “Understanding and Managing the Artificial Intelligence Threat,” ISACA Journal, vol. 1, 2020
5 Wlosinski, L.; “Cyberthreat Intelligence as a Proactive Extension to Incident Response,” ISACA Journal, vol. 6, 2021

Larry G. Wlosinski, CISA, CRISC, CISM, CDPSE, CAP, CBCP, CCSP, CDP, CIPM, CISSP, ITIL v3, PMP

Is a senior consultant at Coalfire Federal. He has more than 22 years of experience in IT security and privacy and has spoken at US government and professional conferences on these topics. He has written numerous magazine and newspaper articles, reviewed various ISACA® publications, and written questions for the Certified Information Security Manager® (CISM®) and Certified in Risk and Information Systems Control® (CRISC®) examinations.