Seven Software-related Incidents and How to Avoid or Remediate Them

Author: Frederick G. Mackaden, CISA, CMA, PMP
Date Published: 1 January 2016

Verizon’s 2015 Data Breach Investigations Report,¹ which addresses industry verticals such as education, entertainment and manufacturing, points to key incidents related to data breaches to watch for, primarily:

Education: Crimeware (represents “Malware infections within Organizations not associated with specialized patterns”²), miscellaneous errors, cyberespionage
Manufacturing: Cyberespionage, crimeware, insider misuse

This article focuses on miscellaneous errors and insider misuse, as these are not as closely monitored, perhaps because they are not perceived as external threats. The incidents arise from errors made by insiders (employees) and by approved suppliers who function as trusted insiders. In these cases, malicious intent may be absent, yet the errors can become catastrophic.

Risk relating to incidents such as those mentioned needs to be planned for, and disaster recovery options should be in place. Otherwise, the organization will be caught off guard and suffer real business losses in the form of delayed cash flow, which could seriously affect the stability of the business, especially in markets where customers delay payments.

This article focuses on seven software incidents that caused a great deal of panic and heartache.³ Nearly 60 percent of the incidents resulted from incorrect or accidental deletions. The others were caused by faulty customized code and a lack of comprehensive testing prior to promotion to the live environment. Such errors often slip by unnoticed by the IT department until operations actually grind to a halt.

Any multinational organization should have incident management and disaster recovery measures in place so that such events can be prevented, at best, or recovered from, at worst, with minimal impact to the business as a whole.

Lessons learned from incidents are very important. Effective communication is key, especially when programmers are offshore and only telephone or other electronic communication is possible (figure 1, incident 1). Face-to-face communication is best, but it is not possible when one team is located in one country and another team is at headquarters in a different country. If weekly project calls are not held and interaction between the offshore programmers and the onshore functional consultants is minimal, problems can arise down the line.

Effective change management acts as a preventive step, especially with respect to the movement of program code across environments. When customized code is moved across software environments, it needs to follow a gate review process in which senior management representatives (who look at the business in its entirety), along with the relevant independent specialists (who focus on the technical angle), either open the gate into the next environment or ask the programmers to review the code because of a critical testing failure. This process enables the team to revisit the work and check whether all is well before launching the (software) vessel on the high seas. Figure 2 depicts this process. Change management can be applied to hardware as well as software changes. Unfortunately, this process was not followed in the examples in figure 1, incidents 1 and 2.
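The gate decision described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `GateReview` record, its field names and the three outcomes are hypothetical and are not taken from any particular change management tool.

```python
from dataclasses import dataclass, field


@dataclass
class GateReview:
    """Hypothetical record of one gate between two environments."""
    change_id: str
    target_env: str
    business_signoff: bool = False    # senior management representative
    technical_signoff: bool = False   # independent specialist
    failed_critical_tests: list = field(default_factory=list)


def gate_decision(review: GateReview) -> str:
    """Open the gate only when both sign-offs exist and no critical test failed."""
    if review.failed_critical_tests:
        # A critical testing failure sends the code back for review.
        return "return to programmers"
    if review.business_signoff and review.technical_signoff:
        return "promote"
    # Tests passed, but at least one reviewer has not yet approved.
    return "hold"
```

The point of the sketch is that promotion is the exception rather than the default: code moves forward only when every condition is met, and a single critical test failure overrides any sign-off.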

Figure 1 details seven incidents that caused significant turmoil and distress.

In one instance (figure 1, incident 3), the functional consultant had forgotten the program number of the functional route to the problem and resorted to the technical route to solve the issue. The technician concerned did not know English well and was not sure what needed to be done, which further complicated the situation.

Once an incident has happened, it should be logged in the incident reporting system. The system alerts key personnel to the emergency so that the analysts and programmers suited to the task can be deployed. The recovery project needs to be monitored closely. In one instance (figure 1, incident 7), recovery was possible within a few hours because it was one of that organization's best practices to mirror key files of the enterprise resource planning (ERP) system on a real-time basis. This, coupled with the fact that the organization had extremely skilled technicians who set to work immediately and succeeded, enabled the organization to recover within the same business day. The disaster recovery measures in place thus ensured that the issue was corrected extraordinarily swiftly. It is notable that, in this case, the response was triggered by an incident report from a user in one country who noticed the anomaly. This underscores the need, even in the midst of a crisis, to issue the appropriate communication to all users of the software and to the senior management concerned. A matrix of who needs to be informed and when (e.g., hourly or at the end of the business day) would also be helpful.
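A notification matrix of this kind can be kept as simple data. The sketch below is a minimal illustration; the audiences and intervals in `NOTIFICATION_MATRIX` are hypothetical values chosen for the example and would need to be set by each organization.

```python
from datetime import datetime, timedelta

# Hypothetical matrix: who is informed, and how often, while an incident is open.
NOTIFICATION_MATRIX = {
    "recovery team": timedelta(hours=1),
    "affected users": timedelta(hours=4),
    "senior management": timedelta(hours=8),  # roughly end of business day
}


def due_notifications(last_sent: dict, now: datetime) -> list:
    """Return the audiences whose update interval has elapsed."""
    due = []
    for audience, interval in NOTIFICATION_MATRIX.items():
        sent = last_sent.get(audience)
        # An audience that has never been informed is always due.
        if sent is None or now - sent >= interval:
            due.append(audience)
    return due
```

Calling `due_notifications({}, datetime.now())` at the moment an incident is logged returns every audience, which matches the idea that all concerned parties hear about the incident at the outset.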

Maintenance of journal files (which track database activity such as Update, Delete and Insert and attribute it to the users concerned) made it possible to trace the root cause of the incident in at least one instance (figure 1, incident 5).
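To illustrate the idea of journaling, the following sketch uses SQLite triggers to record each Insert, Update and Delete together with the user responsible. It is an assumption-laden example, not the mechanism used in the incident: the `customer`, `journal` and `session_ctx` tables and the `run_as` helper are hypothetical, and a real ERP database would normally rely on its own journaling or audit facility.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session_ctx (user_name TEXT);   -- hypothetical: current user
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE journal (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    action TEXT, row_id INTEGER, user_name TEXT
);
CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
    INSERT INTO journal (action, row_id, user_name)
    VALUES ('INSERT', NEW.id, (SELECT user_name FROM session_ctx));
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO journal (action, row_id, user_name)
    VALUES ('UPDATE', NEW.id, (SELECT user_name FROM session_ctx));
END;
CREATE TRIGGER customer_del AFTER DELETE ON customer BEGIN
    INSERT INTO journal (action, row_id, user_name)
    VALUES ('DELETE', OLD.id, (SELECT user_name FROM session_ctx));
END;
""")


def run_as(user, sql, params=()):
    """Record who is acting, then run one statement on their behalf."""
    conn.execute("DELETE FROM session_ctx")
    conn.execute("INSERT INTO session_ctx VALUES (?)", (user,))
    conn.execute(sql, params)


def who_deleted(row_id):
    """Trace a deleted row back to the user(s) recorded in the journal."""
    rows = conn.execute(
        "SELECT user_name FROM journal "
        "WHERE action = 'DELETE' AND row_id = ? ORDER BY seq",
        (row_id,),
    ).fetchall()
    return [r[0] for r in rows]


run_as("alice", "INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Acme"))
run_as("bob", "DELETE FROM customer WHERE id = ?", (1,))
```

After these two statements, `who_deleted(1)` looks up the journal rather than the (now empty) `customer` table, which is what allows the root cause to be traced after the data are gone.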

IT heads also need to focus inward, as the actors responsible for incidents may be within the same building. An organization with effective internal control systems needs an effective backup regime. Real-time mirroring of key files, along with regular nightly, weekly and monthly backups, ensures that data are protected. Occasional test restores from the available backups help ensure that staff are prepared for such incidents and confirm the integrity of the backup tapes, even when things are going well. In fact, the recovery in just a few hours in one of the scenarios (figure 1, incident 7) was possible because of the mirroring of the key files and the backup that resulted from it.
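A test restore can be partly automated by restoring a backup into a scratch location and comparing checksums with the original. The sketch below assumes single files on disk and uses a plain file copy as a stand-in for the real restore procedure; the function names `sha256_of` and `restore_test` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_test(source: Path, backup: Path) -> bool:
    """Restore the backup into a scratch directory and compare it with the source."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup.name
        # Stand-in for the organization's actual restore procedure.
        shutil.copy2(backup, restored)
        return sha256_of(restored) == sha256_of(source)
```

A comparison like this only confirms that the backup matches the source at the time of the check. It complements, and does not replace, a full restore exercise carried out by staff.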

Conclusion

IT stakeholders need to look inward to ensure that their data are safe at all times. Maintaining confidentiality, integrity and availability is an ever-present priority in the increasingly complex world of enterprise software. IT professionals must watch especially for miscellaneous errors and insider misuse.

Five processes, when performed effectively, can help prevent dire and distressing situations:

Effective communication
Change management
Backup and restore
Incident reporting
Crisis management

That “integrity rings like fine glass, true, clear and reassuring”⁴ is true for data and the software environments that create and sustain them. Indeed, this needs to be assured not just for the data within an organization, but also for all of the hardware and software, and for the people responsible and accountable for them.

Endnotes

¹ Verizon, 2015 Data Breach Investigations Report, www.verizonenterprise.com/DBIR/2015/
² Ibid., page 39
³ This article highlights software incidents that caused great distress in the author’s personal experience or led to a project failure, but the focus is on a preventive, rather than a corrective, approach.
⁴ Brown, P.; Helen Exley Giftbook, Watford, UK, 2002

Frederick G. Mackaden, CISA, CMA, PMP, is currently with Crowe Horwath, a leading consulting network in the global top 10. He recently implemented management controls for a leading hospital group in the Middle East through Horwath MAK, the consulting arm of Crowe Horwath in the Middle East. His previous employers include a Fortune 500 multinational, where he worked as an enterprise resource planning (ERP) specialist supporting finance, sales, purchasing and manufacturing modules. He has more than a decade of experience in the ERP consulting environment and more than 25 years of experience overall. He is one of the contributors and reviewers of A Guide to the Project Management Body of Knowledge, 5th Edition.

Published in ISACA Journal, 2016, Volume 1.
