Information Security Matters: Privacy in the Dark (Data)

Author: Steven J. Ross, CISA, CDPSE, AFBCI, MBCP
Date Published: 31 December 2021

In my previous column,1 I wrote about the security of dark data. As a refresher, dark data are information collected for a variety of purposes, filed away (usually in unstructured network attached storage) and forgotten. According to informed sources, 55 percent of all enterprise data are dark,2 so there is a lot of information just sitting there waiting to be (mis)used.

Many of these dark data were bright as day when they were collected. They come from a variety of sources, including logs, networking systems, surveillance systems, job applications, the sensors in industrial systems and the Internet of Things (IoT).3, 4 If these data refer directly or indirectly to people, there is a potential for privacy issues to arise.
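
One way to gauge whether a forgotten log or text file refers to people is to scan it for patterns that look like personal identifiers. The sketch below is illustrative only: the two regular expressions (an email address and a US-style phone number) are simplistic assumptions, the function name `find_personal_data` is hypothetical, and a real review would need far broader rules.

```python
import re

# Assumption: two deliberately simple patterns; real personal data takes many more forms.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_personal_data(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits
```

A file that produces any hits is a candidate for closer review; an empty result does not prove that the file is free of personal information.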

Examples of Dark Data Containing Personal Information

Some of these issues are or should be obvious. For example, enterprises in both the public and private sectors are expected to maintain the privacy of information about their employees. But what about the information regarding people who were not hired? A widely cited statistic states that each corporate job opening attracts 250 résumés,5 meaning that 249 people were not hired. A common refrain in letters turning down an application is “We will keep your résumé on file.” If that is true (I always doubted it), then all that personal information was filed and forgotten—the very definition of dark data.

The logs of physical access control systems are an instance of indirect personal information that might become dark data. People, including visitors, enter many facilities only after swiping a badge or coded visitor pass. One well-recognized standard calls for the records generated to be retained for 90 days.6 Does everyone actually destroy those records at the end of the period? Or do they become dark data?
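
The retention question can be made concrete with a small check. What follows is a minimal sketch, not a description of any particular access control product: it assumes badge-swipe records are available as a badge identifier plus a timestamp, and it uses the 90-day period mentioned above. The names `SwipeRecord`, `RETENTION` and `split_by_retention` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: the 90-day retention period cited in the text.
RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class SwipeRecord:
    """Hypothetical shape of one physical access log entry."""
    badge_id: str
    swiped_at: datetime


def split_by_retention(records, now):
    """Return (keep, expired): records within the retention period and those past it."""
    cutoff = now - RETENTION
    keep = [r for r in records if r.swiped_at >= cutoff]
    expired = [r for r in records if r.swiped_at < cutoff]
    return keep, expired


if __name__ == "__main__":
    now = datetime(2021, 12, 31)
    log = [
        SwipeRecord("V-1001", datetime(2021, 12, 1)),
        SwipeRecord("E-2002", datetime(2021, 6, 15)),
    ]
    keep, expired = split_by_retention(log, now)
    print(f"{len(expired)} record(s) past retention and due for destruction")
```

Records in the expired list are the ones that turn into dark data if nobody acts on them.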

Potential Privacy Violations

In both these examples, the potential for a privacy violation exists, but is there an actual violation if the personal information is not disclosed? The answer depends greatly on whose data they are, where they are kept and by whom. If the data subject is a citizen of a country in the European Union, there is a case to be made that the mere fact that a person’s data are dark implies that they are being used for purposes other than those that are necessary, kept for longer than necessary and not processed in a manner that ensures appropriate security;7 hence, the person’s privacy has been violated. There are also laws that might apply in other jurisdictions. And in those without applicable legislation, there are generally accepted principles, such as those enunciated in the American Institute of Certified Public Accountants (AICPA)/CPA Canada Generally Accepted Privacy Principles,8 which say much the same as the EU General Data Protection Regulation (GDPR).

But how would anyone know that their information had been disclosed if it is all dark? I can think of several ways. Using the example of physical access control records, an opportunistic hacker might go looking for evidence of a person being in a place he or she was not supposed to be, perhaps for blackmail or other nefarious purposes. Or a cyberattacker9 might search for evidence that a cabal was being organized, based on the presence of all the members of a gang in the same place at the same time.

Okay, I have been reading too many “The Girl Who…” novels, but the point is still valid.

Unfortunately, it is not so farfetched to contemplate a government searching through an adversary’s personnel or physical security systems for compromising or otherwise useful information. It has happened.10 A direct attack on the primary human resources or access control systems might yield more information than going after dark data. But such attempts might also be more likely to be detected than attacks on lightly protected dark data.

Besides data theft, there is the very real possibility of a search through dark data in the legal process known as discovery, which is usually involved in civil suits but also, potentially, in criminal investigations.11 The mere fact that no one really knows what is in all that darkness does not mean that someone might not suspect that something valuable is there. Modern ediscovery tools are intended to identify, collect and produce electronically stored information in response to a request for production in a lawsuit or investigation.12 And, in fact, there are a number of software vendors who advertise that their tools can do exactly that regarding dark data.13

Is the Sky Falling?

I have to admit that the privacy issues that I have described regarding dark data are all potentials, not actuals. After some vigorous searching, I can find no legal cases that hinge on disclosure of dark data. I did find one interesting law review article from American University (Washington DC, USA) that describes a case that was filed by the US Federal Trade Commission alleging that a company had misled its customers, in that the company promised (and failed) to filter out personally identifiable information picked up by its focused advertising software.14 This is slim pickings to show that dark data pose a current legal or regulatory threat.

This lack of documented incidents worries me because I have long been leery of The Sky Is Falling school of communicating the message of information security generally and data privacy in particular. On the other hand, no one can be ahead of the curve without seeing an issue approaching before it descends upon us. I will be very happy if, in a decade, someone comes across this article and says I was all hopped up over nothing. But I suppose Cassandra felt the same way.

Endnotes

1 Ross, S.; “Afraid of the Dark (Data),” ISACA® Journal, vol. 6, 2021, http://h04.v6pu.com/archives
2 Splunk, The State of Dark Data, USA, 2019, p. 3, http://www.splunk.com/en_us/form/the-state-of-dark-data.html
3 AnswerMiner, “Dark Data 101: Everything You Need to Know,” 31 August 2020, http://www.answerminer.com/blog/dark-data
4 Marsh, S.; “Dark Data—The Blind Spots in Your Analytics,” iDashboards, http://www.idashboards.com/blog/2019/01/30/dark-data-the-blind-spots-in-your-analytics/
5 It seems to me that all the citations are quoting one another. As best I can tell, the original statement was from a company called Glassdoor, a recruitment information company, “50 HR and Recruiting Stats That Make You Think,” 20 January 2015, http://www.glassdoor.com/employers/blog/50-hr-recruiting-stats-make-think/. This number may have changed in the intervening years.
6 PCI Security Standards Council, Payment Card Industry (PCI) Card Production and Provisioning, USA, December 2016, p. 8, http://www.pcisecuritystandards.org/documents/PCI_Card_Production_Physical_Security_Requirements_v2_Nov2016.pdf
7 Intersoft Consulting, Art. 5 GDPR, Principles Relating to Processing of Personal Data, Belgium, 2018, http://gdpr-info.eu/art-5-gdpr/
8 Chartered Professional Accountants Canada, Generally Accepted Privacy Principles (GAPP) in Privacy Policy Development, Canada, http://www.cpacanada.ca/en/business-and-accounting-resources/other-general-business-topics/information-management-and-technology/publications/business-and-organizational-privacy-policy-resources/gapp-in-privacy-policy-development
9 Note that I use “hacker” in the first case and “cyberattacker” in the second. To me, a cyberattack means deliberate, targeted and malicious misuse of an organization’s systems. In the first instance, the snoop was trying to get information on a data subject, rather than the organization holding the data, thus, he or she is a hacker, not an attacker.
10 For example, the attack on the US Office of Personnel Management: See Koerner, B. I.; “Inside the Cyberattack That Shocked the US Government,” Wired, 23 October 2016, http://www.wired.com/2016/10/inside-cyberattack-shocked-us-government/
11 Justia, “Discovery in Criminal Cases,” May 2019, http://www.justia.com/criminal/procedure/discovery-in-criminal-cases/
12 Complete Discovery Source, “The Basics: What Is e-Discovery?” http://cdslegal.com/knowledge/the-basics-what-is-e-discovery/
13 A brief web search produced these at the top of the list: Epiq, “Four Steps to Shed Light on Dark Data,” http://www.epiqglobal.com/en-us/thinking/blog/four-steps-to-shed-light-on-dark-data; Everlaw, “Illuminating Dark Data With Everlaw,” http://www.everlaw.com/blog/2019/03/26/illuminating-dark-data-everlaw/; Heureka Software, “What Is Dark Data and Should You be Worried?” http://www.heurekasoftware.com/what-is-dark-data-and-should-you-be-worried/
14 Grimm, D. J.; “The Dark Data Quandary,” American University Law Review, vol. 68, iss. 3, 2019, www.aulawreview.org/the-dark-data-quandary/

Steven J. Ross | CISA, CDPSE, AFBCI, MBCP

Is executive principal of Risk Masters International LLC. Ross has been writing one of the Journal’s most popular columns since 1998. He can be reached at stross@riskmastersintl.com.