Big Data is Dead! But “Small” Data Isn’t Ready for Primetime
Last month I had the honor of presenting on a panel at Relativity Fest, one of the most prominent events in the eDiscovery community. Our session, titled “Little Data: The Results of the 2019 Relativity/CTRL Study on Data Minimization,” contained the big “reveal” of survey results that had been months in the making.
The survey was commissioned by the Coalition of Technology Resources for Lawyers (CTRL), an industry forum I helped found in 2014. CTRL’s mission is to “advance the discussion on the use of technology and analytics in the practice of law.”
As we began planning for this survey (sponsored by Relativity and conducted by Osterman Research), we homed in on the emerging topic of data minimization. For those unfamiliar with the term, it is defined as:
The concept that companies should limit the data they collect and retain, and dispose of it once they no longer need it.
The interesting thing about data minimization is its nexus with recent data privacy regimes throughout the world, including the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Since these regulations are relatively new, CTRL’s goal with the survey was to understand how mature organizations are as they respond to this new landscape.
While I’d urge you to read the complete findings, several interesting excerpts are highlighted below. First, it’s worth noting that data minimization has quickly evolved beyond merely disposing of non-valuable content (although that element can be important too). Significantly, it has started to morph into a way to minimize data retention by design, i.e., to stop collecting data that doesn’t fulfill a legitimate business objective. This sea change CAN’T be overstated.
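For readers who think in code, here is a minimal sketch of what “minimization by design” could look like at the point of intake. It is only an illustration under stated assumptions: the field names, the purposes and the `minimize_at_intake` helper are hypothetical and do not come from the survey.

```python
# Hypothetical allow-list: every field the organization collects must map
# to a documented business purpose. Field names are illustrative only.
ALLOWED_FIELDS = {
    "email": "account login and service notices",
    "shipping_address": "order fulfillment",
}


def minimize_at_intake(record: dict) -> dict:
    """Keep only fields with a documented purpose; drop everything else."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


submitted = {
    "email": "pat@example.com",
    "shipping_address": "1 Main St",
    "date_of_birth": "1980-01-01",  # no documented purpose, so it is never stored
}

stored = minimize_at_intake(submitted)
print(sorted(stored))
```

The design point is that the default flips: a field is dropped unless someone has written down why the business needs it, rather than kept unless someone objects.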
What Does Data Minimization Actually Mean to an Organization?
This change in data governance has a range of interesting drivers, many regulatory. Leading the pack was IG “best practices”. This term is a bit of an umbrella concept that encompasses regulatory compliance, records management and general data hygiene. Notably, while data storage costs still made the list, this business driver isn’t as critical as it may have been even 5-10 years ago.
Data Mapping? Data Discovery? If the Nomenclature is Confusing, so is the Process.
One of the first things a company needs to do as it begins its minimization quest is to understand its current data estate. Whether called “data mapping,” “personal information inventory” or “data discovery,” this stage is foundational to any downstream attempt to govern existing stores, and it often precedes efforts to minimize future information collection. Here, the 120 survey respondents were clearly vexed:
As the graphic indicates, only a third felt they really understood their data universe and its potential to contain risky, sensitive or personal data. Even the 34% who believed they “completely” understood their data landscape had a hard time when they had to break down their data silos into structured, semi-structured and unstructured categories. Without editorializing too much, the following graphic seems to show that confidence is low across all data categories.
Given that more than 80% of a company’s data is typically unstructured, organizations’ very low (15%) handle on that data source means that comprehensive data visibility is paltry. And, as we’ve seen with recent high-profile data breaches, it’s this dark/unstructured data that ultimately carries the most risk, because it holds sensitive content that is unguarded.
Finally, the survey did diagnose one potential area of conflict for organizations that want to aggressively minimize their content: legal holds.
Fortunately, both the survey data and the discussion in our panel seem to bear out that most organizations should not be concerned if they have a well-thought-out, documented IG program, which should be sufficient insulation against any spoliation charges.
Data minimization is on the cusp of materially impacting organizations. The CCPA will very quickly drive this theme home for many organizations. Then the hard work really begins: both minimizing the intake of superfluous data and cleaning up the sensitive information they have already been collecting for years.
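To make the legal hold tension concrete, here is a rough sketch of a disposal check that lets a hold override the retention schedule. Everything in it is an assumption made for illustration: the three-year retention period, the custodian names and the `eligible_for_disposal` helper are hypothetical, and a real IG program would be driven by counsel and a documented policy rather than a script.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    custodian: str
    created: date


# Assumed values for illustration only.
RETENTION = timedelta(days=365 * 3)   # hypothetical three-year retention period
CUSTODIANS_ON_HOLD = {"jdoe"}         # hypothetical custodian under legal hold


def eligible_for_disposal(rec: Record, today: date) -> bool:
    """A record may be disposed of only if it is past retention AND not on hold."""
    if rec.custodian in CUSTODIANS_ON_HOLD:
        return False  # a legal hold always overrides the retention schedule
    return today - rec.created > RETENTION


records = [
    Record("r1", "jdoe", date(2014, 1, 1)),    # past retention, but on hold
    Record("r2", "asmith", date(2014, 1, 1)),  # past retention, no hold
    Record("r3", "asmith", date(2019, 1, 1)),  # still within retention
]

today = date(2019, 11, 1)
to_dispose = [r.record_id for r in records if eligible_for_disposal(r, today)]
print(to_dispose)
```

Checking the hold first, and writing that ordering down, is the kind of documented process that helps show disposal was routine and in good faith.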
Click here to download a full copy of the survey findings.