Get Better at Storing Less Data: Part 1 - An Overview

Written by Rich Hale | Jun 29, 2023 8:40:09 AM

Introduction

In this blog series we're going to look at the practical implications of making Information Governance (IG) stick. We're doing this because it’s clear that most people think IG is important (few people dispute it) but, in practical terms, it’s tough to execute, especially when it comes to governance of unstructured (or user-generated) data.

There are many reasons for this apparent contradiction. Probably the most important is that IG is a lot like an insurance policy. That is, in today's world, IG can seem like a luxury when you haven't been at the sharp end of a data breach, a fine from a regulator or some painful data event where the accumulation of years of poor (or non-existent) practices comes home to roost.

What does it Mean?

This series is going to cover (as the title suggests) governing unstructured, user-generated data during its normal daily use. If you’re not familiar with the Electronic Discovery Reference Model (EDRM) you can learn more here; very briefly, the left-hand side of this model refers to IG and the management of data in use, as opposed to when a legal discovery event has been initiated. As a result, we'll be looking at preventive (or proactive) activity and practices: the good work that protects the organization and, when done well, reduces the cost and liability of data ownership and, in the most mature cases, can add value to the bottom line. At its core, this is all about developing practices that drive the organization to get better and better at storing less data. To be more specific, that means storing less data in the wrong places (unprotected in the wild) and, to complement that, getting better at recognizing, storing and re-using valuable and sensitive data in the right places.

At this point it’s important to recognize more formal work done in this area. In no particular order, we should recognize ARMA's IG Implementation Model as well as the EDRM's IG Reference Model mentioned above. Each provides a common vocabulary and foundations that can be used to develop a range of IG strategies, policies and processes for any organization. We'll treat these (and others we've surely missed) as context while exploring some real-world issues that so often get in the way of their successful deployment.

Alongside these models, analyst organizations such as Gartner and Forrester continue to write on the subject and its supporting technologies although, in our opinion, they often fall short of providing actionable advice beneath the strategic level. That notwithstanding, we would characterize their view as follows: increasingly, organizations need to complement their information security posture with the people, process and technology capabilities that enable them to proactively understand and classify their information assets inside the defensive perimeter.

What's the State of Play?

In the course of working with customers, partners, analysts and opportunities alike, what's clear to us is that most organizations are struggling to land their information governance programs, if they have a program at all. To put it another way, when we presented at this year’s HIMSS Conference, we coined the headline ‘Bad Information Governance comes as standard’. Common threads include:

Leaving everything as an IT effort or depending on a technology implementation to resolve the issue. This frequently leads to ‘silver bullet’ technology impositions on the organization which fail to deliver.
Small teams of individuals, often under compliance or legal, engaged in rear-guard actions driven by legal or business events and characterized by firefighting a wide range of information issues, rather than being proactive.
Pockets of success which are struggling to scale, whereby dedicated information professionals have made inroads and developed practices in response to an initial project, but have neither the support nor the resources to extend their reach.

That said, we see success too. While it can be in pockets, we have seen plenty of organization-wide programs establish themselves and deliver benefits. These programs often have their origins in some change event (such as a migration or divestiture or, increasingly, a response to some security event). Cool practices (that we'll return to in the future) include:

IG teams offering a menu of services to the lines of business, ranging from data discovery and cleanup, through sensitive data audits and treatment plans to metadata enrichment and data enhancement. We recently learned from a customer that its Information Architecture team was leading this effort.
Data health and compliance monitoring with the provision of dashboards and reporting to inform and direct improvement and compliance efforts.
Wholesale file plan cleanup and reorganization in place, engaging lines of business in a programmatic approach to getting to know their data and organizing it so it can be better put to work. This is often motivated by preparation for migration to the cloud, in turn minimizing cloud costs, rather than just to improve operations.

Overall, the state of play, in our opinion, is patchy. If we think about governance of unstructured data using a generalized maturity model framework (such as the CMMI) we might describe many or most organizations as being at Level 1 (initial) or even Level 0 (unaware or unknown). Clearly there are plenty that are well beyond that mark but we’re generalizing here, while noting that there’s some great practice to circle back to for future articles.

What are the Implications?

Many have written on the implications of poor IG and, as we said at the beginning, it seems that most understand them, even if they don't feel compelled to act. What's clear is that the global march of data privacy regulation and awareness will continue to raise the bar for IG. This pressure will increasingly drive organizations to invest in becoming better stewards of their customers' and their own data and, in turn, start to attend to the left-hand side of the EDRM.

Our customers are beginning to prioritize IG because the following things are grabbing their attention:

An increasing awareness of the costs associated with reacting to data events. Especially with respect to data breach events, 'it will never happen to me' is an increasingly naive stance to adopt. Other events that drive customer action are migrations (to the cloud or elsewhere) and corporate events such as acquisitions or divestitures.
More robust language in customer requirements for third- and fourth-party vendors to meet obligations for data privacy, but also for the simple protection of their data and intellectual property. A recent discussion identified that some organizations are being asked to attest to their IG standards, particularly to their knowledge of where and how their data is stored.
An increasing demand from business innovation areas to provide clean, sanitized and focused data sets for use in generative AI workflows and initiatives. The realization that AI can and will hallucinate or regurgitate trash and sensitive data has been a wake-up call that IG should be well positioned to respond to.
The need to broaden the scope of records management practices to include all data, regardless of repository or format. This, perhaps, can be characterized as no longer ignoring the wildlands of file shares, email, chat or ad hoc collaboration platforms and seeking to actively dispose of any business data against a formal record retention schedule and, in turn, reduce liability with respect to a range of regulations.

Dimensions and Considerations

Looking ahead to the rest of this series, here's a preview of the dimensions we think organizations and their leadership need to account for as real-world considerations for getting better at storing less data:

Resourcing. How will the effort be sustainably resourced? What relevant skills and knowledge already exist in the organization? What new skills need to be acquired? Where is augmentation appropriate and what should be core?
Ownership and Accountability. Who will be accountable for data? How will accountability be assigned? What responsibilities does ownership bring?
User Support. My way vs the Organization's way? What role will end users play in IG? What impact will the program have on users? How will end users be supported through the program?
Repositories of Record vs Transitory Environments. What is the purpose of each repository? Where are the repositories of record? Where is collaborative work done? How will records be captured during the collaborative process?
Balancing Risk and Complexity. Simplifying unmanageable records schedules. Setting up actionable disposition policies. Managing legal hold outside of mailboxes.

This blog series will explore each of these in pursuit of what we think should be a foundational goal of any IG program - achieving a state of Zero Dark Data. In this state the organization can take defensible, practical decisions about how and whether to retain any of its data and, in turn, reduce its risk and cost of ownership, mitigate potential regulatory sanctions and, in time, increase the value of its information. Why not join us by commenting as we go? We'd love to hear your feedback.
