Unstructured Data Insights
Data age is a critical dimension for records and information management. In all of its forms, unstructured data is often over-retained, usually as the result of simple neglect rather than the application of a deliberate retention and disposal program. Across our customers, the average age of all unstructured data is 6 years and, as this chart shows, up to 31% of that data is at least 7 years old. Since the data we discover spans all repositories, from file shares and email to document management systems such as iManage, it's easy to identify large data sets that can be readily and defensibly disposed of.
Context is king, and nothing provides context quite like storage repository or location. When one overlays data age against storage context, a story opens up in front of any analyst. Email and personal drives are inappropriate for long-term retention, while file shares, unless subject to careful controls, are rarely the right place to manage data beyond transitory work-in-progress. By investigating top-level metrics like these, analysts can readily identify poor data practices and candidate disposition actions, enabling them to advise data owners and to scale governance oversight.
Knowing data composition is foundational to any governance effort. With almost 80% of data appearing in non-Microsoft formats, it's clear that additional capabilities are needed to fully understand the nature of your organization's data composition. Sometimes, data object type is all an analyst needs to enact an information or security policy (for example, for driving acceptable use policies or identifying proprietary data). Adding in age and storage context, along with perhaps a sensitivity score, powers up disposition and regulatory compliance.
While the use patterns of different data formats can vary wildly, for the analyst who understands their organization, a simple comparison of age across different formats can tell a clear story. Mail and messages retained for long periods present significant risk to any organization, whether in terms of regulatory compliance or increased breach surface area. Similarly, extensive retention of text formats can indicate data dumps, which often contain sensitive data originally intended for transitory processing only. In each case, a data-led enquiry will quickly get to the root cause and can drive disposition that reduces risk and cost.
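For analysts who want to reproduce this kind of comparison on their own inventory, the overlay of data age against storage context or format can be sketched in a few lines of Python. This is a minimal illustration, not the method behind the charts: the record fields (`repository`, `format`, `modified`, `size`), the function name `share_older_than`, the choice to weight by bytes, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def share_older_than(records, key, threshold_years=7, today=None):
    """Return {group: fraction of bytes at least threshold_years old}.

    Hypothetical helper: `records` is a list of dicts with a grouping
    field named by `key`, a `modified` date, and a `size` in bytes.
    """
    today = today or date.today()
    total = defaultdict(int)
    old = defaultdict(int)
    for rec in records:
        # Approximate age in years from the last-modified date.
        age_years = (today - rec["modified"]).days / 365.25
        total[rec[key]] += rec["size"]
        if age_years >= threshold_years:
            old[rec[key]] += rec["size"]
    # Skip groups with no bytes to avoid dividing by zero.
    return {group: old[group] / total[group] for group in total if total[group]}


# Invented sample inventory, for illustration only.
records = [
    {"repository": "email", "format": "msg", "modified": date(2015, 1, 1), "size": 300},
    {"repository": "email", "format": "msg", "modified": date(2023, 1, 1), "size": 100},
    {"repository": "file_share", "format": "txt", "modified": date(2022, 6, 1), "size": 500},
]

shares = share_older_than(records, key="repository", today=date(2024, 1, 1))
print(shares)  # {'email': 0.75, 'file_share': 0.0}
```

Passing `key="format"` instead gives the same comparison across data formats. A repository or format with a high share of aged data is a natural starting point for a conversation with the data owner about disposition.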