How to Establish Data Ownership and Governance Roles
Establishing strong data management and governance requires assigning clear roles, especially for data ownership, and taking an iterative approach that gradually moves the organization from unowned, unmanaged data repositories to an environment where all data is fully understood, secured and managed. This piece explains how to put a workable data management and governance process in place.
Data Governance Basics
It’s critical to define roles for data management transformation and understand how they should function. We recommend identifying the following:
- A senior leader to oversee the entire program.
- Builders to focus on the gaps between where things are and where they should be.
- Maintainers who are currently supporting a form of data governance in their areas.
However, that approach presumes in situ improvement and not fundamental technical and cultural transformation. The greater challenge requires additional roles and a more rigorous approach to the analysis and moving of data.
Structuring Unstructured Data
To successfully make such a transformation, a one-time data inventory process will not be sufficient. Instead, an iterative approach will be needed to identify data that can be moved to a new, structured environment and then to reassess the data that remains. The principle at play will be one of continual winnowing, parceling the centralized data out to a large number of structured repositories.
However, while the overall approach must be understood up front, it is best not to start with the winnowing. Instead, a thorough analysis of roles and responsibilities should come first, working backward from a presumed future state of many small data repositories managed by different individuals.
Data Ownership and Governance Roles and Responsibilities
While a senior leader is needed to oversee the program, and the builders and maintainers must be identified, those roles are just for the data management component. Those roles alone cannot drive cultural change. Organizations should consider the following additional roles defined for each new data repository:
- Data owner: The data owner role is often discussed but seldom defined. For this process to work, the data owner must be the individual in the organization who understands and oversees the business processes that involve a particular data repository. Normally, the data owner would be a senior manager on the business side. Efforts to assign security or risk people to a data owner role will not work because, while they can provide assistance protecting the data, they do not know how it is used in each business process and cannot ensure the controls placed around the data will be appropriate for each business context.
The data owner is responsible for overseeing and delegating tasks to ensure data is available, minimized, up to date and protected throughout its lifespan within that data owner’s repositories. The data owner is also responsible for ensuring data retention and destruction practices run as they should for those repositories.
- Educator: The responsibility of an educator is to ensure the data owners understand their responsibilities. One educator can be assigned to multiple repositories, and one repository may have multiple educators. Educators will typically come from existing security and compliance groups. Thus, one educator may be an expert in GDPR and be assigned to all the data owners in an organization and tasked with ensuring all data owners understand their responsibilities under the law. However, another educator may be a security expert and assigned only to three or four repositories because they are expected to understand the data at a deeper level to provide appropriate guidance to the data owner.
- Reporter: A reporter role functions much like an educator role, but instead of conveying “how things should be” to the data owner, the reporter conveys “how things are.” Reporters will often come from the data owner’s own organization. Reporters work with educators under the direction of the data owner to identify how best to report metadata from their assigned repositories, so the data owner can make the appropriate resourcing and prioritization decisions. Because reporters must have both business and technical knowledge and be aware of changes at the operational level, it should be rare for a reporter to be responsible for more than a few repositories.
Data Ownership and Governance Role Transitions
It is natural that, over time, the individuals assigned to specific roles will move to other work. It is critical the roles not be left open or the system will break down. When the organization loses a reporter, it should identify how much of the reporting was automated and will continue in that person’s absence. Finding a new individual to learn the reporting process and maintain the reporting scripts should be a priority. While reporting can continue for a period of time with the role vacant, the quality and accuracy of the reports will likely degrade and, if not maintained, will drive the data owner to make the wrong decisions.
When an educator leaves, that role should be replaced according to the cadence of the particular knowledge set. For example, PCI DSS updates on a yearly cycle and, even if the standard does not change, enough clarification documents are issued by the PCI Security Standards Council that an annual cadence is needed to stay current. In contrast, if an expert in intrusion and exfiltration moves on, that role may need to be replaced within a month, because attack groups update their capabilities much more quickly than a standards body makes updates.
Data owners should be replaced immediately, even if it means bringing in an unrelated senior manager until a long-term replacement can be found. While the teams know what they need to do, and are generally good at doing it, the message an open data owner position sends to those teams is that data security and compliance are unimportant and just a checkbox exercise. For data transformation to be successful, there should never be an absence of leadership.
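The replacement cadence described above can be tracked with a simple rule per role. The following is a minimal sketch, not a prescribed tool: the function names, topic keys and the default windows for reporters and unlisted educator topics are assumptions made for illustration. Only the "immediately", "within a month" and "annual" figures come from the text.

```python
from datetime import date, timedelta
from typing import Optional

# Days an educator vacancy may stay open, keyed by knowledge area.
# The two entries mirror the examples in the text; the keys are hypothetical.
EDUCATOR_WINDOW_DAYS = {"pci_dss": 365, "intrusion_exfiltration": 30}

# Assumptions for illustration only: the text gives no figure for these.
DEFAULT_EDUCATOR_WINDOW_DAYS = 90
REPORTER_WINDOW_DAYS = 60


def replacement_due(role: str, vacated_on: date, topic: Optional[str] = None) -> date:
    """Return the date by which a vacated role should be filled."""
    if role == "data_owner":
        # Data owners are replaced immediately, even with an interim manager.
        return vacated_on
    if role == "educator":
        days = EDUCATOR_WINDOW_DAYS.get(topic, DEFAULT_EDUCATOR_WINDOW_DAYS)
        return vacated_on + timedelta(days=days)
    if role == "reporter":
        return vacated_on + timedelta(days=REPORTER_WINDOW_DAYS)
    raise ValueError(f"unknown role: {role}")


def is_overdue(role: str, vacated_on: date, today: date, topic: Optional[str] = None) -> bool:
    """True when the vacancy has been open past its replacement date."""
    return today > replacement_due(role, vacated_on, topic)
```

A check like `is_overdue("data_owner", vacated_on, date.today())` could feed a periodic report to the senior leader overseeing the program, so open roles are visible before the reporting or guidance they support starts to degrade.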
Niche Repository Creation and Assignation
Once the new roles are understood, the creation of the newer “data niches” can begin. The form a data niche can take varies drastically, but as a general rule of thumb, data should move counter-entropically. In other words, data should become more structured as it goes through this process. Examples:
- Data stored as a collection of PDF documents could be migrated into a cloud service that indexes the documents for search, tracks access and allows for redaction.
- Data stored as a collection of Microsoft Word and Excel documents could be migrated into a SharePoint environment and grouped in a way to provide minimal access through Microsoft 365 data controls.
- Data stored in a Microsoft Access database could be migrated to Snowflake, with basic reporting created to mimic what users were doing with Access.
- Data stored in a Microsoft SQL database could be migrated to Amazon Relational Database Service, while also tokenizing data and deleting unneeded data in the process.
As the form of each repository is defined, the appropriate data owner, educator(s) and reporter(s) should be assigned, and each should sign off on requirements (educator), implementation details (reporter) and operational readiness (data owner) before the new repository goes live.
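The sign-off gate described above can be expressed as a small data model. This is a sketch under stated assumptions: the class name, field names and sign-off keys are hypothetical, and it reads "each should sign off" as requiring every assigned educator and reporter to sign, which an organization may choose to relax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Which role gives which sign-off, following the text:
# requirements (educator), implementation (reporter), readiness (data owner).
SIGNOFF_ROLE = {
    "requirements": "educator",
    "implementation": "reporter",
    "operational_readiness": "data_owner",
}


@dataclass
class NicheRepository:
    name: str
    data_owner: Optional[str] = None
    educators: List[str] = field(default_factory=list)
    reporters: List[str] = field(default_factory=list)
    signoffs: Dict[str, Set[str]] = field(default_factory=dict)

    def holders(self, role: str) -> Set[str]:
        """People currently assigned to a role for this repository."""
        if role == "data_owner":
            return {self.data_owner} if self.data_owner else set()
        return set(self.educators if role == "educator" else self.reporters)

    def sign(self, signoff: str, person: str) -> None:
        """Record a sign-off, rejecting people who do not hold the matching role."""
        role = SIGNOFF_ROLE[signoff]
        if person not in self.holders(role):
            raise ValueError(f"{person} is not a {role} for {self.name}")
        self.signoffs.setdefault(signoff, set()).add(person)

    def blockers(self) -> List[str]:
        """List what still prevents go-live: unfilled roles or missing sign-offs."""
        problems = []
        for signoff, role in SIGNOFF_ROLE.items():
            holders = self.holders(role)
            if not holders:
                problems.append(f"{signoff}: no {role} assigned")
                continue
            waiting = sorted(holders - self.signoffs.get(signoff, set()))
            if waiting:
                problems.append(f"{signoff}: waiting on {', '.join(waiting)}")
        return problems

    def can_go_live(self) -> bool:
        return not self.blockers()
```

Returning the list of blockers rather than a bare yes or no makes it clear which role is holding up a repository, which is the information the data owner needs to prioritize.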
As each new niche repository is moved to an operational state, the data in the centralized system should be archived and removed from the centralized system. Ideally, such data would be archived to an air-gapped system. Even if the system is a virtual machine that may simply be turned off, the networking should be adjusted so that when it is turned on, it is not re-exposed to the network, but instead to an archival DMZ that must be accessed via VPN. This approach keeps the data from doubling and causing internal conflicts and loss of data control.
Iterative Reduction
As this process continues, the centralized repository will shrink through the archival process and new niche repositories will be implemented to replace the legacy systems and legacy data methods. It is critical to recognize this reduction process is part of the migration itself. It is common for migrations to be completed while the older systems are retained “just in case.” However, such data storage areas are sources of data leaks, because no one is responsible for maintaining or even monitoring them. Many data breaches have come from older systems that were simply never removed properly.
As the process continues, the concentration of data confusion will grow on the legacy repository. Data confusion shows itself in the form of files that have no owner, legacy systems that “just run themselves” and files that exist because they were important to someone who no longer works at the organization. It’s important this confusion be resolved, so these issues can be addressed. In many cases, the best approach may be to simply delete that data, but cultural norms may prevent that solution. Once the legacy repository reaches a high enough concentration, it is often best to assign ownership of the entire repository to a single role – such as the head of the data team – and task them with sorting out the rest. The task of this role is, however, not to solve the problems, but to eliminate the confusion, so the remaining files can be assigned to appropriate owners and migrated to the niche repositories.
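One pass of this winnowing can be sketched in a few lines. The example below is illustrative only: it assumes an inventory that records a path and a named owner for each legacy item, plus a set of current staff, and all names in it are hypothetical. It separates items that have a current owner from items that are part of the confusion, and reports how concentrated that confusion has become.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class LegacyItem:
    path: str
    owner: Optional[str]  # None when no owner is recorded


def winnow(
    items: Iterable[LegacyItem], active_staff: Set[str]
) -> Tuple[List[LegacyItem], List[LegacyItem]]:
    """Split items into (has a current owner, ownerless or owner has left)."""
    migratable: List[LegacyItem] = []
    confused: List[LegacyItem] = []
    for item in items:
        if item.owner and item.owner in active_staff:
            migratable.append(item)
        else:
            confused.append(item)
    return migratable, confused


def confusion_ratio(items: Iterable[LegacyItem], active_staff: Set[str]) -> float:
    """Fraction of remaining legacy items with no current owner."""
    items = list(items)
    if not items:
        return 0.0
    _, confused = winnow(items, active_staff)
    return len(confused) / len(items)
```

Tracking the ratio across iterations gives a rough signal for when to hand the remainder of the legacy repository to a single role. The threshold for that decision is a judgment call the sketch does not try to make.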
Eventually, when no data is left in the centralized locations, those locations can be decommissioned.
Transforming Data Tips
Transforming data while transforming a culture can be tricky. To improve the chances of success, organizations should:
- Assign new roles: Define the data owner, educator and reporter roles as necessary to start.
- Maintain the new roles: Unlike the “builder” roles highlighted at the beginning of this piece, the transformation roles are permanent and must be maintained as the organization matures.
- Don’t give up: Data transformation can take years, and it can often feel like progress is not being made. However, that feeling is what drove the organization to the current situation and must be addressed directly to prevent further stagnation and confusion. Iterate through the legacy environment(s) and eliminate all you can with each iteration – building new migration locations in between iterations. Success is possible; it just takes time and effort.