Data governance: 5 tips for holistic data protection

Your data is a strategic asset. To benefit your business, data requires strict controls around structure, access, and lifecycle. However, most security leaders have doubts about data security: nearly 70 percent of chief information security officers (CISOs) expect to have their data compromised in a ransomware attack.¹ Part of the problem lies in traditional data-management solutions, which tend to be overly complex, with multiple unconnected, duplicative processes augmented with point-wise integrations. This patchwork approach can expose infrastructure gaps that attackers will exploit.

In contrast, proactive data governance offers a holistic approach that conserves resources and simplifies the protection of your data assets. This integrated approach to data governance is a vital component of Zero Trust security and spans the complete lifecycle of your data. It also reduces the cost incurred by a data breach, both by shrinking the blast radius and by preventing an attacker from moving laterally within your network. Microsoft Purview provides a comprehensive data governance solution designed to help manage your on-premises, multicloud, and software as a service (SaaS) data. To help you get more from your data, we’ve put together five guideposts.

1. Create a data map of all your data assets

Before you can protect your data, you’ll need to know where it’s stored and who has access. That means creating comprehensive descriptions of all data assets across your entire digital estate, including how each asset is classified, how it’s accessed, and who owns it. Ideally, you should have a fully managed data scanning and classification service that handles automated data discovery, sensitive data classification, and end-to-end data lineage mapping for every asset. You’ll also want to make data easily discoverable by labeling it with familiar business and technical search terms.

Storage is a vital component of any data map and should include technical, business, operational, and semantic metadata. Technical metadata includes schema, data type, columns, and other information that can be quickly discovered with automated data scanning. Business metadata should include automated tagging of things like descriptions and glossary terms. Semantic metadata can include mapping to data sources or classifications, and operational metadata can include data flow activity such as run status and run time.
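To make the four metadata categories concrete, here is a minimal sketch of a single data map entry. The class name, field names, and sample values are illustrative assumptions, not a Microsoft Purview schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class DataAssetEntry:
    """One record in a hypothetical data map; all names are illustrative."""
    name: str
    owner: str
    # Technical metadata: schema, data types, columns.
    technical: dict = field(default_factory=dict)
    # Business metadata: descriptions and glossary terms.
    business: dict = field(default_factory=dict)
    # Semantic metadata: source mappings and classifications.
    semantic: dict = field(default_factory=dict)
    # Operational metadata: data flow activity such as run status and run time.
    operational: dict = field(default_factory=dict)

    def missing_categories(self) -> list:
        """Return the metadata categories that are still empty."""
        categories = ("technical", "business", "semantic", "operational")
        return [c for c in categories if not getattr(self, c)]


entry = DataAssetEntry(
    name="customer_orders",
    owner="sales-ops",
    technical={"columns": {"order_id": "int", "email": "string"}},
    semantic={"classifications": ["PII"]},
)
print(entry.missing_categories())  # ['business', 'operational']
```

A check like `missing_categories` is one simple way to spot assets whose descriptions are incomplete before they are published to a catalog.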

2. Build a decision and accountability framework

Once you know where all your data is located, you’ll need to document the roles and responsibilities for each asset. Start by answering seven basic questions:

  1. How is our data accessed and used? 
  2. Who is accountable for our data?
  3. How will we respond when business or regulatory requirements change?  
  4. What is the process for revoking access due to a role change or an employee leaving?
  5. Have we implemented monitoring and reporting to track data access?
  6. How do we handle lifecycle management?
  7. Are we automating permissions management to enforce security and compliance?

In response to question number one, you should develop a detailed lifecycle for data access that covers employees, guests, partners, and vendors. When deciding what data someone may need to access, consider both the person’s role and how the data in question will be used. Business unit leaders should determine how much access each position requires.

Based on the information gathered, your IT and security partners can create role-based access controls (RBAC) for each employee position and partner or vendor request. The compliance team will then be responsible for monitoring and reporting to ensure that these controls are put into practice. Implementing a permissions management solution can also help your organization by preventing misuse and malicious exploitation of permissions. By automatically detecting anomalous activity and raising alerts, your organization can reduce IT workloads, conserve resources, and increase user productivity.
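As a rough illustration of role-based access control, the sketch below maps roles to the actions they may take on a resource and denies everything else. The role names, resource names, and the `is_allowed` helper are hypothetical; a real deployment would rely on your identity platform's own RBAC features.

```python
# Hypothetical role definitions; in practice, business unit leaders
# decide how much access each position requires.
ROLE_PERMISSIONS = {
    "sales_analyst": {("orders", "read")},
    "sales_manager": {("orders", "read"), ("orders", "write")},
    "vendor": set(),  # no standing access
}


def is_allowed(role: str, resource: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions are refused."""
    return (resource, action) in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("sales_analyst", "orders", "read"))   # True
print(is_allowed("sales_analyst", "orders", "write"))  # False
print(is_allowed("contractor", "orders", "read"))      # False
```

Keeping permissions attached to roles rather than to individuals means that a role change or departure only requires updating one assignment.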

3. Monitor access and use policies

Next, you’ll need to document the policies for each data repository. Determine who can access the data (including read versus write access) and how it can be shared and used in other applications or with external users. Will your organization be storing personally identifiable information (PII) such as names, identification numbers, and home or IP addresses in this repository? With any sensitive data, it’s imperative to enforce the Zero Trust principle of least privilege or just-in-time (JIT) access.

The JIT permissions model strengthens the principle of least privilege by reducing the attack surface to only those times when privileges are actively being used (unlike the all-day, every-day attack surface of standing privileges). This is similar to just-enough-privilege (JEP) access, wherein a user completes a request describing the task and data they need to access. If the request is approved, the user is provisioned with a temporary identity to complete the task. Once the task is completed, the identity can be disabled or deleted. There’s also a “broker-and-remove-access” approach, wherein standing privileged accounts are created and their credentials stored securely. Users must then provide a justification when requesting to use one of the accounts to access data for a specific amount of time.
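The time-boxed nature of JIT access can be sketched as a grant that carries its own expiry. The `JitGrant` class and `approve_request` function below are assumptions made for illustration; they do not represent any specific product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class JitGrant:
    """A hypothetical temporary access grant."""
    user: str
    resource: str
    justification: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        # Access exists only inside the approved window.
        return now < self.expires_at


def approve_request(user, resource, justification, minutes, now):
    """Issue a grant that expires automatically; a justification is required."""
    if not justification.strip():
        raise ValueError("a justification is required")
    return JitGrant(user, resource, justification, now + timedelta(minutes=minutes))


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
grant = approve_request("alice", "hr-db", "quarterly audit", 30, start)
print(grant.is_active(start + timedelta(minutes=10)))  # True
print(grant.is_active(start + timedelta(minutes=31)))  # False
```

Because the expiry is part of the grant, nothing has to remember to revoke access later; the privilege simply stops being valid.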

Your organization can protect itself by maintaining a log of every request for elevated access (granted or declined), including when the access was revoked. All organizations, especially those storing PII, need to be able to prove to auditors and regulators that privacy policies are being enforced. Eliminating standing privileged accounts can help your organization avoid audit troubles.
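A request log of this kind could look like the following sketch, which records granted and declined requests and surfaces grants that have no recorded revocation. The record fields and the `unrevoked_grants` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AccessRequestRecord:
    """One hypothetical log entry for an elevated-access request."""
    user: str
    resource: str
    requested_at: datetime
    granted: bool
    revoked_at: Optional[datetime] = None


def unrevoked_grants(log):
    """Granted requests with no recorded revocation time."""
    return [r for r in log if r.granted and r.revoked_at is None]


t = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
log = [
    AccessRequestRecord("alice", "hr-db", t, granted=True, revoked_at=t.replace(hour=10)),
    AccessRequestRecord("bob", "hr-db", t, granted=False),
    AccessRequestRecord("carol", "finance-db", t, granted=True),
]
print([r.user for r in unrevoked_grants(log)])  # ['carol']
```

Logging declined requests alongside granted ones gives auditors the full picture of who asked for what, not just who received it.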

4. Track both structured and unstructured data

Traditionally, data governance has focused on business files and emails. But stricter regulations now require organizations to ensure that all data is protected. This includes both structured and unstructured data shared on cloud apps, on-premises data, shadow IT apps, and everything else. Structured data consists of clearly defined data types with patterns that make them easily searchable, such as Microsoft Office or Google Docs. Unstructured data can include anything else, such as audio files, videos, and even social media posts.

So, should you leave it up to the individual asset owner to implement their own data protections across such a vast data landscape? An alternative that some of Microsoft’s customers have embraced involves developing a matrixed approach to data governance, wherein security and compliance experts help data owners meet requirements for protecting their data. In this scenario, a “common data matrix” is used to track how data domains are interacted with across your organization. This can help document which areas of your business can simply create data versus read, access, or remove data assets. Your data matrix should identify the data’s source, including any shadow IT systems in use. Make sure to capture any domains and sub-domains containing sensitive or confidential data, subject to government regulation. Also, documenting roles and responsibilities for each business unit allows everyone to understand who is using specific data for a particular job, as well as who is adding data into a system and who is responsible for it.
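One simple way to picture a common data matrix is as a table of data domains against business units, with each cell listing the permitted actions. The domains, units, and actions below are invented for illustration only.

```python
# Hypothetical matrix: data domain -> business unit -> permitted actions.
DATA_MATRIX = {
    "customer_pii": {
        "sales": {"create", "read"},
        "support": {"read"},
        "marketing": set(),
    },
    "product_catalog": {
        "sales": {"read"},
        "marketing": {"create", "read", "remove"},
    },
}

# Domains flagged as sensitive or subject to regulation (assumed for the example).
SENSITIVE_DOMAINS = {"customer_pii"}


def units_that_can(action: str, domain: str) -> list:
    """List the business units permitted to perform an action on a domain."""
    units = DATA_MATRIX.get(domain, {})
    return sorted(unit for unit, actions in units.items() if action in actions)


print(units_that_can("read", "customer_pii"))       # ['sales', 'support']
print(units_that_can("remove", "product_catalog"))  # ['marketing']
```

Even a small matrix like this makes it easy to answer who can create, read, or remove data in a sensitive domain.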

5. Delete data that’s no longer needed

“Dark data,” which organizations pay to store but which goes underutilized in decision making, is now growing at a rate of 62 percent per year.² Given that most IT teams are already overstretched, asking them to stand guard over vast data lakes is not a recipe for security. So, how do you know when some data is no longer useful to your organization?

Sometimes the easiest way to protect data is to delete it. In keeping with the Zero Trust principle of “assume breach,” less data means less risk. Theft of intellectual property (IP) can be financially hazardous, whereas theft of customer PII can be disastrous long-term for your brand. Privacy laws require that businesses keep PII only for as long as it serves its original purpose.³ However, manually tracking which files are subject to deletion would be nearly impossible. A better approach is to implement ongoing controls to auto-expire PII or set up automated reminders for reviewing sensitive data to decide if it’s still needed.

Understanding the lifecycle of data makes it easier to delete when it’s no longer needed. An integrated data governance solution with intelligent machine learning capabilities can do the work for you, classifying content when it’s created and automatically applying appropriate sunset policies.⁴ Or, use multi-stage retention policies to automatically apply a new label at the end of a retention period.
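The idea of auto-expiring data by classification can be sketched as a lookup from label to retention period. The labels and periods below are placeholders chosen for the example; they are not legal guidance and not Microsoft Purview settings.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days, keyed by classification label.
RETENTION_DAYS = {"PII": 365, "general": 365 * 7}


def is_expired(classification: str, created: date, today: date) -> bool:
    """True when an item has outlived the retention period for its label."""
    limit = RETENTION_DAYS.get(classification)
    if limit is None:
        # Unclassified items are never auto-expired in this sketch.
        return False
    return today > created + timedelta(days=limit)


today = date(2023, 6, 1)
print(is_expired("PII", date(2022, 1, 1), today))      # True
print(is_expired("general", date(2022, 1, 1), today))  # False
```

Classifying content when it is created is what makes a rule like this workable, since the expiry decision depends entirely on the label.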

Learn more

Proactive, holistic data governance is an integral part of data protection, spanning the complete lifecycle and helping drive business outcomes by ensuring that your data is discoverable, accurate, and secure. Microsoft Purview integrates and automates data governance by setting lifecycle controls on your sensitive data, protecting against data loss, and managing RBAC. To experience Purview in your organization, you’re welcome to start with a free trial.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.


¹ Almost 70% of CISOs expect a ransomware attack, Danny Bradbury. October 19, 2021.

² September 2021 survey of 512 United States compliance decision-makers commissioned by Microsoft from Vital Findings.

³ GDPR personal data—what information does this cover?, GDPR. 2022.

⁴ Microsoft is committed to making sure AI systems are developed responsibly and in ways that warrant people’s trust. As part of this commitment, Microsoft Purview engineering teams are operationalizing the six core principles of Microsoft’s Responsible AI strategy to design, build, and manage AI solutions. As part of our effort to responsibly deploy AI, we provide documentation, gating, scenario attestation, and more to help organizations use AI systems responsibly.
