Generative AI (GenAI) is reshaping business operations: powering intelligent automation, enhancing customer experiences and unlocking new insights. Behind every successful AI initiative, though, is a foundation of high-quality, well-governed data.
In fact, GenAI high performers¹ have recognised data as the main challenge in capturing value. According to McKinsey’s Global Survey on AI, 70% of respondents cited data as a significant challenge in scaling GenAI effectively.²
Once you’ve invested time in cleaning up your data, ensuring it’s accurate, complete and relevant, the next challenge is keeping it that way. That’s where data governance comes in.
Data governance is the framework that ensures your data remains trustworthy over time. It involves assigning ownership, defining standards and implementing processes for how data is collected, maintained and used across the business. With a strong governance strategy, your GenAI models are more likely to be trained on reliable data, which helps make your AI-driven decisions more explainable and compliant.
Building a data governance framework that supports GenAI success
1. Define clear data ownership
AI models rely on diverse datasets that often span multiple departments and systems. Without clear ownership, it’s difficult to maintain accountability or ensure data quality. Clear ownership ensures that someone is responsible for the integrity of the data feeding your AI models.
- Assign data stewards for each dataset used in GenAI training and outputs.
- Empower stewards to monitor, update and validate data regularly.
- Clarify responsibilities across teams to avoid duplication or gaps in oversight.
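As a rough illustration of the ownership checks above, the sketch below scans a dataset inventory for gaps (no steward) and possible duplication (more than one steward). The `Dataset` class, the `ownership_issues` function and the inventory entries are hypothetical names invented for this example; a real inventory would typically live in a data catalogue rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    stewards: tuple[str, ...]  # people accountable for this dataset


def ownership_issues(datasets: list[Dataset]) -> dict[str, list[str]]:
    """Group datasets with no steward (gaps) or several stewards (possible duplication)."""
    issues: dict[str, list[str]] = {"unowned": [], "multiple_stewards": []}
    for ds in datasets:
        if len(ds.stewards) == 0:
            issues["unowned"].append(ds.name)
        elif len(ds.stewards) > 1:
            issues["multiple_stewards"].append(ds.name)
    return issues


# Hypothetical inventory of datasets feeding a GenAI model.
inventory = [
    Dataset("customer_profiles", ("alice",)),
    Dataset("support_tickets", ()),
    Dataset("product_catalogue", ("bob", "carol")),
]

report = ownership_issues(inventory)
print(report)
```

Whether several stewards on one dataset is a problem depends on how your teams share responsibility; the check only flags it for review.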
2. Implement access controls
GenAI systems often require access to sensitive or regulated data. Without proper controls, you risk data breaches, misuse or non-compliance.
- Classify data by sensitivity level.
- Use role-based access to restrict who can view, edit or export data used in AI models.
- Monitor access logs to track how data is being used and by whom.
- Ensure that only authorised systems and personnel can interact with training datasets.
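The controls above could be sketched in code roughly as follows: each dataset carries a sensitivity level, each role has a ceiling and a set of permitted actions, and every request is written to an access log. The role names, levels and the `is_allowed` function are assumptions made for illustration, not the behaviour of any particular platform.

```python
from datetime import datetime, timezone
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


# Hypothetical policy: the highest sensitivity level each role may touch,
# and the actions that role may perform.
ROLE_POLICY = {
    "analyst": {"max_level": Sensitivity.INTERNAL, "actions": {"view"}},
    "data_steward": {"max_level": Sensitivity.RESTRICTED, "actions": {"view", "edit"}},
    "ml_pipeline": {"max_level": Sensitivity.RESTRICTED, "actions": {"view", "export"}},
}

access_log: list[dict] = []


def is_allowed(role: str, action: str, level: Sensitivity) -> bool:
    """Check a request against the role policy and record it in the access log."""
    policy = ROLE_POLICY.get(role)
    allowed = (
        policy is not None
        and action in policy["actions"]
        and level <= policy["max_level"]
    )
    # Log every request, including denied ones, so usage can be reviewed later.
    access_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "action": action,
        "level": level.name,
        "allowed": allowed,
    })
    return allowed


print(is_allowed("analyst", "view", Sensitivity.INTERNAL))
print(is_allowed("analyst", "view", Sensitivity.RESTRICTED))
print(is_allowed("contractor", "view", Sensitivity.PUBLIC))
```

Unknown roles are denied by default, which is usually the safer choice for training datasets.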
3. Establish data standards
GenAI models are only as good as the data they learn from. To facilitate reliable outputs, your data must meet clearly defined quality benchmarks.
- Define minimum thresholds for data completeness, accuracy and consistency.
- Use automated profiling and cleansing tools to detect anomalies and standardise formats.
- Regularly audit datasets to ensure they meet quality standards over time.
- Apply data validation at the point of data capture.
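To make the idea of minimum thresholds concrete, here is a minimal sketch that measures completeness per field, compares it with a threshold, and rejects incomplete records at the point of capture. The field names, threshold values and function names are hypothetical; accuracy and consistency checks would need rules specific to your data.

```python
def completeness(records: list[dict], field: str) -> float:
    """Share of records in which `field` is present and not empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def fields_below_threshold(records: list[dict], thresholds: dict[str, float]) -> dict[str, float]:
    """Return each field whose completeness falls below its minimum threshold."""
    failing = {}
    for field, minimum in thresholds.items():
        score = completeness(records, field)
        if score < minimum:
            failing[field] = score
    return failing


def missing_required(record: dict, required: list[str]) -> list[str]:
    """Validation at capture: list the required fields that a new record lacks."""
    return [f for f in required if record.get(f) in (None, "")]


# Hypothetical thresholds and sample records.
THRESHOLDS = {"email": 0.9, "country": 0.8}
records = [
    {"email": "a@example.com", "country": "UK"},
    {"email": "", "country": "UK"},
    {"email": "c@example.com", "country": "FR"},
    {"email": "d@example.com", "country": "DE"},
]

failing = fields_below_threshold(records, THRESHOLDS)
print(failing)
print(missing_required({"email": "", "country": "UK"}, ["email", "country"]))
```

Running the threshold check on a schedule covers the regular audit, while `missing_required` shows the kind of check that can run before a record is ever stored.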
4. Maintain robust documentation
Transparency is critical in AI. You need to be able to explain how data was collected, processed and used to train models, especially in regulated industries like finance and healthcare.
- Document data sources, policies, transformation rules and model training processes.
- Maintain lineage records to trace the origin, movement and transformation of data used in GenAI models.
- Ensure documentation is accessible to both technical and non-technical stakeholders.
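One simple way to picture a lineage record is as a source plus an ordered list of transformation steps, with a plain-language summary that non-technical readers can follow. The sketch below assumes a hypothetical `LineageRecord` class and made-up dataset and step names; dedicated lineage tools capture far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    dataset: str
    source: str
    steps: list[str] = field(default_factory=list)

    def add_step(self, description: str) -> None:
        """Append a transformation step in the order it was applied."""
        self.steps.append(description)

    def describe(self) -> str:
        """Render the lineage as a short, readable summary."""
        lines = [f"{self.dataset} (source: {self.source})"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


# Hypothetical dataset used to train a GenAI model.
record = LineageRecord(dataset="support_tickets_clean", source="CRM export")
record.add_step("Removed duplicate tickets")
record.add_step("Masked customer email addresses")
record.add_step("Standardised date formats")

print(record.describe())
```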
Putting governance into practice
Establishing a governance framework is essential for ensuring your GenAI initiatives are built on trustworthy, well-managed data. While strategy and policy are key, having the right tools in place makes governance scalable and sustainable.
Tools like Aperture Data Studio can help operationalise governance by:
- Assigning sensitive datasets to specific user groups to help manage access and support data quality within the platform.
- Enforcing business rules, which promotes consistency and reduces the risk of errors.
- Providing workflow traceability, allowing teams to track data transformations and understand the data’s origin.
By embedding governance into your data workflows, you’ll be better positioned to build a resilient foundation for your GenAI initiatives, helping to enable accuracy, transparency and long-term scalability.
Speak with us to explore how Aperture can support your data governance strategy.
1. Respondents who said that at least 11% of their organisations’ EBIT in 2023 was attributable to their use of generative AI. For respondents at AI high performers, n = 46; for all other respondents, n = 830. Respondents who said “don’t know / not applicable” are not shown.
2. McKinsey & Company. The state of AI in early 2024: Gen AI adoption spikes and starts to generate value. Based on McKinsey Global Survey on AI, 1,363 participants at all levels of the organisation, Feb 22–Mar 5, 2024. Available at: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-2024
