Analyst Spotlight
Bob Parker
SVP, Software and Services Research
IDC
Getting Your Data AI Ready
It has become a common refrain: getting data governance right is key to a successful AI strategy. This conventional wisdom is true, but the problem is not new. For as long as I have been involved in IT, both as an analyst and as a CIO, companies have struggled to wrangle the various data sets spread across the applications they run.
Much of this prior effort focused on structured data sitting in relational databases. From data warehousing to data lakes and now to data lakehouses, companies have incrementally built better cataloging and semantic mapping. This category of data provides the performance context; it is where a company keeps score, whether for financial reporting, operational status, sales pipelines, or the workforce.
While much of the effort historically has been on this structured data, for the average company it represents only about 20% of the information corpus. The rest is unstructured information: documents, video, voice, or structures (e.g., blueprints or chemical models). A central benefit of the transformer models behind generative AI is that they introduce some structure into this mess through vectorization, representing content as numerical vectors that can be compared and searched. This category of data represents the knowledge context of an enterprise. The collective knowledge of the organization is locked in these documents, videos, voice recordings, and structures.
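The idea of turning unstructured content into vectors can be illustrated with a minimal sketch. This is a toy under stated assumptions, not how a transformer model works: the `toy_vector` function is a hypothetical stand-in that simply counts words, and the document names and texts are invented for illustration. What it shows is the payoff, namely that once text is a vector, free-form content becomes comparable and searchable.

```python
import math
from collections import Counter

def toy_vector(text):
    """Hypothetical stand-in for an embedding model: a bag of word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented example documents standing in for the unstructured corpus.
documents = {
    "maintenance_manual": "pump bearing replacement procedure and torque settings",
    "hr_policy": "vacation policy and leave request procedure",
}

def most_similar(query):
    """Return the name of the document whose vector is closest to the query."""
    q = toy_vector(query)
    return max(documents, key=lambda name: cosine(q, toy_vector(documents[name])))
```

Asking `most_similar("how to replace a pump bearing")` selects the maintenance manual because it shares the words "pump" and "bearing" with the query. A real embedding model would also capture meaning beyond exact word overlap.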
There is a third category of information as well: streaming data. This is the telemetry of the organization. It could come from sensors on a factory floor, readings from health monitors, or clickstreams on a website. This type of data is usually delivered in time-series form and needs specific governance, usually tag repositories, before it can be understood and applied. This data provides the situational context, a view of what is happening in real time.
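To make the role of a tag repository concrete, here is a minimal sketch. It assumes a simple in-memory lookup; the `Tag` fields, the tag ID `TT-101`, and the reading format are all hypothetical and chosen only for illustration. The point is that a raw time-series value means little until a governed tag definition supplies its asset, description, and unit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    """Hypothetical tag definition: the governed metadata for one signal."""
    tag_id: str
    description: str
    unit: str
    asset: str

# Invented repository with a single registered tag.
TAG_REPOSITORY = {
    "TT-101": Tag("TT-101", "Outlet temperature", "degC", "Line 1 oven"),
}

def describe(reading):
    """Turn a raw reading into a statement a person or agent can interpret."""
    tag = TAG_REPOSITORY.get(reading["tag_id"])
    if tag is None:
        # Without a registered tag the value cannot be interpreted.
        return f"{reading['tag_id']}: unregistered tag, value {reading['value']}"
    return (
        f"{tag.asset} / {tag.description}: "
        f"{reading['value']} {tag.unit} at {reading['timestamp']}"
    )
```

A reading for a registered tag comes back with its asset, description, and unit attached, while an unregistered tag is flagged rather than silently passed along.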
Efforts to organize, govern, and utilize the data must link all three categories of information. To achieve the tremendous potential of agentic AI, a company must be able to link the knowledge context to the situational and performance contexts. This requires advanced tools for semantic graphing and knowledge mapping, along with a strong commitment from the organization to elevate comprehensive data management to a strategic priority.
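One way to picture this linkage is as a small graph of subject-relation-object triples. The sketch below is an assumption-laden illustration, not a recommended tool or schema: the entity "Pump P-7", the relation names, and the linked items are all hypothetical. It shows a single business entity connected to a document (knowledge), a sensor tag (situational), and a KPI (performance).

```python
# Hypothetical triples linking one asset to all three contexts.
triples = [
    ("Pump P-7", "documented_in", "maintenance_manual"),    # knowledge context
    ("Pump P-7", "measured_by", "VT-220"),                  # situational context
    ("Pump P-7", "contributes_to", "Line 1 downtime KPI"),  # performance context
]

def context_for(entity):
    """Collect everything linked to an entity, grouped by relation."""
    ctx = {}
    for subj, rel, obj in triples:
        if subj == entity:
            ctx.setdefault(rel, []).append(obj)
    return ctx
```

An agent asking about "Pump P-7" would receive the manual, the live sensor tag, and the affected KPI in one lookup, which is the kind of cross-category view described above.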
IDC advises companies that they do not have to finish all of this before they undertake agentic efforts. Rather, it is important to have the tools, organization, and policies in place and then synchronize the data domains with the agentic priorities. For example, if the company wants to focus on marketing, then the information relevant to that function should be prioritized for governance.
It is easy to acknowledge that data is critical to AI success, but realizing that success requires a comprehensive approach to data across all three categories.