Do We Need More Knowledge Engineers for the Generative AI Era?

February 4, 2025

The generative AI era has ushered in a new wave of technological advancements, but with it comes a host of challenges, particularly in the realm of data management and understanding. As AI projects mature from proof of concept to production stages, the need for professionals who can navigate the complexities of data context has become increasingly apparent. These professionals, known as knowledge engineers, are essential for ensuring that AI models are both effective and ethical.

The Rising Importance of Data Context

Understanding Data Context

In the past, the era of Big Data was dominated by the belief that more data equated to better insights. However, this notion has evolved, and the current understanding emphasizes the importance of data context. Knowledge engineers play a crucial role in discerning the context of data, which includes understanding the sources, methods of collection, intended use, and intended users. This level of comprehension is vital for mitigating risks associated with black-box models and unstructured data usage.

The accuracy and applicability of AI models depend heavily on having the right data, and identifying the right data requires this kind of contextual understanding. Without it, faulty queries or misread data can lead to significant consequences, including hallucinations and other errors that are hard to explain.

The Role of Knowledge Engineers

Knowledge engineers are akin to reference librarians from previous eras, tasked with the responsibility of managing and understanding data context. Their expertise ensures that AI applications run on suitable data, reducing the likelihood of unexplainable errors or hallucinations. By answering foundational questions such as Who, What, Where, When, Why, and How, knowledge engineers provide the necessary context for accurate data interpretation.
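As a rough illustration of what answering those six questions could look like in practice, the sketch below records them for a single dataset and reports which ones are still unanswered. The class name `DataContext`, the helper `unanswered_questions`, and the sample values are all hypothetical, invented here for illustration; the speakers cited in this article did not present this code.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataContext:
    """Hypothetical record of the six context questions for one dataset."""
    who: Optional[str] = None    # who collected it, or who it is meant for
    what: Optional[str] = None   # what the data describes
    where: Optional[str] = None  # source system or location
    when: Optional[str] = None   # collection period
    why: Optional[str] = None    # intended use
    how: Optional[str] = None    # method of collection


def unanswered_questions(ctx: DataContext) -> list[str]:
    """Return the names of the context questions that are still blank."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name)]


# Illustrative values only; not taken from any real dataset.
claims = DataContext(
    who="claims operations team",
    what="insurance claim records",
    why="billing reconciliation",
)
print(unanswered_questions(claims))  # ['where', 'when', 'how']
```

A check like this does not supply the context itself. It only makes the gaps visible so that a person can fill them in.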

In his address, Juan Sequeda emphasized the importance of sharing semantics for effective data reuse across organizations. Andrew Nguyen from Best Buy Health further illustrated this with a practical example—the varying interpretations and actions of different medical professionals when documenting a medical condition. This divergence reflects the importance of context in data documentation and interpretation.

The Evolution of Data Models

From Big Data to Rightsized Models

The shift from Big Data to rightsized models marks a significant trend in the AI landscape. Companies like Databricks, IBM, and Snowflake are leading the charge by promoting models optimized for specific needs rather than relying on excessive data. This approach not only enhances efficiency but also underscores the importance of having the right data for AI models.

Historically, the Big Data era was ruled by the notion that more data led to more comprehensive insights. Contemporary thinking has refined this view, acknowledging that “bigger is not always better.” The rise of “rightsized models” reflects that shift: they aim to deliver efficient results from adequate, not excessive, data. DeepSeek’s recent benchmarks reinforce the case for rightsizing and amount to a pointed challenge to very large models such as those from OpenAI.

The Impact of Rightsized Models

Rightsized models, as highlighted by DeepSeek’s recent benchmarks, challenge the notion that bigger is always better. These models focus on delivering efficient results based on adequate data, rather than overwhelming AI systems with unnecessary information. This trend emphasizes the need for knowledge engineers who can identify and curate the right data for AI training and inference.

Understanding data context means answering the foundational journalistic questions: Who, What, Where, When, and Why, supplemented by How. Ole Olesen-Bagneux’s keynote at Data Day Texas emphasized that organizational data is often highly scattered, which makes these contextual insights hard to obtain. Data mesh and federated governance entered the discourse in the past, though the legacy of data mesh lies in data products rather than in effective governance. Olesen-Bagneux proposed the “meta grid” concept: a systematic mapping of systems, the people or line organizations served, and upstream and downstream connections, primarily through metadata.

Addressing Data Governance Challenges

The Meta Grid Concept

Ole Olesen-Bagneux’s keynote at Data Day Texas introduced the concept of the “meta grid,” a systematic mapping of systems, people, or line organizations served, alongside upstream and downstream connections, primarily through metadata. This approach focuses on metadata rather than raw data to yield insights on the Five Ws, providing a comprehensive understanding of data context.
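Since the meta grid is described here in only a sentence, the following is just one possible reading of it, sketched as a small metadata-only structure: each system is listed with an owner, the line organization it serves, and its upstream connections, and downstream connections are derived from those links. All names (`SystemEntry`, `downstream_of`, `all_upstream`) and the sample systems are assumptions made for illustration, not Olesen-Bagneux’s own design.

```python
from dataclasses import dataclass, field


@dataclass
class SystemEntry:
    """Metadata about one system; no raw data is stored here."""
    name: str
    owner: str    # person or team responsible (hypothetical field)
    serves: str   # line organization served (hypothetical field)
    upstream: list[str] = field(default_factory=list)  # systems it reads from


def downstream_of(grid: dict[str, SystemEntry], name: str) -> list[str]:
    """Systems that read directly from `name`, derived from upstream links."""
    return sorted(s.name for s in grid.values() if name in s.upstream)


def all_upstream(grid: dict[str, SystemEntry], name: str, seen=None) -> set[str]:
    """Every system that feeds `name`, directly or indirectly."""
    seen = set() if seen is None else seen
    for parent in grid[name].upstream:
        if parent not in seen:
            seen.add(parent)
            all_upstream(grid, parent, seen)
    return seen


# Illustrative entries only; these systems are invented for the example.
entries = [
    SystemEntry("crm", owner="sales ops", serves="Sales"),
    SystemEntry("billing", owner="finance IT", serves="Finance"),
    SystemEntry("warehouse", owner="data platform", serves="Analytics",
                upstream=["crm", "billing"]),
    SystemEntry("dashboard", owner="BI team", serves="Executives",
                upstream=["warehouse"]),
]
grid = {e.name: e for e in entries}

print(downstream_of(grid, "warehouse"))         # ['dashboard']
print(sorted(all_upstream(grid, "dashboard")))  # ['billing', 'crm', 'warehouse']
```

Even a toy mapping like this can answer “where did this come from?” and “who is responsible?” without touching the underlying data, which is the point of working from metadata.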

Olesen-Bagneux’s approach focuses on metadata rather than raw data to yield insights on the Five Ws. As of this writing, the meta grid concept is so new that it does not yet appear in Google search results. Metadata drawn from databases outlines schema, business applications, logic, or cloud infrastructure, indicating data processing methods. For instance, data pipelines can reveal extensive context about data shared across an organization, including its transformations and sharing pathways, all of which affect how the data should be interpreted.

Metadata and Data Pipelines

Metadata drawn from databases outlines schema, business applications, logic, or cloud infrastructure, indicating data processing methods. Data pipelines, in particular, reveal extensive context regarding data shared across an organization, including transformations and sharing pathways. This level of detail significantly impacts data interpretations and highlights the importance of knowledge engineers in managing metadata.

Synthesizing unstructured data assets through the entity extraction capabilities of language models remains complex and does not necessarily address context directly. Would you entrust a language model to derive interpretations from messages or legal contracts? Context still needs to be manually verified for appropriateness and pertinence.

The Emergence of Context Engineering

The Discipline of Context Engineering

Andrew Nguyen from Best Buy Health introduced the emerging discipline of Context Engineering, which focuses on systematically capturing data context and making it explicit. Though not yet well defined or widely documented, context engineering resembles an enriched knowledge graph, which hints at the growing need for knowledge engineering.

Practical Applications and Challenges

Nguyen’s example of varying interpretations and actions of different medical professionals when documenting a medical condition illustrates the importance of context in data documentation and interpretation. This divergence underscores the need for knowledge engineers to ensure consistent and accurate data usage across different contexts.

The Human Element in AI

The Role of Human Judgment

While AI can aid in extracting metadata and mapping data landscapes, human judgment remains indispensable. Knowledge engineers, reminiscent of library scientists, are essential for understanding both the technical and contextual details, ensuring that AI models produce reliable and relevant outcomes.

The pressing need for knowledge engineers became evident during a town hall session moderated by Joe Reis and Matthew Housley. Knowledge engineering, like library science, requires understanding both the technical and the contextual details of data.

Knowledge Graphs in Enterprise Solutions

The era of generative AI has brought significant technological advancements, but it also introduces a variety of challenges, particularly related to data management and comprehension. As AI projects advance from initial proof of concept stages to full production, the necessity for skilled professionals who can navigate the nuances of data context has become increasingly clear. These specialists, known as knowledge engineers, play a vital role in ensuring that AI models are both effective and ethical.

Knowledge engineers are responsible for understanding the complexities of the data that AI systems use, ensuring the data is accurate, relevant, and applied correctly. They help bridge the gap between raw data and the actionable insights that businesses depend on. Without these experts, AI models might misinterpret information, leading to flawed or biased outcomes.

Moreover, the work of knowledge engineers goes beyond the technical realm. They are also tasked with addressing ethical considerations, ensuring AI systems do not perpetuate biases or cause harm. As AI becomes more integrated into various industries, the role of knowledge engineers is crucial for making sure these systems are fair and trustworthy.

In summary, the rise of generative AI highlights the pivotal role of knowledge engineers. They are indispensable for managing data effectively and ensuring that AI systems operate ethically and correctly. Their expertise helps harness the full potential of AI technology while safeguarding against its possible pitfalls.
