The semantic layer your AI agents reason on.
A multi-campus university system keeps its data in two places: an Amazon Redshift warehouse for student records (enrollments, grades, credits) and an open S3 Tables / Apache Iceberg lakehouse for everything else (facilities, financial aid, research grants, housing, library). The Knowledge Layer puts one AWS Glue Data Catalog business glossary over both — so a term like "Earned Credit Hours" is defined once and means the same thing whether an AI agent asks the warehouse or the lakehouse. Define once, govern everywhere. Built entirely on managed AWS services, ready to ride AWS Context the day it lands.
The problem, and the story this demo tells
A guided walkthrough — follow the tabs in order.
A data lake tells you where the bytes are. It doesn't tell you
what they mean. Ask three people "how many students are active?" and you get
three numbers — because "active" lives in three heads. An AI agent has it worse: it sees only a
column called enrollment_status_cd. This demo shows how one governed glossary
in the AWS Glue Data Catalog makes meaning consistent for every human and every agent — across
two different data engines — and how a single tool answers any question without ever being
rebuilt as the data grows.
- 1. Knowledge Graph: See the whole system at a glance: campuses, both data planes, the glossary that governs them, and the agent tools.
- 2. Ask the Layer: The hero demo. Ask a warehouse question, then a lakehouse one; one tool finds the data, applies the glossary, and routes to the right engine.
- 3. Glossary: The source of truth. Edit a term here and every agent's answer changes live, with no redeploy.
- 4. Onboard: Add a new data source by manifest. Because the tool is data-agnostic, the new source is answerable with zero code change.
- 5. Technology & Roadmap: Every layer is a managed AWS service, designed to ride AWS Context the day it lands.
Executive summary
What we built, in one minute.
The catalog is the semantic layer
Business meaning lives natively in the AWS Glue Data Catalog: a governed glossary whose terms carry the exact metric SQL, plus skill assets that ground agents in trusted definitions instead of guesswork. Define a term once and every consumer picks up the change. No bespoke ontology store to run.
One glossary over two data planes
Student records live in an Amazon Redshift warehouse; the wider estate lives in an S3 Tables / Iceberg lakehouse. One glossary governs both, so "Earned Credit Hours" means the same thing whether the answer comes from Redshift or Athena. Unstructured questions go to a Bedrock Knowledge Base, which answers with citations.
One tool, agnostic of the data
A single MCP tool, kl_ask, answers any question. It discovers the
tables at query time, grounds the SQL on the live glossary, and routes to the right
engine itself — so onboarding a new database or table set needs zero code change. The
agents' tools never change as the data grows.
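The discover-then-route step described above can be sketched in a few lines of Python. This is a minimal illustration, not the kl_ask implementation: the `TableEntry` record, the keyword matching, and the in-memory `CATALOG` list are all hypothetical stand-ins for whatever the tool actually reads from the Glue Data Catalog at query time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    """Hypothetical catalog record: a table, the plane it lives on, and match keywords."""
    name: str
    plane: str  # "warehouse" (Redshift) or "lakehouse" (Iceberg, queried via Athena)
    keywords: frozenset


# Assumed sample entries; a real deployment would read these from the catalog.
CATALOG = [
    TableEntry("enrollments", "warehouse",
               frozenset({"enrolled", "enrollment", "students", "credits"})),
    TableEntry("housing_occupancy", "lakehouse",
               frozenset({"housing", "occupancy"})),
]

ENGINE_FOR_PLANE = {"warehouse": "redshift", "lakehouse": "athena"}


def discover_tables(question, catalog):
    """Return catalog entries whose keywords overlap the words in the question."""
    words = set(question.lower().replace("?", "").split())
    return [t for t in catalog if words & t.keywords]


def route(question, catalog=CATALOG):
    """Pick the engine for a question from the tables discovered at call time."""
    tables = discover_tables(question, catalog)
    if not tables:
        raise LookupError("no catalogued table matches the question")
    planes = {t.plane for t in tables}
    if len(planes) > 1:
        raise ValueError("question spans both planes; this sketch handles one")
    return ENGINE_FOR_PLANE[planes.pop()], [t.name for t in tables]
```

With these sample entries, `route("How many students are enrolled this fall?")` resolves to the Redshift engine and `route("What's our housing occupancy rate?")` to Athena. Appending a new `TableEntry` to the catalog makes it routable without touching `route`, which is the property the section describes.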
Agents as first-class citizens
Every agent in the fleet reaches the layer as MCP tools through the AgentCore Gateway — to consume governed knowledge and to contribute terms back under review. Designed to ride AWS Context the day it lands. One layer, the whole fleet.
One layer over a distributed system
Mapping, not migration — meaning is the product.
Many campuses. Two data planes. One definition.
The whole university system runs on two data planes: an Amazon Redshift warehouse for student records and an Iceberg lakehouse for facilities, financial aid, research grants, housing, and library. A single Glue Data Catalog business glossary governs both, so "how many students are enrolled this fall?" or "what's our housing occupancy rate?" gets the same trustworthy answer on either plane — asked of one campus or the whole system, by a person or an AI agent.
Live knowledge graph
A real-time projection of the layer — campuses, students, courses, enrollments, the governed glossary, agent skill assets, document corpora, and the tools the fleet uses. Drag to explore. Click any node for detail.
Onboard a data source
A new source joins the knowledge layer by declaring a small manifest — no code. Pick a type, describe it, and run a dry run to see exactly what would be enriched, indexed, and how it joins the graph. The dry run is read-only — it never writes.
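As a rough sketch of what a manifest and its dry run might look like, the Python below validates a declared source and returns a plan of what would be indexed and which columns would bind to glossary terms. The field names (`name`, `type`, `description`, `tables`), the supported types, and the column-to-term mapping are assumptions made for illustration; the real manifest schema is not shown in this page.

```python
# Field names and supported types are assumptions for illustration only.
REQUIRED_FIELDS = ("name", "type", "description", "tables")
SUPPORTED_TYPES = {"redshift", "iceberg", "documents"}


def dry_run(manifest, column_terms):
    """Build a read-only plan of what onboarding would do. Writes nothing."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in manifest]
    if "type" in manifest and manifest["type"] not in SUPPORTED_TYPES:
        problems.append(f"unsupported type: {manifest['type']!r}")
    plan = {"valid": not problems, "problems": problems,
            "would_index": [], "would_bind": {}}
    if problems:
        return plan
    for table in manifest["tables"]:
        plan["would_index"].append(f"{manifest['name']}.{table['name']}")
        for column in table.get("columns", []):
            # Bind a column to a governed term only when the mapping knows it.
            if column in column_terms:
                plan["would_bind"][f"{table['name']}.{column}"] = column_terms[column]
    return plan


# Hypothetical source and column-to-term mapping.
manifest = {
    "name": "summer_session",
    "type": "iceberg",
    "description": "Summer-session enrollments",
    "tables": [{"name": "summer_enrollments",
                "columns": ["student_pidm", "credits_earned", "term_code"]}],
}
column_terms = {"student_pidm": "Enrolled Headcount",
                "credits_earned": "Earned Credit Hours"}
plan = dry_run(manifest, column_terms)
```

The function only reads its inputs and returns a dictionary, which mirrors the read-only guarantee of the dry run: the plan can be inspected before anything is enriched or indexed.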
Business glossary
The controlled vocabulary, governed in the AWS Glue Data Catalog — the single source of truth every agent reads. Browse what's defined, propose a new term in plain language, and (as a reviewer) promote it live — where every tool and agent immediately consumes it. This is the steward workflow, running against the real catalog.
Live terms
Pending drafts
Propose a new term
A steward writes plain business language. It's saved as a reviewed draft — nothing goes live until promoted.
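The propose-then-promote workflow can be modelled as two collections and two transitions. The sketch below is an in-memory illustration only: the `Glossary` class, its method names, and the sample term definition are hypothetical, and the real workflow runs against the Glue Data Catalog rather than a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Glossary:
    """Hypothetical in-memory model of live terms and pending drafts."""
    live: dict = field(default_factory=dict)
    drafts: dict = field(default_factory=dict)

    def propose(self, name, definition, author):
        """Save a plain-language definition as a draft; live terms are untouched."""
        if not definition.strip():
            raise ValueError("a proposed term needs a definition")
        self.drafts[name] = {"definition": definition, "author": author}

    def promote(self, name, reviewer):
        """Move a pending draft into the live set, recording who approved it."""
        if name not in self.drafts:
            raise KeyError(f"no pending draft named {name!r}")
        draft = self.drafts.pop(name)
        self.live[name] = {**draft, "approved_by": reviewer}


glossary = Glossary()
# Illustrative definition only, not a governed term from the demo.
glossary.propose("Housing Occupancy Rate",
                 "Occupied beds divided by available beds.",
                 author="steward_a")
```

Until `promote` is called, the term exists only in `drafts`, so anything reading `live` never sees an unreviewed definition.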
Ask a question. One tool finds the data, applies the glossary, and answers.
This is the live kl_ask tool — the single,
data-agnostic interface every fleet agent uses. Ask about student records (enrollments,
credits, GPA, DFW rate — in the Amazon Redshift warehouse) or the wider
estate (housing, financial aid, research grants — in the Iceberg lakehouse). The tool
discovers the right tables at query time, pulls the relevant governed definitions
live from the glossary, writes the SQL the glossary prescribes, routes it to the
right engine, and answers with the SQL and its provenance. Edit a definition once in the
catalog and every answer changes; onboard new data and it's answerable. Neither step
needs a code change.
Ask the layer
kl_ask tool over both planes · read-only · grounded on the Glue Data Catalog glossary
The one business glossary
These governed terms live ONCE in the Glue Data Catalog business glossary. Each term's definition carries the exact metric SQL. The agent reads them live, so this list is what governs every answer — the lakehouse plane binds to the same terms.
Define once. Govern every answer.
The headline proof, verified live: the agent holds zero baked metric definitions. It pulls them from the glossary at query time — so a steward editing a term in the catalog instantly changes how the agent computes, with no redeploy.
| Governed term | What the glossary defines |
|---|---|
| Enrolled Headcount | COUNT(DISTINCT student_pidm) where registration is active |
| Earned Credit Hours | sum of credits_earned (not attempted; W/I/AU excluded) |
| DFW Rate | 1.0 * SUM(CASE WHEN grade IN ('D','F','W') THEN 1 ELSE 0 END) / COUNT(*) (engine-neutral) |
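To make "zero baked metric definitions" concrete, here is a small sketch in which the metric SQL is looked up from a glossary mapping at call time and run against an in-memory SQLite table. SQLite stands in for the real engines purely so the example runs anywhere; the `GLOSSARY` dictionary, the `compute` helper, and the sample rows are assumptions, and the "registration is active" and W/I/AU filters from the table are omitted for brevity.

```python
import sqlite3

# Stand-in for the governed glossary: term name -> metric SQL expression.
# In the demo these live in the Glue Data Catalog, not in code.
GLOSSARY = {
    "Enrolled Headcount": "COUNT(DISTINCT student_pidm)",
    "Earned Credit Hours": "SUM(credits_earned)",
    "DFW Rate": "1.0 * SUM(CASE WHEN grade IN ('D','F','W') THEN 1 ELSE 0 END) / COUNT(*)",
}


def compute(conn, term, table="enrollments"):
    """Look the expression up at call time, so glossary edits take effect immediately."""
    expr = GLOSSARY[term]
    # The expression comes from the governed glossary, never from end-user input.
    return conn.execute(f"SELECT {expr} FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (student_pidm INTEGER, grade TEXT, credits_earned REAL)"
)
# Made-up sample rows for illustration.
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(1, "A", 3.0), (1, "D", 3.0), (2, "W", 0.0), (3, "B", 4.0)],
)

dfw_rate = compute(conn, "DFW Rate")
headcount = compute(conn, "Enrolled Headcount")
credit_hours = compute(conn, "Earned Credit Hours")
```

Because `compute` reads `GLOSSARY` on every call, replacing the "DFW Rate" entry changes the next result without any change to the function, which is the behaviour the section attributes to the agent.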
The technology
Every layer is a managed AWS service. Here's what each does, how we use it, and how it scales — newest capabilities from the AWS Summit New York 2026 included.
How the platform evolves
We adopted the managed pieces as they reached GA, designed the data model to ride preview features, and aligned to the two published contracts of AWS Context — so adopting it later is re-pointing a tool, not a re-platform.
The AWS Context horizon
AWS Context is AWS's own managed knowledge-graph and agentic-search service. It maps relationships across your data, serves governed relationships, business rules, and domain knowledge to agents at runtime, learns from how agents use it, and, crucially, stores its metadata as Iceberg in S3 Tables: exactly the substrate this layer already writes.
Because our data is Iceberg-in-S3-Tables, our semantics live in the Glue Data Catalog, and all agent access is MCP behind the gateway, adopting AWS Context becomes a migration of one tool target — the institution's accumulated meaning carries straight over.