The semantic layer your AI agents reason on.
A multi-campus university system keeps its data in two places: an Amazon Redshift warehouse for student records (enrollments, grades, credits) and an open S3 Tables / Apache Iceberg lakehouse for everything else (facilities, financial aid, research grants, housing, library). The Knowledge Layer puts one AWS Glue Data Catalog business glossary over both, so a term like "Earned Credit Hours" is defined once and means the same thing whether an AI agent queries the warehouse or the lakehouse. Define once, govern everywhere. It is built entirely on managed AWS services and ready to adopt AWS Context the day it becomes available.
Executive summary
What we built, in one minute.
The catalog is the semantic layer
Business meaning lives natively in the AWS Glue Data Catalog — table and column descriptions, a governed glossary, and skill assets that ground agents in trusted definitions instead of guesswork. No bespoke ontology store to run.
Two retrieval spines, one truth
Structured questions become governed SQL over Apache Iceberg via Athena. Unstructured questions hit a Bedrock Knowledge Base on S3 Vectors and return answers with citations. Same catalog, same governed definitions.
Agents as first-class citizens
Every agent in the fleet reaches the layer through MCP tools exposed by the AgentCore Gateway, both to consume knowledge and to contribute it back under review. One layer, the whole fleet.
Onboard anything
A manifest onboards a new source in minutes — structured data, an S3 document store, or an external SaaS like a personal notes graph (federated, never copied). The layer grows with the institution.
One layer over a distributed system
Mapping, not migration — meaning is the product.
Many campuses. Two data planes. One definition.
The whole university system runs on two data planes: an Amazon Redshift warehouse for student records and an Iceberg lakehouse for facilities, financial aid, research grants, housing, and library. A single Glue Data Catalog business glossary governs both, so "how many students are enrolled this fall?" or "what's our housing occupancy rate?" gets the same trustworthy answer on either plane — asked of one campus or the whole system, by a person or an AI agent.
Live knowledge graph
A real-time projection of the layer: campuses, students, courses, enrollments, the governed glossary, agent skill assets, document corpora, and the tools the fleet uses.
Onboard a data source
A new source joins the knowledge layer by declaring a small manifest, with no code. Pick a type, describe the source, and run a dry run to see exactly what would be enriched and indexed, and how the source would join the graph. The dry run is read-only; it never writes.
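A minimal sketch of the manifest-plus-dry-run idea, assuming a manifest that carries a name, a type, a description, a location, and the glossary terms it binds to. The `SourceManifest` fields, the three `source_type` values, and the `dry_run` helper are hypothetical illustrations, not the layer's actual schema or API, and the S3 URI is a placeholder. What it shows is that a dry run can be a pure function: it returns a plan and writes nothing.

```python
from dataclasses import dataclass, field


# Hypothetical manifest schema: the field names are illustrative only.
@dataclass(frozen=True)
class SourceManifest:
    name: str
    source_type: str  # assumed values: "structured", "documents", "external"
    description: str
    location: str  # an S3 URI or an external endpoint (placeholder values below)
    glossary_terms: tuple[str, ...] = ()


@dataclass
class DryRunPlan:
    enrich: list[str] = field(default_factory=list)
    index: list[str] = field(default_factory=list)
    graph_edges: list[tuple[str, str]] = field(default_factory=list)


def dry_run(manifest: SourceManifest) -> DryRunPlan:
    """Describe what onboarding would do. Pure function: it writes nothing."""
    plan = DryRunPlan()
    if manifest.source_type == "structured":
        plan.enrich.append(f"catalog descriptions for {manifest.name}")
    elif manifest.source_type == "documents":
        plan.index.append(f"vector index over {manifest.location}")
    elif manifest.source_type == "external":
        # Federated: the source is referenced in place, never copied.
        plan.graph_edges.append((manifest.name, "federated-source"))
    else:
        raise ValueError(f"unknown source_type: {manifest.source_type!r}")
    # Every declared glossary term becomes a planned edge in the graph.
    for term in manifest.glossary_terms:
        plan.graph_edges.append((manifest.name, term))
    return plan


housing = SourceManifest(
    name="housing",
    source_type="structured",
    description="Residence hall assignments",
    location="s3://example-bucket/housing/",
    glossary_terms=("Housing Occupancy Rate",),
)
print(dry_run(housing))
```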
Business glossary
The controlled vocabulary, governed in the AWS Glue Data Catalog. Browse what's defined, propose a new term in plain language, and (as a reviewer) promote it live, at which point every tool and agent consumes it immediately. This is §5.6 of the guide, working.
Propose a new term
A steward writes the definition in plain business language. It is saved as a draft for review; nothing goes live until a reviewer promotes it.
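The propose-then-promote flow can be sketched as a small state machine. This is a hypothetical in-memory `Glossary` class, not the Glue Data Catalog API; the term "Housing Occupancy Rate", its definition, and the reviewer name are illustrative. Drafts are held separately from live terms, so proposing a change never takes an existing definition offline.

```python
class Glossary:
    """Hypothetical in-memory stand-in for the governed glossary's review flow."""

    def __init__(self) -> None:
        self._live: dict[str, str] = {}
        self._drafts: dict[str, str] = {}
        self.audit: list[tuple[str, str]] = []  # (term, reviewer) per promotion

    def propose(self, name: str, definition: str) -> None:
        # Drafts are kept apart from live terms, so an existing live
        # definition stays in force until a reviewer promotes the change.
        self._drafts[name] = definition

    def promote(self, name: str, reviewer: str) -> None:
        if name not in self._drafts:
            raise KeyError(f"no pending draft for {name!r}")
        self._live[name] = self._drafts.pop(name)
        self.audit.append((name, reviewer))

    def pending(self) -> list[str]:
        return sorted(self._drafts)

    def live_terms(self) -> dict[str, str]:
        # Only promoted terms are visible to tools and agents.
        return dict(self._live)


g = Glossary()
g.propose("Housing Occupancy Rate", "occupied beds / available beds")
print(g.live_terms())  # the draft is not live yet
g.promote("Housing Occupancy Rate", reviewer="data-steward-1")
print(g.live_terms())
```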
Governed answers over Redshift, with definitions pulled live from the glossary.
The multi-campus student-records warehouse lives in Amazon Redshift (one database, one schema, a campus dimension). Its raw columns are deliberately ambiguous: three different "credits" columns, drifting department codes, and mixed status vocabularies. The AWS Glue Data Catalog business glossary is what makes them trustworthy. When you ask a question, the agent pulls the relevant governed definitions live from the glossary, writes the exact SQL the glossary prescribes, runs it on Redshift, and answers with that SQL and its provenance. Edit a definition once in the catalog and every answer changes, with zero code change.
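A minimal sketch of that answer loop, assuming the glossary can be read as a mapping from term name to metric SQL. A plain dict stands in for the Glue Data Catalog lookup and an in-memory SQLite database stands in for Redshift; the `enrollment` table, the `registration_status` column, its `'RE'` status code, and the `answer` helper are all hypothetical (only `student_pidm` comes from this page's glossary). It shows a steward's edit to the definition changing the answer while the agent code stays the same.

```python
import sqlite3

# Hypothetical glossary snapshot (term -> metric SQL). In the layer, these
# definitions live in the Glue Data Catalog; a dict stands in for that lookup.
GLOSSARY = {
    "Enrolled Headcount": (
        "SELECT COUNT(DISTINCT student_pidm) FROM enrollment "
        "WHERE registration_status = 'active'"
    ),
}


def answer(term: str, conn: sqlite3.Connection, glossary: dict[str, str]) -> dict:
    """Fetch the governed SQL at query time, run it, return value plus provenance."""
    sql = glossary[term]  # no metric logic is hard-coded in the agent
    (value,) = conn.execute(sql).fetchone()
    return {"term": term, "value": value, "sql": sql, "source": "glossary"}


# In-memory SQLite stands in for Redshift; table, columns, and codes are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student_pidm INTEGER, registration_status TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(1, "active"), (1, "active"), (2, "RE"), (3, "dropped")],
)

before = answer("Enrolled Headcount", conn, GLOSSARY)
# A steward reconciles the mixed status vocabulary; answer() is untouched.
edited = {
    **GLOSSARY,
    "Enrolled Headcount": (
        "SELECT COUNT(DISTINCT student_pidm) FROM enrollment "
        "WHERE registration_status IN ('active', 'RE')"
    ),
}
after = answer("Enrolled Headcount", conn, edited)
print(before["value"], after["value"])
```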
Ask the governed warehouse
bedrock-university · database university · read-only · governed by the Glue glossary
The one business glossary
These governed terms are defined once, in the Glue Data Catalog business glossary. Each term's definition carries the exact metric SQL. The agent reads the definitions live, so this list governs every answer, and the lakehouse plane binds to the same terms.
Define once. Govern every answer.
The headline proof, verified live: the agent holds no hard-coded metric definitions. It pulls them from the glossary at query time, so when a steward edits a term in the catalog, the way the agent computes that metric changes immediately, with no redeploy.
| Governed term | What the glossary defines |
|---|---|
| Enrolled Headcount | COUNT(DISTINCT student_pidm) where registration is active |
| Earned Credit Hours | sum of credits_earned (not attempted; W/I/AU excluded) |
| DFW Rate | 1.0 * SUM(CASE WHEN grade IN ('D','F','W') THEN 1 ELSE 0 END) / COUNT(*) (engine-neutral) |
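The `1.0 *` in the DFW Rate definition matters because many SQL engines, SQLite among them, truncate when both operands of `/` are integers. A small demonstration on an in-memory SQLite database with a made-up `grades` table: two of the four grades (D and W) count toward DFW, so the intended rate is 0.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student_pidm INTEGER, grade TEXT)")  # illustrative table
conn.executemany(
    "INSERT INTO grades VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "D"), (4, "W")],
)

DFW_COUNT = "SUM(CASE WHEN grade IN ('D', 'F', 'W') THEN 1 ELSE 0 END)"

# Integer / integer: SQLite performs integer division here, so 2 / 4 becomes 0.
naive = conn.execute(f"SELECT {DFW_COUNT} / COUNT(*) FROM grades").fetchone()[0]
# Multiplying by 1.0 first makes the numerator non-integer, forcing decimal division.
governed = conn.execute(f"SELECT 1.0 * {DFW_COUNT} / COUNT(*) FROM grades").fetchone()[0]
print(naive, governed)
```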
The technology
Every layer is a managed AWS service. Here is what each one does, how we use it, and how it scales, including the newest capabilities announced at AWS Summit New York 2026.
How the platform evolves
We adopted the managed pieces as they reached general availability, designed the data model to take advantage of preview features, and aligned to the two published contracts of AWS Context, so adopting it later means re-pointing a tool rather than re-platforming.
The AWS Context horizon
AWS Context is AWS's own managed knowledge-graph + agentic-search service. It maps relationships across your data, serves governed relationships, business rules, and domain knowledge to agents at runtime, learns from how agents use it, and — crucially — stores its metadata as Iceberg in S3 Tables: exactly the substrate this layer already writes.
Because our data is Iceberg-in-S3-Tables, our semantics live in the Glue Data Catalog, and all agent access is MCP behind the gateway, adopting AWS Context becomes a migration of one tool target — the institution's accumulated meaning carries straight over.
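The "re-point one tool target" idea can be illustrated with a hypothetical gateway that maps stable, agent-facing tool names to swappable backends. Nothing here is the AgentCore Gateway or AWS Context API: `Gateway`, the `search_knowledge` tool name, and both backend functions are illustrative placeholders. Agents keep calling the same tool name; only the target behind it changes.

```python
from typing import Callable

Backend = Callable[[str], str]


class Gateway:
    """Hypothetical stand-in for an MCP gateway: stable tool names -> swappable backends."""

    def __init__(self) -> None:
        self._targets: dict[str, Backend] = {}

    def register(self, tool: str, target: Backend) -> None:
        self._targets[tool] = target

    def repoint(self, tool: str, target: Backend) -> None:
        # Only the backend changes; the agent-facing tool name stays the same.
        if tool not in self._targets:
            raise KeyError(tool)
        self._targets[tool] = target

    def call(self, tool: str, query: str) -> str:
        return self._targets[tool](query)


def knowledge_layer_search(query: str) -> str:
    return f"knowledge-layer answer for {query!r}"


def context_service_search(query: str) -> str:
    # Placeholder only: not a real AWS Context API.
    return f"context-service answer for {query!r}"


gw = Gateway()
gw.register("search_knowledge", knowledge_layer_search)
before = gw.call("search_knowledge", "Earned Credit Hours")
gw.repoint("search_knowledge", context_service_search)
after = gw.call("search_knowledge", "Earned Credit Hours")
print(before)
print(after)
```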