What everyone is getting wrong about context graphs
A lot of the recent “context graph” articles talk in circles about proper structure and required features. Useful, but they miss the thing that makes context hard in real products.
Context does not live in a neat pile of documents. It lives in other systems, usually someone else’s system. Your CRM. Your ticketing tool. Your data warehouse. Your billing provider. A calendar. A Slack workspace. Ten different SaaS tools, each with its own auth model, rate limits, and weird data quirks.
So the context graph that actually matters in production isn't a diagram you draw once. It is a runtime system that decides, request by request:
- Which systems to consult
- Whether to call an API or use cached data
- Whether the caller is allowed to see the result
- How to stitch the results together into something the model can use
If you want a practical definition: a context graph is a coordination layer that turns “I have access to a bunch of tools” into “I can assemble the right slice of reality for this user, right now.”
That coordination layer is where the hard engineering lives. It's also where most “context graph” articles get hand-wavy, because integrations force you to make tradeoffs in public.
Why this isn't a knowledge graph story
Some context graph writing reads like knowledge graph 2.0, with extra metadata sprinkled on top. There is overlap, but the intent is different.
Classic enterprise knowledge graphs tend to optimize for correctness and durability. They model entities and relationships in a stable way, often built on a triplet-style representation (subject, predicate, object).
Context graphs, at least the versions that matter for LLM products, optimize for usefulness under constraints:
- Timeliness: what is true this week, not in general
- Audience: what this user can see, not what exists
- Cost: what is worth fetching, not what is possible to fetch
- Traceability: what can be defended later, not what “sounds right”
There is a research line that frames “context graphs” as knowledge graphs plus additional context like time validity and provenance. That is directionally right, but the bigger shift is operational: once you admit time and source and permission are first-class, you are no longer building a static graph. You are running a context assembly system.
Integrations are what make that shift unavoidable. An internal doc index can pretend everything is accessible and cheap. A Salesforce API cannot. A ticketing system will throttle you. A user’s OAuth token will expire. A workspace admin will revoke scopes. The “graph” is the easy part. The selection and enforcement logic is the product.
The real core: smart data selection
Here's the uncomfortable truth: the most important job of a context graph is choosing what not to pull.
Every user request comes with budgets, even if you do not write them down:
- Latency budgets (you have 300 ms, or you have 3 seconds, pick one)
- Cost budgets (API calls are not free, neither is token usage)
- Rate-limit budgets (you get 100 calls per minute, shared across users)
- Trust budgets (some sources are noisy, some are authoritative)
- Attention budgets (the model can only “hold” so much before it blurs)
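One way to keep these budgets honest is to write them down as a data structure that every fetch has to check against. A minimal sketch, with hypothetical names and default values chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Explicit per-request budgets; every fetch must fit in what remains."""
    latency_ms: int = 3000       # wall-clock time we may spend fetching
    max_api_calls: int = 5       # live integration calls for this request
    max_tokens: int = 4000       # how much context the model will be shown
    min_source_trust: float = 0.5  # ignore sources scored below this

    def can_afford(self, est_latency_ms: int, est_calls: int, est_tokens: int) -> bool:
        """True if a candidate fetch fits in the remaining budget."""
        return (est_latency_ms <= self.latency_ms
                and est_calls <= self.max_api_calls
                and est_tokens <= self.max_tokens)

    def charge(self, latency_ms: int, calls: int, tokens: int) -> None:
        """Deduct the actual cost of a completed fetch."""
        self.latency_ms -= latency_ms
        self.max_api_calls -= calls
        self.max_tokens -= tokens
```

Once budgets are reified like this, “should I hit the CRM?” becomes a question the code can answer instead of a vibe.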
A context graph that always fetches “everything relevant” isn't sophisticated. It's expensive and slow, and it usually makes answers worse by stuffing in mediocre context.
Selection is a ranking problem, but it's also an orchestration problem. A serious system asks questions like:
- Should I even hit the CRM, or can I answer from the last known snapshot?
- Is this question asking for a point-in-time state (needs live) or policy (cache is fine)?
- Do I fetch the top one source first, then decide if I need more?
- If an integration is down, what is the graceful fallback?
This is where “graph” becomes “control plane.” The graph is your internal representation of what you could know. The selection engine is what you choose to know right now.
If you want a concrete mental model, treat context like a shopping cart with a strict spend limit. Every item you add has a price: milliseconds, tokens, quota, and risk. A good context graph is picky.
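The shopping-cart model can be sketched as a greedy selection under a spend limit. This is not a prescribed algorithm, just one simple policy: rank candidates by relevance per token and add items until a budget runs out.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    relevance: float   # estimated usefulness for this request
    tokens: int        # price in context-window tokens
    latency_ms: int    # price in fetch time


def fill_cart(candidates, token_limit, latency_limit):
    """Greedy 'shopping cart': take the best relevance-per-token items
    until either the token or latency budget is exhausted."""
    cart, tokens, latency = [], 0, 0
    ranked = sorted(candidates, key=lambda c: c.relevance / c.tokens, reverse=True)
    for item in ranked:
        if tokens + item.tokens <= token_limit and latency + item.latency_ms <= latency_limit:
            cart.append(item)
            tokens += item.tokens
            latency += item.latency_ms
    return cart
```

The point of the sketch is the shape: selection is an explicit loop with a stopping condition, not an unbounded fan-out.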
Three tiers of context: live, cached, derived
Once you start integrating third-party systems, you end up with three practical tiers of context. You can pretend you do not. You will end up here anyway.
1. Live API calls
- Best for: current state, rapidly changing data, anything high-stakes
- Examples: “What is the status of the Acme renewal?” “Did that invoice get paid?” “Is the incident still open?”
- Pros: freshest, most defensible
- Cons: slow, expensive, auth-heavy, fragile
Live calls are where permission checks bite, because access often depends on the user’s token, not your app’s service account. Live calls are also where you pay the coordination tax: retries, timeouts, backoff, partial responses, schema drift.
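The coordination tax is mostly boilerplate you have to get right once. A minimal sketch of retry with exponential backoff and jitter, where `fetch` stands in for any flaky live integration call:

```python
import random
import time


def call_with_backoff(fetch, max_attempts=3, base_delay=0.2):
    """Retry a transiently failing live call with exponential backoff + jitter.
    `fetch` is any zero-argument callable that raises on transient failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the failure to the selector
            # Backoff doubles per attempt (0.2s, 0.4s, ...) plus jitter
            # so concurrent retries do not stampede the integration.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.05))
```

Real systems layer timeouts, circuit breakers, and rate-limit awareness on top, but even this much keeps one slow integration from silently eating the whole latency budget.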
2. Local cache or snapshots
- Best for: data that changes, but not minute-to-minute, and where fast answers matter
- Examples: account summaries, open opportunities list, “most recent 20 tickets,” org charts
- Pros: fast, cheap, predictable
- Cons: can be stale, requires invalidation strategy
Caching isn't just an optimization. It's a design choice that forces you to define freshness. “Fresh enough” needs to be encoded, not implied. That usually means time-based TTL plus event-based invalidation when you can get it (webhooks, change streams).
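“Fresh enough, encoded” can be as small as a cache that carries its TTL and exposes an eviction hook for webhooks. A minimal sketch (the class and method names are illustrative, not a real library):

```python
import time


class SnapshotCache:
    """TTL cache with event-based invalidation: entries expire after
    `ttl_s` seconds, and a webhook handler can evict them early."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (value, fetched_at)

    def put(self, key, value, now=None):
        self._store[key] = (value, now if now is not None else time.time())

    def get(self, key, now=None):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, fetched_at = entry
        age = (now if now is not None else time.time()) - fetched_at
        return value if age <= self.ttl_s else None  # stale reads as a miss

    def invalidate(self, key):
        """Call this from a webhook or change-stream handler."""
        self._store.pop(key, None)
```

Passing `now` explicitly makes freshness testable, which matters once “fresh enough” is a product decision rather than an accident.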
3. Derived context (summaries, embeddings, prior conclusions)
- Best for: condensation and retrieval, especially for large bodies of text
- Examples: ticket thread summaries, “what changed since last week,” extracted entities and relationships, semantic indexes over docs
- Pros: cheap at query time, scales well
- Cons: lossy, can age poorly, easy to misapply
Derived context is where a lot of teams get burned. They store a summary and treat it like truth. Then the underlying object changes, permissions change, or the user asks a question that requires nuance the summary threw away.
A mature context graph treats these tiers like a ladder. It starts with cheap signals, then climbs to more expensive sources if needed. It also knows when to bail out, because “more” isn't always better.
A simple selection pattern that works
For many product flows, a straightforward strategy beats fancy heuristics:
1. Start with cached, scoped context (fast, permission-safe).
2. If confidence is low or freshness matters, do one live call to the highest-yield integration.
3. Stop early when you have enough to answer.
4. Only then pull heavier context (full threads, large docs), and only if the request demands it.
The important part isn't the order. It's that instead of letting every query fan out into a dozen integrations, the context graph has an explicit policy for cost and freshness.
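The four steps above can be sketched as one function. Everything here is hypothetical plumbing: `cache` maps queries to snapshots, `live_sources` is an ordered list of `(name, fetch_fn)` pairs with the highest-yield integration first, and the two predicates encode product policy.

```python
def assemble_context(query, cache, live_sources, confident_enough, needs_freshness):
    """Sketch of the four-step selection policy: cached first, one live
    call if needed, stop early, heavier pulls only as a last resort."""
    context = []

    # 1. Start with cached, scoped context (fast, permission-safe).
    snapshot = cache.get(query)
    if snapshot is not None:
        context.append(snapshot)

    # 2. If confidence is low or freshness matters, one live call
    #    to the highest-yield integration.
    if not confident_enough(context) or needs_freshness(query):
        _name, fetch = live_sources[0]
        context.append(fetch(query))

    # 3. Stop early when we have enough to answer.
    if confident_enough(context):
        return context

    # 4. Only then pull heavier context from the remaining sources.
    for _name, fetch in live_sources[1:]:
        context.append(fetch(query))
        if confident_enough(context):
            break
    return context
```

The predicates are where the real product thinking lives; the function just makes the ordering and the early exits explicit.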
Provenance and traceability: the part everyone skips
Once third-party data is in the loop, provenance stops being a “nice to have.” It becomes the difference between a system you can run and a system you will be afraid to ship.
At minimum, every context node you pass to a model should be able to answer:
- Where did this come from? (system, object id, endpoint)
- When did we fetch it? (timestamp)
- Under what identity and scope? (user token vs service token, scopes)
- Was it transformed? (summarized, extracted, merged)
- What should invalidate it? (TTL, webhook event, permission change)
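Those five questions map directly onto a small record attached to every context node. A sketch with illustrative field names (this is not a standard schema):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provenance:
    """Minimal provenance record for one context node."""
    system: str                    # e.g. "crm", "ticketing"
    object_id: str                 # the upstream record this came from
    endpoint: str                  # which API surface produced it
    fetched_at: float              # unix timestamp of the fetch
    identity: str                  # "user:alice" vs "service:app"
    scopes: tuple = ()             # OAuth scopes the fetch ran under
    transforms: tuple = ()         # ("summarized", "merged", ...)
    ttl_s: Optional[float] = None  # None = invalidate on events only
    invalidate_on: tuple = ()      # webhook / event names that evict this node

    def is_fresh(self, now: float) -> bool:
        return self.ttl_s is None or (now - self.fetched_at) <= self.ttl_s
```

Once every node carries this, “what did we show the model?” becomes a query over stored records instead of archaeology.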
This isn't just for audits, though audits will come. It's for debugging.
When a model gives a wrong answer, the first question is almost never “why did the model do that?” The first question is “what did we show it?”
Without traceability, you can't answer that. You can't reproduce the run, because you don't know which sources were used. You can't fix the bug, because you don't know which node was stale. You can't explain the behavior, because you can't point back to what the system believed at the time.
There is also a trust problem. People are getting better at spotting generic, overly polished text and treating it with suspicion. Researchers have even tracked specific “AI-favored” wording creeping into human language over time. In that environment, the way you build trust is not with confident prose. It's with grounded answers, and grounded answers require traceable sources.
One more reason: permission drift.
Integrations are permission surfaces. A user can lose access to a folder, leave a Slack channel, have their CRM role changed, or revoke OAuth consent. If you don't attach permission context to what you stored or derived, you can easily leak data by accident through cached or summarized nodes.
Provenance is how you prevent “I used to be allowed to see this” from turning into “the assistant still remembers it.”
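The defense is to re-check permissions at read time, not just at fetch time. A sketch, where `node` is a cached entry carrying the identity it was fetched under and `can_access` stands in for whatever live permission check your integrations expose:

```python
def read_cached_node(node, user, can_access):
    """Permission-aware cache read: serve a cached node only if the
    *current* user still has access, regardless of who fetched it."""
    if node is None:
        return None
    # Never serve a node fetched under a different user's token.
    if node["identity"].startswith("user:") and node["identity"] != f"user:{user}":
        return None
    # Re-check live permission so revocation propagates to cached data.
    if not can_access(user, node):
        return None
    return node["value"]
```

The cache stays fast for the common case, but a revoked scope or a changed role turns into a cache miss instead of a leak.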
What this looks like in practice
Say a user asks: “Give me a one-paragraph status update on Acme. Include renewal risk and the last support interaction.”
A context graph built around integrations might do something like:
- Check a cached “Account Summary” node for Acme (freshness window: 15 minutes)
- If missing or stale, call CRM for opportunity stage and renewal date (live)
- Query ticketing system for last interaction (live or cached, depending on SLA)
- Pull only the last 5 tickets, not the full history
- Generate a derived summary for the ticket excerpt, tagged with: source ticket IDs, fetch timestamps, user identity used for access
- Assemble final context bundle with provenance metadata
- Generate the update, and keep the linkable trace internally so you can answer “why did you say that?”
The value isn't that you have a graph. The value is that you made disciplined choices: small pulls, permission-aware fetches, provenance captured at every step.
That's the difference between a demo and a system you can operate.
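The Acme flow above can be condensed into a short sketch. The fetchers and field names are hypothetical stand-ins for real integrations; the point is the shape: cached summary first, small live pulls, provenance attached to every node.

```python
import time


def acme_status_bundle(cache, crm_fetch, ticket_fetch, now=None):
    """Sketch of the Acme flow: cached summary first, bounded live pulls,
    source and timestamp recorded on every node."""
    now = now if now is not None else time.time()
    bundle = []

    # Cached "Account Summary" node; the cache enforces the freshness window.
    summary = cache.get("acme:summary")
    if summary is None:
        summary = crm_fetch("acme")  # live: opportunity stage, renewal date
    bundle.append({"value": summary, "source": "crm", "fetched_at": now})

    # Only the last 5 tickets, never the full history.
    tickets = ticket_fetch("acme", limit=5)
    bundle.append({"value": tickets, "source": "ticketing",
                   "ticket_ids": [t["id"] for t in tickets],
                   "fetched_at": now})
    return bundle
```

Everything the final answer says can then be traced back through `source`, `ticket_ids`, and `fetched_at`, which is exactly what “why did you say that?” needs.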
