The Agent That Forgot What You Built Yesterday

Coding agents can fix a bug and open a PR while you sleep. But unless memory is wired in properly, the next session starts from zero. Here is what that costs, and what it looks like when the problem is solved properly.

You ask your coding agent to add rate limiting to the API. It does the work. In the morning there is a PR, tests green, ready to merge. Good session.

Two days later you open a new session and ask it to harden the auth middleware. The agent reads the repo, thinks for a moment, and asks: "What JWT library are you using?" You already answered that. The agent built half the system and still does not know.

This is the current state of most agentic coding setups. The tool is useful. The memory is broken.


The gap matters more now because the tools themselves have gotten genuinely capable. Agents can write changes, run tests, and open a GitHub PR - while you are asleep.

Perhaps the most important shift this year is the emergence of long-running autonomous workflows: instead of responding to a single prompt, agents now operate through execution loops. That is a real shift in what delegation means for an engineering team. But the value of that shift drops steeply if every session starts cold.

The default fix - stuff more context into the prompt - is not viable for long. Brute-force token expansion is no longer a financially viable strategy: the cost of processing massive context windows leads to geometric escalation in inference spend as token counts rise into the hundreds of thousands.

So the interesting engineering question has become: what should an agent actually remember, and in what form?


Last week Memori Labs announced it has been integrated as a core memory provider for Nous Research's Hermes Agent. The detail worth paying attention to is not the partnership itself - it is the specific thing Memori captures. Memori gives Hermes agents long-term persistent memory that captures not only conversation but also agent trace and execution. As Hermes completes tasks, Memori structures memory from the agent's actions: tool calls, workflow steps, assistant decisions, and outcomes. This means Hermes can remember what happened during prior executions, not just what was said in the transcript.

That distinction - what the agent did versus what was said - is the thing most memory implementations miss. Instead of stuffing old conversation history into every prompt, Hermes can retrieve the structured context it needs to continue work, avoid repeated mistakes, preserve project knowledge, and improve across sessions.

The benchmark numbers are striking: 81.95% accuracy on LoCoMo using only 1,294 tokens per query - roughly 5% of full-context cost, saving more than 95% on inference costs. Whether those numbers hold across real production workloads is always the open question, but the direction is right. Efficient retrieval of structured execution history is a better deal than raw context expansion.

A memory system that stores what the agent decided is more useful than one that stores what you asked.


The broader infrastructure landscape is catching up to the same idea. In 2026, memory is increasingly treated as a dedicated architectural component separate from the model's context window, not just a longer prompt.

During conversations, the memory layer extracts facts and stores them in a vector database indexed by user, session, and agent identifiers. At the start of a new session, relevant memories are retrieved using semantic similarity, keyword matching, and entity matching, then injected into the context window before the model responds - keeping token usage low and retrieval precise.

The practical challenge is scope control. Multi-scope memory tags each write with identity scopes: user-level facts that persist across sessions, agent-level facts tied to a specific instance, session-scoped facts for a single run, and org-level shared context. Get the scoping wrong and you end up with either data bleed across projects or no useful recall at all.

For teams building on top of LangGraph, the surface area is familiar: the MongoDB Store for LangGraph brings flexible long-term memory to AI agents, allowing agents to remember and build on previous interactions across multiple sessions instead of only retaining memory for the current session. It complements short-term checkpointing rather than replacing it.


The security caveat is real and should be named. Because agents can access email accounts, calendars, messaging platforms, and other sensitive services, misconfigured or exposed instances present security and privacy risks. The agent is also susceptible to prompt injection attacks, in which harmful instructions are embedded in data with the intent of getting the LLM to interpret them as legitimate user instructions. When memory is persistent across sessions, a bad injection from session one could surface in session three. The memory layer is not just a performance concern - it is a security surface.

Memory should be fully scoped and isolated by project, process, session, and entity, preventing data bleed across users and contexts. That isolation is not optional overhead. It is what makes it safe to give the agent write access to anything important.


For teams that want to move from "agent as one-shot tool" to "agent that builds knowledge on a project over time," the practical steps are concrete:

  • Wire in an execution trace store alongside conversation memory - the agent's actions are more recoverable than its chat history.
  • Scope memory writes tightly. Project-level context should not bleed into org-level retrieval.
  • Audit what is being persisted. A memory file that is never inspected is a prompt injection waiting to happen.
  • Test recall explicitly. Ask the agent a question whose answer only exists in a prior session. If it gets it wrong, your retrieval is broken.

A teammate like Beagle is positioned to do some of this naturally - surfacing what was decided in a thread from last Tuesday before an agent picks up the same task today - but the deeper execution trace layer still has to be built intentionally. The tools are there. Most teams just have not wired them together yet.

The agent that forgot what you built yesterday is not a model problem. It is an infrastructure problem, and the infrastructure is finally good enough to solve it.