A Careful Control Loop
Read the figure as a memory-validity audit. A planner may remember prior observations, but physical tasks require timestamps, scope, invalidation rules, and a check that retrieved state still matches the current scene.
Build And Evaluation Checklist
Depth and self-containment. This section must explain why memory in embodied systems is a state-estimation problem, not only a long-context problem. Readers should leave knowing which facts must be grounded and refreshed from sensors.
Production and evaluation contract. The artifact should record remembered facts, their source, freshness, and whether they were later verified or contradicted by perception. Otherwise hallucination remains a vague label.
For memory, state tracking, and hallucination in physical tasks, name the language interface, grounded world state, executable action contract, and evidence artifact before trusting any claimed improvement.
Then write one evidence row recording instruction, world-state estimate, chosen action, verifier result, and failure label, and identify which field would change first under command misunderstanding.
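One way to make the evidence row concrete is a small typed record. The sketch below is illustrative only: the class name `EvidenceRow`, the field names, and the example values are assumptions chosen to mirror the five items listed above, not a prescribed schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EvidenceRow:
    """One audit row per planning step (hypothetical field names)."""
    instruction: str             # what the user asked for, verbatim
    world_state_estimate: dict   # what the agent believed when it planned
    chosen_action: str           # the action the planner emitted
    verifier_result: str         # "pass" or "fail"
    failure_label: Optional[str] # None when the verifier passed


# Illustrative values only; the labels are assumptions, not a fixed taxonomy.
row = EvidenceRow(
    instruction="bring me the blue mug",
    world_state_estimate={"blue_mug.location": "left_shelf", "age_s": 120.0},
    chosen_action="pick(blue_mug, left_shelf)",
    verifier_result="fail",
    failure_label="stale_world_state",
)
print(asdict(row))
```

Keeping the row flat and serializable makes it easy to diff two runs and see which field moved first.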
Memory and hallucination in embodied agents both come down to keeping world state synchronized with words. The agent must remember object identities, task progress, and user preferences without turning stale guesses into confident plans.
This section shows how LLM memory should be paired with explicit state tracking so that past context helps planning without silently overriding new sensor evidence.
The practical question is which memories should live as symbolic facts, which should live as scene state, and how hallucinated memories should be caught before action.
Embodied memory is only useful if it carries provenance and freshness. A remembered object location with no timestamp is not memory; it is a latent bug.
Theory
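Let memory items be facts $m_i = (f_i, c_i, t_i)$ with content $f_i$, confidence $c_i$, and timestamp $t_i$. A planner should reason over a belief state $$b_t = p(s_t \mid o_{1:t}, a_{1:t-1}, m_{1:t}),$$ where $s_t$ is the world state, $o_{1:t}$ are the observations so far, $a_{1:t-1}$ are the actions already taken, and $m_{1:t}$ are the memory items written up to time $t$, not over free-floating text summaries alone. New observations should update or erase memory items whose confidence is no longer justified.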
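The update-or-erase rule can be sketched in a few lines. This is a minimal illustration, not a full Bayesian filter: the `key` field, the constants `CONTRADICTION_FACTOR`, `ERASE_BELOW`, and `OBSERVATION_CONFIDENCE`, and the function `apply_observation` are all assumptions introduced here. The tuple $(f_i, c_i, t_i)$ maps to `content`, `confidence`, and `timestamp`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemoryItem:
    key: str           # what the fact is about (assumed field, not in the tuple)
    content: str       # f_i
    confidence: float  # c_i
    timestamp: float   # t_i, in seconds

# Assumed constants for illustration only.
CONTRADICTION_FACTOR = 0.3    # confidence multiplier when perception disagrees
ERASE_BELOW = 0.2             # items below this confidence are dropped
OBSERVATION_CONFIDENCE = 0.9  # confidence assigned to a fresh observation


def apply_observation(items, key, observed, now):
    """Refresh confirmed items, down-weight or erase contradicted ones."""
    kept, confirmed = [], False
    for m in items:
        if m.key != key:
            kept.append(m)  # unrelated facts are left untouched
        elif m.content == observed:
            kept.append(replace(m, confidence=OBSERVATION_CONFIDENCE, timestamp=now))
            confirmed = True
        else:
            lowered = m.confidence * CONTRADICTION_FACTOR
            if lowered >= ERASE_BELOW:
                kept.append(replace(m, confidence=lowered))
    if not confirmed:
        kept.append(MemoryItem(key, observed, OBSERVATION_CONFIDENCE, now))
    return kept


memory = [
    MemoryItem("user.mug_preference", "blue", 0.95, 0.0),
    MemoryItem("red_mug.location", "counter", 0.5, 0.0),
]
memory = apply_observation(memory, "red_mug.location", "sink", now=60.0)
for m in memory:
    print(m)
```

In this run the remembered counter location is contradicted, falls below the erase threshold, and is replaced by the observed location, while the preference item is untouched.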
Hallucination in embodied tasks often means one of three things: inventing an object or tool, asserting a stale state as current, or carrying a wrong relational fact across scene changes. The fix is rarely 'better prompting' alone. It is usually a better contract between memory, observation, and verification.
A good memory system separates semantic memory, such as user preference, from dynamic world state, such as object location. The first may persist across episodes; the second should expire quickly or be refreshed from sensors before use.
Worked Example
Code Fragment 1 stores two memories with different freshness and shows how the planner should gate them before use. The example demonstrates why timestamps belong in the memory schema.
```python
# Reject stale world-state memory while keeping durable preference memory.
# Embodied memory should store freshness and source, not just text.
# This keeps old observations from masquerading as current state.
memory = [
    {"fact": "user_prefers_blue_mug", "age_s": 600, "durable": True},
    {"fact": "red_mug_is_on_counter", "age_s": 45, "durable": False},
]
# Durable facts always pass; dynamic facts must be younger than 10 seconds.
usable = [m["fact"] for m in memory if m["durable"] or m["age_s"] < 10]
print(usable)  # ['user_prefers_blue_mug']
```
The expected output is a memory subset where durable user preferences survive but stale scene claims do not. The point is that embodied memory should grant planning authority only to facts whose lifespan matches the kind of fact they are, not to every retrieved sentence equally.
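Matching lifespan to the kind of fact can be taken one step further than a single `durable` flag. The sketch below assumes a per-type maximum age; the type names follow the tags used later in this section, while the specific limits in `MAX_AGE_S` and the helper `is_usable` are hypothetical values chosen for illustration.

```python
import math

# Assumed lifetimes per memory type, in seconds (illustrative values only).
MAX_AGE_S = {
    "preference": math.inf,   # durable across episodes
    "task_progress": 300.0,   # valid for the current task
    "world_state": 10.0,      # must be refreshed from sensors
}


def is_usable(item):
    """Grant planning authority only if the item is within its type's lifetime."""
    limit = MAX_AGE_S.get(item["type"])
    # Unknown types have no limit entry and are rejected outright.
    return limit is not None and item["age_s"] <= limit


memory = [
    {"fact": "user_prefers_blue_mug", "type": "preference", "age_s": 600},
    {"fact": "tea_bag_already_in_mug", "type": "task_progress", "age_s": 90},
    {"fact": "red_mug_is_on_counter", "type": "world_state", "age_s": 45},
]
usable = [m["fact"] for m in memory if is_usable(m)]
print(usable)  # ['user_prefers_blue_mug', 'tea_bag_already_in_mug']
```

Rejecting unknown types by default is a deliberate choice here: an untyped memory item should not gain authority by accident.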
State stores, graph memories, and vector memories can all hold the facts, but they are only safe in robotics when coupled to freshness metadata and sensor-side verification hooks. The library can manage retrieval; it cannot decide which physical facts are still true.
Practical Recipe
- Store memory items with source, timestamp, confidence, and type.
- Separate durable preferences from dynamic world-state facts.
- Refresh or invalidate dynamic facts before high-consequence actions.
- Never let retrieved text bypass a verifier when the action depends on current geometry.
- Log contradictions between memory and observation as first-class events.
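The last three steps of the recipe can be sketched as a single gate. Everything named here is an assumption for illustration: `ContradictionLog`, `verify_before_action`, and the `observe` callable stand in for whatever perception hook and logging the real system provides, and a plain dictionary plays the role of the current scene.

```python
from dataclasses import dataclass, field


@dataclass
class ContradictionLog:
    """Collects memory-versus-perception disagreements as explicit events."""
    events: list = field(default_factory=list)

    def record(self, key, remembered, observed):
        self.events.append({"key": key, "remembered": remembered, "observed": observed})


def verify_before_action(key, remembered, observe, log):
    """Return the value the planner may act on, or None if it cannot be verified."""
    observed = observe(key)  # hypothetical perception hook
    if observed is None:
        return None          # nothing seen: block the action rather than trust memory
    if observed != remembered:
        log.record(key, remembered, observed)
    return observed          # fresh perception outranks the remembered value


# Stand-in scene: another agent has moved the mug since it was last seen.
scene = {"red_mug.location": "sink"}
log = ContradictionLog()
location = verify_before_action("red_mug.location", "counter", scene.get, log)
print(location, log.events)
```

Returning `None` when nothing is observed keeps the planner from silently falling back to the remembered value.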
The easiest hallucination to miss is not a novel object. It is a plausible but stale memory, such as believing the mug is still on the counter after another agent already moved it.
A household robot may remember that the user prefers tea in the blue mug across many days, but it should not remember that the blue mug is on the left shelf unless that fact was refreshed by recent perception. One memory is durable preference; the other is dynamic scene state.
Embodied hallucination is often just nostalgia with a manipulator attached.
Current research explores memory graphs, learned world models, and verifier-guided long-horizon planning for embodied agents. The open challenge is keeping memories useful across long tasks without allowing stale facts to outrank fresh sensor evidence.
Can you list one fact in your system that should persist across sessions and one that should expire within seconds unless perception reconfirms it?
This section connects directly to classical filtering and SLAM. The novelty is that language memories and symbolic task facts must join the same belief-management discipline as geometric state. Otherwise the planner treats a ten-minute-old caption and a ten-millisecond-old sensor reading as equally authoritative.
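A rough way to stop the two sources from being treated as equals is to discount each claim by its age. The sketch below assumes an exponential half-life discount, which is one simple choice among many and is not taken from a specific filtering library; the function `authority`, the 30-second half-life, and the confidence values are hypothetical.

```python
def authority(confidence, age_s, half_life_s):
    """Age-discounted weight: confidence halves every `half_life_s` seconds."""
    return confidence * 0.5 ** (age_s / half_life_s)


HALF_LIFE_S = 30.0  # assumed half-life for dynamic scene facts

caption = authority(confidence=0.9, age_s=600.0, half_life_s=HALF_LIFE_S)  # ten minutes old
sensor = authority(confidence=0.7, age_s=0.010, half_life_s=HALF_LIFE_S)   # ten milliseconds old

# The fresher reading wins even though its raw confidence is lower.
winner = "sensor" if sensor > caption else "caption"
print(winner)
```

A proper filter would also model motion and sensor noise; the discount only captures the ordering that the paragraph above asks for.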
That is also why hallucination should be decomposed. A model may hallucinate semantically, but many embodied 'hallucinations' are actually stale-state propagation errors. Better memory schemas, not bigger models, are often the right fix.
| Tool or Library | Role in the Topic | Builder Advice |
|---|---|---|
| LangGraph or explicit state graph | Planner-visible memory state. | Use it when memory items should change planner behavior in transparent ways. |
| Semantic map or object tracker | Grounded dynamic world state. | Use it when remembered object locations must be refreshed from sensors. |
| Vector store with metadata | Retrieval of durable semantic context. | Use it for user preferences or long-range task summaries, not raw geometry. |
| Pydantic schemas | Typed memory records with freshness fields. | Use them to prevent planner logic from consuming untyped memory blobs. |
| Verifier layer | Checks remembered facts against observation. | Use it whenever an action depends on the present physical world. |
Code Fragment 2 stores a memory record with provenance and freshness. This is the minimum structure needed to talk coherently about embodied hallucination instead of merely complaining that the agent 'made something up.'
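A minimal sketch of such a record, assuming a plain dataclass in place of a Pydantic model, might look like the following. It is illustrative rather than the section's own fragment: the class `MemoryRecord`, the field names `source`, `observed_at`, and `max_age_s`, and the example values are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class MemoryRecord:
    fact: str
    kind: str           # "preference", "world_state", "task_progress", or "explanation"
    source: str         # where the fact came from (provenance)
    observed_at: float  # seconds, on the same clock as `now`
    confidence: float
    max_age_s: float    # how long the fact may be acted on without refresh

    def age_s(self, now):
        return now - self.observed_at

    def allows_direct_action(self, now):
        """True only while the record is still within its freshness window."""
        return self.age_s(now) <= self.max_age_s


record = MemoryRecord(
    fact="blue_mug_on_left_shelf",
    kind="world_state",
    source="head_camera_detection",
    observed_at=1000.0,
    confidence=0.85,
    max_age_s=10.0,
)
now = 1045.0
report = {
    **asdict(record),
    "age_s": record.age_s(now),
    "allow_direct_action": record.allows_direct_action(now),
}
print(report)
```

With these values the record is 45 seconds old against a 10-second limit, so the gate reports that direct action is not allowed.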
- Tag each memory by type: preference, world state, task progress, or explanation.
- Attach timestamps and evidence sources to every remembered fact.
- Force memory retrieval to pass through a fact-validity gate before execution.
- Record contradiction events when perception and memory disagree.
- Evaluate memory systems on tasks with delayed execution and hidden state changes.
The expected output is a provenance-rich memory record that blocks direct action because the scene fact is too old. This is exactly the kind of trace you want before calling a behavior a hallucination, since the deeper mechanism is often stale world state rather than fabricated semantics.
When memory-rich agents fail, check whether the wrong fact was retrieved, whether the fact was stale, or whether the verifier failed to challenge it. Those paths lead to very different architectural fixes.
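The three checks can be run mechanically against a logged trace from a failed episode. The sketch below assumes a trace dictionary with hypothetical fields (`retrieved_key`, `needed_key`, `age_s`, `max_age_s`, `verifier_challenged`); a real system would fill these from its own logs.

```python
def triage(trace):
    """Return every failure path that applies to one failed-episode trace."""
    labels = []
    if trace["retrieved_key"] != trace["needed_key"]:
        labels.append("wrong_fact_retrieved")        # retrieval problem
    if trace["age_s"] > trace["max_age_s"]:
        labels.append("stale_fact")                  # freshness problem
    if not trace["verifier_challenged"]:
        labels.append("verifier_did_not_challenge")  # verification problem
    return labels


# Illustrative trace: the right fact was retrieved, but it was old and unchecked.
trace = {
    "retrieved_key": "red_mug.location",
    "needed_key": "red_mug.location",
    "age_s": 45.0,
    "max_age_s": 10.0,
    "verifier_challenged": False,
}
print(triage(trace))  # ['stale_fact', 'verifier_did_not_challenge']
```

Returning a list rather than a single label reflects that the paths are not exclusive: a stale fact often goes through only because the verifier never challenged it.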
Embodied memory is valuable only when it behaves like a state-estimation aid rather than an untyped bag of text.
Design a memory schema for an embodied assistant that stores both user preferences and object locations. Include the fields needed to keep one durable and the other freshness-limited.
EmbodiedBench is useful for evaluating long-horizon embodied tasks where memory and replanning matter.
LangGraph is a practical reference for explicit stateful agent memory rather than opaque prompt concatenation.
GTSAM is a classical reference for state-estimation discipline, useful here as a conceptual comparison for how embodied memory should treat uncertainty and updates.