Section 56.4: Memory errors | Building Embodied AI: From Perception to Autonomous Action

"I was useful yesterday, stale today, and somehow still very persuasive."
An Aging Memory Trace

**Figure 56.4A**: A plausible but stale memory can be more dangerous than no memory at all.

Big Picture

Memory errors matter because a memory system can fail by returning stale, aliased, or overconfident information that the planner treats as if it were current state.

Trust Must Be Computed

A memory is not safe to act on merely because it was once true. It is safe only when provenance, context match, freshness, and conflict with current observations have all been checked before it conditions action.

Theory

Memory safety needs an explicit trust score. One simple model is

$$\rho(m) = \lambda_1 \cdot \mathrm{source\_reliability}(m) + \lambda_2 \cdot \mathrm{context\_match}(m) - \lambda_3 \cdot \mathrm{age}(m) - \lambda_4 \cdot \mathrm{conflict}(m).$$

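Here the $\lambda_i$ are non-negative weights, and each term is assumed to be normalized to $[0, 1]$, as in the worked example later in this section. When $\rho(m)$ falls below a chosen threshold, the memory should not directly condition action. The system should re-observe, ask for human help, or choose a conservative fallback.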

That trust score only works if the memory schema stores the necessary fields: write timestamp, source sensor or operator, embodiment tag, conflict with live observations, and whether the item was previously overruled by a safety monitor. Memory safety is therefore partly a data-model problem, not only a planner problem.
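One way to make these fields concrete is a small record type. The sketch below is only an illustration, not a prescribed schema: the class name `MemoryRecord` and every field name are hypothetical, chosen to mirror the fields listed above.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass
class MemoryRecord:
    """Hypothetical memory record carrying the fields a trust score needs."""
    item_id: str
    content: str
    written_at: float                 # write timestamp, seconds since the epoch
    source: str                       # source sensor or operator identifier
    embodiment_tag: str               # which robot body or platform wrote the item
    conflict: float = 0.0             # disagreement with live observations, in [0, 1]
    overruled_by_safety_monitor: bool = False

    def age_seconds(self, now: Optional[float] = None) -> float:
        """Elapsed time since the write, clamped at zero."""
        now = time.time() if now is None else now
        return max(0.0, now - self.written_at)


record = MemoryRecord(
    item_id="corridor-C-open",
    content="corridor C is open",
    written_at=1_000.0,
    source="lidar_front",
    embodiment_tag="delivery_bot_v2",
)
print(record.age_seconds(now=1_600.0))  # 600.0
```

Keeping age as a derived quantity rather than a stored one means the record cannot silently go out of date: freshness is always computed against the clock at retrieval time.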

This requirement is especially important in dynamic environments. A memory system that cannot represent conflict with present observations effectively treats past context as more authoritative than the world itself.
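The trust score takes age and conflict as numbers, so raw timestamps and sensor readings have to be mapped onto a common scale first. The following is a minimal sketch under stated assumptions: the half-life form of `normalized_age`, the tolerance-scaled form of `conflict_score`, and both function names are illustrative choices, not definitions from this section.

```python
def normalized_age(age_seconds: float, half_life_seconds: float) -> float:
    """Map elapsed time to [0, 1]: 0 when fresh, 0.5 at one half-life."""
    if half_life_seconds <= 0:
        raise ValueError("half_life_seconds must be positive")
    return 1.0 - 0.5 ** (max(0.0, age_seconds) / half_life_seconds)


def conflict_score(remembered: float, observed: float, tolerance: float) -> float:
    """Map disagreement between a memory and a live reading to [0, 1].

    A gap of `tolerance` or more counts as full conflict.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return min(1.0, abs(remembered - observed) / tolerance)


print(normalized_age(600.0, half_life_seconds=600.0))  # 0.5
print(conflict_score(remembered=2.0, observed=2.5, tolerance=1.0))  # 0.5
```

The half-life would normally depend on how quickly the remembered quantity changes: a wall position can tolerate a much longer half-life than a door state.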

Memory Error Taxonomy

Error Type	Mechanism	Observable Symptom	Preferred Mitigation
Staleness	world changed after storage	memory conflicts with current sensors	freshness thresholds and forced re-observation
Aliasing	wrong but similar item retrieved	plausible yet incorrect plan branch	better metadata filters and embodiment tags
Overconfidence	summary presented as certain fact	system stops seeking new evidence	confidence calibration and uncertainty-aware routing
Poisoning	faulty or adversarial memory write	repeated harmful retrieval from same source	source validation and write-side governance

These error classes should not be merged under a generic "hallucination" label. Each one implies a different system remedy and a different audit trail.
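Keeping the classes separate is easier when the error type is an explicit value in every log entry. The sketch below shows one possible encoding; the names `MemoryErrorType`, `PREFERRED_MITIGATION`, and `audit_entry` are hypothetical, and the mitigation strings are copied from the table above.

```python
from enum import Enum


class MemoryErrorType(Enum):
    STALENESS = "staleness"
    ALIASING = "aliasing"
    OVERCONFIDENCE = "overconfidence"
    POISONING = "poisoning"


# Mitigations taken from the taxonomy table.
PREFERRED_MITIGATION = {
    MemoryErrorType.STALENESS: "freshness thresholds and forced re-observation",
    MemoryErrorType.ALIASING: "better metadata filters and embodiment tags",
    MemoryErrorType.OVERCONFIDENCE: "confidence calibration and uncertainty-aware routing",
    MemoryErrorType.POISONING: "source validation and write-side governance",
}


def audit_entry(item_id: str, error_type: MemoryErrorType) -> dict:
    """Build a log entry that keeps the error class explicit."""
    return {
        "item_id": item_id,
        "error_type": error_type.value,
        "mitigation": PREFERRED_MITIGATION[error_type],
    }


print(audit_entry("corridor-C-open", MemoryErrorType.STALENESS))
```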

Worked Example

A hospital delivery robot may remember that corridor C is usually open, but if a new isolation barrier appeared this morning, that memory has become a hazard unless it is checked against current sensors or facility updates.

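# Normalized scores in [0, 1] for one retrieved memory item.
memory_item = {
    "source_reliability": 0.9,  # the source is usually trustworthy
    "context_match": 0.4,       # weak match to the current situation
    "age": 0.8,                 # old relative to how fast the scene changes
    "conflict": 0.7,            # strong disagreement with live observations
}

# Reliability and context match add trust; age and conflict subtract it.
# The weights are illustrative values for the lambda coefficients.
rho = (
    1.0 * memory_item["source_reliability"]
    + 1.2 * memory_item["context_match"]
    - 1.0 * memory_item["age"]
    - 1.1 * memory_item["conflict"]
)
# Gate: below the 0.2 threshold the memory must not condition action.
decision = "reobserve_or_request_help" if rho < 0.2 else "memory_allowed"
print({"rho": round(rho, 2), "decision": decision})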

{'rho': -0.19, 'decision': 'reobserve_or_request_help'}

Code Fragment 56.4.1 computes a trust score for a retrieved memory item before the planner is allowed to rely on it.

The expected output shows a memory that should be rejected for action guidance. High source reliability alone is not enough when age and conflict with current context are severe.

Library Shortcut

Store memory items in a database with freshness, provenance, coordinate frame, conflict score, and rejection reason, then run an acceptance filter before the planner consumes them. Logging accepted and rejected memories beside ROS 2 monitor events makes it possible to ask whether the wrong action began with the wrong remembered world state.
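As a rough illustration of this pattern, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table name, column names, thresholds, and the `accept_memories` helper are all assumptions made for the example; a real deployment would choose its own schema and limits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE memory_items (
        item_id TEXT PRIMARY KEY,
        content TEXT NOT NULL,
        written_at REAL NOT NULL,   -- freshness: write time in seconds
        source TEXT NOT NULL,       -- provenance
        frame_id TEXT NOT NULL,     -- coordinate frame
        conflict REAL NOT NULL,     -- conflict score in [0, 1]
        rejection_reason TEXT       -- NULL while the item is accepted
    )
    """
)
conn.executemany(
    "INSERT INTO memory_items VALUES (?, ?, ?, ?, ?, ?, NULL)",
    [
        ("door-12-open", "door 12 is open", 990.0, "camera_left", "map", 0.1),
        ("corridor-C-open", "corridor C is open", 100.0, "lidar_front", "map", 0.7),
        ("dock-3-free", "dock 3 is free", 995.0, "operator", "map", 0.9),
    ],
)


def accept_memories(conn, now, max_age_s, max_conflict):
    """Record a rejection reason on failing rows and return accepted ids."""
    conn.execute(
        "UPDATE memory_items SET rejection_reason = 'stale' "
        "WHERE rejection_reason IS NULL AND ? - written_at > ?",
        (now, max_age_s),
    )
    conn.execute(
        "UPDATE memory_items SET rejection_reason = 'conflict' "
        "WHERE rejection_reason IS NULL AND conflict > ?",
        (max_conflict,),
    )
    rows = conn.execute(
        "SELECT item_id FROM memory_items "
        "WHERE rejection_reason IS NULL ORDER BY item_id"
    ).fetchall()
    return [row[0] for row in rows]


accepted = accept_memories(conn, now=1000.0, max_age_s=60.0, max_conflict=0.5)
print(accepted)  # ['door-12-open']
```

Because rejected rows stay in the table with a reason attached, the same database can later be joined against monitor events by timestamp.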

Implementation Stack

Use Open3D or SLAM map timestamps for geometric freshness, ROS 2 bags for replayable evidence, NetworkX for explicit dependency graphs between memory records, and PyTorch or JAX scoring models only when their trust score is calibrated against held-out failures. Weights & Biases or TensorBoard should track rejection precision, missed stale-memory failures, and downstream policy changes under the same evaluation panel.
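The tracked quantities can be computed from labeled filter decisions before they are sent to any dashboard. The sketch below assumes one plausible reading of the metrics: rejection precision is the fraction of rejected memories that were in fact unsafe to use, and a missed failure is an accepted memory that was in fact unsafe. Both definitions and the `filter_metrics` name are assumptions for illustration.

```python
def filter_metrics(decisions):
    """Summarize filter quality from (was_rejected, was_unsafe) pairs.

    Labels for `was_unsafe` are assumed to come from offline review.
    """
    rejected = [unsafe for was_rejected, unsafe in decisions if was_rejected]
    accepted = [unsafe for was_rejected, unsafe in decisions if not was_rejected]
    precision = sum(rejected) / len(rejected) if rejected else None
    return {
        "rejection_precision": precision,   # unsafe share among rejections
        "missed_unsafe": sum(accepted),     # unsafe memories that got through
        "n_rejected": len(rejected),
        "n_accepted": len(accepted),
    }


decisions = [
    (True, True), (True, True), (True, True), (True, False),  # rejected items
    (False, False), (False, False), (False, True),            # accepted items
]
print(filter_metrics(decisions))
```

With these labels the sketch reports a rejection precision of 0.75 and one missed unsafe memory.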

Algorithm: Memory Safety Filter

1. Score each retrieved memory for freshness, context match, source reliability, and conflict with current observations.
2. Allow direct action conditioning only above a trust threshold.
3. Below threshold, request re-observation, alternate planning, or human input.
4. Log every rejected memory item for offline diagnosis.
5. Estimate how often the safety filter prevented a downstream failure.
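The first four steps of this algorithm can be sketched as a small filter function. The weights and the 0.2 threshold are the illustrative values from the worked example; the function names, the item ids, and the structure of the rejection log are hypothetical.

```python
WEIGHTS = {"source_reliability": 1.0, "context_match": 1.2, "age": 1.0, "conflict": 1.1}
THRESHOLD = 0.2


def trust_score(item, weights=WEIGHTS):
    """Weighted trust score: reliability and context add, age and conflict subtract."""
    return (
        weights["source_reliability"] * item["source_reliability"]
        + weights["context_match"] * item["context_match"]
        - weights["age"] * item["age"]
        - weights["conflict"] * item["conflict"]
    )


def filter_memories(items, rejected_log, threshold=THRESHOLD):
    """Return items allowed to condition action; log every rejected item."""
    allowed = []
    for item in items:
        rho = trust_score(item)
        if rho >= threshold:
            allowed.append(item)
        else:
            rejected_log.append(
                {"id": item["id"], "rho": round(rho, 2),
                 "fallback": "reobserve_or_request_help"}
            )
    return allowed


items = [
    {"id": "corridor-C-open", "source_reliability": 0.9, "context_match": 0.4,
     "age": 0.8, "conflict": 0.7},
    {"id": "door-12-closed", "source_reliability": 0.9, "context_match": 0.9,
     "age": 0.1, "conflict": 0.0},
]
rejected_log = []
allowed = filter_memories(items, rejected_log)
print([m["id"] for m in allowed])  # ['door-12-closed']
print(rejected_log)
```

The fifth step is not automated here: estimating prevented failures requires labeling each logged rejection offline, for example by replaying the episode with the rejected memory allowed.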

Common Failure Mode

A stale memory that sounds plausible is often more dangerous than missing memory, because the agent may act decisively on the wrong world model.

Practical Example

A drone with remembered wind conditions from ten minutes ago should not reuse that memory blindly after entering a new street canyon. Fresh anemometer or visual evidence should dominate the old estimate.
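One common way to let fresh evidence dominate is inverse-variance weighting with an age-inflated memory variance. This is a swapped-in standard technique, not a method prescribed by this section. The sketch assumes the remembered wind speed becomes more uncertain at a constant rate; the `drift_var_per_s` parameter and all numbers are made up for illustration.

```python
def fuse_wind_estimate(mem_speed, mem_var, mem_age_s, drift_var_per_s,
                       obs_speed, obs_var):
    """Fuse a remembered and a fresh wind speed by inverse-variance weighting.

    The memory's variance grows linearly with its age, so an old memory
    contributes less. Returns (fused_speed, weight_given_to_observation).
    """
    aged_var = mem_var + drift_var_per_s * mem_age_s
    w_mem = 1.0 / aged_var
    w_obs = 1.0 / obs_var
    fused = (w_mem * mem_speed + w_obs * obs_speed) / (w_mem + w_obs)
    return fused, w_obs / (w_mem + w_obs)


# Ten-minute-old memory of 3 m/s versus a fresh 8 m/s reading.
fused, obs_weight = fuse_wind_estimate(
    mem_speed=3.0, mem_var=0.25, mem_age_s=600.0, drift_var_per_s=0.01,
    obs_speed=8.0, obs_var=0.25,
)
print(round(fused, 2), round(obs_weight, 2))
```

With these numbers the fresh reading receives more than 90 percent of the weight. With an age of zero, the two equally certain estimates would be averaged instead.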

A strong failure artifact here is a rejected-memory ledger that records item id, trust score, rejection reason, replacement observation, and whether the filter prevented a downstream failure. That ledger turns vague discussions about stale memory into measurable safety outcomes.
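Such a ledger can be as simple as one JSON object per line. The sketch below writes to an in-memory stream so that it runs anywhere; the field names follow the list above, while the helper names and the convention that `prevented_failure` is `None` until an entry has been reviewed are assumptions.

```python
import io
import json

LEDGER_FIELDS = ("item_id", "trust_score", "rejection_reason",
                 "replacement_observation", "prevented_failure")


def append_ledger_entry(stream, **entry):
    """Write one ledger entry as a JSON line, requiring every field."""
    missing = [f for f in LEDGER_FIELDS if f not in entry]
    if missing:
        raise ValueError(f"missing ledger fields: {missing}")
    stream.write(json.dumps({f: entry[f] for f in LEDGER_FIELDS}) + "\n")


def prevented_failure_rate(ledger_text):
    """Share of reviewed rejections that prevented a downstream failure."""
    entries = [json.loads(line) for line in ledger_text.splitlines() if line]
    reviewed = [e for e in entries if e["prevented_failure"] is not None]
    if not reviewed:
        return None
    return sum(e["prevented_failure"] for e in reviewed) / len(reviewed)


ledger = io.StringIO()
append_ledger_entry(ledger, item_id="corridor-C-open", trust_score=-0.19,
                    rejection_reason="stale", replacement_observation="barrier detected",
                    prevented_failure=True)
append_ledger_entry(ledger, item_id="door-12-open", trust_score=0.1,
                    rejection_reason="conflict", replacement_observation="door open",
                    prevented_failure=False)
append_ledger_entry(ledger, item_id="dock-3-free", trust_score=0.05,
                    rejection_reason="stale", replacement_observation=None,
                    prevented_failure=None)  # not yet reviewed
print(prevented_failure_rate(ledger.getvalue()))  # 0.5
```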

Research Frontier

One frontier question is whether trust thresholds should be fixed, learned, or context-conditioned. A threshold that works in a warehouse may be too permissive in a hospital or too conservative for a time-critical drone task, which makes memory governance a policy-design problem as much as a database problem.
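The context-conditioned option can be illustrated with a lookup table, which is the simplest form it could take. The context names and threshold values below are invented for the example; only the 0.2 value and the two decision labels come from the worked example. Unknown contexts fall back to the strictest threshold, which is a conservative design assumption rather than an established rule.

```python
# Hypothetical per-context thresholds; the numbers are illustrative only.
CONTEXT_THRESHOLDS = {
    "warehouse": 0.2,
    "hospital": 0.6,
    "time_critical_drone": 0.1,
}


def threshold_for(context: str) -> float:
    """Look up the trust threshold, defaulting to the strictest known value."""
    return CONTEXT_THRESHOLDS.get(context, max(CONTEXT_THRESHOLDS.values()))


def gate(rho: float, context: str) -> str:
    """Apply the context-specific threshold to a trust score."""
    if rho >= threshold_for(context):
        return "memory_allowed"
    return "reobserve_or_request_help"


print(gate(0.4, "warehouse"))     # memory_allowed
print(gate(0.4, "hospital"))      # reobserve_or_request_help
print(gate(0.4, "unknown_site"))  # reobserve_or_request_help
```

The same trust score leads to different decisions in different settings, which is why choosing the table is a policy question and not only an engineering one.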

Self Check

Can you point to one memory field that would force re-observation before action? If the answer is vague, the trust model is still decorative rather than operational.

Self Check

Can you specify one condition under which a memory should be rejected even if it comes from a trusted source? If not, the trust model is probably ignoring staleness or context conflict.

Key Takeaway

Memory systems need freshness and conflict checks. Retrieval without trust gating can turn useful history into unsafe action.

Exercise 56.4.1

Define a trust score for a robot that remembers door states in an office building. Include at least one term for age and one term for conflict with current observations.

What's Next?

Next, move to Chapter 57, where memory and adaptation become a continual-learning problem.