"I was useful yesterday, stale today, and somehow still very persuasive."
An Aging Memory Trace
Memory errors matter because a memory system can fail by returning stale, aliased, or overconfident information that the planner treats as if it were current state.
Memory is not trustworthy simply because it was recorded in the past; it is safe to use only when provenance, context match, freshness, and conflict with current observations have all been checked before it conditions an action.
Theory
Memory safety needs an explicit trust score. One simple model is
$$\rho(m) = \lambda_1 \cdot \mathrm{source\_reliability}(m) + \lambda_2 \cdot \mathrm{context\_match}(m) - \lambda_3 \cdot \mathrm{age}(m) - \lambda_4 \cdot \mathrm{conflict}(m).$$
When $\rho(m)$ is too low, the memory should not directly condition action. The system should re-observe, ask for human help, or choose a conservative fallback.
That trust score only works if the memory schema stores the necessary fields: write timestamp, source sensor or operator, embodiment tag, conflict with live observations, and whether the item was previously overruled by a safety monitor. Memory safety is therefore partly a data-model problem, not only a planner problem.
This requirement is especially important in dynamic environments. A memory system that cannot represent conflict with present observations effectively treats past context as more authoritative than the world itself.
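The schema requirement and the score can be sketched together. This is a minimal sketch, not a reference implementation: the field names, the `max_age_s` normalization horizon, and the weights are all illustrative assumptions chosen to mirror the terms in $\rho(m)$.

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    """Hypothetical schema holding the fields the trust score needs."""
    item_id: str
    written_at: float          # write timestamp, in seconds
    source: str                # source sensor or operator
    embodiment: str            # embodiment tag of the writer
    source_reliability: float  # in [0, 1]
    context_match: float       # in [0, 1]
    conflict: float            # conflict with live observations, in [0, 1]
    overruled: bool = False    # previously overruled by a safety monitor


def normalized_age(record, now, max_age_s=3600.0):
    """Map elapsed time to [0, 1]; max_age_s is an assumed horizon."""
    elapsed = max(0.0, now - record.written_at)
    return min(1.0, elapsed / max_age_s)


def trust_score(record, now, weights=(1.0, 1.2, 1.0, 1.1)):
    """Compute rho(m); the default weights are illustrative only."""
    l1, l2, l3, l4 = weights
    return (
        l1 * record.source_reliability
        + l2 * record.context_match
        - l3 * normalized_age(record, now)
        - l4 * record.conflict
    )


now = 10_000.0  # fixed clock so the example is reproducible
record = MemoryRecord(
    item_id="corridor_c_open",
    written_at=now - 2880.0,   # 48 minutes old, normalized age 0.8
    source="lidar_front",
    embodiment="delivery_bot",
    source_reliability=0.9,
    context_match=0.4,
    conflict=0.7,
)
print(round(trust_score(record, now), 2))
```

The `overruled` flag is stored but deliberately left out of the score here; one reasonable design is to reject previously overruled items outright rather than fold the flag into a weighted sum.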
| Error Type | Mechanism | Observable Symptom | Preferred Mitigation |
|---|---|---|---|
| Staleness | world changed after storage | memory conflicts with current sensors | freshness thresholds and forced re-observation |
| Aliasing | wrong but similar item retrieved | plausible yet incorrect plan branch | better metadata filters and embodiment tags |
| Overconfidence | summary presented as certain fact | system stops seeking new evidence | confidence calibration and uncertainty-aware routing |
| Poisoning | faulty or adversarial memory write | repeated harmful retrieval from same source | source validation and write-side governance |
These error classes should not be merged under a generic "hallucination" label. Each one implies a different system remedy and a different audit trail.
Worked Example
A hospital delivery robot may remember that corridor C is usually open, but if a new isolation barrier appeared this morning, that memory has become a hazard unless it is checked against current sensors or facility updates.
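# Normalized features for the remembered "corridor C is open" item.
memory_item = {
    "source_reliability": 0.9,
    "context_match": 0.4,
    "age": 0.8,
    "conflict": 0.7,
}
# Weighted trust score: reliability and context match add, age and conflict subtract.
rho = (
    1.0 * memory_item["source_reliability"]
    + 1.2 * memory_item["context_match"]
    - 1.0 * memory_item["age"]
    - 1.1 * memory_item["conflict"]
)
# Below the 0.2 threshold the memory must not condition action directly.
decision = "reobserve_or_request_help" if rho < 0.2 else "memory_allowed"
print({"rho": round(rho, 2), "decision": decision})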
{'rho': -0.19, 'decision': 'reobserve_or_request_help'}
The expected output shows a memory that should be rejected for action guidance. High source reliability alone is not enough when age and conflict with current context are severe.
Store memory items in a database with freshness, provenance, coordinate frame, conflict score, and rejection reason, then run an acceptance filter before the planner consumes them. Logging accepted and rejected memories beside ROS 2 monitor events makes it possible to ask whether the wrong action began with the wrong remembered world state.
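One way to picture the store-then-filter pattern is with an in-memory SQLite table. This is a sketch under stated assumptions: the table layout, the column names, the three rejection reasons, and the thresholds (`max_age_s`, `max_conflict`) are hypothetical, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE memory_items (
        item_id TEXT PRIMARY KEY,
        written_at REAL,        -- freshness: write time in seconds
        provenance TEXT,        -- source sensor or operator
        coordinate_frame TEXT,  -- frame the item is expressed in
        conflict_score REAL,    -- conflict with live observations, [0, 1]
        rejection_reason TEXT   -- NULL while the item is accepted
    )
    """
)
conn.executemany(
    "INSERT INTO memory_items VALUES (?, ?, ?, ?, ?, NULL)",
    [
        ("corridor_c_open", 100.0, "lidar_front", "map", 0.7),
        ("dock_2_free", 950.0, "facility_feed", "map", 0.1),
        ("shelf_pose", 900.0, "arm_camera", "base_link", 0.0),
    ],
)


def acceptance_filter(conn, now, max_age_s=300.0, max_conflict=0.5, frame="map"):
    """Record a rejection reason on failing rows; return accepted item ids."""
    rows = conn.execute(
        "SELECT item_id, written_at, coordinate_frame, conflict_score "
        "FROM memory_items"
    ).fetchall()
    accepted = []
    for item_id, written_at, item_frame, conflict in rows:
        if now - written_at > max_age_s:
            reason = "stale"
        elif item_frame != frame:
            reason = "frame_mismatch"
        elif conflict > max_conflict:
            reason = "conflict"
        else:
            reason = None
            accepted.append(item_id)
        conn.execute(
            "UPDATE memory_items SET rejection_reason = ? WHERE item_id = ?",
            (reason, item_id),
        )
    return accepted


accepted = acceptance_filter(conn, now=1000.0)
print(accepted)
print(conn.execute(
    "SELECT item_id, rejection_reason FROM memory_items ORDER BY item_id"
).fetchall())
```

Only the accepted ids would be handed to the planner; the rejection reasons stay in the table so they can later be joined against monitor events.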
Use Open3D or SLAM map timestamps for geometric freshness, ROS 2 bags for replayable evidence, and NetworkX for explicit dependency graphs between memory records. Use PyTorch or JAX scoring models only when their trust scores are calibrated against held-out failures. Weights & Biases or TensorBoard should track rejection precision, missed stale-memory failures, and downstream policy changes in the same evaluation panel.
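Before any of these metrics reach a dashboard, they need a concrete definition. The sketch below assumes one plausible reading: rejection precision is the fraction of rejected memories that offline labels confirm as bad, and a missed failure is an accepted memory that was in fact bad. The record keys `rejected` and `truly_bad` are hypothetical labels from replayed logs.

```python
def filter_metrics(records):
    """records: list of dicts with boolean keys 'rejected' and 'truly_bad'."""
    rejected = [r for r in records if r["rejected"]]
    true_rejections = sum(1 for r in rejected if r["truly_bad"])
    # Accepted items that were actually bad are the filter's misses.
    missed = sum(1 for r in records if not r["rejected"] and r["truly_bad"])
    precision = true_rejections / len(rejected) if rejected else float("nan")
    return {"rejection_precision": precision, "missed_bad_memories": missed}


replay_log = [
    {"rejected": True, "truly_bad": True},
    {"rejected": True, "truly_bad": True},
    {"rejected": True, "truly_bad": False},   # over-cautious rejection
    {"rejected": False, "truly_bad": True},   # missed stale memory
    {"rejected": False, "truly_bad": False},
]
metrics = filter_metrics(replay_log)
print(metrics)
```

The resulting dictionary is plain Python, so it can be passed to whichever tracking tool is in use.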
- Score each retrieved memory for freshness, context match, source reliability, and conflict with current observations.
- Allow direct action conditioning only above a trust threshold.
- Below threshold, request re-observation, alternate planning, or human input.
- Log every rejected memory item for offline diagnosis.
- Estimate how often the safety filter prevented a downstream failure.
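The routing steps in this checklist can be sketched as a small gate function. The fallback order (re-observe first, then an alternate plan, then human input), the flag names, and the default threshold of 0.2 are assumptions made for illustration; a real system would choose them per deployment.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("memory_gate")


def gate_memory(item_id, rho, threshold=0.2,
                can_reobserve=True, has_alternate_plan=False):
    """Return the action the planner should take for one retrieved memory."""
    if rho >= threshold:
        return "use_memory"
    # Every rejection is logged so it can be diagnosed offline.
    log.info("rejected memory %s with rho=%.2f", item_id, rho)
    if can_reobserve:
        return "reobserve"
    if has_alternate_plan:
        return "alternate_plan"
    return "request_human_input"


print(gate_memory("corridor_c_open", -0.19))
print(gate_memory("corridor_c_open", -0.19, can_reobserve=False))
print(gate_memory("dock_2_free", 0.85))
```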
A stale memory that sounds plausible is often more dangerous than missing memory, because the agent may act decisively on the wrong world model.
A drone with remembered wind conditions from ten minutes ago should not reuse that memory blindly after entering a new street canyon. Fresh anemometer or visual evidence should dominate the old estimate.
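One simple way to make fresh evidence dominate is to discount the remembered value by its age and drop it entirely when the context changes. The exponential half-life, the 0.5 cap on the memory weight, and the `context_changed` flag below are illustrative assumptions, not a recommended estimator.

```python
def fuse_wind_estimate(remembered, fresh, age_s,
                       half_life_s=120.0, context_changed=False):
    """Blend remembered and fresh wind speed (m/s), favoring fresh evidence."""
    if context_changed:
        memory_weight = 0.0  # new street canyon: ignore the old estimate
    else:
        memory_weight = 0.5 ** (age_s / half_life_s)
    # Cap the memory weight so a fresh reading is never outvoted.
    memory_weight = min(memory_weight, 0.5)
    return memory_weight * remembered + (1.0 - memory_weight) * fresh


# Ten-minute-old memory of 3 m/s against a fresh 8 m/s reading.
print(fuse_wind_estimate(3.0, 8.0, age_s=600.0))
print(fuse_wind_estimate(3.0, 8.0, age_s=600.0, context_changed=True))
```

With a two-minute half-life, a ten-minute-old estimate contributes about 3% of the blend, and nothing at all once the context flag is set.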
A strong failure artifact here is a rejected-memory ledger that records item id, trust score, rejection reason, replacement observation, and whether the filter prevented a downstream failure. That ledger turns vague discussions about stale memory into measurable safety outcomes.
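A ledger like this can be as simple as one typed row per rejection, exported as CSV. The sketch below uses the five fields named above; the class name, the example rows, and the choice of CSV are assumptions for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class LedgerEntry:
    item_id: str
    trust_score: float
    rejection_reason: str
    replacement_observation: str
    prevented_failure: bool


# Hypothetical entries; real rows would come from the acceptance filter.
ledger = [
    LedgerEntry("corridor_c_open", -0.19, "conflict", "barrier_detected", True),
    LedgerEntry("door_12_closed", 0.05, "stale", "door_open", False),
]

# Write to an in-memory buffer; a file handle works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LedgerEntry)])
writer.writeheader()
for entry in ledger:
    writer.writerow(asdict(entry))

# Share of rejections that prevented a downstream failure.
prevented_rate = sum(e.prevented_failure for e in ledger) / len(ledger)
print(buffer.getvalue())
print(prevented_rate)
```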
One frontier question is whether trust thresholds should be fixed, learned, or context-conditioned. A threshold that works in a warehouse may be too permissive in a hospital or too conservative for a time-critical drone task, which makes memory governance a policy-design problem as much as a database problem.
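The context-conditioned option can be pictured as a lookup from deployment context to threshold. The numbers below are placeholders, not recommended values, and falling back to the strictest threshold for unknown contexts is one assumed policy among several.

```python
# Hypothetical per-context thresholds; higher means stricter gating.
THRESHOLDS = {
    "warehouse": 0.2,
    "hospital": 0.6,
    "time_critical_drone": 0.1,
}
DEFAULT_THRESHOLD = max(THRESHOLDS.values())  # unknown context: be strict


def memory_allowed(rho, context):
    """True when the trust score clears the threshold for this context."""
    return rho >= THRESHOLDS.get(context, DEFAULT_THRESHOLD)


rho = 0.3
for context in ("warehouse", "hospital", "unknown_site"):
    print(context, memory_allowed(rho, context))
```

The same score passes in the warehouse and fails in the hospital, which is the policy question in miniature.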
Can you point to one memory field that would force re-observation before action? If the answer is vague, the trust model is still decorative rather than operational.
Can you specify one condition under which a memory should be rejected even if it comes from a trusted source? If not, the trust model is probably ignoring staleness or context conflict.
Memory systems need freshness and conflict checks. Retrieval without trust gating can turn useful history into unsafe action.
Define a trust score for a robot that remembers door states in an office building. Include at least one term for age and one term for conflict with current observations.
What's Next?
Next, move to Chapter 57, where memory and adaptation become a continual-learning problem.