A Careful Control Loop
Read the figure as an action-API audit. Tool use is safe only when every callable action has typed inputs, preconditions, postconditions, timeout behavior, and a replanning path after failure.
Build And Evaluation Checklist
Depth and self-containment. This section explains how an LLM chooses among tools, how typed action APIs constrain execution, and how verifier failures trigger replanning rather than silent drift.
Production and evaluation contract. The minimum artifact records the selected tool, its arguments, the verifier output, the replanning trigger, and the new plan. Without these fields, tool-use claims are impossible to audit.
Before trusting any claimed improvement in tool use, action APIs, plan verification, or replanning, name the language interface, the grounded world state, the executable action contract, and the evidence artifact.
As an exercise, write one evidence row recording the instruction, world-state estimate, chosen action, verifier result, and failure label. Then identify which field would change first if the command were misunderstood.
Tool use and replanning are where embodied LLM systems stop being chatbots with access to robots and become control architectures. Typed APIs and verifiers make the difference.
This section connects LLM planning to the practical machinery of action APIs, typed arguments, postconditions, and replanning under failure.
The practical question is not only which tool to call, but what evidence should force the planner to abandon the current plan and synthesize a new one.
A tool call without a verifier is just a wish. Replanning starts when the verifier says the wish did not come true.
Theory
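Let $u_1, \dots, u_n$ be typed tools, and let $v_i$ be a verifier for the postcondition of tool $u_i$. At step $t$, an embodied planner executes the loop $$u_t, \theta_t = \phi_\text{LLM}(h_t), \qquad y_t = u_t(\theta_t), \qquad b_t = v_t(y_t, s_{t+1}),$$ where $h_t$ is the context given to the planner, $u_t$ and $\theta_t$ are the selected tool and its typed arguments, $y_t$ is the tool's return value, $s_{t+1}$ is the world state observed after execution, and $v_t$ is the verifier paired with the selected tool. The planner continues only if the boolean or scalar verifier signal $b_t$ passes a threshold.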
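The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `Tool`, `plan_step`, `run_loop`, and `PASS_THRESHOLD` are hypothetical names, the planner is a fixed stub standing in for $\phi_\text{LLM}$, and the simulated `pick` skill deliberately slips on its first attempt.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical threshold on the scalar verifier signal b_t.
PASS_THRESHOLD = 0.5


@dataclass
class Tool:
    name: str
    run: Callable[[dict, dict], Tuple[str, dict]]  # (theta, state) -> (y, next_state)
    verify: Callable[[str, dict], float]           # (y, next_state) -> b in [0, 1]


def pick(theta: dict, state: dict) -> Tuple[str, dict]:
    # Simulated skill: the first attempt slips, later attempts succeed.
    # It returns "done" either way, so the return value proves nothing.
    attempts = state["pick_attempts"] + 1
    held = theta["target"] if attempts > 1 else None
    return "done", {**state, "pick_attempts": attempts, "held": held}


def verify_pick(y: str, state: dict) -> float:
    # Check the observed next state, not the tool's return value y.
    return 1.0 if state["held"] is not None else 0.0


TOOLS = {"pick": Tool("pick", pick, verify_pick)}


def plan_step(history: list) -> Tuple[str, dict]:
    # Stand-in for phi_LLM(h_t); a real planner would condition on history.
    return "pick", {"target": "red_mug"}


def run_loop(state: dict, max_steps: int = 3) -> Tuple[dict, list]:
    history = []
    for t in range(max_steps):
        name, theta = plan_step(history)
        tool = TOOLS[name]
        y, state = tool.run(theta, state)
        b = tool.verify(y, state)
        passed = b >= PASS_THRESHOLD
        history.append({"t": t, "tool": name, "args": theta,
                        "y": y, "b": b, "passed": passed})
        if passed:
            break  # postcondition holds; the high-level plan may advance
    return state, history


final_state, trace = run_loop({"pick_attempts": 0, "held": None})
for row in trace:
    print(row)
```

Both attempts return `"done"`, yet only the second passes the verifier, which is the distinction the formalism is meant to capture.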
This structure matters because embodied actions are long running and failure prone. A free-text chain of thought may say 'now place the mug on the tray,' but the actual robot needs a typed `place(target='tray')` call, a completion signal, and a postcondition check such as `object_on_tray=True` before the next reasoning step can be trusted.
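As a minimal sketch of the `place` example, the fragment below separates three things that free text tends to blur: the typed call, the completion signal, and the postcondition check. The names `Surface`, `PlaceCall`, `parse_place`, and `place_postcondition` are illustrative assumptions, and the observation dictionary is hand-written rather than read from a real perception stack.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    # Hypothetical closed vocabulary of valid placement targets.
    TRAY = "tray"
    TABLE = "table"


@dataclass(frozen=True)
class PlaceCall:
    target: Surface


def parse_place(raw: dict) -> PlaceCall:
    # Enum lookup raises ValueError for targets outside the vocabulary.
    return PlaceCall(target=Surface(raw["target"]))


def place_postcondition(call: PlaceCall, observed: dict) -> bool:
    # The postcondition is read from observations, e.g. object_on_tray.
    return observed.get(f"object_on_{call.target.value}") is True


call = parse_place({"target": "tray"})
skill_completed = True                 # assumed completion signal from the skill
observed = {"object_on_tray": False}   # assumed perception result
can_advance = skill_completed and place_postcondition(call, observed)
print(call, can_advance)

try:
    parse_place({"target": "ceiling"})
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Here the skill reports completion but the postcondition is false, so the plan must not advance; the malformed target is rejected before anything is executed.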
The clean mental model is planner, tool, verifier, replan. Every transition should be explicit and typed. If the verifier fails, the planner does not merely continue with reduced confidence; it reasons over a new state that includes the failure evidence.
Worked Example
Code Fragment 1 models a single tool-call decision with verification and replanning. The point is to expose the contract between textual planning and executable interfaces.
# Verify every tool call before advancing the high-level plan.
# Failed postconditions should trigger replanning from the new world state.
# This is the smallest useful embodied tool-use loop.
tool_call = {"tool": "pick", "args": {"target": "red_mug"}}
postcondition = {"grasp_success": False}
next_step = "replan" if not postcondition["grasp_success"] else "continue"
print({"tool": tool_call["tool"], "postcondition": postcondition, "next_step": next_step})
The expected output is a failed postcondition paired with an explicit replanning decision. This is what a healthy embodied agent loop should emit when the API call returned but the world did not enter the intended state, because execution success and state success are not the same event.
Structured tool-calling runtimes and workflow frameworks such as LangGraph can provide much of this control shell. They absorb message passing and state updates, but the engineer still owns typed arguments, verifier design, and failure semantics.
Practical Recipe
- Define each tool with typed arguments, explicit preconditions, and explicit postconditions.
- Run a verifier after every consequential tool call.
- Store tool failures as state updates, not as logging afterthoughts.
- Replan from the updated state rather than repeating the old chain of thought verbatim.
- Keep tool sets small and semantically distinct so tool choice remains learnable and auditable.
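The first three items of the recipe can be sketched together. The fragment is a hedged illustration only: `ToolSpec` and `apply_result` are hypothetical names, and the precondition and postcondition are simple lambdas over a dictionary-valued state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass(frozen=True)
class ToolSpec:
    name: str
    precondition: Callable[[State, dict], bool]
    postcondition: Callable[[State, dict], bool]


def apply_result(state: State, spec: ToolSpec, args: dict, observed: State) -> State:
    """Merge observations into the state and record a failed postcondition
    as part of the state itself, not only as a log line."""
    new_state = {**state, **observed}
    if not spec.postcondition(new_state, args):
        failures = list(state.get("failures", []))
        failures.append({"tool": spec.name, "args": args,
                         "violated": "postcondition"})
        new_state["failures"] = failures
    return new_state


pick_spec = ToolSpec(
    name="pick",
    precondition=lambda s, a: s.get("gripper_empty", False),
    postcondition=lambda s, a: s.get("held") == a["target"],
)

state = {"gripper_empty": True, "held": None}
args = {"target": "sponge"}
precondition_ok = pick_spec.precondition(state, args)

# Assumed observation after the skill returns: nothing is in the gripper.
state = apply_result(state, pick_spec, args, observed={"held": None})
print(precondition_ok, state["failures"])
```

Because the failure lives in `state`, the next planning step sees it without any special-case plumbing.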
Many demos treat tool return values as if they were proof of world change. In robotics that is unsafe: the API may report completion while the gripper is empty, the object slipped, or the robot timed out mid-motion.
A mobile manipulator may call `navigate(goal='sink')`, then `pick(target='sponge')`, then `wipe(region='spill')`. Each tool should expose a verifiable postcondition. If the sponge is not actually grasped, it is meaningless to continue the scripted sequence.
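The sequence above can be written as a plan whose steps each name a postcondition. The following is a minimal sketch with hypothetical postcondition names (`at_sink`, `sponge_grasped`, `spill_clean`) and hand-written observations; it only shows that execution halts at the first violated postcondition.

```python
# Each step: (tool name, typed arguments, postcondition to check afterwards).
PLAN = [
    ("navigate", {"goal": "sink"}, "at_sink"),
    ("pick", {"target": "sponge"}, "sponge_grasped"),
    ("wipe", {"region": "spill"}, "spill_clean"),
]

# Assumed observations after each skill returns; the grasp failed.
OBSERVED = {"at_sink": True, "sponge_grasped": False, "spill_clean": False}


def execute(plan, observed):
    executed = []
    for tool, args, postcondition in plan:
        ok = observed.get(postcondition, False)
        executed.append((tool, ok))
        if not ok:
            # Stop here and hand the violated postcondition to the planner.
            return executed, {"failed_tool": tool, "violated": postcondition}
    return executed, None


executed, failure = execute(PLAN, OBSERVED)
print(executed)
print(failure)
```

The `wipe` step never runs, because continuing without a grasped sponge would only produce false progress.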
The robot's favorite fiction genre is the API that says 'completed successfully' while the mug is still on the table.
The frontier here is better contracts between language models and robot middleware: typed schemas, richer failure reports, stronger verifiers, and planning systems that treat errors as informative state rather than as dead ends.
Can you name one postcondition in your stack that is currently assumed rather than measured, and what kind of false progress that assumption could create?
Tool use in embodied settings differs sharply from tool use in text-only agents. A web-search call either returned a result or did not; a robot-skill call may return while the world remains in the wrong state. That is why postcondition verification is structurally central, not merely best practice.
Replanning also deserves a precise meaning. It is not simply re-prompting the model. It is re-prompting on an updated state that includes fresh observations, failure evidence, elapsed time, and possibly depleted resources or changed safety margins.
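To make that definition concrete, the sketch below assembles an updated state before re-prompting. All field names (`elapsed_s`, `battery_fraction`, `safety_margin_m`) and values are assumptions chosen for illustration; a real system would fill them from its own sensors and monitors.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReplanContext:
    goal: str
    observations: dict        # fresh observations after the failure
    failure_evidence: list    # verifier messages and violated postconditions
    elapsed_s: float          # time spent on the task so far
    battery_fraction: float   # example of a depleted resource
    safety_margin_m: float    # example of a changed safety margin


def render_replan_prompt(ctx: ReplanContext) -> str:
    # The planner is re-prompted on the updated state, not on the old plan.
    return "Replan from this state:\n" + json.dumps(asdict(ctx), indent=2, sort_keys=True)


ctx = ReplanContext(
    goal="wipe the spill",
    observations={"at_sink": True, "sponge_grasped": False},
    failure_evidence=[{"tool": "pick", "violated": "sponge_grasped"}],
    elapsed_s=42.0,
    battery_fraction=0.71,
    safety_margin_m=0.15,
)
prompt = render_replan_prompt(ctx)
print(prompt)
```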
| Tool or Library | Role in the Topic | Builder Advice |
|---|---|---|
| LangGraph | Planner state, tool routing, and retry loops. | Use it when you want explicit graph structure around LLM tool use. |
| ROS 2 actions | Typed skill invocation with feedback and cancelation. | Use actions when tools map to robot skills rather than instant function calls. |
| BehaviorTree.CPP | Fallback and recovery orchestration. | Use it when verifier failures should branch into deterministic recovery logic. |
| MoveIt 2 | Geometric planning behind action APIs. | Use it when high-level tools need reliable motion generation. |
| Pydantic or JSON schema | Argument validation for tool calls. | Use them to reject malformed plans before any execution request leaves the planner. |
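The advice in the last row can be illustrated without third-party dependencies. The fragment below is a deliberately simple stand-in for Pydantic or JSON Schema validation, not a use of either library; `SCHEMAS` and `validate_plan` are hypothetical names, and the schema only checks argument names and Python types.

```python
# Hypothetical tool registry: tool name -> {argument name: expected type}.
SCHEMAS = {
    "navigate": {"goal": str},
    "pick": {"target": str},
    "wipe": {"region": str},
}


def validate_plan(plan):
    """Return a list of human-readable errors; an empty list means the plan
    is well-formed and may be sent for execution."""
    errors = []
    for i, step in enumerate(plan):
        schema = SCHEMAS.get(step.get("tool"))
        if schema is None:
            errors.append(f"step {i}: unknown tool {step.get('tool')!r}")
            continue
        args = step.get("args", {})
        for name, typ in schema.items():
            if name not in args:
                errors.append(f"step {i}: missing argument {name!r}")
            elif not isinstance(args[name], typ):
                errors.append(f"step {i}: argument {name!r} should be {typ.__name__}")
        for name in sorted(args.keys() - schema.keys()):
            errors.append(f"step {i}: unexpected argument {name!r}")
    return errors


plan = [
    {"tool": "navigate", "args": {"goal": "sink"}},
    {"tool": "pick", "args": {"object": "sponge"}},  # wrong argument name
    {"tool": "fly", "args": {}},                     # tool does not exist
]
errors = validate_plan(plan)
for e in errors:
    print(e)
```

A plan with any errors is returned to the planner with the error list instead of being dispatched to the robot.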
Code Fragment 2 preserves the tool call, verifier outcome, and replanning reason in one record. That is the correct unit for comparing agent frameworks because it captures whether failures were caught early enough to matter.
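A record of this kind might look like the following minimal sketch. The `ToolTrace` class, its field names, and every value in the example are illustrative assumptions rather than a fixed schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToolTrace:
    tool: str
    args: dict
    started_at: float                      # seconds since episode start (assumed unit)
    finished_at: float
    verifier_passed: bool
    violated_postcondition: Optional[str]  # None when the verifier passed
    verifier_message: str
    replan_reason: Optional[str]           # None when no replanning was triggered


trace = ToolTrace(
    tool="pick",
    args={"target": "red_mug"},
    started_at=0.0,
    finished_at=2.4,
    verifier_passed=False,
    violated_postcondition="grasp_success",
    verifier_message="gripper closed but no object detected",
    replan_reason="postcondition grasp_success violated",
)
record = asdict(trace)
print(json.dumps(record, indent=2))
```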
- Log each tool call with arguments and timestamps.
- Store the verifier result and the specific violated postcondition.
- Pass the verifier message into the replanning prompt or state graph.
- Track how often replanning fixed the task versus only delaying failure.
- Evaluate tool-use agents on the same tool set and verifier suite when making comparisons.
The expected output is a typed tool trace that exposes not only the failure but also the postcondition evidence that caused replanning. A stronger framework should still produce this kind of local diagnosis; otherwise, apparent planner improvements may just be hiding missing verifier semantics.
If tool-using agents underperform, inspect whether the tool schema is weak, the verifier is weak, or the replanning state update is weak. Those three interfaces are where most embodied LLM loops actually break.
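One way to make this inspection concrete is to tally reviewed episodes by the interface that failed and to measure how often replanning actually recovered the task. The sketch below assumes hand-labeled episode summaries; the `EPISODES` data, the label values, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical episode summaries; failure_interface is assigned during
# manual trace review and is None when nothing failed.
EPISODES = [
    {"replanned": True, "task_succeeded": True, "failure_interface": "verifier"},
    {"replanned": True, "task_succeeded": False, "failure_interface": "state_update"},
    {"replanned": True, "task_succeeded": False, "failure_interface": "state_update"},
    {"replanned": False, "task_succeeded": True, "failure_interface": None},
    {"replanned": True, "task_succeeded": True, "failure_interface": "tool_schema"},
]


def replan_recovery_rate(episodes):
    # Fraction of replanned episodes that still ended in task success.
    replanned = [e for e in episodes if e["replanned"]]
    if not replanned:
        return None
    return sum(e["task_succeeded"] for e in replanned) / len(replanned)


def failures_by_interface(episodes):
    return Counter(e["failure_interface"] for e in episodes if e["failure_interface"])


rate = replan_recovery_rate(EPISODES)
counts = failures_by_interface(EPISODES)
print(rate, dict(counts))
```

A low recovery rate concentrated in one interface suggests where to spend engineering effort first.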
Typed tools, postcondition verifiers, and explicit replanning are the core of practical embodied LLM control loops.
Design one typed action API and one postcondition verifier for a robot skill of your choice. Then describe the replanning information that should be passed back to the LLM if the verifier fails.
LangGraph is a practical reference for explicit stateful tool-use loops around LLM planners.
ROS 2 Documentation. 'Creating an action.'
ROS 2 actions are a canonical typed interface for embodied tools with feedback and cancelation.
BehaviorTree.CPP Documentation. 'Integration with ROS2.'
Behavior trees provide a well-tested execution shell for tool verification and recovery.