Section 33.6: Tool use, action APIs, plan verification, replanning | Building Embodied AI: From Perception to Autonomous Action

A Careful Control Loop

Read the figure as an action-API audit. Tool use is safe only when every callable action has typed inputs, preconditions, postconditions, timeout behavior, and a replanning path after failure.

Figure 33.6: A closed-loop map for Tool use, action APIs, plan verification, replanning. The diagram forces the reader to name the input, model boundary, action interface, and evidence record before trusting the system.

Build And Evaluation Checklist

Scope. This section explains how an LLM chooses among tools, how typed action APIs constrain execution, and how verifier failures trigger replanning rather than silent drift.

Production and evaluation contract. The minimum artifact records the selected tool, its arguments, the verifier output, the replanning trigger, and the new plan. Without these fields, tool-use claims are impossible to audit.
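Below is a minimal Python sketch of that artifact. The field names are hypothetical and mirror the five items above. The `audit_gaps` helper requires the replanning trigger and new plan only when the verifier reports a failure; that rule is an assumption, not a standard.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolUseRecord:
    # Hypothetical field names mirroring the minimum artifact described above.
    selected_tool: Optional[str]
    arguments: Optional[dict[str, Any]]
    verifier_output: Optional[dict[str, Any]]   # e.g. {"passed": False, "violated": "..."}
    replanning_trigger: Optional[str] = None    # required only if the verifier failed
    new_plan: Optional[list[str]] = None        # required only if the verifier failed


def audit_gaps(record: ToolUseRecord) -> list[str]:
    """Return the names of fields that make this record unauditable."""
    gaps = []
    if not record.selected_tool:
        gaps.append("selected_tool")
    if record.arguments is None:
        gaps.append("arguments")
    if record.verifier_output is None:
        gaps.append("verifier_output")
    elif not record.verifier_output.get("passed", False):
        # A failed check must say what triggered replanning and what replaced the plan.
        if not record.replanning_trigger:
            gaps.append("replanning_trigger")
        if not record.new_plan:
            gaps.append("new_plan")
    return gaps


ok = ToolUseRecord(
    "pick", {"target": "red_mug"},
    {"passed": False, "violated": "grasp_success"},
    replanning_trigger="grasp_success=False",
    new_plan=["pick(target='red_mug', grasp='side')"],
)
bad = ToolUseRecord("pick", {"target": "red_mug"}, {"passed": False})
print(audit_gaps(ok), audit_gaps(bad))  # [] ['replanning_trigger', 'new_plan']
```

Under this rule, a record whose verifier failed but which lacks a trigger or a new plan counts as unauditable, even though the call itself was logged.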

Checklist Memory Anchor

For tool use, action APIs, plan verification, and replanning, name the language interface, grounded world state, executable action contract, and evidence artifact before trusting any claimed improvement.

Mini Audit Exercise

For tool use, action APIs, plan verification, and replanning, write one evidence row that records the instruction, world-state estimate, chosen action, verifier result, and failure label. Then identify which field would change first under a command misunderstanding.

Big Picture

Tool use and replanning are where embodied LLM systems stop being chatbots with access to robots and become control architectures. Typed APIs and verifiers make the difference.

This section connects LLM planning to the practical machinery of action APIs, typed arguments, postconditions, and replanning under failure.

The practical question is not only which tool to call, but what evidence should force the planner to abandon the current plan and synthesize a new one.

Action Is The Test

A tool call without a verifier is just a wish. Replanning starts when the verifier says the wish did not come true.

Theory

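Let $u_1, \dots, u_K$ be typed tools, each paired with a verifier $v_k(y, s')$ that checks the tool's postcondition from its return value $y$ and the resulting world state $s'$. Writing $u_t$ for the tool the planner selects at step $t$ and $v_t$ for its verifier, an embodied planner conditioned on the history $h_t$ executes the loop $$u_t, \theta_t = \phi_\text{LLM}(h_t), \qquad y_t = u_t(\theta_t), \qquad b_t = v_t(y_t, s_{t+1}),$$ where $\theta_t$ are the typed arguments. It continues only if the boolean or scalar verifier signal $b_t$ passes a threshold; otherwise the failure evidence is added to $h_{t+1}$ and the planner replans.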

This structure matters because embodied actions are long running and failure prone. A free-text chain of thought may say 'now place the mug on the tray,' but the actual robot needs a typed `place(target='tray')` call, a completion signal, and a postcondition check such as `object_on_tray=True` before the next reasoning step can be trusted.
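The gap between a returned call and a verified state fits in a few lines. This is a toy simulation. The `PlaceArgs` type, the `slipped` flag, and the dictionary world model are hypothetical stand-ins for a real skill server and perception stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceArgs:
    target: str  # typed argument, e.g. place(target='tray')


@dataclass
class WorldState:
    # Hypothetical perception estimate: object name -> surface it rests on.
    object_on: dict[str, str]


def place(args: PlaceArgs, world: WorldState, held: str, slipped: bool) -> str:
    """Simulated skill that reports completion even if the object slipped."""
    if not slipped:
        world.object_on[held] = args.target
    return "completed"


def object_on_tray(world: WorldState, obj: str) -> bool:
    # Postcondition measured from the world estimate, not from the return value.
    return world.object_on.get(obj) == "tray"


world = WorldState({"mug": "table"})
status = place(PlaceArgs(target="tray"), world, held="mug", slipped=True)
print(status, object_on_tray(world, "mug"))  # completed False
```

The call returns `"completed"` in both cases. Only the postcondition distinguishes a placed mug from one still on the table.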

Mechanism

The clean mental model is planner, tool, verifier, replan. Every transition should be explicit and typed. If the verifier fails, the planner does not merely continue with reduced confidence; it reasons over a new state that includes the failure evidence.
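The planner, tool, verifier, replan cycle from the loop in the Theory paragraph can be sketched as follows. The scripted `planner` is a hypothetical stand-in for $\phi_\text{LLM}$. The simulated `pick` skill and the 0.5 threshold are assumptions. The `history` list plays the role of $h_t$.

```python
from typing import Any, Callable

World = dict[str, Any]
Tool = Callable[[dict, World], dict]          # (theta_t, world) -> y_t
Verifier = Callable[[dict, World], float]     # (y_t, s_{t+1}) -> b_t


def pick(args: dict, world: World) -> dict:
    # Simulated skill: top grasps fail on this mug, side grasps succeed.
    world["holding"] = args["target"] if args["grasp"] == "side" else None
    return {"status": "completed"}  # returns normally either way


def verify_pick(y: dict, world: World) -> float:
    # Postcondition checked on the resulting state, ignoring y["status"].
    return 1.0 if world["holding"] is not None else 0.0


TOOLS: dict[str, tuple[Tool, Verifier]] = {"pick": (pick, verify_pick)}


def planner(history: list[dict]) -> tuple[str, dict]:
    # Hypothetical stand-in for phi_LLM(h_t): scripted, but it reads failure evidence.
    failed_before = any(not step["passed"] for step in history)
    return "pick", {"target": "red_mug", "grasp": "side" if failed_before else "top"}


def run(max_steps: int = 3, threshold: float = 0.5) -> tuple[str, list[dict]]:
    world: World = {"holding": None}
    history: list[dict] = []
    for _ in range(max_steps):
        name, theta = planner(history)       # u_t, theta_t = phi(h_t)
        tool, verifier = TOOLS[name]
        y = tool(theta, world)               # y_t = u_t(theta_t)
        b = verifier(y, world)               # b_t = v_t(y_t, s_{t+1})
        passed = b >= threshold
        history.append({"tool": name, "args": theta, "y": y, "b": b, "passed": passed})
        if passed:
            return "done", history
        # Failure evidence is now part of h_{t+1}, so the next planner call replans.
    return "gave_up", history


outcome, trace = run()
print(outcome, [step["args"]["grasp"] for step in trace])  # done ['top', 'side']
```

The first call returns `"completed"` but scores $b_t = 0$. The planner then sees that failure in its history and switches grasps instead of continuing with reduced confidence.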

Worked Example

Code Fragment 1 models a single tool-call decision with verification and replanning. The point is to expose the contract between textual planning and executable interfaces.

```python
# Verify every tool call before advancing the high-level plan.
# Failed postconditions should trigger replanning from the new world state.
# This is the smallest useful embodied tool-use loop.
tool_call = {"tool": "pick", "args": {"target": "red_mug"}}
postcondition = {"grasp_success": False}

next_step = "continue" if postcondition["grasp_success"] else "replan"
print({"tool": tool_call["tool"], "postcondition": postcondition, "next_step": next_step})
```

Output:

```
{'tool': 'pick', 'postcondition': {'grasp_success': False}, 'next_step': 'replan'}
```

The expected output is a failed postcondition paired with an explicit replanning decision. This is what a healthy embodied agent loop should emit when the API call returned but the world did not enter the intended state, because execution success and state success are not the same event.

Code Fragment 1: This loop makes the replanning trigger explicit. The planner does not advance just because the `pick` tool returned; it advances only if the postcondition confirms the intended world change actually happened.

Library Shortcut

Structured tool-calling runtimes and workflow frameworks like LangGraph implement most of this control shell in a few lines. They absorb message passing and state updates, but the engineer still owns typed arguments, verifier design, and failure semantics.

Practical Recipe

Define each tool with typed arguments, explicit preconditions, and explicit postconditions.
Run a verifier after every consequential tool call.
Store tool failures as state updates, not as logging afterthoughts.
Replan from the updated state rather than repeating the old chain of thought verbatim.
Keep tool sets small and semantically distinct so tool choice remains learnable and auditable.
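The first three steps of the recipe can be sketched with a small tool-spec type. The `ToolSpec` fields, the dictionary world state, and the `failures` list are hypothetical choices for illustration. They are not drawn from any particular framework.

```python
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]


@dataclass(frozen=True)
class PickArgs:
    target: str


@dataclass
class ToolSpec:
    name: str
    args_type: type
    precondition: Callable[[State, Any], bool]
    run: Callable[[State, Any], None]
    postcondition: Callable[[State, Any], bool]


def execute(spec: ToolSpec, args: Any, state: State) -> bool:
    """Run one tool; record any failure in the state itself, not just in a log."""
    stage = None
    if not isinstance(args, spec.args_type):
        stage = "type"
    elif not spec.precondition(state, args):
        stage = "precondition"
    else:
        spec.run(state, args)
        if not spec.postcondition(state, args):
            stage = "postcondition"
    if stage is not None:
        state["failures"].append({"tool": spec.name, "stage": stage, "args": args})
    return stage is None


def simulated_pick(state: State, a: PickArgs) -> None:
    # Hypothetical skill: objects listed as slippery are never held.
    state["holding"] = None if a.target in state["slippery"] else a.target


pick = ToolSpec(
    name="pick",
    args_type=PickArgs,
    precondition=lambda s, a: s["holding"] is None and a.target in s["visible"],
    run=simulated_pick,
    postcondition=lambda s, a: s["holding"] == a.target,
)

state: State = {"holding": None, "visible": {"sponge", "soap"},
                "slippery": {"soap"}, "failures": []}
results = [execute(pick, PickArgs("cup"), state),     # not visible
           execute(pick, PickArgs("soap"), state),    # slips
           execute(pick, PickArgs("sponge"), state)]  # succeeds
print(results, [f["stage"] for f in state["failures"]])
# [False, False, True] ['precondition', 'postcondition']
```

Because each failure lands in `state["failures"]`, a replanning step can read it directly instead of parsing logs.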

Common Failure Mode

Many demos treat tool return values as if they were proof of world change. In robotics that is unsafe: the API may report completion while the gripper is empty, the object slipped, or the robot timed out mid-motion.
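One way to stop the API's word from counting as proof is to label each `pick` outcome from measured signals only. The report fields and the thresholds (`timeout_s`, `empty_gap_mm`) below are hypothetical. A real stack would derive them from its own gripper and perception interfaces.

```python
from dataclasses import dataclass


@dataclass
class PickReport:
    api_status: str         # what the skill server claimed, e.g. "completed"
    elapsed_s: float        # wall-clock duration of the call
    finger_gap_mm: float    # measured finger opening after the call
    in_hand_at_lift: bool   # object detected between the fingers when lifting began
    in_hand_now: bool       # object detected between the fingers now


def label_pick(r: PickReport, timeout_s: float = 10.0, empty_gap_mm: float = 2.0) -> str:
    # api_status is deliberately never consulted: only measurements decide the label.
    if r.elapsed_s > timeout_s:
        return "timeout"
    if r.in_hand_now and r.finger_gap_mm > empty_gap_mm:
        return "success"
    if r.in_hand_at_lift:
        return "slipped"
    return "empty_gripper"


reports = [
    PickReport("completed", 3.0, 25.0, True, True),
    PickReport("completed", 3.1, 0.5, False, False),
    PickReport("completed", 4.0, 0.8, True, False),
    PickReport("completed", 12.0, 30.0, True, True),
]
print([label_pick(r) for r in reports])
# ['success', 'empty_gripper', 'slipped', 'timeout']
```

All four reports say `"completed"`, yet only one describes a successful grasp.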

Practical Example

A mobile manipulator may call `navigate(goal='sink')`, then `pick(target='sponge')`, then `wipe(region='spill')`. Each tool should expose a verifiable postcondition. If the sponge is not actually grasped, it is meaningless to continue the scripted sequence.
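The scripted sequence with per-step postconditions can be sketched as below. The skill functions, the `slippery` flag, and the world dictionary are hypothetical simulations. The point is only that `wipe` never runs once `pick` fails its check.

```python
from typing import Any, Callable

World = dict[str, Any]


def navigate(world: World, goal: str) -> None:
    world["base_at"] = goal


def pick(world: World, target: str) -> None:
    # Simulated failure: a target marked slippery is never actually held.
    world["held"] = None if world.get("slippery") == target else target


def wipe(world: World, region: str) -> None:
    world["wiped"] = region


# (tool name, skill, typed arguments, postcondition on the resulting world)
PLAN: list[tuple[str, Callable[..., None], dict, Callable[[World], bool]]] = [
    ("navigate", navigate, {"goal": "sink"}, lambda w: w["base_at"] == "sink"),
    ("pick", pick, {"target": "sponge"}, lambda w: w["held"] == "sponge"),
    ("wipe", wipe, {"region": "spill"}, lambda w: w["wiped"] == "spill"),
]


def run_plan(world: World) -> dict:
    for name, skill, args, postcondition in PLAN:
        skill(world, **args)
        if not postcondition(world):
            # Stop the script here and hand this evidence to the planner.
            return {"failed_step": name, "args": args}
    return {"failed_step": None}


world = {"base_at": "dock", "held": None, "wiped": None, "slippery": "sponge"}
print(run_plan(world), world["wiped"])
# {'failed_step': 'pick', 'args': {'target': 'sponge'}} None
```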

Memory Hook

The robot's favorite fiction genre is the API that says 'completed successfully' while the mug is still on the table.

Research Frontier

The frontier here is better contracts between language models and robot middleware: typed schemas, richer failure reports, stronger verifiers, and planning systems that treat errors as informative state rather than as dead ends.

Self Check

Can you name one postcondition in your stack that is currently assumed rather than measured, and what kind of false progress that assumption could create?

Tool use in embodied settings differs sharply from tool use in text-only agents. A web-search call either returned a result or did not; a robot-skill call may return while the world remains in the wrong state. That is why postcondition verification is structurally central, not merely best practice.

Replanning also deserves a precise meaning. It is not simply re-prompting the model. It is re-prompting on an updated state that includes fresh observations, failure evidence, elapsed time, and possibly depleted resources or changed safety margins.
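Below is a minimal sketch of such a state update. The field names for the resources and safety margins mentioned above are hypothetical. The prompt is rebuilt from the updated state each time instead of being appended to the previous chain of thought.

```python
import json
from dataclasses import dataclass


@dataclass
class ReplanState:
    goal: str
    observations: dict      # fresh perception, captured after the failure
    failure: dict           # which tool failed and which postcondition was violated
    elapsed_s: float
    time_budget_s: float
    battery_pct: float      # hypothetical resource field
    min_clearance_m: float  # hypothetical safety-margin field


def replan_prompt(s: ReplanState) -> str:
    # Everything the planner sees comes from the updated state, not the old plan text.
    payload = {
        "goal": s.goal,
        "fresh_observations": s.observations,
        "last_failure": s.failure,
        "time_remaining_s": round(s.time_budget_s - s.elapsed_s, 1),
        "battery_pct": s.battery_pct,
        "min_clearance_m": s.min_clearance_m,
    }
    return "Replan from this state:\n" + json.dumps(payload, sort_keys=True, indent=2)


s = ReplanState(goal="wipe the spill",
                observations={"sponge": "on_floor_near_sink"},
                failure={"tool": "pick", "violated": "grasp_success"},
                elapsed_s=42.0, time_budget_s=120.0,
                battery_pct=61.0, min_clearance_m=0.15)
prompt = replan_prompt(s)
print(prompt)
```

Reporting remaining time rather than elapsed time is one design choice among several. It lets the planner trade off a slower recovery against the deadline without doing arithmetic itself.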

Tool Choices For Typed Embodied Action Loops

Tool or Library	Role in the Topic	Builder Advice
LangGraph	Planner state, tool routing, and retry loops.	Use it when you want explicit graph structure around LLM tool use.
ROS 2 actions	Typed skill invocation with feedback and cancelation.	Use actions when tools map to robot skills rather than instant function calls.
BehaviorTree.CPP	Fallback and recovery orchestration.	Use it when verifier failures should branch into deterministic recovery logic.
MoveIt 2	Geometric planning behind action APIs.	Use it when high-level tools need reliable motion generation.
Pydantic or JSON schema	Argument validation for tool calls.	Use them to reject malformed plans before any execution request leaves the planner.
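The Pydantic or JSON Schema row amounts to checking every call in a proposed plan before any part of it is dispatched. The sketch below is a standard-library stand-in for that idea. The `SCHEMA` dictionary format and the tool signatures are hypothetical; they are not the syntax of either library.

```python
from typing import Any

# Hypothetical schema format: tool name -> {argument name: required Python type}.
SCHEMA: dict[str, dict[str, type]] = {
    "navigate": {"goal": str},
    "pick": {"target": str},
    "wipe": {"region": str, "passes": int},
}


def validate_plan(plan: list[dict[str, Any]]) -> list[str]:
    """Return every error found; an empty list means the plan may be dispatched."""
    errors: list[str] = []
    for i, call in enumerate(plan):
        name, args = call.get("tool"), call.get("args", {})
        if name not in SCHEMA:
            errors.append(f"step {i}: unknown tool {name!r}")
            continue
        expected = SCHEMA[name]
        for key in sorted(expected.keys() - args.keys()):
            errors.append(f"step {i}: {name} missing argument {key!r}")
        for key in sorted(args.keys() - expected.keys()):
            errors.append(f"step {i}: {name} unexpected argument {key!r}")
        for key, typ in expected.items():
            # Exact type match, so that True is not accepted where an int is required.
            if key in args and type(args[key]) is not typ:
                errors.append(f"step {i}: {name}.{key} should be {typ.__name__}")
    return errors


plan = [
    {"tool": "navigate", "args": {"goal": "sink"}},
    {"tool": "pick", "args": {"target": 7}},
    {"tool": "wipe", "args": {"region": "spill"}},
    {"tool": "teleport", "args": {}},
]
for error in validate_plan(plan):
    print(error)
```

The validator collects every error rather than stopping at the first. That gives the planner a complete correction list in one round trip; it is a design choice, not a requirement.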

Code Fragment 2 preserves the tool call, verifier outcome, and replanning reason in one record. That is the correct unit for comparing agent frameworks because it captures whether failures were caught early enough to matter.

Log each tool call with arguments and timestamps.
Store the verifier result and the specific violated postcondition.
Pass the verifier message into the replanning prompt or state graph.
Track how often replanning fixed the task versus only delaying failure.
Evaluate tool-use agents on the same tool set and verifier suite when making comparisons.
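Below is a minimal sketch of a trace format that follows the checklist, plus a summary of whether replanning fixed the task or only delayed failure. The field names and toy episodes are illustrative assumptions. So is the rule that any failed verifier check counts as a replanning event.

```python
from collections import Counter


def step(t, tool, args, passed, violated=None, msg=""):
    # One logged tool call: timestamp, arguments, verifier result, violated postcondition.
    return {"t": t, "tool": tool, "args": args, "passed": passed,
            "violated": violated, "verifier_msg": msg}


def replan_summary(episodes: list[dict]) -> dict:
    replanned = [e for e in episodes if any(not s["passed"] for s in e["steps"])]
    fixed = sum(e["task_success"] for e in replanned)
    violated = Counter(s["violated"] for e in replanned
                       for s in e["steps"] if not s["passed"])
    return {"episodes_with_replanning": len(replanned),
            "fixed_by_replanning": fixed,
            "delayed_failure": len(replanned) - fixed,
            "violated_postconditions": dict(violated)}


# Toy episodes for illustration only, not measured data.
episodes = [
    {"task_success": True, "steps": [
        step(0.0, "pick", {"target": "mug"}, False, "grasp_success", "fingers closed on air"),
        step(4.2, "pick", {"target": "mug", "grasp": "side"}, True)]},
    {"task_success": False, "steps": [
        step(0.0, "pick", {"target": "mug"}, True),
        step(3.5, "place", {"target": "tray"}, False, "object_on_tray", "mug on table"),
        step(7.9, "place", {"target": "tray"}, False, "object_on_tray", "mug on table")]},
    {"task_success": True, "steps": [step(0.0, "pick", {"target": "mug"}, True)]},
]
print(replan_summary(episodes))
```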

The expected output is a typed tool trace that exposes not only the failure but also the postcondition evidence that caused replanning. A stronger framework should still produce this kind of local diagnosis; otherwise, apparent planner improvements may simply hide missing verifier semantics.

Code Fragment 2: This record keeps the replanning trigger concrete by naming the failed postcondition and its explanation. That makes later comparisons between agent frameworks much more meaningful than a bare task-failure label would be.

If tool-using agents underperform, inspect whether the tool schema is weak, the verifier is weak, or the replanning state update is weak. Those three interfaces are where most embodied LLM loops actually break.
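One rough way to route each failure to one of those three interfaces is sketched below. The heuristic and its signals (`runtime_arg_error`, `ground_truth_ok`) are hypothetical. In particular, `ground_truth_ok` assumes an offline label or a slower secondary check is available.

```python
from typing import Optional


def triage(step: dict, next_step: Optional[dict]) -> Optional[str]:
    # Schema: a malformed call got past validation and failed at execution time.
    if step.get("runtime_arg_error"):
        return "schema"
    # Verifier: the check passed, but an independent label says the world was wrong.
    if step["verifier_passed"] and step.get("ground_truth_ok") is False:
        return "verifier"
    # Replanning state: after a failure, the planner reissued the identical call.
    if (not step["verifier_passed"] and next_step is not None
            and (next_step["tool"], next_step["args"]) == (step["tool"], step["args"])):
        return "replan_state"
    return None


trace = [
    {"tool": "pick", "args": {"target": "mug"}, "verifier_passed": False},
    {"tool": "pick", "args": {"target": "mug"}, "verifier_passed": False},
    {"tool": "pick", "args": {"target": "mug", "grasp": "side"},
     "verifier_passed": True, "ground_truth_ok": False},
    {"tool": "wipe", "args": {"region": 5}, "verifier_passed": False,
     "runtime_arg_error": True},
]
labels = [triage(s, trace[i + 1] if i + 1 < len(trace) else None)
          for i, s in enumerate(trace)]
print(labels)  # ['replan_state', None, 'verifier', 'schema']
```

An identical call reissued after a failure is a cheap signal that the failure evidence never reached the planner's state.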

Key Takeaway

Typed tools, postcondition verifiers, and explicit replanning are the core of practical embodied LLM control loops.

Exercise 33.6.1

Design one typed action API and one postcondition verifier for a robot skill of your choice. Then describe the replanning information that should be passed back to the LLM if the verifier fails.

Bibliography and Further Reading

Primary Sources and Tools

LangGraph Documentation.

LangGraph is a practical reference for explicit stateful tool-use loops around LLM planners.


ROS 2 Documentation. 'Creating an action.'

ROS 2 actions are a canonical typed interface for embodied tools with feedback and cancelation.


BehaviorTree.CPP Documentation. 'Integration with ROS2.'

Behavior trees provide a well-tested execution shell for tool verification and recovery.
