Section 59.8: LLM-based household task planner

"I made a perfect household plan for a robot that cannot open that drawer."

A Language Planner Discovering Affordances
Big Picture

The LLM-based household task planner gives Capstone Projects a concrete systems role: it separates language planning from grounding, affordance checks, and controller execution. The section keeps asking what the agent observes, what it remembers or updates, which action changes as a result, and what evidence would convince a skeptical reader.

This section develops the technical contract for the LLM-based household task planner into a usable mental model. First we define the object of study, then we connect it to the agent loop, and finally we test it with a compact implementation.

The key question for an LLM-based household task planner is practical: what must the agent know, what can it observe, what action is available, and what evidence shows that the action worked under the stated conditions?

Action Is The Test

An LLM-based household task planner should be judged by the action it improves. A section claim is strong when it names the decision, the measurement, and the failure mode before a larger model or simulator is introduced.

Theory

For LLM-based household task planner, the practical design rule is to make the interface inspectable before optimization begins: inputs, outputs, units, latency, bounds, and failure labels should all be visible in the saved artifact.

Mechanism

The mechanism in an LLM-based household task planner is the contract between representation and action. Name what enters the module, what leaves it, which assumptions make that transformation valid, and which log would reveal a bad handoff.

Worked Example

For LLM-based household task planner, keep one concrete rollout in view. A sensor reading becomes an estimate, the estimate constrains an action, the action changes the world, and the next observation confirms or contradicts the assumption. The section's idea is useful only if it improves that loop.

Library Shortcut

Use an LLM planner only behind typed tools, symbolic preconditions, and executable checks. The preserved fields are user goal, parsed subgoal, tool call, world-state assertion, failed precondition, revised plan, and completed physical action.
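A minimal sketch of what "typed tools, symbolic preconditions, and executable checks" could look like in practice. The class names `Tool` and `StepRecord`, the function `call_tool`, and the fact names in the example are hypothetical; only the record fields come from the list above. The world state is assumed to be a flat dictionary of symbolic facts, and effects are asserted symbolically rather than confirmed by a controller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Tool:
    """A typed tool the planner may call (hypothetical schema)."""
    name: str
    preconditions: List[str]   # facts that must be True before the call
    effects: Dict[str, bool]   # facts asserted after a successful call


@dataclass
class StepRecord:
    """One log row, using the preserved fields named in the text."""
    user_goal: str
    parsed_subgoal: str
    tool_call: str
    world_state_assertion: Dict[str, bool]
    failed_precondition: Optional[str] = None
    revised_plan: Optional[List[str]] = None
    completed_physical_action: bool = False


def call_tool(tool: Tool, state: Dict[str, bool],
              user_goal: str, subgoal: str) -> StepRecord:
    """Run the executable check first; touch the state only if it passes."""
    for fact in tool.preconditions:
        if not state.get(fact, False):
            # Refuse the call and record which precondition blocked it.
            return StepRecord(user_goal, subgoal, tool.name, dict(state),
                              failed_precondition=fact)
    # Sketch only: a real executor would confirm the effects by observation.
    state.update(tool.effects)
    return StepRecord(user_goal, subgoal, tool.name, dict(state),
                      completed_physical_action=True)


state = {"gripper_free": True, "drawer_unlocked": False}
open_drawer = Tool("open_drawer",
                   ["gripper_free", "drawer_unlocked"],
                   {"drawer_open": True})
record = call_tool(open_drawer, state, "fetch a spoon", "open the cutlery drawer")
print(record.failed_precondition)  # drawer_unlocked
```

The point of the design is that the LLM never mutates the world state directly: a refused call leaves the state untouched and produces a record that a replanner can read.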

Practical Recipe

  1. Write the observation, action, and success metric before choosing a model.
  2. Build a baseline that is simple enough to debug by inspection.
  3. Add the library implementation only after the baseline behavior is understood.
  4. Record failures as structured cases: perception error, state error, planning error, control error, or evaluation error.
  5. Run at least one perturbation test before trusting the result.
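Step 4 of the recipe can be sketched as a small record type. The five labels are the ones listed above; the names `FailureLabel`, `FailureCase`, and `to_json_line`, and the choice of one JSON object per line, are illustrative assumptions rather than a prescribed format.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class FailureLabel(str, Enum):
    """The five failure categories named in the recipe."""
    PERCEPTION = "perception_error"
    STATE = "state_error"
    PLANNING = "planning_error"
    CONTROL = "control_error"
    EVALUATION = "evaluation_error"


@dataclass
class FailureCase:
    """One structured failure (field names are hypothetical)."""
    task_id: str
    step_index: int
    label: FailureLabel
    note: str


def to_json_line(case: FailureCase) -> str:
    """Serialize a failure case as one JSON line for an append-only log."""
    row = asdict(case)
    row["label"] = case.label.value  # store the plain string, not the Enum
    return json.dumps(row, sort_keys=True)


case = FailureCase("tidy_kitchen_03", 2, FailureLabel.PLANNING,
                   "plan assumed the drawer was unlocked")
print(to_json_line(case))
```

Using a closed enumeration means an unlabeled or misspelled failure category raises an error at logging time instead of silently creating a sixth bucket.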

Common Failure Mode

The common mistake with an LLM-based household task planner is to trust a component score before checking the closed-loop interface. The failure usually appears where state, timing, authority, or evaluation context crosses a module boundary.

Practical Example

A team building an LLM-based household task planner starts by writing the task panel, not by picking the largest model. They keep a baseline run, a maintained-tool run, and a perturbation run in the same result folder. The comparison is accepted only when the action trace, metric, and failure labels come from one script.

Memory Hook

When the LLM-based household task planner feels abstract, ask what would be different in the next frame of video, the next robot state, or the next safety margin.

Research Frontier

For LLM-based household task planner, the open research question is not whether a larger policy can produce a better demo. The sharper question is whether the method improves reliability across new scenes, new embodiments, delayed feedback, and rare failures under an evaluation protocol that another lab can reproduce.

Self Check

For LLM-based household task planner, can you name the observation, action, protected assumption, success metric, and one likely failure case? If any field is vague, rewrite the contract before adding model complexity.

Topic-Native Deepening

This project asks the student to separate planning language from embodied execution. That separation is the educational value: the capstone should show exactly which part of the task is solved by language reasoning and which part still depends on grounding, affordances, and controller feedback.

The common mistake is grading only a beautiful text plan. This section instead requires plan validity, affordance consistency, and execution outcome under a fixed household task panel.

Why This Section Matters

LLM-based household task planner becomes teachable once the student can state the operative variables, the decision boundary, and the evidence artifact. The section should therefore be read together with Chapter 31 on language and Chapter 33 on tool use and planning, where the same loop is developed from adjacent angles.

Formal Object

Let a language planner produce subgoals $g_{1:K}$ and let an executor return one success probability $p_k$ per subgoal. If subgoal outcomes are treated as independent, $\Pr(\text{task success})=\prod_{k=1}^{K} p_k$. The project should track this product only after each subgoal has passed a grounding and affordance check; otherwise the multiplication hides impossible steps behind optimistic language.

The multiplicative view is a reminder that one impossible drawer-open action can collapse the whole task. Long plans are therefore fragile unless the project has explicit replanning and affordance validation.
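A short numeric illustration of the multiplicative view, assuming subgoal outcomes are independent. The probabilities are made-up values chosen for the arithmetic, not measurements, and `task_success` is a hypothetical helper name.

```python
import math
from typing import Sequence


def task_success(p: Sequence[float]) -> float:
    """Product of per-subgoal success probabilities (independence assumed)."""
    return math.prod(p)


# Eight subgoals that each succeed 95% of the time.
reliable_plan = [0.95] * 8
print(round(task_success(reliable_plan), 3))  # 0.663

# Same plan, but one step is an impossible drawer-open action.
impossible_step_plan = [0.95] * 7 + [0.0]
print(task_success(impossible_step_plan))  # 0.0
```

Even with every step at 95%, an eight-step plan succeeds only about two times in three, and a single step with $p_k = 0$ drives the whole product to zero regardless of how good the other steps are.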

Algorithm: Turn an LLM plan into an executable capstone
  1. Define a small set of household tasks with object and affordance annotations.
  2. Generate a text plan, then ground each step into robot-executable operators.
  3. Reject or repair steps that violate object availability, reachability, or safety constraints.
  4. Execute or simulate the grounded plan and log replans with reasons.
  5. Grade the project on plan validity, execution success, and explanation quality.
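Steps 2 and 3 of the algorithm can be sketched as follows. This is a toy under stated assumptions: plan steps are already reduced to the form `verb object_id`, the verb-to-affordance table `VERB_TO_AFFORDANCE` is invented for illustration, and every class and function name is hypothetical. Execution, replanning, and grading (steps 4 and 5) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SceneObject:
    object_id: str
    affordances: Set[str]
    reachable: bool = True


@dataclass
class Operator:
    action: str
    object_id: str


# Hypothetical mapping from plan verbs to the affordance each one needs.
VERB_TO_AFFORDANCE = {"open": "openable", "pick": "graspable"}


def ground_step(step: str, scene: Dict[str, SceneObject]
                ) -> Tuple[Optional[Operator], Optional[str]]:
    """Return (operator, None) if the step grounds, else (None, reason)."""
    parts = step.split()
    if len(parts) != 2:
        return None, "unparseable step"
    verb, object_id = parts
    if verb not in VERB_TO_AFFORDANCE:
        return None, f"unknown verb '{verb}'"
    obj = scene.get(object_id)
    if obj is None:
        return None, f"object '{object_id}' not in scene"
    needed = VERB_TO_AFFORDANCE[verb]
    if needed not in obj.affordances:
        return None, f"'{object_id}' is not {needed}"
    if not obj.reachable:
        return None, f"'{object_id}' is not reachable"
    return Operator(verb, object_id), None


def ground_plan(text_plan: List[str], scene: Dict[str, SceneObject]):
    """Split a text plan into grounded operators and rejected steps."""
    grounded, rejected = [], []
    for index, step in enumerate(text_plan):
        operator, reason = ground_step(step, scene)
        if operator is not None:
            grounded.append(operator)
        else:
            rejected.append((index, step, reason))
    return grounded, rejected


scene = {
    "drawer_1": SceneObject("drawer_1", {"openable"}, reachable=False),
    "spoon_1": SceneObject("spoon_1", {"graspable"}),
}
grounded, rejected = ground_plan(
    ["open drawer_1", "pick spoon_1", "open spoon_1"], scene)
for index, step, reason in rejected:
    print(index, step, "->", reason)
```

In this example only `pick spoon_1` survives grounding. The two rejected steps, each with its reason, are exactly the rows a replanner or a grader would need to see.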

Planner Project Evidence

| Dimension | What To Specify | Why It Matters |
| --- | --- | --- |
| Text plan | Ordered subgoals from the LLM | Shows high-level reasoning. |
| Grounded plan | Robot operators with object IDs and affordance checks | Shows whether the plan is executable. |
| Execution trace | Successes, failures, and replans | Reveals how language and embodiment interact. |
| Failure note | At least one impossible or unsafe subgoal | Prevents cherry-picked polished demos. |

The expected output should reveal where the text planner overreached. Invalid subgoals are not embarrassing here; they are the main evidence that grounding checks are doing real work.

Library Shortcut

After the from-scratch contract is clear, the practical route uses OpenAI-style function calling or local LLMs, VoxPoser-style planners, ROS 2 task graphs, scene graphs, and a simulator such as Habitat or AI2-THOR. The payoff is that standard interfaces, logging, batching, and replay support move from ad hoc glue code into maintained infrastructure, while the evidence schema stays the same.

Project Or Teaching Use

Students should submit both the raw language plan and the grounded operator list. The mismatch between them is usually where the intellectual value of the capstone lives.

Research Frontier

A strong extension is mixed-initiative planning where the robot asks a short clarification question only when ambiguity or affordance failure is high. That exposes whether language should drive action directly or act as a negotiation layer.
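One way to make "ask only when ambiguity or affordance failure is high" concrete is a simple threshold rule. Everything in this sketch is an assumption made for illustration: the ambiguity score (one minus the reciprocal of the number of matching objects), the 0.5 threshold, and the function names. A real project would estimate both quantities from its own scene graph and executor.

```python
from typing import List, Tuple


def referent_ambiguity(candidates: List[str]) -> float:
    """0.0 for a unique match, approaching 1.0 as more objects match."""
    if not candidates:
        return 1.0  # nothing matches: maximally unclear
    return 1.0 - 1.0 / len(candidates)


def next_move(subgoal: str, candidates: List[str],
              p_affordance_failure: float,
              threshold: float = 0.5) -> Tuple[str, str]:
    """Return ("ask", question) or ("act", object_id)."""
    risk = max(referent_ambiguity(candidates), p_affordance_failure)
    if not candidates or risk >= threshold:
        question = (f"Before I {subgoal}: which object do you mean, "
                    f"and is it usable? Candidates: {candidates}")
        return "ask", question
    return "act", candidates[0]


print(next_move("open the drawer", ["drawer_1"], 0.1))
print(next_move("open the drawer", ["drawer_1", "drawer_2", "drawer_3"], 0.1)[0])
```

Logging which branch fired, together with the two scores, gives the project a direct measure of how often language drove action and how often it acted as a negotiation layer.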

Expected Output Interpretation

For household planning, the artifact should separate language reasoning errors from missing world state, impossible preconditions, tool failures, and execution failures.
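A small sketch of how an execution trace could be summarized so that the five error sources stay separate. The category strings mirror the sentence above. The trace format (a list of dictionaries with an optional `"error"` key) and the function name `summarize_errors` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

ERROR_CATEGORIES = (
    "language_reasoning",
    "missing_world_state",
    "impossible_precondition",
    "tool_failure",
    "execution_failure",
)


def summarize_errors(trace: List[Dict[str, Optional[str]]]) -> Dict[str, int]:
    """Count errors per category; reject labels outside the fixed set."""
    counts: Counter = Counter()
    for step in trace:
        error = step.get("error")
        if error is None:
            continue  # successful step
        if error not in ERROR_CATEGORIES:
            raise ValueError(f"unknown error category: {error!r}")
        counts[error] += 1
    # Report every category, including zeros, so gaps are visible.
    return {category: counts.get(category, 0) for category in ERROR_CATEGORIES}


trace = [
    {"step": "open drawer_1", "error": "impossible_precondition"},
    {"step": "pick spoon_1", "error": None},
    {"step": "place spoon_1 on table_1", "error": "execution_failure"},
    {"step": "open drawer_2", "error": "impossible_precondition"},
]
print(summarize_errors(trace))
```

Reporting zero counts explicitly matters here: an artifact that shows no language reasoning errors is making a claim, and the reader should be able to see that the category was tracked rather than omitted.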

Key Takeaway
Exercise 59.8.1

Design a method-matched experiment for LLM-based household task planner. Specify the environment, observation schema, action interface, metric, and one perturbation that targets the section's core assumption.

Section References

Savva, M. et al. Habitat: A Platform for Embodied AI Research. ICCV, 2019.

Use for simulated navigation projects, reproducible scene tasks, and embodied evaluation loops.

Cadene, R. et al. LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch. GitHub project and technical documentation, 2024.

Use for dataset conversion, policy training, and capstone projects built around open robot-learning workflows.

What's Next?

Next, continue with Section 59.9. Carry forward the artifact contract from the LLM-based household task planner, but change exactly one design axis before comparing results: embodiment, action interface, evaluation panel, or safety risk.