Section 59.11: Open-ended research project

"My research question became real when the baseline started winning."

An Open Project With A Fair Test
Figure 59.11A: Open-ended research project scaffold: a flowchart from literature review to hypothesis formulation, experimental design (simulator choice, baseline, ablation plan), results section structure, and evaluation criteria the instructor uses to assess scientific rigor.
Big Picture

The open-ended research project gives the Capstone Projects chapter a concrete systems role: turning a research question into a falsifiable capstone with a baseline and a failure taxonomy. The section keeps asking four questions: what the agent observes, what it remembers or updates, which action changes as a result, and what evidence would convince a skeptical reader.

This section develops the technical contract for the open-ended research project into a usable mental model. First we define the object of study, then we connect it to the agent loop, then we test it with a compact implementation.

The key question in an open-ended research project is practical: what must the agent know, what can it observe, what action is available, and what evidence shows that the action worked under the stated conditions?

Action Is The Test

An open-ended research project should be judged by the action it improves. A claim is strong when it names the decision, the measurement, and the failure mode before a larger model or simulator is introduced.

Theory

For an open-ended research project, the practical design rule is to make the interface inspectable before optimization begins: inputs, outputs, units, latency, bounds, and failure labels should all be visible in the saved artifact.
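One way to make that rule concrete is to write the interface down as data before any model exists. The sketch below is illustrative only: `SignalSpec`, `InterfaceSpec`, the range and velocity signals, and the 50 ms latency budget are hypothetical names and values chosen for the example, not part of any library or of the book's code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SignalSpec:
    """One named input or output with its unit and valid range (hypothetical schema)."""
    name: str
    unit: str
    low: float
    high: float

    def check(self, value: float) -> Optional[str]:
        """Return a failure label when the value leaves its bounds, else None."""
        if value < self.low or value > self.high:
            return f"{self.name}_out_of_bounds"
        return None


@dataclass(frozen=True)
class InterfaceSpec:
    """Inputs, outputs, and latency budget for one module."""
    inputs: tuple
    outputs: tuple
    max_latency_ms: float

    def to_json(self) -> str:
        # Serialize the whole contract so it can be saved next to the results.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only: a range sensor feeding a velocity command.
spec = InterfaceSpec(
    inputs=(SignalSpec("range", "m", 0.0, 10.0),),
    outputs=(SignalSpec("forward_velocity", "m/s", -0.5, 0.5),),
    max_latency_ms=50.0,
)
print(spec.to_json())
print(spec.outputs[0].check(0.8))  # an out-of-range command yields a failure label
```

Saving the JSON alongside each run keeps units and bounds reviewable without opening the model code.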

Mechanism

The mechanism in an open-ended research project is the contract between representation and action. Name what enters the module, what leaves it, which assumptions make that transformation valid, and which log would reveal a bad handoff.

Worked Example

For an open-ended research project, keep one concrete rollout in view. A sensor reading becomes an estimate, the estimate constrains an action, the action changes the world, and the next observation confirms or contradicts the assumption. A project idea is useful only if it improves that loop.
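The rollout described above can be sketched on a toy one-dimensional task. Everything here is an assumption made for illustration: the goal at 1.0, the smoothing filter, the proportional gain, and the noise level are arbitrary choices, and the `residual` field is one possible way to log whether the next observation agrees with what the estimate predicted.

```python
import random


def rollout(steps: int = 30, gain: float = 0.5, noise: float = 0.01, seed: int = 0):
    """Run one closed-loop episode on a toy 1-D task and log every step."""
    rng = random.Random(seed)
    goal, position, estimate = 1.0, 0.0, 0.0
    log = []
    for t in range(steps):
        reading = position + rng.gauss(0.0, noise)   # sensor reading
        estimate = 0.5 * estimate + 0.5 * reading    # reading becomes an estimate
        action = gain * (goal - estimate)            # estimate constrains the action
        predicted = estimate + action                # assumes the estimate is the true position
        position += action                           # action changes the world
        next_reading = position + rng.gauss(0.0, noise)
        residual = next_reading - predicted          # next observation checks the assumption
        log.append({"t": t, "estimate": estimate, "action": action, "residual": residual})
    return position, log


final_position, log = rollout()
worst = max(abs(step["residual"]) for step in log)
print(f"final position: {final_position:.3f}, largest residual: {worst:.3f}")
```

In this toy the residual equals the gap between true position and estimate plus sensor noise, so a large early residual exposes the lag of the smoothing filter. That is the kind of evidence the loop is meant to surface.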

Library Shortcut

Use the stack that matches the chosen research question, but require a typed experiment registry before adding models. The preserved fields are hypothesis, embodiment, baseline, intervention, metric, artifact path, and failure taxonomy.
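A minimal sketch of such a registry, using only the standard library, might look like the following. The field names come from the list above; the class names `ExperimentRecord` and `ExperimentRegistry`, the run identifier, and the example values (embodiment, metric, artifact path) are hypothetical placeholders.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentRecord:
    """One registered experiment; every field must be filled in."""
    hypothesis: str
    embodiment: str
    baseline: str
    intervention: str
    metric: str
    artifact_path: str
    failure_taxonomy: tuple

    def __post_init__(self):
        # Reject records with any empty field so gaps surface at registration time.
        for f in fields(self):
            if not getattr(self, f.name):
                raise ValueError(f"{f.name} must not be empty")


class ExperimentRegistry:
    """In-memory registry keyed by run id (a sketch, not a persistence layer)."""

    def __init__(self):
        self._records = {}

    def register(self, run_id: str, record: ExperimentRecord) -> None:
        if run_id in self._records:
            raise KeyError(f"run id already registered: {run_id}")
        self._records[run_id] = record

    def get(self, run_id: str) -> ExperimentRecord:
        return self._records[run_id]


registry = ExperimentRegistry()
registry.register(
    "run_001",  # hypothetical run id
    ExperimentRecord(
        hypothesis="Retrieval-augmented policy memory reduces long-horizon kitchen failures",
        embodiment="simulated mobile manipulator",  # placeholder
        baseline="same policy without retrieval memory",
        intervention="add retrieval memory",
        metric="task success rate",  # placeholder
        artifact_path="results/run_001",  # placeholder
        failure_taxonomy=("perception", "state", "planning", "control", "evaluation"),
    ),
)
print(registry.get("run_001").baseline)
```

Because the record is frozen and validated on construction, a run cannot be registered with a missing baseline or an empty failure taxonomy.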

Practical Recipe

  1. Write the observation, action, and success metric before choosing a model.
  2. Build a baseline that is simple enough to debug by inspection.
  3. Add the library implementation only after the baseline behavior is understood.
  4. Record failures as structured cases: perception error, state error, planning error, control error, or evaluation error.
  5. Run at least one perturbation test before trusting the result.
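Step 4 of the recipe can be sketched as a small typed record. The five categories are taken from the list above; the `FailureKind` and `FailureCase` names, the episode numbers, and the notes are invented for illustration and do not report real results.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    """The five failure categories named in the recipe."""
    PERCEPTION = "perception error"
    STATE = "state error"
    PLANNING = "planning error"
    CONTROL = "control error"
    EVALUATION = "evaluation error"


@dataclass(frozen=True)
class FailureCase:
    """One structured failure: where it happened, what kind, under which perturbation."""
    episode: int
    kind: FailureKind
    perturbation: str
    note: str


def tally(cases):
    """Count failures per category."""
    return Counter(case.kind for case in cases)


# Made-up cases for illustration only.
cases = [
    FailureCase(3, FailureKind.PERCEPTION, "none", "cup missed under glare"),
    FailureCase(7, FailureKind.CONTROL, "delayed observations", "overshoot at grasp"),
    FailureCase(9, FailureKind.CONTROL, "delayed observations", "oscillation near goal"),
]
for kind, count in tally(cases).most_common():
    print(f"{kind.value}: {count}")
```

Using an enum rather than free text keeps the labels comparable across runs, which matters once the baseline and the perturbation run are tallied side by side.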
Common Failure Mode

The common mistake in an open-ended research project is to trust a component score before checking the closed-loop interface. The failure usually appears where state, timing, authority, or evaluation context crosses a module boundary.

Practical Example

A team running an open-ended research project starts by writing the task panel, not by picking the largest model. They keep a baseline run, a maintained-tool run, and a perturbation run in the same result folder. The comparison is accepted only when the action trace, metric, and failure labels for all three runs come from one script.
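A sketch of the "one script, one folder" discipline follows. The rollout is a toy stand-in, not a real environment: `run_episode`, the three-step horizon, and the dropped command used as the perturbation are all assumptions made so the example runs on its own. In a real project the maintained-tool condition would call the library implementation instead of reusing the toy policy.

```python
import json
import tempfile
from pathlib import Path

CONDITIONS = ("baseline", "maintained_tool", "perturbation")


def run_episode(condition: str, target: int = 3, horizon: int = 3) -> dict:
    """Toy 1-D rollout standing in for a real environment and policy."""
    position, trace = 0, []
    for t in range(horizon):
        action = 1 if position < target else 0
        if condition == "perturbation" and t == 1:
            action = 0  # injected fault: one command is dropped
        position += action
        trace.append(action)
    success = position == target
    return {
        "condition": condition,
        "action_trace": trace,
        "metric": {"success": success},
        "failure_labels": [] if success else ["control error"],
    }


def write_results(folder: Path) -> list:
    """Run every condition with the same code path and save one file per condition."""
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for condition in CONDITIONS:
        path = folder / f"{condition}.json"
        path.write_text(json.dumps(run_episode(condition), indent=2))
        paths.append(path)
    return paths


result_dir = Path(tempfile.mkdtemp()) / "results"
for path in write_results(result_dir):
    record = json.loads(path.read_text())
    print(path.name, record["metric"], record["failure_labels"])
```

Because all three files share one schema and one code path, a difference between them can be attributed to the condition rather than to the logging.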

Memory Hook

When the open-ended research project feels abstract, ask what would be different in the next frame of video, the next robot state, or the next safety margin.

Research Frontier

For an open-ended research project, the open research question is not whether a larger policy can produce a better demo. The sharper question is whether the method improves reliability across new scenes, new embodiments, delayed feedback, and rare failures under an evaluation protocol that another lab can reproduce.

Self Check

For your open-ended research project, can you name the observation, action, protected assumption, success metric, and one likely failure case? If any field is vague, rewrite the contract before adding model complexity.

Topic-Native Deepening

The open-ended capstone is where the book stops prescribing topics and starts prescribing research hygiene. The challenge is not choosing the flashiest domain; it is formulating a question with an evidence loop that can survive contact with deadlines, limited compute, and incomplete intuition.

Students often over-scope these projects. This section therefore narrows the problem by asking for one clear thesis, one baseline, one perturbation panel, and one failure narrative that justifies the next iteration.

Why This Section Matters

An open-ended research project becomes teachable once the student can state the operative variables, the decision boundary, and the evidence artifact. Read this section together with Chapter 58 on open problems and Chapter 52 on evaluation, where the same loop is developed from adjacent angles.

Formal Object

An open-ended project can be summarized by $(Q,B,P,A)$: a question $Q$, baseline $B$, perturbation panel $P$, and artifact set $A$. If any element is missing, the project tends to become either a broad survey or a tool-demo rather than a real embodied experiment.

This tuple is intentionally minimal. It forces the student to say what is being tested, against what, under which stressors, and with which evidence. Everything else, including model choice, is downstream.

Algorithm: Turn an idea into a tractable research capstone
  1. Write the research question in one sentence with a measurable outcome.
  2. Choose the simplest baseline that could disprove the fancy method.
  3. Define a perturbation panel that will expose failure if the idea is weak.
  4. Specify the artifact bundle: code, config, metrics, replay, and one failure note.
  5. Freeze scope before implementation and only reopen it if the evidence requires it.
Open-Ended Project Scoping Gates

Dimension | What To Specify | Why It Matters
Question | One hypothesis about perception, planning, control, or adaptation | Prevents tool collection from masquerading as research.
Baseline | A simple alternative that could win | Creates a real decision problem.
Perturbation panel | Shift, noise, latency, horizon, or embodiment change | Tests whether the hypothesis generalizes.
Artifact bundle | Metrics, replay, config, and postmortem | Makes the work gradeable and publishable.

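# Fields every charter must define, matching the scoping gates.
REQUIRED_FIELDS = ("question", "baseline", "perturbation_panel", "artifact_bundle")


def validate_charter(payload: dict[str, object]) -> dict[str, object]:
    """Return the charter unchanged if every required field is present and non-empty."""
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError(f"charter is missing or has empty fields: {missing}")
    return payload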

# Open-ended project charter.
charter = {
    "question": "Does retrieval-augmented policy memory reduce long-horizon kitchen failures?",
    "baseline": "same policy without retrieval memory",
    "perturbation_panel": ["delayed observations", "object moved mid-task"],
    "artifact_bundle": ["config", "metrics", "replay", "failure_note"],
}
print(validate_charter(charter))
{'question': 'Does retrieval-augmented policy memory reduce long-horizon kitchen failures?', 'baseline': 'same policy without retrieval memory', 'perturbation_panel': ['delayed observations', 'object moved mid-task'], 'artifact_bundle': ['config', 'metrics', 'replay', 'failure_note']}
Code Fragment 59.11.A summarizes the topic-specific evidence card for the open-ended research project.

The expected output should make the project falsifiable. If the charter cannot be disproved by a baseline on a defined panel, it is still an interest area, not yet a research project.
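One hedged way to turn that test into code is a decision rule fixed before the experiment runs: the hypothesis counts as supported only if the method beats the baseline by a chosen margin on every panel condition. The `verdict` function, the 0.1 margin, and the all-conditions rule are assumptions for illustration, and the success lists below are made-up placeholders, not results.

```python
def verdict(method: dict, baseline: dict, margin: float = 0.1):
    """Compare per-condition success rates; each value is a list of episode outcomes."""
    report = {}
    for condition, base_outcomes in baseline.items():
        method_rate = sum(method[condition]) / len(method[condition])
        base_rate = sum(base_outcomes) / len(base_outcomes)
        report[condition] = {
            "method": method_rate,
            "baseline": base_rate,
            "supported": method_rate - base_rate >= margin,
        }
    # The hypothesis survives only if it holds on every panel condition.
    return all(row["supported"] for row in report.values()), report


# Placeholder outcomes (True = episode succeeded); not real data.
method_runs = {
    "delayed observations": [True, True, True, True, False],
    "object moved mid-task": [True, True, True, False, False],
}
baseline_runs = {
    "delayed observations": [True, True, False, False, False],
    "object moved mid-task": [True, True, True, False, False],
}

supported, report = verdict(method_runs, baseline_runs)
for condition, row in report.items():
    print(condition, row)
print("hypothesis supported on the full panel:", supported)
```

In this placeholder data the baseline ties the method on one condition, so the rule rejects the hypothesis. A charter that can produce that outcome is a research project; one that cannot is still an interest area.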

Library Shortcut

After the from-scratch contract is clear, the practical route draws on maintained tools such as Hydra, Git, Weights & Biases, LeRobot, ROS 2, Habitat, MuJoCo, and Jupyter. The payoff is that standard interfaces, logging, batching, and replay support move from ad hoc glue code into maintained infrastructure, while the evidence schema stays the same.

Project Or Teaching Use

This section works well as a proposal defense. A five-minute live review of the charter often reveals missing baselines or missing perturbations before any compute has been wasted.

Research Frontier

The frontier extension is meta-research on embodied evaluation itself. Some of the strongest student projects ask whether our current benchmarks actually predict field performance.

Expected Output Interpretation

For the open-ended project, the artifact should make the research claim falsifiable: hypothesis, controlled comparison, evidence file, and the smallest next experiment are all visible.

Key Takeaway
Exercise 59.11.1

Design a method-matched experiment for an open-ended research project. Specify the environment, observation schema, action interface, metric, and one perturbation that targets the section's core assumption.

Section References

Savva, M. et al. Habitat: A Platform for Embodied AI Research. ICCV, 2019.

Use for simulated navigation projects, reproducible scene tasks, and embodied evaluation loops.

Cadene, R. et al. LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch. GitHub project and technical documentation, 2024.

Use for dataset conversion, policy training, and capstone projects built around open robot-learning workflows.

What's Next?

Next, move to Chapter 60, where the same evidence discipline is applied at the next scale.