Your First Experiment

A complete walkthrough: from installation to running your first autonomous experiment and interpreting the results.

[!NOTE] CLI: Install Remoroo with Python 3.10+. Execution uses Docker (default) or a Python venv sandbox—your repo can use whatever languages and commands run inside that environment (shell, Node, Rust, etc.). Tutorials here use Python for clarity.

What Remoroo Actually Does

Before we dive in, here's what Remoroo handles out of the box:

Use Case	Example Goal	Metrics
ML Training	"Train my classifier to 92% accuracy with inference < 50ms"	`accuracy >= 0.92, inference_ms < 50`
Pipeline Optimization	"Make our ETL pipeline run in under 2 seconds"	`runtime_s <= 2.0, correctness == true`
Multi-Service Planners	"Optimize all three planning services without breaking outputs"	`planner_a_runtime_s < baseline, planner_b_runtime_s < baseline, ...`
Large Codebase Refactoring	"Add type hints to all functions in the auth module"	`mypy --strict` passes

These aren't toy problems. Remoroo navigates multi-file repos, handles tradeoffs between competing metrics, and validates results automatically.

Remoroo v2

Current releases default to the v2 agent loop. Older v1 / legacy pipeline modes are unsupported for new work; the CLI may still expose --v1 for exceptional cases.

Primary run artifacts live under <repo>/.remoroo/runs/<run-id>/ (trace, checkpoint, final_report.md, final_patch.diff, metrics files).

Prerequisites

Before you begin:

Python 3.10+ (python.org) — for the remoroo CLI
Docker (docker.com) — default sandbox, or use --engine venv if you do not use Docker
Git — Remoroo works best in version-controlled repos

python --version   # 3.10+
docker --version   # if using default --engine docker
git --version

Step 1: Installation

pip install remoroo

Verify:

remoroo --help

Step 2: Authentication

remoroo login

This opens a browser window so you can sign in. Credentials are saved to ~/.config/remoroo/credentials.

Verify:

remoroo whoami

Step 3: Your First Experiment

Let's run a real optimization — not a toy example.

Example: Optimize an ML Training Pipeline

Suppose you have a training script that is too slow and whose accuracy is borderline:

# train.py (your existing code)
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

class SimpleClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

def train_model():
    # ... your training loop
    pass

if __name__ == "__main__":
    train_model()
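The goal in the command below asks for metrics in artifacts/metrics.json. The agent can add that code itself, but if you prefer to emit the file from the start, here is a minimal sketch of how the `train_model` stub could return its metrics and write them out. It assumes a flat JSON object keyed by the same names used in `--metrics`; the `write_metrics` helper and that schema are illustrative assumptions, not a documented Remoroo file format.

```python
import json
import time
from pathlib import Path


def write_metrics(metrics: dict, path: str = "artifacts/metrics.json") -> Path:
    """Write a flat {name: value} dict as JSON, creating parent directories as needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=2))
    return out


def train_model() -> dict:
    start = time.perf_counter()
    # ... your training loop goes here; the two values below are placeholders
    # that your real evaluation code should overwrite.
    accuracy, loss = 0.0, 0.0
    # Wall-clock seconds, matching the units in "training_time < 30".
    training_time = time.perf_counter() - start
    return {"accuracy": accuracy, "loss": loss, "training_time": training_time}


if __name__ == "__main__":
    write_metrics(train_model())
```

Keeping the metric names identical to the `--metrics` string avoids any ambiguity about which value each constraint refers to.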

Run Remoroo

remoroo run --local \
  --goal "Optimize the neural network to achieve accuracy >= 0.85, loss <= 0.5, training_time < 30s. Save metrics to artifacts/metrics.json." \
  --metrics "accuracy >= 0.85, loss <= 0.5, training_time < 30"

What Happens

Baseline: Remoroo runs your code as-is and captures current metrics
Analysis: The agent identifies bottlenecks (learning rate, architecture, batch size)
Iteration: It patches train.py, runs again, checks metrics
Validation: All three constraints must pass, not just one
Result: SUCCESS if all metrics are met, with a clean patch to apply
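"All three constraints must pass" means the comma-separated `--metrics` string is treated as a conjunction. The sketch below illustrates that logic for absolute thresholds such as `accuracy >= 0.85` or `correctness == true`. It is a hypothetical parser written for this page, not Remoroo's implementation, and it deliberately rejects baseline-relative forms like `runtime_s < baseline runtime_s`.

```python
import operator
import re

# Comparison operators that appear in the --metrics strings on this page.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators come first in the alternation so ">=" is not read as ">".
CONSTRAINT = re.compile(r"^\s*(\w+)\s*(>=|<=|==|>|<)\s*(\S+)\s*$")


def parse_value(text: str):
    """Turn '0.85' into 0.85 and 'true'/'false' into booleans."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return float(text)


def check_constraints(spec: str, metrics: dict) -> dict:
    """Return {constraint: passed} for each comma-separated constraint in spec."""
    results = {}
    for part in spec.split(","):
        match = CONSTRAINT.match(part)
        if match is None:
            raise ValueError(f"cannot parse constraint: {part!r}")
        name, op, raw = match.groups()
        # A metric missing from the run counts as a failed constraint.
        results[part.strip()] = name in metrics and OPS[op](metrics[name], parse_value(raw))
    return results


spec = "accuracy >= 0.85, loss <= 0.5, training_time < 30"
final = {"accuracy": 0.87, "loss": 0.42, "training_time": 22.1}  # Final column of the example output
print(all(check_constraints(spec, final).values()))  # True
```

Using `all(...)` over the per-constraint results is what makes a run with one passing metric and two failing ones count as a failure.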

Expected Output

╭──────────────────── Run Summary ─────────────────────╮
│ SUCCESS                                              │
│ Run ID: 20260203-143022-ml-training                  │
│ Artifacts: .remoroo/runs/20260203-143022-ml-training │
╰──────────────────────────────────────────────────────╯

📈 Detailed Performance
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric         ┃ Baseline  ┃ Final   ┃ Progress ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ accuracy       │ 0.72      │ 0.87    │ +0.15    │
│ loss           │ 0.81      │ 0.42    │ -0.39    │
│ training_time  │ 45.2      │ 22.1    │ -23.1    │
└────────────────┴───────────┴─────────┴──────────┘

📄 Report: final_report.md
🩹 Clean Patch: final_patch.diff

Example: Large Codebase Pipeline Optimization

For multi-file codebases, point --repo at the repository root:

remoroo run --local \
  --repo ./my-etl-pipeline \
  --goal "Optimize the ETL pipeline to run in under 2 seconds while maintaining correctness" \
  --metrics "runtime_s <= 2.0, correctness == true"

The agent will:

Navigate your entire codebase
Identify slow modules (tokenization, feature building, I/O)
Patch multiple files in a single run
Verify both runtime AND correctness
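A `correctness == true` constraint only works if something in the repo computes it. One common approach, sketched here under assumptions, is to time the pipeline and compare its output with a stored reference ("golden") output from a run you trust. `run_pipeline`, the reference data, and the metrics file path are hypothetical placeholders for your own entry point and conventions.

```python
import json
import time
from pathlib import Path


def run_pipeline(rows):
    """Hypothetical stand-in for your ETL entry point: lowercase, tokenize, count words."""
    counts = {}
    for row in rows:
        for token in row.lower().split():
            counts[token] = counts.get(token, 0) + 1
    return counts


def measure(rows, expected, out_path="artifacts/metrics.json") -> dict:
    """Time one pipeline run, check its output against a reference, and save both metrics."""
    start = time.perf_counter()
    result = run_pipeline(rows)
    runtime_s = time.perf_counter() - start
    metrics = {"runtime_s": runtime_s, "correctness": result == expected}
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics))  # True is serialized as JSON true
    return metrics
```

Comparing against a fixed reference keeps "faster" from silently meaning "faster but wrong": any optimization that changes the output flips `correctness` to false.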

Example: Optimize Multiple Planning Services

When you have interdependent services:

remoroo run --local \
  --repo ./planner-suite \
  --goal "Optimize all three planners without changing their outputs" \
  --metrics "planner_a_runtime_s < baseline planner_a_runtime_s, planner_b_runtime_s < baseline planner_b_runtime_s, planner_c_runtime_s < baseline planner_c_runtime_s"

Remoroo automatically:

Runs baseline to capture current performance
Compares final metrics against baseline
Ensures no metric regresses
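The three `< baseline` constraints above reduce to a per-metric comparison between the baseline run and the final run. A minimal sketch of that comparison, assuming lower-is-better runtime metrics stored as flat dicts; it illustrates the logic and is not Remoroo's internal code.

```python
def baseline_report(baseline: dict, final: dict) -> dict:
    """For lower-is-better metrics, report each change and whether it strictly improved."""
    report = {}
    for name, before in baseline.items():
        after = final.get(name)
        if after is None:
            # A metric that disappeared from the final run cannot count as an improvement.
            report[name] = {"delta": None, "improved": False}
            continue
        report[name] = {"delta": after - before, "improved": after < before}
    return report
```

With the planner example, the goal is met only when `all(r["improved"] for r in baseline_report(baseline, final).values())` holds; an unchanged runtime fails the strict `<` check.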

Understanding Artifacts

Every run creates a directory under your repo:

.remoroo/runs/<run-id>/
├── metrics.json           # Final metric values
├── baseline_metrics.json  # Before changes
├── final_report.md        # What the agent did and why
├── final_patch.diff       # Apply with: git apply ...
├── trace.jsonl            # Step-by-step trace (v2)
├── checkpoint.json        # Resume / inspection (v2)
├── system_diagram.md      # When generated
└── ...                    # Other engine outputs as versions evolve
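To compare a run's before and after numbers yourself, you can read the two metrics files from the run directory. This sketch assumes both files are flat JSON objects mapping metric names to values, which this page does not specify; it skips booleans and any metric not present in both files, and returns rows shaped like the "Detailed Performance" table.

```python
import json
from pathlib import Path


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def load_run_metrics(run_dir: str) -> list:
    """Return (name, baseline, final, progress) rows for numeric metrics in both files."""
    root = Path(run_dir)
    baseline = json.loads((root / "baseline_metrics.json").read_text())
    final = json.loads((root / "metrics.json").read_text())
    rows = []
    for name in sorted(baseline.keys() & final.keys()):
        before, after = baseline[name], final[name]
        if _is_number(before) and _is_number(after):
            rows.append((name, before, after, after - before))
    return rows
```

For example, `load_run_metrics(".remoroo/runs/<run-id>")` returns one tuple per metric, with the progress column computed as final minus baseline.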

A cache copy may also appear under ~/.cache/remoroo/runs/<repo-name>/<run-id>/, depending on --out and sync behavior. For day-to-day inspection, prefer .remoroo/runs/<run-id>/ inside the repo.

Applying the Patch

After a successful run:

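cd your-repo
git apply --check .remoroo/runs/<run-id>/final_patch.diff  # Dry run: reports problems without changing files
git apply .remoroo/runs/<run-id>/final_patch.diff
git diff  # Review changes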
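To avoid copying the run ID by hand, you can pick the most recently modified directory under .remoroo/runs/. This is a small sketch; it assumes each run is a subdirectory there (as in the artifact layout above) and that the newest one is the run you want. `latest_run_dir` is a hypothetical helper, not a Remoroo command.

```python
from pathlib import Path


def latest_run_dir(repo: str = ".") -> Path:
    """Return the most recently modified run directory under <repo>/.remoroo/runs/."""
    runs = Path(repo) / ".remoroo" / "runs"
    candidates = [p for p in runs.iterdir() if p.is_dir()] if runs.is_dir() else []
    if not candidates:
        raise FileNotFoundError(f"no runs found under {runs}")
    # Pick by modification time rather than by name, so the result does not
    # depend on how run IDs are formatted.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, `latest_run_dir() / "final_patch.diff"` gives the path to pass to `git apply`. `remoroo list` remains the authoritative way to see which runs exist.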

CLI Quick Reference

Command	Description
`remoroo run` / `remoroo run --local`	Run locally (default; Docker or `--engine venv`)
`remoroo run --resume RUN_ID`	Attach to an existing run
`remoroo list`	List runs (`--attachable` for attach targets)
`remoroo attach --id RUN_ID`	Attach worker to a server run
`remoroo abort RUN_ID`	Abort run on control plane
`remoroo run --repo PATH`	Repository root
`remoroo run --goal "..."` / `--metrics "..."`	Non-interactive goal/metrics
`remoroo run --budget HOURS`	Wall-time budget (default 10h)
`remoroo run --yes` / `--verbose` / `--no-patch`	Confirmations, logging, patch prompt
`remoroo worker --repo PATH`	Standalone polling worker (advanced)
`remoroo login` / `whoami` / `logout`	Auth (`~/.config/remoroo/credentials`)

For all flags, see the CLI Reference.

Troubleshooting

1. "Docker is not running"

Cannot connect to Docker daemon

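Fix: Start Docker Desktop, or start the Docker daemon on Linux with systemd:

sudo systemctl start docker  # Linux

If you cannot run Docker at all, use --engine venv instead.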
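For a preflight check before starting a run (for example, in a setup script), note that `docker info` exits with a non-zero status when the CLI cannot reach the daemon. This sketch uses only the standard library; `docker_available` is a hypothetical helper, not part of Remoroo.

```python
import shutil
import subprocess


def docker_available(docker_cmd: str = "docker", timeout_s: float = 10.0) -> bool:
    """Return True if the Docker CLI is installed and `docker info` reaches the daemon."""
    if shutil.which(docker_cmd) is None:
        return False  # CLI not installed or not on PATH
    try:
        result = subprocess.run(
            [docker_cmd, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # daemon did not respond in time
    return result.returncode == 0
```

If this returns False and starting Docker is not an option, `--engine venv` is the alternative sandbox listed under Prerequisites.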

2. "Authentication required"

Fix: Run remoroo login and complete the browser flow.

3. "Metric not met after max turns"

The agent couldn't satisfy your constraints.

Fixes:

Check if the metric is actually achievable
Simplify goals (optimize one thing at a time first)
Review final_report.md to understand what was tried

4. "Patch failed to apply"

Your working tree likely has local changes that conflict with the patch.

Fix:

git stash
git apply .remoroo/runs/<run-id>/final_patch.diff
git stash pop

5. "Timeout exceeded"

Fixes:

Ensure the verification command runs quickly
Check for infinite loops
Reduce dataset/input sizes for faster iteration
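Before a long run, it helps to time your verification command locally so you know how much of each iteration it consumes. A minimal sketch; the command and the `timeout_s` value are placeholders you choose, and whatever limit Remoroo itself applies is not modeled here.

```python
import subprocess
import time
from typing import Optional


def time_command(cmd: list, timeout_s: float) -> Optional[float]:
    """Run cmd once and return elapsed seconds, or None if it exceeded timeout_s.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    start = time.perf_counter()
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
    return time.perf_counter() - start
```

For example, `time_command(["python", "train.py"], timeout_s=60)` returns None if one training run takes longer than a minute, a hint to shrink the dataset or look for a loop that never terminates.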

Tips for Success

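Use Baseline-Relative Metrics: runtime_s < baseline runtime_s is more robust than hardcoded thresholds, because the baseline is measured on the same machine and does not depend on how fast that machine is.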
Multi-Metric = Real Problems: Don't simplify to single metrics. Real constraints (accuracy AND speed) are what Remoroo handles best.
Version Control: Always run in a git repo. git diff and git checkout . are your safety net.
Check the Report: final_report.md explains the agent's reasoning — essential for understanding trade-offs.
Start with Your Actual Code: Remoroo shines on real codebases, not synthetic examples.

Next Steps

Why Remoroo? — Use cases and philosophy
CLI Reference — Full command documentation

Ready? Run your first experiment:

remoroo run --local