Your First Experiment

A complete walkthrough: from installation to running your first autonomous experiment and interpreting the results.

[!NOTE] CLI: Install Remoroo with Python 3.10+. Execution uses Docker (default) or a Python venv sandbox—your repo can use whatever languages and commands run inside that environment (shell, Node, Rust, etc.). Tutorials here use Python for clarity.

What Remoroo Actually Does

Before we dive in, here's what Remoroo handles out of the box:

| Use Case | Example Goal | Metrics |
| --- | --- | --- |
| ML Training | "Train my classifier to 92% accuracy with inference < 50ms" | accuracy >= 0.92, inference_ms < 50 |
| Pipeline Optimization | "Make our ETL pipeline run in under 2 seconds" | runtime_s <= 2.0, correctness == true |
| Multi-Service Planners | "Optimize all three planning services without breaking outputs" | planner_a_runtime_s < baseline, planner_b_runtime_s < baseline, ... |
| Large Codebase Refactoring | "Add type hints to all functions in the auth module" | mypy --strict passes |

These aren't toy problems. Remoroo navigates multi-file repos, handles tradeoffs between competing metrics, and validates results automatically.


Remoroo v2

Current releases default to the v2 agent loop. Older v1 / legacy pipeline modes are unsupported for new work; the CLI may still expose --v1 for exceptional cases.

Primary run artifacts live under <repo>/.remoroo/runs/<run-id>/ (trace, checkpoint, final_report.md, final_patch.diff, metrics files).


Prerequisites

Before you begin:

  • Python 3.10+ (python.org) — for the remoroo CLI
  • Docker (docker.com) — default sandbox, or use --engine venv if you do not use Docker
  • Git — Remoroo works best in version-controlled repos

Confirm each tool is available:

python --version   # 3.10+
docker --version   # if using default --engine docker
git --version
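
If you would rather run these checks from a script (for example in a setup step), the sketch below does roughly the same thing in Python. The function name `check_prerequisites` is hypothetical and not part of Remoroo. It only checks that the tools are on your PATH, not that the Docker daemon is running.

```python
import shutil
import sys

def check_prerequisites(engine: str = "docker") -> dict:
    """Return a mapping of prerequisite name -> whether it looks available."""
    results = {
        # The remoroo CLI needs Python 3.10 or newer.
        "python>=3.10": sys.version_info >= (3, 10),
        # Git is recommended so changes can be reviewed and reverted.
        "git": shutil.which("git") is not None,
    }
    # Docker is only needed for the default engine, not for --engine venv.
    if engine == "docker":
        results["docker"] = shutil.which("docker") is not None
    return results

for name, ok in check_prerequisites().items():
    print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```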

Step 1: Installation

pip install remoroo

Verify:

remoroo --help

Step 2: Authentication

remoroo login

This opens a browser window where you sign in. Credentials are saved to ~/.config/remoroo/credentials.

Verify:

remoroo whoami

Step 3: Your First Experiment

Let's run a real optimization — not a toy example.

Example: Optimize an ML Training Pipeline

Suppose you have a training script that is too slow and whose accuracy is borderline:

# train.py (your existing code)
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

class SimpleClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

def train_model():
    # ... your training loop
    pass

if __name__ == "__main__":
    train_model()

Run Remoroo

remoroo run --local \
  --goal "Optimize the neural network to achieve accuracy >= 0.85, loss <= 0.5, training_time < 30s. Save metrics to artifacts/metrics.json." \
  --metrics "accuracy >= 0.85, loss <= 0.5, training_time < 30"
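
The goal above asks for metrics to be saved to artifacts/metrics.json. As a rough illustration of what that could look like inside a training script, here is a minimal sketch. It assumes a flat JSON object keyed by metric name; the exact schema Remoroo reads is not specified in this guide, and the helpers `write_metrics` and `timed` are hypothetical names.

```python
import json
import time
from pathlib import Path

def write_metrics(metrics: dict, path: str = "artifacts/metrics.json") -> Path:
    """Write metric values as a flat JSON object, creating the folder if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=2))
    return out

def timed(fn):
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# In train.py this might be used as (assuming train_model returns accuracy and loss):
#   (accuracy, loss), seconds = timed(train_model)
#   write_metrics({"accuracy": accuracy, "loss": loss, "training_time": seconds})
```

Using the same metric names in the file as in --metrics keeps the goal, the constraints, and the output aligned.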

What Happens

  1. Baseline: Remoroo runs your code as-is and captures current metrics
  2. Analysis: The agent identifies bottlenecks (learning rate, architecture, batch size)
  3. Iteration: It patches train.py, runs again, checks metrics
  4. Validation: All three constraints must pass — not just one
  5. Result: SUCCESS if all metrics are met, with a clean patch to apply
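
To make the "all constraints must pass" rule in step 4 concrete, the sketch below parses a --metrics style string and checks it against a set of metric values. This is only an illustration of the idea, not Remoroo's actual implementation; `parse_constraints` and `all_met` are hypothetical names, and only simple numeric thresholds are handled.

```python
import operator
import re

# Comparison operators that appear in the --metrics strings in this tutorial.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}

def parse_constraints(spec: str):
    """Split 'accuracy >= 0.85, loss <= 0.5' into (name, op, threshold) tuples."""
    constraints = []
    for part in spec.split(","):
        match = re.fullmatch(r"\s*(\w+)\s*(>=|<=|==|>|<)\s*([-+\d.eE]+)\s*", part)
        if match is None:
            raise ValueError(f"cannot parse constraint: {part!r}")
        name, op, value = match.groups()
        constraints.append((name, op, float(value)))
    return constraints

def all_met(spec: str, metrics: dict) -> bool:
    """True only if every constraint holds; a missing metric counts as a failure."""
    return all(
        name in metrics and OPS[op](metrics[name], threshold)
        for name, op, threshold in parse_constraints(spec)
    )

spec = "accuracy >= 0.85, loss <= 0.5, training_time < 30"
print(all_met(spec, {"accuracy": 0.72, "loss": 0.81, "training_time": 45.2}))  # False
print(all_met(spec, {"accuracy": 0.87, "loss": 0.42, "training_time": 22.1}))  # True
```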

Expected Output

╭──────────────────── Run Summary ─────────────────────╮
│ SUCCESS                                              │
│ Run ID: 20260203-143022-ml-training                  │
│ Artifacts: .remoroo/runs/20260203-143022-ml-training │
╰──────────────────────────────────────────────────────╯

📈 Detailed Performance
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric         ┃ Baseline  ┃ Final   ┃ Progress ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ accuracy       │ 0.72      │ 0.87    │ +0.15    │
│ loss           │ 0.81      │ 0.42    │ -0.39    │
│ training_time  │ 45.2      │ 22.1    │ -23.1    │
└────────────────┴───────────┴─────────┴──────────┘

📄 Report: final_report.md
🩹 Clean Patch: final_patch.diff

Example: Large Codebase Pipeline Optimization

For multi-file codebases:

remoroo run --local \
  --repo ./my-etl-pipeline \
  --goal "Optimize the ETL pipeline to run in under 2 seconds while maintaining correctness" \
  --metrics "runtime_s <= 2.0, correctness == true"

The agent will:

  • Navigate your entire codebase
  • Identify slow modules (tokenization, feature building, I/O)
  • Patch multiple files in a single run
  • Verify both runtime AND correctness
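
Checking runtime and correctness together needs a verification step that produces both numbers. A minimal sketch of such a harness follows, assuming your pipeline can be called as a function on a small fixed input with a known expected output. `measure` and `toy_pipeline` are hypothetical stand-ins, not part of Remoroo or of any particular ETL codebase.

```python
import time

def measure(pipeline, sample_input, expected_output) -> dict:
    """Time one call of pipeline(sample_input) and check its output."""
    start = time.perf_counter()
    actual = pipeline(sample_input)
    runtime_s = time.perf_counter() - start
    return {"runtime_s": runtime_s, "correctness": actual == expected_output}

# Hypothetical stand-in for an ETL step: normalize and de-duplicate tokens.
def toy_pipeline(rows):
    return sorted({row.strip().lower() for row in rows})

print(measure(toy_pipeline, [" A", "b ", "a"], ["a", "b"]))
```

Keeping the expected output fixed is what lets a speed-up be rejected when it changes results.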

Example: Optimize Multiple Planning Services

When you have interdependent services:

remoroo run --local \
  --repo ./planner-suite \
  --goal "Optimize all three planners without changing their outputs" \
  --metrics "planner_a_runtime_s < baseline planner_a_runtime_s, planner_b_runtime_s < baseline planner_b_runtime_s, planner_c_runtime_s < baseline planner_c_runtime_s"

Remoroo automatically:

  • Runs baseline to capture current performance
  • Compares final metrics against baseline
  • Ensures no metric regresses
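
The baseline comparison described above can be pictured as follows. This is a simplified sketch with made-up numbers, assuming every metric is a runtime where lower is better; the function name `unmet_baseline_constraints` is hypothetical and the real comparison logic may differ.

```python
def unmet_baseline_constraints(baseline: dict, final: dict) -> list:
    """Names of metrics where `final < baseline` does not hold (lower is better)."""
    return [
        name for name, before in baseline.items()
        if name not in final or not final[name] < before
    ]

# Made-up numbers, only to show the comparison.
baseline = {"planner_a_runtime_s": 4.0, "planner_b_runtime_s": 2.5, "planner_c_runtime_s": 1.2}
final = {"planner_a_runtime_s": 3.1, "planner_b_runtime_s": 2.5, "planner_c_runtime_s": 0.9}

# planner_b did not get strictly faster, so it is reported as unmet.
print(unmet_baseline_constraints(baseline, final))  # ['planner_b_runtime_s']
```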

Understanding Artifacts

Every run creates a directory under your repo:

.remoroo/runs/<run-id>/
├── metrics.json           # Final metric values
├── baseline_metrics.json  # Before changes
├── final_report.md        # What the agent did and why
├── final_patch.diff       # Apply with: git apply ...
├── trace.jsonl            # Step-by-step trace (v2)
├── checkpoint.json        # Resume / inspection (v2)
├── system_diagram.md      # When generated
└── ...                    # Other engine outputs as versions evolve

A cache copy may also appear under ~/.cache/remoroo/runs/<repo-name>/<run-id>/ depending on --out and sync behavior—prefer .remoroo/runs/<run-id>/ in the repo for day-to-day inspection.
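
To inspect a run programmatically, you could compare the two metrics files yourself. The sketch below assumes both files are flat JSON objects mapping metric names to numbers, which this guide does not guarantee, so treat it as a starting point. `load_progress` is a hypothetical helper.

```python
import json
from pathlib import Path

def load_progress(run_dir: str) -> dict:
    """Return {metric: (baseline, final, delta)} for metrics present in both files."""
    run = Path(run_dir)
    baseline = json.loads((run / "baseline_metrics.json").read_text())
    final = json.loads((run / "metrics.json").read_text())
    progress = {}
    for name in baseline.keys() & final.keys():
        before, after = baseline[name], final[name]
        # Skip non-numeric values; bool is excluded because it subclasses int.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in (before, after)):
            progress[name] = (before, after, round(after - before, 6))
    return progress

# Example call (replace <run-id> with a real run directory):
#   load_progress(".remoroo/runs/<run-id>")
```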

Applying the Patch

After a successful run:

cd your-repo
git apply .remoroo/runs/<run-id>/final_patch.diff
git diff  # Review changes
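
If you script this step, it can help to test the patch before touching any files. The sketch below wraps git's own `git apply --check` dry run in a hypothetical helper named `apply_patch`; the patch path is interpreted relative to the repository directory.

```python
import subprocess

def apply_patch(repo: str, patch: str, dry_run: bool = True) -> bool:
    """Check (and optionally apply) a patch with git; return True on success."""
    try:
        # `git apply --check` verifies the patch applies cleanly without changing files.
        check = subprocess.run(["git", "apply", "--check", patch], cwd=repo,
                               capture_output=True, text=True)
    except FileNotFoundError:
        print("git was not found on PATH")
        return False
    if check.returncode != 0:
        print(check.stderr.strip())
        return False
    if not dry_run:
        subprocess.run(["git", "apply", patch], cwd=repo, check=True)
    return True

# Example call (replace <run-id> with a real run directory):
#   apply_patch("your-repo", ".remoroo/runs/<run-id>/final_patch.diff", dry_run=False)
```

With the default `dry_run=True` nothing is modified, so the call is safe to repeat.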

CLI Quick Reference

| Command | Description |
| --- | --- |
| remoroo run / remoroo run --local | Run locally (default; Docker or --engine venv) |
| remoroo run --resume RUN_ID | Attach to an existing run |
| remoroo list | List runs (--attachable for attach targets) |
| remoroo attach --id RUN_ID | Attach worker to a server run |
| remoroo abort RUN_ID | Abort run on control plane |
| remoroo run --repo PATH | Repository root |
| remoroo run --goal "..." / --metrics "..." | Non-interactive goal/metrics |
| remoroo run --budget HOURS | Wall-time budget (default 10h) |
| remoroo run --yes / --verbose / --no-patch | Confirmations, logging, patch prompt |
| remoroo worker --repo PATH | Standalone polling worker (advanced) |
| remoroo login / whoami / logout | Auth (~/.config/remoroo/credentials) |

Full flags: CLI Reference.
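
For scripted or CI use, the flags above can be assembled into a command from Python. This sketch only builds and prints the command line using flags listed in the table. `build_run_command` is a hypothetical helper, and passing an integer to --budget is an assumption about the accepted format.

```python
import shlex
from typing import List, Optional

def build_run_command(goal: str, metrics: str, repo: str = ".",
                      budget_hours: Optional[int] = None,
                      yes: bool = False) -> List[str]:
    """Assemble an argv list for a non-interactive local run."""
    argv = ["remoroo", "run", "--local", "--repo", repo,
            "--goal", goal, "--metrics", metrics]
    if budget_hours is not None:
        argv += ["--budget", str(budget_hours)]
    if yes:
        argv.append("--yes")  # skip confirmation prompts
    return argv

argv = build_run_command(
    goal="Optimize the ETL pipeline to run in under 2 seconds while maintaining correctness",
    metrics="runtime_s <= 2.0, correctness == true",
    repo="./my-etl-pipeline",
    budget_hours=2,
    yes=True,
)
print(shlex.join(argv))  # shell-quoted form, for logging or copy-paste
```

The resulting list can be handed to `subprocess.run(argv)` once you are happy with it; building a list rather than a string avoids shell-quoting problems in the goal text.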


Troubleshooting

1. "Docker is not running"

Cannot connect to Docker daemon

Fix: Start Docker Desktop, or start the daemon directly:

sudo systemctl start docker  # Linux

2. "Authentication required"

Fix: Run remoroo login and complete the browser flow.

3. "Metric not met after max turns"

The agent couldn't satisfy your constraints.

Fixes:

  • Check if the metric is actually achievable
  • Simplify goals (optimize one thing at a time first)
  • Review final_report.md to understand what was tried

4. "Patch failed to apply"

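Your working tree has uncommitted changes that conflict with the patch.

Fix: stash your changes, apply the patch, then restore them (resolving any conflicts that git stash pop reports):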

git stash
git apply .remoroo/runs/<run-id>/final_patch.diff
git stash pop

5. "Timeout exceeded"

Fixes:

  • Ensure the verification command runs quickly
  • Check for infinite loops
  • Reduce dataset/input sizes for faster iteration
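
One way to find out whether your verification command is the slow part is to time it on its own, with a hard limit. The sketch below is a generic helper (the name `time_command` is hypothetical) that uses a trivial Python command as a stand-in for your real test or training command.

```python
import subprocess
import sys
import time

def time_command(argv, timeout_s: float):
    """Run argv; return (exit_code, seconds), or (None, timeout_s) if it timed out."""
    start = time.perf_counter()
    try:
        done = subprocess.run(argv, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # The child process is killed when the timeout expires.
        return None, timeout_s
    return done.returncode, time.perf_counter() - start

# Stand-in for a verification command such as a test run or a small training job.
print(time_command([sys.executable, "-c", "print('ok')"], timeout_s=10))
```

A result of `None` for the exit code points to a hang or an infinite loop rather than a merely slow run.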

Tips for Success

  1. Use Baseline-Relative Metrics: runtime_s < baseline runtime_s is more robust than hardcoded thresholds.

  2. Multi-Metric = Real Problems: Don't simplify to single metrics. Real constraints (accuracy AND speed) are what Remoroo handles best.

  3. Version Control: Always run in a git repo. git diff and git checkout . are your safety net.

  4. Check the Report: final_report.md explains the agent's reasoning — essential for understanding trade-offs.

  5. Start with Your Actual Code: Remoroo shines on real codebases, not synthetic examples.


Next Steps

Ready? Run your first experiment:

remoroo run --local