Your First Experiment
A complete walkthrough: from installation to running your first autonomous experiment and interpreting the results.
[!NOTE] CLI: Install Remoroo with Python 3.10+. Execution uses Docker (default) or a Python venv sandbox—your repo can use whatever languages and commands run inside that environment (shell, Node, Rust, etc.). Tutorials here use Python for clarity.
What Remoroo Actually Does
Before we dive in, here's what Remoroo handles out of the box:
| Use Case | Example Goal | Metrics |
|---|---|---|
| ML Training | "Train my classifier to 92% accuracy with inference < 50ms" | accuracy >= 0.92, inference_ms < 50 |
| Pipeline Optimization | "Make our ETL pipeline run in under 2 seconds" | runtime_s <= 2.0, correctness == true |
| Multi-Service Planners | "Optimize all three planning services without breaking outputs" | planner_a_runtime_s < baseline, planner_b_runtime_s < baseline, ... |
| Large Codebase Refactoring | "Add type hints to all functions in the auth module" | mypy --strict passes |
These aren't toy problems. Remoroo navigates multi-file repos, handles tradeoffs between competing metrics, and validates results automatically.
Remoroo v2
Current releases default to the v2 agent loop. Older v1 / legacy pipeline modes are unsupported for new work; the CLI may still expose --v1 for exceptional cases.
Primary run artifacts live under <repo>/.remoroo/runs/<run-id>/ (trace, checkpoint, final_report.md, final_patch.diff, metrics files).
Prerequisites
Before you begin:
- Python 3.10+ (python.org): for the remoroo CLI
- Docker (docker.com): default sandbox, or use --engine venv if you do not use Docker
- Git: Remoroo works best in version-controlled repos
python --version # 3.10+
docker --version # if using default --engine docker
git --version
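The same checks can be scripted. The sketch below is only an illustration, not part of Remoroo: the function name check_prerequisites is hypothetical, and it only confirms that the executables are on PATH, not that the Docker daemon is actually running.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("docker", "git")):
    """Return a dict mapping each prerequisite name to True or False."""
    # Compare only (major, minor) against the required minimum version.
    results = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If you plan to use --engine venv, calling check_prerequisites(tools=("git",)) skips the Docker check.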
Step 1: Installation
pip install remoroo
Verify:
remoroo --help
Step 2: Authentication
remoroo login
This opens a browser window where you sign in. Credentials are saved to ~/.config/remoroo/credentials.
Verify:
remoroo whoami
Step 3: Your First Experiment
Let's run a real optimization — not a toy example.
Example: Optimize an ML Training Pipeline
Suppose you have a training script that's too slow and accuracy is borderline:
# train.py (your existing code)
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
class SimpleClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)
def train_model():
    # ... your training loop
    pass
if __name__ == "__main__":
    train_model()
Run Remoroo
remoroo run --local \
--goal "Optimize the neural network to achieve accuracy >= 0.85, loss <= 0.5, training_time < 30s. Save metrics to artifacts/metrics.json." \
--metrics "accuracy >= 0.85, loss <= 0.5, training_time < 30"
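The goal above asks for metrics to be saved to artifacts/metrics.json. As a rough sketch of what that could look like in train.py, assuming a flat JSON object of metric names to numbers (this page does not specify the schema Remoroo expects) and using the hypothetical helpers write_metrics and timed:

```python
import json
import time
from pathlib import Path


def write_metrics(metrics, path="artifacts/metrics.json"):
    """Write a flat {name: value} dict as JSON, creating the directory if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=2))
    return out


def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical usage at the end of train.py, assuming train_model() is
# changed to return a dict such as {"accuracy": ..., "loss": ...}:
#   results, seconds = timed(train_model)
#   write_metrics({**results, "training_time": seconds})
```

The metric names in the file should match the names used in --metrics (accuracy, loss, training_time).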
What Happens
- Baseline: Remoroo runs your code as-is and captures current metrics
- Analysis: The agent identifies bottlenecks (learning rate, architecture, batch size)
- Iteration: It patches train.py, runs again, checks metrics
- Validation: All three constraints must pass, not just one
- Result: SUCCESS if all metrics met, with a clean patch to apply
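To make the validation step concrete, here is a minimal sketch of an "all constraints must pass" check. It is an illustration only, not Remoroo's implementation: the parser handles just the simple `name op number` form used in the --metrics string above, and the names parse_constraints and all_met are hypothetical.

```python
import operator
import re

# Comparison operators supported by this sketch.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|==|>|<)\s*([-+]?\d+(?:\.\d+)?)\s*$")


def parse_constraints(spec):
    """Turn 'a >= 0.85, b < 30' into [('a', '>=', 0.85), ('b', '<', 30.0)]."""
    constraints = []
    for part in spec.split(","):
        match = PATTERN.match(part)
        if match is None:
            raise ValueError(f"cannot parse constraint: {part!r}")
        name, op, value = match.groups()
        constraints.append((name, op, float(value)))
    return constraints


def all_met(spec, metrics):
    """True only if every constraint holds; a missing metric counts as a failure."""
    return all(
        name in metrics and OPS[op](metrics[name], threshold)
        for name, op, threshold in parse_constraints(spec)
    )
```

With the constraint string from the command above, a metrics dict that satisfies only two of the three constraints returns False, which mirrors the rule that a run succeeds only when every metric is met.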
Expected Output
╭──────────────── Run Summary ────────────────╮
│ SUCCESS │
│ Run ID: 20260203-143022-ml-training │
│ Artifacts: .remoroo/runs/20260203-143022-ml-training │
╰─────────────────────────────────────────────╯
📈 Detailed Performance
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric ┃ Baseline ┃ Final ┃ Progress ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ accuracy │ 0.72 │ 0.87 │ +0.15 │
│ loss │ 0.81 │ 0.42 │ -0.39 │
│ training_time │ 45.2 │ 22.1 │ -23.1 │
└────────────────┴───────────┴─────────┴──────────┘
📄 Report: final_report.md
🩹 Clean Patch: final_patch.diff
Example: Large Codebase Pipeline Optimization
For multi-file codebases:
remoroo run --local \
--repo ./my-etl-pipeline \
--goal "Optimize the ETL pipeline to run in under 2 seconds while maintaining correctness" \
--metrics "runtime_s <= 2.0, correctness == true"
The agent will:
- Navigate your entire codebase
- Identify slow modules (tokenization, feature building, I/O)
- Patch multiple files in a single run
- Verify both runtime AND correctness
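For the last point, a verification script in your repo might time the pipeline and compare its output against a known-good result. The sketch below is an assumption about how such a script could be structured: measure, pipeline, inputs, and expected are hypothetical names, and your real pipeline entry point will differ.

```python
import time


def measure(pipeline, inputs, expected):
    """Run pipeline(inputs) once and report runtime and output correctness."""
    start = time.perf_counter()
    output = pipeline(inputs)
    runtime_s = time.perf_counter() - start
    # Correctness here means exact equality with a stored reference output.
    return {"runtime_s": runtime_s, "correctness": output == expected}


def toy_pipeline(rows):
    """Stand-in for a real ETL step: normalise strings to lower case."""
    return [row.strip().lower() for row in rows]


if __name__ == "__main__":
    print(measure(toy_pipeline, [" A ", "b"], ["a", "b"]))
```

Keeping the correctness check in the same script as the timing means a faster but wrong pipeline cannot satisfy the runtime constraint on its own.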
Example: Optimize Multiple Planning Services
When you have interdependent services:
remoroo run --local \
--repo ./planner-suite \
--goal "Optimize all three planners without changing their outputs" \
--metrics "planner_a_runtime_s < baseline planner_a_runtime_s, planner_b_runtime_s < baseline planner_b_runtime_s, planner_c_runtime_s < baseline planner_c_runtime_s"
Remoroo automatically:
- Runs baseline to capture current performance
- Compares final metrics against baseline
- Ensures no metric regresses
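A baseline-relative constraint such as `planner_a_runtime_s < baseline planner_a_runtime_s` can be read as "compare the final value with the value captured in the baseline run". The following sketch shows that reading under stated assumptions; it is not Remoroo's parser, and check_relative is a hypothetical name.

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
# Matches "metric < baseline other_metric" (two-character operators first).
RELATIVE = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*baseline\s+(\w+)\s*$")


def check_relative(spec, baseline, final):
    """Evaluate each 'name op baseline ref' constraint; return {name: bool}."""
    results = {}
    for part in spec.split(","):
        match = RELATIVE.match(part)
        if match is None:
            raise ValueError(f"cannot parse constraint: {part!r}")
        name, op, ref = match.groups()
        # Final value on the left, baseline value on the right.
        results[name] = OPS[op](final[name], baseline[ref])
    return results
```

A run would count as successful only if every value in the returned dict is True, for example via `all(check_relative(spec, baseline, final).values())`.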
Understanding Artifacts
Every run creates a directory under your repo:
.remoroo/runs/<run-id>/
├── metrics.json # Final metric values
├── baseline_metrics.json # Before changes
├── final_report.md # What the agent did and why
├── final_patch.diff # Apply with: git apply ...
├── trace.jsonl # Step-by-step trace (v2)
├── checkpoint.json # Resume / inspection (v2)
├── system_diagram.md # When generated
└── ... # Other engine outputs as versions evolve
A cache copy may also appear under ~/.cache/remoroo/runs/<repo-name>/<run-id>/ depending on --out and sync behavior—prefer .remoroo/runs/<run-id>/ in the repo for day-to-day inspection.
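To inspect a run programmatically, you could compare metrics.json with baseline_metrics.json. The sketch below assumes both files are flat JSON objects mapping metric names to numbers; the actual schema is not documented on this page, so adjust the loading code to whatever your run directory contains. The name metric_deltas is hypothetical.

```python
import json
from pathlib import Path


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def metric_deltas(run_dir):
    """Return {metric: final - baseline} for numeric metrics present in both files."""
    run = Path(run_dir)
    baseline = json.loads((run / "baseline_metrics.json").read_text())
    final = json.loads((run / "metrics.json").read_text())
    return {
        name: final[name] - baseline[name]
        for name in final
        if name in baseline
        and _is_number(final[name])
        and _is_number(baseline[name])
    }
```

Called as `metric_deltas(".remoroo/runs/<run-id>")`, this gives the same kind of per-metric difference shown in the Progress column of the run summary.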
Applying the Patch
After a successful run:
cd your-repo
git apply .remoroo/runs/<run-id>/final_patch.diff
git diff # Review changes
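If you apply patches from a script, it can help to do a dry run first. The sketch below wraps `git apply --check`, a standard Git option that tests whether a patch applies without modifying any files. The function names are hypothetical, and the patch path is resolved relative to the repo directory.

```python
import subprocess


def build_apply_command(patch, check=False):
    """Build the git command; --check tests the patch without applying it."""
    cmd = ["git", "apply"]
    if check:
        cmd.append("--check")
    cmd.append(str(patch))
    return cmd


def apply_patch(repo, patch):
    """Dry-run the patch in repo, then apply it only if the dry run succeeds."""
    dry = subprocess.run(
        build_apply_command(patch, check=True),
        cwd=repo, capture_output=True, text=True,
    )
    if dry.returncode != 0:
        # Nothing was modified; report git's explanation to the caller.
        return False, dry.stderr
    real = subprocess.run(
        build_apply_command(patch),
        cwd=repo, capture_output=True, text=True,
    )
    return real.returncode == 0, real.stderr
```

A False result from the dry run leaves the working tree untouched, so you can resolve local changes before trying again.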
CLI Quick Reference
| Command | Description |
|---|---|
| remoroo run / remoroo run --local | Run locally (default; Docker or --engine venv) |
| remoroo run --resume RUN_ID | Attach to an existing run |
| remoroo list | List runs (--attachable for attach targets) |
| remoroo attach --id RUN_ID | Attach worker to a server run |
| remoroo abort RUN_ID | Abort run on control plane |
| remoroo run --repo PATH | Repository root |
| remoroo run --goal "..." / --metrics "..." | Non-interactive goal/metrics |
| remoroo run --budget HOURS | Wall-time budget (default 10h) |
| remoroo run --yes / --verbose / --no-patch | Confirmations, logging, patch prompt |
| remoroo worker --repo PATH | Standalone polling worker (advanced) |
| remoroo login / whoami / logout | Auth (~/.config/remoroo/credentials) |
Full flags: CLI Reference.
Troubleshooting
1. "Docker is not running"
Cannot connect to Docker daemon
Fix: Start Docker Desktop or:
sudo systemctl start docker # Linux
2. "Authentication required"
Fix: Run remoroo login and complete browser flow.
3. "Metric not met after max turns"
The agent couldn't satisfy your constraints.
Fixes:
- Check if the metric is actually achievable
- Simplify goals (optimize one thing at a time first)
- Review final_report.md to understand what was tried
4. "Patch failed to apply"
Your working tree has local changes that conflict with the patch.
Fix:
git stash
git apply .remoroo/runs/<run-id>/final_patch.diff
git stash pop
5. "Timeout exceeded"
Fixes:
- Ensure verification command runs quickly
- Check for infinite loops
- Reduce dataset/input sizes for faster iteration
Tips for Success
- Use Baseline-Relative Metrics: runtime_s < baseline runtime_s is more robust than hardcoded thresholds.
- Multi-Metric = Real Problems: Don't simplify to single metrics. Real constraints (accuracy AND speed) are what Remoroo handles best.
- Version Control: Always run in a git repo. git diff and git checkout . are your safety net.
- Check the Report: final_report.md explains the agent's reasoning, which is essential for understanding trade-offs.
- Start with Your Actual Code: Remoroo shines on real codebases, not synthetic examples.
Next Steps
- Why Remoroo? — Use cases and philosophy
- CLI Reference — Full command documentation
Ready? Run your first experiment:
remoroo run --local