Autonomous engineering for deep tech teams
Remoroo runs autonomous research on your code locally, overnight. It edits, tests, evaluates, keeps or reverts. You wake up to better results.
▸ remoroo-session · autoresearch/mar31
❯
▸ Reading program.md… ✓
▸ Baseline: val_bpb = 2.2396 (commit 9138841)
▸ Time budget: 20 min per experiment
▸ 30 experiments · 8 kept · 22 discarded
▸ val_bpb: 2.2396 → 1.5484 (31% lower)
▸ Verdict: VERIFIED · REPRODUCIBLE
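The "31% lower" figure follows directly from the two val_bpb values in the session summary. A small sketch of that arithmetic is below; the helper name `relative_reduction` is illustrative and not part of Remoroo.

```python
def relative_reduction(baseline: float, final: float) -> float:
    """Fraction by which `final` is lower than `baseline` (lower is better)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - final) / baseline


# Values taken from the session summary above.
headline = relative_reduction(2.2396, 1.5484)
print(f"{headline:.0%}")  # prints "31%"
```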
The reality of manual ML research
Without Remoroo
$ vim train.py
> tweak learning_rate=3e-4
$ uv run train.py
> wait 60 minutes…
> val_bpb: 2.24 (no change)
> try batch_size 2^15…
> wait 60 more minutes…
> NaN loss.
$ git checkout .
2 hours. 0 progress.
no verdict. no structure.
no proof.
With Remoroo
$ remoroo run --local
program.md
▸ 30 experiments completed
▸ 8 kept · 22 discarded
▸ val_bpb: 2.24 → 1.55
▸ VERIFIED · REPRODUCIBLE
You slept through it.
How it works
Write a program.md. Remoroo runs experiments overnight.
▸ remoroo-session · autoresearch
❯
Spec program.md (TIME_BUDGET=1200, metric: val_bpb)
File train.py (model, optimizer, training loop)
Eval prepare.py → evaluate_bpb (fixed, untouchable)
Plan → Edit → Train → Evaluate (val_bpb vs baseline)
train.py
- ATTN_PATTERN = "L" * DEPTH
+ ATTN_PATTERN = "SSSL"
Billed: 1 credit (8h experiment)
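The keep-or-revert behaviour described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Remoroo's actual implementation: `run_search`, `SearchResult`, and the experiment names are hypothetical, each experiment is modelled as a callable returning its val_bpb, and a change is kept only if it produces a finite metric strictly lower than the best so far.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SearchResult:
    best_metric: float
    kept: List[str] = field(default_factory=list)
    discarded: List[str] = field(default_factory=list)


def run_search(
    baseline: float,
    experiments: List[Tuple[str, Callable[[], float]]],
) -> SearchResult:
    """Keep an experiment only if it beats the best metric so far (lower is better)."""
    result = SearchResult(best_metric=baseline)
    for name, run_experiment in experiments:
        try:
            metric = run_experiment()
        except Exception:
            # Assumption: a crashed run counts as a failed experiment.
            metric = math.nan
        if math.isfinite(metric) and metric < result.best_metric:
            result.best_metric = metric
            result.kept.append(name)
        else:
            # NaN loss, a crash, or no improvement: discard (revert) the change.
            result.discarded.append(name)
    return result


# Hypothetical experiments with made-up metric values, for illustration only.
experiments = [
    ("banded-attention", lambda: 2.10),
    ("bigger-batch", lambda: float("nan")),
    ("lr-warmup", lambda: 2.30),
]
result = run_search(2.2396, experiments)
print(result.kept, result.discarded, result.best_metric)
```

In a real setup each callable would apply an edit to train.py, train within the time budget, and score the result with the fixed evaluation function, with git handling the revert.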
Verified results
LR SCHEDULE SEARCH
val_bpb
2.24 → 1.99
11% lower
train.py
14 experiments · 6 kept
VERIFIED
ARCHITECTURE SEARCH
val_bpb
2.24 → 1.55
banded attn (SSSL)
train.py
30 experiments · 8 kept
VERIFIED
MULTI-OBJECTIVE
val_bpb + memory
3 constraints → all passed
train.py
22 experiments · 5 kept
VERIFIED
Not a coding agent.
An autonomous research engine.
| | Coding Agents | Remoroo |
|---|---|---|
| Time scale | Seconds | Hours to overnight |
| Task scope | Fix one bug | 30-experiment search |
| Execution | None / one-shot | Sandboxed, time-budgeted |
| Metric evaluation | None | Fixed eval harness |
| Keep / discard | Human decides | Autonomous, metric-based |
| Failure handling | Retry prompt | Case-based recovery |
| Output | Suggested code | Verified patch + proof |
| Reproducibility | None | Artifact replay + git |
| Billing | Per token/seat | Per 10h experiment |