Autonomous engineering for deep tech teams

Remoroo runs autonomous research on your code locally, overnight. It edits, tests, and evaluates, then keeps or reverts each change. You wake up to better results.

Read the docs →

▸ remoroo-session · autoresearch/mar31

▸ Reading program.md… ✓

▸ Baseline: val_bpb = 2.2396 (commit 9138841)

▸ Time budget: 20 min per experiment

▸ 30 experiments · 8 kept · 22 discarded

▸ val_bpb: 2.2396 → 1.5484 (31% lower)

▸ Verdict: VERIFIED · REPRODUCIBLE
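
The figures in the session above can be checked by hand. A minimal sketch of the arithmetic, assuming the experiments run one after another and each uses its full 20-minute budget (the page states neither):

```python
# Numbers copied from the session summary above.
baseline = 2.2396
final = 1.5484

# Relative reduction in val_bpb (lower is better).
relative_drop = (baseline - final) / baseline

# Assumption: experiments run sequentially and each uses the whole budget.
experiments = 30
minutes_per_experiment = 20
total_hours = experiments * minutes_per_experiment / 60

print(f"{relative_drop:.0%} lower")   # 31% lower
print(f"{total_hours:.0f} hours")     # 10 hours
```

Under those assumptions the run takes about ten hours, which is where "overnight" comes from.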

The reality of manual ML research

Without Remoroo

$ vim train.py

> tweak learning_rate=3e-4

$ uv run train.py

> wait 60 minutes…

> val_bpb: 2.24 (no change)

> try batch_size 2^15…

> wait 60 more minutes…

> NaN loss.

$ git checkout .

2 hours. 0 progress.

no verdict. no structure.

no proof.

With Remoroo

$ remoroo run --local program.md

▸ 30 experiments completed

▸ 8 kept · 22 discarded

▸ val_bpb: 2.24 → 1.55

▸ VERIFIED · REPRODUCIBLE

You slept through it.

How it works

Write a spec (e.g. program.md). Point Remoroo at it, and it runs experiments overnight.

▸ remoroo-session · autoresearch

Spec program.md (TIME_BUDGET=1200, metric: val_bpb)

File train.py (model, optimizer, training loop)

Eval prepare.py → evaluate_bpb (fixed, untouchable)

Plan → Edit → Train → Evaluate
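
The four stages listed above (Plan, Edit, Train, Evaluate), together with the keep-or-revert rule, can be sketched as a greedy loop. This is an illustration only, not Remoroo's implementation: `run_search`, `propose`, and `evaluate` are hypothetical names, the toy metric stands in for a real training run, and the per-experiment time budget is left out for brevity.

```python
import math
import random
from typing import Callable


def run_search(
    baseline_config: dict,
    propose: Callable[[dict, random.Random], dict],
    evaluate: Callable[[dict], float],
    n_experiments: int,
    seed: int = 0,
) -> tuple[dict, float, int]:
    """Greedy keep-or-revert search. Lower metric is better."""
    rng = random.Random(seed)
    best_config = dict(baseline_config)
    best_score = evaluate(best_config)          # baseline measurement
    kept = 0
    for _ in range(n_experiments):
        candidate = propose(best_config, rng)   # Plan + Edit
        score = evaluate(candidate)             # Train + Evaluate
        # Keep only strict improvements. A NaN score compares False here,
        # so a diverged run is discarded automatically.
        if score < best_score:
            best_config, best_score = candidate, score
            kept += 1
        # Otherwise: revert, i.e. best_config stays as it was.
    return best_config, best_score, kept


# Toy stand-ins (hypothetical): halve or double the learning rate,
# and pretend the metric is minimised at lr = 1e-3.
def toy_propose(config: dict, rng: random.Random) -> dict:
    new = dict(config)
    new["lr"] = config["lr"] * rng.choice([0.5, 2.0])
    return new


def toy_evaluate(config: dict) -> float:
    return abs(math.log10(config["lr"]) + 3.0)


start = {"lr": 3e-4}
best, score, kept = run_search(start, toy_propose, toy_evaluate, n_experiments=30)
print(best, round(score, 3), f"{kept} kept · {30 - kept} discarded")
```

Because the evaluation function is passed in and never modified by the loop, every candidate is scored against the same yardstick, which mirrors the "fixed, untouchable" eval in the spec.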

train.py

- ATTN_PATTERN = "L" * DEPTH

+ ATTN_PATTERN = "SSSL"
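
One way to read the diff above: each character of `ATTN_PATTERN` selects the attention type for one layer, and a short pattern repeats across the depth. The sketch below assumes that "S" means a short, banded (sliding-window) layer, that "L" means a long, full-attention layer, and that the pattern tiles over `DEPTH`. The page does not spell out any of this, and `expand_pattern` and the depth value are hypothetical.

```python
def expand_pattern(pattern: str, depth: int) -> list[str]:
    """Tile a per-layer attention pattern across `depth` layers.

    Assumed meaning: "S" = short/banded window, "L" = long/full attention.
    """
    if not pattern or set(pattern) - {"S", "L"}:
        raise ValueError("pattern must be a non-empty string of 'S' and 'L'")
    return [pattern[i % len(pattern)] for i in range(depth)]


DEPTH = 8  # hypothetical depth, chosen only for the example

before = expand_pattern("L" * DEPTH, DEPTH)
after = expand_pattern("SSSL", DEPTH)

print("".join(before))  # LLLLLLLL
print("".join(after))   # SSSLSSSL
```

Under this reading, the edit swaps three of every four full-attention layers for banded ones.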

Illustrative billing · credits = Haiku-hour units (× model tier — see Pricing)

Verified results

LR SCHEDULE SEARCH

val_bpb

2.24 → 1.99

11% lower

train.py

14 experiments · 6 kept

VERIFIED

ARCHITECTURE SEARCH

val_bpb

2.24 → 1.55

banded attn (SSSL)

train.py

30 experiments · 8 kept

VERIFIED

MULTI-OBJECTIVE

val_bpb + memory

3 constraints → all passed

train.py

22 experiments · 5 kept

VERIFIED
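
The multi-objective card can be read as constrained selection: a candidate is kept only when every constraint holds and the primary metric improves. A minimal sketch under that reading; the constraint names and limits are made up for illustration, since the page names only val_bpb and memory.

```python
from typing import Callable, Mapping

Metrics = Mapping[str, float]
Constraint = Callable[[Metrics], bool]


def should_keep(
    candidate: Metrics,
    best: Metrics,
    constraints: Mapping[str, Constraint],
    metric: str = "val_bpb",
) -> bool:
    """Keep a candidate only if all constraints pass and the metric drops."""
    all_passed = all(check(candidate) for check in constraints.values())
    return all_passed and candidate[metric] < best[metric]


# Hypothetical constraints and limits, for illustration only.
constraints: dict[str, Constraint] = {
    "peak_memory": lambda m: m["peak_mem_gb"] <= 40.0,
    "step_time": lambda m: m["sec_per_step"] <= 0.5,
    "finite_loss": lambda m: m["val_bpb"] == m["val_bpb"],  # False for NaN
}

best = {"val_bpb": 2.24, "peak_mem_gb": 36.0, "sec_per_step": 0.40}
good = {"val_bpb": 2.10, "peak_mem_gb": 38.0, "sec_per_step": 0.45}
too_big = {"val_bpb": 1.90, "peak_mem_gb": 48.0, "sec_per_step": 0.45}

print(should_keep(good, best, constraints))     # True
print(should_keep(too_big, best, constraints))  # False: better metric, over memory
```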

Explore all use cases →

Most tools suggest changes.

Remoroo runs them.

	Coding Agents	Remoroo
Time scale	Seconds	Hours to overnight
Task scope	Fix one bug	30-experiment search
Execution	None / one-shot	Sandboxed, time-budgeted
Metric evaluation	None	Fixed eval harness
Keep / discard	Human decides	Autonomous, metric-based
Failure handling	Retry prompt	Case-based recovery
Output	Suggested code	Verified patch + proof
Reproducibility	None	Artifact replay + git
Billing	Per token/seat	Run wall time in credits (Haiku-hour units)
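
The Billing row describes credits as run wall time scaled by model tier. A rough sketch of that formula, assuming one credit equals one hour of wall time at the base tier; the multiplier values used here are placeholders, not real prices (see Pricing for those), and `run_credits` is a hypothetical helper.

```python
def run_credits(wall_seconds: float, tier_multiplier: float = 1.0) -> float:
    """Credits for one run: wall-clock hours times a model-tier multiplier.

    Assumption: 1 credit = 1 hour at the base tier (multiplier 1.0).
    """
    if wall_seconds < 0 or tier_multiplier <= 0:
        raise ValueError("wall_seconds must be >= 0 and tier_multiplier > 0")
    return wall_seconds / 3600 * tier_multiplier


ten_hours = 10 * 3600
print(run_credits(ten_hours))       # 10.0 at the base tier
print(run_credits(ten_hours, 2.5))  # 25.0 with a placeholder 2.5x tier
```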

Try Remoroo on a real task.

Install in 30 seconds.

Free tier includes monthly run credits — see Pricing.

Read the docs →