Autonomous engineering for deep tech teams

Remoroo runs autonomous research on your code locally, overnight. It edits, tests, and evaluates each change, then keeps or reverts it. You wake up to better results.

▸ remoroo-session · autoresearch/mar31
Reading program.md…
Baseline: val_bpb = 2.2396 (commit 9138841)
Time budget: 20 min per experiment
30 experiments · 8 kept · 22 discarded
val_bpb: 2.2396 → 1.5484 (31% lower)
Verdict: VERIFIED · REPRODUCIBLE
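
The "31% lower" figure follows directly from the baseline and final val_bpb values in the session summary. A minimal sketch of that arithmetic (the helper name `relative_reduction` is ours, not part of Remoroo):

```python
def relative_reduction(baseline: float, final: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to its baseline."""
    return (baseline - final) / baseline


baseline_bpb = 2.2396  # baseline val_bpb from the session summary
final_bpb = 1.5484     # final val_bpb from the session summary

reduction = relative_reduction(baseline_bpb, final_bpb)
print(f"val_bpb: {baseline_bpb} -> {final_bpb} ({reduction:.0%} lower)")
```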

The reality of manual ML research

Without Remoroo
$ vim train.py
> tweak learning_rate=3e-4
$ uv run train.py
> wait 60 minutes…
> val_bpb: 2.24 (no change)
> try batch_size 2^15…
> wait 60 more minutes…
> NaN loss.
$ git checkout .
2 hours. 0 progress.
no verdict. no structure.
no proof.
With Remoroo
$ remoroo run --local program.md
▸ 30 experiments completed
▸ 8 kept · 22 discarded
▸ val_bpb: 2.24 → 1.55
▸ VERIFIED · REPRODUCIBLE
You slept through it.

How it works

Write a program.md. Remoroo runs experiments overnight.

▸ remoroo-session · autoresearch
Spec: program.md (TIME_BUDGET=1200, metric: val_bpb)
File: train.py (model, optimizer, training loop)
Eval: prepare.py → evaluate_bpb (fixed, untouchable)
Plan → Edit → Train → Evaluate (val_bpb vs baseline)
train.py
- ATTN_PATTERN = "L" * DEPTH
+ ATTN_PATTERN = "SSSL"
Billed: 1 credit (8h experiment)
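
The Plan → Edit → Train → Evaluate cycle, with its keep-or-revert decision, can be pictured as a greedy search over a lower-is-better metric. The sketch below is an illustration under our own assumptions, not Remoroo's implementation: `run_search`, `Outcome`, and `toy_evaluate` are hypothetical names, configurations are plain dictionaries rather than edits to train.py, and the scoring function is a made-up stand-in for a real training run.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


@dataclass
class Outcome:
    name: str
    metric: float
    kept: bool


def run_search(
    baseline_config: Config,
    proposals: List[Tuple[str, Config]],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float, List[Outcome]]:
    """Greedy keep-or-revert search over a lower-is-better metric."""
    best_config = dict(baseline_config)
    best_metric = evaluate(best_config)  # baseline measurement
    log: List[Outcome] = []

    for name, change in proposals:
        # Edit: apply the proposed change on top of the best state so far.
        candidate = {**best_config, **change}
        # Train + evaluate: the same fixed evaluation is used for every run.
        metric = evaluate(candidate)
        # Keep only finite, strictly better results; everything else is reverted.
        kept = math.isfinite(metric) and metric < best_metric
        if kept:
            best_config, best_metric = candidate, metric
        log.append(Outcome(name, metric, kept))

    return best_config, best_metric, log


def toy_evaluate(config: Config) -> float:
    """Made-up scoring function standing in for training; not real results."""
    if config["batch_size"] > 2**14:
        return float("nan")  # mimic a diverged run
    score = 2.0
    if config["attn_pattern"] == "SSSL":
        score -= 0.5
    if config["lr"] == 1e-3:
        score -= 0.1
    return score


baseline = {"lr": 3e-4, "batch_size": 2**13, "attn_pattern": "LLLL"}
proposals = [
    ("larger batch", {"batch_size": 2**15}),
    ("banded attention", {"attn_pattern": "SSSL"}),
    ("higher learning rate", {"lr": 1e-3}),
    ("back to full attention", {"attn_pattern": "LLLL"}),
]

best_config, best_metric, log = run_search(baseline, proposals, toy_evaluate)
for outcome in log:
    status = "kept" if outcome.kept else "discarded"
    print(f"{outcome.name}: {outcome.metric:.2f} ({status})")
```

Because each candidate is built from the best state so far, a discarded experiment leaves no trace in later runs, which is the property the keep-or-revert step is meant to provide.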

Verified results

LR SCHEDULE SEARCH
val_bpb: 2.24 → 1.99 (11% lower)
train.py · 14 experiments · 6 kept
VERIFIED
ARCHITECTURE SEARCH
val_bpb: 1.55 · banded attn (SSSL)
train.py · 30 experiments · 8 kept
VERIFIED
MULTI-OBJECTIVE
val_bpb + memory: 3 constraints · all passed
train.py · 22 experiments · 5 kept
VERIFIED
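
In a multi-objective run, a candidate only counts if every constraint passes. A minimal sketch of such a gate follows. The page does not say what the three constraints are, so the ones below (an improved val_bpb, a peak-memory cap, a finite metric) are hypothetical, as are the numbers and the names `Constraint`, `evaluate_constraints`, and `accept`.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class Constraint:
    name: str
    check: Callable[[Metrics], bool]


def evaluate_constraints(metrics: Metrics, constraints: List[Constraint]) -> Dict[str, bool]:
    """Return a pass/fail flag per constraint, keyed by constraint name."""
    return {c.name: c.check(metrics) for c in constraints}


def accept(metrics: Metrics, constraints: List[Constraint]) -> bool:
    """A candidate is accepted only if all constraints pass."""
    return all(evaluate_constraints(metrics, constraints).values())


# Hypothetical baseline and limits, chosen only for illustration.
baseline_metrics = {"val_bpb": 2.0, "peak_mem_gb": 40.0}
constraints = [
    Constraint("val_bpb improves", lambda m: m["val_bpb"] < baseline_metrics["val_bpb"]),
    Constraint("memory within cap", lambda m: m["peak_mem_gb"] <= 48.0),
    Constraint("metric is finite", lambda m: math.isfinite(m["val_bpb"])),
]

good = {"val_bpb": 1.9, "peak_mem_gb": 44.0}
too_big = {"val_bpb": 1.8, "peak_mem_gb": 60.0}

print(evaluate_constraints(good, constraints))
print(accept(good, constraints), accept(too_big, constraints))
```

Reporting a flag per constraint, rather than a single boolean, makes it clear which limit a discarded candidate violated.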

Not a coding agent.

An autonomous research engine.

                    Coding Agents      Remoroo
Time scale          Seconds            Hours to overnight
Task scope          Fix one bug        30-experiment search
Execution           None / one-shot    Sandboxed, time-budgeted
Metric evaluation   None               Fixed eval harness
Keep / discard      Human decides      Autonomous, metric-based
Failure handling    Retry prompt       Case-based recovery
Output              Suggested code     Verified patch + proof
Reproducibility     None               Artifact replay + git
Billing             Per token/seat     Per 10h experiment
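
"Time-budgeted" execution, together with the TIME_BUDGET=1200 setting in the spec, suggests a hard wall-clock limit on each experiment. One way to picture that, using only Python's standard library, is sketched below. It is an assumption about the general technique, not a description of how Remoroo enforces its budget; `run_with_budget` is a hypothetical helper and the two commands are placeholders for a real training run.

```python
import subprocess
import sys
from typing import List, Optional

TIME_BUDGET_SECONDS = 1200  # TIME_BUDGET=1200 from the spec, i.e. 20 minutes


def run_with_budget(
    cmd: List[str], budget_seconds: float
) -> Optional[subprocess.CompletedProcess]:
    """Run cmd; return None if it exceeds the wall-clock budget."""
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=budget_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising on timeout.
        return None


# Placeholder commands with short budgets so the example finishes quickly.
fast = run_with_budget([sys.executable, "-c", "print('ok')"], budget_seconds=30)
slow = run_with_budget(
    [sys.executable, "-c", "import time; time.sleep(5)"], budget_seconds=0.5
)

print("fast run finished:", fast is not None)
print("slow run finished:", slow is not None)
```

A run that overruns its budget would be treated like any other failed experiment and discarded.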

It didn't guess. It proved.

Install in 30 seconds.

Your first successful run is free.