6d0412536aa747f8e2c7a0df4843a8879bba0af3a93884619f09f3116d8c6968b40fe12813d57296e71c54cbf026f6bef2a2808414c14b459507d2d8a5de2632
The commits below record every pre-comparison deviation; all were committed to Git BEFORE any trajectory benchmark number was published:
- `c3c5129` — fixed a token-budget arithmetic error (50K→150K steps).
- `30c3b18` — reduced the lm-eval batch size from 8 to 2 to fit a V100's 16 GB.
- `892daaf` — three pre-trajectory-eval corrections: training `blockade_warmup_steps` 2000→200 (a silent arg-binding bug); the eval wrapper now matches the training ecology-warmup schedule per checkpoint; and the eval wrapper fixes a BOS-in-continuation bug (SentencePiece tokenizers only; the GPT-2 history is unaffected).
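The silent arg-binding failure behind the `blockade_warmup_steps` correction can be sketched as follows. This is a hypothetical reconstruction, not the project's actual code: the flag name is taken from the commit note, but the parser, config dict, and `merge` helper are illustrative. The core hazard is that an argparse default is indistinguishable from an explicitly passed value, so it silently wins over the config file.

```python
# Hypothetical sketch of a silent arg-binding bug: an argparse default (2000)
# overrides the config-file value (200) because the merge step cannot tell
# "user passed the flag" apart from "argparse filled in its default".
import argparse

CONFIG = {"blockade_warmup_steps": 200}  # intended value from the config file

def build_parser(use_concrete_default):
    p = argparse.ArgumentParser()
    if use_concrete_default:
        # Buggy pattern: a concrete default that shadows the config.
        p.add_argument("--blockade-warmup-steps", type=int, default=2000)
    else:
        # Fixed pattern: default=None means "flag not set".
        p.add_argument("--blockade-warmup-steps", type=int, default=None)
    return p

def merge(config, args):
    # Let the CLI override the config only when the flag was actually set.
    out = dict(config)
    for key, value in vars(args).items():
        if value is not None:
            out[key] = value
    return out

buggy = merge(CONFIG, build_parser(True).parse_args([]))
fixed = merge(CONFIG, build_parser(False).parse_args([]))
assert buggy["blockade_warmup_steps"] == 2000  # default silently wins
assert fixed["blockade_warmup_steps"] == 200   # config value survives
```

With `default=None`, an unset flag leaves the config untouched while an explicit `--blockade-warmup-steps 2000` still overrides it.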
All deviations were applied uniformly to the baseline and to every trajectory checkpoint, so the comparison remains apples-to-apples. Success criteria and failure signals are unchanged.
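The BOS-in-continuation bug noted above can be illustrated with a stub tokenizer. This is a minimal sketch, not the eval wrapper's code: `StubTokenizer`, `BOS_ID`, and `split_for_scoring` are invented names, and the one-id-per-character encoding merely stands in for a SentencePiece-style tokenizer that prepends BOS by default. The point is that when scoring a (context, continuation) pair, tokenizing the continuation on its own must not prepend BOS, or the first continuation token is scored against the wrong id.

```python
# Sketch of the BOS-in-continuation bug with a toy tokenizer (hypothetical
# names). A SentencePiece-style encode() prepends BOS by default; the
# continuation half of a scoring request must be encoded without it.
BOS_ID = 1

class StubTokenizer:
    """Toy stand-in: one id per character; add_bos mimics SentencePiece."""
    def encode(self, text, add_bos=True):
        ids = [ord(c) for c in text]
        return [BOS_ID] + ids if add_bos else ids

def split_for_scoring(tok, context, continuation):
    # Buggy version: BOS leaks into the continuation ids.
    buggy = (tok.encode(context), tok.encode(continuation))
    # Fixed version: only the context carries BOS.
    fixed = (tok.encode(context), tok.encode(continuation, add_bos=False))
    return buggy, fixed

tok = StubTokenizer()
buggy, fixed = split_for_scoring(tok, "The sky is", " blue")
assert buggy[1][0] == BOS_ID   # bug: stray BOS heads the continuation
assert fixed[1][0] != BOS_ID   # fix: continuation ids start at the text
```

A GPT-2-style tokenizer that never prepends BOS is unaffected by this path, which matches the note that only SentencePiece histories needed the correction.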
| Metric | Base Gemma 3 270M |
|---|---|
| _Baseline evaluation pending…_ | — |