T³ v3.5 → Gemma 3 270M — Transfer Experiment [HALTED]

Halted 2026-04-25 — training data corrupted at load (uint16 read of uint32 shards). Architecture not implicated. Run is void; see prereg banner for full post-mortem.
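For readers unfamiliar with the corruption mode: a minimal sketch of the uint16-vs-uint32 failure, assuming flat binary shards of token IDs (the file name and loader below are illustrative, not the run's actual data pipeline). Reading uint32 tokens as uint16 splits every token into two bogus IDs and doubles the apparent token count:

```python
import numpy as np

# Hypothetical shard: four token IDs written as uint32, as the corpus actually was.
tokens = np.array([262144, 7, 90210, 3], dtype=np.uint32)
tokens.tofile("shard_000.bin")

# Correct read.
good = np.fromfile("shard_000.bin", dtype=np.uint32)
print(good)  # [262144, 7, 90210, 3]

# The bug: the same bytes read as uint16. On a little-endian machine each
# uint32 token becomes its low and high 16-bit halves, so the loader sees
# twice as many "tokens", nearly all of them wrong.
bad = np.fromfile("shard_000.bin", dtype=np.uint16)
print(bad)   # [0, 4, 7, 0, 24674, 1, 3, 0]
```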
Hypothesis: T³ transfer crosses base Gemma reasoning composite before 75% of training, without regressing on knowledge or multilingual coverage.
Pre-registration: github.com/GMaN1911/t3-gemma-transfer · frozen 2026-04-22 10:44 MDT · training started 2026-04-22 11:45 MDT
SHA-256 (frozen): 6d0412536aa747f8e2c7a0df4843a8879bba0af3a93884619f09f3116d8c6968
SHA-256 (current): b40fe12813d57296e71c54cbf026f6bef2a2808414c14b459507d2d8a5de2632
The current hash differs from the frozen hash because of pre-comparison deviations, all committed to Git BEFORE any trajectory benchmark number is published:
(1) c3c5129: fixed a token-budget arithmetic error (50K→150K steps).
(2) 30c3b18: reduced the lm-eval batch size 8→2 to fit a 16 GB V100.
(3) 892daaf: three pre-trajectory-eval corrections: training blockade_warmup_steps 2000→200 (silent arg-binding bug); the eval wrapper now matches the training ecology-warmup schedule per checkpoint; the eval wrapper fixes a BOS-in-continuation bug (SentencePiece tokenizers only; GPT-2 history unaffected).
All deviations were applied uniformly to the baseline and every trajectory checkpoint (apples-to-apples preserved). Success criteria and failure signals are unchanged.
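On the BOS-in-continuation item in commit 892daaf: a minimal sketch of the bug and the shape of the fix, assuming a Hugging Face tokenizer and lm-eval-style context/continuation scoring (the model ID and variable names are illustrative; this is not the run's eval wrapper code).

```python
from transformers import AutoTokenizer

# SentencePiece-style tokenizer that prepends BOS by default (model ID assumed).
tok = AutoTokenizer.from_pretrained("google/gemma-3-270m")

context, continuation = "Question: 2 + 2 = ?\nAnswer:", " 4"

# Buggy concatenation: tokenizing the continuation on its own inserts a second
# BOS in the middle of the scored sequence, skewing the loglikelihood.
buggy = tok(context)["input_ids"] + tok(continuation)["input_ids"]

# Fix: encode the continuation without special tokens (equivalently, strip a
# leading BOS). GPT-2-style tokenizers add no BOS, so their results are unchanged.
fixed = tok(context)["input_ids"] + tok(continuation, add_special_tokens=False)["input_ids"]

assert tok.bos_token_id in buggy[1:]      # stray BOS mid-sequence
assert tok.bos_token_id not in fixed[1:]  # clean context + continuation
```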
Pre-registered failure signals (published here if observed): (1) all 8 reasoning benchmarks track val PPL monotonically; (2) no sigma differentiation inflection by 50% of training; (3) reasoning and knowledge benchmarks move together.
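A minimal sketch of one way failure signal (1) could be checked per benchmark, assuming per-checkpoint arrays of val PPL and benchmark scores (the numbers and the rank-correlation threshold below are illustrative, not the pre-registered definition):

```python
from scipy.stats import spearmanr

# Illustrative placeholders: val PPL and one benchmark's accuracy at the
# seven trajectory checkpoints (25% ... 100% of training).
val_ppl   = [18.2, 15.1, 13.4, 12.6, 12.1, 11.8, 11.6]
bench_acc = [0.31, 0.33, 0.35, 0.36, 0.37, 0.38, 0.38]

# Signal (1) fires if every reasoning benchmark simply tracks val PPL:
# a near-perfect rank correlation between falling PPL and rising accuracy.
rho, _ = spearmanr(val_ppl, bench_acc)
print(f"Spearman rho = {rho:.2f}; tracks PPL monotonically: {abs(rho) > 0.95}")
```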
Run status panel: step (of 150,000) · progress · training PPL · LR · last update

Training loss · Ultimate Mix+

Training PPL on the Ultimate Mix+ tokenized corpus. This is NOT the held-out benchmark eval — those land on the right when trajectory checkpoints complete.
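For orientation, training PPL is the exponential of the plotted cross-entropy loss; a minimal sketch, assuming the standard convention of exponentiating the mean per-token loss in nats (the run's exact aggregation window is an assumption):

```python
import math

mean_nll = 2.45                  # mean per-token cross-entropy (nats) over a logging window, illustrative
train_ppl = math.exp(mean_nll)   # perplexity = exp(mean negative log-likelihood)
print(f"PPL = {train_ppl:.2f}")  # ~11.59
```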

Benchmark trajectory vs frozen baseline

Base Gemma 3 270M evaluated ONCE on a locked lm-eval-harness config — that's the horizontal reference. T³ checkpoints land here at 25 / 37.5 / 50 / 62.5 / 75 / 87.5 / 100% of training.
Metric | Base Gemma 3 270M
Baseline evaluation pending…
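For reference, the checkpoint fractions translate to concrete optimizer steps out of the 150,000-step budget; a minimal sketch (the arithmetic follows directly from the schedule above, and the printout format is illustrative):

```python
TOTAL_STEPS = 150_000
fractions = [0.25, 0.375, 0.50, 0.625, 0.75, 0.875, 1.0]

# 25% -> step 37,500 ... 100% -> step 150,000
for f in fractions:
    print(f"{f:6.1%}  step {int(TOTAL_STEPS * f):>7,}")
```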

Synthesis feed

Tier-2 synthesis digests (~15-30 min cadence) summarizing training dynamics and trajectory signals. Generated automatically; no ecology internals exposed.