Zero retraining. Eighteen seconds per chunk on a Mac Studio CPU. The optimum we measure here independently re-discovers a kernel parameter band Hypernym had previously identified through unrelated internal research — different model, different context length, same kernel.
Sweep the kernel parameter across its full operating range on Qwen2.5-0.5B-Instruct and measure perplexity on three held-out text chunks. WikiText benefits from gating up to a measurable peak, then falls off a cliff; code and PRISMA narrative degrade monotonically. Different content wants different positions on the curve.
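Concretely, the perplexity number for each chunk is the exponentiated mean negative log-likelihood of its tokens. A minimal sketch, with the model and kernel abstracted away (it starts from per-token log-probs, which is also what the databases below store per position):

```python
import math

def perplexity(token_logprobs):
    """PPL = exp(-mean log P(token_i | context)); lower is better."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Sanity check: a uniform distribution over V tokens gives PPL = V.
V = 50
print(round(perplexity([math.log(1.0 / V)] * 10)))  # → 50
```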
Different content types have different optimal points on the kernel curve. Natural English wants gating; structured code and tightly-bound narrative want softmax. The trained Qwen2.5-0.5B weights tolerate substantial deformation on WikiText because the natural distractor density gives the gate something to skip.
Hypernym had previously identified a kernel parameter band through earlier internal research on a larger model at long context against WikiText-2 perplexity. The benchmark we just ran is a fresh measurement on a smaller model (Qwen 2.5-0.5B), at a shorter context length (256 tokens), against the same content.
Insert a 5-digit passkey at varying depths (256 / 512 / 1024 tokens) and positions (25% / 50% / 75% of context), ask the model to recall it, score retrieval accuracy. Three samples per cell, five substrate variants.
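The retrieval protocol can be sketched as below. The filler text, prompt wording, and model interface are illustrative stand-ins, not the benchmark's actual strings; words stand in for tokens:

```python
import re

FILLER = "The grass is green. The sky is blue. The sun is yellow. "

def build_context(passkey: str, depth_tokens: int, position: float) -> str:
    """Place the passkey sentence at fraction `position` (0..1) within
    roughly `depth_tokens` words of filler."""
    words = (FILLER * (depth_tokens // 10 + 1)).split()[:depth_tokens]
    insert_at = int(len(words) * position)
    needle = f"The passkey is {passkey}. Remember it."
    return " ".join(words[:insert_at]) + " " + needle + " " + " ".join(words[insert_at:])

def score(model, passkey: str, depth: int, position: float, n_samples: int = 3) -> float:
    """Accuracy over n_samples trials: did the answer contain the passkey?"""
    hits = 0
    for _ in range(n_samples):
        ctx = build_context(passkey, depth, position)
        answer = model(ctx + " What is the passkey?")
        hits += bool(re.search(re.escape(passkey), answer))
    return hits / n_samples

# Stub model that echoes any 5-digit number it sees (a perfect retriever).
def echo_model(prompt: str) -> str:
    m = re.search(r"\b\d{5}\b", prompt)
    return m.group() if m else "the"

print(score(echo_model, "42917", depth=256, position=0.5))  # → 1.0
```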
The PPL benchmark and the LITM benchmark are independent metrics, and each measures something different: perplexity is a distribution-shape test; passkey retrieval is a binding test. Both find the same edge:
WikiText delta from softmax: −1.07% at the peak → +9.9% past the threshold. Code: +3.2% → +31.3%. PRISMA: +2.4% → +30.1%. Past the threshold, the gate begins pruning the head of the distribution rather than the tail; gating becomes damage.
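Every delta here is the percent change in mean PPL relative to the softmax baseline on the same dataset. A small helper; the 23.81 input is an illustrative value consistent with the WikiText baseline and peak, not a number from the raw tables:

```python
def delta_vs_softmax(ppl_kernel: float, ppl_softmax: float) -> float:
    """Percent PPL change relative to the softmax baseline (negative = better)."""
    return 100.0 * (ppl_kernel - ppl_softmax) / ppl_softmax

# With the WikiText softmax baseline of 24.07 PPL, a kernel PPL of 23.81
# sits near the measured -1.07% peak.
print(round(delta_vs_softmax(23.81, 24.07), 2))  # → -1.08
```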
The cliff-regime kernel retrieves perfectly at depths 512 and 1024 but only 66.7% of the time at depth 256 with a mid-context needle. The aggressive gate prunes the position holding the passkey when the context is short and the needle is in the middle: exactly the LITM phenomenon.
Two independent metrics, same kernel parameter. The kernel polynomial crosses zero at a specific structural point. Past that point, attention loses both shape (PPL) and binding (LITM) coherence. The geometric structure determines the operational ceiling, not an engineering choice.
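One way to see the head-vs-tail pruning mechanically: model gating as a threshold on attention scores relative to the max, applied before normalization. This is an illustrative gate, not the kernel polynomial benchmarked here, and the parameter name `tau` is an assumption:

```python
import numpy as np

def gated_softmax(scores: np.ndarray, tau: float) -> np.ndarray:
    """Softmax over scores, zeroing positions more than `tau` below the max
    before normalizing. Large tau recovers plain softmax; a moderate tau
    prunes only the low-score tail (distractors); an aggressive (small) tau
    starts deleting mid-mass positions -- the head of the distribution."""
    mask = scores >= scores.max() - tau
    e = np.exp(scores - scores.max()) * mask
    return e / e.sum()

scores = np.array([4.0, 3.5, 3.0, 0.1, -2.0])  # two distractors in the tail

moderate = gated_softmax(scores, tau=3.0)    # drops only the two distractors
aggressive = gated_softmax(scores, tau=0.7)  # also drops a head position

print(np.count_nonzero(moderate), np.count_nonzero(aggressive))  # → 3 2
```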
Photonic γ=0.5 — the 1D dielectric stack we benchmarked — preserves top-1 in greedy generation of simple prompts ("Paris" comes out correctly). But on perplexity it's catastrophic (+778% to +2119% above softmax). On passkey retrieval it's worse still — 7.4% accuracy across 27 trials, predicting " the" almost everywhere.
All substrates × all datasets × 4 chunks each (mean PPL). The sanity row shows softmax = QD τ=0 to within floating-point noise; intermediate kernel rows are ordered by increasing gate strength (Δ = change vs the softmax baseline).

| Substrate | WikiText | Python | PRISMA |
|---|---:|---:|---:|
| Softmax baseline (mean PPL) | 24.07 | 13.36 | 46.88 |
| QD τ=0 (sanity, Δ) | 0.000% | 0.000% | 0.000% |
| Gated kernel (Δ) | −0.143% | +0.076% | +0.051% |
| Gated kernel (Δ) | −0.529% | +0.379% | +0.257% |
| Gated kernel (Δ) | −0.931% | +0.998% | +0.684% |
| Gated kernel, peak (Δ) | −1.073% ★ | +3.198% | +2.408% |
| Gated kernel, past threshold (Δ) | +9.897% | +31.31% | +30.07% |
| Photonic γ=0.5 (Δ) | +1052% | +2119% | +778% |
The benchmark databases are SQLite files in `spectral_engine/bass_attention/cavity_validation/hypercircuits/`. Every row is queryable.
- `substrate_ppl.db` — 96 rows in `substrate_ppl`, 24,576 rows of per-position log-probs
- `substrate_litm.db` — 135 rows of passkey retrieval trials with predicted top-1 + log P(correct)
- `per_token_full_model.db` — 600K rows of per-token attention dynamics across all 24 layers × 14 heads
- `meep_validation_full.db` — 1,792 cells of MIT MEEP FDTD vs in-venv photonic adapter vs softmax
- `coupling_sweep.db` — modality-coupling-parameter sweep across 5 substrates
- `cavity_transformer.py`
Every benchmark row is in SQLite, queryable in milliseconds, repeatable on any machine
with the same setup.
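A minimal query against `substrate_ppl.db` looks like the sketch below. The column names (`substrate`, `dataset`, `mean_ppl`) are assumptions about the schema, and the inserted values are illustrative; an in-memory stand-in replaces the real file so the snippet runs anywhere:

```python
import sqlite3

# Against the real file, connect to:
#   spectral_engine/bass_attention/cavity_validation/hypercircuits/substrate_ppl.db
# Here: an in-memory stand-in with an assumed schema, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE substrate_ppl (substrate TEXT, dataset TEXT, mean_ppl REAL)")
con.executemany(
    "INSERT INTO substrate_ppl VALUES (?, ?, ?)",
    [("softmax", "wikitext", 24.07),     # measured baseline
     ("softmax", "python", 13.36),       # measured baseline
     ("photonic_g0.5", "wikitext", 277.3)],  # illustrative value
)

rows = con.execute(
    "SELECT substrate, mean_ppl FROM substrate_ppl "
    "WHERE dataset = ? ORDER BY mean_ppl", ("wikitext",)
).fetchall()
print(rows)  # → [('softmax', 24.07), ('photonic_g0.5', 277.3)]
```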