What it means

Trained transformer attention heads partition into two populations: ones that ARE a particular physical substrate (and could be computed at substrate-native speed by the appropriate hardware), and ones that resist every classical-physics solver (the irreducible work). The split is roughly 11:3 in layer 0 of Qwen2.5-0.5B-Instruct; layer-by-layer measurement is TBD.

PHYSICAL — 11 heads, computable in dielectric / FEM / quantum-dot hardware:
h1 100% · h5 88% · h6 70% · h7 66% · h13 66% · h4 66% · h0 59% · h9 57% · h11 53% · h8 48% · h12 44%

IRREDUCIBLE — 3 heads, no classical-physics solver matches them:
h2 39% · h3 38% · h10 36% — no thin dielectric stack, acoustic cavity, transmission line, or quantum-dot chain reproduces them. These are the moat.
Layer 0 of Qwen2.5-0.5B-Instruct, 14 attention heads, sorted by photonic agreement with softmax. Eleven heads have photonic affinity: a 1D dielectric stack captures their attention at ≥40% match. Three heads (10, 3, 2) are irreducibly non-photonic and resist every classical-physics solver tested.

For compute hardware

Photonic

Lightmatter / Lightelligence / Salience

Up to 11 of 14 heads per layer of trained Qwen are 1D-dielectric-stack realizable. The compilation step (weights → dielectric profile) is in modality_harness/photonic_modality.py. The "AI" your hardware would run is mostly already photonic; the trained network has selected for it.

11/14
heads/layer photonic-affine
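As a sketch of what "1D-dielectric-stack realizable" means physically, the stack's optical response is a cascade of 2×2 characteristic matrices, one per layer. This is the textbook transfer-matrix method, not the actual modality_harness/photonic_modality.py compilation; the function name and conventions are illustrative:

```python
import numpy as np

def stack_transmission(indices, thicknesses, wavelength):
    """Transmission |t|^2 of a 1D dielectric stack via 2x2 characteristic
    matrices (normal incidence, lossless layers, vacuum on both sides)."""
    k0 = 2 * np.pi / wavelength
    M = np.eye(2, dtype=complex)
    for n, d in zip(indices, thicknesses):
        phi = k0 * n * d                         # optical phase in this layer
        layer = np.array([[np.cos(phi), 1j * np.sin(phi) / n],
                          [1j * n * np.sin(phi), np.cos(phi)]])
        M = M @ layer                            # cascade in propagation order
    # transmission into matched vacuum (n_in = n_out = 1)
    t = 2.0 / (M[0, 0] + M[0, 1] + M[1, 0] + M[1, 1])
    return abs(t) ** 2
```

Presumably the compilation step searches for a dielectric profile whose response matches a head's attention; the exact objective lives in photonic_modality.py.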

Quantum dot / LC

Atom / Rigetti / D-Wave / IonQ / PsiQuantum

Tight-binding chain implementation reproduces softmax to machine precision at zero tunneling, smoothly interpolates to cavity-modulated attention as tunneling rises. Compilation in modality_harness/quantum_dot_modality.py. Native target for your hardware.

τ ↔ θ
measured Wick map
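The zero-tunneling claim can be checked directly: with on-site energies set to the negated logits and no hopping, the diagonal of the imaginary-time (Wick-rotated, τ) propagator at τ = 1 is exactly softmax, and a hopping term interpolates smoothly away from it. A minimal sketch, not the modality_harness/quantum_dot_modality.py implementation:

```python
import numpy as np

def tight_binding_attention(logits, tunneling=0.0, tau=1.0):
    """Site occupations of the imaginary-time propagator exp(-tau*H) for a
    tight-binding chain with on-site energies -logits and nearest-neighbour
    hopping `tunneling`. At tunneling=0, tau=1 this is exactly softmax."""
    logits = np.asarray(logits, dtype=float)
    n = len(logits)
    H = -np.diag(logits)
    for i in range(n - 1):
        H[i, i + 1] = H[i + 1, i] = -tunneling       # hopping off the diagonal
    evals, evecs = np.linalg.eigh(H)                 # H is real symmetric
    rho = (evecs * np.exp(-tau * evals)) @ evecs.T   # exp(-tau*H)
    w = np.diag(rho)
    return w / w.sum()
```

`tight_binding_attention(scores)` matches `softmax(scores)` to machine precision; raising `tunneling` mixes neighbouring sites, which is the cavity-modulated regime described above.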

Microwave / RF

superconducting cavities, RF arrays

ABCD cascade reproduces a small fraction of softmax decisions at default parameters, but the modality-coupling sweep shows the right knob is the cutoff frequency, not segment length. Tunable for resonance with the trained-attention spectrum.

tunable
cutoff is the lever
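A toy ABCD cascade shows why the cutoff frequency, not segment length or count, is the lever: if each section's L and C are derived from the cutoff, moving the cutoff moves the whole transmission edge. Component values and names below are illustrative, not the harness's transmission-line model:

```python
import numpy as np

def ladder_s21(freq_hz, n_sections, cutoff_hz, z0=50.0):
    """|S21| of a series-L / shunt-C low-pass ladder, cascaded as 2x2 ABCD
    matrices. L and C come from the cutoff, so the cutoff frequency alone
    positions the transmission edge."""
    w = 2 * np.pi * freq_hz
    wc = 2 * np.pi * cutoff_hz
    L, C = z0 / wc, 1.0 / (z0 * wc)
    series = np.array([[1.0, 1j * w * L], [0.0, 1.0]])   # series inductor
    shunt = np.array([[1.0, 0.0], [1j * w * C, 1.0]])    # shunt capacitor
    M = np.eye(2, dtype=complex)
    for _ in range(n_sections):
        M = M @ series @ shunt
    a, b, c, d = M.ravel()
    return abs(2.0 / (a + b / z0 + c * z0 + d))          # matched z0 ports
```

Well below the cutoff the cascade is transparent; well above it, transmission collapses regardless of how many sections are chained.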

MEMS / acoustic / mechanical

resonator arrays, photoacoustic, mechanical compute

FEM solvers (Helmholtz / string) capture mid-range heads at coupling parameters around α=50–200. Useful for the heads that fight the simpler 1D dielectric stack — the second tier of "physical" heads.

2nd tier
non-photonic physical
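A minimal sketch of the kind of 1D Helmholtz/string solve involved, with a coupling strength α scaling a potential profile as in the α ≈ 50–200 sweep. This is illustrative finite differences, not the FEM solver used in the harness:

```python
import numpy as np

def string_response(omega, alpha=100.0, n=200):
    """Steady-state |u| of a driven string / 1D Helmholtz cavity,
    (-d2/dx2 + alpha*V(x) - omega^2) u = f, fixed ends, finite differences.
    alpha is the coupling strength being swept."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    V = np.exp(-((x - 0.5) ** 2) / 0.02)          # smooth coupling bump
    main = 2.0 / h**2 + alpha * V - omega**2      # diagonal of the operator
    off = -np.ones(n - 1) / h**2                  # second-difference stencil
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    f = np.zeros(n)
    f[n // 4] = 1.0                               # point drive
    return np.abs(np.linalg.solve(A, f))
```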

For interpretability research

The substrate fingerprint is a new orthogonal axis for mechanistic interpretability. Sparse-autoencoder feature extraction tells you what a head fires on; the substrate fingerprint tells you what physics it implements. The two together could partition heads into:

physical-feature heads (a particular physics computing a particular feature),
physical-mixing heads (a particular physics with no clear feature attribution),
irreducible-feature heads (clean SAE feature, no physics analog),
irreducible-mixing heads (the moat — neither feature-clean nor physics-clean, doing what only attention does).
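Once both scores exist per head, the four-way partition is mechanical. A sketch, with the ≥40% substrate-match threshold borrowed from the layer-0 cut and everything else illustrative:

```python
def classify_head(sae_clean, physics_match, threshold=0.40):
    """Place one head in the 2x2 taxonomy: SAE feature cleanliness (what it
    fires on) crossed with best substrate agreement (what physics it
    implements). The 0.40 threshold mirrors the >=40% layer-0 cut."""
    physical = physics_match >= threshold
    if physical and sae_clean:
        return "physical-feature"
    if physical:
        return "physical-mixing"
    if sae_clean:
        return "irreducible-feature"
    return "irreducible-mixing"   # the moat
```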

For the field

A trained transformer is not a uniform "soft" computational object. It is a fabricated object whose components have settled into specific physical regimes during training. The components that ARE optics, ARE tight-binding chains, ARE acoustic cavities — those components could be moved out of the GPU and into native substrate hardware if the compilation step is built. We have a working version of that compilation step at hypercircuits/modality_harness/.

The components that ARE NOT any classical physics — the irreducible three per layer in this layer-0 measurement — those are the part of the network that demands GPUs (or some genuinely-symbolic accelerator) to run. They are also, plausibly, where the intelligence lives.

The Leep — what comes next
Each substrate's solver has a closed-form transfer-matrix replacement: a 2×2 site-by-site recursion that reproduces the simulator's output to 0.001–0.36% accuracy at 10⁵–10⁹× the speed, validated across 18 physics domains (forge/SCORECARD.md). The simulator is the slow proof; the Leep is what runs in production. The simulator is on disk now. The Leep is the next build.
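The generic shape of such a replacement: any three-term recurrence (discretized Helmholtz, ABCD lines, tight-binding chains) can be run as a 2×2 site-by-site cascade, which is why one skeleton can stand in for many solvers. An illustrative skeleton, not the Leep itself:

```python
import numpy as np

def transfer_cascade(a, b, u_prev, u_curr):
    """Run the three-term recurrence u[i+1] = a[i]*u[i] + b[i]*u[i-1] as a
    2x2 site-by-site cascade on the state (u[i], u[i-1])."""
    v = np.array([u_curr, u_prev], dtype=float)
    for ai, bi in zip(a, b):
        step = np.array([[ai, bi], [1.0, 0.0]])   # one site's transfer matrix
        v = step @ v
    return v[0]                                   # the newest value
```

The cascade does per site what a full simulator does per sweep, which is where the claimed speedup comes from.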

Source code, SQLite databases, and reproducibility steps live in spectral_engine/bass_attention/cavity_validation/hypercircuits/. The atlas contains the per-head measurements. The method page describes how each substrate is invoked.