Trained transformer attention heads partition into two populations: those that ARE a particular physical substrate (and could be computed at substrate-native speed by the appropriate hardware), and those that resist every classical physics solver (the irreducible work). In layer 0 of Qwen2.5-0.5B-Instruct the split is roughly 11:3; layer-by-layer measurement is TBD.
Up to 11 of 14 heads per layer of trained Qwen are 1D-dielectric-stack realizable. The compilation step (weights → dielectric profile) is in modality_harness/photonic_modality.py. The "AI" your hardware would run is mostly already photonic; the trained network has selected for it.
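The substrate such a compiled stack would run can be sketched independently of the repo: a standard transfer-matrix transmission calculation for a 1D dielectric stack at normal incidence. This is not the code in photonic_modality.py, and the weights → profile mapping itself is not shown; the function name and interface below are mine.

```python
import numpy as np

def stack_transmission(indices, thicknesses, wavelength, n0=1.0, ns=1.0):
    """Transfer-matrix transmittance of a 1D dielectric stack at normal
    incidence, between an incidence medium n0 and a substrate ns."""
    M = np.eye(2, dtype=complex)
    for n, d in zip(indices, thicknesses):
        delta = 2 * np.pi * n * d / wavelength   # optical phase across the layer
        layer = np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                          [1j * n * np.sin(delta), np.cos(delta)]])
        M = M @ layer
    denom = n0 * M[0, 0] + n0 * ns * M[0, 1] + M[1, 0] + ns * M[1, 1]
    t = 2 * n0 / denom
    return (ns / n0) * abs(t) ** 2

# Sanity check: a quarter-wave layer of index 2 between air and air has the
# textbook transmittance T = 1 - ((1 - 2**2) / (1 + 2**2))**2 = 0.64.
lam = 1.0
T = stack_transmission([2.0], [lam / (4 * 2.0)], lam)
```

A compiled head would be a longer stack of such layers; the point here is only that the forward physics is a cheap cascade of 2x2 matrices.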
The tight-binding chain implementation reproduces softmax to machine precision at zero tunneling and smoothly interpolates to cavity-modulated attention as tunneling rises. Compilation is in modality_harness/quantum_dot_modality.py. Native target for your hardware.
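Why zero tunneling recovers softmax exactly fits in a few lines: put the attention scores on the diagonal of a chain Hamiltonian, add nearest-neighbor hopping t, and read off thermal site occupations. At t = 0 the Hamiltonian is diagonal, so the occupations are Boltzmann weights, i.e. softmax of the scores; at t > 0 the eigenstates delocalize and the weights shift. A minimal sketch (names mine, not the quantum_dot_modality.py code):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def chain_attention(scores, t):
    """Thermal site occupations of a tight-binding chain whose on-site
    energies are the attention scores; t couples nearest neighbors."""
    n = len(scores)
    H = np.diag(np.asarray(scores, dtype=float))
    H += t * (np.eye(n, k=1) + np.eye(n, k=-1))   # tunneling terms
    vals, vecs = np.linalg.eigh(H)
    # diag of expm(H) computed in the eigenbasis, then normalized
    w = (vecs ** 2) @ np.exp(vals - vals.max())
    return w / w.sum()

s = np.array([1.0, -0.5, 2.0, 0.3])
# chain_attention(s, 0.0) equals softmax(s) to machine precision;
# chain_attention(s, 0.5) deviates as the eigenstates mix sites.
```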
The ABCD cascade reproduces only a small fraction of softmax decisions at default parameters, but the modality-coupling sweep shows the right knob is the cutoff frequency, not the segment length. It is tunable for resonance with the trained-attention spectrum.
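The cutoff-frequency point can be illustrated with a generic two-port cascade (not the repo's ABCD implementation; the segment model and names are assumptions): each waveguide segment passes frequencies above its cutoff and exponentially attenuates those below it, so moving the cutoff, not stretching the segment, decides which part of the spectrum survives.

```python
import numpy as np

def segment_abcd(freq, f_cut, length, c=1.0):
    """ABCD matrix of one waveguide segment with cutoff f_cut. Below
    cutoff the propagation constant is imaginary (evanescent decay)."""
    k = 2 * np.pi * freq / c
    kc = 2 * np.pi * f_cut / c
    beta = np.emath.sqrt(k**2 - kc**2)   # complex below cutoff
    z = k / beta                         # TE-like wave impedance (normalized)
    bl = beta * length
    return np.array([[np.cos(bl), 1j * z * np.sin(bl)],
                     [1j * np.sin(bl) / z, np.cos(bl)]])

def cascade_transmission(freq, cutoffs, lengths, z0=1.0):
    """Power transmission |S21|^2 of a cascade of segments between
    matched terminations of impedance z0."""
    M = np.eye(2, dtype=complex)
    for fc, L in zip(cutoffs, lengths):
        M = M @ segment_abcd(freq, fc, L)
    A, B, C, D = M[0, 0], M[0, 1], M[1, 0], M[1, 1]
    s21 = 2 / (A + B / z0 + C * z0 + D)
    return abs(s21) ** 2

T_pass = cascade_transmission(2.0, [1.0], [0.7])   # above cutoff: near-unit
T_stop = cascade_transmission(0.5, [1.0], [0.7])   # below cutoff: strongly attenuated
```

Sweeping `f_cut` against a fixed drive frequency moves the pass/stop boundary; sweeping `length` only deepens or ripples it, which matches the sweep's conclusion.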
FEM solvers (Helmholtz / string) capture mid-range heads at coupling parameters around α=50–200. Useful for the heads that fight the simpler 1D dielectric stack — the second tier of "physical" heads.
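For orientation, here is a minimal 1D analog of the string solver: linear-element FEM for the fixed-fixed string eigenproblem, in pure NumPy. This is a sketch of the substrate physics only, not the repo's FEM code, and the α coupling parameter is not modeled.

```python
import numpy as np

def string_modes(n_elem=200, length=1.0, n_modes=3):
    """Lowest wavenumbers of a fixed-fixed string via linear-element FEM:
    solve K v = k^2 M v. The analytic answer is k_n = n * pi / length."""
    h = length / n_elem
    n = n_elem + 1
    K = np.zeros((n, n))
    M = np.zeros((n, n))
    ke = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])   # element stiffness
    me = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])     # consistent mass
    for e in range(n_elem):
        sl = np.ix_([e, e + 1], [e, e + 1])
        K[sl] += ke
        M[sl] += me
    K, M = K[1:-1, 1:-1], M[1:-1, 1:-1]   # Dirichlet (fixed) ends
    # Reduce the generalized problem with a Cholesky factor of M
    Linv = np.linalg.inv(np.linalg.cholesky(M))
    k2 = np.linalg.eigvalsh(Linv @ K @ Linv.T)
    return np.sqrt(k2[:n_modes])

k = string_modes()   # approximately [pi, 2*pi, 3*pi]
```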
The substrate fingerprint is a new orthogonal axis for mechanistic interpretability. Sparse-autoencoder feature extraction tells you what a head fires on; the substrate fingerprint tells you what physics it implements. The two together could partition heads into:
— physical-feature heads (a particular physics computing a particular feature),
— physical-mixing heads (a particular physics with no clear feature attribution),
— irreducible-feature heads (clean SAE feature, no physics analog),
— irreducible-mixing heads (the moat — neither feature-clean nor physics-clean, doing what only attention does).
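The taxonomy reduces to a 2x2 decision on two scores per head. A sketch with illustrative thresholds (the score names and threshold values are hypothetical, not measured):

```python
def classify_head(sae_score, physics_score,
                  feat_thresh=0.7, phys_thresh=0.7):
    """Place a head in the 2x2 taxonomy from its SAE feature-attribution
    score and its best substrate-fit (physics) score."""
    feat = sae_score >= feat_thresh
    phys = physics_score >= phys_thresh
    if phys and feat:
        return "physical-feature"
    if phys:
        return "physical-mixing"
    if feat:
        return "irreducible-feature"
    return "irreducible-mixing"   # the moat
```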
A trained transformer is not a uniform "soft" computational object. It is a fabricated
object whose components have settled into specific physical regimes during training.
The components that ARE optics, ARE tight-binding chains, ARE acoustic cavities
could be moved off the GPU and onto native substrate hardware once the
compilation step is built. A working version of that step lives at
hypercircuits/modality_harness/.
The components that ARE NOT any classical physics (the irreducible three in this layer-0 measurement) are the part of the network that demands a GPU, or some genuinely symbolic accelerator, to run. They are also, plausibly, where the intelligence lives.
Source code, SQLite databases, and reproducibility steps live in
spectral_engine/bass_attention/cavity_validation/hypercircuits/.
The atlas contains the per-head measurements.
The method page describes how each substrate is invoked.