Fourteen attention heads. Each was run through MIT MEEP FDTD and the in-venv 1D transfer-matrix (TM) photonic adapter at 128 query positions in a real residual stream. For each head, the bar shows the fraction of cells where MEEP's stored-energy peak coincided with softmax's argmax. The fingerprint card on the right encodes which physical substrates capture the head most cleanly.
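The agreement fraction behind each bar can be sketched as follows. This is a minimal illustration, not the page's actual pipeline: `agreement_fraction` and the toy arrays are hypothetical, and each row stands in for one query position's per-key scores (stored energy for MEEP, attention weight for softmax).

```python
def argmax(row):
    """Index of the largest value in a row (first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def agreement_fraction(scores_a, scores_b):
    """Fraction of query positions where both methods pick the same key."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods need the same number of query positions")
    matches = sum(argmax(a) == argmax(b) for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Toy data: 4 query positions, 3 keys each (made-up numbers).
softmax_weights = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
stored_energy = [[9.0, 1.0, 2.0], [0.5, 4.0, 1.0], [2.0, 3.0, 1.0], [6.0, 2.0, 1.0]]

print(agreement_fraction(softmax_weights, stored_energy))  # 3 of 4 positions match
```

With 128 query positions, each matching position contributes 1/128 to the fraction, so a reported 87.5% corresponds to 112 matching positions.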
Below is a two-line summary of each head. The type label classifies the head by photonic affinity. The numeric rank orders heads by MEEP-equals-softmax percentage at layer 0 over 128 positions.
Bitwise photonic. Both MEEP and the 1D transfer matrix agree with softmax on every one of the 128 query positions tested. The trained W_Q · W_Kᵀ coupling matrix happens to be exactly what light does in a thinly-layered dielectric stack.
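For reference, the softmax side of this comparison can be sketched as below. It is a generic scaled dot-product sketch under stated assumptions: the tiny matrices are invented, the 1/sqrt(d_head) scaling is the usual convention rather than something this page states, and `attention_row` is a hypothetical helper.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(x_query, x_keys, w_q, w_k):
    """Softmax attention of one query over all keys.

    Row-vector convention (assumed): q = x W_Q, k = x W_K, so each logit is
    x_query (W_Q W_K^T) x_key^T / sqrt(d_head).
    """
    q = matvec(transpose(w_q), x_query)
    keys = [matvec(transpose(w_k), x) for x in x_keys]
    d_head = len(q)
    logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_head) for k in keys]
    return softmax(logits)


# Toy example: identity projections, one query, two keys (made-up values).
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = attention_row([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], identity, identity)
print(weights)  # the first key, aligned with the query, gets the larger weight
```

The argmax of such a row is what the photonic peak is compared against at each query position.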
Sharp photonic insphere. TM agrees with softmax in 97.7% of cells, MEEP in 87.5%. The transfer-matrix approximation captures the head almost perfectly; full FDTD shows it's still a 1D dielectric phenomenon with minor deviation from the ideal.
Sharp insphere. TM at 93%, MEEP at 65.6% — full FDTD diverges more than the simpler model, suggesting Head 7 is an idealized 1D photonic geometry that the full MEEP cell's PML and finite resolution don't perfectly capture.
Mid-sharp. TM 75.8%, MEEP 66.4%. Both reach majority agreement — this head's coupling is photonic but with non-trivial residue that neither implementation fully captures.
MEEP 70.3%, TM 56.3%. Real FDTD captures it better than the simple TM, which suggests this head's geometry needs the PML/full-cell physics that MEEP provides but the analytic 1D approximation drops.
MEEP 66.4%, TM 67.2%. Both implementations agree with each other (54.7%) and with softmax at similar rates — a stable mid-range head where the photonic match is partial but consistent.
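The three agreement figures quoted for a head (MEEP vs softmax, TM vs softmax, MEEP vs TM) are separate pairwise comparisons. A minimal sketch, assuming each method has already been reduced to one argmax key index per query position; `pairwise_agreement` and the index lists are hypothetical:

```python
from itertools import combinations


def pairwise_agreement(picks):
    """Map each pair of method names to the fraction of positions where
    their argmax key indices coincide."""
    rates = {}
    for a, b in combinations(sorted(picks), 2):
        same = sum(i == j for i, j in zip(picks[a], picks[b]))
        rates[(a, b)] = same / len(picks[a])
    return rates


# Made-up argmax key indices at 8 query positions.
picks = {
    "softmax": [0, 1, 2, 3, 0, 1, 2, 3],
    "meep":    [0, 1, 2, 0, 0, 1, 0, 0],
    "tm":      [0, 1, 0, 3, 0, 0, 2, 3],
}
for pair, rate in pairwise_agreement(picks).items():
    print(pair, rate)
```

In this toy data the two photonic methods each match softmax more often than they match each other, because they miss at different positions. The same pattern is possible whenever the MEEP=TM rate sits below both softmax rates.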
MEEP 59.4%, TM 36.7%. Diffuse-mid. The TM fails in particular: its frequency-domain assumption misses something that MEEP's time-domain simulation partially captures. The residual stream's first head is broader than the 1D approximation can describe.
MEEP 57%, TM 37.5%. The 0.76 mean relative error vs. softmax is the largest of the diffuse heads: Head 9 is doing something the 1D dielectric stack can correlate with but not reproduce.
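The page does not spell out how the mean relative error is computed. One plausible definition, assumed here purely for illustration, averages |predicted - reference| / |reference| over every (query, key) cell; `mean_relative_error` and the `eps` guard are hypothetical:

```python
def mean_relative_error(predicted, reference, eps=1e-12):
    """Average of |p - r| / (|r| + eps) over all cells.

    eps only guards against division by zero when a reference weight is 0.
    """
    total = 0.0
    count = 0
    for p_row, r_row in zip(predicted, reference):
        for p, r in zip(p_row, r_row):
            total += abs(p - r) / (abs(r) + eps)
            count += 1
    return total / count


# Made-up distributions for one query position over two keys.
reference = [[0.5, 0.5]]
predicted = [[0.25, 0.75]]
print(mean_relative_error(predicted, reference))  # about 0.5
```

Unlike the argmax agreement, a measure like this is sensitive to the whole distribution, so a head can share the same peak as softmax and still show a large error.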
TM 77.3%, MEEP only 53.1%. Unusual asymmetry: the simple TM agrees with softmax more than the full MEEP does. This is the rare case where the analytic approximation is more accurate than the FDTD. Head 11 has a geometry that the TM idealization captures and the PML disturbs.
MEEP 48.4%, TM 46.1%. Mid-diffuse: both implementations track softmax roughly half the time. Sits at the boundary between physically-captured and irreducible.
MEEP 43.8%, TM 32.8%. Below half, drifting toward irreducibility. Both photonic implementations miss the softmax argmax at more than half of the query positions; this head's W_Q · W_Kᵀ has structure no thin dielectric stack reproduces.
MEEP 39.1%, TM 35.9%. The two photonic implementations drift toward each other and away from softmax — they're computing similar (non-softmax) attention distributions, not the trained one.
MEEP 37.5%, TM 50%. The TM does better than MEEP on this head, but it matches softmax at only half of the positions. The photonic cavity is not what this head is doing.
MEEP 35.9%, TM 28.9%, MEEP=TM only 9.4%. The lowest-agreement head in the layer. Quantum dot 36%, acoustic 30%, mechanical 30%, microwave 6% — every classical-physics substrate fails on Head 10. This is the head doing the irreducible symbolic-manipulation work.
What this means for hardware compute is on the meaning page.