The atlas of layer 0

Fourteen attention heads. Each was run through real MIT MEEP FDTD and the in-venv 1D transfer-matrix photonic adapter at 128 query positions in a real residual stream. For each head, the bar shows the fraction of those positions (cells) where MEEP's stored-energy peak coincided with softmax's argmax. The fingerprint card on the right encodes which physical substrates capture the head most cleanly.

[Figure: MEEP top-1 = softmax top-1, per head, layer 0. 128 query positions per head, real MIT MEEP FDTD. Vertical axis: agreement, 0% to 100%. Horizontal axis: heads h0 to h13. Series: MEEP FDTD and in-venv 1D TM.]
Heads 1, 5, 7, 13 (orange labels) are the photonically-aligned heads — both the in-venv transfer matrix and full MIT MEEP FDTD agree with softmax in 65–100% of cells. Head 1 is the extreme: 100% bitwise across all 128 query positions. Head 10 (highlighted in white) is the worst-fit — neither photonic implementation reaches 40%.
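
The agreement figure plotted above can be sketched in a few lines. This is a minimal illustration, not the pipeline used here: the function names, the random queries and keys, the head dimension of 16, and the stand-in for the simulator's stored-energy peak index are all assumptions, and causal masking is ignored.

```python
import numpy as np

def softmax_top1(queries, keys):
    """Index of the highest-scoring key for each query (scaled dot-product scores)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    # Softmax is monotonic per row, so its argmax equals the argmax of the raw scores.
    return scores.argmax(axis=-1)

def top1_agreement(top1_a, top1_b):
    """Fraction of query positions where two methods pick the same key."""
    top1_a = np.asarray(top1_a)
    top1_b = np.asarray(top1_b)
    if top1_a.shape != top1_b.shape:
        raise ValueError("both inputs must cover the same query positions")
    return float(np.mean(top1_a == top1_b))

rng = np.random.default_rng(0)
q = rng.normal(size=(128, 16))  # hypothetical: 128 query positions, head dim 16
k = rng.normal(size=(128, 16))  # hypothetical keys
sm_top1 = softmax_top1(q, k)

# Hypothetical stand-in for a simulator's stored-energy peak index:
# copy the softmax choice, then force a different key at 16 positions.
sim_top1 = sm_top1.copy()
sim_top1[:16] = (sim_top1[:16] + 1) % 128

print(top1_agreement(sm_top1, sim_top1))  # 0.875
```

With 128 positions, each disagreement costs about 0.78 percentage points, so 16 disagreements give 112/128 = 87.5%.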

Per-head classification

Below is a two-line summary of each head. The type label classifies the head by photonic affinity. The headline number is the MEEP-equals-softmax percentage at layer 0 over 128 positions; in the stat lines, SM is softmax and TM is the 1D transfer matrix.

Head 1

Optical (perfect)

Bitwise photonic. Both MEEP and the 1D transfer matrix agree with softmax on every one of the 128 query positions tested. On these positions, the trained W_Q · W_Kᵀ coupling matrix selects the same key as light propagating through a thinly layered dielectric stack.

100.0% MEEP=SM and TM=SM
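
For readers unfamiliar with the 1D transfer-matrix method, the following is a generic sketch of the textbook characteristic-matrix calculation for a lossless dielectric stack at normal incidence. It is not the in-venv adapter, whose mapping from attention weights to layer indices and thicknesses is not described here; the function names and the example indices are assumptions chosen for illustration.

```python
import numpy as np

def layer_matrix(n, d, wavelength):
    """Characteristic matrix of one lossless layer at normal incidence."""
    delta = 2.0 * np.pi * n * d / wavelength  # phase thickness of the layer
    return np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                     [1j * n * np.sin(delta), np.cos(delta)]])

def stack_reflectance(layers, wavelength, n_in=1.0, n_out=1.0):
    """Reflectance of a stack given as [(index, thickness), ...], incident side first."""
    m = np.eye(2, dtype=complex)
    for n, d in layers:
        m = m @ layer_matrix(n, d, wavelength)
    b, c = m @ np.array([1.0, n_out])
    r = (n_in * b - c) / (n_in * b + c)  # amplitude reflection coefficient
    return float(abs(r) ** 2)

wl = 1.0      # design wavelength, arbitrary units
n_sub = 1.5   # illustrative substrate index

# Bare substrate: Fresnel reflectance ((1 - 1.5) / (1 + 1.5))^2, about 0.04.
bare = stack_reflectance([], wl, n_out=n_sub)

# Quarter-wave layer with index sqrt(n_sub): ideal antireflection, reflectance near 0.
n_ar = np.sqrt(n_sub)
coated = stack_reflectance([(n_ar, wl / (4 * n_ar))], wl, n_out=n_sub)

# Quarter-wave high/low pairs: reflectance grows with the number of pairs.
pair = [(2.3, wl / (4 * 2.3)), (1.38, wl / (4 * 1.38))]
mirror_1 = stack_reflectance(pair * 1, wl, n_out=n_sub)
mirror_4 = stack_reflectance(pair * 4, wl, n_out=n_sub)
print(bare, coated, mirror_1, mirror_4)
```

The method multiplies one 2×2 matrix per layer, which is why it is cheap compared with a full FDTD run and also why it cannot represent anything outside an ideal, infinitely wide 1D stack.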

Head 5

Optical (sharp)

Sharp photonic insphere. TM agrees with softmax in 97.7% of cells, MEEP in 87.5%. The transfer-matrix approximation captures the head almost perfectly; full FDTD shows it's still a 1D dielectric phenomenon with minor deviation from the ideal.

87.5% MEEP=SM, 97.7% TM=SM

Head 7

Optical (insphere)

Sharp insphere. TM at 93%, MEEP at 65.6% — full FDTD diverges more than the simpler model, suggesting Head 7 is an idealized 1D photonic geometry that the full MEEP cell's PML and finite resolution don't perfectly capture.

65.6% MEEP=SM, 93.0% TM=SM

Head 13

Mostly optical

Mid-sharp. TM 75.8%, MEEP 66.4%. Both reach majority agreement — this head's coupling is photonic but with non-trivial residue that neither implementation fully captures.

66.4% MEEP=SM, 75.8% TM=SM

Head 6

Mid-photonic

MEEP 70.3%, TM 56.3%. Real FDTD captures it better than the simple TM, which suggests this head's geometry needs the PML and full-cell physics that MEEP provides but the analytic 1D approximation drops.

70.3% MEEP=SM, 56.3% TM=SM

Head 4

Mid-photonic

MEEP 66.4%, TM 67.2%. Both implementations agree with each other (54.7%) and with softmax at similar rates — a stable mid-range head where the photonic match is partial but consistent.

66.4% MEEP=SM, 67.2% TM=SM

Head 0

Diffuse

MEEP 59.4%, TM 36.7%. Diffuse-mid. The TM fails in particular: its frequency-domain assumption misses something that MEEP's time-domain simulation partially captures. The residual stream's first head is broader than the 1D approximation can describe.

59.4% MEEP=SM, 36.7% TM=SM

Head 9

Diffuse

MEEP 57.0%, TM 37.5%. The 0.76 mean relative error vs softmax is the largest of the diffuse heads: Head 9 is doing something the 1D dielectric stack can correlate with but not reproduce.

57.0% MEEP=SM, 37.5% TM=SM
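
Top-1 agreement and relative error measure different things, which is why a head can match on the argmax and still be far off in the weights. The sketch below assumes one plausible definition of mean relative error (element-wise absolute difference divided by the softmax value, averaged); the definition behind the 0.76 figure is not stated here, and the two example rows are invented for illustration.

```python
import numpy as np

def mean_relative_error(reference, estimate, eps=1e-12):
    """Mean of |estimate - reference| / |reference| (assumed definition)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean(np.abs(estimate - reference) / (np.abs(reference) + eps)))

# Invented attention rows over three keys, for illustration only.
softmax_row = np.array([0.70, 0.20, 0.10])
photonic_row = np.array([0.40, 0.35, 0.25])

same_top1 = bool(softmax_row.argmax() == photonic_row.argmax())
error = mean_relative_error(softmax_row, photonic_row)
print(same_top1, round(error, 3))  # True 0.893
```

Here both rows pick key 0, yet the flatter row is off by roughly 89% on average, because small softmax weights inflate the relative error.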

Head 11

Diffuse-photonic

TM 77.3%, MEEP only 53.1%. A large asymmetry: the simple TM agrees with softmax more often than the full MEEP, as it also does for Heads 3, 4, 5, 7 and 13. Here the analytic approximation is more accurate than the FDTD by 24 points, which suggests Head 11 has a geometry that the TM idealization captures and the PML disturbs.

53.1% MEEP=SM, 77.3% TM=SM

Head 8

Diffuse

MEEP 48.4%, TM 46.1%. Mid-diffuse: both implementations track softmax at roughly half of the positions. The head sits at the boundary between physically-captured and irreducible.

48.4% MEEP=SM, 46.1% TM=SM

Head 12

Weak-physical

MEEP 43.8%, TM 32.8%. Below half, drifting toward irreducibility. The photonic substrate matches softmax at fewer than half of the query positions; this head's W_Q · W_Kᵀ has structure no thin dielectric stack reproduces.

43.8% MEEP=SM, 32.8% TM=SM

Head 2

Weak-physical

MEEP 39.1%, TM 35.9%. The two photonic implementations drift toward each other and away from softmax — they're computing similar (non-softmax) attention distributions, not the trained one.

39.1% MEEP=SM, 35.9% TM=SM

Head 3

Weak-physical

MEEP 37.5%, TM 50.0%. The TM does better than MEEP on this head, but neither exceeds half. The photonic cavity is not what this head is doing.

37.5% MEEP=SM, 50.0% TM=SM

Head 10

Irreducible

MEEP 35.9%, TM 28.9%, MEEP=TM only 9.4%. The lowest-agreement head in the layer. Quantum dot 36%, acoustic 30%, mechanical 30%, microwave 6% — every classical-physics substrate fails on Head 10. This is the head doing the irreducible symbolic-manipulation work.

35.9% MEEP=SM, 28.9% TM=SM
Reading the table
Nine heads are physical to varying degrees: MEEP reproduces the softmax decision at majority-or-better rates. Heads 8 and 12 fall just short, at 48.4% and 43.8%. Three heads (10, 3, 2) stay below 40% on MEEP and are weakly captured or irreducible; they are doing work that no thin dielectric stack does. Same model architecture, same training, same prompt, and yet the individual heads have settled into different physical behaviors.
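
The per-head figures can be tabulated to check which heads clear a given cutoff. The percentages below are copied from the cards above; the 50% majority threshold and the variable names are assumptions made for this sketch.

```python
# (MEEP=SM %, TM=SM %) per head, copied from the per-head cards.
AGREEMENT = {
    0: (59.4, 36.7), 1: (100.0, 100.0), 2: (39.1, 35.9), 3: (37.5, 50.0),
    4: (66.4, 67.2), 5: (87.5, 97.7), 6: (70.3, 56.3), 7: (65.6, 93.0),
    8: (48.4, 46.1), 9: (57.0, 37.5), 10: (35.9, 28.9), 11: (53.1, 77.3),
    12: (43.8, 32.8), 13: (66.4, 75.8),
}

MAJORITY = 50.0  # assumed cutoff for "majority agreement"

# Heads where MEEP matches softmax at a majority of positions.
meep_majority = sorted(h for h, (meep, _) in AGREEMENT.items() if meep >= MAJORITY)
# Heads where the simpler transfer matrix beats full FDTD.
tm_beats_meep = sorted(h for h, (meep, tm) in AGREEMENT.items() if tm > meep)
# Head with the lowest MEEP agreement.
lowest_meep = min(AGREEMENT, key=lambda h: AGREEMENT[h][0])

print(meep_majority)  # [0, 1, 4, 5, 6, 7, 9, 11, 13]
print(tm_beats_meep)  # [3, 4, 5, 7, 11, 13]
print(lowest_meep)    # 10
```

Heads 8 and 12 fall below the cutoff on both measures, and Head 3 reaches it only on the transfer matrix, so the grouping depends on which implementation and which threshold are used.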

What this means for hardware compute is on the meaning page.