Atlas — Substrate Fingerprints

The atlas of layer 0

Fourteen attention heads, each run through full MIT MEEP FDTD and through the in-venv 1D transfer-matrix photonic adapter at 128 query positions in a real residual stream. For each head, the bar shows the fraction of cells (query positions) where MEEP's stored-energy peak coincided with softmax's argmax. The fingerprint card on the right encodes which physical substrates capture the head most cleanly.
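The per-head bar reduces to a simple count. The sketch below is minimal and assumes each implementation's output is available as one row of scores per query position: softmax weights on one side, and MEEP's stored-energy profile or the transfer-matrix output on the other. The function names are hypothetical, not the atlas's code. Because every rate is a count out of 128 positions, Head 5's 87.5% corresponds to 112 matches.

```python
from typing import Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def agreement_rate(reference: Sequence[Sequence[float]],
                   candidate: Sequence[Sequence[float]]) -> float:
    """Fraction of query positions whose peak key is the same in both grids.

    Each grid holds one row per query position and one column per key
    position, e.g. softmax attention weights vs. a substrate's energy profile.
    """
    if len(reference) != len(candidate):
        raise ValueError("grids must cover the same query positions")
    hits = sum(argmax(r) == argmax(c) for r, c in zip(reference, candidate))
    return hits / len(reference)


softmax_rows = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
energy_rows = [[0.2, 0.9, 0.1], [0.1, 0.8, 0.3]]
print(agreement_rate(softmax_rows, energy_rows))  # 0.5: row 0 peaks agree, row 1 peaks differ
```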

Heads 1, 5, 7 and 13 (orange labels) are the photonically aligned heads: both the in-venv transfer matrix and full MIT MEEP FDTD agree with softmax in 65–100% of cells. Head 1 is the extreme, with 100% bitwise agreement across all 128 query positions. Head 10 (highlighted in white) is the worst fit: neither photonic implementation reaches 40%, and both record their lowest rates in the layer on it.

Per-head classification

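Below: a two-line summary for each head. The type label classifies the head by photonic affinity, and cards are grouped by type. The large figure on each card is the MEEP-equals-softmax (MEEP=SM) agreement rate at layer 0 over 128 query positions; the line beneath labels it and gives the transfer-matrix-equals-softmax (TM=SM) rate.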

Head 1

Optical (perfect)

Bitwise photonic. Both MEEP and the 1D transfer matrix agree with softmax on every one of the 128 query positions tested. On every position, the trained W_Q · W_Kᵀ coupling matrix selects the same peak that light does in a thinly layered dielectric stack.
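For readers less familiar with the W_Q · W_Kᵀ framing: in a single head, the score between query position i and key position j is x_i W_Q (x_j W_K)ᵀ = x_i (W_Q W_Kᵀ) x_jᵀ, so the head's preferences are set by one d_model × d_model coupling matrix. The pure-Python sketch below computes attention weights through that matrix. It assumes a causal mask and 1/√d_head scaling, and it leaves out biases and positional terms (for example rotary embeddings) that a real model may add.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(x: Matrix, w_q: Matrix, w_k: Matrix) -> Matrix:
    """Causal single-head attention computed via the coupling matrix W_Q · W_Kᵀ."""
    d_head = len(w_q[0])
    coupling = matmul(w_q, transpose(w_k))               # d_model x d_model
    scores = matmul(matmul(x, coupling), transpose(x))   # x W_Q W_Kᵀ xᵀ
    weights = []
    for i, row in enumerate(scores):
        visible = [s / math.sqrt(d_head) for s in row[: i + 1]]  # keys 0..i only
        weights.append(softmax(visible) + [0.0] * (len(row) - i - 1))
    return weights
```

The argmax of each returned row is the "softmax argmax" that the substrates are compared against.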

100.0%

MEEP=SM, 100.0% TM=SM

Head 5

Optical (sharp)

Sharp photonic insphere. TM agrees with softmax in 97.7% of cells, MEEP in 87.5%. The transfer-matrix approximation captures the head almost perfectly; full FDTD shows it's still a 1D dielectric phenomenon with minor deviation from the ideal.
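The atlas does not describe how its in-venv transfer-matrix adapter maps attention onto a dielectric stack, so the sketch below covers only the physics side. It uses the standard characteristic-matrix method for a lossless 1D stack at normal incidence: one 2×2 matrix per layer, multiplied in order from the incidence side, with reflectance read off the product. Layer indices, thicknesses and the 1.5-index substrate are illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix2 = List[List[complex]]


def _mul(a: Matrix2, b: Matrix2) -> Matrix2:
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def stack_reflectance(indices: Sequence[float], thicknesses: Sequence[float],
                      wavelength: float, n_in: float = 1.0, n_sub: float = 1.5) -> float:
    """Normal-incidence reflectance of a lossless thin-film stack.

    indices[0] / thicknesses[0] describe the layer nearest the incident medium.
    """
    m: Matrix2 = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for n, d in zip(indices, thicknesses):
        delta = 2 * math.pi * n * d / wavelength  # phase thickness of this layer
        layer = [[math.cos(delta), 1j * math.sin(delta) / n],
                 [1j * n * math.sin(delta), math.cos(delta)]]
        m = _mul(m, layer)
    b = m[0][0] + m[0][1] * n_sub  # [B, C] = M @ [1, n_sub]
    c = m[1][0] + m[1][1] * n_sub
    r = (n_in * b - c) / (n_in * b + c)  # amplitude reflection coefficient
    return abs(r) ** 2
```

A bare substrate gives the familiar Fresnel value ((1 − 1.5)/(1 + 1.5))² = 4%, and a single quarter-wave layer of index √1.5 cancels it. Because the method works at one wavelength at a time, it is a frequency-domain model, unlike MEEP's time-stepping FDTD.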

87.5%

MEEP=SM, 97.7% TM=SM

Head 7

Optical (insphere)

Sharp insphere. TM 93.0%, MEEP 65.6%. Full FDTD diverges from softmax more than the simpler model does, which suggests Head 7 is an idealized 1D photonic geometry that the full MEEP cell's PML and finite resolution don't capture perfectly.

65.6%

MEEP=SM, 93.0% TM=SM

Head 13

Mostly optical

Mid-sharp. TM 75.8%, MEEP 66.4%. Both reach majority agreement: this head's coupling is photonic, but with a non-trivial residue that neither implementation fully captures.

66.4%

MEEP=SM, 75.8% TM=SM

Head 6

Mid-photonic

MEEP 70.3%, TM 56.3%. Real FDTD captures this head better than the simple TM does, which suggests its geometry needs the PML and full-cell physics that MEEP provides but the analytic 1D approximation drops.

70.3%

MEEP=SM, 56.3% TM=SM

Head 4

Mid-photonic

MEEP 66.4%, TM 67.2%. The two implementations match softmax at nearly identical rates, though they agree with each other on only 54.7% of positions. A stable mid-range head: the photonic match is partial but consistent across both implementations.
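Head 4's three rates can be cross-checked against each other. Write A for the query positions where MEEP's peak matches softmax's and B for those where TM's does, with N = 128, so 66.4% is 85 positions and 67.2% is 86. On A ∩ B both implementations pick softmax's peak, so they agree with each other there, which bounds the MEEP=TM rate from below:

```latex
\begin{aligned}
\frac{|A \cap B|}{N} &\ge \frac{|A|}{N} + \frac{|B|}{N} - 1 \\
&= \frac{85 + 86 - 128}{128} = \frac{43}{128} \approx 33.6\%
\end{aligned}
```

The measured MEEP=TM rate, 54.7% (70 of 128), sits above this floor; the extra agreement comes from positions where both implementations miss softmax in the same way.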

66.4%

MEEP=SM, 67.2% TM=SM

Head 0

Diffuse

MEEP 59.4%, TM 36.7%. Diffuse-mid. The TM fails in particular: its frequency-domain assumption misses something that MEEP's time-domain simulation partially captures. This first head of the layer attends more broadly than the 1D approximation can describe.

59.4%

MEEP=SM, 36.7% TM=SM

Head 9

Diffuse

MEEP 57.0%, TM 37.5%. Its mean relative error against softmax, 0.76, is the largest among the diffuse heads: Head 9 is doing something the 1D dielectric stack can correlate with but not reproduce.
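The atlas does not define its relative-error metric. One plausible reading, assumed here purely for illustration, is the per-row ratio ‖p − q‖₂ / ‖q‖₂ between a substrate's normalized attention row p and the softmax row q, averaged over query positions. The function names are hypothetical.

```python
import math
from typing import Sequence


def row_relative_error(p: Sequence[float], q: Sequence[float]) -> float:
    """||p - q||_2 / ||q||_2 for one query row; q is the softmax reference (assumed metric)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    den = math.sqrt(sum(b ** 2 for b in q))
    return num / den


def mean_relative_error(substrate_rows: Sequence[Sequence[float]],
                        softmax_rows: Sequence[Sequence[float]]) -> float:
    """Average the per-row error over all query positions."""
    errors = [row_relative_error(p, q) for p, q in zip(substrate_rows, softmax_rows)]
    return sum(errors) / len(errors)
```

Because argmax agreement only checks the peak, a row can count as a match while still carrying a large relative error, which is one way a head can correlate with a substrate without being reproduced by it.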

57.0%

MEEP=SM, 37.5% TM=SM

Head 11

Diffuse-photonic

TM 77.3%, MEEP only 53.1%. Among the diffuse heads this asymmetry is unique: the simple TM agrees with softmax more often than full MEEP does, by 24.2 points. Here the analytic approximation tracks softmax better than the FDTD, which suggests Head 11 has a geometry that the TM idealization captures and the PML disturbs.

53.1%

MEEP=SM, 77.3% TM=SM

Head 8

Diffuse

MEEP 48.4%, TM 46.1%. Mid-diffuse: both implementations track softmax on roughly half the query positions. The head sits at the boundary between physically captured and irreducible.

48.4%

MEEP=SM, 46.1% TM=SM

Head 12

Weak-physical

MEEP 43.8%, TM 32.8%. Below half, drifting toward irreducibility. On most query positions the photonic substrates pick a different peak than softmax; this head's W_Q · W_Kᵀ has structure no thin dielectric stack reproduces.

43.8%

MEEP=SM, 32.8% TM=SM

Head 2

Weak-physical

MEEP 39.1%, TM 35.9%. The two photonic implementations drift toward each other and away from softmax: they appear to compute similar (non-softmax) attention distributions rather than the trained one.

39.1%

MEEP=SM, 35.9% TM=SM

Head 3

Weak-physical

MEEP 37.5%, TM 50.0%. The TM does better than MEEP on this head, but neither clears majority agreement. The photonic cavity is not what this head is doing.

37.5%

MEEP=SM, 50.0% TM=SM

Head 10

Irreducible

MEEP 35.9%, TM 28.9%, MEEP=TM only 9.4%: the lowest-agreement head in the layer. The other substrates fare no better (quantum dot 36%, acoustic 30%, mechanical 30%, microwave 6%), so every physical substrate tested, classical or quantum, fails on Head 10. This is the head doing the irreducible symbolic-manipulation work.
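A head's fingerprint card can be thought of as a map from substrate to agreement rate. The sketch below ranks Head 10's substrates using the rates quoted on its card and checks whether the best one reaches majority agreement. The 50% cutoff and the `fingerprint` function are assumptions for illustration, not the atlas's own classification rule.

```python
from typing import Dict, List, Tuple

# Head 10's agreement with softmax per substrate, in percent, as listed on its card.
HEAD_10 = {"MEEP": 35.9, "transfer matrix": 28.9, "quantum dot": 36.0,
           "acoustic": 30.0, "mechanical": 30.0, "microwave": 6.0}


def fingerprint(rates: Dict[str, float],
                majority: float = 50.0) -> Tuple[List[Tuple[str, float]], bool]:
    """Rank substrates by agreement and report whether the best one reaches `majority`."""
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return ranked, ranked[0][1] >= majority


ranked, captured = fingerprint(HEAD_10)
print(ranked[0], captured)  # ('quantum dot', 36.0) False
```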

35.9%

MEEP=SM, 28.9% TM=SM

Reading the table

Eleven heads are physical to varying degrees, and in nine of them (0, 1, 4, 5, 6, 7, 9, 11, 13) MEEP reproduces softmax decisions at majority rates. The remaining three (10, 3, 2) fall below 40% under MEEP: they are weakly captured or irreducible, doing work no thin dielectric stack does. Same model architecture, same training, same prompt, yet the individual heads have settled into different physical behaviors.
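The grouping can be reproduced from the MEEP=SM rates on the cards. The 50% and 40% cutoffs below are assumptions chosen to match the groups this section names, not thresholds the atlas states.

```python
# MEEP=SM agreement per head at layer 0, in percent of 128 query positions (from the cards).
MEEP_RATE = {1: 100.0, 5: 87.5, 6: 70.3, 13: 66.4, 4: 66.4, 7: 65.6, 0: 59.4,
             9: 57.0, 11: 53.1, 8: 48.4, 12: 43.8, 2: 39.1, 3: 37.5, 10: 35.9}

# Illustrative cutoffs (assumed).
MAJORITY, WEAK = 50.0, 40.0

majority = sorted(h for h, r in MEEP_RATE.items() if r > MAJORITY)
physical = sorted(h for h, r in MEEP_RATE.items() if r >= WEAK)
weak = sorted((h for h, r in MEEP_RATE.items() if r < WEAK), key=MEEP_RATE.get)

print(len(physical), majority, weak)
# 11 [0, 1, 4, 5, 6, 7, 9, 11, 13] [10, 3, 2]
```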

What this means for hardware compute is on the meaning page.