Attention assigns each query position a probability distribution over keys via
softmax(Q · Kᵀ / √d_head). The same arithmetic admits a physical reading:
each key is a site, each query–key inner product is a site energy, and the resulting
distribution is the thermal occupation of a one-dimensional system at unit temperature.
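Concretely, the correspondence is exact: if each attention logit is read as a negated site energy, the softmax distribution over keys is the Boltzmann occupation at unit temperature. A minimal numerical sketch (shapes and the sign convention here are our own choices):

```python
import numpy as np

def boltzmann_occupation(energies, temperature=1.0):
    """Thermal occupation p_i ∝ exp(-E_i / T) of sites with energies E_i."""
    w = np.exp(-energies / temperature)
    return w / w.sum()

# One query against six key "sites" (sizes are illustrative).
d_head = 4
rng = np.random.default_rng(0)
q = rng.standard_normal(d_head)
K = rng.standard_normal((6, d_head))

# Attention logits; under the physical reading the logit plays the role
# of -E/T, so the site energies are the negated scaled inner products.
logits = K @ q / np.sqrt(d_head)
attn = np.exp(logits) / np.exp(logits).sum()

# Unit-temperature thermal occupation of those sites matches softmax.
assert np.allclose(attn, boltzmann_occupation(-logits))
```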
Once you accept that reading, every classical-physics solver becomes a candidate attention engine. We compile the head's weights into that solver's native configuration, run the solver, and read the resulting site occupation back as the attention weights. No retraining is involved: the trained weights are the substrate parameters.
The head's projection weights and biases (W_Q, W_K, W_V, W_O, b_Q, b_K, b_V) are
translated into the configuration space of five different physical solvers, each
invoked through its native open-source library. The harness is at
hypercircuits/modality_harness/; the base interface is a
HeadAdapter abstract class with two methods: __init__
(compiles the weights into a substrate configuration) and run_query
(runs the physics and reads out the attention).
The abstract base class HeadAdapter takes the head's weight matrices, biases,
and RoPE tables, plus a modality_params dict of substrate-specific
hyperparameters (τ for the quantum-dot substrate, α for the FEM
modalities, dx for microwave, γ for photonic). Each subclass implements
__init__ (translating the weights into its substrate configuration) and run_query
(running the physics and returning the d_model-space residual contribution).
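A minimal sketch of that interface; the names and signatures here are illustrative, not the actual code in hypercircuits/modality_harness/:

```python
from abc import ABC, abstractmethod

class HeadAdapter(ABC):
    """Sketch of the two-method adapter interface described above.

    Names and signatures are illustrative; the real class lives in
    hypercircuits/modality_harness/ and may differ in detail.
    """

    def __init__(self, weights, biases, rope_tables, modality_params):
        # Subclasses compile these into their substrate's native
        # configuration (e.g. tau for quantum-dot, dx for microwave).
        self.weights = weights
        self.biases = biases
        self.rope_tables = rope_tables
        self.modality_params = modality_params

    @abstractmethod
    def run_query(self, hidden_states, query_pos):
        """Run the physics for one query position and return this head's
        d_model-space residual contribution."""
```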
The harness ModalityHarness orchestrates a full forward pass: per layer,
per query position, per head, the chosen substrate's solver runs and the output is
accumulated into the residual stream. With QuantumDotHeadAdapter(τ=0.0)
the harness reproduces the standard softmax forward pass to machine precision
(verified on Qwen2.5-0.5B, full 24 layers, top-1 ' Paris' match on
"The capital of France is").
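The orchestration loop can be sketched as follows; the function name and adapter layout are our own shorthand, not the ModalityHarness API:

```python
import numpy as np

def harness_forward(hidden, num_layers, adapters):
    """Toy version of the per-layer / per-query / per-head loop.

    `adapters[layer][head]` maps each head to its substrate adapter;
    the structure is illustrative, not the ModalityHarness API.
    """
    seq_len, d_model = hidden.shape
    for layer in range(num_layers):
        residual = np.zeros_like(hidden)
        for pos in range(seq_len):
            for head, adapter in adapters[layer].items():
                # Each adapter runs its physics solver and returns this
                # head's d_model-space contribution for one query position.
                residual[pos] += adapter.run_query(hidden, pos)
        hidden = hidden + residual  # accumulate into the residual stream
    return hidden
```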
Adding a sixth substrate is one new file and a registry entry. The plumbing is done.
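One way that "one file plus a registry entry" pattern might look; the registry name and decorator are hypothetical, not the harness's actual mechanism:

```python
# Hypothetical registry sketch; names do not match the real harness.
SUBSTRATE_REGISTRY = {}

def register_substrate(name):
    """Class decorator that records an adapter under a substrate name."""
    def wrap(cls):
        SUBSTRATE_REGISTRY[name] = cls
        return cls
    return wrap

@register_substrate("quantum_dot")
class QuantumDotHeadAdapter:
    def __init__(self, tau=0.0):
        self.tau = tau  # tau=0.0 reproduces the exact softmax pass

# A sixth substrate would then be one new file containing only:
#
# @register_substrate("my_substrate")
# class MySubstrateHeadAdapter(HeadAdapter): ...
```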
See the atlas page for which substrate matched which head.