
180 parameters that control everything

[Figure: Compression and wave parameters — conceptual illustration]
[Figure: Routing vs quantization — conceptual]

2026-03-29 · ~6 min read · research · compression

By Badaramoni Avinash

Every attention head in a Wave Field model is governed by three learnable parameters: frequency (ω), damping (α), and phase (φ). That is the entire control surface for how information propagates through the model.

For a model with 12 heads per layer, that means 36 wave parameters per layer determine all attention routing. Across 5 layers, the total comes to 180 parameters.

One hundred and eighty floating-point numbers. They decide which tokens attend to which other tokens, at what distance, with what strength. Everything else in the model — embeddings, projections, feed-forward weights — transforms information. These 180 values route it.
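The arithmetic is small enough to write out directly (head and layer counts taken from the text above):

```python
# Parameter count for Wave Field attention routing.
PARAMS_PER_HEAD = 3    # frequency (omega), damping (alpha), phase (phi)
HEADS_PER_LAYER = 12
LAYERS = 5

per_layer = PARAMS_PER_HEAD * HEADS_PER_LAYER   # 36 wave parameters per layer
total = per_layer * LAYERS                      # 180 across the whole model

print(per_layer, total)  # 36 180
```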


What this means for compression

If the routing behavior of the entire model depends on 180 parameters, and those parameters are continuous wave functions, then the rest of the model is fair game for aggressive quantization.

We tested this directly. Standard post-training quantization — converting all weights from float32 to INT8 — causes measurable quality degradation. Perplexity increases, coherence drops, long-range recall weakens. This is expected: quantization introduces rounding noise into every operation, including the attention mechanism.

But Wave Field attention is not stored in large weight matrices. It is stored in those 180 wave parameters. So we tried something simple: quantize everything to INT8, except the wave parameters. Keep those in full float32 precision.
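A minimal sketch of that selective scheme, assuming weights live in a name-keyed state dict and wave parameters are identifiable by name. The name filter and the symmetric per-tensor INT8 scheme here are illustrative, not the exact pipeline used in the experiments:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8: store int8 values plus one float scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def selective_quantize(state_dict, keep_fp32=("wave",)):
    """Quantize every tensor to INT8 except those whose name matches keep_fp32."""
    out = {}
    for name, w in state_dict.items():
        if any(tag in name for tag in keep_fp32):
            out[name] = w.astype(np.float32)   # wave params stay full precision
        else:
            out[name] = quantize_int8(w)       # everything else -> (int8, scale)
    return out

# Toy model: a projection matrix plus one layer's wave frequencies.
model = {
    "layer0.proj": np.random.randn(64, 64).astype(np.float32),
    "layer0.wave_omega": np.random.randn(12).astype(np.float32),
}
quantized = selective_quantize(model)
```

The only architecture-specific choice is the exclusion list; everything else is standard post-training quantization.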

The result

Before (float32): 529 MB on disk
After (selective INT8): 171 MB on disk

A 3.1× reduction in model size. But the interesting part is not the size reduction itself — it is what happened to quality.

Perplexity improved. Not by much — within the margin you would expect from regularization effects — but it did not degrade. The selective quantization preserved every bit of the model's attention routing while adding a slight regularization to the projection and feed-forward layers. The wave parameters, untouched at full precision, kept the model's ability to form long-range and short-range connections intact.


Why wave parameters resist quantization damage

In a standard transformer, attention routing is an emergent property of large Q, K, V weight matrices. Quantizing those matrices directly degrades the attention patterns because routing information is spread across millions of weights. There is no clean separation between "routing" and "transformation."

In Wave Field, that separation is explicit. The wave kernel — k(t) = exp(−α·t) · cos(ω·t + φ) — is a closed-form function of three parameters. Quantizing the projection matrices does not touch the kernel. The routing survives compression intact.
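The kernel in code, straight from the closed form above. The parameter values are arbitrary examples, not trained values:

```python
import numpy as np

def wave_kernel(t, omega, alpha, phi):
    """Damped cosine kernel: k(t) = exp(-alpha * t) * cos(omega * t + phi)."""
    return np.exp(-alpha * t) * np.cos(omega * t + phi)

# One head's routing profile over relative distances 0..99.
t = np.arange(100, dtype=np.float32)
k = wave_kernel(t, omega=0.3, alpha=0.05, phi=0.0)
# Small alpha -> slow decay -> long-range attention;
# large alpha -> fast decay -> local attention.
```

Three floats fully determine this curve, which is why quantizing the projection matrices around it leaves the routing untouched.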

This is not a trick or a special quantization scheme. It is a structural consequence of how the architecture works. The routing mechanism is factored out of the weight matrices by design.


Memory at inference

Standard transformers maintain a KV cache during inference: a stored key and value vector for every token in the context. For a 12-head model with 64-dimensional heads, each token adds 2 × 12 × 64 × 4 = 6,144 bytes to the cache. At 100K tokens, that is 614 MB just for the cache.
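The cache arithmetic checks out, assuming float32 keys and values (dimensions from the text):

```python
HEADS, HEAD_DIM, BYTES_FP32 = 12, 64, 4

# Key + value vector per head, per token.
per_token = 2 * HEADS * HEAD_DIM * BYTES_FP32   # bytes added per token
cache_100k = per_token * 100_000                # bytes at a 100K-token context

print(per_token)           # 6144
print(cache_100k / 1e6)    # 614.4 (MB)
```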

Wave Field maintains a field state instead: a fixed-size buffer whose memory cost does not grow with sequence length. At inference, the wave field representation uses 512× less memory than a standard KV cache at equivalent context lengths.
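The contrast can be sketched numerically. Taking the 512× figure at face value at a 100K-token context, the field state would occupy roughly 614.4 MB / 512 ≈ 1.2 MB. The fixed buffer size below is derived from that ratio for illustration, not from the actual field implementation:

```python
PER_TOKEN_KV = 2 * 12 * 64 * 4   # KV cache bytes per token (float32)

def kv_cache_bytes(n_tokens):
    """Standard transformer: cache grows linearly with context length."""
    return PER_TOKEN_KV * n_tokens

# Illustrative fixed field-state size, backed out from the 512x claim at 100K.
FIELD_STATE_BYTES = kv_cache_bytes(100_000) // 512   # 1.2 MB, constant

for n in (1_000, 10_000, 100_000):
    print(n, kv_cache_bytes(n) / 1e6, FIELD_STATE_BYTES / 1e6)
```

One line grows without bound; the other does not. That is the whole inference-memory story.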

Combined with selective quantization, the practical implication is that a Wave Field model can run long-context inference on devices where a standard transformer of the same parameter count simply cannot fit.


The compression story, condensed

Standard transformers spread their routing intelligence across millions of weight values. Compress those values, and routing degrades. Wave Field concentrates routing intelligence into 180 parameters. Compress everything else, and routing stays perfect.

The result: a 3.1× smaller model that performs at least as well as the original, using 512× less memory at inference for attention state. Not because of a clever compression algorithm, but because the architecture separates what matters from what can tolerate noise.