Rethinking attention with wave physics

Wave interference — deep field, pale crests (conceptual)

We're building O(N log N) attention mechanisms that run on consumer GPUs. No quadratic bottleneck. No datacenter required.

21.8× faster at 32K context
512× less memory than a KV cache
256K context on a $2K GPU
what is wave field

Every large language model today uses self-attention — each token compares itself with every other token. This is powerful, but it scales quadratically: O(N²). At 32K tokens, a single attention layer performs over a billion pairwise operations. At 128K, the memory and compute requirements exceed what most hardware can provide.

Wave Field replaces this with something fundamentally different. Tokens don't compare with each other at all. Instead, they deposit information onto a continuous field, and wave physics propagates that information via FFT convolution. Each attention head is a damped oscillator with three learnable parameters: frequency (ω), damping (α), and phase (φ).

The result is O(N log N) complexity — and it changes what's computationally possible on consumer hardware.
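To see the gap concretely, here is a back-of-envelope comparison of the two cost curves at the context lengths mentioned above (a rough sketch; constants and per-operation costs are ignored):

```python
# Pairwise attention cost (N^2) vs FFT-style propagation cost (N log N).
import math

for N in (32_768, 131_072):
    quadratic = N * N                 # token-pair comparisons
    nlogn = N * math.log2(N)          # FFT-style cost
    print(f"N={N}: N^2 = {quadratic:.2e}, N log N = {nlogn:.2e}, "
          f"ratio ~ {quadratic / nlogn:.0f}x")
```

At 32K tokens the pairwise count is already ~1.07 billion, roughly 2,000× the N log N figure, and the gap widens with every doubling of context.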

how it works
1. scatter
tokens deposit values onto a continuous 1D field
2. convolve
FFT-based wave kernel propagates information — O(N log N)
3. gather
tokens read back enriched representations from the field
k(t) = exp(−α·t) · cos(ω·t + φ)
the wave kernel — three learnable parameters per head
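The three steps above can be sketched in a few lines of numpy. This is an illustrative toy, not the project's actual implementation: the function name, the one-field-position-per-token scatter, and the single shared head are all simplifying assumptions.

```python
# Toy sketch of scatter -> convolve -> gather with the damped-oscillator
# kernel k(t) = exp(-alpha*t) * cos(omega*t + phi). Assumed names/shapes,
# not the real WaveEngine API.
import numpy as np

def wave_attention(x, omega, alpha, phi):
    """x: (N, d) token values; omega, alpha, phi: one head's wave params."""
    N, d = x.shape

    # 1. scatter: deposit token values onto a 1D field
    #    (simplified here to one field position per token)
    field = x

    # 2. convolve: apply the wave kernel via FFT in O(N log N),
    #    zero-padding to 2N to avoid circular wraparound
    t = np.arange(N)
    kernel = np.exp(-alpha * t) * np.cos(omega * t + phi)
    F = np.fft.rfft(field, n=2 * N, axis=0)
    K = np.fft.rfft(kernel, n=2 * N)[:, None]
    out = np.fft.irfft(F * K, n=2 * N, axis=0)[:N]

    # 3. gather: tokens read back the propagated field
    return out

x = np.random.randn(1024, 64)
y = wave_attention(x, omega=0.3, alpha=0.05, phi=0.0)
print(y.shape)  # (1024, 64)
```

Because the kernel is causal (t ≥ 0) and decays with α, each token's influence fades smoothly over distance — no token ever compares itself against another directly.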
benchmarks
GPU                   VRAM    Max Context   Speed
RTX 3090              24GB    256K          66K tok/s
RTX 5090              32GB    256K          157K tok/s
H100                  80GB    512K          183K tok/s
Standard transformer  any     OOM at 32K    —
measured on 130M model, float32, same hardware
view all benchmarks →
links
GitHub
benchmarks & results
Paper
coming soon
LinkedIn
updates & posts
Contact
get in touch
latest

How Wave Propagation Replaces Attention

An interactive exploration of O(N log N) attention. Visualize wave kernels, compare GPU benchmarks, and understand why standard transformers hit a wall at 32K tokens.

180 Parameters That Control Everything

We discovered that all attention routing in a 132M model is controlled by just 2,160 wave parameters. The rest compresses to INT8 with improved quality.

Training on Consumer GPUs

Standard transformers OOM at 32K on a 3090. Wave Field runs 256K. We measured it across RTX 3090, 5090, H100, and Blackwell.

Variable Heads Self-Organize

Small attention heads naturally learn long-range waves. Large heads learn local grammar. The architecture discovers its own specialization.

WaveEngine: 52KB of Pure C

A complete inference engine in 1,670 lines of C. Custom FFT, wave kernels, tokenizer. Runs on phone, laptop, server — no Python required.

contact

Reach out for collaborations, questions, or press.

badaramoni.avinash@gmail.com

LinkedIn profile