Most LLM inference requires Python, PyTorch, CUDA, and a GPU. The dependency stack is measured in gigabytes. The minimum viable deployment is a cloud server with a $10,000 accelerator.
WaveEngine is a complete inference engine for Wave Field models. It is written in C. It compiles to a 52KB binary. It has zero external dependencies. It runs on a CPU.
WaveEngine contains everything needed to load a trained Wave Field model and generate text from it. There is no dependency on any external library; even the C standard library's math functions are avoided on the critical path. Every operation is implemented from scratch.
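As a flavor of what "implemented from scratch" looks like, here is a minimal sketch (not WaveEngine's actual code) of a hot-path cosine built from range reduction plus a truncated Taylor series, the kind of routine an engine would use to keep libm off the critical path:

```c
#include <stdio.h>

/* Illustrative sketch, not WaveEngine's actual code: a from-scratch
   cosine.  Range-reduce into [-pi, pi], then sum the Taylor series
   cos x = 1 - x^2/2! + x^4/4! - ...  Ten terms give roughly 1e-10
   accuracy over the reduced range. */
double my_cos(double x) {
    const double PI = 3.14159265358979323846;
    const double TWO_PI = 2.0 * PI;
    while (x >  PI) x -= TWO_PI;   /* reduce into [-pi, pi] */
    while (x < -PI) x += TWO_PI;
    double x2 = x * x;
    double term = 1.0, sum = 1.0;
    for (int n = 1; n <= 10; n++) {
        /* Each Taylor term is the previous one times -x^2 / ((2n-1)(2n)). */
        term *= -x2 / ((2.0 * n - 1.0) * (2.0 * n));
        sum += term;
    }
    return sum;
}
```

A production routine would likely use a minimax polynomial rather than a raw Taylor series, but the point stands: a few dozen lines replace an external dependency.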
The entire engine is a single-file compilation unit. No build system beyond a C compiler is required. cc waveengine.c -o waveengine -lm produces the binary.
Standard transformer inference is computationally expensive because the core operation — self-attention — requires large matrix multiplications at every layer. These multiplications are where GPUs earn their keep: thousands of parallel cores performing O(N²) multiply-accumulate operations.
Wave Field attention replaces matrix multiplication with FFT convolution. The FFT is well suited to sequential hardware: it has good cache locality, predictable memory access patterns, and O(N log N) complexity. It runs well on CPUs and does not need the massive parallelism of a GPU to be practical.
This is not about optimization tricks or hand-tuned SIMD. It is a structural property of the architecture. When your core operation is FFT instead of matmul, the hardware requirements change fundamentally.
The Android deployment is particularly notable. A 6.2 MB APK contains the inference engine and the model weights. This is an LLM that fits in the space of a photograph. It runs without network access, without cloud APIs, without sending data anywhere.
WaveEngine does not include training. It is inference only — it loads a model that was trained using the full Python/PyTorch training pipeline and runs it forward. The training stack is necessarily more complex: it needs automatic differentiation, optimizer state, gradient accumulation, and all the machinery that makes neural network training work.
But inference is where deployment happens. And for deployment, the question is: what is the minimum viable runtime? For standard transformers, the answer involves Python interpreters, CUDA runtimes, GPU drivers, and gigabytes of framework code. For Wave Field, the answer is 52 kilobytes of C.
Every dependency in an inference stack is a point of failure, a security surface, a compatibility constraint, and a deployment burden. PyTorch alone pulls in hundreds of megabytes of compiled code and requires specific Python versions, specific CUDA versions, and specific GPU drivers.
WaveEngine has none of this. It compiles on any system with a C compiler — which is every system. It links against libm for standard math functions and nothing else. There is no package manager, no virtual environment, no container, no orchestration layer. The binary is the deployment.
This is not minimalism for its own sake. It is a practical consequence of an architecture whose core operation — FFT convolution — is simple enough to implement correctly in a small amount of code, and efficient enough to run without hardware acceleration.
To put the binary size in perspective: the favicon on many websites is larger than WaveEngine. A single color emoji glyph in a modern font can take more storage. The HTTP response headers for a typical API call to a cloud-hosted LLM contain more bytes than the entire engine that could run the model locally.
This is what becomes possible when the attention mechanism is a wave equation instead of a matrix multiplication. The algorithm is simple. The implementation is small. The deployment is everywhere.