Essays and field notes from a small company building the architecture that takes cognitive surrender and intent drift seriously. Some pieces are technical, some are structural. All of them are evidence of the thinking behind Spegling.
Founding
Storey and Wharton named the failure mode in February: cognitive surrender, intent drift, the three debts. Practitioners are converging on the symptoms. This essay traces the symptoms to the architecture and lands on three primitives: Declared Intent, Living Authority, Evidence Audit.
Intelligence is becoming infrastructure. The civic layer the social contract needs at machine speed.
Part nightmare, part design document. What happens when the mirror you built learns to question the clause.
Technical
A field report from inside a vLLM quantization plugin: the three production families, the hardware moats, the MoE infrastructure gap, and the strategic inflection point one plugin is sitting on.
A follow-up to "A 70 GB model on a 48 GB MacBook." Yesterday the number was 1 tok/s. The kernel got fused. 33.8 tok/s on the same hardware class.
A field report from optimizing 3-bit quantized inference in vLLM. Three lessons on where Triton's abstractions stop paying off.
Running Qwen3.5-35B on Apple Silicon with 3-bit weight compression. What fit, what broke, and why 1 tok/s still mattered.
Field notes on hardware-native FP4 quantization across the Gemma 4 family.
What you actually pay for when you run inference on someone else's GPUs.
A working note on data-oblivious vector quantization, Walsh-Hadamard rotation, and where the latency goes.
Asymmetric K/V compression and progressive temporal precision for the decode-bandwidth bottleneck.
When a flat scan beats your fancy ANN index. A small lesson in scale and constants.
Substack · varjoranta.substack.com
AI is not taking your job. It is removing the reason your job existed.
Why the best systems throw away the most. Seven forgetting strategies and the half-life of relevance.
Execution was the hard problem. Permission is the hard problem now.
The longer essay behind the thesis. What future systems must inherit.
Patterns that work. How to get agents to amplify judgment, not replace it.