Sovereign, not borrowed
Why we built our own decoder from the tokenizer up instead of fine-tuning a foundation model, and what that buys us when an API license changes or a rate limit tightens.
Calibrated conviction beats raw confidence
The model that says "BUY 80%" only matters if 80% is real. Why our 6x/3x/2x/1x sizing ties to calibrated probability, not to the raw softmax score.
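A minimal sketch of tying size to calibrated probability, assuming hypothetical cut-offs: the text names the 6x/3x/2x/1x tiers but not the thresholds between them, so `TIER_EDGES` and both function names are illustrative only. `bucket_hit_rate` shows the check behind "80% is real": among calls emitted near 80%, roughly 80% should actually win.

```python
from bisect import bisect_right

# Hypothetical probability cut-offs (not from the source).
TIER_EDGES = [0.55, 0.65, 0.75, 0.85]
# Size multiplier for each band: below the first edge there is no trade (0).
TIER_SIZES = [0, 1, 2, 3, 6]


def size_multiplier(calibrated_p: float) -> int:
    """Map a calibrated win probability to a size tier."""
    if not 0.0 <= calibrated_p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return TIER_SIZES[bisect_right(TIER_EDGES, calibrated_p)]


def bucket_hit_rate(probs, outcomes, lo, hi):
    """Observed win rate among calls whose emitted probability is in [lo, hi)."""
    hits = [o for p, o in zip(probs, outcomes) if lo <= p < hi]
    return sum(hits) / len(hits) if hits else None
```

If the hit rate for a bucket sits well below the probabilities emitted in that bucket, the score is over-confident and should not drive the larger tiers.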
Abstention is a decision
NO_TRADE isn't the absence of a call — it's a positive vote against entering. How we measure abstention coverage and why the brutal-test pass rate matters more than win rate.
The brutal evaluation
What "92% should be NO_TRADE" really means, and why testing against the hard cases instead of the easy ones is the only honest production signal.
Why a cluster of small heads beats one big model
Each Qovaryx head is small, specialized, and explainable. The cluster votes. What that buys you when a single 9B model would hallucinate on a tail-risk question.
CPU, not cloud
The decoder runs on your laptop. Sub-millisecond inference, no GPU, no data leaving your machine. Why we made every architectural choice in service of that.
Planned research (23 articles) — detail pages rolling out weekly
These framings are queued for publication. Titles and abstracts are stable; long-form detail pages ship as the underlying work clears internal review. Nudge the queue on Discord.
Shell-governed cognition
The deterministic wrapper around the model carries most of the measured alpha. The verifier layer is not advisory — it has hard veto authority. Decisions pass deterministic gates before emission.
Model as subsystem
The neural model is one component inside a governed decision system, not the whole intelligence. The shell, verifiers, monitors, and execution layer are first-class architectural peers — each measured independently.
Verifier-governed inference
Every decision is verified against deterministic checks before finalization. The verifier is a function, not a model — it cannot drift. The model learns what outputs the verifier will accept.
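To illustrate "a function, not a model", here is a sketch of a deterministic verifier with veto authority. The `Proposal` fields, the `SELL` action, the specific checks, and the fallback to `NO_TRADE` on veto are all assumptions made for the example, not the production rules.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_ACTIONS = {"BUY", "SELL", "NO_TRADE"}  # SELL is assumed
ALLOWED_SIZES = {0, 1, 2, 3, 6}


@dataclass(frozen=True)
class Proposal:
    action: str
    confidence: float
    size_multiplier: int


def verify(p: Proposal) -> Optional[str]:
    """Pure, stateless checks. Returns None on accept, else a veto reason."""
    if p.action not in ALLOWED_ACTIONS:
        return "unknown action"
    if not 0.0 <= p.confidence <= 1.0:
        return "confidence out of range"
    if p.size_multiplier not in ALLOWED_SIZES:
        return "unsupported size"
    if p.action == "NO_TRADE" and p.size_multiplier != 0:
        return "NO_TRADE must carry zero size"
    return None


def finalize(p: Proposal) -> Proposal:
    """Emit the proposal only if every gate passes; otherwise abstain."""
    if verify(p) is not None:
        return Proposal("NO_TRADE", 0.0, 0)
    return p
```

Because `verify` has no learned parameters and no state, the same proposal always gets the same verdict, which is what rules out drift.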
Legacy brain crystallization
We compress the structural law of a domain into high-density crystal atoms before introducing noisy large-scale replay. 24× row compression. The learning curve shifts from token-dominated to structure-dominated.
Weighted learning density
Progress is measured by how much useful structure a training row contributes per token, not by row count. The gradient reflects the law's confidence, not the surface area of the text.
EVO20 training genome
Twenty staged curriculum phases with explicit roles, budgets, and verifier gates. Finance spine, language spine, math spine, crystals, chart anchors, and replay anneal — every phase has a named purpose.
Train the law before the noise
Structural rules taught first on the smallest corpus that carries them. Large noisy replay deferred. Models develop a prior over structural setups before encountering messy real data.
Adaptive compute architecture
The model spends compute adaptively based on input difficulty and stakes. Reflex layers handle obvious cases cheaply. Hard inputs trigger deep latent reasoning. The model gains the right to abstain from thinking.
Sparse cognition
Stored capacity is broad; active footprint is narrow; compute follows evidence. Mixture-of-depths and adaptive routing let a small model carry a large specialist surface without ballooning active memory.
Compact frontier architectures
A model constrained to consumer hardware is a different object. Storage is scarce at design time. Ternary FFNs, low-rank SwiGLU, routed experts, sparse MoE — architectural commitments, not optimizations.
Storage-aware intelligence
Three numbers reported together: trainable parameters, expected packed storage, and active compute path. A win only counts if it improves capability without hiding cost in another column.
The execution reality layer
Gross directional accuracy is not tradability. Decisions must survive realistic execution friction (10 bps per call, covering spread, slippage, impact, and fill quality). Most published alpha evaporates at this cliff.
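A worked sketch of what the 10 bps charge does to a strategy, assuming friction is deducted once per decision and that wins and losses have fixed average sizes. The function names and the example magnitudes are illustrative.

```python
FRICTION_BPS = 10.0  # per call, as stated in the text


def net_return_bps(gross_bps: float, friction_bps: float = FRICTION_BPS) -> float:
    """Return of one call after friction."""
    return gross_bps - friction_bps


def breakeven_win_rate(avg_win_bps: float, avg_loss_bps: float,
                       friction_bps: float = FRICTION_BPS) -> float:
    """Win rate p at which expected net return is zero.

    Solves p * (W - f) - (1 - p) * (L + f) = 0, giving p = (L + f) / (W + L).
    """
    return (avg_loss_bps + friction_bps) / (avg_win_bps + avg_loss_bps)
```

With symmetric 50 bps wins and losses, the break-even win rate moves from 50% with no friction to 60% with 10 bps, so a model that is right 58% of the time is directionally accurate and still loses money.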
Adversarial market evaluation
We train an adversary to generate plausible-but-pathological market trajectories the policy fails on. Catches scenarios never seen in history but physically realistic.
The multi-part promotion gate
A specialist must clear multiple independent checks before promotion: above-majority accuracy on date-disjoint holdout, diversity under perturbation, positive bootstrap-CI lower bound, no class collapse.
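A sketch of three of the checks above as independent booleans, assuming the labels and per-trade returns already come from a date-disjoint holdout. The diversity-under-perturbation check is omitted, and the resample count, `alpha`, and `min_class_share` are hypothetical defaults.

```python
import random
from collections import Counter


def bootstrap_lower_bound(returns, n_resamples=2000, alpha=0.05, seed=0):
    """Lower end of a percentile bootstrap CI on the mean return."""
    rng = random.Random(seed)
    n = len(returns)
    means = sorted(sum(rng.choices(returns, k=n)) / n for _ in range(n_resamples))
    return means[int((alpha / 2) * n_resamples)]


def promotion_checks(y_true, y_pred, returns, min_class_share=0.05):
    """Evaluate each gate separately; promotion requires all of them."""
    n = len(y_true)
    majority_rate = Counter(y_true).most_common(1)[0][1] / n
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pred_counts = Counter(y_pred)
    # Class collapse: some true class is (almost) never predicted.
    no_collapse = all(pred_counts[c] / n >= min_class_share for c in set(y_true))
    return {
        "above_majority": accuracy > majority_rate,
        "no_class_collapse": no_collapse,
        "positive_ci_lower_bound": bootstrap_lower_bound(returns) > 0.0,
    }
```

Returning the checks as a dictionary rather than a single boolean keeps the reason for a failed promotion visible in the audit trail.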
Eval and reproducibility contract
Every serious run records model config, tokenizer, corpus hash, optimizer, precision, curriculum phase, hardware, checkpoints, audit file, and eval status. Internal reproducibility without publishing recipes.
Date-disjoint forward holdouts
Random splits leak via per-symbol and per-date memorization. We enforce date-disjoint holdouts with multi-day purge gaps so the model is tested on regimes it has never seen.
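The split can be sketched as follows, assuming each row carries its observation date. The text says only "multi-day", so the 5-day purge gap is a hypothetical default, as is the function name.

```python
from datetime import date, timedelta


def date_disjoint_split(rows, cutoff: date, purge_days: int = 5):
    """Split (date, payload) rows into train and holdout by date.

    Rows dated in [cutoff, cutoff + purge_days) are dropped so that
    windows overlapping the boundary cannot leak into the holdout.
    """
    purge_end = cutoff + timedelta(days=purge_days)
    train = [r for r in rows if r[0] < cutoff]
    holdout = [r for r in rows if r[0] >= purge_end]
    return train, holdout
```

Every symbol is cut at the same date, so no holdout day appears in training for any instrument.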
Local sovereign AI
Runs on hardware the operator owns. Learns from a corpus the operator controls. Emits an audit trail the operator can read end-to-end. No remote API dependency.
Intelligence per watt
The unit of progress is not parameters, tokens, or FLOPs. It is how much useful, audited, decision-grade cognition we extract from a fixed envelope of stored bits, active compute, and electricity.
5.69× training throughput on consumer GPU
Through engineering improvements in shard handling, token budgeting, attention selection, and curriculum, we achieved a 5.69× speedup in consumer-GPU training (2,392 → 13,617 tokens/sec).
Committee of specialists, routed
Instead of one model collapsing on many decision surfaces, we use a small committee of compact specialists, each trained on a single task, coordinated by a learned router and gated by a deterministic veto.
Outcome-aligned speculative curriculum
The multi-token prediction head is trained to predict tokens from the higher-R-multiple counterfactual sibling instead of from the actual training row. The body learns to encode trajectory class without altering surface tokens.
Conviction-conditioned loss temperature
Loss temperature is set per-row by the conviction label. High-conviction rows get a sharpened loss; low-conviction rows get a softened loss. A calibration regularizer anchors emitted confidence to human labels.
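One way to read "sharpened" and "softened" is as a temperature applied to the logits before the cross-entropy, which is the assumption this sketch makes. The conviction labels and temperature values are hypothetical, and the calibration regularizer is not shown.

```python
import math

# Hypothetical mapping from conviction label to softmax temperature.
# T < 1 sharpens the distribution; T > 1 softens it.
TEMPERATURE = {"high": 0.5, "medium": 1.0, "low": 2.0}


def tempered_cross_entropy(logits, target_index, conviction):
    """Cross-entropy on logits divided by the row's temperature."""
    t = TEMPERATURE[conviction]
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_norm - scaled[target_index]
```

Under this reading, a high-conviction row is penalized more heavily when the model favors the wrong class and less when it favors the right one, while a low-conviction row pulls the two cases closer together.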