Built in public

Qovaryx Research

Architecture notes, devlog entries, and lessons from training a sovereign AI cluster from scratch for options trading. Six long-form articles published; 23 more queued and listed below. Public framings, not training recipes.

Not financial advice. Research articles describe model architecture and training methodology — they are not trade signals. Trading options involves substantial risk of loss.

Sovereign, not borrowed

Why we built our own decoder from the tokenizer up instead of fine-tuning a foundation model, and what that buys us when an API license changes or a rate limit tightens.

Calibrated conviction beats raw confidence

The model that says "BUY 80%" only matters if 80% is real. Why our 6x/3x/2x/1x sizing ties to calibrated probability, not to the raw softmax score.
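One common way to test whether "80%" is real is expected calibration error (ECE). Bucket predictions by stated confidence, then compare each bucket's mean confidence with its observed hit rate. The sketch below assumes ECE as the reliability metric. The tier cut points that map a calibrated probability to the 6x/3x/2x/1x multipliers are hypothetical, not published values.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between stated confidence and observed hit rate."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the top bin
        bins[idx].append((p, y))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(p for p, _ in bucket) / len(bucket)
        hit_rate = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / len(probs) * abs(mean_conf - hit_rate)
    return ece


# Hypothetical cut points on the *calibrated* probability (not published values).
SIZING_TIERS = [(0.80, 6), (0.70, 3), (0.60, 2), (0.55, 1)]


def size_multiplier(calibrated_p: float) -> int:
    """Map a calibrated probability to a position multiplier; 0 means no position."""
    for threshold, multiplier in SIZING_TIERS:
        if calibrated_p >= threshold:
            return multiplier
    return 0
```

Consider a model whose 0.9 calls win only half the time. It shows an ECE near 0.4 in that bucket. Sizing off its raw score would overbet in exactly the cases where the confidence is least deserved.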

Abstention is a decision

NO_TRADE isn't the absence of a call — it's a positive vote against entering. How we measure abstention coverage and why the brutal-test pass rate matters more than win rate.
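The definitions below are assumptions, offered only to make the distinction concrete. Abstention coverage is taken as the share of should-abstain cases on which the model actually emitted NO_TRADE. Pass rate is taken as exact match over the whole hard-case set. Win rate looks only at the trades the model chose to take.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    label: str      # what the case calls for: "BUY", "SELL", or "NO_TRADE"
    predicted: str  # what the model emitted


def abstention_coverage(cases: list[Case]) -> float:
    """Share of should-abstain cases on which the model actually emitted NO_TRADE."""
    should_abstain = [c for c in cases if c.label == "NO_TRADE"]
    if not should_abstain:
        return 1.0  # nothing to abstain on; treat as fully covered
    hits = sum(c.predicted == "NO_TRADE" for c in should_abstain)
    return hits / len(should_abstain)


def win_rate(cases: list[Case]) -> float:
    """Share of emitted trades that matched the label; every abstention is ignored."""
    trades = [c for c in cases if c.predicted != "NO_TRADE"]
    return sum(c.predicted == c.label for c in trades) / len(trades) if trades else 0.0


def pass_rate(cases: list[Case]) -> float:
    """Exact-match rate over the whole hard-case set, abstentions included."""
    return sum(c.predicted == c.label for c in cases) / len(cases)
```

Win rate never sees the setups the model should have refused. Coverage and pass rate count those refusals directly.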

The brutal evaluation

What "92% should be NO_TRADE" really means, and why testing against the hard cases instead of the easy ones is the only honest production signal.

Why a cluster of small heads beats one big model

Each Qovaryx head is small, specialized, and explainable. The cluster votes. What that buys you when a single 9B model would hallucinate on a tail-risk question.
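A minimal sketch of the voting step, assuming a plain majority vote with a quorum. The head names and the quorum value are hypothetical. The page does not say how votes are weighted or routed. Returning the per-head votes with the decision is what keeps each call explainable.

```python
from collections import Counter


def cluster_vote(
    head_votes: dict[str, str], quorum: float = 0.5
) -> tuple[str, dict[str, str]]:
    """Return the majority action if it clears the quorum, else NO_TRADE.

    The per-head votes are returned alongside the decision, so you can see
    which heads agreed and which dissented.
    """
    if not head_votes:
        return "NO_TRADE", head_votes
    action, count = Counter(head_votes.values()).most_common(1)[0]
    if count / len(head_votes) > quorum:
        return action, head_votes
    return "NO_TRADE", head_votes  # no clear majority: abstain rather than guess
```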

CPU, not cloud

The decoder runs on your laptop. Sub-millisecond inference, no GPU, no data leaving your machine. Why we made every architectural choice in service of that.

Planned research (23 articles) — detail pages rolling out weekly

These framings are queued for publication. Titles and abstracts are stable; long-form detail pages ship as the underlying work clears internal review. Nudge the queue on Discord.

Shell-governed cognition

The deterministic wrapper around the model carries most of the measured alpha. The verifier layer is not advisory — it has hard veto authority. Decisions pass deterministic gates before emission.

Model as subsystem

The neural model is one component inside a governed decision system, not the whole intelligence. The shell, verifiers, monitors, and execution layer are first-class architectural peers — each measured independently.

Verifier-governed inference

Every decision is verified against deterministic checks before finalization. The verifier is a function, not a model — it cannot drift. The model learns what outputs the verifier will accept.
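To make "a function, not a model" concrete, here is a minimal sketch of a deterministic veto layer. The specific checks (spread, confidence, size) and their limits are hypothetical placeholders. The point is the shape: pure functions that return a reason on failure, and any failure turns the proposal into NO_TRADE.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    action: str        # "BUY", "SELL", or "NO_TRADE" as proposed by the model
    confidence: float  # calibrated probability attached to the proposal
    spread_bps: float  # quoted bid-ask spread at decision time
    size_mult: int     # requested position multiplier


# A check returns None when it passes, or a human-readable reason when it vetoes.
Check = Callable[[Proposal], Optional[str]]


def max_spread(limit_bps: float) -> Check:
    return lambda p: None if p.spread_bps <= limit_bps else f"spread {p.spread_bps} bps > {limit_bps} bps"


def min_confidence(floor: float) -> Check:
    return lambda p: None if p.confidence >= floor else f"confidence {p.confidence} < {floor}"


def max_size(cap: int) -> Check:
    return lambda p: None if p.size_mult <= cap else f"size {p.size_mult}x > {cap}x"


def verify(p: Proposal, checks: list[Check]) -> tuple[str, list[str]]:
    """Deterministic gate: any failed check converts the proposal to NO_TRADE."""
    if p.action == "NO_TRADE":
        return "NO_TRADE", []
    reasons = [r for r in (check(p) for check in checks) if r is not None]
    if reasons:
        return "NO_TRADE", reasons  # hard veto, not advice
    return p.action, []
```

Because `verify` has no learned state, the same proposal and checks always produce the same verdict. That determinism is what rules out drift.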

Legacy brain crystallization

We compress the structural law of a domain into high-density crystal atoms before introducing noisy large-scale replay. 24× row compression. The learning curve shifts from token-dominated to structure-dominated.

Weighted learning density

Progress is measured by how much useful structure a training row contributes per token, not by row count. The gradient reflects the law's confidence, not the surface area of the text.
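One possible reading of this idea, sketched under stated assumptions, contrasts two ways of pooling loss. In the first, every token counts once. In the second, each row contributes its mean token loss scaled by a per-row confidence weight. The `confidence` field and the weighting scheme are hypothetical illustrations, not the published method.

```python
from dataclasses import dataclass


@dataclass
class Row:
    token_losses: list[float]  # per-token cross-entropy for one training row
    confidence: float          # hypothetical confidence weight of the row's label, in (0, 1]


def token_pooled_loss(rows: list[Row]) -> float:
    """Conventional pooling: every token counts once, so long rows dominate."""
    total = sum(sum(r.token_losses) for r in rows)
    return total / sum(len(r.token_losses) for r in rows)


def density_weighted_loss(rows: list[Row]) -> float:
    """Each row contributes its mean token loss, scaled by its confidence."""
    num = sum(r.confidence * sum(r.token_losses) / len(r.token_losses) for r in rows)
    return num / sum(r.confidence for r in rows)
```

Take a short high-confidence row with losses [2.0, 2.0] and a long low-confidence row with eight losses of 0.5. Token pooling gives 0.8, and the long row supplies 8 of the 10 tokens. Density weighting gives 1.75, because the short row's label is trusted more.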

EVO20 training genome

Twenty staged curriculum phases with explicit roles, budgets, and verifier gates. Finance spine, language spine, math spine, crystals, chart anchors, and replay anneal — every phase has a named purpose.
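A staged curriculum with gates can be expressed as an ordered list of phases, each with a role, a budget, and a pass/fail check on its post-phase metrics. This minimal sketch assumes hypothetical phase names, token budgets, metric keys, and thresholds. Only the ordering idea (structure first, noisy replay last) comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Phase:
    name: str
    role: str                     # the phase's named purpose
    token_budget: int             # tokens allotted to this phase
    gate: Callable[[dict], bool]  # verifier gate evaluated on post-phase metrics


def run_curriculum(
    phases: list[Phase],
    train_fn: Callable[[str, int], None],
    eval_fn: Callable[[str], dict],
) -> tuple[list[str], Optional[str]]:
    """Run phases in order; halt at the first phase whose gate fails."""
    completed: list[str] = []
    for phase in phases:
        train_fn(phase.name, phase.token_budget)
        if not phase.gate(eval_fn(phase.name)):
            return completed, phase.name  # do not advance past a failed gate
        completed.append(phase.name)
    return completed, None


# Illustrative ordering only: structure first, noisy replay last. Budgets and gates are made up.
EXAMPLE_PHASES = [
    Phase("crystals", "teach structural law on the smallest corpus", 50_000_000,
          lambda m: m["struct_acc"] >= 0.80),
    Phase("finance_spine", "domain grounding", 200_000_000,
          lambda m: m["struct_acc"] >= 0.80),
    Phase("replay_anneal", "large noisy replay, deferred to the end", 1_000_000_000,
          lambda m: m["holdout_acc"] >= 0.55),
]
```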

Train the law before the noise

Structural rules taught first on the smallest corpus that carries them. Large noisy replay deferred. Models develop a prior over structural setups before encountering messy real data.

Adaptive compute architecture

The model spends compute adaptively based on input difficulty and stakes. Reflex layers handle obvious cases cheaply. Hard inputs trigger deep latent reasoning. The model gains the right to abstain from thinking.
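A minimal sketch of confidence-gated escalation, assuming two separate scorers: a cheap "reflex" one and an expensive "deep" one. Inside a real network this would more likely be per-layer exits or learned routing. The scorers, thresholds, and stakes scale here are hypothetical.

```python
from typing import Callable

# A scorer maps input features to a probability per action.
Scorer = Callable[[list[float]], dict[str, float]]


def adaptive_decide(
    features: list[float],
    reflex: Scorer,
    deep: Scorer,
    stakes: float,
    reflex_confidence: float = 0.9,
    reflex_max_stakes: float = 1.0,
) -> tuple[str, str]:
    """Try the cheap path first; escalate only on hard or high-stakes inputs."""
    probs = reflex(features)
    action, p = max(probs.items(), key=lambda kv: kv[1])
    if p >= reflex_confidence and stakes <= reflex_max_stakes:
        return action, "reflex"  # obvious and low-stakes: skip deep reasoning
    probs = deep(features)
    action, _ = max(probs.items(), key=lambda kv: kv[1])
    return action, "deep"
```

The second element of the return value records which path ran. That makes it possible to measure how much of the input stream is handled cheaply.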

Sparse cognition

Stored capacity is broad; active footprint is narrow; compute follows evidence. Mixture-of-depths and adaptive routing let a small model carry a large specialist surface without ballooning active memory.

Compact frontier architectures

A model constrained to consumer hardware is a different object. Storage is scarce at design time. Ternary FFNs, low-rank SwiGLU, routed experts, sparse MoE — architectural commitments, not optimizations.

Storage-aware intelligence

Three numbers reported together: trainable parameters, expected packed storage, and active compute path. A win only counts if it improves capability without hiding cost in another column.
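To show why the three numbers must travel together, this sketch computes packed storage from per-group parameter counts and formats. The format table is an illustrative assumption. Base-243 packing stores 5 ternary weights per byte, since 3^5 = 243 ≤ 256. Exact fractions keep the byte count free of float rounding.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

# Storage cost per weight for a few formats (illustrative, not a published table).
BITS_PER_WEIGHT = {
    "fp16": Fraction(16),
    "int8": Fraction(8),
    "ternary_2bit": Fraction(2),        # one {-1, 0, +1} weight per 2-bit slot
    "ternary_base243": Fraction(8, 5),  # 5 trits per byte, since 3**5 = 243 <= 256
}


@dataclass(frozen=True)
class StorageReport:
    trainable_params: int
    packed_bytes: int
    active_params_per_token: int


def storage_report(
    groups: list[tuple[int, str]], active_params_per_token: int
) -> StorageReport:
    """groups: (parameter count, storage format) for each tensor group."""
    params = sum(n for n, _ in groups)
    bits = sum(n * BITS_PER_WEIGHT[fmt] for n, fmt in groups)
    return StorageReport(params, math.ceil(bits / 8), active_params_per_token)
```

The same 10M ternary weights cost 2.5 MB at 2 bits each but 2.0 MB with base-243 packing. Neither figure says anything about the active compute path, so that number goes in its own column.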

The execution reality layer

Gross directional accuracy is not tradability. Decisions must survive realistic execution friction (10bps per call: spread, slippage, impact, fill quality). Most published alpha evaporates at this cliff.
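The cliff is plain arithmetic. With hit rate h, average win W, average loss L, and cost c, net expectancy is hW − (1 − h)L − c. Setting it to zero gives a breakeven hit rate of (c + L)/(W + L). This sketch assumes the 10 bps is charged once per decision. The hit rates and move sizes in the example are hypothetical.

```python
def expectancy_bps(
    hit_rate: float, avg_win_bps: float, avg_loss_bps: float, cost_bps: float = 10.0
) -> tuple[float, float]:
    """Per-trade (gross, net) expectancy in basis points; cost is charged once per trade."""
    gross = hit_rate * avg_win_bps - (1 - hit_rate) * avg_loss_bps
    return gross, gross - cost_bps


def breakeven_hit_rate(
    avg_win_bps: float, avg_loss_bps: float, cost_bps: float = 10.0
) -> float:
    """Hit rate at which net expectancy is zero; above 1.0 means no hit rate can pay the cost."""
    return (cost_bps + avg_loss_bps) / (avg_win_bps + avg_loss_bps)
```

A 60% directional hit rate on symmetric ±8 bps moves grosses about +1.6 bps per trade and nets about −8.4 bps. At those move sizes the breakeven hit rate is 1.125, which is unreachable.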

Adversarial market evaluation

We train an adversary to generate plausible-but-pathological market trajectories the policy fails on. Catches scenarios never seen in history but physically realistic.

The multi-part promotion gate

A specialist must clear multiple independent checks before promotion: accuracy above the majority-class baseline on a date-disjoint holdout, diversity under perturbation, a positive lower bound on the bootstrap confidence interval, and no class collapse.
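Two of the listed checks can be sketched with the standard library. The first is a percentile-bootstrap lower bound on accuracy minus the majority-class baseline. The second guards against class collapse. The resample count, the 5% alpha, and the 5% minimum class share are hypothetical settings. The diversity-under-perturbation check is not shown.

```python
import random
from collections import Counter


def bootstrap_edge_lower_bound(
    y_true: list[str], y_pred: list[str], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> float:
    """One-sided (1 - alpha) percentile lower bound on accuracy minus majority-class accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    edges = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample holdout rows with replacement
        acc = sum(y_true[i] == y_pred[i] for i in idx) / n
        t = [y_true[i] for i in idx]
        majority = Counter(t).most_common(1)[0][1] / n  # always-predict-the-top-class baseline
        edges.append(acc - majority)
    edges.sort()
    return edges[int(alpha * n_boot)]


def no_class_collapse(y_pred: list[str], classes: list[str], min_share: float = 0.05) -> bool:
    """Every class must be predicted at least min_share of the time."""
    counts = Counter(y_pred)
    return all(counts[c] / len(y_pred) >= min_share for c in classes)


def promote(y_true: list[str], y_pred: list[str], classes: list[str]) -> bool:
    """y_true / y_pred should come from a date-disjoint holdout; perturbation diversity not checked."""
    return bootstrap_edge_lower_bound(y_true, y_pred) > 0 and no_class_collapse(y_pred, classes)
```

A predictor that always emits the most common class can never beat the majority baseline, and it fails the collapse check as well.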

Eval and reproducibility contract

Every serious run records model config, tokenizer, corpus hash, optimizer, precision, curriculum phase, hardware, checkpoints, audit file, and eval status. Internal reproducibility without publishing recipes.
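A minimal sketch of such a record as a serializable dataclass, with an order-sensitive corpus hash built from per-shard SHA-256 digests. The field names mirror the list above. The example values and the hashing scheme are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def corpus_hash(shards: list[bytes]) -> str:
    """Order-sensitive SHA-256 over per-shard digests, so reordering shards changes the hash."""
    h = hashlib.sha256()
    for shard in shards:
        h.update(hashlib.sha256(shard).digest())
    return h.hexdigest()


@dataclass
class RunRecord:
    model_config: dict
    tokenizer: str
    corpus_sha256: str
    optimizer: str
    precision: str
    curriculum_phase: str
    hardware: str
    checkpoints: list[str] = field(default_factory=list)
    audit_file: str = ""
    eval_status: str = "pending"

    def to_json(self) -> str:
        """Stable, sorted JSON so two identical runs produce identical records."""
        return json.dumps(asdict(self), sort_keys=True)
```

A record like this can be committed next to each checkpoint. It lets a run be reproduced internally without publishing the recipe itself.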

Date-disjoint forward holdouts

Random splits leak via per-symbol and per-date memorization. We enforce date-disjoint holdouts with multi-day purge gaps so the model is tested on regimes it has never seen.
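A minimal sketch of a date-disjoint split with a purge gap. Everything in the window just before the split date is dropped, so labels that look a few days forward cannot leak into training. The 5-day gap and calendar-day arithmetic are assumptions. In practice the gap should be at least the label horizon, and trading days may be the better unit.

```python
from datetime import date, timedelta
from typing import Any

Row = tuple[date, str, Any]  # (trade date, symbol, payload)


def date_disjoint_split(
    rows: list[Row], split_date: date, purge_days: int = 5
) -> tuple[list[Row], list[Row]]:
    """Train on dates before the purge window; test on split_date and later.

    Rows inside [split_date - purge_days, split_date) belong to neither side.
    """
    purge_start = split_date - timedelta(days=purge_days)
    train = [r for r in rows if r[0] < purge_start]
    test = [r for r in rows if r[0] >= split_date]
    return train, test
```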

Local sovereign AI

Runs on hardware the operator owns. Learns from a corpus the operator controls. Emits an audit trail the operator can read end-to-end. No remote API dependency.

Intelligence per watt

The unit of progress is not parameters, tokens, or FLOPs. It is how much useful, audited, decision-grade cognition we extract from a fixed envelope of stored bits, active compute, and electricity.

5.69× training throughput on consumer GPU

Through engineering improvements in shard handling, token budgeting, attention selection, and curriculum, we achieved a 5.69× speedup in consumer-GPU training (2,392 → 13,617 tokens/sec).
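The headline figure is the ratio of the two measured throughputs. Its inverse shows that a fixed token budget takes about 17.6% of the original wall-clock time, assuming both rates hold across a full run:

```latex
\frac{13{,}617\ \text{tokens/s}}{2{,}392\ \text{tokens/s}} \approx 5.69,
\qquad
\frac{2{,}392}{13{,}617} \approx 0.176
```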

Committee of specialists, routed

Instead of one model collapsing across many decision surfaces, we use a small committee of compact specialists, each trained on a single task, coordinated by a learned router and gated by a deterministic veto.

Outcome-aligned speculative curriculum

The multi-token prediction head's targets are drawn from the higher-R-multiple counterfactual sibling rather than from the actual training row. The body learns to encode trajectory class without altering surface tokens.

Conviction-conditioned loss temperature

Loss temperature is set per-row by the conviction label. High-conviction rows get a sharpened loss; low-conviction rows get a softened loss. A calibration regularizer anchors emitted confidence to human labels.
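One possible reading of this abstract, sketched under stated assumptions. Each row's cross-entropy is computed on logits divided by a temperature chosen from its conviction label. A squared-error penalty then pulls the model's top-class probability toward the human-labelled confidence. The temperature table, the penalty form, and the weight `lam` are all hypothetical.

```python
import math

# Hypothetical conviction -> temperature mapping: lower temperature sharpens the loss.
TEMPERATURE = {"high": 0.5, "medium": 1.0, "low": 2.0}


def log_softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def conviction_ce(logits: list[float], target: int, conviction: str) -> float:
    """Cross-entropy on temperature-scaled logits; the temperature comes from the row's label."""
    t = TEMPERATURE[conviction]
    return -log_softmax([x / t for x in logits])[target]


def calibration_penalty(logits: list[float], label_confidence: float) -> float:
    """Squared gap between the emitted (unscaled) top-class probability and the labelled confidence."""
    top_p = max(math.exp(v) for v in log_softmax(logits))
    return (top_p - label_confidence) ** 2


def row_loss(
    logits: list[float], target: int, conviction: str, label_confidence: float, lam: float = 0.1
) -> float:
    return conviction_ce(logits, target, conviction) + lam * calibration_penalty(logits, label_confidence)
```

With the sharpened (high-conviction) temperature, a correct prediction costs less and a wrong one costs more than under the softened setting. This matches the asymmetry the abstract describes.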

More coming. Want to nudge the queue? Join the Discord.