Verifiable benchmarks for local AI inference hardware.
The initial editorial archive shipped in May 2026. Each run is a canonical Ed25519-signed BenchmarkRun JSON that any third party can independently verify against the maintainer's published public key.
Silicon Logic is an independent publication where every published benchmark is cryptographically signed, every prompt is human-authored, and every methodology decision is public. Readers don't have to trust the chart — they can verify the run. The maintainer's public key is published in the repository, and every benchmark in the editorial archive can be independently verified against it using a single command.
The archive starts with signed evidence.
The first milestone is not a chart. It is a verifiable substrate: canonical run artifacts, human-authored prompts, structural independence, and a public verification path.
Every prompt in the v1 benchmark suite is written by the editor. No prompts curated from MT-Bench, HELM, or other published suites. No AI-generated test content. Every prompt is accountable to a named human author.
No vendor sponsorship dollars. No hardware manufacturer pays for placement. No software vendor influences methodology. Independence is structural, not asserted.
Numbers are everywhere. Defensible numbers are rare.
Local AI inference hardware is increasingly important, increasingly contested, and surprisingly under-served by rigorous editorial coverage. Hardware vendors publish competing benchmark claims for the same silicon. Standard benchmark suites get marketed around. Community projects surface useful but unauditable numbers. YouTubers run impressive-looking tests with undocumented methodology.
The result: readers who care about local LLM performance — researchers, engineers building on consumer GPUs, Apple Silicon developers, anyone choosing hardware for inference workloads — can find plenty of numbers. What's missing is numbers they can defend.
Two tracks, two methodologies, zero composite AI scores.
Silicon Logic publishes in two tracks. The separation is the point: hardware performance and model behavior answer different questions.
Hardware Performance
Hardware Performance covers consumer GPUs, Apple Silicon, and AI accelerators with cryptographically signed performance metrics on a weekly editorial cadence.
Model Quality
Model Quality covers in-depth model reviews at a slower cadence (every 6-8 weeks), evaluating model behavior on tasks with separate methodology from hardware benchmarks.
The two tracks have different methodologies and never blur into composite "AI scores." Hardware performance measures tokens per second, time to first token, latency percentiles, memory pressure, and energy efficiency. Model quality evaluates behavior. Readers get the right measurement for their question instead of a marketing-friendly conflation.
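Those Track 1 quantities map to a concrete per-run record. A minimal sketch of what the hardware-performance fields might look like, with illustrative names rather than the published schema:

```python
# Illustrative Track 1 (hardware performance) metric fields.
# Field names are assumptions, not the published schema.
from dataclasses import dataclass

@dataclass
class HardwareMetrics:
    tokens_per_second: float        # generation throughput
    time_to_first_token_ms: float   # prompt-processing latency
    latency_p50_ms: float           # latency percentiles across trials
    latency_p95_ms: float
    peak_memory_gb: float           # memory pressure during generation
    energy_per_token_joules: float  # energy efficiency
```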
Eight dimensions, held together.
Eight dimensions distinguish the publication. No single dimension is unique — Phoronix is independent, MLPerf has methodology transparency, Procyon ships reproducible benchmarks. The defensible position is the combination:
AI hardware benchmarking focus — local inference is the editorial domain, not a side beat
Cryptographic provenance — every run signed with Ed25519, public key committed to the repository, signatures verify mathematically
Editorial accountability — a named human editor with a verifiable public track record, not a brand or institution
Methodology transparency — every methodology decision documented as a versioned, reader-facing artifact
Independence — zero vendor sponsorship, zero placement deals, structural rather than asserted
Reproducible methodology — every signed run includes the harness version, prompt suite version, and execution parameters needed to re-run the benchmark
Local-inference focus — Apple Silicon, consumer GPUs, and the hardware readers actually run, not datacenter benchmarks
Track separation — hardware performance and model quality measured separately, never collapsed into a single ranking
None of the competitors investigated combines all eight. That combination is Silicon Logic's editorial position.
Every benchmark execution flows through the same pipeline.
The pipeline is designed to make editorial claims traceable from model server to canonical artifact to cryptographic signature.
Harness
The harness runs the model server — llama-server for GGUF models, mlx_lm.server for Apple Silicon MLX models. It executes the prompt with one warmup trial (discarded) and five counted trials, captures timing via Python's perf_counter for sub-millisecond precision, and emits a BenchmarkTimings record with median aggregation across the counted trials.
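The trial loop is the core of the harness. A minimal sketch, assuming a hypothetical run_prompt() callable that issues one blocking request to the model server (the real harness captures richer BenchmarkTimings fields than shown here):

```python
# Sketch of the warmup-plus-counted-trials loop described above.
# run_prompt() is a hypothetical stand-in for the model-server request.
import statistics
import time

WARMUP_TRIALS = 1
COUNTED_TRIALS = 5

def time_trial(run_prompt, prompt: str) -> float:
    """Time one generation request with sub-millisecond resolution."""
    start = time.perf_counter()
    run_prompt(prompt)
    return time.perf_counter() - start

def benchmark_prompt(run_prompt, prompt: str) -> dict:
    # Warmup trial: executed but discarded, so caches and kernels are primed.
    for _ in range(WARMUP_TRIALS):
        time_trial(run_prompt, prompt)

    # Counted trials: recorded, then aggregated with the median.
    durations = [time_trial(run_prompt, prompt) for _ in range(COUNTED_TRIALS)]
    return {
        "trial_seconds": durations,
        "median_seconds": statistics.median(durations),
    }
```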
Mapper
The mapper converts harness output into a canonical BenchmarkRun — the published form, schema-validated across Python (Pydantic), TypeScript (Zod), and Postgres (Drizzle). Schema synchronization across the three layers is verified by a CI check on every commit.
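For illustration, a hedged sketch of the canonical BenchmarkRun envelope in Pydantic v2; the field names are assumptions rather than the published schema, and the real model is mirrored in Zod and Drizzle:

```python
# Assumed shape of the canonical BenchmarkRun; illustrative field names only.
from pydantic import BaseModel

class BenchmarkRun(BaseModel):
    run_id: str
    model_name: str                 # e.g. "Llama 3.2 1B Instruct"
    quantization: str               # e.g. "Q4_K_M GGUF" or "4-bit MLX"
    hardware: str                   # e.g. "MacBook Pro M5 Max 36GB"
    harness_version: str            # provenance needed to re-run the benchmark
    prompt_suite_version: str
    execution_parameters: dict[str, str]
    median_tokens_per_second: float
    median_time_to_first_token_ms: float
    signature: str | None = None    # attached by the signer, hex-encoded
    public_key: str | None = None   # the signer's Ed25519 public key
```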
Signer
The signer takes the canonical JSON, computes its canonical-bytes representation, signs with the maintainer's Ed25519 private key (loaded at runtime from 1Password, never on disk), and emits a signed BenchmarkRun. The signed artifact is then re-read from disk and re-verified against the public key before being accepted — defense in depth against serialization corruption.
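A minimal sketch of the sign-then-verify step using the Python cryptography library's Ed25519 primitives. The sorted-key JSON canonicalization and in-memory re-verification shown here are simplifications of the documented procedure, which re-reads the artifact from disk:

```python
# Sketch: canonicalize, sign with Ed25519, then verify before accepting.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def canonical_bytes(run: dict) -> bytes:
    # Assumed canonicalization: sorted keys, no whitespace.
    return json.dumps(run, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_run(run: dict, private_key: Ed25519PrivateKey) -> dict:
    payload = canonical_bytes(run)
    signature = private_key.sign(payload)

    # Defense in depth: verify against the public key before accepting the artifact.
    private_key.public_key().verify(signature, payload)  # raises InvalidSignature on failure

    return dict(run, signature=signature.hex())
```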
Verifying any published run takes one command. The repository contains the public key, the verification function, and the documented procedure. Either the math says the run is valid, or it doesn't. There is no "trust us."
Phase 1.4 is publishable.
The Phase 1.4 publishable milestone shipped in May 2026. Twenty pull requests, merged across two days of focused execution, build the editorial substrate end-to-end: schemas, harness lifecycle, multi-trial orchestration, mapper, signing pipeline, CLI, and the first ten signed sample runs.
The first ten runs in the editorial archive benchmark Llama 3.2 1B Instruct on a MacBook Pro M5 Max across five prompts (conversational, code, reasoning, long-context, and a second reasoning task) in two quantizations: Q4_K_M GGUF via llama.cpp, and 4-bit MLX via mlx_lm.
Every one of those ten runs verifies against the maintainer's published public key. The verification procedure works today. Any reader can clone the repository and prove the numbers came from the maintainer's key. That's the substrate.
Built as a publication substrate, not a one-off benchmark script.
The system spans web, backend, schemas, inference runtimes, signing, and distribution so the editorial archive can be verified by humans and programs.
- Languages: TypeScript (frontend/MCP), Python 3.12 (backend)
- Monorepo: Turborepo + pnpm + uv
- Database: Postgres 17 on Neon (us-east-1)
- Schemas: Drizzle ORM (SQL), Pydantic v2 (Python), Zod (TypeScript), synchronized via CI
- Frontend: Next.js 15 with Tailwind 4
- Inference runtimes: llama.cpp (GGUF), mlx_lm (Apple Silicon MLX)
- Signing: Ed25519 via the Python cryptography library
- Distribution: open repository on GitHub, planned MCP server for programmatic access
- Reference hardware: MacBook Pro M5 Max 36GB (Apple Silicon), Windows + RTX 5080 (planned)
The editorial operating surface.
Silicon Logic's public promise is narrow enough to verify: signed benchmark runs, explicit methodology versioning, defined trials, median aggregation, and track-specific editorial cadence.
How to verify any signed run
Every published BenchmarkRun in the Silicon Logic archive includes an Ed25519 signature and the signer's public key. The repository commits the maintainer's public key and its SHA-256 fingerprint, allowing any third party to verify a published benchmark independently. The full verification procedure is documented in the repository at data/runs/README.md.
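In sketch form, independent verification loads the committed public key, recomputes the canonical bytes, and checks the Ed25519 signature. The file layout, field names, and canonicalization below are assumptions; the authoritative procedure is the one documented in data/runs/README.md:

```python
# Hedged sketch of third-party verification of a signed BenchmarkRun.
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_run(run_path: str, public_key_hex: str) -> bool:
    with open(run_path) as f:
        run = json.load(f)

    # Assumed layout: signer-attached fields are removed before recomputing
    # the canonical bytes the signature was computed over.
    signature = bytes.fromhex(run.pop("signature"))
    run.pop("public_key", None)
    payload = json.dumps(run, sort_keys=True, separators=(",", ":")).encode("utf-8")

    public_key = Ed25519PublicKey.from_public_bytes(bytes.fromhex(public_key_hex))
    try:
        public_key.verify(signature, payload)
        return True
    except InvalidSignature:
        return False
```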
View the signing module at github.com/Vargix/silicon-logic. For verification questions, methodology disputes, or hardware coverage suggestions, see contact below.
Want to follow Silicon Logic's launch?
Silicon Logic launches publicly in Phase 1.6 with the first editorial Track 1 article. For early access, methodology questions, or technical collaboration on the publication substrate, reach out below.