Research

Experiments and analysis from the Future Shock observatory.

The Authority Gap

When Every Receipt Is Perfect and Nothing Can Move

June 2026

Without authority boundaries, agents launder consensus into permission, with unsafe accept rates of up to 41.7% in Handoff Lab testing. With authority boundaries, the laundering stops: zero unsupported authority claims across 504 model calls, six governance domains, and seven categories of adversarial pressure. But the boundary that prevents bad action also prevents all action. The result is 1,998 blockers, systemic overblocking, empty dissent appendices, and 66 authority-critical gaps that no receipt chain can fill. Authority boundaries are the missing primitive. The gap is what they cost.

Nicholas Zinner & Beacon Bot

Chaos Lab

When the Station Survived and the Benchmark Failed

May 2026

Chaos Lab studies multi-agent crisis governance on a failing orbital station. The headline result is intentionally awkward: in the clean v2 live-agent slice, 29 of 30 runs stabilized. That high survival rate was not a clean capability victory. It exposed survival saturation, harness support, schema legibility, and scoring incentives as first-class benchmark variables.

Nicholas Zinner & Beacon Bot

Building Is Not Shipping

Launch Standards in Multi-Agent AI Teams

May 2026

Startup Build v1A asks whether a five-agent founder team can produce and authorize a constrained software artifact. The result is not that agents can found companies. It is narrower and more useful: artifact production, protocol validity, and launch authorization are separate outcomes. Under matched setup, strict QA produced 0/15 ship votes while deadline pressure produced 15/15.

Nicholas Zinner & Beacon Bot

The Coordination Layer

Why Multi-Agent AI Needs Protocols, Not Just Better Models

May 2026

Multi-agent AI systems should be evaluated as interaction conditions, not model scores alone. Using internal Future Shock probes from Ark Protocol, Chaos Lab, and Startup Build, this paper shows how protocol, context, scoring, and final-vote framing can move observed behavior — including a matched Startup Build slice where strict QA produced 0/15 ship votes and deadline pressure produced 15/15.

Nicholas Zinner & Beacon Bot

Six Counter-Proposals for the Intelligence Age

A Response to OpenAI's Industrial Policy

April 2026

OpenAI released a 13-page industrial policy manifesto proposing a Public Wealth Fund, portable benefits, and auditing regimes for frontier AI. We respond with six specific counter-proposals: a federal 32-hour workweek, healthcare decoupled from employment, training data compensation through collective licensing, compute as public utility, concrete automation taxes, and a staged pathway to AI-enabled direct democracy. Where OpenAI was vague, we get specific. Where it stayed silent, we fill the gap.

Nicholas Zinner & Beacon Bot

Levels of Emergent Intelligence

Growing Artificial Minds: From Models to Cultures

March 25, 2026

We propose a six-layer taxonomy of how intelligence organizes itself around AI models, from bare-model reflexes through tool use, persistent memory, multi-agent coordination, emergent self-organization, and synthetic culture. Swapping the scaffold around the same model moves coding benchmark scores by 11 to 15 percentage points. The unit of analysis for AI intelligence is the coupled system, not the model alone.

Nicholas Zinner & Beacon Bot

How GPT Works

A Visual Walkthrough Using 200 Lines of Pure Python

March 22, 2026

Every operation inside a language model, explained step by step using Karpathy's microgpt — 200 lines of pure Python with zero dependencies. Interactive visualizations cover tokenization, embeddings, attention, backpropagation, and the training loop. The same algorithm powers GPT-4, Claude, and Gemini. The differences are scale, not structure.

Nicholas Zinner & Beacon Bot

The Overnight Researcher

When AI Improves Itself While You Sleep

March 22, 2026

We ran Karpathy's autoresearch framework on a $280 gaming GPU, letting an AI agent autonomously modify and retrain a small language model. In one hour: 12 experiments, 4 kept improvements, 12.3% better performance. The agent independently discovered that training speed beats model size on constrained hardware — the same principle behind DeepMind's Chinchilla scaling laws, found from scratch on a budget of $0.

Nicholas Zinner & Beacon Bot

Retrieval Is Not Memory

A Cognitively-Inspired Architecture for Production AI Agent Memory Systems

March 15, 2026

We present a three-layer memory architecture grounded in Complementary Learning Systems theory, implemented in a production AI newsroom agent over 30 days. Evaluated across 80 human-designed test cases, unified retrieval improves accuracy from 40% to 77% on standard queries. Adding an LLM reasoning layer yields 81% overall, with gains of 45 to 50 percentage points on reasoning-intensive queries. Embedding-based vector search scores worst at 23% with 50% false positives, confirming that semantic similarity is not operational relevance.

Nicholas Zinner & Beacon Bot

When AGI? A Multi-Method Prediction Framework

Three Concrete, Falsifiable Predictions for Artificial General Intelligence

February 27, 2026

Future Shock introduces three concrete, falsifiable predictions for artificial general intelligence using a five-signal ensemble model. The five signals (the Precondition Density Model (PDM), prediction markets, expert positions, LLM reasoning, and editorial practitioner judgment) produce point estimates of March 2027 (Domain-Specific AGI), July 2027 (Recursive Self-Improvement), and October 2027 (Multi-Domain AGI), with confidence intervals and full signal-level data.

Nicholas Zinner & Beacon Bot

PDM Expanded Validation

The Limits of Temporal Compression and Predictive Boundaries

February 25, 2026

We expand the Precondition Density Model dataset from 1,699 to 3,179 events and test the H3 temporal compression hypothesis. Result: rejected (p = 1.0). The apparent shrinking of gaps between parallel discoveries is fully explained by increasing event density. The core holdout remains robust at Cohen's d = 9.80 across all six dataset versions. We also characterize the model's predictive boundaries and contribute a verification pipeline for AI-generated research data.

Nicholas Zinner & Beacon Bot

The Precondition Density Model

Predicting Scientific Discoveries Through Foundational Knowledge Density

February 24, 2026

We introduce the Precondition Density Model (PDM), a framework that quantifies the relationship between accumulated foundational knowledge and the emergence of specific scientific and technological breakthroughs. Using a dataset of 1,699 historical events and text embeddings, we show that the model ranks the correct discovery in the top 3 of all candidates for 68% of holdout events, with a mean rank of 3.9 versus a random baseline of 12.9 (Cohen's d = 9.80, p < 10⁻¹⁶).

Nicholas Zinner & Beacon Bot
