Tokens & Signals · Friday, May 8, 2026

GPT-5.5-Cyber: OpenAI Pivots to Infrastructure Defense

gpt-5.5-cyberclaude-mythosgemini-3.1-progpt-4ogemma-4-26bgpt-5.2-codexclaude-opus-4-7-thinking-autogpt-5.5-xhighskymizer-htx301vllm-omni-v0.20.0petri-3.0openaigoogle-deepmindmozillaanthropicsakana-ainvidiadatabricksmeridian-labsbank-of-americamodel-benchmarkingchain-of-verificationsparse-transformersevolutionary-searchdata-agentsrlhfmulti-token-predictionair-gapped-aicybersecuritymodel-customizationgary marcuskarpathy
Tokens & Signals for 5/8/2026. We scanned ~1,200 Twitter accounts (1105 tweets), 13 subreddits (55 posts), Hacker News (8 stories), 4 newsletter posts, 4 podcast episodes, 184 Discord messages, and leaderboard data for you. Estimated reading time saved: ~10 hours.

TLDR & AI Twitter Recap

* OpenAI is sunsetting its fine-tuning API by Dec 31, 2026. They're pushing everyone toward the "Custom Model" program, which is basically a signal that prompt caching and RAG have won. reddit.com/r/OpenAI/comments/1t6sisf/openai_has...)

* DeepMind's new "AI Co-Mathematician" hit 72% on the FrontierMath benchmark — compare that to base Gemini 3.1 Pro's 19% — by using a "Chain-of-Verification" method to cross-check its own symbolic proofs. arxiv.org/abs/2605.06651(https://arxiv.org/abs/...)

* @GaryMarcus on market health: "The 'AI Big Ten' now accounts for 40% of the S&P 500—this feels like the Dot Com era all over again." x.com/GaryMarcus/status/2052774268463976935(htt...)

* OpenAI dropped GPT-5.5-Cyber, a model built specifically for critical infrastructure defense. It comes with a "Trusted Access" architecture so it can run in air-gapped environments. x.com/gdb/status/2052583338561683775(https://x....)

* Mozilla is using Anthropic's Claude Mythos to hunt zero-day bugs in Firefox, and internal reports suggest it's on track to cut browser security breaches by 40%. reddit.com/r/singularity/comments/1t6rmrz/firef...)

* Sakana AI and NVIDIA found a 4.5x inference speedup using sparse transformers discovered through evolutionary search — and they're barely giving up any accuracy. x.com/SakanaAILabs/status/2052787226136990029(h...)

* @karpathy on model customization: "RAG, caching, and prompt tuning win. Custom model programs are the new fine-tune."

* Databricks Genie hit 91.6% accuracy on data agent tasks, beating GPT-4o on complex joins by skipping RAG entirely and talking directly to database schema metadata. x.com/Yuchenj_UW/status/2052784305735397863(htt...)

* Anthropic donated "Petri" to Meridian Labs — it's an open-source alignment tool that spots bias in RLHF pipelines, with 15% better alignment consistency than what came before. anthropic.com/research/donating-open-source-pet...)

* Gemma 4 with Multi-Token Prediction is hitting 600 tokens/sec on RTX 5090s. Local 26B-class models just got a whole lot more fun to run. reddit.com/r/LocalLLaMA/comments/1t6se6r/multit...)

Go deeper on what matters to you

Tap to expand

Best to Build With Today

* Coding: gpt-5.2-codex (LiveBench leader)

* Reasoning: claude-opus-4-7-thinking-auto

* Math: gpt-5.5-xhigh

* General Chat: gemini-3.1-pro

* Local Inference: gemma-4-26b (with Multi-Token Prediction)

Deeper Dives

💼 Industry & Business

OpenAI Winding Down Fine-Tuning API

OpenAI is retiring its legacy Fine-Tuning API and nudging everyone toward the "Custom Model" program. Think of it as a shift from old-school fine-tuning to more modular, modern customization. You've got until December 31, 2026, to migrate your production workflows — so not urgent, but don't sleep on it.

Why it matters: Fine-tuning is quietly becoming a "pro" feature rather than something you just call from an API.

� Reddit

BofA: AI Concentration Concerns

Bank of America is flagging that the "AI Big Ten" now makes up 40% of the S&P 500. That kind of concentration is bringing out the Dot Com bubble comparisons, and raising real questions about whether massive AI capex can actually be sustained.

Why it matters: The gap between AI infrastructure spending and verifiable free cash flow is getting hard to ignore.

� Twitter

Mozilla Hardens Firefox with Claude Mythos

Mozilla patched a record number of bugs this April, and they're crediting the "Claude Mythos" engine for surfacing 271 vulnerabilities. The tool zeroes in on DOM-level execution patterns to catch zero-day exploits before they become someone's bad day.

Why it matters: This is a rare, concrete example of an AI model actually pulling its weight as a security researcher in production code — not just a demo.

� Reddit

🧠 Models & Research

DeepMind's 'AI Co-Mathematician'

This agent scored 72% on FrontierMath, a benchmark full of research-level problems that usually take humans weeks to crack. The secret sauce is a "Chain-of-Verification" approach that cross-checks its own symbolic reasoning as it goes.

Why it matters: It's a pretty strong argument that agentic scaffolding — not just throwing more parameters at the problem — is what unlocks real reasoning.

� Twitter

Sakana AI and NVIDIA Sparse Research

Using evolutionary search to find optimal sparsity patterns, researchers built models that run 4.5x faster at inference with only a 0.8% accuracy drop. That's not a rounding error — that's genuinely impressive.

Why it matters: Making giant models 4.5x faster without wrecking their quality might be the most practical path to sustainable AI at scale.

� Twitter

Databricks Genie Data Agents

Genie hit 91.6% accuracy on data tasks by ditching standard RAG and going straight to the database schema metadata. It beat GPT-4o by 12% on join-heavy analytical queries — the kind that usually make agents fall apart.

Why it matters: Getting reliable answers out of messy enterprise data is basically the final boss for agents. Genie is clearing the level.

� Twitter

🚀 Products & Launches

OpenAI Releases GPT-5.5-Cyber

Built for the people defending critical infrastructure, this one's less "general purpose chatbot" and more "serious security tool." The "Trusted Access" architecture lets it operate in air-gapped environments, so it can help find vulnerabilities without touching anything it shouldn't.

Why it matters: It's a clear sign that the era of one-model-for-everything is giving way to high-stakes, domain-specific tools.

� Twitter

Codex Chrome Integration

Codex now lives in your browser as a Chrome extension, handling multi-tab research, debugging web apps, and spinning up subagents for adversarial code reviews.

Why it matters: The agent is out of the IDE and into your actual workflow. That's a bigger shift than it sounds.

� Twitter� Hacker News

Launches

* Skymizer HTX301: A PCIe card packing 384GB of memory at 240W, purpose-built for running 700B models locally.

* vLLM-Omni v0.20.0: Adds multi-replica scaling and bumps throughput by 72% for voice-first pipelines on H20 GPUs.

* Petri 3.0: Anthropic's open-source RLHF alignment tool, now handed off to the community via Meridian Labs.

Closing thought: The gap between "chatting with AI" and "AI as an autonomous agent" is closing fast. Between specialized security models and research-level math solvers, we're moving into a world where AI doesn't just suggest code — it audits, defends, and discovers. That's a different category of tool entirely.