Tokens & Signals · Friday, March 27, 2026

Claude Mythos: The Next Leap in Reasoning

Tags: claude-mythos, opus-4.6, codex, arc-agi-3, glm-5.1, claude-opus-4.5, hermes-agent, gpt-5.2-codex, claude-opus-4-6-thinking-auto, gemini-3.1-pro-preview, gpt-oss-20b, anthropic, openai, google, zhipu-ai, nous-research, perplexity, samsung, coding-agents, inference-optimization, model-benchmarking, open-weights, jailbreaking, data-centers, vibe-coding, reasoning, quantization, fchollet, teknium, amasad, karpathy
Tokens & Signals for 3/27/2026. We scanned ~1,200 Twitter accounts (1187 tweets), 13 subreddits (59 posts), Hacker News (9 stories), 4 newsletter posts, 4 podcast episodes, 282 Discord messages, and leaderboard data for you. Estimated reading time saved: ~12 hours.

TLDR & AI Twitter Recap

* Anthropic's "Claude Mythos" Leaked: Internal docs confirm a new tier beyond Opus 4.6, promising a "step-change" in reasoning and coding. Q3 2026 is the target. x.com/birdabo/status/2037390154038714384

* OpenAI's Codex Pivot: Codex (v0.117.0) now has plugin support for Slack, Notion, and Gmail — turning it into a full-blown agent-capable automation engine. developers.openai.com/codex/plugins

* Google's Efficiency Win: The TurboQuant algorithm just dropped, cutting LLM memory usage by 6x and pushing inference speeds 8x faster. x.com/jukan05/status/2037317421389037723

* ARC-AGI-3 Reality Check: Frontier models can't crack 30% on this contamination-proof visual reasoning benchmark. Pattern matching is not reasoning. x.com/fchollet/status/2037330635459842234

* GLM-5.1 Released: Zhipu AI's new open-weights powerhouse is hitting 77.8 on SWE-bench-Verified, putting it right in Claude Opus 4.5 territory. x.com/kimmonismus/status/2037507667732709392

* Google's $5B Bet: Google is financing a massive 1GW+ Texas data center built specifically to fuel Anthropic's training compute. x.com/ns123abc/status/2037607841867903062

* Pentagon Injunction: A federal judge blocked the DoD's "supply chain risk" label on Anthropic, keeping their path to government contracts wide open. x.com/adcock_brett/status/2037559392858722789

* GODMODE Jailbreak: Nous Research added an auto-jailbreaking skill to Hermes Agent. The safety arms race just escalated. x.com/Teknium/status/2037284871513768344

* Perplexity x Samsung: Perplexity is now the brains behind "Browsing Assist" on millions of Samsung devices. x.com/perplexity_ai/status/2037556796139921847

* @amasad on "vibe coding": "The divide between using AI for 100x efficiency and just generating 'coding slop' is the defining engineering debate of the moment." x.com/amasad/status/2037275196240052724

* @karpathy on AI dev: "Software is eating the world, and now English is eating software."


Go deeper on what matters to you


Best to Build With Today

* Coding: gpt-5.2-codex (83.6 LiveBench) is the current gold standard.

* Reasoning: claude-opus-4-6-thinking-auto leads with 88.7 on LiveBench.

* Chat: gemini-3.1-pro-preview is the top-ranked assistant for general conversation.

* Open-source: GLM-5.1 (Zhipu AI) is the new performance leader for coding tasks.

* Value pick: gpt-oss-20B ($0.10/M tokens) remains the smartest choice for high-volume, low-stakes workloads.
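To put the value pick in perspective, here's a back-of-the-envelope cost check. The $0.10/M-token rate for gpt-oss-20B comes from the bullet above; the 50M tokens/day volume is an illustrative assumption, not a quoted figure.

```python
def monthly_cost(tokens_per_day: float, price_per_million: float, days: int = 30) -> float:
    """Dollar cost for a given daily token volume at a per-million-token price."""
    return tokens_per_day * days * price_per_million / 1_000_000

# e.g. a pipeline pushing 50M tokens/day through gpt-oss-20B at $0.10/M:
print(f"${monthly_cost(50_000_000, 0.10):,.0f}/month")  # $150/month
```

At that rate, even heavy batch workloads stay in pocket-change territory, which is why it keeps winning the high-volume, low-stakes slot.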


Deeper Dives

🧠 Models & Research

Anthropic's "Claude Mythos" Leak

Anthropic accidentally exposed internal docs for "Claude Mythos," a new model tier built for multi-step logical planning. It's slated to succeed Opus 4.6 with serious gains in cybersecurity and reasoning. Anthropic confirmed it's in testing but is sitting on the benchmarks for now.

Sources: Twitter, Reddit

Google TurboQuant Algorithm

Google's new quantization algorithm cuts KV cache memory by 6x. Dynamic quantization — computing scales on the fly per tensor rather than from a fixed calibration pass — does the heavy lifting, pushing LLM inference speeds up by 8x. Big news for anyone trying to run large models on hardware that isn't a data center.
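To make "dynamic quantization" concrete, here's a minimal sketch of per-row int8 quantization of a KV cache block. This is not Google's TurboQuant algorithm (which is unpublished here, and the reported 6x presumably involves sub-8-bit formats — plain int8 only gives 4x over fp32); it just illustrates the general technique of storing cache entries in low precision with a per-row scale recovered at read time.

```python
import numpy as np

def quantize(x: np.ndarray):
    """Quantize each row of x to int8, computing a scale per row on the fly."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)          # guard against all-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and per-row scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 64)).astype(np.float32)  # toy KV cache block
q, s = quantize(kv)
err = np.abs(dequantize(q, s) - kv).max()
print(f"bytes: {kv.nbytes} -> {q.nbytes}")            # bytes: 1024 -> 256
```

The error per element is bounded by half a quantization step, which is why attention outputs barely move even at 4x compression.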

Sources: Reddit, Hacker News

ARC-AGI-3 Benchmark Results

Francois Chollet's new benchmark throws abstract grid-world puzzles at models to test reasoning without any language pattern crutches. Frontier models are stuck below 30%, which is a pretty damning reminder that today's "AI" is still sophisticated pattern matching — nothing more.
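A toy ARC-style task makes "abstract grid reasoning" concrete. Real ARC-AGI-3 tasks are interactive and far harder; this static example is invented for illustration and only shows the shape of the problem: induce a transformation from a couple of example pairs, then apply it to a held-out input, with no linguistic cues to pattern-match against.

```python
# Two training pairs plus one held-out input, ARC-style (grids of color codes).
train = [
    {"in": [[1, 0], [0, 1]], "out": [[2, 0], [0, 2]]},
    {"in": [[0, 1], [1, 1]], "out": [[0, 2], [2, 2]]},
]
test_in = [[1, 1], [0, 1]]

# A human spots the rule instantly (recolor 1 -> 2); a model must induce it
# from just two examples. Here we hard-code the rule to show what "solving"
# the task means.
def apply_rule(grid):
    return [[2 if cell == 1 else cell for cell in row] for row in grid]

assert all(apply_rule(t["in"]) == t["out"] for t in train)
print(apply_rule(test_in))  # [[2, 2], [0, 2]]
```

Sub-30% scores on tasks of this flavor are exactly Chollet's point: models that ace language benchmarks still struggle when the pattern has to be induced fresh rather than retrieved.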

Sources: Twitter, Reddit, Hacker News

Zhipu AI's GLM-5.1

Zhipu AI shipped GLM-5.1, an open-weights model claiming coding parity with Claude Opus 4.5. It scored 77.8 on SWE-bench-Verified, and honestly, it's hard to argue with the numbers — open-source is closing the gap fast.

Sources: Twitter, Reddit

🚀 Products & Launches

OpenAI Codex Plugin Support

Codex now plays nicely with Slack, Figma, and Notion via plugins. That upgrade takes it from a smart autocomplete tool to an agent that can handle full end-to-end workflows. Enterprise beta is live now.

Sources: Twitter

Perplexity for Samsung

Perplexity just became the default AI engine powering Samsung's "Browsing Assist." That's a massive distribution win — their search-based AI is now in front of millions of Galaxy and Windows users out of the box.

Sources: Twitter

💼 Industry & Business

Google Finances $5B Texas Data Center for Anthropic

Google is putting $5 billion into a 1GW+ data center in Abilene, Texas. The facility gets leased to Anthropic, handing them dedicated training capacity without the upfront pain of buying all that hardware themselves.

Sources: Twitter

Court Halts Pentagon's Anthropic Blacklist

A federal judge shut down the Pentagon's attempt to label Anthropic a "supply chain risk," citing a lack of transparency in the process. Anthropic can keep competing for government contracts — for now.

Sources: Twitter, Reddit


Funding & Deals

* Google & Anthropic: $5B infrastructure financing package for a dedicated 1GW+ data center in Abilene, Texas, to scale future training efforts.


Launches

* GLM-5.1: Zhipu AI's high-performance open-weights coding model.

* Codex Plugins (v0.117.0): Native integration for Slack, Figma, Notion, and Gmail.

* TurboQuant: Google's new memory compression algorithm for LLMs.


Closing thought: Between the ARC-AGI-3 results and the "vibe coding" debates, it feels like we're finally moving past the marketing hype and actually starting to measure what these models can really do — for better or worse.