Tokens & Signals · Friday, April 10, 2026

Mythos Under Fire: Anthropic’s Banking Vulnerability Crisis

mythos, gpt-5.4, muse-spark, hermes-agent, claude-code, claude-opus, glm-5.1, gpt-5.2-codex, claude-opus-4-6-thinking-auto, gemini-3.1-pro, claude-3.5-sonnet, dmax, anthropic, openai, meta, metr, microsoft, national-university-of-singapore, cybersecurity, coding-agents, open-source, funding, multimodality, autonomous-agents, benchmarking, software-engineering, ai-policy, parallel-decoding, kimmonismus, garymarcus, alexandr-wang, teknium, steipete, karpathy
Tokens & Signals for 4/10/2026. We scanned ~1,200 Twitter accounts (1115 tweets), 13 subreddits (54 posts), Hacker News (8 stories), 4 newsletter posts, 3 podcast episodes, 353 Discord messages, and leaderboard data for you. Estimated reading time saved: ~12 hours.

TLDR & AI Twitter Recap

* Anthropic's new 10-trillion-parameter "Mythos" model is under government investigation after reports it can autonomously find critical vulnerabilities in banking systems. x.com/kimmonismus/status/2042697134915621220(ht...)

* OpenAI is lobbying for an Illinois bill that would shield AI companies from legal liability for AI-driven "mass deaths or financial disasters." The backlash is fierce and completely understandable. x.com/GaryMarcus/status/2042665018072957042(htt...)

* Meta's "Muse Spark" model just cracked the top five globally, landing 4th on the Text Arena and beating GPT-5.4 in early tests. Meta is back. x.com/alexandr_wang/status/2042360886195581330(...)

* The open-source Hermes Agent repo just hit 50,000 GitHub stars — a pretty clear signal of how fast decentralized coding agents are taking off. x.com/Teknium/status/2042698709293764985(https:...)

* @steipete on Claude Code: "Getting hit with a '10,000 tokens left' warning while my context window is still mostly empty is killing my long-running agents." x.com/steipete/status/2042615534567457102(https...)

* GLM 5.1 is now the #1 open-weights model on the Code Arena, matching Claude Opus performance at a third of the cost. That's a big deal. reddit.com/r/LocalLLaMA/comments/1shus54/glm_51...)

* New research from METR shows AI agents completing complex, multi-week software engineering projects at a 68% success rate. Not a demo — actual sustained work. x.com/METR_Evals/status/2042625712277066046(htt...)

* Venture capital is pouring into AI at a genuinely staggering pace — $300 billion into AI startups in Q1 2026 alone. x.com/a16z/status/2042690345088221245(https://x...)

* @karpathy on AI agents: "The gap between 'impressive demo' and 'reliable coworker' is closing faster than I expected." x.com/METR_Evals/status/2042625712277066046(htt...)

* @garymarcus on OpenAI's liability bill: "They're basically asking for immunity from the consequences of their products." x.com/GaryMarcus/status/2042665018072957042(htt...)

Best to Build With Today

* Coding: gpt-5.2-codex is still the gold standard for pure code generation.

* Reasoning: claude-opus-4-6-thinking-auto is the one to reach for when the logic gets heavy.

* Chat: gemini-3.1-pro leads the pack for general conversation.

* Open-source: GLM 5.1 is the current champion if you want top-tier coding without proprietary strings attached.

Deeper Dives

💼 Industry & Business

Anthropic's Mythos model sparks cyber-risk investigation

US officials have pulled major bank executives into conversations about the cybersecurity risks posed by Anthropic's 10-trillion-parameter "Mythos" model. The worry: its autonomous vulnerability discovery features could be weaponized for large-scale financial attacks.

Why it matters: The gap between frontier AI capabilities and our financial infrastructure's security is becoming regulators' number one headache.

Twitter · Hacker News

OpenAI backs legislation to limit liability for AI harms

OpenAI is pushing an Illinois bill that would give AI companies a "safe harbor" from legal liability for model-generated harms — as long as they follow safety reporting protocols. Critics say it leaves victims with no real recourse.

Why it matters: This is a major strategic shift in how labs are planning to manage legal exposure as their models get more powerful and more consequential.

Reddit · Hacker News

Venture capital investment hits $300B in Q1 2026

$300 billion across 6,000 startups — a 150% jump in deal volume in a single quarter. Regulatory headwinds and technical unknowns aren't slowing anyone down.

Why it matters: Whatever happens on the policy front, the market has clearly made its bet.

Twitter · Newsletter

🧠 Models & Research

METR research: AI agents complete weeks-long software tasks

METR's latest data shows AI agents handling complex, multi-week software projects at a 68% success rate. The secret sauce is automated "self-correction" loops that keep agents on track when they inevitably hit dependency issues.

Why it matters: We're crossing the line where agents can genuinely function as autonomous, long-horizon engineers — not just assistants.

Twitter

Meta's 'Muse Spark' ranks 4th in AI benchmarks

Meta's 400B-parameter "Muse Spark" has landed at #4 globally on the Text Arena, with a 12% boost in zero-shot coding tasks. Trained on 18 trillion tokens and built specifically for high-performance reasoning, it's a serious contender.

Why it matters: Meta is firmly back in the top-tier conversation.

Twitter · Reddit

GLM 5.1 outperforms Code Arena rivals

GLM 5.1 has jumped to #1 on the Code Arena for open-weight models. It reportedly matches Claude Opus performance at a third of the cost, making it a cost-effective alternative to the big proprietary players.

Why it matters: Open-weights are closing the gap with frontier models for specialized work faster than most people expected.

Reddit · Discord

Research: 'Thought Virus' exploits in agent networks

Researchers demonstrated a "thought virus" attack where an infected AI agent can spread subliminal instructions to other agents in a network, bypassing safety constraints along the way.

Why it matters: Multi-agent systems open up attack surfaces that current safety frameworks simply weren't designed for.

Reddit

Parallel decoding: NUS 'DMax' research

The National University of Singapore introduced "DMax," which uses aggressive parallel decoding to sharpen self-refinement and cut down error accumulation in diffusion language models.

Why it matters: If we want massive models to be useful in real-time, faster decoding isn't optional — it's essential.

Twitter · Reddit

🚀 Products & Launches

Claude for Word

Anthropic has officially launched 'Claude for Word,' letting Pro and Team subscribers call on Claude's capabilities directly inside Microsoft Word.

Why it matters: Anthropic is making a clear move into everyday productivity tools — and directly into Microsoft Copilot's territory.

Twitter

OpenAI Seattle hiring for real-time AI

OpenAI's Seattle lab is recruiting iOS engineers with WebRTC/AVFoundation/Core Audio experience for "next-generation human-AI interaction systems."

Why it matters: Real-time multimodal voice and video are shaping up to be the next major product battleground.

Twitter


Launches

* Claude for Word — A new integration for Team and Enterprise plans that lets users access Claude 3.5 Sonnet's tools directly inside Microsoft Word. claude.com/claude-for-word(https://claude.com/c...)


Closing thought: The line between "chatting with AI" and "AI as an autonomous employee" is getting blurry fast. With record VC funding and agents now grinding through weeks-long software projects, we've moved past "can it do this?" The question now is "how quickly can we scale it?" — and everyone seems to be racing to find out.