Tokens & Signals

Tokens & Signals for 5/4/2026. We scanned ~1,200 Twitter accounts (1,665 tweets), 13 subreddits (83 posts), Hacker News (10 stories), 6 newsletter posts, 9 podcast episodes, 252 Discord messages, and leaderboard data for you. Estimated reading time saved: ~16 hours.

TLDR & AI Twitter Recap

* OpenAI's o1 just out-diagnosed ER doctors. A Harvard trial clocked it at 88% diagnostic accuracy on simulated cases versus 76% for board-certified ER physicians. That's not a small gap. x.com/garrytan/status/2051308816172810707

* Sierra is a $15B beast. They just raised $950M to scale their autonomous enterprise agents. The market is placing enormous bets that agents will swallow basic business workflows whole. x.com/btaylor/status/2051313954312331411

* The White House wants to vet AI models before they ship. A new executive order is reportedly in the works that would mandate security checks for any model over a certain compute threshold. reddit.com/r/LocalLLaMA/comments/1t3ro1w/white_...

* Anthropic is moving onto Wall Street. They're reportedly finalizing custom Claude licensing deals with Goldman Sachs and Morgan Stanley for financial compliance and data synthesis. x.com/rohanpaul_ai/status/2051235408869302294

* IBM's MAMMAL model is a biology powerhouse. It's beating AlphaFold 3 on 9 of 11 biological benchmarks by unifying protein and genetic data under one roof. reddit.com/r/singularity/comments/1t3e91i/ibm_r...

* @karpathy on AI in medicine: "The gap between 'impressive demo' and 'deployed in a hospital' just got a lot smaller."

* NVIDIA's OpenShell is a firewall for your agents. It sandboxes them, enforces security policies, and watches for data exfiltration at the system level. x.com/NVIDIAAI/status/2050336285428998202

* Nous Research's Hermes Agent v0.12.0 now has a Kanban UI, so you can actually watch your multi-agent workflows run instead of just hoping they do. x.com/Teknium/status/2051001156005151226

* Meta's 'Autodata' is flipping data science on its head. It automates the messy data prep work, cutting pipeline dev time by 65% in benchmarks. x.com/dair_ai/status/2051311905353142328

* @GergelyOrosz on "DeepClaude": "Connecting high-performance models to agent loops is becoming the standard move for cost-optimized dev workflows." news.ycombinator.com/item?id=48010266

Best to Build With Today

* Coding — claude-opus-4-6-thinking-auto is currently the king for reasoning-heavy coding tasks.

* Reasoning — gpt-5.5-xhigh sits at the top for high-end math and reasoning benchmarks.

* Chat — gemini-3.1-pro is leading the Chatbot Arena Elo rankings.

* Open-source — Qwen-3.6-27B is the community favorite for local agentic workflows, and it punches well above its weight.

Deeper Dives

🧠 Models & Research

OpenAI's o1 Outperforms ER Doctors

In a Harvard Medical School trial, o1 hit 88% diagnostic accuracy while 50 board-certified ER physicians averaged 76%. The model was tested across 100 simulated high-pressure scenarios, and it excelled at weaving together complex patient histories with acute symptoms on the fly.
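Flipping the two reported accuracy figures into error rates shows why the gap is bigger than "12 points" sounds:

```latex
\begin{aligned}
\text{o1 error rate} &= 1 - 0.88 = 0.12 \\
\text{ER physician error rate} &= 1 - 0.76 = 0.24 \\
\frac{0.12}{0.24} &= 0.5
\end{aligned}
```

On these simulated cases, o1 made roughly half as many diagnostic errors as the physicians did on average.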

Why it matters: This is concrete evidence that AI can handle high-stakes clinical decision support, at least on simulated cases.

Sources: Twitter, Hacker News

IBM Research Introduces MAMMAL

MAMMAL is a foundation model trained on unified biological data — genomics, chemical structures, the works. It outperformed AlphaFold 3 on 9 of 11 biological benchmarks, including MoleculeNet and UniProt.

Why it matters: We're moving toward AI that understands biology across domains, not just one narrow slice of it.

Source: Reddit

Meta's 'Autodata' Automates Data Science

Meta FAIR's new agentic framework handles feature engineering and data cleaning, cutting pipeline development time by 65%. It manages the full data science lifecycle with minimal hand-holding.

Why it matters: Data prep is the biggest bottleneck for scaling next-gen models — automating it is a real unlock.

Source: Twitter

Qwen 3.6 27B for Local Coding Agents

Community benchmarks suggest Qwen 3.6 27B is the first sub-30B model that can genuinely go toe-to-toe with enterprise coding agents on complex tasks — and it runs on consumer hardware.
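Whether a 27B model fits on consumer hardware mostly comes down to how much memory its weights need. A back-of-the-envelope sketch, assuming a flat 27 billion parameters (taken from the "27B" in the name) and ignoring KV cache, activations, and runtime overhead, all of which add to the real footprint:

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory in GB (1 GB = 1e9 bytes) needed just to store the model weights."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: a flat 27e9 parameters, taken from the "27B" in the model name.
PARAMS = 27e9

for bits in (16, 8, 4):
    # Excludes KV cache, activations, and runtime overhead, which add to the real footprint.
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

At 4-bit quantization the weights come to about 13.5 GB, versus about 54 GB at 16-bit, which is roughly why quantized builds are the usual route to running a model this size on a single consumer GPU.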

Why it matters: If those benchmarks hold up, you no longer need a cloud subscription to run production-capable coding agents.

Source: Reddit

💼 Industry & Business

Sierra Raises $950M at $15B Valuation

Sierra, an enterprise AI agent platform, pulled in $950 million to scale its infrastructure. They've already hit $150 million in ARR, which makes a pretty strong case that the shift from "chatbots" to "autonomous agents" is where the real money lives.
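The two reported figures put the round's pricing in perspective:

```latex
\frac{\$15 \times 10^{9}\ \text{valuation}}{\$150 \times 10^{6}\ \text{ARR}} = 100
```

A valuation of about 100 times current ARR suggests investors are paying mostly for expected growth, not today's revenue.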

Why it matters: This is one of the biggest bets yet that AI will automate knowledge work at scale.

Sources: Hacker News, Twitter

White House Considers Pre-Release AI Vetting

The administration is reportedly drafting an executive order to mandate security testing for AI models that cross specific compute thresholds. The vetting would be handled by a new bureau inside the Department of Commerce.
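The reporting doesn't say how "compute" would be measured. Total training FLOPs is one common proxy, and a widely used rule of thumb for dense transformers estimates it as about 6 × N × D, where N is the parameter count and D is the number of training tokens. A minimal sketch of a threshold check under that assumption; the threshold value and model configurations below are hypothetical placeholders, not figures from the draft order:

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer: about 6 * N * D FLOPs."""
    return 6 * num_params * num_tokens


def crosses_threshold(num_params: float, num_tokens: float, threshold_flops: float) -> bool:
    """Hypothetical policy check: does estimated training compute meet or exceed the threshold?"""
    return training_flops(num_params, num_tokens) >= threshold_flops


# Placeholder value for illustration only; the draft order's actual threshold is not reported here.
HYPOTHETICAL_THRESHOLD_FLOPS = 1e26

# Hypothetical model configurations: (parameters, training tokens).
for params, tokens in [(70e9, 15e12), (1e12, 20e12)]:
    flops = training_flops(params, tokens)
    flagged = crosses_threshold(params, tokens, HYPOTHETICAL_THRESHOLD_FLOPS)
    print(f"{params:.0e} params x {tokens:.0e} tokens -> {flops:.1e} FLOPs, vetting: {flagged}")
```

Because the estimate scales with both parameters and training tokens, a smaller model trained on far more data can still cross the same line.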

Why it matters: Government-mandated checkpoints could seriously slow down how fast labs ship powerful new models.

Source: Reddit

Anthropic Partnering with Wall Street

Anthropic is in advanced talks with Goldman Sachs and Morgan Stanley to bring Claude 3.5 into financial compliance and data synthesis workflows. These look like multi-year, custom licensing deals built around high-stakes analysis.

Why it matters: Enterprise AI revenue is moving toward specialized, heavily regulated financial use cases.

Source: Twitter

🚀 Products & Launches

NVIDIA OpenShell

OpenShell is an open-source security sandbox that gives enterprises real control over what their AI agents can actually touch — and shuts down data leaks at the system level before they happen.
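The announcement doesn't describe OpenShell's policy format. As a rough illustration of the general idea, here is a minimal deny-by-default policy sketch in Python; the class, method names, hosts, and paths are all hypothetical and are not OpenShell's API:

```python
from dataclasses import dataclass, field
from pathlib import Path
from urllib.parse import urlparse


@dataclass
class AgentPolicy:
    """Hypothetical deny-by-default sandbox policy (illustrative only, not OpenShell's API)."""

    allowed_hosts: set[str] = field(default_factory=set)  # network egress allowlist
    allowed_dirs: tuple[str, ...] = ()  # filesystem read allowlist

    def can_send(self, url: str) -> bool:
        # Outbound requests are allowed only to exact hostnames on the allowlist,
        # so an agent cannot quietly ship data to an unlisted endpoint.
        host = urlparse(url).hostname
        return host is not None and host in self.allowed_hosts

    def can_read(self, path: str) -> bool:
        # Resolve symlinks and ".." first so tricks like "project/../secrets" are caught,
        # then require the target to sit inside an allowed directory (Python 3.9+).
        target = Path(path).resolve()
        return any(target.is_relative_to(Path(d).resolve()) for d in self.allowed_dirs)


policy = AgentPolicy(
    allowed_hosts={"api.example.com"},
    allowed_dirs=("/workspace/project",),
)
checks = [
    ("send", "https://api.example.com/v1/chat"),
    ("send", "https://exfil.example.net/upload"),
    ("read", "/workspace/project/src/main.py"),
    ("read", "/workspace/project/../secrets.env"),
]
for kind, target in checks:
    ok = policy.can_send(target) if kind == "send" else policy.can_read(target)
    print(f"{kind:4} {target}: {'ALLOW' if ok else 'DENY'}")
```

Deny-by-default is the design choice that matters: an agent that gets tricked into calling an unknown URL or reading outside its workspace is blocked rather than trusted. A system-level sandbox enforces rules like these outside the agent's own process; this sketch only shows the policy logic.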

Why it matters: Agent security is currently the #1 blocker for enterprise AI adoption.

Source: Twitter

Hermes Agent v0.12.0

Nous Research added a Kanban interface so you can see exactly what's happening across your multi-agent workflows. Complex agent handoffs are no longer a black box.

Why it matters: We finally have a UI to manage the chaos of agentic systems.

Sources: Twitter, Discord

Funding & Deals

* Sierra raised $950 million from Dragoneer, Benchmark, and Thrive Capital at a $15 billion valuation to scale its enterprise AI agent infrastructure.

Launches

* Claude Keyless Authentication — Anthropic now supports OIDC for secure, identity-based API access across AWS, GCP, and Azure, with no static API keys required.

* llama.cpp MTP Beta — New multi-token prediction support means faster local inference without needing extra hardware.

* Unity AI Beta — Unity opened up its AI agent beta, giving creators built-in tools for engine workflows.
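The llama.cpp MTP announcement doesn't detail the implementation. In general, MTP-style speedups work like speculative decoding: extra prediction heads draft several tokens ahead, the full model verifies them, and the longest agreeing prefix is kept, so no separate draft model is needed. Below is a toy, greedy-only sketch of that draft-and-verify loop; the function names and the counting "models" are hypothetical stand-ins, not llama.cpp code:

```python
from typing import Callable, List

Token = int
# A "model" here is any function mapping a context to its greedy next token.
Model = Callable[[List[Token]], Token]


def draft_and_verify(context: List[Token], main: Model, draft: Model, k: int) -> List[Token]:
    """One greedy speculative step: draft k tokens cheaply, keep what the main model agrees with."""
    # 1. The cheap drafter (standing in for MTP heads) proposes k tokens ahead.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # 2. The main model checks each drafted position. A real engine scores all k
    #    positions in a single batched forward pass, which is where the speedup comes from;
    #    this toy version calls it once per position for clarity.
    accepted: List[Token] = []
    for tok in proposal:
        expected = main(context + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the main model's token and stop
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the verification pass also yields one extra token.
    accepted.append(main(context + accepted))
    return accepted


def generate(prompt: List[Token], main: Model, draft: Model, k: int, n: int) -> List[Token]:
    out = list(prompt)
    while len(out) - len(prompt) < n:
        out.extend(draft_and_verify(out, main, draft, k))
    return out[len(prompt):len(prompt) + n]


def greedy(prompt: List[Token], main: Model, n: int) -> List[Token]:
    out = list(prompt)
    for _ in range(n):
        out.append(main(out))
    return out[len(prompt):]


# Toy stand-ins: the main model counts up mod 10; the drafter is usually right
# but guesses wrong after a 5, forcing a rejection.
def toy_main(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


print(generate([0], toy_main, toy_draft, k=3, n=12))
print(greedy([0], toy_main, n=12))  # identical: verification never changes the output
```

The key property is that greedy draft-and-verify produces exactly what plain greedy decoding would; drafts only change how many tokens each expensive pass can commit.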

Closing thought: The Harvard o1 trial and the White House vetting policy are really two sides of the same coin. AI is getting good enough that governments want to pump the brakes while hospitals want to hit the gas. Whether that dynamic excites you or keeps you up at night, it's happening either way.
