Tokens & Signals

Tokens & Signals for 3/13/2026. We scanned ~605 Twitter accounts, 13 subreddits (0 posts), Hacker News (9 stories), 10 newsletters, 10 podcasts, and leaderboard data for you. Estimated reading time saved: ~24 hours.

TLDR

Anthropic killed the long-context tax: Claude Opus 4.6 and Sonnet 4.6 now offer the full 1M-token window at standard pricing, with no premium multipliers. x.com/birdabo/status/2032516870864253441

OpenAI released GPT-5.4, which is currently dominating coding benchmarks and natively handles complex "computer use" tasks. x.com/OpenAIDevs/status/2032209975280533676

Tobi Lütke used Andrej Karpathy's "autoresearch" pattern to squeeze a 53% performance gain out of Shopify's core template engine. x.com/simonw/status/2032305546129485851

Elon Musk is personally reaching back out to xAI candidates the company previously rejected, admitting xAI "was not built correctly the first time." x.com/elonmusk/status/2032341856944865487

Perplexity's "Computer" agent is now on iOS and baked into the Samsung Galaxy S26 as a system-level assistant. x.com/perplexity_ai/status/2032494752642568417

GPU rental prices are climbing again as demand outpaces supply; the buyer's market for startups looks to be over. x.com/SemiAnalysis_/status/2032532352069472286

ChatGPT's retention is kind of insane: 71% of users are still active at Month 10, leaving every other consumer AI app in the dust. x.com/apoorv03/status/2032532956829593977

Best to Build With Today

Coding — GPT-5.4 (xhigh): Now leads both Artificial Analysis and LiveBench for agentic coding.

Reasoning — Claude Opus 4.6 (thinking-auto): Takes the #1 spot on LiveBench Reasoning (88.7) for deep, multi-step logic.

General Chat — Gemini 3.1 Pro Preview: Still the Chatbot Arena champion for overall helpfulness and creative writing.

Open-source — gpt-oss-20B: The current king of price-performance for local inference.

Deeper Dives

🚀 Products & Launches

Anthropic brings 1M context to Claude 4.6 at standard pricing

Anthropic has dropped the "long-context premium" for Claude Opus 4.6 and Sonnet 4.6. The full 1-million-token window now runs at standard API rates ($5/$25 per million input/output tokens for Opus, $3/$15 for Sonnet) with no usage multipliers.

Why it matters: It knocks out the biggest economic barrier to doing serious long-context work, like refactoring an entire codebase in one shot.
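To make the flat pricing concrete, here is a small cost calculator using the rates quoted above. It is a sketch under stated assumptions: the dictionary keys are hypothetical labels rather than official API model IDs, the 900K-in/20K-out request is an invented example, and it ignores anything this issue doesn't mention (caching, batch discounts, and so on).

```python
from decimal import Decimal

# Per-million-token USD rates (input, output) as quoted in this issue.
# The keys are hypothetical labels, not official API model IDs.
RATES = {
    "opus-4.6": (Decimal("5"), Decimal("25")),
    "sonnet-4.6": (Decimal("3"), Decimal("15")),
}
MILLION = Decimal(1_000_000)
CENT = Decimal("0.01")


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Flat-rate cost of one request in USD: same rate at every context size."""
    in_rate, out_rate = RATES[model]
    total = (input_tokens * in_rate + output_tokens * out_rate) / MILLION
    return total.quantize(CENT)  # round to whole cents for display


if __name__ == "__main__":
    # Hypothetical "whole codebase" request: 900K tokens in, 20K tokens out.
    for model in RATES:
        # prints "opus-4.6 5.00" then "sonnet-4.6 3.00"
        print(model, request_cost(model, 900_000, 20_000))
```

With no multiplier, cost grows linearly with token count, so a near-1M-token request is simply 0.9x the per-million input rate plus the output charge.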

Source: Twitter

OpenAI drops GPT-5.4 with a focus on coding

GPT-5.4 is built for professional workflows and hits a 75% success rate on the OSWorld-Verified computer-use benchmark. It's also more accurate than its predecessor, making 33% fewer false claims, though it tends to be a bit wordier.

Why it matters: It's a direct, high-performance shot across the bow of the specialized agent tools gaining ground in the coding space.

Source: Twitter

Perplexity Computer hits mobile

Perplexity's "Computer" feature, which orchestrates 19 models in parallel for agentic tasks, is coming to iOS. Perplexity has also locked in a big partnership with Samsung that lands the app on the Galaxy S26 as a system-level assistant, replacing Bixby.

Why it matters: AI agents are escaping the browser and becoming actual operating system infrastructure.
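One way to picture running many models "in parallel" is a concurrent fan-out: send the same request to every model at once and keep whatever comes back before a deadline. This is a generic asyncio sketch, not Perplexity's implementation; we don't know how Computer actually splits or routes work. The stub models, their latencies, and the timeout are all hypothetical.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical model interface: takes a prompt, returns text asynchronously.
ModelFn = Callable[[str], Awaitable[str]]


def make_stub_model(name: str, latency_s: float) -> ModelFn:
    """Build a fake model that waits latency_s seconds, then answers."""
    async def call(prompt: str) -> str:
        await asyncio.sleep(latency_s)  # simulated network/inference time
        return f"{name}: handled '{prompt}'"
    return call


async def fan_out(prompt: str, models: dict[str, ModelFn], timeout_s: float) -> dict[str, str]:
    """Send one prompt to every model concurrently; drop any that time out or fail."""
    names = list(models)
    tasks = [asyncio.wait_for(models[n](prompt), timeout_s) for n in names]
    # return_exceptions=True keeps one slow or failing model from sinking the batch
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return {n: r for n, r in zip(names, results) if not isinstance(r, BaseException)}


if __name__ == "__main__":
    # 19 hypothetical stub models with small, varied latencies.
    models = {f"model-{i}": make_stub_model(f"model-{i}", 0.01 * (i % 5)) for i in range(19)}
    answers = asyncio.run(fan_out("summarize this page", models, timeout_s=1.0))
    print(f"{len(answers)} of {len(models)} models answered")
```

Total wall-clock time is roughly that of the slowest model that finishes, not the sum of all 19, which is the main reason to fan out concurrently.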

Source: Twitter

🧠 Models & Research

Shopify's 53% performance boost with 'autoresearch'

Tobi Lütke took Andrej Karpathy's 'autoresearch' pattern, in which agents iteratively code, test, and improve, and pointed it at Shopify's Liquid template engine. After 120 automated experiments and nearly 1,000 unit tests, the result was a 53% performance gain.

Why it matters: This undercuts the idea that coding agents are only good for toy projects. They can dig into production-grade legacy code and actually ship improvements.
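The propose/test/measure cycle described above can be sketched as a greedy hill-climbing loop. This is a minimal illustration under loud assumptions: `propose`, `passes_tests`, and `benchmark` are hypothetical stand-ins (a real run would have an agent edit source code, run the project's unit-test suite, and time a benchmark), and the toy "buffer size" problem is invented. It is not Karpathy's or Shopify's actual code.

```python
from __future__ import annotations

import random
from typing import Callable, TypeVar

T = TypeVar("T")


def autoresearch_loop(
    baseline: T,
    propose: Callable[[T, random.Random], T],
    passes_tests: Callable[[T], bool],
    benchmark: Callable[[T], float],
    experiments: int = 120,
    seed: int = 0,
) -> tuple[T, float]:
    """Greedy loop: keep a candidate only if it passes tests AND measures faster."""
    rng = random.Random(seed)
    best, best_cost = baseline, benchmark(baseline)
    for _ in range(experiments):
        candidate = propose(best, rng)  # stand-in for an agent editing the code
        if not passes_tests(candidate):  # correctness gate: reject anything that breaks tests
            continue
        cost = benchmark(candidate)  # lower is better, e.g. seconds per render
        if cost < best_cost:  # only measured improvements survive
            best, best_cost = candidate, cost
    return best, best_cost


# Toy problem (entirely hypothetical): tune a buffer size instead of editing real code.
def propose(size: int, rng: random.Random) -> int:
    return size + rng.choice([-64, -8, 8, 64])


def passes_tests(size: int) -> bool:
    return 8 <= size <= 4096 and size % 8 == 0


def benchmark(size: int) -> float:
    return 1.0 + (size - 512) ** 2 / 10_000  # synthetic cost curve, minimum at 512


if __name__ == "__main__":
    best, cost = autoresearch_loop(64, propose, passes_tests, benchmark)
    print(f"best size={best}, cost={cost:.3f} (baseline {benchmark(64):.3f})")
```

The key design choice is the ordering: the test gate runs before the benchmark, so a faster but incorrect change can never be kept.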

Source: Twitter

Google researchers release Aletheia

Google researchers introduced Aletheia, a model-agnostic framework designed to improve automated mathematical reasoning through iterative proof verification.

Why it matters: It gives models a way to "self-correct" mid-reasoning on complex logical chains, which means less hand-holding from humans.

Source: Twitter

💼 Industry & Business

xAI is going back for its rejects

Elon Musk and head of HR Baris Akis are going back through old xAI interview records and reaching out to high-potential candidates they previously passed on. Musk straight-up admitted the company wasn't built correctly the first time and is essentially resetting its entire talent strategy.

Why it matters: It's a surprisingly candid admission of struggle from a lab that's supposed to be in full sprint mode.

Sources: Twitter, Hacker News

NVIDIA GPU rental prices are rising

After a stretch of relatively affordable compute, rental prices for NVIDIA GPUs are creeping back up as demand outpaces supply again.

Why it matters: A tighter compute market could pump the brakes on the pace of innovation for startups that depend on public cloud infrastructure.

Source: Twitter

Sakana AI secures Ministry of Defense contract

Japanese firm Sakana AI has landed a research contract with Japan's Ministry of Defense.

Why it matters: Another data point in the very clear trend of national defense agencies moving aggressively into domestic AI research partnerships.

Source: Twitter

Launches

Claude 4.6: 1M context now at standard pricing for Opus and Sonnet models.

GPT-5.4: OpenAI's newest frontier model, with native computer use.

Perplexity Computer Mobile: Agentic multi-model orchestration, now on iOS.

AI Twitter Recap

@simonw on the Shopify optimization: "It's wild to see an agent run 120 experiments on a massive production codebase and actually ship a 53% speedup." x.com/simonw/status/2032305546129485851

@elonmusk on xAI's talent reset: "We are reviewing every past candidate that didn't make the cut. If you were smart and we missed you, we're coming back around." x.com/elonmusk/status/2032341856944865487

@birdabo on Claude's new pricing: "Anthropic just effectively slashed prices for anyone building on long-context. No more multipliers, just standard rates." x.com/birdabo/status/2032516870864253441

@apoorv03 on retention: "ChatGPT's 71% retention at 10 months isn't just good, it's basically the gold standard for consumer AI apps." x.com/apoorv03/status/2032532956829593977

@cryptopunk7213 on GPU myths: "You don't always need the latest hardware; better models are making 3-year-old GPUs perform like new machines." x.com/cryptopunk7213/status/2032530622476918827

Closing thought: Today felt like the day the "wrapper" disappeared. Whether it's Anthropic making million-token contexts cheap enough to throw at entire codebases, or Perplexity living inside your phone's OS, the infrastructure is finally getting out of the way of the actual work.