Tokens & Signals · Friday, March 13, 2026

GPT-5.4: OpenAI’s Agentic Pivot

Tokens & Signals for 3/13/2026. We scanned ~605 Twitter accounts, 13 subreddits (0 posts), Hacker News (9 stories), 10 newsletters, 10 podcasts, and leaderboard data for you. Estimated reading time saved: ~24 hours.

TLDR

  • Anthropic killed the long-context tax: Claude 4.6 now offers a 1M token window at standard pricing, no more premium multipliers. x.com/birdabo/status/2032516870864253441
  • OpenAI released GPT-5.4, which is currently dominating coding benchmarks and natively handles complex "computer use" tasks. x.com/OpenAIDevs/status/2032209975280533676
  • Tobi Lütke used Andrej Karpathy's "autoresearch" pattern to squeeze a 53% performance gain out of Shopify's core template engine. x.com/simonw/status/2032305546129485851
  • Elon Musk is personally reaching back out to candidates xAI previously rejected, admitting the company "was not built correctly the first time." x.com/elonmusk/status/2032341856944865487
  • Perplexity's "Computer" agent is now on iOS and baked into the Samsung Galaxy S26 as a system-level assistant. x.com/perplexity_ai/status/2032494752642568417
  • GPU rental prices are climbing again as compute hits a bottleneck — the buyer's market for startups is over. x.com/SemiAnalysis_/status/2032532352069472286
  • ChatGPT's retention is kind of insane: 71% of users are still active at Month 10, leaving every other consumer AI app in the dust. x.com/apoorv03/status/2032532956829593977

Best to Build With Today

  • Coding · GPT-5.4 (xhigh): Now leads both Artificial Analysis and LiveBench for agentic coding.
  • Reasoning · Claude Opus 4.6 (thinking-auto): Takes the #1 spot on LiveBench Reasoning (88.7) for deep, multi-step logic.
  • General Chat · Gemini 3.1 Pro Preview: Still the Chatbot Arena champion for overall helpfulness and creative writing.
  • Open-source · gpt-oss-20B: The current king of price-performance for local inference.

Deeper Dives

    🚀 Products & Launches

    Anthropic brings 1M context to Claude 4.6 at standard pricing

    Anthropic has dropped the "long-context premium" for Claude Opus 4.6 and Sonnet 4.6. The full 1 million token window now runs at standard API rates — $5/$25 per million tokens for Opus and $3/$15 for Sonnet — with zero usage multipliers.

    Why it matters: It knocks out the biggest economic barrier to doing serious long-context work, like refactoring an entire codebase in one shot.
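To make the pricing concrete, here is a minimal Python sketch of the per-request arithmetic. It assumes the quoted pairs are input/output prices in USD per million tokens, which this issue does not spell out. The model labels, the `request_cost` helper, and the 800K/20K token counts are hypothetical. The `multiplier` parameter only illustrates what a long-context surcharge would do; the size of the old multiplier is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_mtok: float   # USD per 1M input tokens (assumed meaning of the first figure)
    output_per_mtok: float  # USD per 1M output tokens (assumed meaning of the second figure)


# Rates as quoted in this issue. The keys are illustrative labels, not API model IDs.
RATES = {
    "opus-4.6": Rate(5.0, 25.0),
    "sonnet-4.6": Rate(3.0, 15.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 multiplier: float = 1.0) -> float:
    """Cost in USD of one request; multiplier=1.0 means no long-context surcharge."""
    rate = RATES[model]
    base = (input_tokens * rate.input_per_mtok
            + output_tokens * rate.output_per_mtok) / 1_000_000
    return base * multiplier


if __name__ == "__main__":
    # Hypothetical workload: a large codebase as the prompt, a short reply.
    for model in RATES:
        cost = request_cost(model, input_tokens=800_000, output_tokens=20_000)
        print(f"{model}: ${cost:.2f}")
```

Under those assumptions, an 800K-token prompt with a 20K-token reply comes to about $4.50 on Opus and $2.70 on Sonnet.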

    Source: Twitter

    OpenAI drops GPT-5.4 with a focus on coding

    GPT-5.4 is built for professional workflows, hitting a 75% success rate on the OSWorld-Verified benchmark for computer use. It is also more accurate than its predecessor, with 33% fewer false claims, though it tends to be wordier.

    Why it matters: It's a direct, high-performance shot across the bow at the specialized agent tools gaining ground in the coding space.

    Source: Twitter

    Perplexity Computer hits mobile

    Perplexity's "Computer" feature — which orchestrates 19 models in parallel for agentic tasks — is coming to iOS. They've also locked in a big partnership with Samsung, landing the app as a system-level assistant on the Galaxy S26 to replace Bixby.

    Why it matters: AI agents are escaping the browser and becoming actual operating system infrastructure.

    Source: Twitter

    🧠 Models & Research

    Shopify's 53% performance boost with 'autoresearch'

    Tobi Lütke took Andrej Karpathy's 'autoresearch' pattern — where agents iteratively code, test, and improve — and pointed it at Shopify's Liquid template engine. After 120 automated experiments and nearly 1,000 unit tests, they pulled out a 53% performance gain.

    Why it matters: This puts to rest the idea that coding agents are only good for toy projects. They can dig into production-grade, legacy code and actually ship improvements.
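The loop described above (propose a change, run the tests, benchmark, keep only what wins) can be sketched in a few lines of Python. This is a toy illustration, not Shopify's or Karpathy's actual setup: the candidates are hand-written stand-ins for what an agent would generate, the workload is arbitrary, and every name is hypothetical.

```python
import timeit
from typing import Callable, List, Tuple

Impl = Callable[[List[str]], str]


def baseline(parts: List[str]) -> str:
    """Reference implementation the loop tries to beat."""
    out = ""
    for p in parts:
        out += p
    return out


def candidate_join(parts: List[str]) -> str:
    """A correct rewrite that is usually faster."""
    return "".join(parts)


def candidate_broken(parts: List[str]) -> str:
    """A tempting but wrong rewrite: it drops the last element."""
    return "".join(parts[:-1])


def passes_tests(impl: Impl) -> bool:
    """Stand-in for the unit-test suite: compare against known outputs."""
    cases = [([], ""), (["a"], "a"), (["a", "b", "c"], "abc")]
    return all(impl(list(inp)) == expected for inp, expected in cases)


def benchmark(impl: Impl) -> float:
    """Best-of-5 wall-clock time for a fixed workload, in seconds."""
    data = ["x"] * 2_000
    return min(timeit.repeat(lambda: impl(data), number=50, repeat=5))


def autoresearch(current: Impl,
                 candidates: List[Impl]) -> Tuple[Impl, List[Tuple[str, str]]]:
    """Keep a candidate only if it passes the tests and beats the current best."""
    best, best_time = current, benchmark(current)
    log: List[Tuple[str, str]] = []
    for cand in candidates:
        if not passes_tests(cand):
            log.append((cand.__name__, "rejected: tests failed"))
            continue
        elapsed = benchmark(cand)
        if elapsed < best_time:
            best, best_time = cand, elapsed
            log.append((cand.__name__, "kept: faster"))
        else:
            log.append((cand.__name__, "discarded: not faster"))
    return best, log


if __name__ == "__main__":
    winner, history = autoresearch(baseline, [candidate_broken, candidate_join])
    print("winner:", winner.__name__)
    for name, verdict in history:
        print(f"  {name}: {verdict}")
```

The test gate runs before the benchmark on purpose: a candidate that is fast but wrong never gets timed, so it can never become the new baseline.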

    Source: Twitter

    Google researchers release Aletheia

    Google researchers introduced Aletheia, a model-agnostic framework designed to improve automated mathematical reasoning through iterative proof verification.

    Why it matters: It gives models a way to "self-correct" mid-reasoning on complex logical chains, which means less hand-holding from humans.

    Source: Twitter

    💼 Industry & Business

    xAI is hiring back its rejects

    Elon Musk and head of HR Baris Akis are going back through old xAI interview records and reaching out to high-potential candidates the company previously passed on. Musk admitted outright that the company wasn't built correctly the first time, and xAI is essentially resetting its entire talent strategy.

    Why it matters: It's a surprisingly candid admission of struggle from a lab that's supposed to be in full sprint mode.

    Sources: Twitter, Hacker News

    NVIDIA GPU rental prices are rising

    After a stretch of relatively affordable compute, rental prices for NVIDIA GPUs are creeping back up as demand outpaces supply again.

    Why it matters: A tighter compute market could pump the brakes on the pace of innovation for startups that depend on public cloud infrastructure.

    Source: Twitter

    Sakana AI secures Ministry of Defense contract

    Japanese firm Sakana AI has landed a research contract with Japan's Ministry of Defense.

    Why it matters: Another data point in the very clear trend of national defense agencies moving aggressively into domestic AI research partnerships.

    Source: Twitter


Launches

  • Claude 4.6: 1M context is now available at standard pricing for the Opus and Sonnet models.
  • GPT-5.4: OpenAI's newest frontier model, with native computer use.
  • Perplexity Computer Mobile: Agentic multi-model orchestration, now on iOS.

AI Twitter Recap

  • @simonw on the Shopify optimization: "It's wild to see an agent run 120 experiments on a massive production codebase and actually ship a 53% speedup." x.com/simonw/status/2032305546129485851
  • @elonmusk on xAI's talent reset: "We are reviewing every past candidate that didn't make the cut. If you were smart and we missed you, we're coming back around." x.com/elonmusk/status/2032341856944865487
  • @birdabo on Claude's new pricing: "Anthropic just effectively slashed prices for anyone building on long-context. No more multipliers, just standard rates." x.com/birdabo/status/2032516870864253441
  • @apoorv03 on retention: "ChatGPT's 71% retention at 10 months isn't just good, it's basically the gold standard for consumer AI apps." x.com/apoorv03/status/2032532956829593977
  • @cryptopunk7213 on GPU myths: "You don't always need the latest hardware; better models are making 3-year-old GPUs perform like new machines." x.com/cryptopunk7213/status/2032530622476918827

Closing thought: Today felt like the day the "wrapper" disappeared. Whether it's Anthropic making million-token contexts cheap enough to throw at entire codebases, or Perplexity living inside your phone's OS, the infrastructure is finally getting out of the way of the actual work.