Tokens & Signals · Monday, June 8, 2026

Apple Intelligence: The Gemini-Powered Siri Arrives

Tokens & Signals for 6/8/2026. We scanned ~1,200 Twitter accounts (1,472 tweets), 13 subreddits (75 posts), Hacker News (10 stories), 6 newsletter posts, 9 podcast episodes, 221 Discord messages, and leaderboard data for you. Estimated reading time saved: ~15 hours.

TLDR & AI Twitter Recap

* Apple Intelligence is official: WWDC 2026 debuted a rebuilt Siri powered by a custom 1.2-trillion-parameter Google Gemini model, though the EU launch is blocked by DMA conflicts. x.com/kimmonismus/status/2063583457612337399

* Xiaomi's speed test: The MiMo team claims over 1,000 tokens per second on a 1-trillion-parameter MoE model using only 8 GPUs. news.ycombinator.com/item?id=48446639

* Safety summit: CEOs from OpenAI, Anthropic, and Microsoft briefed Congress on the risks of AI being used to design bioweapons, pushing for federal pre-deployment testing mandates. reddit.com/r/OpenAI/comments/1typovl/ai_ceos_fr...

* Agent loops vs. prompts: We're moving away from single-shot prompting toward persistent agent loops that handle their own discovery, execution, and verification. x.com/shannholmberg/status/2063924108535197842

* @karpathy on the shift to loops: "The prompt is dead. Long live the loop."

* @Teknium on Hermes Agent: "Community contribution is the new product moat. Watching the Hermes ecosystem overtake legacy tools is a testament to open-source agent harnesses." x.com/Teknium/status/2064060667636908530

* Gemma 4 MTP landed: It's now in llama.cpp with support for multi-token prediction, giving us 15% faster local inference on consumer hardware. github.com/ggml-org/llama.cpp/pull/23398

* NotebookLM's new muscles: Google's research tool now supports agentic reasoning, web search, and Excel exports for AI Ultra subscribers. x.com/NotebookLM/status/2064016460964585549

* Gabriel Almon's exit: He just left OpenAI to start a new venture, citing "one last product to build before AGI." x.com/gabriel1/status/2063698980324737381


Best to Build With Today

* Coding: claude-opus-4-6-thinking-auto is currently the top choice for complex, multi-file software engineering.

* Reasoning: claude-opus-4-8-xhigh-effort is leading the pack in logic and reasoning benchmarks.

* Chat: gemini-3.1-pro remains the most consistent assistant for general-purpose tasks.

* Image generation: Ideogram 4 is the community favorite for character consistency and IP knowledge without needing extra LoRAs.

* Open-source: gemma-4-mtp via llama.cpp is the go-to for high-performance local inference.


Deeper Dives

🚀 Products & Launches

Apple Intelligence Unveiled

Apple has launched "Apple Intelligence," a ground-up rebuild of Siri baked into iOS 20 and macOS 17. It runs on a custom 1.2-trillion-parameter Gemini model licensed from Google, with a three-tier architecture: on-device for basic tasks, Private Cloud Compute for the middle ground, and Google's servers for heavy lifting. Privacy is still the headline, with stateless processing throughout.
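For a feel of what a three-tier dispatch like this could look like, here is a minimal sketch. The tier names follow the description above; the `Request` fields, the complexity score, and both thresholds are hypothetical illustrations, not Apple's actual routing logic.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on-device"
    PRIVATE_CLOUD = "Private Cloud Compute"
    PARTNER_CLOUD = "Google servers"


@dataclass(frozen=True)
class Request:
    prompt: str
    complexity: float  # hypothetical score in [0, 1]; higher means harder


# Hypothetical cut-offs, chosen only for illustration.
ON_DEVICE_MAX = 0.3
PRIVATE_CLOUD_MAX = 0.7


def route(request: Request) -> Tier:
    """Pick the smallest tier whose threshold covers the request.

    The function keeps no state between calls, loosely mirroring the
    "stateless processing" idea: every decision depends only on the request.
    """
    if not 0.0 <= request.complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    if request.complexity <= ON_DEVICE_MAX:
        return Tier.ON_DEVICE
    if request.complexity <= PRIVATE_CLOUD_MAX:
        return Tier.PRIVATE_CLOUD
    return Tier.PARTNER_CLOUD
```

In this toy version, `route(Request("set a timer", 0.1))` stays on-device, while a request scored 0.9 escalates to the partner cloud.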

Why it matters: This is Apple's biggest AI swing in 15 years, and licensing the core model from Google shows how hard frontier models are to build from scratch, even for Apple.

Sources: Twitter, Hacker News

Google NotebookLM Agentic Upgrades

NotebookLM just got a serious upgrade, gaining the ability to execute code, search the web, and spit out structured documents like Excel sheets. It now shows its thinking steps and can handle complex research tasks end to end.

Why it matters: It's no longer just a glorified summarizer. NotebookLM can now go out, do the research, and build knowledge repositories on its own — which is a pretty different product.

Sources: Twitter

🧠 Models & Research

Xiaomi MiMo UltraSpeed Benchmark

Xiaomi's MiMo team says it clocked a 1-trillion-parameter Mixture-of-Experts model at 1,000+ tokens per second on plain 8-GPU servers. The trick: FP4 quantization combined with "DFlash" speculative decoding, with only about 0.5% of the model's parameters active for any given token.
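The headline numbers are easier to judge with some back-of-the-envelope arithmetic. The sketch below uses only the figures quoted above (1 trillion parameters, 0.5% active, FP4, 8 GPUs) and counts weights only. It ignores KV cache, activations, and quantization scale factors, so treat the per-GPU figure as a lower bound rather than a measured footprint.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion, from the claim
ACTIVE_FRACTION = 0.005           # 0.5% of parameters active per token
BITS_PER_PARAM = 4                # FP4 weights
NUM_GPUS = 8


def active_params(total: int, fraction: float) -> int:
    """Parameters actually touched for a single token."""
    return round(total * fraction)


def weight_bytes(total: int, bits: int) -> int:
    """Raw weight storage in bytes, with no quantization metadata."""
    return total * bits // 8


active = active_params(TOTAL_PARAMS, ACTIVE_FRACTION)
total_gb = weight_bytes(TOTAL_PARAMS, BITS_PER_PARAM) / 1e9
per_gpu_gb = total_gb / NUM_GPUS

print(f"active parameters per token: {active:,}")   # 5,000,000,000
print(f"weights at FP4: {total_gb:.1f} GB")          # 500.0 GB
print(f"weights per GPU: {per_gpu_gb:.1f} GB")       # 62.5 GB
```

So each token only exercises about 5 billion parameters, and the full weight set comes to roughly 62.5 GB per GPU when split evenly across eight cards.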

Why it matters: Pulling this off on non-specialized hardware could completely change the economics of running massive models in real-time applications.

Sources: Reddit, Hacker News

Transition from prompted agents to agent loops

The industry is moving away from static prompted agents, where an LLM just responds to a single sequence, toward agent loops that maintain persistent state and keep iterating on their own work. The practical upside: a reported reduction of roughly 40% in hallucinations on complex, long-horizon tasks.
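To make the difference from single-shot prompting concrete, here is a minimal sketch of a propose-then-verify loop with persistent state. Every name here (`LoopState`, `run_agent_loop`, and the toy `propose` and `verify` stand-ins) is hypothetical; a real harness would call a model and run real checks such as tests or linters.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    goal: str
    attempts: List[str] = field(default_factory=list)  # persists across iterations
    done: bool = False


def run_agent_loop(
    goal: str,
    propose: Callable[[LoopState], str],
    verify: Callable[[str], bool],
    max_steps: int = 5,
) -> LoopState:
    """Iterate propose -> verify until a candidate passes or steps run out."""
    state = LoopState(goal=goal)
    for _ in range(max_steps):
        candidate = propose(state)        # can inspect earlier failed attempts
        state.attempts.append(candidate)
        if verify(candidate):             # the loop checks its own work
            state.done = True
            break
    return state


# Toy stand-ins: the third draft is the first one that passes verification.
def toy_propose(state: LoopState) -> str:
    return f"draft-{len(state.attempts) + 1}"


def toy_verify(candidate: str) -> bool:
    return candidate == "draft-3"


result = run_agent_loop("write a passing patch", toy_propose, toy_verify)
print(result.done, result.attempts)
```

The `max_steps` cap is the important design choice: a loop that verifies its own output still needs a hard budget so a task that never passes cannot run forever.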

Why it matters: Agents that can catch and fix their own mistakes are what finally gets us past "smart autocomplete."

Sources: Twitter

Sakana AI launches Recursive Self-Improvement Lab

Sakana AI's RSI Lab introduces a framework in which models autonomously generate, test, and refine their own training data, using that generative feedback loop to improve their own architecture.

Why it matters: Moving from human-led R&D to models improving themselves is one of those milestones that sounds abstract until it isn't.

Sources: Twitter

💼 Industry & Business

OpenAI, Anthropic, and Microsoft warn Congress

The three companies went to Capitol Hill together to sound the alarm on AI being used to design bioweapons, pushing for a federal framework that requires pre-deployment testing for dual-use biological capabilities.

Why it matters: You don't often see these three in the same room agreeing on anything. The fact that they coordinated on this signals that safety regulation is moving from think-piece territory to actual legislation.

Sources: Reddit

Claude Code usage limits

Anthropic is pushing toward enterprise seat-based licensing for automated coding agents. The model is genuinely capable, but users are running into token-burn limits when they run complex autonomous tasks.

Why it matters: The automation works — the harder problem right now is figuring out how to price it at scale.

Sources: Twitter, Podcast

Projected HBM4 costs

HBM4 manufacturing costs are projected to land around $53 per gigabyte by 2027, driven by the complexity of 3D-stacked memory layers.
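To put the per-gigabyte figure in per-device terms, here is a small worked calculation. Only the $53/GB projection comes from the item above; the 192 GB and 288 GB capacities are hypothetical examples, not the specs of any announced accelerator.

```python
COST_PER_GB_USD = 53.0  # projection quoted above


def hbm_cost_usd(capacity_gb: float, cost_per_gb: float = COST_PER_GB_USD) -> float:
    """Memory manufacturing cost for one device at a flat per-GB rate."""
    if capacity_gb < 0:
        raise ValueError("capacity_gb must be non-negative")
    return capacity_gb * cost_per_gb


# Hypothetical capacities, for illustration only.
for capacity in (192, 288):
    print(f"{capacity} GB of HBM4: ${hbm_cost_usd(capacity):,.0f}")
```

At that rate, a hypothetical 288 GB part would carry about $15,264 of memory manufacturing cost before packaging, yield loss, or margin.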

Why it matters: That's a significant capital expenditure hanging over GPU manufacturers and frontier labs alike, and it's going to squeeze margins across the board.

Sources: Twitter


Launches

* Gemma 4 MTP — Merged into llama.cpp with multi-token prediction, for roughly 15% faster local inference on consumer-grade hardware. github.com/ggml-org/llama.cpp/pull/23398

* Hermes Agent v0.12 — Added cron scheduling and new WhatsApp/Signal integration for self-improving agents. github.com/NousResearch/hermes-agent
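For readers wondering where a multi-token prediction speedup like the Gemma 4 MTP one can come from: one common use of extra predicted tokens is as cheap drafts that the full model then verifies. The sketch below is a generic greedy draft-then-verify round, assuming that setup. It is not the llama.cpp code path, and `draft_next`, `target_next`, and the toy models are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence

Token = int
NextToken = Callable[[Sequence[Token]], Token]


def speculative_step(
    context: List[Token],
    draft_next: NextToken,
    target_next: NextToken,
    k: int = 4,
) -> List[Token]:
    """Return the tokens accepted in one draft-then-verify round."""
    # 1. Draft k tokens with the cheap predictor.
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft_next(context + drafted))

    # 2. Verify: keep drafts while the target model agrees. A real system
    #    scores all drafts in one batched pass; this loop is for clarity.
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(context + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token
            return accepted
        accepted.append(tok)

    # Every draft matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


# Toy models: the target counts upward mod 10; the draft slips after a 3.
def toy_target(ctx: Sequence[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: Sequence[Token]) -> Token:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step([0], toy_draft, toy_target, k=4))  # [1, 2, 3, 4]
```

The output always matches what the target model would have produced greedily on its own. The speedup comes from accepting several tokens per expensive verification pass whenever the drafts are right.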


Closing thought: The Apple-Google team-up signals that even the most cash-rich tech company on the planet can't go it alone on AI anymore — which means the partnerships and regulatory battles shaping this industry are only going to get stranger from here.