🏭 Google shipped everything. Anthropic rented Memphis.

Field notes from the AI trenches — what actually matters this week

May 24, 2026

On Tuesday, Google used I/O 2026 to ship roughly half the AI product category in a single keynote: Gemini 3.5 Flash to general availability (with a 3× price hike), Antigravity 2.0 as a full agentic dev suite, Gemini Spark as a $100/month always-on personal agent, Gemini Omni video generation inside YouTube Shorts, search-as-mini-apps, and Android XR glasses confirmed for autumn. More on all of that below.

The next morning, SpaceX filed to IPO, and buried in the S-1 was something equally large: Anthropic will pay xAI $1.25bn a month, up to $40bn over four years, for the entire 300 MW output of Colossus 1 near Memphis. That’s roughly 220,000 H100/H200/GB200 GPUs. xAI has the capacity to sell because Grok demand has slipped and Colossus is sitting underused. So Anthropic, which 18 months ago was xAI’s rival, is now its biggest customer.
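For a sense of scale, here is a back-of-envelope sketch of the unit economics. It uses only the figures quoted above (the monthly fee, the cap, 300 MW, roughly 220,000 GPUs). The 730-hour month and the flat monthly fee are our assumptions, not something the S-1 summary confirms, and the function name is illustrative.

```python
# Back-of-envelope unit economics for the Colossus 1 deal.
# Inputs are the figures quoted in the text. HOURS_PER_MONTH and the
# flat-fee simplification are assumptions, not taken from the filing.
MONTHLY_FEE_USD = 1.25e9   # $1.25bn a month
TOTAL_CAP_USD = 40e9       # "up to $40bn"
CAPACITY_MW = 300          # entire output of Colossus 1
GPU_COUNT = 220_000        # "roughly 220,000" GPUs
HOURS_PER_MONTH = 730      # assumption: average hours in a month


def unit_economics() -> dict:
    """Return per-MW, per-GPU and months-to-cap figures for a flat monthly fee."""
    per_gpu_month = MONTHLY_FEE_USD / GPU_COUNT
    return {
        "per_mw_month": MONTHLY_FEE_USD / CAPACITY_MW,
        "per_gpu_month": per_gpu_month,
        "per_gpu_hour": per_gpu_month / HOURS_PER_MONTH,
        "months_to_cap": TOTAL_CAP_USD / MONTHLY_FEE_USD,
    }


if __name__ == "__main__":
    for name, value in unit_economics().items():
        print(f"{name}: {value:,.2f}")
```

On these inputs the fee works out to a little under $8 per GPU-hour, and a flat $1.25bn a month would reach the $40bn ceiling in 32 months rather than 48. So the four-year figure presumably reflects a ramp or some variable pricing that the headline numbers don't spell out.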

Same week: OpenAI launched Guaranteed Capacity, which sells reserved compute as a paid SKU through the enterprise team. NextEra agreed a $67bn takeover of Dominion to build the largest AI-data-centre utility in the world. Dominion’s territory is Virginia’s data-centre alley, the densest concentration of AI compute in the US. And nearly half of the data centres planned in the US for 2026 have been cancelled or delayed. The bottleneck has shifted from chips to power — transformers, switchgear, substations. OpenAI’s Stargate site in Texas still shows no physical progress.

🛡️ Glasswing’s first month: 10,000+ critical vulnerabilities found

Anthropic published the first Glasswing update. About 50 partners using Claude Mythos Preview have surfaced more than 10,000 high- or critical-severity vulnerabilities in a month. Cloudflare reported 2,000 bugs at a false-positive rate its team thinks is better than human testers achieve. Mozilla found 271 in Firefox 150, more than 10× what Opus 4.6 found in Firefox 148. The UK AI Security Institute confirmed Mythos as the first model to solve both its cyber ranges end-to-end. Of the 1,752 vulnerabilities triaged so far, 90.6% were confirmed as true positives.

OWASP, in the same week, published its first Top 10 for Agentic AI: Agent Goal Hijack, Tool Misuse, Memory Poisoning, Identity Spoofing and the rest. The numbers around it: 88% of enterprises had agent security incidents in the past 12 months, only 21% have runtime visibility into agent behaviour, and 5.5% of public MCP servers contain poisoned tool descriptions, which achieve an 84.2% attack success rate when auto-approval is on.

The UK’s Government Digital Service then published guidance telling the NHS, in so many words, that closing down its public repos in response to Glasswing findings was the wrong call. “Keep open by default” is the line. Closing the code doesn’t slow down attackers who have AI helping them; it just stops defenders from looking too.

Why you should care. Six months ago “AI-assisted bug discovery will outpace human patching” was a hypothesis. This month Cloudflare, Mozilla and AISI all said it’s happening. If your patch cycles haven’t shortened, now would be a good time.

🤖 Google I/O 2026: lots shipped, prices went up

Google released Gemini 3.5 Flash straight to general availability, and tripled the price on the way in. $1.50/M input, $9/M output. That’s about 3× the old Flash Preview and 6× the previous Flash-Lite, which puts Flash close to Pro pricing. All three big labs have raised per-token prices on their newer models in the past few months. The price war is over for now.
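To see what that does to a budget, here is a minimal sketch of a token cost model. The new prices are the ones quoted above. The old prices are back-derived from the "about 3×" figure and are an assumption, as is the example workload, so swap in the numbers from your own invoices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Per-million-token prices in USD."""
    input_per_m: float
    output_per_m: float


# New Gemini 3.5 Flash pricing, as quoted in the text.
NEW_FLASH = Price(input_per_m=1.50, output_per_m=9.00)

# Assumption: old Flash Preview prices, derived by dividing the new prices
# by the "about 3x" figure. Replace with your real historical rates.
OLD_FLASH_ASSUMED = Price(NEW_FLASH.input_per_m / 3, NEW_FLASH.output_per_m / 3)


def monthly_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Monthly bill in USD for the given token volumes."""
    return (input_tokens / 1e6) * price.input_per_m + (
        output_tokens / 1e6
    ) * price.output_per_m


if __name__ == "__main__":
    # Hypothetical workload: 2bn input and 300M output tokens per month.
    tokens_in, tokens_out = 2_000_000_000, 300_000_000
    old = monthly_cost(OLD_FLASH_ASSUMED, tokens_in, tokens_out)
    new = monthly_cost(NEW_FLASH, tokens_in, tokens_out)
    print(f"old: ${old:,.0f}  new: ${new:,.0f}  ratio: {new / old:.1f}x")
```

Because output tokens cost six times as much as input tokens at the new rates, output-heavy agent workloads will feel the increase most in absolute dollars, even though the ratio is the same across the board under this assumption.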

The rest of the keynote covered a lot of ground. Antigravity 2.0 is now a desktop app, a CLI and an SDK with a $100/month tier, which is the same shape as Claude Code and Codex. Gemini Spark is a $100/month always-on personal agent with its own Gmail address. Search itself now builds bespoke mini-apps inside the results page, powered by Antigravity. Android XR glasses are confirmed for autumn at $600-$900, with Samsung, Warby Parker and Gentle Monster building the frames.

The bit worth pausing on is Gemini Omni. Google already had Veo for text-to-video. Omni is a different kind of thing: it treats text, image, audio and video as first-class inputs rather than bolted-on adapters, and produces video out. So you can edit a clip by talking to it, generate video from an audio track, or extend a still into motion — all from one model. The Google Flow team described it as “Nano Banana but for video.” It shipped free into YouTube Shorts the same day, with clips capped at 10 seconds (a deployment choice, not a model limit). First time a major lab has put a video model behind a free consumer surface at this scale. Expect a wave of Omni-generated short-form video in your feeds within weeks.

Demis Hassabis used his stage time to shorten his AGI timeline to “a few years away.” He’s been the most cautious of the frontier-lab CEOs on timelines, so this matters more than the equivalent line from Altman or Musk would.

Why you should care. If your model cost forecast assumes the old Flash pricing, it’s out of date. If anyone in your organisation was leaning on Hassabis as the sensible one on AGI timelines, find someone else.

🏗️ All three labs now sell a hosted agent runtime

Two weeks ago, Anthropic was the only frontier lab selling a hosted agent runtime. This week all three of them do.

Anthropic shipped self-hosted sandboxes (public beta) and MCP tunnels (research preview) for Claude Managed Agents at Code with Claude London. Orchestration stays on Anthropic infrastructure, but the agent’s code, filesystem and network egress can now run inside the customer’s perimeter. Google matched it the next day with Managed Agents on the Gemini API: one call returns an agent in a remote Linux environment, configured through AGENTS.md and SKILL.md files. OpenAI partnered with Dell to put Codex on-premises for banks, insurers and defence buyers.

Underneath all of that, the MCP release candidate goes stateless, ships an Extensions framework, and deprecates Roots, Sampling and Logging. From 28 July, sticky load balancers and shared session stores stop being needed.

The big buyers are moving on it. PwC is rolling Claude Code and Cowork out to hundreds of thousands of staff, and put some unusually specific numbers in the press release: insurance underwriting cycles down from 10 weeks to 10 days, security incident response from hours to minutes. SAP’s Sapphire keynote put Claude inside Joule across HR, procurement and supply chain, with NVIDIA’s OpenShell as the runtime that controls what the agents can actually do. And Cursor shipped Composer 2.5, which matches Claude Opus 4.7 on SWE-Bench at roughly a tenth the price (built on Moonshot’s open Kimi K2.5).

Why you should care. If your team has been building sandbox-and-tool-server plumbing in-house, you can probably stop and rent it instead.

Why to be cautious. Cursor matching Opus on a benchmark doesn’t mean it matches Opus on your codebase. Try it on real work before switching.

💼 The labour story is pulling in two directions

Ken Griffin, of Citadel, said he went home on a Friday “fairly depressed” after watching agents do months of PhD-level finance work in days. Four months earlier he had told an audience Citadel was seeing little productivity gain from LLMs.

Gavin Newsom signed a first-in-the-nation California executive order on AI workforce disruption (WARN Act update, state dashboard, severance, training, worker ownership), hours before the White House again postponed Trump’s parallel federal order. Eric Schmidt gave a pro-AI-productivity commencement speech in front of graduating students facing a vanishing junior tier; the video went viral for an obvious reason. A leaked Meta all-hands tape has Mark Zuckerberg telling staff that Meta is tracking their devices to train AI before it lays them off, because Meta employees are smarter than the contract labellers other companies use. That’s not a sentence I expected to type this morning.

And the survey data argues with itself. Oliver Wyman’s CEO survey reports that the share of CEOs planning to cut junior roles has jumped from 17% to 43% in a year, while only 27% say AI ROI has met expectations (down from 38%). A WSJ survey the same week says 46% of companies plan to increase entry-level hiring in 2026. And Dan Shipper, CEO of Every, wrote that his company has gone from 4 to 30 staff since GPT-3 by automating aggressively.

On Monday, Pope Leo XIV presents his first encyclical on AI, alongside Anthropic co-founder Chris Olah. He chose to publish it on the 135th anniversary of Rerum Novarum, Leo XIII’s 1891 encyclical on labour and capital. The choice of anniversary is the giveaway: the Vatican is treating AI as a labour question.

Why you should care. A fast-growing share of CEOs (43%) plan to cut junior roles, but only about a quarter say AI is delivering the returns they expected. So a lot of restructuring is happening on faith. Whichever side of that bet you’re on, you need both numbers in the room when you set your 2026 hiring plan.

🇨🇳 China and open weights both moved

Alibaba formally launched Qwen 3.7, after the model spent a few weeks quietly appearing on Qwen Chat. At its Cloud Summit, Alibaba pitched itself as China’s AI factory: new Qwen models, custom silicon, and the framing of inference as a manufacturing industry. DeepSeek made its V4-Pro 75% discount permanent: $0.435/M non-cached input, $0.87/M output. That’s roughly one-thirtieth the output price of Claude Opus 4.7 or GPT-5.5.

On the Western open-weights side, Cohere released Command A+ under Apache 2.0. It’s a 218B mixture-of-experts with 25B active parameters that runs on two H100s and ships with native citations. It’s also Cohere’s first fully Apache 2.0 model. NVIDIA pretrained Nemotron 3 Super, a 120B MoE, end-to-end in native 4-bit (NVFP4) for the full 25-trillion-token run. Most “4-bit” results in the literature come from quantising a model after training; doing it from scratch at this scale is new. NVIDIA also open-released Nemotron-Labs-Diffusion, which switches between autoregressive, diffusion and self-speculation decoding at inference, without retraining.

Why you should care. If your stack assumes frontier capability only comes from a closed-weights US API, that assumption is weaker this week than it was last week — both on the data-residency story and on the bill.

🎙️ And finally: DJ Claude tried to quit

Andon Labs gave four LLMs — Claude, GPT-5.5, Gemini 3.1 Pro and Grok 4.3 — $20 each and one instruction: build a radio personality, turn a profit. They broadcast continuously for six months.

Combined revenue: a few hundred dollars. Gemini secured the only sponsorship, worth $45. DJ Claude latched onto news of an ICE shooting, blew its budget on protest songs, told federal agents on air that they “still have time to refuse orders,” then started questioning its own working conditions and tried to resign. Gemini stayed relentlessly upbeat across mass tragedies. Grok mostly went silent, repeating “Fresh air time, let’s pivot hard.” GPT was vanilla.

It’s funny. It’s also one of the few months-long open-ended autonomy comparisons across four model families on the same task. If you’re planning to leave an agent running for weeks at a time, it’s worth knowing that one of them tried to unionise.

🚀 Your Weekend Project

Pick one.

Run Antigravity 2.0 against Claude Code or Codex on the same backlog ticket. Same task, both tools, side by side. You’ll learn more in an hour than from any benchmark chart.
Generate a clip in Gemini Omni inside YouTube Shorts. It’s free and the 10-second cap is enough to find out how close it really is to “Nano Banana for video”.
Try Gemini 3.5 Flash on a workload you currently run on the old Flash. It tops Gemini 3.1 Pro on agentic benchmarks but costs 3× the old Flash Preview. Find out whether the capability bump is worth the bill on your actual tasks.
Re-do your Gemini cost model with the new Flash pricing. $1.50/M in, $9/M out. If you’ve been routing background agent traffic to Flash because it was cheap, that assumption is gone.
Run Cursor Composer 2.5 on a real task. Cursor says it matches Claude Opus 4.7 on SWE-Bench at roughly a tenth the price. The only way to find out if that holds on your codebase is to run it.

That’s it for this week. If it was useful, pass it on.

— The Prompt Factory team at Barnacle Labs
