🏭 Google’s Full-Stack Offensive
Field notes from the AI trenches - what actually matters this week
Welcome back to The Prompt Factory. This week, Google made its move - not just one product launch, but a coordinated offensive across the entire AI stack. From frontier models to edge devices to developer tools to chip wars, they’re positioning for 2026.
Meanwhile, the industry is choosing collaboration over lock-in, Anthropic is being refreshingly honest about browser automation risks, and Karpathy reminds us that agents are a decade-long project, not a quarterly deliverable.
We’re skipping next week for the holidays. We return on Sunday 4th January for a New Year Special Edition - a deep dive into the year that was 2025 and the year ahead for 2026.
Let’s dive in for the last Prompt Factory of 2025.
🚀 Google’s Gemini 3 Ecosystem Goes Live
What happened
Google released Gemini 3 Flash, a frontier-level model that delivers Pro-grade reasoning at Flash-level speed. It outperforms Gemini 2.5 Pro on most benchmarks while running 3x faster. Priced at $0.50/1M input tokens and $3/1M output tokens - roughly two-thirds more than 2.5 Flash ($0.30/$2.50) on input, but a fraction of 3 Pro pricing ($2/$12) for comparable performance.
What it does
Hits 90.4% on GPQA Diamond (PhD-level reasoning questions)
Processes video, images, and audio in real-time
Handles 128K token context windows
Now the default model in the Gemini app and rolling out across Google Search
Why you should care
This is Google’s bid to make frontier intelligence a commodity. When Pro-level reasoning comes at Flash-level prices and speeds, the entire AI economics shift. Companies like JetBrains, Figma, Cursor, and Replit are already processing over 1 trillion tokens per day on the API.
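The pricing arithmetic is worth sanity-checking on your own workload mix, because the premium over 2.5 Flash shrinks as your jobs get more output-heavy. A quick sketch using the per-million-token rates quoted above (the model names here are just labels for this comparison, not official API identifiers):

```python
# Per-million-token prices quoted above: (input, output) in USD.
PRICES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3-pro": (2.00, 12.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for one job, given its token counts."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Example: a summarisation-style job, 800K input tokens, 100K output tokens.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 800_000, 100_000):.2f}")
```

On that mix, 3 Flash comes out around 43% dearer than 2.5 Flash ($0.70 vs $0.49) but a quarter of the 3 Pro cost ($2.80) - which is the whole commoditisation argument in three numbers.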
The pattern
Google also released FunctionGemma (270M parameters optimized for edge function calling), T5Gemma 2 (first multimodal encoder-decoder models), and Conductor (context-driven development for their CLI). This isn’t a product launch - it’s a full-stack offensive. From edge devices to enterprise workflows, Google is positioning itself across every tier of the AI stack.
🤝 The Great AI Interoperability Push
What happened
Google announced comprehensive support for the Model Context Protocol (MCP), an open standard originally created by Anthropic for connecting AI systems to tools and data sources. Meanwhile, Anthropic transformed their skills mechanism into an open Agent Skills standard, now adopted by Cursor, Amp, Letta, Goose, and others.
Why you should care
This is the industry choosing ecosystem growth over vendor lock-in. In March 2025, OpenAI adopted MCP across ChatGPT. In December, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation, with support from Google, Microsoft, AWS, and Bloomberg. Some estimates suggest 90% of organizations will use MCP by year’s end.
The Agent Skills standard is even more elegant - Simon Willison calls it “deliciously tiny,” a specification you can read in minutes that lets you package organizational knowledge into portable, version-controlled folders that work across multiple AI platforms.
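Concretely, a skill is just a folder containing a SKILL.md file: a short YAML frontmatter block naming and describing the skill, followed by plain markdown instructions the agent pulls in when the skill is relevant. A minimal sketch (the release-notes task is our invented example, not something from the spec):

```markdown
---
name: release-notes
description: Drafts release notes from merged PR titles in our house style.
---

# Release notes

When asked to draft release notes:

1. Group changes under "Features", "Fixes", and "Internal".
2. Write one plain-English line per change and link the PR.
3. Lead with user-facing impact, not implementation detail.
```

Because it is just files in a folder, the skill can live in your repo, go through code review, and travel unchanged between any tools that support the standard.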
The stakes
When major competitors agree on standards, it signals market maturity. The battle is shifting from “lock users into our platform” to “make the best tools for an open ecosystem.” That’s good for everyone building with AI.
🛠️ Simular AI: When Computer Agents Beat Humans
What happened
Simular AI, founded by former Google DeepMind researchers, achieved a 72.6% success rate on OSWorld, surpassing the ~72% human baseline. Their Agent S3 also scored 90.1% on WebVoyager and 71.6% on AndroidWorld.
What it does
Agent S3 can autonomously control browsers, computers, and smartphones. The platform offers both production-grade enterprise agents (Simular Pro) and web-based access (Simular Cloud, starting at $50/month). Over 12,000 users have joined.
Why you should care
This is what computer use agents look like when they actually work. The team open sources their tools, builds in public, and focuses on production-ready reliability for workflows with thousands to millions of steps. Featured in Wired, MIT Tech Review, and IBM Think, they’re not hyping the future - they’re shipping it now.
Why to be cautious
Competitors like “The AGI Company” claim their OSAgent achieves 76.26% on OSWorld, exceeding Simular’s performance. Benchmarks matter, but deployment reality matters more. Watch for real-world case studies, not just leaderboard positions.
🌐 Claude for Chrome: Browser Automation with Brutal Honesty
What happened
Anthropic released Claude for Chrome as a beta browser extension, enabling Claude to navigate websites, click buttons, and fill forms directly in your browser. Available to all paid subscribers.
What it does
Pulls metrics from analytics dashboards without manual exports
Organizes files in Google Drive (sort, create folders, flag duplicates)
Compares products across sites and creates comparison tables
Logs sales calls to CRM by matching calendar to Salesforce
Runs scheduled workflows (daily/weekly reports) in the background
Why to be cautious
Anthropic’s safety documentation is refreshingly blunt: “Malicious sites may hide instructions overriding user commands. Never use for financial transactions, password management, or sensitive data without supervision. Attack vectors are constantly evolving and Claude may hallucinate.”
This is prompt injection territory. Start with trusted sites only. Review sensitive actions. Pause if Claude acts unexpectedly.
Translation
Most companies would bury these warnings in legal disclaimers. Anthropic puts them front and center. That’s how you build trust when shipping powerful, risky tools. The feature is useful - just treat it like you’d treat giving someone remote access to your computer, because that’s essentially what it is.
🧠 Andrej Karpathy’s Reality Check: The Decade of Agents
What happened
Andrej Karpathy, OpenAI co-founder and former Tesla AI director, delivered a sobering assessment: current AI agents have insufficient intelligence, and it will take “about a decade to work through all of those issues.” He’s not saying today’s agents aren’t useful; he’s saying there’s still work to do before reality catches up with the “autonomous workforce” hype.
Key quotes from his Dwarkesh podcast
“I don’t want an Agent that goes off for 20 minutes and comes back with 1,000 lines of code. I certainly don’t feel ready to supervise a team of 10 of them.”
He fears “mountains of slop accumulating across software, and an increase in vulnerabilities and security breaches” if we deploy agents before they’re ready.
Why you should care
Karpathy isn’t a skeptic - he’s a builder. His December year-in-review highlighted how the application layer around large language models matured in 2025, with tools like Cursor showing what “LLM apps” should look like: orchestrated systems for specific vertical tasks where AI “begins to feel practical, not just impressive.”
His discussion of Reinforcement Learning with Verifiable Rewards (RLVR) points to where the field needs to go: proven correctness, not just plausible output.
The lesson
This is the counterweight to aggressive agent marketing from companies claiming their AI employees are production-ready. The gap between current capabilities and reliable deployment is measured in years, not quarters. Build accordingly.
🔧 Google vs. NVIDIA: The TorchTPU Gambit
What happened
Google is developing TorchTPU, a major initiative to make its TPU chips fully compatible with PyTorch - the world’s most widely used AI framework - in partnership with Meta. The goal: erode NVIDIA’s software moat.
Why you should care
Hardware alone doesn’t win. NVIDIA’s dominance comes from CUDA, the software ecosystem deeply embedded in PyTorch. Google’s TPUs require developers to switch to JAX, creating friction that limits adoption.
If TorchTPU succeeds, it dramatically reduces switching costs for companies seeking NVIDIA alternatives. Meta gains leverage in chip negotiations. Developers get more choices.
Google’s new AI infrastructure head, Amin Vahdat, reports directly to CEO Sundar Pichai - reflecting how critical this initiative is to both internal products (Gemini, AI Search) and cloud customers.
The pattern
This is the classic platform war playbook: make your hardware work with the software everyone already uses. Google started selling TPUs externally in 2022, began placing them directly in customer data centers this year, and is now tackling the last barrier - software compatibility. Watch this closely.
🎨 ChatGPT Images Gets a Major Upgrade
What happened
OpenAI released GPT Image 1.5, a new flagship image generation model that’s up to 4x faster and significantly better at precise edits. Rolling out to all ChatGPT users and available in the API.
What it does
Precise editing: Changes only what you ask for while keeping lighting, composition, and faces consistent
Better instruction following: Handles complex multi-element compositions and dense text rendering
Creative transformations: Try-ons, style transfers, and conceptual reimaginings that preserve the essence of originals
20% cheaper in the API compared to the previous model
Why you should care
This is image generation moving from “impressive demos” to “practical tool.” The focus on editing - not just generation - means you can iterate on real images: product photography variations, branded content adjustments, design mockups. Wix is already using it for concept-to-production workflows.
The new dedicated Images experience in ChatGPT’s sidebar includes preset filters and trending prompts, plus one-time likeness upload so you can reuse your appearance across creations. It’s designed to make experimentation frictionless.
The honest take
OpenAI admits “results remain imperfect” and “there is still significant room for improvement.” But that’s the right framing: useful now, getting better. Try it for practical editing tasks rather than expecting magic.
🎯 OpenAI’s FrontierScience Benchmark
OpenAI launched FrontierScience, a new benchmark for PhD-level scientific reasoning across physics, chemistry, and biology. GPT-5.2 achieved 77% on Olympiad-style questions and 25% on research tasks.
Why you should care
This is how the industry is trying to measure whether AI can contribute to real science. But the caveats matter: theoretical physicist Carlo Rovelli’s blunt assessment is that submissions to his journal have doubled, “most of it just people who think they’re doing great science by having conversations with LLMs - and it’s horrible.”
OpenAI researcher Miles Wang acknowledged the benchmark “does not measure all the important capabilities in science” - no hypothesis generation, no experimental execution, no multimodal research involving real-world lab systems.
The reality
High benchmark scores don’t equal scientific reliability. We need evaluation frameworks measuring actual utility, not just controlled performance.
🧪 Your Weekend Project
Pick one:
Test the Agent Skills standard: Visit agentskills.io and read the specification (takes minutes). If you use Claude Code or Cursor, explore creating a simple skill for a repetitive task in your workflow.
Explore Google’s MCP servers: Check out Google’s open-source MCP implementations for Workspace, Firebase, or Cloud Run. If you’re building AI integrations, evaluate whether MCP reduces your vendor lock-in risk.
Try Gemini 3 Flash: At $0.50 per million input tokens, there’s no excuse not to experiment. Compare it to your current model on a real task. Is it actually 3x faster? Does the quality hold?
Read the Claude for Chrome safety documentation: Even if you don’t use it, Anthropic’s transparent approach to prompt injection risks is a masterclass in honest product communication. Study it before deploying any AI automation.
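If you pick the Gemini 3 Flash option, the “is it actually 3x faster?” question is easy to answer empirically. A minimal timing harness, as a sketch - the model-calling functions named in the comments are placeholders you would wire up to your own SDK clients, not real API calls:

```python
import time
from statistics import median

def time_model(call, prompt: str, runs: int = 5) -> float:
    """Median wall-clock seconds for `call(prompt)` over several runs.

    The median damps one-off network spikes better than the mean.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call(prompt)  # e.g. a thin wrapper around your SDK's generate call
        samples.append(time.perf_counter() - start)
    return median(samples)

def speedup(baseline_secs: float, candidate_secs: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_secs / candidate_secs

# Usage sketch with hypothetical wrappers (not real function names):
# flash_secs = time_model(call_gemini_3_flash, my_prompt)
# pro_secs = time_model(call_gemini_25_pro, my_prompt)
# print(f"Flash is {speedup(pro_secs, flash_secs):.1f}x faster")
```

Run it on your real prompts, not toy ones - latency ratios on a ten-token prompt tell you little about a 100K-token workload.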
📅 Coming Next: New Year Special Edition
We’re skipping next week for the holidays and return in the New Year with something different. We’re kicking 2026 off with a special edition looking back at 2025 and ahead to 2026, using two guides to help us separate the noise from the real signals of progress. Their overarching theme: “The era of AI evangelism is giving way to an era of AI evaluation.”
It’s the perfect way to start the new year: stepping back from the weekly news cycle to ask where this is all heading.
See you in 2026 for the deep dive.
🏝️ About Barnacle Labs
At Barnacle Labs we build AI systems that actually ship. From the National Cancer Institute’s NanCI app to AI systems deployed across biotech and enterprise clients, we’re the “breakthroughs, not buzzwords” team.
Got an AI challenge that’s stuck? Reply to this email - let’s talk.
The voices worth listening to in AI are the ones building, not just talking.

