📣 Claude can now edit your PowerPoints
Field notes from the AI team at Barnacle Labs: what's caught our attention this week?
This week's newsletter touches on why even setting temperature to zero doesn't make LLMs deterministic (spoiler: it's not just floating-point maths). Meanwhile, Yann LeCun thinks we need a completely different set of AI capabilities, so perhaps the question of reproducibility doesn't actually matter.
Elsewhere in AI land, BlackRock is betting £500m that UK companies really do care about data residency, and someone built a terminal game where you pretend to be an AI to avoid getting voted off by actual AIs.
We're also seeing the inevitable office suite takeover continue: Claude can now edit your PowerPoints directly, which is either incredibly useful or the beginning of death by a thousand AI-generated slide decks. Meanwhile, the UAE just dropped a reasoning model that's supposedly hitting 2,000 tokens per second, trained entirely on open data, proving that the AI race isn't just a US-China duopoly anymore.
And yes, there are two new mystery models on the stealth leaderboards. The community thinks they're either Gemini 3.0 or Grok 4.0, but honestly, at this point the model release cycle is so relentless that by the time you read this, there will probably be three more.
Let's dig in…
✨ Creative Inspiration Corner
Your job in this LLM game is to destroy the bots!
Link: GitHub
Our Take: This one is fun: "Among LLMs" turns your terminal into a chaotic chatroom playground where you're the only human among a bunch of eccentric AI agents, dropped into a creative scenario. Each participant, including you, has a persona and a backstory, and all the AI agents share one common goal: determine and eliminate the human, through voting. Your mission: stay hidden, manipulate conversations, and turn the bots against each other with edits, whispers, impersonations, and clever gaslighting. Outlast everyone, turn chaos to your advantage, and make it to the final two.
📚 Papers that change strategic thinking
Why do LLMs not produce the same reply every time?
Link: Thinking Machines
Our Take: You've probably noticed that ChatGPT and other AI assistants give slightly different responses each time you ask the same question. While many experts blame this inconsistency on parallel floating-point arithmetic alone, this research shows the real story is more nuanced: the kernels used in inference aren't batch-invariant, so the result of your request depends on how many other requests the server happens to batch alongside it. Even when developers try to force these systems to be completely predictable by setting temperature to zero, they still produce varying results, creating major challenges for scientists and businesses who need reliable, reproducible outputs. The post digs into the surprising technical causes and explores solutions that could make these tools more consistent and trustworthy for critical applications.
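Why does the grouping of work matter at all? Floating-point addition isn't associative, so a reduction that splits the same numbers into different-sized chunks (as GPU kernels do when the batch size changes) can produce different totals. A toy, stdlib-only sketch of that effect; the numbers are contrived purely to make the discrepancy obvious:

```python
def chunked_sum(values, chunk_size):
    """Sum `values` by first summing fixed-size chunks, then summing the
    partial results - mimicking how a kernel's reduction order depends on
    how the work happens to be split up."""
    partials = [sum(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return sum(partials)

# Same numbers, different grouping, different answer: the huge pair
# cancels exactly in one grouping but absorbs the small terms in another.
values = [0.1] * 7 + [1e16, -1e16]
print(chunked_sum(values, chunk_size=3))
print(chunked_sum(values, chunk_size=9))
```

If every intermediate grouping were fixed (a "batch-invariant" kernel), the result would be bit-identical on every run, which is the fix the post pursues.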
🧑‍💻 Developer Tools
Generate and Edit Images with Google's Nano Banana in AI SDK
Link: Vercel
Our Take: Vercel's AI SDK is extremely elegant and easy to use. It's a great way to bootstrap a lot of AI work, abstract some of the vendor-specific API differences, and simplify model interactions. It's nice to see the SDK expand past just text generation and incorporate images.
Ship directly to Cloud Run from Gemini CLI
Link: Google
Our Take: This is kind of neat: just type /deploy into Gemini CLI and it'll push your code to git and publish to Cloud Run. Or enter /security:analyze and it'll check your code for security exposures using the new Gemini CLI Security Extension.
ChatGPT supports MCP
Link: OpenAI
Our Take: ChatGPT developer mode is a beta feature that provides full Model Context Protocol (MCP) client support for tools. At last! It's interesting that OpenAI caveats this with "It's powerful but dangerous, and is intended for developers who understand how to safely configure and test connectors. When using developer mode, watch for prompt injections and other risks, model mistakes on write actions that could destroy data, and malicious MCPs that attempt to steal information."
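Under the hood, MCP is JSON-RPC 2.0: a client like ChatGPT's developer mode discovers a server's tools with tools/list and invokes them with tools/call. A minimal sketch of the message shapes (method and field names follow the MCP spec; the tool name and arguments here are invented for illustration):

```python
import json

def list_tools_request(request_id):
    """JSON-RPC request an MCP client sends to discover a server's tools."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}

def call_tool_request(request_id, name, arguments):
    """JSON-RPC request to invoke one of those tools with arguments."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# A hypothetical tool call, serialised as it would travel over the wire:
wire = json.dumps(call_tool_request(1, "search_tickets", {"query": "refund"}))
```

OpenAI's safety caveat makes sense once you see the shape: anything the model puts in `arguments` is executed by the server, which is exactly where prompt-injected write actions become dangerous.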
🤖 Agents
Writing effective tools for agents – with agents
Link: Anthropic
Our Take: Despite many claiming that any piece of code that uses an LLM is an agent, real agents are the ones that are given a set of tools and allowed to dynamically plan when and how to use those tools. When the AI has the autonomy to decide how to use a tool, the design of that tool becomes critical: it has to be obvious for the AI to understand and use. That means just wrapping existing APIs is almost always the wrong approach, because those APIs are often too complicated for the AI to work out how to use reliably. Anthropic opens this post with the statement that "agents are only as effective as the tools we give them", something I strongly agree with. Read it to understand how to write high-quality tools and evals, and how you can boost performance by using Claude to optimize its own tools.
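To make the contrast concrete, here's a sketch using the Anthropic-style tool definition shape (name, description, input_schema): the first is a thin wrapper over a hypothetical ticketing API, the second is the kind of task-oriented redesign the post advocates. Both tools and all their fields are invented for illustration:

```python
# A thin API wrapper: terse description, leaky parameters the model
# has to guess at. Agents routinely misuse tools shaped like this.
api_wrapper_tool = {
    "name": "tickets_query_v2",
    "description": "Query endpoint.",
    "input_schema": {
        "type": "object",
        "properties": {
            "q": {"type": "string"},
            "flags": {"type": "integer"},
        },
        "required": ["q"],
    },
}

# A task-oriented tool: one clear job, self-explanatory parameter names,
# and a description that tells the model when (not just how) to use it.
task_tool = {
    "name": "find_open_support_tickets",
    "description": (
        "Search open support tickets by keyword. Use this when the user "
        "asks about the status of a customer issue. Returns at most "
        "`limit` tickets, newest first."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "keyword": {"type": "string",
                        "description": "Word or phrase to search for."},
            "limit": {"type": "integer",
                      "description": "Maximum results to return (default 5)."},
        },
        "required": ["keyword"],
    },
}
```

Same backend capability, but the second definition gives the model enough context to pick the tool and fill in its arguments without guessing.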
Custom Tools launch for Claude Code
Link: Anthropic
Our Take: Everyone thinks of Claude Code as a coding agent, but it's much more. There's an SDK available for both Python and TypeScript, which makes it a more general-purpose agent framework that can be used for anything. We've been using this in our work at Barnacle Labs; it's currently our favourite agent solution. This week Anthropic released "custom tools", which allow you to extend Claude Code's capabilities with your own functionality through in-process MCP servers, enabling Claude to interact with external services and APIs or perform specialised operations. This further cements Claude Code as a very powerful and extensible general-purpose agent framework.
Replit Agent 3 – Agents for Dummies?
Link: Replit
Our Take: For those who like the idea of AI agents, Replit's Agent 3 might be the answer. You can generate automation workflows by just prompting. It's getting some good press from what I see and appears very capable.
🔮 Model Watch
K2Think: The fastest reasoning model
Link: K2Think
Our Take: It's nice to see a model that's not from the USA or China. This one's from the UAE. A very compact and fast reasoning model: it's been benchmarked at 2,000 tokens/second, which is insanely fast. It was trained on 100% open datasets, no proprietary data at all. The paper has a host of interesting details about their technical innovations.
2 new stealth models!
Link: X
Our Take: Stealth models are now common; it's how providers release their models for real-world testing to detect any issues. Platforms like OpenRouter and AI SDK that abstract over multiple AI providers make perfect sense if you want to make a model available without telling people what it is! In AI SDK we have stealth/sonoma-sky-alpha and stealth/sonoma-dusk-alpha. Either Gemini 3.0 or Grok 4.0 seems to be the community consensus.
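Because these gateways expose an OpenAI-compatible HTTP API, trying a stealth model is just a model-id swap. A stdlib-only sketch that builds (but doesn't send) such a request; the endpoint shape follows OpenRouter's OpenAI-compatible API, the model id is the one mentioned above from AI SDK (exact ids vary by gateway), and the API key is a placeholder:

```python
import json
import urllib.request

def chat_request(model, prompt, api_key,
                 base_url="https://openrouter.ai/api/v1"):
    """Build an OpenAI-compatible chat completion request for a gateway
    like OpenRouter; swapping `model` is all it takes to try a new model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = chat_request("stealth/sonoma-sky-alpha", "Who are you, really?", "PLACEHOLDER_KEY")
# urllib.request.urlopen(req) would actually send it; omitted here.
```

This indirection is exactly why stealth releases work: the gateway knows which provider is behind the id, and you don't.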
📊 Industry Intelligence
A16Z benchmarks AI-powered office tools
Link: A16Z
Our Take: AI isn't just a feature anymore; it's becoming embedded into every tool we use. From drafting emails to designing slides, researching markets, or building financial models, a new layer of "agentic" tools is emerging that resembles an AI-native Office suite. Rather than ask ChatGPT a question and copy its results into Word, we can generate that text directly. But are these tools any good? This post tries to answer that question: is it worth jumping in, or should we stick with ChatGPT and copy/paste?
BlackRock to invest £500m in UK datacentres
Link: BBC
Our Take: A lot of UK enterprises need compute infrastructure located in the UK in order to meet their data residency commitments. I've worked with companies whose discriminator between model providers was "who has a UK data centre and will guarantee me that no data will be sent outside UK borders". The truth is that not every provider can meet those requirements, and those that do are often capacity constrained. So it's good to see something being done about it.
Claude can now edit your documents
Link: Anthropic
Our Take: This seems inevitable; indeed, a couple of people asked for my help because they thought it was already possible (it wasn't). Claude can now edit and generate Word docs, PowerPoints, and Excel sheets. Now you can create a presentation with a single prompt!
🧠 AGI – are we there yet?
The Shape of AI to Come – Yann LeCun at AI Action Summit 2025
Link: YouTube
Our Take: Yann LeCun is one of AI's so-called "godfathers". He's very opinionated and critical of current LLMs' ability to reach AGI. In this talk he lays out his vision for how we'll get to AGI and the things that still need to be invented.
Amol Rajan interviews Anthropic's Dario Amodei
Link: BBC
Our Take: This is a nice interview. I particularly liked (and agree with) his comments on the impact of AI on low-level white-collar work: "If we look at entry-level, white-collar work. I think of people who work at law firms like first-year associates. There's a lot of document review. It's very repetitive, but every example is different. That's something that AI is quite good at. If you think of first year of entry-level work at a consulting company... If you think of entry-level work at a finance company, right? Doing routine analysis of financial documents. These are kind of the workhorses of entry-level white-collar labour. And yet they are things that AI is already pretty good at and AI is rapidly getting better at."
🎯 Week Ahead Priorities
If you want to do something, rather than just read…
Play "Among LLMs" - Install this fun terminal game where you're the only human trying to survive among AI agents who are hunting for you. It's a clever way to understand how AI thinks and communicates.
Create a presentation with one prompt - Try Claude's new document editing feature by asking it to create a PowerPoint presentation on any topic you're interested in.
Test drive a mystery model - If you're a developer using Vercel's AI SDK, try one of the new stealth models. See if you can guess whether it's Gemini 3.0 or Grok 4.0!
Watch Yann LeCun's AGI talk - Get a glimpse of AI's future from one of its godfathers. His skeptical take on current AI limitations is refreshingly grounded and thought-provoking.
🚨 Shameless plug alert
Barnacle Labs builds AI solutions for ambitious organisations tackling complex challenges. We're the team behind the National Cancer Institute's NanCI AI-powered app, and we help you identify the right opportunities and ship solutions that deliver results.
Reply to this email with your biggest AI challenge if you'd like to talk!

