📰 This Week's AI Highlights
TLDR: This week's AI roundup explores the ongoing debate about agent frameworks with perspectives from Anthropic, OpenAI, and Langchain; highlights new assistant tools from Suna and Perplexity; features educational resources from Microsoft and Anthropic; examines document conversion tools for AI processing; shares insights on AGI progress from industry leaders Demis Hassabis and Geoffrey Hinton; and covers notable model releases including Google's Gemma 3 QAT and the emotion-capable speech generator Nari Dia-1.6B.
Full disclosure: this TLDR was written by AI.
🤖 Agents – battle of the blogs
Anthropic: Anthropic started this series of blogs/docs trying to define what an agent is and what agent frameworks do.
Link: Anthropic Blog
Our Take: This is a decent post. Especially worth reading is the explicit distinction between workflows (a pre-defined set of LLM calls) and agents (where the LLM dynamically directs the process flow) – see the sketch below. Also note this guidance: “When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all.” Not everything needs the complexity that agents bring.
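To make that distinction concrete, here’s a minimal sketch of our own (not Anthropic’s code); `call_llm` is a hypothetical stand-in for whatever LLM client you use:

```python
# A toy illustration of workflows vs agents (ours, not Anthropic's code).
# `call_llm` is a hypothetical stand-in for any LLM client that takes a
# prompt and returns text.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client of choice")

# Workflow: the sequence of LLM calls is fixed in code.
def summarise_then_translate(document: str) -> str:
    summary = call_llm(f"Summarise this document:\n{document}")
    return call_llm(f"Translate this into French:\n{summary}")

# Agent: the LLM decides the next step itself, in a loop.
def run_agent(task: str, tools: dict, max_steps: int = 10) -> str:
    history = f"Task: {task}"
    for _ in range(max_steps):
        decision = call_llm(
            f"{history}\nReply 'TOOL <name> <input>' to use a tool, "
            "or 'DONE <answer>' to finish."
        )
        if decision.startswith("DONE"):
            return decision[len("DONE"):].strip()
        _, name, tool_input = decision.split(" ", 2)
        history += f"\n{name} returned: {tools[name](tool_input)}"
    return "Stopped after max_steps without finishing"
```

In the workflow, the control flow lives in your code; in the agent, it lives in the model’s replies – which is exactly why Anthropic suggests reaching for agents only when the simpler version won’t do.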
OpenAI: Not a blog, but a white paper. I’m not sure I know what the difference is.
Link: OpenAI doc
Our Take: OpenAI’s document is a little opinionated, but that isn’t necessarily a bad thing. They have an agent framework, so you might argue they are just pushing the way their framework sees things. Regardless, we think it’s a useful perspective.
Langchain didn’t agree with OpenAI: At all. So Harrison Chase (the Langchain guy) wrote his own blog post.
Link: Langchain blog
Our Take: This blog post starts with a red-cross/green-tick analysis of the top agent frameworks, assessing which features each framework includes. No surprise, LangGraph (the Langchain agent framework) comes out top. But agent frameworks are super new – there’s a strong argument for choosing something that’s robust and compact, rather than something that does everything averagely. So, maybe this focus on “functional completeness” isn’t the definitive answer.
🦾 Actual useful AI?
Your first “AI employee”?: Suna promises a lot, so should white collar workers be afraid?
Link: Suna
Our Take: It’s early for this stuff. Suna seems to do a lot of productivity-focused things, but does it do them well enough? A slick video with nice background music doesn’t tell us if the data it collects is definitive or nonsense.
What Siri should have been?: Perplexity’s new assistant takes on day-to-day tasks on your phone.
Link: Perplexity
Our Take: Perplexity’s new assistant looks like it’s taking on the things that Siri should have done by now. Looks pretty cool!
📜 Papers
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models: A fantastic overview of all the different approaches to increasing AI reasoning abilities.
Link: arXiv
Our Take: 72 pages, 38 of which are references – this is a great resource and extremely well referenced!
🎓 Education
MS GenAI for Beginners: Free education from Microsoft!
Link: Github
Our Take: It looks good. Keep in mind that it’s Microsoft’s perspective, but still useful.
Prompt Engineering: Free education from Anthropic!
Link: Youtube
Our Take: Anthropic’s stuff is always interesting – they do a good job at this.
📜 All your documents are MD
Microsoft MarkItDown: This library converts all kinds of documents into Markdown format - PDF, PowerPoint, Word, Excel, HTML... and more!
Link: Github
Our Take: If you want to use AI on your documents, you probably need them in plain text so the AI can do something with them. Markdown is a popular plain-text format that maintains important document structure, e.g. section titles.
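The basic usage is pleasingly small – something like this, going by the project README (a sketch; check the repo for the current API, and the filename here is just an example):

```python
# pip install markitdown
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("quarterly_report.pdf")  # also .docx, .pptx, .xlsx, .html...
print(result.text_content)  # the document rendered as Markdown
```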
IBM Docling: This library also converts all kinds of documents into Markdown - PDF, PowerPoint, Word, Excel, HTML... and more!
Link: Github
Our Take: Docling is a more sophisticated solution, handling complex PDFs that MarkItDown doesn’t.
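Usage is similarly compact – roughly this, per the Docling README (again a sketch, with a made-up filename; check the repo for the current API):

```python
# pip install docling
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("complex_layout.pdf")
print(result.document.export_to_markdown())
```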
Mistral OCR: This one only does PDFs, but it does them well.
Link: Mistral
Our Take: In my experience this is the absolute best at converting complex PDF documents into Markdown, correctly parsing line breaks, multi-column layouts, etc. It’s kind of incredible how complex PDF documents are, but at least we have AI to turn them into something useful!
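If you want to try it, here’s a hedged sketch of calling it through Mistral’s Python SDK – based on their docs at the time of writing, so check the current API reference before relying on the details (the document URL is just an example):

```python
# pip install mistralai
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://example.com/some-paper.pdf",
    },
)

# Each page comes back as Markdown.
for page in response.pages:
    print(page.markdown)
```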
🧠 AGI - are we there yet?
AGI is coming: Demis Hassabis, CEO of Google Deepmind and Nobel prize winner, is worth listening to.
Link: Youtube
Our Take: The interview with Demis starts around 15 minutes in. He predicts AGI in 5-10 years, with the possible end of disease as a side benefit.
How do humans actually think?: Another Nobel prize winner, Geoffrey Hinton, gives a fascinating interview at the University of Toronto.
Link: Youtube
Our Take: An interesting discussion with Geoffrey Hinton: “Up until now most people, including in the humanities, have thought that we sort of reason using something like logic. We’re rational beings. We’re not…. We’re analogy machines, rather than thinking machines. We’ve got a thin layer on reasoning on the top, and that’s really important for doing things like mathematics… But we basically use analogies to think.”
How to avoid horseless carriages?: An interesting challenge.
Link: Horseless Carriages
Our Take: “Whenever a new technology is invented, the first tools built with it inevitably fail because they mimic the old way of doing things. “Horseless carriage” refers to the early motor car designs that borrowed heavily from the horse-drawn carriages that preceded them.” Very thoughtful and insightful comments in this post – worth a read. At Barnacle Labs we sometimes talk about the “imagination gap” – from where we are today, it’s difficult to imagine what our AI future might look like. If I’d told my 16-year-old self that we’d all have smartphones in the future, my 16-year-old self’s brain would probably have exploded.
🔮 Model Watch
Gemma 3 QAT: Gemma 3 from Google is one of the best small open models, and this release should make its quantised versions a bit more accurate thanks to an innovative training process.
Link: Google
Our Take: When you run a small local model on your laptop, you’re almost never running the full model – even the small ones are too big for most people’s machines. Instead, you’re running a model that’s been through a ‘quantisation’ process after training to shrink its size. Quantisation has a small impact on model performance, so it’s interesting to see Google release a version of Gemma 3 that’s been trained from the start to be quantisation-aware. The promise is that this new approach will further reduce the impact of quantisation.
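For the curious: the usual core trick in quantisation-aware training is ‘fake quantisation’ – round the weights onto a low-precision grid in the forward pass but let gradients flow through unchanged, so the model learns to cope with the rounding. A toy PyTorch sketch of the idea (purely illustrative – we haven’t seen Google’s training code):

```python
import torch

def fake_quantise(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Simulate low-bit weights in the forward pass while keeping
    full-precision gradients (the straight-through estimator)."""
    levels = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit
    scale = w.abs().max().clamp(min=1e-8) / levels  # map weights onto the grid
    w_q = torch.round(w / scale).clamp(-levels, levels) * scale
    # Forward pass sees the quantised w_q; backward pass sees the identity.
    return w + (w_q - w).detach()
```

Train with every layer’s weights passed through something like this and, by training’s end, the model has adapted to the grid – so the final quantisation step costs almost nothing.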
Nari Dia-1.6B: Speech generation with real emotion.
Link: Github
Our Take: AI generated speech has a tendency to sound a bit scripted and wooden, so it’s interesting to see a speech model that handles real emotion.
The Prompt Factory is a newsletter from Barnacle Labs, sharing AI insights and discoveries our team finds fascinating.