🚨 Emergency episode: OpenAI releases GPT-4o mini
How OpenAI's latest release attacks cost, speed and efficiency
TLDR: Inspired by "The Rest Is Politics" podcast's emergency episodes, this edition of The Prompt Factory covers the timely release of OpenAI's GPT-4o mini. This smaller model, fitting the theme of small LLMs, boasts multi-modal abilities, is 3x cheaper than GPT-3.5-turbo, 30x cheaper than GPT-4o, and 2x faster than both.
One of my favourite political podcasts, The Rest Is Politics, has pioneered the practice of releasing emergency episodes whenever significant events break. Taking that as inspiration, I find myself penning this episode of The Prompt Factory the morning after the release of OpenAI's GPT-4o mini.
The timing of GPT-4o mini's release is serendipitous because the previous edition of this newsletter focussed on small models and, as its name suggests, GPT-4o mini is another small model.
Like its bigger sibling, GPT-4o mini includes multi-modal abilities. Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future. That's obviously a big improvement over GPT-3.5-turbo, which it effectively replaces.
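To make that concrete, here's a minimal sketch of a text-plus-image request to GPT-4o mini using the OpenAI Python SDK. The prompt and image URL are placeholders of my own, and it assumes an OPENAI_API_KEY environment variable is set.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single chat completion request combining text and an image input
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```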
GPT-4o mini is big news for four reasons:
💰 It's significantly cheaper. To be precise, mini is 3x cheaper than the GPT-3.5-turbo it replaces and 30x cheaper than the larger GPT-4o. Taking a competitive lens, mini is also cheaper than its nearest competitors, Google's Gemini 1.5 Flash and Anthropic's Claude 3 Haiku. For price-sensitive uses (and most are) this is a big deal; a rough cost sketch follows this list.
🏎️ It's fast. Artificial Analysis, my go-to site for LLM assessments, places it at the top of the speed league table for commercial models, coming in at around 2x the speed of both GPT-4o and GPT-3.5-turbo. If absolute top speed is a requirement, though, Llama 3 70B on specialist hosting provider Groq is over twice as fast again. Speed matters because many uses are shifting towards making multiple calls to the model to fulfil a single user request; user wait time can become an issue in those scenarios, so a fast model becomes important.
🧠 It's good enough at reasoning. When cost is a factor and we don't need absolute top-tier reasoning ability, a "good enough" model is worth considering. In fact, mini comes out really well on the benchmarks: very similar to Gemini 1.5 Flash and better than GPT-3.5-turbo. With a bit of prompt tinkering, some GPT-4o applications could likely be adapted to work with it, banking some big cost savings.
🌱 It's greener. This is harder to quantify, given that OpenAI doesn't disclose any details of energy use. However, it's reasonable to assume that a smaller model uses less energy per request and therefore emits less CO2.
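As promised above, here's a back-of-the-envelope cost comparison in Python. The per-million-token prices are indicative figures I've assumed for the time of writing (check OpenAI's pricing page for current numbers), and the token counts are an arbitrary example workload.

```python
# Indicative USD prices per 1M tokens (input, output) -- assumed figures,
# check the provider's pricing page for current values.
PRICES = {
    "gpt-4o-mini":   (0.15, 0.60),
    "gpt-3.5-turbo": (0.50, 1.50),
    "gpt-4o":        (5.00, 15.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request in US dollars."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example workload: 2,000 input tokens and 500 output tokens per request,
# at one million requests per month.
for model in PRICES:
    monthly = cost_usd(model, 2_000, 500) * 1_000_000
    print(f"{model:>15}: ${monthly:>9,.2f} per month")
```

On those assumed prices, the arithmetic lands roughly where the headline claims do: mini works out around 3x cheaper than GPT-3.5-turbo and around 30x cheaper than GPT-4o for a typical mix of input and output tokens.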
What's interesting here is that in the commercial LLM world we're seeing two distinct sets of models: one cluster focussed on being cheap, fast and "good enough" (GPT-4o mini, Gemini 1.5 Flash, Claude 3 Haiku) and another focussed on pushing the boundaries of what's possible (GPT-4o, Gemini 1.5 Pro, Claude 3 Opus).
I spotted the following in the OpenAI announcement…
"the cost per token of GPT-4o mini has dropped by 99% since text-davinci-003, a less capable model introduced in 2022"
The dramatic reduction in the cost of language models has been, and continues to be, relentless. With every release, prices tumble. I cannot think of any other product that's seen a 99% reduction in cost over just two years.
🤔 So, what does this mean?
The bottom line is that anyone using GPT-3.5-turbo should immediately look into replacing it with GPT-4o mini, to bank some juicy cost savings and latency improvements. If you're using GPT-4o, it's worth assessing mini to see if it can do the job; if it can, you're looking at a 30x reduction in cost!
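For anyone on the OpenAI Python SDK, the switch is essentially a one-line change to the model name. Here's a sketch with a placeholder prompt of my own; as noted above, it's still worth re-running your evals and doing some prompt tinkering before rolling the change out.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

response = client.chat.completions.create(
    model="gpt-4o-mini",  # was: "gpt-3.5-turbo" -- the rest of the call is unchanged
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise this support ticket in one sentence: <ticket text>"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```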
Those using the equivalent class of models from OpenAI's competitors (Gemini 1.5 Flash, Claude 3 Haiku) should probably stick with what they're doing: there may be some small gains from GPT-4o mini, but the cost of switching and of testing/validating the impact may not be worth it.