🚨 Emergency episode: OpenAI releases GPT-4o mini
How OpenAI's latest release attacks cost, speed and efficiency
TLDR: Inspired by "The Rest Is Politics" podcast's emergency episodes, this edition of The Prompt Factory covers the timely release of OpenAI's GPT-4o mini. This smaller model, fitting the theme of small LLMs, boasts multi-modal abilities, is 3x cheaper than GPT-3.5-turbo, 30x cheaper than GPT-4o, and 2x faster than both.
One of my favourite political podcasts, The Rest Is Politics, has pioneered the practice of releasing emergency episodes whenever significant events break. Taking that as inspiration, I find myself penning this episode of The Prompt Factory the morning after the release of OpenAI's GPT-4o mini.
The timing of GPT-4o mini's release is serendipitous because the previous edition of this newsletter focussed on small models and, as its name suggests, GPT-4o mini is another small model.
Like its bigger sibling, GPT-4o mini includes multi-modal abilities. Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future. That's obviously a big improvement over GPT-3.5-turbo, which it effectively replaces.
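To make that concrete, here's a minimal sketch of a text-plus-image request to GPT-4o mini using the OpenAI Python SDK. The prompt and image URL are placeholders of my own, and it assumes an OPENAI_API_KEY environment variable is set.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single chat completion request combining text and an image input
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```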
GPT-4o mini is big news for four reasons:
💰 It's significantly cheaper. To be precise, mini is 3x cheaper than the GPT-3.5-turbo it replaces and 30x cheaper than the larger GPT-4o. Taking a competitive lens, mini is also cheaper than its nearest competitors, Google's Gemini 1.5 Flash and Anthropic's Claude 3 Haiku. For price-sensitive uses (and most are) this is a big deal; a rough cost sketch follows this list.
🏎️ It's fast. Artificial Analysis, my go-to site for LLM assessments, places it at the top of the speed league table for commercial models, coming in at around 2x the speed of both GPT-4o and GPT-3.5-turbo. If absolute top speed is a requirement, though, Llama 3 70B on specialist hosting provider Groq is over twice as fast again. Speed matters because many uses are shifting towards making multiple calls to the model to fulfil a single user request; user wait time can become an issue in those scenarios, so a fast model becomes important.
🧠 It's good enough at reasoning. When cost is a factor and we don't need absolute top-tier reasoning ability, a "good enough" model is worth considering. In fact, mini comes out really well on the benchmarks: very similar to Gemini 1.5 Flash and better than GPT-3.5-turbo. With a bit of prompt tinkering, some GPT-4o applications could likely be adapted to work with it, banking some big cost savings.
🌱 It's greener. This is harder to quantify, given that OpenAI doesn't disclose any details of energy use. However, it's reasonable to assume that a smaller model uses less energy per request and therefore emits less CO2.
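As promised above, here's a back-of-the-envelope cost comparison in Python. The per-million-token prices are indicative figures I've assumed for the time of writing (check OpenAI's pricing page for current numbers), and the token counts are an arbitrary example workload.

```python
# Indicative USD prices per 1M tokens (input, output) -- assumed figures,
# check the provider's pricing page for current values.
PRICES = {
    "gpt-4o-mini":   (0.15, 0.60),
    "gpt-3.5-turbo": (0.50, 1.50),
    "gpt-4o":        (5.00, 15.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request in US dollars."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example workload: 2,000 input tokens and 500 output tokens per request,
# at one million requests per month.
for model in PRICES:
    monthly = cost_usd(model, 2_000, 500) * 1_000_000
    print(f"{model:>15}: ${monthly:>9,.2f} per month")
```

On those assumed prices, the arithmetic lands roughly where the headline claims do: mini works out around 3x cheaper than GPT-3.5-turbo and around 30x cheaper than GPT-4o for a typical mix of input and output tokens.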
What's interesting here is that in the commercial LLM world we're seeing two distinct sets of models: one cluster focussed on being cheap, fast and "good enough" (GPT-4o mini, Gemini 1.5 Flash, Claude 3 Haiku) and another focussed on pushing the boundaries of what's possible (GPT-4o, Gemini 1.5 Pro, Claude 3 Opus).
I spotted the following in the OpenAI announcement…
"the cost per token of GPT-4o mini has dropped by 99% since text-davinci-003, a less capable model introduced in 2022"
The dramatic reduction in the cost of language models has been, and continues to be, relentless. With every release, prices tumble. I cannot think of any other product that's seen a 99% reduction in cost over just two years.
🤔 So, what does this mean?
The bottom line is that anyone using GPT-3.5-turbo should immediately look into replacing it with GPT-4o mini, to bank some juicy cost savings and latency improvements. If you're using GPT-4o, it's worth assessing mini to see if it can do the job; if it can, you're looking at a 30x reduction in cost!
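For anyone on the OpenAI Python SDK, the switch is essentially a one-line change to the model name. Here's a sketch with a placeholder prompt of my own; as noted above, it's still worth re-running your evals and doing some prompt tinkering before rolling the change out.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

response = client.chat.completions.create(
    model="gpt-4o-mini",  # was: "gpt-3.5-turbo" -- the rest of the call is unchanged
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise this support ticket in one sentence: <ticket text>"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```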
Those using the equivalent class of models from OpenAI's competitors (Gemini 1.5 Flash, Claude 3 Haiku) should probably stick with what they're doing: there may be some small gains from GPT-4o mini, but the cost of switching and of testing/validating the impact may not be worth it.