Small AI Models: Tiny but Mighty
How Pocket-Sized Powerhouses Are Transforming AI and Making It Accessible to All
Greetings from the Factory Floor
I'm Duncan Anderson, and I've spent the past decade building a career in AI, working with everyone from the largest multinational tech firms to the smallest startups. Now I'm excited to share my experiences and insights with you. This newsletter is designed to equip you with the practical knowledge and skills you need to thrive in our new AI-powered world.
In this week's issue:
We shine the spotlight on small models, including Google's new Gemma2 and Kyutai's Moshi, and how you can play with them on your laptop.
The Prompt Report: the most comprehensive overview of AI prompting techniques we've yet seen.
Porter's Five Forces: a prompt to get you a corner office at McKinsey!
The Factory Whistle
A Focus on Small AI Models: The New Giants in Tech
The AI world is buzzing with excitement, and a new breed of AI models is emerging, proving that sometimes, smaller is mightier. We're talking about pocket-sized powerhouses like:
Meta's Llama3 8B
Mistral's Mistral 7B
And now, Google's latest star: Gemma2
Let's not forget that Apple Intelligence includes a similarly small model that runs locally on your iPhone; small models are no toys.
These "small" models (typically under 10B parameters) are revolutionising the field. Remember, big models like GPT-4 and Gemini are well over 100B in size, so some of these models are really tiny.
Not sure what "10B parameters" means? It's a measure of the size of a model: 10 billion parameters. Larger sizes need more memory and more compute.
Under 10B comfortably fits on a half-decent modern laptop, but as the size increases from there you start to exceed the memory available in consumer hardware. Bigger models are the domain of large servers, and as they get bigger still you quickly need specialist GPUs like an Nvidia A100 and costs start spiralling. The original large language models were built for very high-end cloud infrastructure, so it's a measure of the level of innovation in the field that we're now talking about running some of these on laptops and even phones.
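To make the memory point concrete, here's a back-of-the-envelope sketch. It uses a common rule of thumb (an assumption, not an exact figure): 16-bit weights take roughly 2 bytes per parameter, and 4-bit quantised weights roughly half a byte, with real-world usage somewhat higher once you add activations and runtime overhead.

```python
# Rough rule of thumb: memory needed just to hold a model's weights.
# 16-bit weights ~2 bytes per parameter; 4-bit quantised weights ~0.5 bytes.
# Real usage is higher: activations, KV cache and runtime overhead all add up.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate GB of memory needed to hold the weights alone."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, size in [("Gemma2 9B", 9), ("Gemma2 27B", 27), ("A 175B giant", 175)]:
    fp16 = weight_memory_gb(size, 2.0)
    q4 = weight_memory_gb(size, 0.5)
    print(f"{name}: ~{fp16:.0f} GB at 16-bit, ~{q4:.1f} GB quantised to 4-bit")
```

You can see why a quantised 9B model sits happily in a 16GB laptop while a 175B model never will.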
Why are small models changing things? Historically they've only been useful as research aides, with limited practical application due to their poor performance. But not any more: some of these models now get very respectable benchmark scores. Small models are also perfect for fine-tuning, a process that takes an off-the-shelf model and applies additional data to further refine its training and make it good at specific things. This is what Apple's done with Apple Intelligence: creating a small model that runs on your iPhone.
A good example of how these small models have improved is Google Gemma2's MMLU benchmark score: the 9B version scores 71.3 and the 27B version 75.2. OpenAI's GPT-3.5-TURBO scores 70.0, so even the 9B Gemma2 beats it. (Higher scores are better, in case you hadn't guessed.)
Now that performance has been addressed, the ability of small models to run smoothly on consumer hardware comes into focus. This brings AI power to your fingertips: no need for massive servers, scarce GPUs or sky-high compute costs. And let's not forget: less compute means lower energy use and less CO2 generated, so small models are better for the planet.
The newest small model, Gemma2 from Google, comes in three flavours:
Tiny (2.6B) - Coming soon, for the AI light snackers
Small (9B) - Perfect for your laptop
Medium (27B) - When you need a bit more oomph and have a server to run it on
Here's the kicker: Gemma2 isn't just small, it's mighty. The 27B version is out there crushing benchmarks, even outperforming its much bigger cousin, Llama3 70B, on some leaderboards. Less than half the size, all the power. Say goodbye to hefty compute costs and hello to efficient AI!
Gemma2 will even give OpenAI's GPT-3.5-TURBO a run for its money. Given that many AI apps still use GPT-3.5-TURBO because it's "good enough" and 10x cheaper than GPT-4, we can see that Gemma2 has some serious potential.
Want to try Gemma2 on your laptop?
It's really easy with Ollama, a system for running open models on consumer hardware.
Just hop along to Ollama, press download and install the app.
You'll need a terminal to run Ollama; find your terminal app (one is installed on every computer).
Type "ollama run gemma2" into your terminal. Hey presto, you're chatting with Gemma2!
Once you've got that working, why not also try "ollama run phi3" and "ollama run wizardlm2"?
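Once the Ollama app is running, it also exposes a local REST API on port 11434, so you can talk to Gemma2 from code rather than the terminal. A minimal sketch (the actual network call is commented out because it assumes Ollama is installed and running on your machine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama to return the whole reply as one JSON object
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send one prompt to a locally running Ollama model and return the reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(build_payload("gemma2", "Why is the sky blue?"))
# With the Ollama app running, you could then call:
# print(ask("gemma2", "Why is the sky blue?"))
```

Swap "gemma2" for "phi3" or "wizardlm2" to chat with the other models you've pulled.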
Perhaps you want Gemma2 running on a server?
That's also going to be easy: Google has already announced one-click support in Google Cloud's Vertex, and specialist AI hosting provider Fireworks already has support in place. No doubt AWS and Azure will follow shortly.
Is Gemma2 cheaper to run than the usual commercial models? Yes: Fireworks charges $0.20 per 1M tokens, whereas GPT-3.5-TURBO costs $0.50 per 1M tokens on the input side.
All the big cloud providers have options for hosting a variety of open models, be it Google's Vertex Model Garden, AWS's Bedrock, or Azure's Model Catalog. But this space is also populated by specialists like Together.AI, Groq, Beam, and Replicate. These folk tend to differentiate from the big boys by moving fast and innovating: Groq, for example, has become famous for using specialist hardware to deliver the fastest inferencing around. So, sure, it's easy to get an OpenAI key and use their API, but with a little research you could just as easily be running a small, open model instead.
Most people will want a model-as-a-service option, which means you only pay for the usage you make. If you have to provision a specific size of server, that server is dedicated to you and must be paid for regardless of how much use you make of it. Small models are greener, but not if you provision a server and then leave it idling, still eating energy, most of the time. Unless you can reliably calculate how much capacity you need, dedicated servers almost always mean over-provisioning, and therefore servers consuming energy while doing very little. You'll use much less energy/CO2 (and money) if you opt for a model-as-a-service option, where the compute infrastructure is sized by specialists and shared with others. So when looking at hosting options, be careful to check which require dedicated servers and which do not.
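The economics are easy to sketch. The numbers below are illustrative assumptions (a dedicated GPU server at a hypothetical $2/hour, versus the pay-per-use $0.20 per million tokens quoted above); the point is the shape of the comparison, not the exact prices:

```python
# Illustrative prices only: a dedicated server bills every hour, busy or idle;
# model-as-a-service bills only the tokens you actually process.
DEDICATED_PER_HOUR = 2.00       # assumed server rental price, $/hour
SERVICE_PER_M_TOKENS = 0.20     # pay-per-use price, $/1M tokens

def monthly_cost_dedicated(hours: float = 24 * 30) -> float:
    # You pay for the server around the clock, whatever your utilisation.
    return DEDICATED_PER_HOUR * hours

def monthly_cost_service(tokens_millions: float) -> float:
    # You pay only for tokens processed.
    return SERVICE_PER_M_TOKENS * tokens_millions

for workload in (10, 100, 1000, 10000):  # millions of tokens per month
    dedicated = monthly_cost_dedicated()
    service = monthly_cost_service(workload)
    cheaper = "service" if service < dedicated else "dedicated"
    print(f"{workload:>6}M tokens/month: service ${service:,.0f} "
          f"vs dedicated ${dedicated:,.0f} -> {cheaper} wins")
```

At these assumed prices a dedicated box only pays for itself past several billion tokens a month; below that, shared infrastructure is cheaper and greener.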
Small models find their voice
Language models are rapidly becoming multi-modal, for example natively processing images and speech. Moshi from Paris-based Kyutai (shocker: not all AI comes from San Francisco!) is a small 7B multi-modal model that very effectively emulates GPT-4o's still unreleased speech ability. The demo is a bit rough around the edges, but it's easy to see how a bit of improvement will take it somewhere pretty awesome. Because it's a small model, Kyutai already have demos of it running locally on laptops, and with $300m of funding they have the bandwidth to keep improving it.
Function Calling: Small Models, Big Impact
Ever heard of "function calling"? It's one of the most powerful tricks up a language model's sleeve. You can ask a model "What's the square root of 1024?" and it gives you the exact API call to make your calculator do the maths. Or ask "What's the weather in London?" and it hands you the precise call to your weather API. That's the power of function calling: connecting language models to APIs and enterprise systems.
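The loop is simpler than it sounds. The model never runs anything itself: it returns the name of a function plus JSON arguments, and your code does the actual call. A minimal sketch, with the model's reply hard-coded for illustration:

```python
import json
import math

def sqrt_tool(x: float) -> float:
    """The 'calculator' the model can ask us to invoke."""
    return math.sqrt(x)

def weather_tool(city: str) -> str:
    """Stand-in for a real weather API call."""
    return f"(would call a real weather API for {city})"

# Registry mapping the function names we advertised to the model
TOOLS = {"sqrt": sqrt_tool, "get_weather": weather_tool}

# What a function-calling model might return for
# "What's the square root of 1024?" (hard-coded here for illustration)
model_reply = '{"name": "sqrt", "arguments": {"x": 1024}}'

call = json.loads(model_reply)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # 32.0
```

In a real app you'd send the tool result back to the model so it can phrase the final answer; the dispatch step above is the heart of it.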
OpenAI pioneered this nifty feature and their models are still top of the charts in function calling benchmarks. But here's the exciting part: a new contender, gorilla-openfunctions-v2, is nipping at GPT-4's heels. And surprise: it's a compact 7B model! Again, we find that small models are getting genuinely useful.
Faster, Smarter Responses with RAG & Small Model Magic
Have you heard about RAG (Retrieval Augmented Generation)? It's a super common way to make language models answer questions using your own data, instead of their training data.
Google's just published research showing a fascinating approach to RAG: using small models to generate multiple candidate answers, then using a larger model to judge, refine and create a final response from these candidates. This method sped things up by over 50% and boosted accuracy by 13%. Now, 13% might seem a relatively modest gain, but trust me when I say that in a world where most RAG improvements bring only single-digit gains, it's a big deal. It's also a brilliant example of small models playing the role of cogs in a bigger machine.
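The draft-then-judge pattern is easy to see in code. In this sketch the "models" are stub functions so the flow is runnable; in practice each stub would be a call to a small or large model respectively:

```python
# Sketch of the pattern: several cheap small-model calls each draft a candidate
# answer, then one expensive large-model call judges and refines the drafts.

def small_model_draft(question: str, seed: int) -> str:
    # Stand-in for a cheap small-model call; each seed yields a different draft.
    return f"draft {seed}: answer to '{question}'"

def large_model_judge(question: str, candidates: list[str]) -> str:
    # Stand-in for one large-model call that picks and refines the best draft.
    best = max(candidates, key=len)  # a real judge would score for quality
    return f"final answer, refined from {len(candidates)} drafts: {best}"

question = "What does our refund policy say about digital goods?"
drafts = [small_model_draft(question, seed) for seed in range(3)]
print(large_model_judge(question, drafts))
```

The speed-up comes from the small models running cheaply (and in parallel), leaving the big model just one refinement call to make.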
A word of caution…
As cool as these small models are, they have their limitations. They aren't competing with GPT-4 and they're only likely to be useful for certain tasks today. So it pays not to get too carried away: those big models still have a role. But for the right task, a small model will use a fraction of the cost, compute and energy footprint of its bigger brothers and sisters. Working out when a small model is appropriate needs experience and some ingenuity: time to put those thinking caps on!
Key Points
Small models are getting good and are starting to compete with mainstream commercial models like GPT-3.5-TURBO, but they have limitations and are best used for niche things right now.
You can easily run any of these models on your laptop with Ollama.
There's a wide variety of options for hosting open models in the cloud.
Small models have already started adopting innovative features like function calling and multi-modal abilities.
Enjoying it so far? Share with your friends!
R&D Department
A group from academia, OpenAI, Microsoft and others has just released groundbreaking research, "The Prompt Report", providing the most comprehensive analysis of prompt engineering to date. This isn't just for techies: everyone interacting with AI can benefit from their insights!
What's inside the report?
A massive taxonomy of 58 text-based prompting techniques, categorized for easy understanding and implementation.
Exploration of multilingual and multimodal prompting, going beyond English text to include images, audio, and more.
A deep dive into AI agents, where AI uses external tools like calculators and the internet to solve complex problems.
Crucial discussions on evaluating AI output, security risks, and ethical considerations, ensuring your AI usage is responsible and effective.
Check out The Prompt Report: https://arxiv.org/abs/2406.06608
Prompt of the Week
This week we've got a prompt that will make you sound like you've got an MBA and a corner office at McKinsey…
Porter's Five Forces is a strategic analysis framework that evaluates the competitive intensity and attractiveness of an industry. By analyzing the forces affecting a business, you can assess its competitive position, identify potential threats and opportunities, and develop strategies to enhance its profitability and market share.
This prompt taps into the power of Porter's Five Forces; just be sure to replace the variables {{ }} with the specifics of the situation you'd like analyzed.
Who knows? Use this prompt and you might find yourself fielding job offers from big consultancies faster than you can say "synergy"!
You are tasked with analysing a business situation using Porter's Five Forces framework. I will provide you with the details of the situation, including the industry, key players, and any relevant market conditions. Your job is to apply each of the five forces to the situation and provide a detailed analysis.
Here are the details of the situation:
<situation_details>
{{SITUATION_DETAILS}}
</situation_details>
Follow these steps to complete the analysis:
1. **Understand the Situation**: Carefully read the provided situation details to understand the industry, key players, and market conditions. This will help you apply the five forces accurately.
2. **Analyze Each Force**: Apply each of Porter's Five Forces to the situation. The five forces are:
- **Threat of New Entrants**: Assess the ease with which new competitors can enter the market. Consider factors such as barriers to entry, capital requirements, and regulatory constraints.
- **Bargaining Power of Suppliers**: Evaluate the power of suppliers to influence prices and terms. Consider the number of suppliers, the uniqueness of their products or services, and the cost of switching suppliers.
- **Bargaining Power of Buyers**: Assess the power of buyers to influence prices and terms. Consider the number of buyers, the importance of each buyer to the business, and the cost of switching to alternative products or services.
- **Threat of Substitute Products or Services**: Evaluate the availability of alternative products or services that can replace the current offerings. Consider the quality, price, and performance of substitutes.
- **Industry Rivalry**: Assess the intensity of competition among existing players in the market. Consider factors such as the number of competitors, market growth rate, and product differentiation.
3. **Provide Detailed Analysis**: For each of the five forces, provide a detailed analysis that includes specific examples and references to the situation details. Explain how each force impacts the overall competitiveness of the industry.
4. **Summarize Findings**: Summarize your findings in a comprehensive report. Highlight the key insights and implications for the business. Provide recommendations based on your analysis.
Provide your detailed analysis inside <analysis> tags. Make sure to cover all the key points from the steps outlined above. Here is an example of what the analysis should look like:
<example>
<analysis>
1. **Threat of New Entrants**:
- Barriers to entry are high due to significant capital requirements and strict regulatory constraints.
- Established players have strong brand loyalty, making it difficult for new entrants to gain market share.
2. **Bargaining Power of Suppliers**:
- There are few suppliers of critical components, giving them significant power to influence prices.
- The cost of switching suppliers is high due to specialized equipment and long-term contracts.
3. **Bargaining Power of Buyers**:
- Buyers are numerous and fragmented, reducing their individual bargaining power.
- The cost of switching to alternative products is low, giving buyers some leverage in negotiations.
4. **Threat of Substitute Products or Services**:
- There are several high-quality substitutes available at competitive prices.
- Substitutes offer similar performance and features, posing a significant threat to the current offerings.
5. **Industry Rivalry**:
- The industry is highly competitive with numerous players vying for market share.
- Product differentiation is minimal, leading to intense price competition and marketing battles.
**Summary**:
- The industry is characterized by high barriers to entry and significant supplier power.
- Buyer power is moderate, but the threat of substitutes is high.
- Intense rivalry among existing players further complicates the competitive landscape.
- Recommendations: Focus on strengthening brand loyalty, exploring alternative suppliers, and differentiating products to mitigate competitive pressures.
</analysis>
</example>
Remember, the goal is to provide a thorough and insightful analysis using Porter's Five Forces framework. Take your time to carefully evaluate each force and its impact on the situation.
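If you want to fill in the template from code rather than by hand, the substitution is a one-liner. A minimal sketch, with the template abbreviated here (paste in the full prompt text from above, and note the situation text is just an invented example):

```python
# Abbreviated stand-in for the full Porter's Five Forces prompt above.
PROMPT_TEMPLATE = """You are tasked with analysing a business situation using
Porter's Five Forces framework. ...
<situation_details>
{{SITUATION_DETAILS}}
</situation_details>
..."""

# A made-up situation for illustration.
situation = "A new budget airline entering the short-haul European market."

# Swap the {{SITUATION_DETAILS}} variable for the real details before
# sending the prompt to a model.
prompt = PROMPT_TEMPLATE.replace("{{SITUATION_DETAILS}}", situation)
print(prompt)
```

The resulting string is what you'd paste into ChatGPT or send via an API call.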
Until the next time…
I hope you enjoyed this first episode of The Prompt Factory. We're just getting started and have lots more content for you; feel free to reach out if you have constructive thoughts on what you'd like to see!