Small AI Models: Tiny but Mighty
How Pocket-Sized Powerhouses Are Transforming AI and Making It Accessible to All
Greetings from the Factory Floor
I'm Duncan Anderson, and I've spent the past decade building a career in AI, working with everyone from the largest multinational tech firms to the smallest startups. Now I'm excited to share my experiences and insights with you. This newsletter is designed to equip you with the practical knowledge and skills you need to thrive in our new AI-powered world.
In this week's issue:
We shine the spotlight on small models, including Google's new Gemma2 and Kyutai's Moshi, and how you can play with them on your laptop.
The Prompt Report: the most comprehensive overview of AI prompting techniques we've yet seen.
Porter's Five Forces: a prompt to get you a corner office at McKinsey!
The Factory Whistle
A Focus on Small AI Models: The New Giants in Tech
The AI world is buzzing with excitement, and a new breed of AI models is emerging, proving that sometimes, smaller is mightier. We're talking about pocket-sized powerhouses like:
Meta's Llama3 8B
Mistral's Mistral 7B
And now, Google's latest star: Gemma2
Let's not forget that Apple Intelligence includes a similarly small model that runs locally on your iPhone; small models are no toys.
These "small" models (typically under 10B parameters) are revolutionising the field. Remember, big models like GPT-4 and Gemini are well over 100B in size, so some of these models are really tiny.
Not sure what "10B parameters" means? It's a measure of the size of a model: 10 billion parameters. Larger sizes need more memory and more compute.
Under 10B comfortably fits on a half-decent modern laptop, but as the size increases from there you start to exceed the memory available in consumer hardware. Bigger models are the domain of large servers, and as they get bigger still you quickly need specialist GPUs like an Nvidia A100 and costs start spiralling. The original large language models were built for very high-end cloud infrastructure, so it's a measure of the level of innovation in the field that we're now talking about running some of these on laptops and even phones.
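To make the memory point concrete, here's a back-of-the-envelope sketch. It uses a common rule of thumb (an assumption, not an exact figure): 16-bit weights take roughly 2 bytes per parameter, and 4-bit quantised weights roughly half a byte, with real-world usage somewhat higher once you add activations and runtime overhead.

```python
# Rough rule of thumb: memory needed just to hold a model's weights.
# 16-bit weights ~2 bytes per parameter; 4-bit quantised weights ~0.5 bytes.
# Real usage is higher: activations, KV cache and runtime overhead all add up.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate GB of memory needed to hold the weights alone."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, size in [("Gemma2 9B", 9), ("Gemma2 27B", 27), ("A 175B giant", 175)]:
    fp16 = weight_memory_gb(size, 2.0)
    q4 = weight_memory_gb(size, 0.5)
    print(f"{name}: ~{fp16:.0f} GB at 16-bit, ~{q4:.1f} GB quantised to 4-bit")
```

You can see why a quantised 9B model sits happily in a 16GB laptop while a 175B model never will.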
Why are small models changing things? Historically they've only been useful as research aides, with limited practical application due to their poor performance. But not any more: some of these models now get very respectable benchmark scores. Small models are also perfect for fine-tuning, a process that takes an off-the-shelf model and applies additional data to further refine its training and make it good at specific things. This is what Apple's done with Apple Intelligence: creating a small model that runs on your iPhone.
A good example of how these small models have improved is Google Gemma2's MMLU benchmark score: the 9B version scores 71.3 and the 27B version 75.2. OpenAI's GPT-3.5-TURBO scores 70.0, so even the 9B Gemma2 beats it. (Higher scores are better, in case you hadn't guessed.)
Now that performance has been addressed, the ability of small models to run smoothly on consumer hardware comes into focus. This brings AI power to your fingertips: no need for massive servers, scarce GPUs or sky-high compute costs. And let's not forget: less compute means lower energy use and less CO2 generated, so small models are better for the planet.
The newest small model, Gemma2 from Google, comes in three flavours:
Tiny (2.6B) - Coming soon, for the AI light snackers
Small (9B) - Perfect for your laptop
Medium (27B) - When you need a bit more oomph and have a server to run it on
Here's the kicker: Gemma2 isn't just small, it's mighty. The 27B version is out there crushing benchmarks, even outperforming its much bigger cousin, Llama3 70B, on some leaderboards. Less than half the size, all the power. Say goodbye to hefty compute costs and hello to efficient AI!
Gemma2 will even give OpenAI's GPT-3.5-TURBO a run for its money. Given that many AI apps still use GPT-3.5-TURBO because it's "good enough" and 10x cheaper than GPT-4, we can see that Gemma2 has some serious potential.
Want to try Gemma2 on your laptop?
It's really easy with Ollama, a system for running open models on consumer hardware.
Just hop along to Ollama, press download and install the app.
You'll need a terminal to run Ollama; find your terminal app (one is installed on every computer).
Type "ollama run gemma2" into your terminal. Hey presto, you're chatting with Gemma2!
Once you've got that working, why not also try "ollama run phi3" and "ollama run wizardlm2"?
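Once the Ollama app is running, it also exposes a local REST API on port 11434, so you can talk to Gemma2 from code rather than the terminal. A minimal sketch (the actual network call is commented out because it assumes Ollama is installed and running on your machine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama to return the whole reply as one JSON object
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send one prompt to a locally running Ollama model and return the reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(build_payload("gemma2", "Why is the sky blue?"))
# With the Ollama app running, you could then call:
# print(ask("gemma2", "Why is the sky blue?"))
```

Swap "gemma2" for "phi3" or "wizardlm2" to chat with the other models you've pulled.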
Perhaps you want Gemma2 running on a server?
That's also going to be easy: Google has already announced one-click support in Google Cloud's Vertex, and specialist AI hosting provider Fireworks already has support in place. No doubt AWS and Azure will follow shortly.
Is Gemma2 cheaper to run than the usual commercial models? Yes: Fireworks charges $0.20 per 1M tokens, whereas GPT-3.5-TURBO costs $0.50 per 1M tokens on the input side.
All the big cloud providers have options for hosting a variety of open models, be it Google's Vertex Model Garden, AWS's Bedrock, or Azure's Model Catalog. But this space is also populated by specialists like Together.AI, Groq, Beam, and Replicate. These folk tend to differentiate from the big boys by moving fast and innovating: Groq, for example, has become famous for using specialist hardware to deliver the fastest inferencing around. So, sure, it's easy to get an OpenAI key and use their API, but with a little research you could just as easily be running a small, open model instead.
Most people will want a model-as-a-service option, which means you only pay for the usage you make. If you have to provision a specific size of server, that server is dedicated to you and must be paid for regardless of how much use you make of it. Small models are greener, but not if you provision a server and then leave it idling, still eating energy, most of the time. Unless you can reliably calculate how much capacity you need, dedicated servers almost always mean over-provisioning, and therefore servers consuming energy while doing very little. You'll use much less energy/CO2 (and money) if you opt for a model-as-a-service option, where the compute infrastructure is sized by specialists and shared with others. So when looking at hosting options, be careful to check which require dedicated servers and which do not.
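The economics are easy to sketch. The numbers below are illustrative assumptions (a dedicated GPU server at a hypothetical $2/hour, versus the pay-per-use $0.20 per million tokens quoted above); the point is the shape of the comparison, not the exact prices:

```python
# Illustrative prices only: a dedicated server bills every hour, busy or idle;
# model-as-a-service bills only the tokens you actually process.
DEDICATED_PER_HOUR = 2.00       # assumed server rental price, $/hour
SERVICE_PER_M_TOKENS = 0.20     # pay-per-use price, $/1M tokens

def monthly_cost_dedicated(hours: float = 24 * 30) -> float:
    # You pay for the server around the clock, whatever your utilisation.
    return DEDICATED_PER_HOUR * hours

def monthly_cost_service(tokens_millions: float) -> float:
    # You pay only for tokens processed.
    return SERVICE_PER_M_TOKENS * tokens_millions

for workload in (10, 100, 1000, 10000):  # millions of tokens per month
    dedicated = monthly_cost_dedicated()
    service = monthly_cost_service(workload)
    cheaper = "service" if service < dedicated else "dedicated"
    print(f"{workload:>6}M tokens/month: service ${service:,.0f} "
          f"vs dedicated ${dedicated:,.0f} -> {cheaper} wins")
```

At these assumed prices a dedicated box only pays for itself past several billion tokens a month; below that, shared infrastructure is cheaper and greener.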
Small models find their voice
Language models are rapidly becoming multi-modal, for example natively processing images and speech. Moshi from Paris-based Kyutai (shocker: not all AI comes from San Francisco!) is a small 7B multi-modal model that very effectively emulates GPT-4o's still unreleased speech ability. The demo is a bit rough around the edges, but it's easy to see how a bit of improvement will take it somewhere pretty awesome. Because it's a small model, Kyutai already have demos of it running locally on laptops, and with $300m of funding they have the bandwidth to keep improving it.
Function Calling: Small Models, Big Impact
Ever heard of "function calling"? It's one of the most powerful tricks up a language model's sleeve. You can ask a model "What's the square root of 1024?" and it gives you the exact API call to make your calculator do the maths. Or ask "What's the weather in London?" and it hands you the precise call to your weather API. That's the power of function calling: connecting language models to APIs and enterprise systems.
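The loop is simpler than it sounds. The model never runs anything itself: it returns the name of a function plus JSON arguments, and your code does the actual call. A minimal sketch, with the model's reply hard-coded for illustration:

```python
import json
import math

def sqrt_tool(x: float) -> float:
    """The 'calculator' the model can ask us to invoke."""
    return math.sqrt(x)

def weather_tool(city: str) -> str:
    """Stand-in for a real weather API call."""
    return f"(would call a real weather API for {city})"

# Registry mapping the function names we advertised to the model
TOOLS = {"sqrt": sqrt_tool, "get_weather": weather_tool}

# What a function-calling model might return for
# "What's the square root of 1024?" (hard-coded here for illustration)
model_reply = '{"name": "sqrt", "arguments": {"x": 1024}}'

call = json.loads(model_reply)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # 32.0
```

In a real app you'd send the tool result back to the model so it can phrase the final answer; the dispatch step above is the heart of it.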
OpenAI pioneered this nifty feature and their models are still top of the charts in function calling benchmarks. But here's the exciting part: a new contender, gorilla-openfunctions-v2, is nipping at GPT-4's heels. And surprise: it's a compact 7B model! Again, we find that small models are getting genuinely useful.
Faster, Smarter Responses with RAG & Small Model Magic
Have you heard about RAG (Retrieval Augmented Generation)? It's a super common way to make language models answer questions using your own data, instead of their training data.
Google's just published research showing a fascinating approach to RAG: using small models to generate multiple candidate answers, then using a larger model to judge, refine and create a final response from these candidates. This method sped things up by over 50% and boosted accuracy by 13%. Now, 13% might seem a relatively modest gain, but trust me when I say that in a world where most RAG improvements bring only single-digit gains, it's a big deal. It's also a brilliant example of small models playing the role of cogs in a bigger machine.
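The draft-then-judge pattern is easy to see in code. In this sketch the "models" are stub functions so the flow is runnable; in practice each stub would be a call to a small or large model respectively:

```python
# Sketch of the pattern: several cheap small-model calls each draft a candidate
# answer, then one expensive large-model call judges and refines the drafts.

def small_model_draft(question: str, seed: int) -> str:
    # Stand-in for a cheap small-model call; each seed yields a different draft.
    return f"draft {seed}: answer to '{question}'"

def large_model_judge(question: str, candidates: list[str]) -> str:
    # Stand-in for one large-model call that picks and refines the best draft.
    best = max(candidates, key=len)  # a real judge would score for quality
    return f"final answer, refined from {len(candidates)} drafts: {best}"

question = "What does our refund policy say about digital goods?"
drafts = [small_model_draft(question, seed) for seed in range(3)]
print(large_model_judge(question, drafts))
```

The speed-up comes from the small models running cheaply (and in parallel), leaving the big model just one refinement call to make.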
A word of caution…
As cool as these small models are, they have their limitations. They aren't competing with GPT-4 and they're only likely to be useful for certain tasks today. So it pays not to get too carried away: those big models still have a role. But for the right task, a small model will use a fraction of the cost, compute and energy footprint of its bigger brothers and sisters. Working out when a small model is appropriate needs experience and some ingenuity: time to put those thinking caps on!
Key Points
Small models are getting good and are starting to compete with mainstream commercial models like GPT-3.5-TURBO, but they have limitations and are best used for niche things right now.
You can easily run any of these models on your laptop with Ollama.
There's a wide variety of options for hosting open models in the cloud.
Small models have already started adopting innovative features like function calling and multi-modal abilities.
Enjoying it so far? Share with your friends!
R&D Department
A group from academia, OpenAI, Microsoft and others has just released groundbreaking research, "The Prompt Report", providing the most comprehensive analysis of prompt engineering to date. This isn't just for techies: everyone interacting with AI can benefit from their insights!
What's inside the report?
A massive taxonomy of 58 text-based prompting techniques, categorized for easy understanding and implementation.
Exploration of multilingual and multimodal prompting, going beyond English text to include images, audio, and more.
A deep dive into AI agents, where AI uses external tools like calculators and the internet to solve complex problems.
Crucial discussions on evaluating AI output, security risks, and ethical considerations, ensuring your AI usage is responsible and effective.
Check out The Prompt Report: https://arxiv.org/abs/2406.06608
Prompt of the Week
This week we've got a prompt that will make you sound like you've got an MBA and a corner office at McKinsey…
Porter's Five Forces is a strategic analysis framework that evaluates the competitive intensity and attractiveness of an industry. By analyzing the forces affecting a business, you can assess its competitive position, identify potential threats and opportunities, and develop strategies to enhance its profitability and market share.
This prompt taps into the power of Porter's Five Forces; just be sure to replace the variables {{ }} with the specifics of the situation you'd like analyzed.
Who knows? Use this prompt and you might find yourself fielding job offers from big consultancies faster than you can say "synergy"!
You are tasked with analysing a business situation using Porter's Five Forces framework. I will provide you with the details of the situation, including the industry, key players, and any relevant market conditions. Your job is to apply each of the five forces to the situation and provide a detailed analysis.
Here are the details of the situation:
<situation_details>
{{SITUATION_DETAILS}}
</situation_details>
Follow these steps to complete the analysis:
1. **Understand the Situation**: Carefully read the provided situation details to understand the industry, key players, and market conditions. This will help you apply the five forces accurately.
2. **Analyze Each Force**: Apply each of Porter's Five Forces to the situation. The five forces are:
- **Threat of New Entrants**: Assess the ease with which new competitors can enter the market. Consider factors such as barriers to entry, capital requirements, and regulatory constraints.
- **Bargaining Power of Suppliers**: Evaluate the power of suppliers to influence prices and terms. Consider the number of suppliers, the uniqueness of their products or services, and the cost of switching suppliers.
- **Bargaining Power of Buyers**: Assess the power of buyers to influence prices and terms. Consider the number of buyers, the importance of each buyer to the business, and the cost of switching to alternative products or services.
- **Threat of Substitute Products or Services**: Evaluate the availability of alternative products or services that can replace the current offerings. Consider the quality, price, and performance of substitutes.
- **Industry Rivalry**: Assess the intensity of competition among existing players in the market. Consider factors such as the number of competitors, market growth rate, and product differentiation.
3. **Provide Detailed Analysis**: For each of the five forces, provide a detailed analysis that includes specific examples and references to the situation details. Explain how each force impacts the overall competitiveness of the industry.
4. **Summarize Findings**: Summarize your findings in a comprehensive report. Highlight the key insights and implications for the business. Provide recommendations based on your analysis.
Provide your detailed analysis inside <analysis> tags. Make sure to cover all the key points from the steps outlined above. Here is an example of what the analysis should look like:
<example>
<analysis>
1. **Threat of New Entrants**:
- Barriers to entry are high due to significant capital requirements and strict regulatory constraints.
- Established players have strong brand loyalty, making it difficult for new entrants to gain market share.
2. **Bargaining Power of Suppliers**:
- There are few suppliers of critical components, giving them significant power to influence prices.
- The cost of switching suppliers is high due to specialized equipment and long-term contracts.
3. **Bargaining Power of Buyers**:
- Buyers are numerous and fragmented, reducing their individual bargaining power.
- The cost of switching to alternative products is low, giving buyers some leverage in negotiations.
4. **Threat of Substitute Products or Services**:
- There are several high-quality substitutes available at competitive prices.
- Substitutes offer similar performance and features, posing a significant threat to the current offerings.
5. **Industry Rivalry**:
- The industry is highly competitive with numerous players vying for market share.
- Product differentiation is minimal, leading to intense price competition and marketing battles.
**Summary**:
- The industry is characterized by high barriers to entry and significant supplier power.
- Buyer power is moderate, but the threat of substitutes is high.
- Intense rivalry among existing players further complicates the competitive landscape.
- Recommendations: Focus on strengthening brand loyalty, exploring alternative suppliers, and differentiating products to mitigate competitive pressures.
</analysis>
</example>
Remember, the goal is to provide a thorough and insightful analysis using Porter's Five Forces framework. Take your time to carefully evaluate each force and its impact on the situation.
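If you want to fill in the template from code rather than by hand, the substitution is a one-liner. A minimal sketch, with the template abbreviated here (paste in the full prompt text from above, and note the situation text is just an invented example):

```python
# Abbreviated stand-in for the full Porter's Five Forces prompt above.
PROMPT_TEMPLATE = """You are tasked with analysing a business situation using
Porter's Five Forces framework. ...
<situation_details>
{{SITUATION_DETAILS}}
</situation_details>
..."""

# A made-up situation for illustration.
situation = "A new budget airline entering the short-haul European market."

# Swap the {{SITUATION_DETAILS}} variable for the real details before
# sending the prompt to a model.
prompt = PROMPT_TEMPLATE.replace("{{SITUATION_DETAILS}}", situation)
print(prompt)
```

The resulting string is what you'd paste into ChatGPT or send via an API call.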
Until the next time…
I hope you enjoyed this first episode of The Prompt Factory. We're just getting started and have lots more content for you; feel free to reach out if you have constructive thoughts on what you'd like to see!