“[𝑞𝑢𝑎𝑙𝑖𝑡𝑦 𝑑𝑎𝑡𝑎] 𝑖𝑠 𝑡ℎ𝑒 𝑏𝑖𝑔𝑔𝑒𝑠𝑡 𝑖𝑛ℎ𝑖𝑏𝑖𝑡𝑜𝑟 𝑓𝑜𝑟 𝑐𝑜𝑚𝑝𝑎𝑛𝑖𝑒𝑠 𝑡ℎ𝑎𝑡 ℎ𝑎𝑣𝑒 𝑎𝑙𝑟𝑒𝑎𝑑𝑦 𝑖𝑛𝑣𝑒𝑠𝑡𝑒𝑑 𝑛𝑜𝑤 𝑖𝑛 𝐿𝐿𝑀𝑠, 𝑎𝑟𝑐ℎ𝑖𝑡𝑒𝑐𝑡𝑢𝑟𝑒 𝑎𝑛𝑑 𝑝𝑒𝑜𝑝𝑙𝑒.” This was the quote that stood out the most in CB Insights’ “Enterprise AI Report,” released last week.
A few interesting insights and takeaways:
🚀 𝟏. 𝐓𝐡𝐞 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞 𝐠𝐚𝐩 𝐛𝐞𝐭𝐰𝐞𝐞𝐧 𝐨𝐩𝐞𝐧 𝐬𝐨𝐮𝐫𝐜𝐞 𝐚𝐧𝐝 𝐜𝐥𝐨𝐬𝐞𝐝 𝐬𝐨𝐮𝐫𝐜𝐞 𝐢𝐬 𝐜𝐥𝐨𝐬𝐢𝐧𝐠 𝐟𝐚𝐬𝐭:
Meta’s open-source Llama-3-70B recently outperformed Anthropic’s Claude-3-Sonnet on the MMLU benchmark (although Claude-3.5-Sonnet has since reclaimed the lead over the Llama models).
As business leaders grapple with financial constraints, they will have to find the sweet spot between performance, cost, and flexibility while considering the ROI of open source models.
⭐️ 𝟐. 𝐁𝐢𝐠𝐠𝐞𝐫 𝐢𝐬𝐧’𝐭 𝐚𝐥𝐰𝐚𝐲𝐬 𝐛𝐞𝐭𝐭𝐞𝐫:
Smaller language models (SLMs) built for specific use cases are often faster and cheaper, and can also outperform much larger LLMs on those tasks.
For example, Microsoft’s Phi-3 (7B parameters) outperformed GPT-3.5 (reportedly ~20B parameters) as measured by MMLU.
And of course, Refuel-LLM-2, our purpose-built model, outperforms GPT-4-Turbo on data labeling, cleaning and enrichment benchmarks.
Domain-specific models are an opportunity enterprise buyers shouldn’t shy away from; they’re worth exploring for task-specific applications.
📈 𝟑. 𝐏𝐫𝐨𝐩𝐫𝐢𝐞𝐭𝐚𝐫𝐲 𝐚𝐧𝐝 𝐜𝐥𝐞𝐚𝐧 𝐝𝐚𝐭𝐚 𝐚𝐫𝐞 𝐞𝐯𝐞𝐫𝐲𝐭𝐡𝐢𝐧𝐠:
Clean data minimizes errors that compound downstream in AI systems, and proprietary data drives differentiated business outcomes.
As the opening quote aptly alludes to, curating quality data and developing the supporting infrastructure will become the lifeblood of product development and the determinant of success in the era of Gen AI.
We’re lucky to see this in action every day with our customers and partners — a good data strategy, the curiosity and bravery to try task-specific models, and a focus on ROI — 𝐭𝐡𝐞𝐬𝐞 𝐚𝐫𝐞 𝐭𝐡𝐞 𝐢𝐧𝐠𝐫𝐞𝐝𝐢𝐞𝐧𝐭𝐬 𝐭𝐨 𝐬𝐮𝐜𝐜𝐞𝐬𝐬 𝐰𝐢𝐭𝐡 𝐀𝐈 𝐭𝐨𝐝𝐚𝐲.
Which takeaway stood out to you the most?