By @IBMResearch
Publication Date: 2025-11-12 16:50:00
In photography, there’s an adage: The best camera is the one you have on you. For the computing revolution that’s taken place over the last two decades, this has also been true. The laptops and phones in our hands are often far easier to reach than powerful cloud clusters, and they’re frequently capable enough to get the job done. That’s quickly proving true for AI, as well.
Large language models have progressed rapidly in capability over the last few years, though usually at the cost of ever-larger size. The largest frontier models require hundreds of gigabytes to store, and even more memory to run properly. Even high-powered, modern laptops struggle to run these massive LLMs locally, and calls to the cloud can take longer to return an answer than users are willing to wait.
But recent research suggests that for the large majority of queries the average user has in a given day, much smaller LLMs can handle the request adequately. It’s something IBM…