Is that the smell of content deals going up in smoke?

LLMs are reaching a data and compute scaling plateau, and better data might not be the answer.
Scale, we were once assured, was the answer for LLM systems. The publicity-hungry commercial prophet and/or charlatan of such thinking was Sam Altman of OpenAI. Indeed, it was Altman who wrote of a "Moore's Law for Everything" back in 2021.
The answer, way back then in the mists of time, was that more data and greater computational power would produce better results. Simply put, he said, the more data LLMs have to work with and the more power they have to process it, the better the results will be.
This week, that notion was brought into serious question by none other than OpenAI dissident and co-founder of Safe Superintelligence (SSI), Ilya Sutskever.
Speaking to Reuters, Sutskever said: "The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. Everyone is looking for the next thing.

"Scaling the right thing matters more now than ever."
Reuters added that, "Sutskever declined to share more details on how his team is addressing the issue, other than saying SSI is working on an alternative approach to scaling up pre-training."
Roughly, this "alternative approach" means looking at what is done with the data once the system has it, and not just adding more.
This all happened as a report in The Information suggested that OpenAI is struggling to make a significant leap forward with its forthcoming Orion model: "While Orion’s performance ended up exceeding that of prior models, the increase in quality was far smaller compared with the jump between GPT-3 and GPT-4, the last two flagship models the company released, according to some OpenAI employees who have used or tested Orion."
There is also a general consensus that most of the high quality data out there has been vacuumed up already, and, as a colleague here at Glide put it, "just adding everything The Economist has ever produced isn't the answer" to the question of diminishing returns.
The increasing use of synthetic data doesn't seem to be helping either, to the surprise of no one who has ever written anything original.
So not more data, and not more computational power. It seems that the solution everyone is heading for is giving the LLMs more time to process. Note, I will not use the word "think" for what are basically pattern recognition systems at heart.
By way of example, the Reuters story quoted Noam Brown, a researcher at OpenAI who has worked on its latest o1 model: "It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer."
Think about those words in terms of cost and you're looking at a vast difference in expenditure. And make no mistake, it is the economics of such technical advances that dictate the direction they take, and their eventual success or failure.
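To put a rough number on it, here is a back-of-envelope reading of Brown's figures. The cost model is a crude assumption on my part (that training cost scales roughly with model size multiplied by training duration), not anything OpenAI has published:

```python
# Back-of-envelope only, using the figures in Brown's quote.
# Assumption: training cost scales very roughly with
# model size x training duration.
model_scale_factor = 100_000      # "scaling up the model by 100,000x"
training_time_factor = 100_000    # "training it for 100,000 times longer"
implied_multiplier = model_scale_factor * training_time_factor
print(f"Implied training-cost multiplier: {implied_multiplier:,}")
# Implied training-cost multiplier: 10,000,000,000
```

A ten-billion-fold training bill on one side of the ledger, 20 extra seconds of inference per answer on the other. Even allowing for how crude that sum is, you can see which way the accountants will lean.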
At present, OpenAI's GPT-4 remains the perceived gold standard in publicly available systems. Plenty would argue with that assessment on scientific grounds, but for most people it's the only AI they know, and in this space that's what passes for gold.
Such is our collective and rather unreasonable expectation of the speed of technological advances in the field, and such are the promises from people like Altman reinforcing that expectation, that realising there may currently be a hard limit to them - at least along the path we were sold on - puts the brakes on the hype train somewhat.
And it raises the immediate question, of course, of what this means for mooted future income from content deals, if - as many hope - having quality content alone is the secret to a big cheque signed by Sam and his ilk.
The phrase currently in play is "test-time compute", along with the slightly more cryptic "inference compute". Essentially, this means spending extra processing at the point of answering: rather than returning the model's first output, the system keeps working on the problem to refine its answer. As in the poker example given by Brown above, the more time the system is given to refine an answer, the 'better' that answer will be.
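For the curious, here is a minimal sketch of the flavour of the idea: a simple best-of-N loop under a time budget. The generate and score functions are hypothetical stand-ins for a model call and an answer-quality check, not any real API, and production systems such as o1 are considerably more sophisticated than this:

```python
import time

def answer_with_budget(generate, score, prompt, budget_seconds=20.0):
    """Toy test-time compute: spend a fixed wall-clock budget sampling
    candidate answers and keep the best one, rather than returning
    the first thing the model produces."""
    best_answer, best_score = None, float("-inf")
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        candidate = generate(prompt)                # one sample from the model
        candidate_score = score(prompt, candidate)  # e.g. a verifier or reward model
        if candidate_score > best_score:
            best_answer, best_score = candidate, candidate_score
    return best_answer
```

The point of the sketch is the trade it makes: no bigger model, no more training data, just more seconds spent per answer.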
Twenty seconds. If only everyone had to wait 20 seconds before responding to stuff, we might live in a better world. Apart from the road accidents.