Scaling Law Skepticism
Peter Lee, Head of Microsoft Research, once quipped about Moore’s law: “There’s a law about Moore’s law…the number of people predicting the death of Moore’s law doubles every two years.”
As a reminder, Moore’s law is not a law of physics or a mathematical theorem. It is just an observation that ended up being incredibly prescient. I like how Mark Liu, the former Chairman of TSMC, characterized Moore’s Law as essentially a “shared optimism”. As Chris Miller wrote in “Chip War”: “The making of Moore’s Law is as much a story of manufacturing experts, supply chain specialists, and marketing managers as it is about physicists or electrical engineers.”
Given the qualitative elements of Moore’s law, in retrospect it doesn’t surprise me that it had so many skeptics along the way. Of course, Moore’s law is so yesteryear; there’s a new law in town: scaling laws! Throw more compute, more training data, and more parameters at a model, and you get a better AI model. Unlike Moore’s law, I have hardly come across any ardent skeptics of scaling laws. I’m sure they are out there, but they’re probably too few and far between for me to notice.
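For the mathematically inclined, here is a rough sketch of what a scaling law actually looks like. One widely cited formulation, from DeepMind’s 2022 “Chinchilla” paper (Hoffmann et al.), models a model’s loss $L$ as a function of its parameter count $N$ and the number of training tokens $D$:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

Here $E$ is an irreducible loss floor, and $A$, $B$, $\alpha$, $\beta$ are empirically fitted constants. The promise of scaling laws is that adding parameters or data drives the loss down in this smooth, predictable way.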
So, I was particularly curious to read Sara Hooker’s recent essay/paper yesterday: “On the slow death of scaling”. Hooker is the former Head of Cohere Labs and a former Research Scientist at Google DeepMind. After reading her essay, I went through a couple of her prior papers, and I particularly enjoyed “The Hardware Lottery” paper from 2021, which argued that AI progress is often determined by what runs best on available hardware rather than by what is scientifically the best idea; if you’re visually inclined, you can watch her discuss the paper here in four minutes.
I will discuss some key excerpts from her most recent paper on scaling laws behind the paywall.
In addition to “Daily Dose” posts (yes, DAILY) like this one, MBI Deep Dives publishes one Deep Dive on a publicly listed company every month. You can find all 65 Deep Dives here.