The Depreciation Battleground
I was a freshman in college when I first watched the Khan Academy video “Is short selling bad?” To this day, it remains a key foundation of my belief that short sellers are a net positive force in capital markets.
Despite that belief, I have never shorted an individual stock myself (though I did buy index puts at times). John Hempton wrote pretty persuasively about the pitfalls of shorting individual stocks:
“In shorting frauds, this is the sort of disaster that sometimes befalls you:
a) You short a stock at $10 run by a promoter who you suspect is a liar. You are (as nearly as possible) certain that this stock is worthless. You hope to cover at $1.
b) The promoter makes up a story that somehow retail seems to think is real and the stock trades at $40.
c) You are forced to buy some back – because there is no conceptual reason why the stock can’t trade at $80. After all it is no sillier at $40 or $80 than it was at $10 (it was worth 100 percent less at all times).
d) After you cover the stock normally goes to $1 (as you expected all along) though it might go through $100 on the way.
This is actually a fairly common event for us. We manage well over 200 shorts with the specific goal of blunting the impact of any such individual disaster. And the diversity normally works.”
As an individual investor managing my own money, it just didn’t seem feasible to short stocks well, especially after considering the return on brain damage and the potential for a random stock to turn into a meme for obscure reasons. I’m sure some people can do this well even within such constraints, but I decided long ago that I am not going to be one of them.
One person who essentially achieved celebrity status by correctly shorting ahead of the GFC was Michael Burry. Burry recently launched a Substack: “Cassandra Unchained”. Burry is an incredibly gifted writer, and I have enjoyed reading every single piece published so far.
However, it was during college that I learned firsthand not to confuse good rhetoric with accurate logic. During my junior year, in 2012, I went to Manila to participate as an adjudicator in the World Universities Debating Championship. It was quite the teachable moment for me, as I got to observe some of the world’s best debaters closely. In case you don’t know how these competitions work: the debaters are given a motion, or topic, just ~15 minutes before the debate starts. There are four teams (two in favor of the motion, and two opposing it), but they don’t get to pick a side; they are assigned randomly to speak for or against the motion. After each debater speaks for seven minutes, the debaters leave the room and the adjudicators discuss among themselves how to rank the four teams. The debaters then come back, and the adjudicators explain the results to them. To this day, this remains one of the most stressful experiences I have had, because I was often explaining my decision to a group of people whose rhetorical ability was almost always superior to my own.
Nonetheless, the whole experience showed me there are gifted people out there who can argue for or against a topic at a moment’s notice, even when they have to take the opposite position of what they actually believe. But what I took away most from that experience is that I should never confuse good rhetoric with accuracy. In fact, I often wonder whether being able to write or speak too well comes with a profound, hidden risk: you can always come up with compelling reasons why you are not wrong! Perhaps this is Mother Nature’s way of eliminating, or at least negating, much of the advantage of such a gift.
Is Burry inaccurate in some of the concerns he laid out about big tech’s depreciation schedules? Before I dig into that, I should start by saying that Burry’s Substack is a bit of a breath of fresh air. Not only are there too many bullish think pieces on AI these days, but the few bearish voices often fall so short of making their cases that they end up losing their credibility altogether.
Burry, on the other hand, is a very good student of the market. You can tell he has been at this for much longer than most of us. I have only read about the tech bubble (and crash), but Burry lived (and invested) through it. In one of his pieces, these two sentences stood out to me:
I can tell you firsthand how the market peak appeared on March 10, 2000. That is, it happened for no apparent reason…As 2000 progressed, parts shortages and capacity constraints were the rule.
He then cited a few quotes from Cisco’s earnings calls and press releases in 2000:
“We see no indications in the marketplace that the radical Internet business transformation… is slowing — in fact, we believe it is accelerating globally.”
Cisco CEO: Q4 earnings release, August 2000
“Cisco is fortunate to be at the center of an economic revolution that is reshaping not only the economy, but all facets of the society.”
Cisco CEO: press release September 24, 2000
“We haven’t seen any sign of a slowdown. We have guided the Street accurately, and we can execute to plan.”
Cisco Chief Strategy Officer: Nov 3, 2000
Reading these made me realize that hyperscalers will likely continue to say “demand outstrips supply” in their earnings calls for a quarter or two after the peak! In fact, when I wondered how big tech communicated during the 2022 slowdown, I realized they weren’t quite proactive in signaling it to investors. In retrospect, Meta was noticeably different from the pack then. As a founder-led company with voting control, Meta likely feels more empowered to communicate with investors in a more authentic manner. During its 4Q’21 call, Meta explicitly quantified the headwinds from ATT:
we believe the impact of iOS overall as a headwind on our business in 2022 is on the order of $10 billion, so it’s a pretty significant headwind for our business.
Perhaps how Meta navigates this AI capex bonanza, especially in the face of potential investor skepticism, can be an important signal of where we are in the cycle.
Going back to Burry’s arguments on big tech’s depreciation schedule. Regular readers are likely aware that I myself was deeply concerned about this in early 2025. However, I later changed my mind and explained why I don’t worry (as much) about big tech’s depreciation schedule anymore.
The distinction between AI workloads is key: training frontier models demands the newest chips, while inference (running the models) is less demanding. I have mentioned the “value cascade” model before, which works as follows: new chips (e.g. Blackwell) handle frontier training, displaced chips (H100) move to high-end inference or fine-tuning, and even older chips (A100) move to bulk inference or other accelerated computing tasks. I cited some historical precedents showing that, if anything, big tech’s depreciation schedules have been overly conservative in the past.
Burry’s argument is that the historical precedents are obsolete because the pace of innovation has fundamentally changed. Nvidia has moved from an 18–24 month cycle to a 1-year cycle. More importantly, the generational leaps are not just about power, but about efficiency (Total Cost of Ownership or TCO). If a new chip is vastly more efficient, the economic justification for running older hardware may evaporate.
To substantiate his case, Burry cited none other than Satya Nadella, who said the following on the Dwarkesh podcast:
“The other thing is that I didn’t want to get stuck with massive scale of one generation. We just saw the GB200s, the GB300s are coming. By the time I get to Vera Rubin, Vera Rubin Ultra, guess what, the data center is going to look very different because the power per rack, power per row, is going to be so different. The cooling requirements are going to be so different. That means I don’t want to build out a whole number of gigawatts that are only for a one-generation, one family.”
Burry’s interpretation of this quote was that while Microsoft is still depreciating chips and servers over six years, and data center buildings over 15 years or longer, the quote itself is an indication that the accounting useful lives are fiction.
That wasn’t my interpretation. Nadella is talking about capex pacing and concentration risk. He is not explicitly commenting about the true economic life of a chip. That’s consistent with a world where top‑tier training usage of a GPU generation might be 1–3 years, but the same silicon can still earn its keep for another few years in lower tiers (inference, smaller models, other accelerated workloads). Phasing your build so you’re not massively overweight one vintage is just normal capital discipline when the tech curve can be steep. It doesn’t logically require that the total economic life is only 2.5–3 years.
Even if the older chips are materially less energy‑efficient than the new chips, there are workloads where latency doesn’t matter much, or the alternative is CPU, which can be worse on both cost and performance.
Let me give a concrete example. Let’s say you’re YouTube. You have billions of videos, and for each video you want to store a little “fingerprint,” i.e. a list of numbers that captures what the video is about (for search, recommendations, etc.). You don’t make these fingerprints one by one while a user is waiting. You do it in huge background jobs. Maybe every night you scan all the new videos and run a big model that turns each video into its fingerprint. This job might run from 2:00–4:00 am in some data center. Obviously, no one cares if it finishes at 3:00 or 3:40; it’s not blocking a user click. What matters is the total cost to chew through that giant pile of videos. So you take a bunch of older GPUs (like A100s) that you already own, feed them huge batches of videos at once, and keep them busy close to 100% the whole time.
Even though these GPUs are “old” and less efficient than new ones, they are much faster and cheaper for this math‑heavy job than CPUs would be. This is one way old GPUs, for example, can still be very useful and cheaper than trying to do the same work on regular CPUs.
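The comparison for this kind of batch job can be sketched in one line of arithmetic. Again, the throughput and hourly-cost figures below are entirely hypothetical, chosen only to show the shape of the calculation: for latency-insensitive work, the metric is dollars per video processed, not speed per request.

```python
# Rough throughput-cost sketch for an overnight batch-embedding job.
# All numbers are hypothetical placeholders, not real benchmarks.

def cost_per_video(videos_per_hour, hourly_cost):
    """Dollars per video: the only metric that matters for batch work."""
    return hourly_cost / videos_per_hour

# Hypothetical: an older GPU embeds far more videos per hour than a CPU
# node, at a comparable hourly operating cost.
old_gpu  = cost_per_video(videos_per_hour=50_000, hourly_cost=2.0)
cpu_node = cost_per_video(videos_per_hour=2_000,  hourly_cost=1.0)

print(f"old GPU: ${old_gpu:.6f}/video, CPU node: ${cpu_node:.6f}/video")
# Even an "obsolete" GPU can be an order of magnitude cheaper per video
# than a CPU for this math-heavy workload.
```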
Having said that, I still have sympathy for one particular argument against big tech’s depreciation schedules. Depreciation is supposed to allocate cost over the period you get economic benefits from the asset. If benefits are front-loaded, an accelerated method may be preferable to straight-line. The reality for hyperscalers is that new GPUs generate far more revenue per unit in the early “frontier training + premium inference” years. The economic value curve is non-linear: big in years 1–3, then tapering as chips get pushed down the value cascade.
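To make the front-loading concrete, here is a minimal sketch comparing straight-line with sum-of-years’-digits (one common accelerated method), on a hypothetical $10B GPU purchase over five years. The figures are illustrative, not any company’s actual numbers.

```python
# Sketch: straight-line vs. sum-of-years'-digits (SYD) depreciation for a
# hypothetical $10B GPU purchase over 5 years (amounts in $millions).
# Accelerated methods front-load the expense, matching front-loaded benefits.

def straight_line(cost, years):
    """Equal expense every year."""
    return [cost / years] * years

def sum_of_years_digits(cost, years):
    """Year 1 gets years/total, year 2 gets (years-1)/total, etc."""
    total = years * (years + 1) / 2          # e.g. 5+4+3+2+1 = 15
    return [cost * (years - y) / total for y in range(years)]

cost, years = 10_000, 5
sl = straight_line(cost, years)
syd = sum_of_years_digits(cost, years)

for y in range(years):
    print(f"year {y+1}: straight-line ${sl[y]:,.0f}M vs accelerated ${syd[y]:,.0f}M")
# Year-1 expense under SYD is 5/15 of cost ($3,333M) vs 1/5 ($2,000M)
# under straight-line: a ~67% larger hit to year-1 EPS.
```

Both methods expense the same total over five years; the only question is timing, which is exactly why the choice matters so much for near-term EPS and ROIC.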
Picking a straight-line depreciation schedule is obviously flattering to near-term EPS and ROIC. If you switched from straight-line to an accelerated method, you would take a larger depreciation hit right when the market is obsessed with AI. It is perhaps no surprise that no CFO is volunteering for that. If you squint harder, there may be a plausible justification: the cascade plus workload diversity means GPUs do meaningful work for many years, so simple straight-line isn’t wildly wrong, and any more sophisticated method would be guesswork. And given how flattering straight-line depreciation potentially is for big tech’s EPS today, who would take the risk of such guesswork, especially if it turns out to be unnecessary conservatism?
There are many layers to what the “correct” depreciation schedule is. It may look like a boring accounting question, but at its heart it may be a technological one. You can even wonder whether it’s a geopolitical question: if SOTA model training clusters can increasingly be built in more power-abundant locations (e.g. the Middle East), I can see how that may introduce a different dimension to the overall debate.
Nonetheless, I do acknowledge that Burry is probably sniffing in the right areas. If you want to build a proper short case against big tech today, this is probably the area you should focus on. But I continue to believe Burry is still short of evidence. That doesn’t mean he’s wrong; he might simply be early. As someone on the other side of his bet, I intend to follow this debate closely wearing my old adjudicator hat.
In addition to “Daily Dose” (yes, DAILY) like this, MBI Deep Dives publishes one Deep Dive on a publicly listed company every month. You can find all the 65 Deep Dives here.
Current Portfolio:
Please note that these are NOT my recommendations to buy/sell these securities, but just disclosure from my end so that you can assess potential biases I may have because of my own personal portfolio holdings. Always consider my write-ups my personal investing journal, and never forget that my objectives, risk tolerance, and constraints may have no resemblance to yours.
My current portfolio is disclosed below: