The Data Center Story: Part 1

I recently came across a newly launched podcast named “stepchange”. The podcast is basically Acquired-style storytelling of “human progress through the lens of transformative technologies, systems, and infrastructure”. They have covered two topics so far: coal and data centers. Naturally, I started with their data center episode.

Since I grew up in Bangladesh, load shedding was a daily occurrence in my childhood. In fact, one of my lasting childhood memories is of my brother and me studying in the evening by a kerosene-powered hurricane lamp. For the uninitiated, I am not some old dude reminiscing about yesteryears; I’m just a guy in his mid-30s who was born in a country where it used to take an awfully long time for the future to arrive. But the diffusion of progress and technology has undeniably accelerated at an unprecedented pace in my lifetime.

Figure: A kerosene-powered hurricane lamp (image source here)

While the playing field will never quite be leveled across the world, today’s revolutionary technologies (e.g., chatbots) are accessible to kids in Silicon Valley and to “most” kids born in Bangladesh alike. I quite marvel at the diffusion of technology, and it is this rate of diffusion for which I have always had a soft corner for Alphabet and Meta, both of which went above and beyond to ensure their products reached every nook and cranny of the world. While it is quite fashionable to drone on about their shortcomings in the pursuit of such diffusion, you probably needed to study under a kerosene-powered hurricane lamp to appreciate how this accelerating diffusion gives more people a chance at social mobility over time.

One technology that certainly accelerated such diffusion was the data center. While today’s data centers look gigantic, the early days of “data centers” looked nothing like them. The stepchange hosts argue IBM’s punch cards were essentially the first data centers. It is hard to appreciate how different the world looked when processing and storing information were entirely manual; for example, it took seven years just to tabulate the data collected for the 1880 census. Even in the 1960s, when air travel started to take off, you had to call your travel agent, who would then call the airline, which would then have clerks running around pulling cards to book a seat…the whole process apparently took 90 minutes per booking!

The episode also further enhanced my admiration for Walmart. Walmart may not have a conspicuous revenue and profit driver such as AWS to make it look like a “tech company” today, but Walmart’s penchant for technology ran pretty deep. Perhaps it isn’t a surprise that, despite only starting to sell groceries in 1988, Walmart became the largest grocery seller in just 13 years! In the late 80s, Walmart was essentially at the frontier of integrating technology into every facet of its business. An excerpt from the podcast:

…you're at the register at your neighborhood Walmart location: you scan the toothpaste, the barcode beeps. Within seconds, a satellite dish behind the store sends that transaction to the sky and over to Bentonville, Arkansas, where they're hosting their mainframe computer to record the sale.

A few minutes later, a massive data warehouse would update how many tubes of toothpaste you had just bought. Then, perhaps before the end of the day, Procter & Gamble's factory would receive an update that they needed to make more toothpaste.

This was the cutting edge of retail in the late 80s, and Walmart made a massive decision to take this a step further, truly driving innovation across the retail industry. They invested $24 million to build their own private satellite network linking all Walmart stores to headquarters. This was fairly unprecedented at the time…It was the largest private satellite network.

So any single event that happened within the Walmart ecosystem rode on this private network, enabling them to mine their data in a way that was unheard of before.

By mining their sales data – which could now be collected in real time across all Walmart stores over their private satellite network – they discovered that when hurricanes approached, the sale of Pop-Tarts increased 7x over their normal rate.

I am 75% into the four-hour episode, and one thing that stood out is their discussion of PUE, or “Power Usage Effectiveness”. What is PUE? From Wikipedia:

PUE is the ratio of the total amount of energy used by a computer data center facility to the energy delivered to computing equipment. PUE is the inverse of data center infrastructure efficiency…PUE was published in 2016 as a global standard

The interesting bit was that PUE as a metric was only presented in a paper in 2006 and established as a global standard in 2016. Of course, data centers were around long before people were optimizing for PUE, but it was still instructive for me to understand just how inefficiently many data centers were operated not so long ago. From the podcast:

It hit the industry like a storm because the PUE race was now on, and Google wanted to win. Many enterprises had PUE numbers around 2, meaning twice as much power went to lights, cooling, and everything else compared to the amount of power that actually ran the IT hardware. Google pushed it down to 1.1, meaning only 10% of the power was not directly used to power the computers.

Their internal teams took hold of this and ran with it. They used this rethinking of the data center from the ground up as one computer system, with software being the reliability layer, not the hardware. This, along with innovative geographic placement, drove performance and PUE. It's come to a point where it essentially can't be optimized past one.
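To make those numbers concrete, here is a minimal sketch of the PUE arithmetic (my own illustration, not from the podcast or Wikipedia): since PUE is total facility energy divided by the energy delivered to IT equipment, a PUE of 2.0 means the overhead (cooling, lighting, power conversion, etc.) is as large as the IT load itself, while a PUE of 1.1 means overhead is about 10% of the IT load, or roughly 9% of total facility energy. The function names and labels below are just for illustration.

```python
# Minimal sketch of the PUE arithmetic (illustrative, not from the podcast).
# PUE = total facility energy / energy delivered to IT equipment,
# so the overhead share of total facility energy is 1 - 1/PUE.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy over IT energy."""
    return total_facility_kwh / it_equipment_kwh

def overhead_share_of_total(pue_value: float) -> float:
    """Fraction of total facility energy NOT delivered to IT equipment."""
    return 1 - 1 / pue_value

for label, p in [("typical enterprise", 2.0),
                 ("Google", 1.1),
                 ("Facebook Prineville", 1.07)]:
    print(f"PUE {p:.2f} ({label}): "
          f"overhead = {100 * (p - 1):.0f}% of IT load, "
          f"{100 * overhead_share_of_total(p):.1f}% of total facility energy")

# PUE 2.00 (typical enterprise): overhead = 100% of IT load, 50.0% of total facility energy
# PUE 1.10 (Google):             overhead = 10% of IT load,   9.1% of total facility energy
# PUE 1.07 (Facebook Prineville): overhead = 7% of IT load,   6.5% of total facility energy
```

In other words, the “only 10%” (and, later in the episode, “only 7%”) framings in the quotes are best read as overhead relative to the IT load; expressed as a share of total facility energy, the figures are slightly lower.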

We have all heard about Google’s infrastructure advantage, but these advantages were often consequences of their ability to assemble such technically competent teams. Of course, Google (or any company) never had a monopoly on talent. Facebook understood Google’s deep advantages here and realized it needed a different approach to catch up to Google. From the podcast:

“so they're (Facebook) realizing, 'Why is everyone making so much money on us?' There's got to be a better way to do this. And there are two broad ways that they go about this. One is similar to what we've seen with Amazon, Google, and Microsoft, which is the thought that, 'Hey, we can do this better, cheaper if we build our own custom data center.' And so their first purpose-built facility was announced on January 10th and began operations in 2011 in Prineville, Oregon. And they soon followed with North Carolina, Sweden, and Iowa, and expanded rapidly. But Prineville was, just like all the others, quite intentional and quite strategic.

…if you look at the numbers of server growth, if you go back just to 2008, they had 10,000 servers; in 2009, 30,000 servers. Supposedly by 2010, when they are really kicking off and building the Prineville project, they have around 60,000 servers. They're trying to figure out how to scale up.

so they decide to engineer the building and the servers together. This very purpose-built facility. And just like the others, they realized that they can beat industry standards. And so they very famously and publicly came out with this first purpose-built facility with a very aggressive target of a 1.15 PUE. Industry average is 1.5. Historically, they were north of 2, and they were able to report 38% less energy, 25% lower cost against prior facilities. And do you think they ended up beating their PUE?

Smoked it: 1.07.

So that means that only 7% of the energy going to run the data center went to anything other than powering the IT equipment. That is phenomenal.

Did they keep this to themselves, how they did this? I think this is what sets Facebook's approach apart.

This first part they needed to do, and they executed extremely well. But the second thing they did was far more revolutionary. In a very secretive world of data center development where each hyperscaler kept their builds to themselves, in April of 2011, Facebook open sourced their blueprints. They announced the Open Compute Project.

It's hard to overstate how radically different an approach this was. This is an industry that kept the design of the servers in the data centers extremely secretive. They viewed that as core IP, differentiation, and notions of security. Google would publish papers, especially on things like PUE, and be visible about metrics that they wanted to highlight. But they famously didn't let anyone into their data centers until this really changed the game.

Their motivation is pretty interesting. They look around at all the other big data center companies and they're seeing all the margins that are being made. They're like, 'Well, wait a minute, if we publish our blueprints and we get everyone else to buy in and publish theirs, that means we can drive down the costs by standardizing what we're building.'

Ultimately, what they want is for their suppliers to be in increased competition. The best way to do that is not to make one deal with one supplier, but to say, 'Hey, suppliers, this is what we need, this is what we like. You make this, we'll buy it as long as it's the cheapest one out there.'

…it flips the power dynamic. Now, instead of the vendors and suppliers dictating what the specs are and having multiple different specs for multiple different customers, they're able to standardize it and say, 'You're going to respond to the Open Compute Project standards.”

I knew about the “Open Compute Project” and Meta’s strategy to “commoditize your complement,” which they have tried to replicate with SOTA models as well (so far, not so successfully), but it was still quite interesting to learn just how much of a tailwind better infrastructure efficiency may have been for both Meta and Alphabet through much of the 2010s. Of course, much of that low-hanging fruit is now gone, but it is hard to appreciate in the moment what else these companies may be cooking up to improve their infrastructure. Given their ever-increasing depreciation expenses, the question may never have been more relevant than it is today.

I hope to write more about the episode tomorrow as well.


In addition to "Daily Dose" (yes, DAILY) like this, MBI Deep Dives publishes one Deep Dive on a publicly listed company every month. You can find all the 62 Deep Dives here.


Current Portfolio:

Please note that these are NOT my recommendation to buy/sell these securities, but just disclosure from my end so that you can assess potential biases that I may have because of my own personal portfolio holdings. Always consider my write-up my personal investing journal and never forget my objectives, risk tolerance, and constraints may have no resemblance to yours.

My current portfolio is disclosed below:
