The Robustness-Efficiency Trade-Off (RETO) states that systems ("system" is used here in a very general sense: a project, a company, a country, a society, etc.) must trade off between being robust and being efficient.
This is one of the most important tradeoff spaces and gaining a better understanding of it will make you a better entrepreneur, investor, ecologist, or almost anything else that deals with inherently complex systems.
To start with a simple example, let’s say you have 10 hours to edit some articles and it takes you two hours to edit an article in full. You have to make a decision between:
- Editing one article five times and making sure your edits are extremely robust and precise.
- Editing five articles one time and being more efficient but at the cost of some robustness and confidence in the piece.
- Something in between those two points.
When resources are scarce (and time is always scarce – the universe isn’t making any more of it…), there is a tradeoff between being more efficient and being more robust.
If we’re deciding where on the RETO spectrum to fall, we need to consider the cost of failure and how hard it is to reverse.
If failure is cheap or easy to recover from, then optimize for efficiency. The web developer building a puppy photo-sharing app can optimize for high efficiency at the cost of robustness – hence the startup adage to “move fast and break things.” The cost of making a mistake is cheap. No one freaks out if their puppy photo won’t load for an hour.
If failure is costly and hard to recover from, then robustness is more important. The nuclear power plant engineer is in precisely the opposite position. Nuclear engineers do not “move fast and break things” in the name of efficiency. They move at sloth-like speed and quintuple-check everything. The impact of making a mistake is catastrophic, hence the saying “measure twice, cut once.”
A nuclear plant’s fourth backup generator is going to be idle 99.9999% of the time. In that sense, having it around is very “inefficient,” but if it gets used even once in a millennium to prevent a catastrophic meltdown, then it was money very well spent.
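The arithmetic behind this kind of “inefficiency” is straightforward expected value. A minimal sketch, with entirely hypothetical figures for the generator and meltdown costs:

```python
# Hypothetical figures: a 4th backup generator vs. the expected cost of a meltdown.
generator_cost = 50_000_000        # one-time cost of the "idle" generator
meltdown_cost = 500_000_000_000    # cost of a catastrophic meltdown
p_needed = 1 / 1000                # chance the 4th generator is ever decisive

# Expected loss avoided by keeping the "inefficient" redundancy around:
expected_savings = p_needed * meltdown_cost
print(f"Expected savings: ${expected_savings:,.0f} vs. cost: ${generator_cost:,.0f}")
```

Even at a 0.1% chance of ever mattering, the redundancy pays for itself roughly ten times over under these assumed numbers.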
In general, I find most people are bad at making this tradeoff appropriately. In some areas, people tend to err too much on the side of robustness and in others on the side of efficiency. Let’s look at a few examples to try to tease this out.
The 70% Rule
For most small businesses and startups, the error is that they tend to worry about robustness and “getting things right” too soon. When you don’t know what is going to work, getting three projects that are 70% done out the door and shipped is better than obsessing over perfecting just one.
I launched the first version of The End of Jobs with probably 100 typos in it. It could probably have been cut down another fifty pages. It was maybe 70-80% as good as it could have been.
However, it only took me nine months from writing the first word to having the book published and for sale, which is probably three times faster than typical.
It didn’t make sense to take the time to get it to 95% or 99% perfect because:
- It was cheap to fix mistakes later – since I self-published the book, editing it just required updating a PDF and re-uploading it to Amazon so once it started to sell well, I quickly fixed all the errors.
- I didn’t know if it was going to work – I’m not James Patterson, J.K. Rowling, or Jim Collins. If I knew a million people were going to buy my book on the first day, I would have spent way more resources on it. But I thought maybe 50-100 copies would get sold on day one and most of those would be my mom. (I was pleasantly surprised quite a few more copies were sold than that, thanks mom!)
A fair amount of startup advice focuses on this point. The Lean Startup movement gets at it (though it has its issues), as do popular quips like Mark Zuckerberg’s “move fast and break things” and Reid Hoffman’s “If you are not embarrassed by the first version of your product, you’ve launched too late.”
Many people working at startups come from larger companies where the cost of being wrong is higher, and they carry that sensibility over. If you are overseeing the integration of a new accounting system for a 1,000-person company with thirty stakeholders, then sure, you don’t want to get it wrong, and you should spend a few months talking to the salespeople and doing deep feature comparisons.
If you are doing it for a 10-person company, just flip a coin and pick QuickBooks or Xero; they are basically the same, and it won’t be that hard to switch in a year if you change your mind. The cost of delaying is higher than the cost of being wrong.
There is often a strong ego component here as well. No one likes to be wrong, but it’s important for company culture to reward people who make small mistakes and then fix them over those who never make mistakes at all, since quick iteration is likely the behavior you want to encourage.
I’ll give Jeff Bezos the final word on this point:
“First, never use a one-size-fits-all decision-making process. Many decisions are reversible, two-way doors. Those decisions can use a light-weight process. For those, so what if you’re wrong? I wrote about this in more detail in last year’s letter.
Second, most decisions should probably be made with somewhere around 70% of the information you wish you had. If you wait for 90%, in most cases, you’re probably being slow. Plus, either way, you need to be good at quickly recognizing and correcting bad decisions. If you’re good at course correcting, being wrong may be less costly than you think, whereas being slow is going to be expensive for sure.” 1
Now, if a mistake is fatal, you really don’t want to make it. For nuclear power plant type decisions, you want to be on the other side of RETO.
My favorite example of how this goes wrong is from the German forestry service in the 1700s.
In the late 18th century, the German government started growing “scientific forests” so they could more easily track and harvest timber.
The German government wanted to be able to forecast and plan how much timber needed to be harvested each year to provide enough firewood to their citizens and ships to their sailors.
The underbrush was cleared since it was hard to quantify and it did not produce usable timber. The number of species in the forest was reduced, often to one, because it was easier to track.
The result was mass plantings of a single species of tree, done in straight rows and grids on large tracts of land. It looked more like a tree farm than a forest. They were making it more efficient, right?
The first plantings by the government did well because they were able to use the nutrients in the soil that had accumulated over centuries. This created an initial surge in the amount of timber available to the German industry.
This initial surge in timber increased the German central planners’ confidence in the plan. Since early results were positive, they built more scientific forests.
Narrator: It wasn’t actually working…
The clearing of the underbrush reduced the diversity of the insect, mammal, and bird populations that were essential to the soil-building process. Since there was only one species of tree in the scientific forests, pests and diseases could easily move from tree to tree, infecting the entire forest. These issues combined to cause massive forest death across the country over a short period, setting German industry back by decades, a devastating blow.
The long-term result was less total timber available to Germany, the exact opposite of what the German forestry central planners had intended. What they did seems, on the surface, entirely reasonable, doesn’t it?
They were trying to reduce the volatility and variance in timber production so that the German industry could rely on a consistent and predictable amount of timber each year. However, they failed to realize how costly and hard to reverse failure would be.
Hidden Risk and The Robustness / Efficiency Tradeoff
The German forest example illustrates an important general principle.
For some systems, over-optimizing for efficiency in the short run actually leads to less efficiency in the long run because of the build-up of hidden risk. Hidden risk normally accumulates exactly this way: a system trades away robustness for more efficiency in the short term, at the cost of both robustness and efficiency in the long run.
The scientific forests were initially more efficient. For the first few decades of their use, scientific forests produced more timber, more reliably than the older, natural forests.
It seemed that the central planners had made a smart decision. They looked at their ten-year track record and saw a line going up and to the right.
What did not show up in their spreadsheets was the loss of robustness inherent in the old-growth forest.
To give only one example of this tradeoff, the old-growth forests tended to have many different species that grew in isolated groves. This made it more difficult to harvest timber because one day you were cutting down birch trees, the next day oak, and the next day pine. This might require different equipment or more general, less optimized equipment. In either case, you are losing efficiency in how fast you can harvest timber. By planting only one species, you got far more efficiency in the short run because it was easier to cut the trees down and harvest the timber.
However, one of the benefits of the groves was that they isolated disease spread. If a disease that affected oak trees popped up in one of the groves, it might kill all the trees in that isolated grove but because that grove was isolated from other oak trees, it didn’t have large, systemic consequences. 2% of the trees dying in a year due to a new insect infestation wasn’t ideal, but it was manageable. The more “efficient” scientific forests were much more vulnerable to massive, systemic disease spread.
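The grove structure can be put in simple numerical terms. A toy model, where the grove count, outbreak odds, and time horizon are all invented for illustration: each outbreak in the old-growth forest is contained to a single grove, while in the monoculture a single outbreak takes everything.

```python
# Toy comparison: isolated groves vs. a monoculture, under random disease events.
# All numbers are invented for illustration.
groves = 50          # old-growth forest split into 50 isolated groves
p_outbreak = 0.5     # chance a new disease appears in a given year
years = 100

# Old growth: an outbreak kills one grove's share (2% of trees), which regrows.
loss_per_year_old = p_outbreak * (1 / groves)
avg_yield_old = 1 - loss_per_year_old          # steady ~99% of potential yield

# Monoculture: any single outbreak spreads through the whole forest,
# so surviving the full horizon requires NO outbreak ever occurring.
p_survive_all = (1 - p_outbreak) ** years

print(f"Old growth, average yield: {avg_yield_old:.0%}")
print(f"Monoculture, odds of surviving {years} years: {p_survive_all:.2e}")
```

Under these assumed numbers, the old-growth forest grinds along at roughly 99% of its potential yield forever, while the monoculture’s odds of avoiding total loss over a century are effectively zero.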
This is not just the case in forestry but in many different systems. You can improve the gas mileage of your car by stripping out protective equipment like airbags and seat belts, which add cost and weight that hurt fuel efficiency. You may drive your “more efficient” car for years, increasingly certain of what a good decision you made, until one day things go bad.
The important takeaway from this lesson: For systems where failure is catastrophic, attempts to increase efficiency in the short run often lead to less efficiency in the long run.
Perhaps the most dangerous element, though, is not just that the system becomes extremely fragile and full of hidden risk. It is that, at the same time, it exhibits no visible risk.
The year before the forest yields completely collapsed was one of the best years of timber production in history. All signs pointed to even better years ahead. Every year had been better than the last since the new scientific forest project began.
Any “improvement” that makes something 20% more efficient in the short term but introduces the risk of complete ruin, like an 80% reduction in timber yields, is a winning bet in the short term and a losing bet in the long term.
It is akin to playing Russian roulette every year of your life for a $1 million prize each time you survive. You might get lucky the first year, the second year, and the third year, but the odds of making it to your 50th birthday are 0.016%. As time goes on, your chance of “blowing up” approaches 100%. 2
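That 0.016% figure follows directly from compounding the per-round survival odds. A quick check, assuming a six-chamber revolver, one round, and 48 annual plays before the 50th birthday:

```python
p_survive_round = 5 / 6    # six chambers, one round loaded
plays = 48                 # annual plays before the 50th birthday

# Surviving the whole run requires surviving every single play.
p_survive_all = p_survive_round ** plays
print(f"Odds of surviving all {plays} plays: {p_survive_all:.3%}")  # ~0.016%
```

The expected value of any single play is positive (a 5/6 chance at $1 million), yet ruin over a lifetime is near-certain: survival probability decays exponentially toward zero as plays accumulate.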
The Robustness-Efficiency Trade-Off points out that you should never take risks that could kill you (literally or metaphorically), but you should be willing to take lots of risks that can’t.
It is not that most people are either too risk-averse or too risk-taking, but that we are too risk-averse in some areas and too risk-taking in others.
The person who spends three weeks picking out a logo for their business (risk-averse) but holds their entire retirement portfolio in tech stocks and crypto (risk-taking) would be better served by taking more risks and moving faster in the business and taking fewer risks in the portfolio.
The correct point on the tradeoff is going to be different for any given system and its subsystems. The nuclear power plant needs to focus on robustness and thoroughness in its generator design, but not when ordering office chairs. If you order the wrong chairs, you just send them back; it’s not that costly or hard to reverse.
Is failure cheap or easy to reverse? Move fast and break things.
Is failure devastating or hard to reverse? Measure twice, cut once.
If you’re interested in reading more about this topic, I’ve written about its applications to running a business, bitcoin and cryptocurrency, and have an essay coming out soon applying it to investing.
It’s beyond the scope of this piece, but part of what Amazon has realized and executed on so brilliantly is that it is possible to architect a company structure so that each of the pieces is relatively autonomous, such that a failure in one doesn’t spill over to the others. This allows Amazon to move fast and break things at the micro level while remaining robust at the macro level.
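A toy sketch of that architectural idea (sometimes called the “bulkhead” pattern; the failure probabilities here are invented): compare a tightly coupled system, where any component failure takes everything down, with an isolated one, where each failure stays local.

```python
# Toy model of coupled vs. isolated subsystems. Numbers are illustrative only.
p_fail = 0.05    # chance any one subsystem fails in a given period
n = 20           # number of subsystems

# Tightly coupled: one failure anywhere brings the whole system down,
# so everything staying up requires every subsystem surviving.
p_coupled_ok = (1 - p_fail) ** n

# Isolated ("bulkheaded"): failures stay local, so the system as a whole
# keeps running; on average only p_fail of its capacity is degraded.
expected_capacity_isolated = 1 - p_fail

print(f"Coupled system fully up: {p_coupled_ok:.1%}")
print(f"Isolated system expected capacity: {expected_capacity_isolated:.0%}")
```

Under these assumed numbers, the coupled system is fully up only about a third of the time, while the isolated one hums along at roughly 95% capacity, robust at the macro level even as individual pieces break.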
It is possible (and desirable) to do the same in many other systems. For instance, the U.S. federal system where more authority is delegated to the states as opposed to the national government achieves a similar effect.
2. This brings us back, of course, to our old friend, ergodicity. The essential principle of ergodicity is that it doesn’t matter what happens on average; it matters what happens to you.