Random Thoughts on Leadership & Software Engineering

Starbucks Does Not Use Two-Phase Commit 20 Years Later - and It Never Will

starbucks-shop

More than 20 years ago Gregor Hohpe published an observation that distributed systems in real life - the CAP theorem (consistency, availability, partition tolerance - pick two, any two, because you cannot achieve three at once) holds true for systems outside of the digital space.

CAP

Indeed, digital systems are usually part of much more complex interdependencies - the difference between software architecture (how a system is designed and structured internally) and enterprise architecture (how organizations and systems are designed as a supersystem together) - so real systems follow the same rules, not just the digital parts. Since a typical Starbucks shop is a distributed system (multiple baristas, multiple machines and steps conducted, no central clock, no shared memory, transparency of failure, access and location) it can only select in its design between availability - the ability to start processing orders at any time and consistency - getting every step of every order right or cancelling the whole order. Dropping partition tolerance is not an option, unless they change their whole model and lose all benefits of distribution - chiefly ease of scalability and resilience of process in their case.
Of course, Starbucks makes more money on availability and processing as many customers as it can, regardless of it may lead to inconsistency sometimes - they might lack soy milk for example - OK, bad example 😁 - rather they might lack cinnamon or the customer funds and starting a process without these will not lead to a purchase of a cinnamon drink.
They are happy to remake coffee or throw coffee away due to inconsistency but serving as many customers as they can.
If Starbucks used a two-phase commit for consistency purposes a single barista will lock all resources and prerequisites - the milk, cinnamon, machines, lock funds on your card, etc. and only then make your coffee (phase two) - if something is missing, they will roll back.
The same way Airbnb is happy to give you a voucher to compensate a double booking but not slow down the booking process as a whole.
We live in a world of distributed digital systems now - cloud and microservices are the primary drivers.
So, the real choice usually is - do you want a consistent or available system.
Always make informed choices when you design.