That was my opening line at an OpenStack technology summit presentation I delivered last year. Many people raised their hands after I asked the question. I followed up by asking, “How many of you are also married?” The remaining people looked uncomfortable, so of course I made a joke about the Ashley Madison website, which facilitates extra-marital affairs. (It was funny at the time, when the site had just been hacked and cheaters were being publicly exposed.) Like a marriage, enterprise-IT lock-in—or using only one vendor’s product for key functions, such as outsourced cloud computing—is a good thing if managed well for mutual benefit. But like a divorce, it gets ugly and expensive when things go wrong.
Trends mitigating lock-in
Lock-in is on many people’s minds as Google Cloud Platform and Microsoft Azure, emerging cloud competitors to Amazon Web Services, recently unveiled new plans to compete with the larger and more dominant AWS. More broadly, however, I believe that several key technology trends are actually working to mitigate the ill effects of lock-in even as these three big companies slug it out in the marketplace.
These trends include modern application development practices that focus on product-based work, as opposed to project-based; the DevOps movement; and the rise of micro-service architectures. In addition, the availability of just about anything as an open-source package or delivered “as-a-service” reduces lock-in. And with careful use of Web services, modern applications are not really locked in at all—they can relocate while staying connected to their dependencies.
But lock-in is still a real thing and can be a real problem for IT professionals, who often use it to justify using one product over another. Later, I’ll break down and classify different types of lock-in and discuss which ones have lower barriers for escape.
If we look at who complains most about IT vendor lock-in, it’s primarily operations people who have been given a system to operate—and are stuck with it—as vendors ramp up support costs. In contrast, discussion among developers is primarily about finding best-of-breed suppliers to get a product built quickly, with the most features, by expending the least effort. So it appears that developers often ignore and create lock-in, and operations staff suffer most of the consequences.
How can we rationalize these conflicting priorities? For starters, developers and operations professionals need to better work together and share each other’s concerns. Luckily, today, there’s a whole (aptly named) DevOps movement dedicated to making that happen.
Deming loved DevOps
In fact, a leading voice in the DevOps movement, John Willis, likes to quote the fourteen key principles of W. Edwards Deming, the management guru who helped shape Japanese companies after World War II. Willis summarized Deming’s principles, which underpin today’s DevOps philosophy, in a short talk you can watch on YouTube.
Deming’s fourth point is the most relevant one for DevOps and lock-in. It states:
“End the practice of awarding business on the basis of a price tag. Instead, minimize total cost. Move toward a single supplier for any one item, on a long-term relationship of loyalty and trust.” – W. Edwards Deming, Point 4
So we should all think like developers! We can stop worrying about lock-in at work and go home to our happy marriages.
Unfortunately, not all marriages or vendor relationships are happy, and sometimes the situation ends up looking like this:
“End the practice of awarding business on the basis of a price tag. Instead, minimize total cost. Move toward a single supplier for any one item, on a long-term dysfunctional relationship of loyalty exploitation and trust abuse.”
Why does that happen sometimes, and what can we do to make the best of the situation? I think the key thing to look at is the ability to evolve and the cost of changing technology dependencies. Since the goal of development is to create new things, developers are well-suited to evolving the products they are building, and also have a relatively low cost of change. They don’t feel locked in, as they can work with an open source project or vendor to redesign a dependency or switch from one project or vendor to another.
In contrast, if an operations team is running a stable product that has little or no developer activity, they are locked into whatever decisions were made at the time it was built. So they can’t evolve it and have a high cost of change. Vendors then exploit this situation by extracting as much value as they can from the relationship, and unpleasant terms like “paying the vendor tax” come up.
Project-based to product-based work
One of the big changes underway in enterprise IT, and one that is helping to mitigate lock-in, is a move from project-based work to product-based work. In the first scenario, individual project teams are formed to create or upgrade a system; then, they hand over that system to other parties to operate it, and the project team members then disperse to go work on other projects. The project-based model is especially common when organizations outsource development work, or send it offshore, using fixed contracts.
While this model may be more efficient and reduce up-front development costs, it can leave behind a trail of locked-in systems that eventually cost more to operate. The alternative is to build product-focused teams that own the creation, operation and evolution of services. These teams then break down monolithic builds, which are designed as efficient, large bundles of features that change slowly, into a large number of independent micro-services that trade off some efficiency for the ability to evolve rapidly. The main driver for the change from project teams to product-based teams is the competitive landscape, chiefly the need to rapidly improve end-user experiences. But an interesting side effect is that lock-in should also be reduced as an issue.
As-a-service and micro-services
Another large change in enterprise IT that is affecting lock-in is the move toward buying products as services, rather than building them in-house. But this arrangement gives rise to a different kind of lock-in. Internal processes and training still need to adapt in order for organizations to switch between vendors, but overall it seems that there is less lock-in when dealing with a SaaS vendor than with an on-premises installation of an equivalent product.
From the developer point of view, modern applications are constructed from micro-services that include a lot of open-source components such as Docker, Redis, Nginx, Consul, Mesos, Swarm, Kubernetes, MySQL, MongoDB and Cassandra. They also make use of many Web services that are usually offered initially through a free trial option. Developers don’t need a vendor relationship to download these components and try out the Web services, so they are able to run rapid experiments and track the latest technology as it matures.
When these components and Web services are deployed in production, vendor support and service subscriptions are needed. But again, this is far less lock-in than you get with traditional enterprise-vendor solutions. The nature of micro-services also isolates dependencies into small sections of the overall system, which reduces the cost of change.
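One common way micro-services isolate a vendor dependency is to hide it behind a narrow internal interface, so a vendor swap changes one implementation rather than every caller. The following is a minimal Python sketch; the names (`BlobStore`, `InMemoryStore`, `save_report`) are illustrative, not drawn from any specific system discussed here.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Narrow interface that the rest of the service depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(BlobStore):
    """Stand-in backend for testing; a vendor-backed implementation
    (S3, Google Cloud Storage, ...) would live behind the same interface."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_report(store: BlobStore, name: str, body: bytes) -> None:
    # Application code sees only BlobStore, so switching vendors means
    # swapping one implementation, not rewriting the callers.
    store.put(f"reports/{name}", body)
```

The cost of change is confined to whichever class implements `BlobStore`, which is the point of isolating dependencies into small sections of the system.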
What’s up with Dropbox, Apple and Netflix?
Very large scale use cases have their own flavors of lock-in, and motivations to move. Large amounts of data can be hard to migrate, and some vendors scale better than others, so may become the only option. There are also cases where it makes sense to completely build and own a capability at extreme scale.
Dropbox was in the news recently with a Wired story stating that they have moved their primary storage backend in the USA from AWS S3 to their own dedicated servers and software; they continue to use S3 for Europe. The Dropbox front end was always hosted on their own dedicated datacenter platform, so they had already figured out how to run infrastructure at very large scale. At some point, when you get big enough, it makes sense to optimize the core workload onto dedicated hardware and software. Since the AWS workload for Dropbox is primarily an ever-increasing amount of online storage, they don’t benefit from the ability to turn off shared compute capacity when it’s idle, the way many other cloud workloads do. Dropbox reports that they moved data from AWS to their new platform at 4 petabytes per day. That is about 50 gigabytes per second, which requires about 400 gigabits per second of network capacity. Very few datacenters or clouds are capable of delivering at that speed; AWS obviously can, and Dropbox had to build its own to match.
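A quick back-of-envelope check of that migration rate, assuming decimal units (1 PB = 10^15 bytes):

```python
# Back-of-envelope check of the reported Dropbox migration rate.
PB = 10**15              # decimal petabyte, in bytes
SECONDS_PER_DAY = 86_400

bytes_per_second = 4 * PB / SECONDS_PER_DAY          # ~46 GB/s
gigabits_per_second = bytes_per_second * 8 / 10**9   # ~370 Gbit/s

print(f"{bytes_per_second / 10**9:.0f} GB/s, "
      f"{gigabits_per_second:.0f} Gbit/s sustained")
```

That sustained rate, before any protocol overhead, is why only a handful of networks could absorb the transfer.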
Some people compared this move to Zynga’s move off of AWS in 2011, and its eventual move back onto AWS in 2015. In that case, Zynga was one of the biggest AWS customers, was growing extremely fast, and had a primarily compute-based workload driven by a series of hit games. They built their own datacenter cloud, but after the build-out Zynga’s hits dried up; with hindsight, they probably over-built and over-invested in datacenter capacity when they could have invested more in games. I don’t see this happening to Dropbox, because their storage workload is always increasing, rather than following the boom and bust of the latest hit game.
Apple is operating at a scale that is hard for most people to comprehend. At a conference in 2015 they said they run over 100,000 instances of the Cassandra database, about ten times more than Netflix. To get all the various Apple services to work at scale they have a large datacenter footprint, and use several public cloud and content delivery networks (CDNs). This week’s story is that Apple has apparently moved some of their storage capacity to Google’s cloud data store. This is one of the largest scale and most mature Google services.
Since I left Netflix two years ago, the scale of their operations on AWS has continued to grow strongly. Recent stories about their caching and data tiers stated that they have around 10,000 instances of memory caches, and another 10,000 instances of the Cassandra database. Since 2011 Netflix has also used Google cloud’s data store for disaster recovery backups in case everything goes wrong with their main backups on S3, and it’s a great insurance policy.
While Netflix deploys the code and primary data storage to AWS, they also run one of the world’s largest content delivery networks. The Netflix CDN is far too big to run on AWS CloudFront, and Netflix even outgrew the ability to depend on the largest public CDNs. The content generation, coordination and management of the Netflix CDN runs on AWS with sophisticated custom code inside the client devices (e.g. Smart TV sets), but the content itself is served from their OpenConnect appliances. These are basically very efficient Nginx web servers full of disks. One of the key concepts of cloud at scale is that you always want to be a “small fish in a big pond,” you don’t want to be a “shark in a paddling pool.” This is why the biggest use cases tend to end up in specialized deployments outside the public cloud.
It’s clear that there are still many different kinds of lock-in, and some are more pernicious than others. In part 2 of this post I will classify the types of lock-in, to help inform a discussion about whether a particular lock-in situation is really high cost and hard to evolve, or not.
Most people seem to like the idea of marriage—which makes me think we can work to make IT-vendor lock-in more manageable. The rise of modern application development practices, DevOps, microservice architectures and “as-a-service” delivery models is helping.
Note: This post is based in part on a presentation given at a Silicon Valley OpenStack Summit in 2015:
Web Services and Microservices: The effect on vendor lock-in