#0076: An operating system for the cloud

Matthew Sinclair
7 min read · Jun 12, 2018
Photo by Markus Spiske on Unsplash

Braingasm

[ED: This post goes a bit deeper on the tech than usual. I’d give it 3 out of 5 nerds: 🤓🤓🤓🤔🤔]

Microsoft’s acquisition of GitHub this week, which I think is a pretty savvy move, prompted me to think again about something that has been in the back of my mind for quite some time: What is the operating system for the cloud?

The obvious answer to this question is AWS, or Google Cloud, or even Microsoft Azure, but I want to lay out some thinking which suggests that, as technically and functionally amazing as these existing and emerging platforms might be, they are perhaps just building blocks for what comes next.

Let’s recap what Microsoft did in the 80s and 90s. During that time, they managed to establish one of the most successful, value-creating platforms of all time. The DOS/Windows ecosystem generated hundreds of billions of dollars and made an awful lot of people very rich. It did this by creating a software layer that sat between desktop PC hardware and application software. This allowed independent software vendors to write applications that interacted with this intermediate layer, and freed them (mostly) from having to worry about the nuances and vagaries of the underlying hardware. We can argue about Microsoft’s design aesthetics, but what you can’t argue with is how successful this platform was in freeing developers to build application software, and the resultant value that was unlocked. Famously, Microsoft only captured a minority percentage (~30%) of the total value created by the platform, leaving the remainder to be captured by software vendors and service providers on their own. [ED: If this topic interests you, Ben Thompson at Stratechery has been writing some great stuff on this recently.]

About 10 years ago Apple repeated this miracle with the iPhone, but with a slightly different business model configuration that had them capturing a percentage of all of the value unlocked by the platform. They were able to do this because they locked down the way applications were distributed to users, as well as taking a slice of any monetisation that occurred inside apps. That this was possible at all is due in no small part to the attention paid by Apple to design, and the compelling user experience they created. However, the size of the prize — the iOS ecosystem has turned into the most successful and profitable product in history — comes down to a fundamental shift in end user control: from businesses buying enterprise software in the case of Microsoft, to consumers buying smartphones in the case of Apple.

The conclusion to draw from this brief history lesson is simply that there is an enormous amount of value available to anyone who can create a platform that allows an ecosystem of developers to build applications for a pool of end users, removing the need for them to worry too much about the underlying hardware. Obviously, this simple point belies huge design and engineering complexities, and also ignores timing and marketing and human nature and a bunch of other inscrutable factors. But those factors are important in any change, so the point worth noting is the pivotal role that the operating system plays in creating platform leverage.

Which brings us back to the original question: what is the operating system for The Cloud?

As things stand, there really isn’t an operating system for the Cloud. There are multiple Cloud platform providers, and those providers, and other 3rd-parties, have an array of value-added services that sit on top of the underlying infrastructure. In the case of AWS, these services provide a dizzying array of high-level, as-a-service functionality. But there’s nothing that performs the complete role in The Cloud that desktop operating systems performed for application developers during the desktop era.

Don’t get me wrong, there are a bunch of things that make running software on The Cloud a lot easier, but these things are not (generally) abstractions that remove the need to deal with the infrastructure as much as they are sophisticated ways to treat the infrastructure as if it were software. This is important, but it is not quite the same thing.

By way of analogy, consider the emerging tech field that is loosely called serverless. The idea here is that you can build software in a way that means you avoid thinking about the servers that the back-end services are running on. In contrast to containerisation, which virtualises the server, serverless allows you to just think about the functions that your app might need, and not worry at all about the servers on which those functions are running.
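To make that concrete, here is a minimal sketch of what "just thinking about the function" looks like. The `handler(event, context)` shape loosely follows the convention used by function-as-a-service platforms, but the function name, the event fields, and the response format here are all assumptions made up for illustration, not any particular provider's API.

```python
import json


def handler(event, context=None):
    """Hypothetical function: greet whoever is named in the incoming event."""
    # The platform (not this code) decides where and on how many servers this runs.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation for illustration; in a real deployment the platform calls handler().
response = handler({"name": "cloud"})
```

Notice that nothing in the function mentions a host, a port, or a process. That is the appeal, and also the limit: the function still needs storage, identity, and routing from somewhere else.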

Some have even made the argument that serverless is the future, and that it solves many, if not all, of the current crop of problems with building large-scale software on elastic infrastructure. My contention is that as interesting as it is, it is not enough on its own to take on the role of operating system for the Cloud.

So what is the Cloud analog of the job that DOS/Windows or macOS did for the desktop era, iOS and Android are doing for the smartphone era, and Linux continues to do for the virtualised, elastic infrastructure era?

An operating system is more than just a set of software APIs for hardware. To create leverage for application programmers it needs to provide conceptual abstractions over the hardware that make programming easier.

In the case of The Cloud, we already have conceptual abstractions over the underlying raw hardware in the form of virtualisation and containerisation. We also have a ton of individual services that can be coordinated such as databases, queues, logging, directories, and so on. And we even have a declarative way to configure and provision these resources in the form of things like Terraform and CloudFormation. But what we don’t have (yet) is a way to treat the whole ecosystem as though it were a single, unified platform.

To stand up a modern back-end, it is still necessary to provision a whole load of infrastructure and manage an enormous amount of configuration. Tools like Terraform turn a lot of this into a matter of declarative configuration, but the fact remains that there is a lot of stuff to do, and it requires engineers to consider multiple layers of the technology stack. Importantly, even when tools attempt to raise the level of abstraction, it’s still easy to get yourself into trouble if you don’t know what you are doing.
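The declarative idea itself is simple to sketch: you describe the state you want, and a tool works out the steps to get there from the state you have. The toy below is only an illustration of that idea, not how Terraform or CloudFormation are actually implemented; the `plan` function and the resource names are hypothetical.

```python
def plan(desired: dict, actual: dict) -> list:
    """Return the actions needed to move `actual` towards `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical resources: what we want versus what is currently running.
desired = {"db": {"type": "postgres", "size": "small"}, "queue": {"type": "fifo"}}
actual = {"db": {"type": "postgres", "size": "tiny"}, "cache": {"type": "redis"}}

steps = plan(desired, actual)
for action, name in steps:
    print(action, name)
```

Even in this toy, the engineer still has to know that a database, a queue, and a cache exist and how they relate. The configuration is declarative, but the layers of the stack are all still visible.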

If we think about the types of resources available in The Cloud, they fall into (at least) four broad categories, or fabrics:

  1. Storage fabric: services that facilitate permanent and durable storage of data
  2. Networking fabric: services that allow the constituent parts of a web-scale system to find and talk to each other
  3. Compute fabric: services that allow for application logic to run
  4. Application fabric: services and components that make building applications easier

The first three are relatively well understood, but the fourth one is less well developed. An application fabric would contain a range of services that help application developers, such as:

  • System topology and provisioning: coordinating the compute, network, and storage fabrics
  • User identification, authentication, authorisation, and access control
  • Service versioning, discovery, and composition
  • Logging, runtime management, and the full suite of operational controls
  • Backup and recovery and general platform hygiene
  • Platform extension and customisation
  • Developer support including distributed version control, continuous integration/deployment, devops, etc
  • Others, …
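As a rough sketch of the shape this could take, the code below puts a single application-level entry point in front of a storage fabric and a compute fabric, so the application code never touches either directly. Every name here (`StorageFabric`, `ComputeFabric`, `ApplicationFabric`, and the in-memory stand-ins) is hypothetical, and the networking fabric is left out to keep it short.

```python
from typing import Callable, Dict, Protocol


class StorageFabric(Protocol):
    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class ComputeFabric(Protocol):
    def run(self, fn: Callable[[bytes], bytes], payload: bytes) -> bytes: ...


class InMemoryStorage:
    """Stand-in for a durable storage service."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


class LocalCompute:
    """Stand-in for an elastic compute service; it just calls the function."""

    def run(self, fn: Callable[[bytes], bytes], payload: bytes) -> bytes:
        return fn(payload)


class ApplicationFabric:
    """Hypothetical single entry point that hides the fabrics beneath it."""

    def __init__(self, storage: StorageFabric, compute: ComputeFabric) -> None:
        self._storage = storage
        self._compute = compute

    def process(self, key: str, fn: Callable[[bytes], bytes]) -> bytes:
        # Read the input, run the logic, and store the result under a new key.
        result = self._compute.run(fn, self._storage.get(key))
        self._storage.put(key + ".out", result)
        return result


storage = InMemoryStorage()
storage.put("greeting", b"hello")
app = ApplicationFabric(storage, LocalCompute())
result = app.process("greeting", bytes.upper)
```

The application only ever calls `process`. Swapping the in-memory stand-ins for real services would not change that call, which is the kind of leverage an operating system is supposed to provide.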

Consider the Ruby on Rails web application framework. From a 2018 perspective, it’s difficult to conceive just how bad web development was in the early 2000s. It’s not that Rails is perfect, but prior to Rails there was so much cruft developers had to contend with to build a simple web app that the whole process was fraught with pain and frustration. The many limitations of Rails are well documented, but its highly opinionated, convention-over-configuration approach to building apps meant that once a developer understood the way the framework zigged and zagged, they could be incredibly productive.

Completely coincidentally, today I was pointed at a project called Architect. This is a serverless framework, in the same vein as Serverless.com. However, Architect appears to be pushing the boundaries of what this kind of framework can be. I haven't gone into it deeply enough to have a strong opinion, but it looks to be going in the right direction by providing a complete set of application fabric components.

This blog post also makes the following point, worth quoting in full:

Since Rails was born, I’ve seen plenty of frameworks come and go, and have poked at a few here and there out of curiosity — I wondered if any could capture the right combination of nailing the abstractions and providing the right community framework. The first that has really caught my attention in that time is called Architect, and just as Rails leveraged Ruby to unleash massive potential for web development, Architect leverages Node and NPM to finally bring some structure and insight to the AWS Serverless landscape.

That’s enough to pique my interest.

Which brings me back to the thing that originally sparked the question: Microsoft’s acquisition of GitHub. Although Microsoft has been in the wilderness during the transition from desktop to mobile, the new Microsoft 2.0 under Satya Nadella has a very good chance of recapturing some of their past glory … if they can come up with a credible operating system for The Cloud, and get it out to developers in a compelling package.

There is still an awful lot more to do, but the building blocks are falling into place.

Regards,
M@

