Written by Bill Farner and David Chung
Docker’s mission is to build tools of mass innovation, starting with a programmable layer for the Internet that enables developers and IT operations teams to build and run distributed applications. As part of this mission, we have always endeavored to contribute software plumbing toolkits back to the community, following the UNIX philosophy of building small loosely coupled tools that are created to simply do one thing well. As Docker adoption has grown from 0 to 6 billion pulls, we have worked to address the needs of a growing and diverse set of distributed systems users. This work has led to the creation of many infrastructure plumbing components that have been contributed back to the community.
It started in 2014 withlibcontainerandlibnetwork. In 2015 we createdrunCandco-foundedOCIwith an industry-wide set of partners to provide a standard for container runtimes, a reference implementation based on libcontainer, andnotary, which provides the basis for Docker Content Trust. From there we added containerd, a daemon to control runC, built for performance and density. Docker Engine was refactored so that Docker 1.11 is built on top of containerd and runC , providing benefits such as the ability to upgrade Docker Engine without restarting containers. In May 2016 at OSCON, we open sourced HyperKit, VPNKit and DataKit , the underlying components that enable us to deeply integrate Docker for Mac and Windows with the native Operating System. Most recently, in June, we unveiledSwarmKit, a toolkit for scheduling tasks and the basis for swarm mode, the built-in orchestration feature in Docker 1.12.
With SwarmKit, Docker introduced a declarative management toolkit for orchestrating containers. Today, we are doing the same for infrastructure. We are excited to announce InfraKit, a declarative management toolkit for orchestrating infrastructure. Solomon Hykes open sourced it today during his keynote address at LinuxCon Europe. You can find the source code at https://github.com/docker/infrakit
Back in June at DockerCon, we introduced Docker for AWS and Azure beta to simplify the IT operations experience in setting up Docker and to optimally leverage the native capabilities of the respective cloud environment. To do this, Docker provided deep integrations into these platforms’ capabilities for storage, networking and load balancing.
In the diagram below, the architecture for these versions includes platform-specific network and storage plugins, but also a new component specific to infrastructure management.
While working on Docker for AWS and Azure, we realized the need for a standard way to create and manage infrastructure state that was portable across any type of infrastructure, from different cloud providers to on-prem. One challenge is that each vendor has differentiated IP invested in how they handle certain aspects of their cloud infrastructure. It is not enough to just provision five servers;what IT ops teams need is a simple and consistent way to declare the number of servers, what size they should be, and what sort of base software configuration is required. And in the case of server failures (especially unplanned), that sudden change needs to be reconciled against the desired state to ensure that any required servers are re-provisioned with the necessary configuration. We started InfraKit to solves these problems and to provide the ability to create a self healing infrastructure for distributed systems.
InfraKit breaks infrastructure automation down into simple, pluggable components for declarative infrastructure state, active monitoring and automatic reconciliation of that state. These components work together to actively ensure the infrastructure state matches the user’s specifications. InfraKit emphasizes primitives for building self-healing infrastructure but can also be used passively like conventional tools.
InfraKit at the core consists of a set of collaborating, active processes. These components are called plugins and different plugins can be written to meet different needs. These plugins are active controllers that can look at current infrastructure state and take action when the state diverges from user specification.
Initially, these plugins are implemented as servers listening on unix sockets and communicate using HTTP. By nature, the plugin interface definitions are language agnostic so it’s possible to implement a plugin in a language other than Go. Plugins can be packaged and deployed differently, such as with Docker containers.
Plugins are the active components that provide the behavior for the primitives that InfraKit supports. InfraKit supports these primitives: groups, instances, and flavors. They are active components running as plugins.
When managing infrastructure like computing clusters, Groups make good abstraction, and working with groups is easier than managing individual instances. For example, a group can be made up of a collection of machines as individual instances. The machines in a group can have identical configurations (replicas, or so-called “cattle”). They can also have slightly different configurations and properties like identity,ordering, and persistent storage (as members of a quorum or so-called “pets”).
Instances are members of a group. An instance plugin manages some physical resource instances. It knows only about individual instances and nothing about Groups. Instance is technically defined by the plugin, and need not be a physical machine at all. As part of the toolkit, we have included examples of instance plugins for Vagrant and Terraform. These examples show that it’s easy to develop plugins. They are also examples of how InfraKit can play well with existing system management tools while extending their capabilities with active management. We envision more plugins in the future – for example plugins for AWS and Azure.
Flavors help distinguish members of one group from another by describing how these members should be treated. A flavor plugin can be thought of as defining what runs on an Instance. It is responsible for configuring the physical instance and for providing health-check in an application-aware way. It is also what gives the member instances properties like identity and ordering when they require special handling. Examples of flavor plugins include plain servers, Zookeeper ensemble members, and Docker swarm mode managers.
By separating provisioning of physical instances and configuration of applications into Instance and Flavor plugins, application vendors can directly develop a Flavor plugin, for example, MySQL, that can work with a wide array of instance plugins.
Active Monitoring and Automatic Reconciliation
The active self-healing aspect of InfraKit sets it apart from existing infrastructure management solutions, and we hope it will help our industry build more resilient and self-healing systems. The InfraKit plugins themselves continuously monitor at the group, instance and flavor level for any drift in configuration and automatically correct it without any manual intervention.
The group plugin checks on the size, overall health of the group and decides on strategies for updating.
The instance plugin monitors for the physical presence of resources.
The flavor plugin can make additional determination beyond presence of the resource. For example the swarm mode flavor plugin would check not only that a swarm member node is up, but that the node is also a member of the cluster. This provides an application-specific meaning to a node’s “health.”
This active monitoring and automatic reconciliation brings a new level of reliability for distributed systems.
The diagram below shows an example of how InfraKit can be used. There are three groups defined; one for a set of stateless cattle instances, one for a set of stateful and uniquely named pet instances and one defined for the Infrakit manager instances themselves. Each group will be monitored for their declared infrastructure state and reconciled independently of the other groups. For example, if one of the nodes (blue and yellow) in the cattle group goes down, a new one will be started to maintain the desired size. When the leader host (M2) running InfraKit goes down, a new leader will be elected (from the standby M1 and M3). This new leader will go into action by starting up a new member to join the quorum to ensure availability and desired size of the group.