The Future of BOSH

Jun 16th, 2015

I love BOSH. I have many conversations with people about where BOSH is going so I thought it a good idea to capture my thinking in a post. This post.

Two Options

1. BOSH deploys and manages all the things.
- It’s the best distributed-system-system I’ve ever used.
2. BOSH should be the deployer of Diego and other container schedulers.
- Diego et al deploy and manage all the things.
- Diego becomes something like Service Foundry.
- The world is a better place because non-12-Factor-app developers have a contract to develop to, and the fast-feedback of container-driven deployments.

I currently favour option 2 but reserve the right to change my mind as I learn over time. Previous attempts to build multi-purpose stateful/stateless PaaSs haven’t gone amazingly well. If you’re building a stateful service right now, build it for BOSH, not Diego.

What Next?

We’ll be exploring the cool new future of Diego in a LoPUG hack day and looking at ideas like a Diego CPI for BOSH, persistent disks, and TCP networking in PaaS. Join us to find out more.

The CloudCredo Way

Jan 5th, 2015

5th January, Blog #12

Today, January 5th, is the last day of the twelve days of Christmas, so this is the last blog in The Twelve Blogs of Christmas. I survived. I may not have saved the best for last, but I’ve certainly saved the most important. This post is all about people, and how to get people delivering value. Great people make great things. Great things make great profit, or so I’m hoping.

Explicit or implicit culture

This post is about growing the culture within my company, CloudCredo, and the processes we use. I’ve spoken to many senior people in technology who believe you shouldn’t tell developers how to develop, and should just let them get on with it. I firmly disagree with that opinion. I think a company should be clear about how it wants to deliver value, and it will then attract people that ‘fit’ its culture. If you don’t choose a way of working you are effectively saying “We let the senior staff bully the junior staff, code like cowboys, and do whatever they like.” But that’s just my opinion.

I presented most of this content at Extreme Programmers London and had a great response. I thoroughly recommend attending their meetups.

Programming The Extremely Pivotal Way

I have been massively influenced by “The Pivotal Way”; in fact, our working practices emulate it in most aspects. Imitation is the sincerest form of flattery. I had the tremendous good fortune to be invited to spend a few months in Pivotal’s Cloud Foundry Dojo (I believe I may have been the first person through it) during CloudCredo’s early days, which was an amazing experience. I’ve taken those learnings into CloudCredo and iterated on them.

We’re using the processes to deploy and develop agile platforms and infrastructure – pair-programming test-driven development with our clients. We use the Tracker-based planning process to empower authoritative product managers to prioritise bugs and features on multi-functional teams. You could call them ‘devops’ teams but I think that label has had all possible value eroded; they’re teams focused on delivering business value. You write it, you run it.

A process of feedback loops

I’m obsessed with fast feedback: I view the process as a set of shrinking feedback loops, making their way inwards towards the continuous feedback of pair programming, and expanding back out again. These loops are a good starting point until you find something that might work better. Have the courage to experiment. This content is severely abridged for brevity.

0. Pre-inception – qualify readiness to start the process

1. Inception – reach team consensus on a meaningful step forward, typically about three months

2. Iteration Planning Meeting – plan the next iteration, pointing

3. Standup – today’s work and pairs

4. TDD (for design, refactoring, confidence, and numerous other reasons)

5. Pairing (remove distractions, stop cowboy coding, skills transfer, and numerous other reasons)

4. Acceptance – by an empowered product manager, the ‘CEO of the product’

3. Standup – helps/interestings/blockers from yesterday

2. Retrospective – can we improve how we’re delivering? The first derivative of delivery.

1. Review – can we improve how we’re improving on delivery? The second derivative of delivery.

Why now?

Over the course of my career I’ve delivered many projects using a traditional “up-front architecture” approach, heavy on Gantt charts and light on prototypes. This was necessary when procuring servers (and the data centre space for them to live in) took months if not years. The cost of change for the infrastructure was massive; similar to changing the foundations beneath a skyscraper. The architecture needed to be right, and IT processes evolved around these assumptions.

The big change in our industry has been the increased agility of infrastructure. A six-month server procurement has become a thirty-millisecond container-starting API call. Infrastructure is now defined as code. The incredibly low cost of change means we can now use a process that embraces, rather than resists, change – and that process is Extreme Programming (XP).

Why not Six Sigma?

Six Sigma is a fantastic methodology for eliminating defects and minimising deviation. Six Sigma is easy to implement with software: cp -ra. It’s less relevant when developing and running new services, where learning is more important.

Why not Scrum?

Scrum and XP are both agile methodologies, although I believe XP has a greater focus on learning. XP embraces change during sprints/iterations, whereas Scrum favours commitments. XP is a more holistic process, recommending engineering practices such as TDD and pair-programming. As mentioned above, I see value in forming explicit opinions on these practices.

Why not Kanban?

Tracker-based planning is actually a software-development-domain specific implementation of Kanban. Blur your eyes and you can see the swim lanes. Achieving and maintaining stable delivery flow is very important.

Learning

As we’re building and deploying new systems and services we need to embrace learning as we make progress. We’re helping our clients to learn – and we’re using a process that facilitates learning and embraces change based on that learning. Between competing organisations, the organisation that can learn fastest, and channel its efforts in the right direction, will win. We learn through feedback loops such as PDCA and OODA, both developments of the scientific method.

The focus of XP is learning. Learn about what you’re building so you can adjust your plans as you find out what’s good and what’s bad about what you’re building. Learn about how you’re building it; what’s working and what needs adjustment. Use emergent velocity for data-driven planning.

Supporting processes

There are a number of patterns we’re using to support our chosen methodology. Here’s a small selection:

Continuous Delivery – start with a strawman (a test, a repository, a CI server) and continue delivering from there
Microservices – people can’t solve big problems: divide and conquer
Release trains – continuously deliver all the things into an integration pipeline. If you want to ‘tag’ a release, run a release train through the last known good versions from the pipeline. Choo choo!
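As a rough sketch of the release-train idea above: if the pipeline records which build of each service last passed, tagging a release is just a matter of collecting those versions. The `Build` record and the `last_known_good` helper are names I've made up for illustration; they are not part of any CI tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """One pipeline run of one service (hypothetical record layout)."""
    service: str
    version: int
    passed: bool


def last_known_good(builds):
    """Return {service: highest passing version} to put on the release train."""
    train = {}
    for build in builds:
        if build.passed and build.version > train.get(build.service, -1):
            train[build.service] = build.version
    return train


history = [
    Build("api", 41, True),
    Build("api", 42, False),    # latest api build is red, so the train skips it
    Build("billing", 7, True),
    Build("billing", 8, True),
]
release = last_known_good(history)
```

Here `release` pairs each service with its newest green build, so a red build at the head of the pipeline never blocks the train.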

Openness and communication

This information raises an obvious question: if this is CloudCredo’s secret sauce, why would I blog about it publicly? This again comes back to XP, and two of XP’s core values – courage and communication. I have the courage to lay CloudCredo’s approach bare for all to see. If we’re not working in the right way, or there are flaws, I hope they will be pointed out so we can improve. We’re continuously trying to improve. Please get in contact if you’ve got any feedback.

If you’ve been thoroughly confused by the UK Garage links at the end of all the Twelve Blogs of Christmas, well, maybe it’s a London thing.

How I Build Stuff

Jan 4th, 2015

4th January, Blog #11

This is likely to be the most nonsensical of my thoroughly nonsensical Twelve Blogs of Christmas. I wanted to capture the recurring themes of the processes I’ve employed when bringing software to life; from inception to massive scale. Everything here should be overridden by domain-specific knowledge/concerns as appropriate. Or ignored altogether :)

Don’t build anything

If you’ve had an idea for something, it’s probably already been done. Go and have a look. If your idea hasn’t been done, it’s quite likely you can reskin/resell/repackage/repurpose a current service to deliver the new service you’re hoping people will find valuable.

Start at the top, not the bottom

If your idea hasn’t been developed before, look for two or more currently running services you can combine to create the value. Use SaaS. If you have to write software, host it in a PaaS that does as much heavy lifting as possible. Don’t start with infrastructure and build upwards. You don’t have time, and you’ve got better things to do.

Get feedback from day 1

Get meaningful feedback about your service as soon as possible. Your idea is probably stupid, and you should stop (or pivot). That’s the harsh reality of the situation. Continue getting feedback about your decisions as fast as possible and as frequently as possible. Your ability to gain and act on feedback will be the key determinant of the success or failure of whatever you’re building. Read ‘The Lean Startup’. Cucumber-value is a rubbish piece of software, but should make you think about how you can quantify and measure real metrics for success.

Use small teams that ‘own’ services with business value

Business services are owned by small teams. Teams own one or more (micro?) services. Use Domain-Driven Design as a guide for breaking large problems into smaller services. Don’t overly engineer towards microservices; let them emerge, and break them down further at a later date. Don’t be afraid to sacrifice your architecture if it needs replacing; you’re learning. Hire people that learn fast; they can gain knowledge from others in the team. Use ‘The Pivotal Way’ – lightweight XP. Initially create a Git repository per service; shared code will emerge into new repositories. Use GitHub or BitBucket – you’re not a Git hosting service (unless that’s your idea, and if it is, it’s already been done).

Deploy to somebody else’s Cloud Foundry

Deploy two (or more) Cloud Foundry instances per (micro)service. Use run.pivotal.io, Anynines, Fjord IT, AppFog, Bluemix, or any other hosted Cloud Foundry (not your own). Write and deploy your (micro)services quickly, in languages you can develop quickly in: we like Go and Ruby. If you need to develop in a less agile language as requirements emerge you can do that later. Don’t be afraid to use a range of languages; the cost of doing this is mitigated by the PaaS abstraction. Kill your (micro)services and create new ones as necessary.

Start with Continuous Delivery and continue to deliver

Begin with a test for ‘Hello World’, an app that outputs ‘Hello World’ in your chosen language/framework, and a CI server that can run your test then deploy the working software to Cloud Foundry. Iterate from there. Use Travis or Circle until you need your own Jenkins/GoCD/Concourse solution.
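To make the starting point concrete, here is a minimal sketch of a ‘Hello World’ app and its test, written in Python purely for illustration (the post itself favours Go and Ruby). The name `hello_app` is an assumption; the app follows the standard WSGI calling convention, and a CI server would run the test before deploying.

```python
import unittest


def hello_app(environ, start_response):
    """Minimal WSGI application that always answers 'Hello World'."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]


class HelloWorldTest(unittest.TestCase):
    def test_says_hello(self):
        statuses = []
        # Call the app directly with an empty environ and a stub callback.
        body = hello_app({}, lambda status, headers: statuses.append(status))
        self.assertEqual(statuses, ["200 OK"])
        self.assertEqual(b"".join(body), b"Hello World")
```

Everything else (routes, data, services) can then be added one failing test at a time.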

Log problems and measure latency

Log when things go wrong. Measure the latency (time from request to response) when things go right. Use Papertrail for the logs. Use New Relic for the metrics. Replace Papertrail with Logsearch and New Relic with Graphite when you need to.
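A minimal sketch of that rule, assuming a plain Python handler function: failures are logged, successes are timed. The `measured` decorator and the `latencies_ms` list are illustrative stand-ins; in practice the log line would go to your log service and the timing to your metrics service.

```python
import logging
import time
from functools import wraps

logger = logging.getLogger("service")
latencies_ms = []  # stand-in for a metrics client (assumption)


def measured(handler):
    """Log failures; record request-to-response latency on success."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        try:
            response = handler(*args, **kwargs)
        except Exception:
            # Things went wrong: log with traceback, then re-raise.
            logger.exception("request to %s failed", handler.__name__)
            raise
        # Things went right: record how long the request took.
        latencies_ms.append((time.perf_counter() - started) * 1000.0)
        return response
    return wrapper


@measured
def greet(name):
    return f"Hello {name}"
```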

Use JSON templates

Use JSON to communicate between your (micro)services. Use JSON templates as lightweight contracts between (micro)services. Use HTTPS with circuit breakers for synchronous communication. Use Redis or RabbitMQ for asynchronous communication. If your code can’t get to Redis/RabbitMQ/another HTTP endpoint – die quickly rather than tying up resources.
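The circuit breaker mentioned above can be sketched in a few lines. This is a simplified illustration under my own assumptions (a consecutive-failure threshold and a fixed reset interval), not the behaviour of any particular library; `CircuitBreaker` and `CircuitOpenError` are hypothetical names.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Die quickly rather than tying up resources.
                raise CircuitOpenError("circuit open")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Wrapping each outbound HTTPS call in `breaker.call(...)` means a dead dependency costs one fast exception per request instead of a hung connection.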

Lightweight data

Start with JSON in Redis. Move to JSON in MongoDB when you need to query the data with more flexibility. Place your BLOBs in AWS S3. Once you know whether your data – per (micro)service – needs to be consistent or available you can change data store as required. Cassandra is a great available data store. Postgres is a great consistent data store (if you can’t afford Oracle RAC, which you can’t). If you’re generating huge quantities of events throw them into Hadoop.

Mutable state is the enemy

Generate events – immutable statements of fact – based upon actions. Simple, immutable, repeatable JSON events are a great start. Store the events in an event log and use them to mutate the data stores powering consuming services – separating commands from queries.
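Here is a small sketch of the pattern, assuming an in-memory list in place of a durable event log. The event type `donation_received` and the functions `record` and `totals_by_donor` are invented for illustration. Commands only append facts; queries rebuild their view by replaying them.

```python
import json
from collections import defaultdict

event_log = []  # append-only; a stand-in for a durable log (assumption)


def record(event_type, **facts):
    """Command side: append an immutable JSON statement of fact."""
    event_log.append(json.dumps({"type": event_type, **facts}, sort_keys=True))


def totals_by_donor(log):
    """Query side: rebuild a read model by replaying every event."""
    totals = defaultdict(int)
    for line in log:
        event = json.loads(line)
        if event["type"] == "donation_received":
            totals[event["donor"]] += event["pence"]
    return dict(totals)


record("donation_received", donor="alice", pence=500)
record("donation_received", donor="bob", pence=250)
record("donation_received", donor="alice", pence=100)
```

Because the log is never edited, replaying it always yields the same read model, and a new consuming service can build its own data store from the same events.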

Customise Cloud Foundry when you need to

Customising Cloud Foundry is remarkably easy; do it when requirements demand. Adding buildpacks, services, even stacks (I added Docker) is straightforward. Don’t start with Cloud Foundry customisation but don’t be afraid to take the step if you need to. Deploy to your own IaaS or even your own metal if security/regulation/performance concerns necessitate it.

Go fast and fail faster

In summary: get fast feedback about the software you’re bringing to life. Fail fast if you’re doing the wrong thing. Don’t over-engineer; do what works, deliver quickly. Realise the opportunity cost of your (and your team’s) time. Could you be building something more valuable instead?

Christmas Garage! This one’s relatively recent. And good.

The Problems With PaaS

Jan 3rd, 2015

3rd January, Blog #10

Platform-as-a-Service (PaaS) isn’t perfect. There are always going to be some things it does well and some things it does badly. This post takes a look at some of the things it does badly, and how we can make improvements in the future.

Stateful services

This excellent article highlights some key points related to problems with PaaS. I absolutely agree that the data service journey with current implementations is painful. I’ve written about CloudCredo’s plans for “Service Foundry”; this is work-in-progress and needs to be an area of greater focus. Stateful services need to become first-class citizens in the PaaS landscape.

Maintenance troubles

I also agree that maintaining PaaSs is currently too difficult. I think this is a symptom of two separate issues. Firstly, the current crop of configuration management tools (Chef/Puppet/Salt/Ansible etc.) are not fit for the purpose of deploying and maintaining distributed systems, such as PaaSs. Secondly, BOSH is the right tool for the job but currently has a difficult user journey. CloudCredo have invested time and effort attempting to make BOSH easier to consume – but again it’s another work-in-progress. We need to get better at making PaaSs easier to operate, maintain, and upgrade.

Secure networking

Another good blog post highlights how networking concerns can block PaaS adoption. Since the writing of that post a couple of advancements have been made. There is now an easily consumable BOSH release to enable encrypted network traffic of any BOSH-deployed service – although we should always question how secure our encryption is. There is also user-configurable networking inside Cloud Foundry. I believe these additions go a long way towards mitigating user concerns, but I’d certainly be interested in further feedback related to PaaS networking.

Transparency

The greatest strength of PaaS is that it’s a black box for running your applications; it allows developers to focus on delivering value rather than operating a platform. The greatest weakness of PaaS is that it’s a black box for running your applications; when things go wrong it can be difficult to work out what’s happening. If your application is performing poorly on Heroku, what do you do next? Spend more money and hope? Cloud Foundry’s new Firehose generates huge volumes of information but can prove difficult to consume for PaaS novices. Buildpack integration with monitoring systems is clearly helpful but we could still make enhancements in this area.

Let’s keep PaaS-bashing

PaaS will only improve if we identify and expose the flaws. We need more users, more critiques, more real-world scenarios. Please get in contact if there are any burning issues blocking your adoption of PaaS.

Christmas Garage: London – Stand Up Tall!

Service Foundry

Jan 2nd, 2015

2nd January, Blog #9

As I’ve mentioned a few times, Cloud Foundry has won the Platform-as-a-Service (PaaS) war for stateless 12-Factor applications. Stateless PaaS will now be a Cloud Foundry distro war. We all know mutable state is the root of all evil so I’ve been thinking about a stateful CF-like PaaS since I first interacted with Cloud Foundry. In fact, Cloud Foundry version 1 had an interesting service integration, which served to highlight the gaps in the PaaS journey for stateful application developers. I tweeted a while back about some of the work we’re doing in this area; this post will expand on our plans for ‘Service Foundry’ – a PaaS for stateful deployments.

Implementation

Docker/Rocket for application packaging
Diego with mutexes for scheduling
Etcd/Consul (reused from Diego’s deployment) for service discovery
Weave to provide inter-container networking of clustered state
Flocker for state relocation/replication
BOSH-IPSEC for security of data in-flight
HAProxy or custom Go router for non-HTTP routing
Tunable sync/non-sync disk IO
Expose STONITH semantics to users
CF-like API for management
TBC: Kubernetes for clusters (pods) of containers

Potential

It’s interesting to consider that you could deploy Cloud Foundry itself in such a system. Does this actually look like a manifesto for BOSH version 2? Or does this look more like OpenShift v3? The Kubernetes overlap certainly suggests something similar to Red Hat’s latest IaaS+ effort.

Why is this useful to users?

I’ve given talks suggesting that microservices and PaaS are the future of application development. Cloud Foundry makes it incredibly easy to be tall enough for stateless microservices. I want users to have a similarly easy journey for the state powering their microservices.

If you’d like to help with this effort please do get in contact with CloudCredo.

Christmas Garage!

Mutable State

Jan 1st, 2015

1st January, Blog #8

I enjoy attending developer-focused conferences – in particular QCon and GOTO – to talk to the real consumers of PaaS. I also attend infrastructure-orientated conferences as a consumer of IaaS. I was recently on the PaaS panel at the Apache CloudStack Conference when I made what seemed a bizarre statement to many in the audience: “If your problem isn’t mutable state then you’re probably doing something wrong.” I’m taking this opportunity to explain what I meant.

Mutable state is the problem

I’ve actually been making statements in this vein for a long time. In my experience, when delivering solutions, you can usually find a correct answer to your problem, given time to research and engineer sufficiently. The majority of optimisation problems have occurred when I’ve been facing issues related to mutable state. These generally involve functions related to latency, consistency, availability and performance. The CAP theorem is perhaps the most frequently occurring: choose availability or consistency in the face of network partitions in a distributed system. You cannot have both.

Development

Approaches to mutable state vary across programming paradigms. Object-orientated programming favours encapsulating mutable state in objects; usually controlling concurrent access to the mutable data via mutexes or semaphores. In my experience this can lead to heavy performance issues when scaling. Functional programming favours having no mutable state; instead passing immutable values between functions. This causes different issues as it shifts the burden to the developer to model their data – in what can sometimes be an unnatural way.
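The two approaches can be put side by side in a small sketch. Python is used here only for illustration, and `Counter`, `Tally` and `increment` are names chosen for this example. The first style guards shared state with a mutex; the second never changes a value once it exists.

```python
import threading
from dataclasses import dataclass, replace


class Counter:
    """Object-orientated style: mutable state guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:  # every caller contends for this lock
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


@dataclass(frozen=True)
class Tally:
    """Functional style: an immutable value."""
    value: int = 0


def increment(tally):
    """Return a new Tally; the argument is never modified."""
    return replace(tally, value=tally.value + 1)
```

The lock in `Counter` is where contention appears under load; the cost of `Tally` is that the caller has to thread each new value through the program.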

Infrastructure

Mutable state in Infrastructure-as-a-Service can also cause issues. VMware’s various infrastructure offerings favour a consistent view of the infrastructure landscape, often using Oracle or MSSQL as an ACID-compliant database for consumers to rely upon. This makes the API easy to consume but difficult to scale to huge levels for producers. Consumers can rely on their mutations being immediately reflected across the API.

AWS EC2 offers an eventually consistent view of their landscape. This means EC2 can be delivered at a truly unprecedented scale but can cause issues for platform-level consumers. Intelligence must be built into the tooling that consumes the API, as large consumers of EC2, such as Netflix, have done.
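One simple form of that intelligence is polling with backoff: after creating a resource, keep asking until the API admits it exists. The sketch below assumes a generic `describe` callable in place of a real cloud SDK call; `wait_until_visible` and its parameters are hypothetical.

```python
import time


def wait_until_visible(describe, resource_id, attempts=5, base_delay=0.5,
                       sleep=time.sleep):
    """Poll an eventually consistent API until a new resource shows up.

    `describe` is any callable returning the resource or None; it stands in
    for a real cloud API call (assumption).
    """
    for attempt in range(attempts):
        resource = describe(resource_id)
        if resource is not None:
            return resource
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise TimeoutError(f"{resource_id} not visible after {attempts} attempts")
```

Passing `sleep` in as a parameter keeps the helper testable without real delays.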

Platform

At a Platform-as-a-Service (PaaS) level we have tended to deal with the issue of mutable state by abdicating it to external services. Stateless application hosting is a done deal – Cloud Foundry has won that battle. Stateless application developers can choose to locate their data in whichever external service best suits the data; this is the polyglot approach embraced by PaaS. The real issues arise when we attempt to provide a Cloud-Foundry-like journey for developers and operators around stateful services.

An increasing number of PaaS developers appear to have become preoccupied with scheduling. Scheduler debates are so hot right now. No PaaS conference would be complete without an Omega-style optimistic scheduler versus Mesos-style pessimistic scheduler conversation. In my experience scheduling performance is rarely the constraint blocking PaaS adoption: the constraint is usually dealing with the necessary mutable state generated/consumed by the applications running in the PaaS.

Two PaaSs to rule them all?

Red Hat recently stated that dividing stateful and stateless services “is an arbitrary distinction”. I don’t agree with this perspective; I think it’s a very important distinction. There are some key issues here – if an application container appears to have stopped functioning, what action should the PaaS take? If the container is stateless the PaaS can request a new container be started: potentially creating a duplicate. If the container is stateful the choice is more complicated. Should a new container be started, maximising service availability, but risking split-brain scenarios in a stateful environment? Should the container remain offline, reducing availability but ensuring data consistency? Should the container be restarted following a successful fencing operation? What does fencing look like in a distributed, scheduled, containerised PaaS environment? These questions may lead to a separate breed of stateful PaaSs emerging focusing on stateful concerns, even if they share similar-looking APIs.

cf push mysql/mysql --disk 500 --restart-policy consistent --disk-sync immediate --io-performance high
cf push redis/redis --disk 100 --restart-policy available --disk-sync defer --io-performance medium
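The restart-policy flag in those imagined commands implies a decision a stateful scheduler would have to make. As a toy sketch only (no existing scheduler is being described, and `RestartPolicy` and `action_for_unresponsive` are invented names), the choices discussed above might be encoded like this:

```python
from enum import Enum


class RestartPolicy(Enum):
    AVAILABLE = "available"    # favour uptime, accept split-brain risk
    CONSISTENT = "consistent"  # favour data safety, accept downtime


def action_for_unresponsive(stateful, policy, fenced):
    """Decide what a scheduler might do with an unresponsive container."""
    if not stateful:
        return "start-replacement"  # a duplicate stateless container is harmless
    if policy is RestartPolicy.AVAILABLE:
        return "start-replacement"  # maximise availability
    # CONSISTENT: only restart once the old instance is known to be fenced.
    return "start-replacement" if fenced else "stay-offline-until-fenced"
```

The hard part, as the questions above suggest, is not this branch but establishing `fenced` reliably in a distributed, containerised environment.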

BOSH

The “Two PaaS” debate has led to some people perceiving Cloud Foundry as a stateless PaaS and BOSH as a stateful PaaS. I believe this is an incorrect interpretation. I think BOSH is a fantastic system – it has had a greater influence on me than Cloud Foundry itself – but it is not a PaaS. BOSH is the purest embodiment of the principles of Infrastructure as Code: it fires up metal, lays down code, and attaches disks for state. It is not, in its current form, a scheduler in the manner of Mesos/Omega. BOSH is the world’s best deployer of schedulers and distributed systems. A stateful PaaS would look more like Diego/Lattice, modified to address the concerns above. BOSH would be a great way to deploy this new PaaS.

Christmas Garage!

Multi-Site Cloud Foundry

Dec 31st, 2014

31st December, Blog #7

CloudCredo’s clients often ask about how to run Cloud Foundry across multiple sites for performance and resilience. This is a natural question to ask: as an application developer, if I specify that I need to have ten instances of my application available, I expect ten instances to be available – whether the underlying infrastructure is available or not. The Platform-as-a-Service (PaaS) abstraction implies that my application should continue to run as desired, and that the PaaS should be designed to handle failures in the infrastructure layer.

Choosing a CAP

This idea sounds great in theory but can lead to some problems in practice. What should Cloud Foundry do, as a distributed system, in the event of a network partition? Should each partition converge on the desired state of the whole system, leading to twice the number of applications being online, and potential split-brain issues with singleton applications? Should the whole Cloud Foundry shut down to ensure application consistency? The key point to note here, which is not immediately obvious, is that whilst Cloud Foundry hosts stateless applications, it is actually a stateful system itself. This means the CAP theorem must be obeyed; we can opt for consistency or availability in our Cloud Foundry deployment.

Cloud Foundry has, at its core, a Cloud Controller database maintaining the desired state of the system. If we maintain a single, consistent state of this database we will be running a CP (consistent, partition-tolerant) Cloud Foundry. If we allow multiple, divergent copies of this database we will be running an AP (available, partition-tolerant) Cloud Foundry.
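A toy model makes the trade-off visible. It assumes, for simplicity, that an AP deployment lets every partition converge on the full desired count, while a CP deployment only acts in the partition holding a majority of the state-store members. The function name and this behaviour are illustrative assumptions, not a description of how Cloud Foundry is implemented.

```python
def instances_after_partition(desired, members_per_partition, mode):
    """Instances each partition tries to run after a network split.

    members_per_partition: state-store members visible in each partition.
    mode: "AP" (every partition acts) or "CP" (only a quorate one acts).
    """
    total = sum(members_per_partition)
    running = []
    for members in members_per_partition:
        quorate = members > total / 2  # strict majority of all members
        if mode == "AP" or quorate:
            running.append(desired)
        else:
            running.append(0)
    return running


ap = instances_after_partition(10, [2, 1], "AP")
cp = instances_after_partition(10, [2, 1], "CP")
```

With ten desired instances and a 2/1 split, the AP model runs ten on each side (twenty in total), while the CP model runs ten on the majority side and none on the other.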

Consistent Cloud Foundry

Cloud Foundry’s current engineering direction seems to favour a consistent view of the desired application state. This is exhibited by the choice of Raft as a consensus algorithm, and Etcd as an implementation, for CF’s next generation Diego components. These choices pose some difficult questions:

Do we suffer thundering herd issues in the event of a network partition, where all desired instances of all applications attempt to run on the nodes with access to the quorate Etcd partition?
How do we ensure data services, required by the applications, are accessible from the functioning side of a partition?
How do we ensure latency to data services from across multiple sites does not reduce application performance below acceptable levels?
What happens if the Cloud Foundry components themselves experience issues, degrading the service as a whole across all sites?

Running a consistent Cloud Foundry makes application deployment and management very easy; there is a single API endpoint for management. At CloudCredo we deploy consistent Cloud Foundry installations across multiple availability zones within a single region. This mitigates the latency concerns and provides a convenient management structure for application deployment.

Available Cloud Foundry

When I’ve needed to deploy Cloud Foundry across multiple regions it has usually been to provide for a very high availability service level. To provide for this I have deployed multiple, completely separate Cloud Foundry installations, often on heterogeneous infrastructure providers, in diverse regions. An example of this kind of installation would be the donations platform for Comic Relief. The huge benefit of this strategy is that no outages in any individual Cloud Foundry can stop the service – availability is maximised. The two major downsides are that deployment and orchestration of applications becomes significantly more complex, and that the applications and data services need to be developed to handle being deployed in a distributed system of this nature.

The Perfect Cloud Foundry?

I’m currently deploying multi-zone ‘consistent CF’ – and then multi-region ‘available CF’ if requirements demand it. This is a domain-specific choice, and it takes a significant amount of work in the application (particularly around state) to bring multi-region availability.

Christmas Garage!

PaaSaaP and the Distro Wars

Dec 30th, 2014

30th December, Blog #6

Cloud Foundry has won the PaaS war. Heroku blazed the trail with their PaaS for Twelve-Factor Apps, and VMware/Pivotal have built an open-source implementation and ecosystem to provide both enterprises and developers with an obvious choice. Buildpacks have become the standard for translation from application code to runnable unit.

The question is rapidly changing from “which PaaS are you using?” to “which distribution of Cloud Foundry are you using?”. Just as Hadoop became synonymous with big data so Cloud Foundry has become PaaS. The emerging battle is between the distribution vendors; Pivotal, IBM, HP and ActiveState already have offerings in the market. I’m sure we’ll see more from other players as the Cloud Foundry Foundation gains momentum.

The other fascinating aspect of the development of the Cloud Foundry ecosystem is ‘Platform as a Service’ versus ‘Platform as a Service as a Product’ (PaaSaaP). Some vendors are offering Cloud Foundry as installable, supported software – where the onus is on the customer to deploy the software to their chosen infrastructure in order to provide a service. Other vendors are deploying and running Cloud Foundry on behalf of their users to provide a true PaaS experience. A few vendors are offering both. Some PaaS purists have denigrated Cloud Foundry for offering this flexibility, but I see this as one of Cloud Foundry’s greatest strengths. Developers can minimise Time-to-Value by deploying quickly to a vendor’s cloud-based solution – and then deploy Cloud Foundry to their own infrastructure when non-functional requirements emerge to make a custom deployment necessary.

We will also see domain-specific Cloud Foundry implementations for particular markets. The ‘core’ Cloud Foundry specification, provided by the CF Foundation, will provide a key set of capabilities and an API for developers to work against – but we will see extensions providing for additional requirements and innovation. Time will tell which flavours of Cloud Foundry are successful and which are left behind.

Christmas Garage!

Twelve Factor Enterprise

Dec 29th, 2014

29th December, Blog #5

The big paradigm shift for developers using Platform-as-a-Service (PaaS) is understanding “The Twelve-Factor App”, a set of patterns developed by the team at Heroku enabling applications to be orchestrated and distributed at scale. By adopting these patterns developers can take advantage of PaaS, via services such as Heroku and Cloud Foundry, reducing their operational responsibilities so they can focus on delivering value.

I’ve had a long-running debate with various members of the PaaS community about 12-Factor’s relevance to enterprises. I’ve heard many claims that enterprises don’t want to adopt these patterns, and would rather mix their state, config, and application together as a tangled ball of mud. At a board level the enterprises I’ve worked for, and interacted with, have been seeking organisational agility – the kind of fast delivery and iteration PaaS brings to software development. Somehow this message often gets lost in middle management, leading to a resistance to change and clinging to legacy practices like a safety blanket.

We need to stop telling enterprises that modern patterns, such as 12 Factor and Microservices, won’t work for them. We need to help enterprises to lower their time-to-value and increase their operational efficiency. The only winners from keeping enterprises stuck in the dark ages are the incumbent vendors, happy to continue charging extortionate prices for outdated systems and software.

Enterprises can adopt 12 Factor, Microservices, and PaaS. They want to be more agile, not less. Let’s help them.

Christmas Garage!

Containers as a Service

Dec 28th, 2014

28th December, Blog #4

At a simple level, Infrastructure-as-a-Service deals with virtual machines, Platform-as-a-Service deals with applications, and Software-as-a-Service deals with users. I’ve spoken a few times about Containers-as-a-Service (CaaS) – the idea that containers will become a new meaningful unit of currency in cloud computing. I’ve also written about the ramifications of CaaS for PaaS.

The hottest debate in the container ecosystem seems to be whether to use containers as single processes or multi-process virtual machines. The Docker folks seem to encourage the use of single process containers – a view I’m supportive of, following principles from software development, such as single responsibility and inversion of control. The alternative approach is to use containers as virtual machines with many processes. The virtual machine approach seems to resonate well with the current generation of operating systems and configuration management tooling.

I believe both approaches will continue to thrive for some time. The true value in Docker lies in the packaging and portability of containers, and this holds true whether you’re packaging filesystems for a single process or a number of processes. Containers used as virtual machines will quickly erode the IaaS market – if only for their portability, speed, and density. Consumers and producers can gain benefits when choosing the CaaS abstraction over IaaS.

Over time there will be a move towards per-process containers, away from virtual machine-like containers, as the deployment and orchestration benefits from this approach become clearer. Just as SOLID and TDD took time to gain momentum in software development, so the correct patterns for containers will gain traction over time.

We will also see increasing numbers of customised operating systems, designed to run with the Linux kernel but providing a subset of the functionality, tuned for single-purpose containers. OSv is a great example of this. I also think we’re likely to see an increasing number of container host OSs, presenting something that looks like a Linux kernel to hosted containers, under development. The boundaries between hypervisor and namespaced kernel will blur.

The other main areas for innovation will be storage and networking. Providing containers with reliable storage for mutable state is challenging. The ClusterHQ folks have made great progress in this area using ZFS-on-Linux in Flocker. There are also exciting developments in the networking space from SocketPlane and Weave – simplifying multi-host container networking.

CaaS solutions have already emerged from the big cloud players; you can easily host containers on GCE, AWS, and Azure. Rancher looks to enable CaaS on any infrastructure. Pivotal’s Lattice enables operators to deliver their own CaaS, as does its big brother Cloud Foundry. These solutions are currently largely focused towards Docker but I’m sure we’ll see Rocket and alternatives emerge.

Whilst the future may look bright for containers, there are some warning signs. Multi-tenant container isolation and security are currently far from production ready. It also seems that choosing a layered filesystem beneath your containers can be a risky business; I’ve been bitten by a few bugs.

I think CaaS will grow exponentially over the next few years. Tutum are building the first pure CaaS I’ve seen. They attracted $2.65M in seed funding this year. I’m sure they’ll have plenty of competition in the near future.

Christmas Garage!
