Hatofmonkeys

Opinions on building platforms

The Heterogeneous Workforce

27th December, Blog #3

The Heterogeneous Workforce

I enjoy regularly visiting and speaking at tech conferences. They’re a great way to stay abreast of trends and keep a fresh perspective on how the industry is delivering value. Unfortunately I’m usually dismayed at the homogeneous composition of the conference attendees; the overwhelming majority are white males, like myself.

As a company CEO I see this as a tremendous waste. The key role I play in the company is the recruitment and retention of the members of our great team. Enlarging the pool of talent I can draw from will only increase the quality of the team and the diversity of opinions, and help create a balanced working environment. I’m not trying to get on my moral high horse; I want a better team so I can make more money.

If you’re interested in PaaS, distributed systems, and Extreme Programming, please do contact CloudCredo – especially if you don’t ‘fit the mould’. We’re looking for fast learners and good communicators. I’d love to think there’s a pool of untapped skills out there I can use to help build our team. I can’t promise to run the perfect company, but I can promise to try.


More Christmas UKG!

Cloud Foundry Summit 2014

26th December, Blog #2

I wrote this blog back in June but didn’t have time to publish it. No time like the present.

Cloud Foundry Summit 2014

Now the dust has settled following the CF Summit I thought it a good time to note some of the lasting impressions.

CF Summit was a great conference. The buzz and energy around the place was tangible. This year’s host, Andrew Clay Shafer, gave the event a friendly, personable feel, and also has (had) great hair. The big hitters from the Cloud Foundry community were present, along with newer users from a diverse range of organisations.

Personally, I felt the Summit represented the emergence of Cloud Foundry as the leading enterprise PaaS. I backed Cloud Foundry from day one, and I’ve been vocal about where other PaaSes fall short. Witnessing some of the biggest names in tech queuing up to take the stage and endorse Cloud Foundry was immensely gratifying. The corollary to those endorsements was the quantity of real-world Cloud Foundry success stories in the talks; Cloud Foundry has broken through and proven its worth.

One of the chief reasons for the dramatic rise in enterprise adoption of Cloud Foundry has been the formation of the CF Foundation. The Foundation members were well represented at CF Summit, from small innovators such as CloudCredo, to tech giants such as IBM and Cisco. It’s fantastic to see the Cloud Foundry ecosystem crystallise as a Foundation to move the project forward.

The noticeable trend amongst the Summit attendees was big businesses realising the importance of speed. Canopy are a shining example of this: formed from large enterprises, yet able to make use of outstanding new technologies quickly and effectively. I fear for their competition; it’s striking when organisations deliver capability decisively at this scale.

We were overjoyed with the Summit from a CloudCredo perspective. The ecosystem’s response to our work with them was fantastic. We are proud our work was mentioned in the following talks:

We’d love to be able to add your organisation’s name to that list. Please get in contact to talk about how we can help you deliver with Cloud Foundry.


Christmas Garage!

The Twelve Blogs of Christmas

25th December, Blog #1

Twelve blogs, twelve days

I’ve got a backlog of half-formed blog posts (backblog?) that have been hanging around, like bad smells, for a while. I’ve decided to get them all done and out over Christmas. I’m cheating by imposing the following restrictions:

  • This is the first blog and counts as a blog post even though it’s a meta-post
  • I reserve the right to post days, or even weeks, late. I’m in Sark for some of Christmas and I will be burned alive in a wicker man if I’m seen using a laptop
  • I reserve the right to post nonsense due to Christmas-related alcohol abuse
  • I reserve the right to blame alcohol for nonsense I’ve posted while sober
  • I have about eight posts lined up, most of which are nonsense
  • The remaining three posts are TBC and are guaranteed to be nonsense
  • I reserve the right to edit/delete all blogs at a later date once I realise how nonsensical they are

Random asides

I’ve noticed the vast majority of my traffic is from the USA. As part of my single-handed mission to bring back UK Garage, and further its influence in the world, I will be linking to some of the UKG greats at the end of each blog post.

Bo selecta!

Let the blogging commence!

Articles

  1. The Twelve Blogs of Christmas
  2. Cloud Foundry Summit 2014
  3. The Heterogeneous Workforce
  4. Containers as a Service
  5. Twelve Factor Enterprise
  6. PaaSaaP and the Distro Wars
  7. Multi-Site Cloud Foundry
  8. Mutable State
  9. Service Foundry
  10. The Problems With PaaS
  11. How I Build Stuff
  12. The CloudCredo Way

Docker, Investors, and Ecosystems

Rocket

As you’re probably aware by now, CoreOS have released a new container runtime called ‘Rocket’. Rocket will inevitably be perceived as a reaction, and competitor, to Docker. The language in CoreOS’s Rocket blog post suggests Docker have deviated from CoreOS’s preferred path, and the tone carries an accusation of betrayal. The timing of CoreOS’s announcement, the week before DockerCon EU, implies a deliberate attempt to undermine Docker’s marketing.

Reaction

GigaOM have published a great summary of the community’s reaction here. Given Docker’s amazing ecosystem and adoption levels I’m actually quite surprised more people haven’t come out in strong support of Docker. Solomon Hykes has exactly three things to say (followed by a subsequent thirteen) here.

The Changing Docker

I spoke at DockerCon 2014 about the Cloud Foundry/Docker integration project I’d been working on. This work has now been subsumed into the Docker on Diego work which is beginning to form the core of Cloud Foundry V3. While working on the initial proof of concept I had regular communication with the people at Docker. I found them friendly and open to the idea of using Docker containers as another deployable unit within Cloud Foundry.

Once Docker raised their $40M investment, the tone changed. Docker became a ‘platform’. Docker’s collaboration with Cloud Foundry, to use Docker inside Cloud Foundry, seemed to stall. It appeared Docker were trying to eat their ecosystem. Was investor pressure for a huge return causing Docker to try to capture too many markets rather than focusing on their core?

Cloud Foundry Future

Cloud Foundry will continue to orchestrate various units of currency: applications via buildpacks, containers via Docker, and potentially containers via Rocket. My company, CloudCredo, is already looking at what a Rocket integration with Cloud Foundry would look like. cf push https://storage-mirror.example.com/webapp-1.0.0.aci is on the way.

Configuration Management isn’t Stupid, but it Should Be

Devops is about holistic, systems-orientated thinking; it’s been misappropriated to be about configuration management. I’ve noticed a decline in the number of people from a development background at Devops conferences – maybe they’ve lost interest in talking about Puppet vs Chef vs Salt vs Ansible vs CFEngine vs X vs Y vs Z?

Bricks

The role of infrastructure should be to provide reliable, consumable bricks to enable innovation at higher levels. If we create beautiful, unique, novel bricks it becomes impossible to build houses. This is a problem I see regularly with OpenStack deployments; they’re amazing, wonderful, unique, organic. I cannot, however, easily deploy platforms to them.

Configuration Management

This issue manifests itself in deployments orchestrated by configuration management.

  • Complexity – you can do some amazing things in Chef because you have the power and flexibility to run arbitrary Ruby during your deployments. I have seen this abused many times – and have done it myself. Too much clever branching logic and too little reliable code
  • Determinism – configuration management tools often provide a thin veneer over non-deterministic operating system commands (see the sketch after this list)
  • Reproducibility – server scaling operations often fail due to poor dependency management and non-deterministic actions
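
To make the determinism point concrete, here is a minimal Python sketch. It is purely illustrative, not how Chef, Puppet, or any real tool implements resources, and the config file path and line are hypothetical. The first function is a thin veneer over an OS-level action and produces a different result on every run; the second observes the current state and only acts to close the gap, so repeated runs converge on one state.

```python
from pathlib import Path

CONFIG = Path("/etc/myapp.conf")   # hypothetical configuration file
LINE = "max_connections = 100"     # hypothetical desired setting

def naive_apply() -> None:
    """Thin veneer over an OS action: blindly appends, so every run changes the file again."""
    with CONFIG.open("a") as f:
        f.write(LINE + "\n")

def convergent_apply() -> None:
    """Observe the current state and only act if it differs from the desired state."""
    current = CONFIG.read_text() if CONFIG.exists() else ""
    if LINE not in current.splitlines():
        with CONFIG.open("a") as f:
            f.write(LINE + "\n")

if __name__ == "__main__":
    convergent_apply()
    convergent_apply()  # the second run is a no-op: the state is already correct
```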

Configuration management is too focused on innovation at the server level rather than thinking about the entire system. Devops has become a silo.

A Better Way

There are some tools and patterns emerging to tackle these problems.

  • Immutable infrastructure – remove the drift
  • Docker/Decker – testable, simple, small, disposable containers
  • Nix – declarative, deterministic package management
  • OSv – stupid (brilliant) operating system to enable innovation
  • BOSH – stupid (brilliant) tool to deploy complex distributed systems
  • Mesos – schedule commands (jobs) to run in a distributed environment

‘Infrastructure as Code’ to ‘Infrastructure as Good Code’

We need SOLID for infrastructure. We need to develop standardised, commoditised, loosely coupled, single-responsibility components from which we can build higher-order systems and services. Only then will we be enabling innovation higher up the value chain.

Devops should be about enabling the business to deliver effectively. We’ve got stuck up our own arses with configuration management.

When to Pass on a PaaS

There’s no point pretending I don’t love ‘The PaaS’. I do. I have spent too much of my career fighting battles I shouldn’t be fighting; re-inventing similar-looking systems time and time again. The idea that I could just drop an application into a flexible platform and expect it to run, without consuming my entire life for the preceding three months writing Chef cookbooks and wrestling with EC2, sounds fantastic.

Cloud Foundry

Having played with Heroku, and being dismayed at not being able to play with my own Heroku, I was overjoyed when VMware released Cloud Foundry. I created a Vagrant box on the Cloud Foundry release day and began distributing it to clients. I worked with one of my clients at the time, OpenCredo, to develop and deploy one of their new services to a Cloud Foundry installation I created. I believe this was the first SLA-led production deployment of Cloud Foundry globally.

I spoke about the rationale behind running/developing your own PaaS at QCon London 2012. I also discussed some of the use cases I’d fulfilled using PaaS, including OpenCredo’s.

QCon 2012 – Lessons Learned Deploying PaaS

OpenShift

I was similarly happy when I heard RedHat had bought Makara, a product I’d briefly experimented with, and were looking at producing their own PaaS. I’ve used RedHat-based systems for many years with great success, have always found YUM/RPM great to use, and was apparently the 7th person in the UK to achieve RedHat Architect status. A RedHat-delivered PaaS would surely be the panacea for all my problems.

I was scoping a large project at this time with availability as a prime concern. It occurred to me that I could use two turnkey PaaSes simultaneously, Cloud Foundry and OpenShift, such that if there was an issue with either I could simply direct all traffic to the other PaaS. I discussed the deployment and progress at DevopsDays.

DevopsDays Rome 2012 – How I Learned to Stop Worrying and Love the PaaS – I start about 32 minutes in.

From Dream to Reality

Unfortunately, the project didn’t quite work out as planned. We had a number of issues with OpenShift which meant we had no choice but to withdraw it from production usage. Scalability was an enormous problem: bringing an application to production scale was a sub-twenty-second operation in Cloud Foundry; it took forty-eight hours plus in OpenShift. We had to write our own deployment and orchestration layer for OpenShift based on Chef and shell – Cloud Foundry has the fantastic BOSH tool enabling deployment, scaling, and upgrades. These reasons, alongside some nasty bugs and outages, meant we were unable to use OpenShift for our deployment.

Beyond this I feel OpenShift, like many ‘PaaSish’ systems, has got the focus wrong. There seems to be a plethora of container-orchestration systems being produced at the moment, which are really just a slight re-focus on the IaaS abstraction layer. OpenShift is in danger of falling into this trap. PaaS needs to remain focussed on the application as the unit of currency, and not the container or virtual machine. It looks entirely possible (and would make an interesting project) to run Cloud Foundry inside OpenShift, illustrating the conceptual difference.

We settled on using distributed Cloud Foundry instances across diverse IaaS providers to deliver the project; it was a great success. I blogged about it for Cloud Foundry’s blog.

CloudFoundry.com blog post – UK Charity Raises Record Donations Powered by Cloud Foundry

Which PaaS?

I’ve remained supportive of RedHat’s efforts to deliver a PaaS solution but fear they’re not quite there yet. I organised a London PaaS User Group meetup to help RedHat put their side of the case across to the London community. It sounded like they had some exciting developments in the pipeline, but even with the new features it’s likely we would have been unable to deliver the enterprise-grade services our project required.

Perhaps RedHat’s customer base are largely systems administrators rather than developers. Perhaps RedHat have more experience at deploying and managing servers than applications. For whatever reason, I think it would be a denigration of PaaS to allow it to be misconstrued as container-based IaaS. Containers can be a by-product of service provision but should not be the focus of PaaS.

Cloud Foundry isn’t perfect but, at the moment, is the only PaaS product I’d recommend to anyone looking to make a long-term investment.

System Build Reproducibility

I’ve been on the receiving end of build reproducibility rants from developers at plenty of conferences. Their bile is usually aimed at Maven’s snapshot functionality. I’ve often questioned how reproducible their systems are; I’m usually met by a blank look.

I’ve always aimed to make system builds reproducible, but with little success. Gem, pear, pecl, rpm, license agreements, configure/make/make install: they all take their toll. This can lead to inconsistent builds between environments – or even in a single tier – due to scaling up/down.

As I’ve tended to use RPM-based systems (misspent youth), I’ve attempted, wherever possible, to get all non-configuration files on a server into RPMs. I’ve been more promiscuous with configuration management, moving from home-grown, to Cfengine, via Puppet, to Chef. I’m currently using chef-solo, with tooling such as Noah and MCollective for orchestration. Don’t even mention the number of deployment/ALM tooling solutions I’ve been through (although Capistrano has never annoyed me to any great extent).

Even with long-term usage of RPMs, build reproducibility has been far from simple. RH Satellite/Spacewalk should make this easy, but unfortunately it’s a bloated mess. I’ve usually resorted to simple Apache/createrepo, but this poses its own problems. Do you have a repo per environment? How do you track which servers were built against which repo? How do you roll out updates in a manageable fashion?

I’ve created a simple setup called Yumtags! to address some of these issues. The basic idea is that you can drop RPMs into a directory, and then “freeze” the directory at that point in time by creating and storing repository metadata against a tag. This tag can then be used, perhaps in a chef-solo-driven repository definition, to update, build, and reproduce systems in a known state. It currently features simple JSON-driven integration for CI systems, so RPM-based integration pipelines can be easily automated. There are a million and one things missing from it, but now that it does the basic story, I’ve shared it for others to hack on.
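
To illustrate the idea, here is a rough Python sketch rather than the actual Yumtags! code; the directory layout and manifest format are invented for illustration, and it assumes createrepo is available on the PATH.

```python
import json
import shutil
import subprocess
import time
from pathlib import Path

INCOMING = Path("/srv/rpms/incoming")  # hypothetical drop directory for new RPMs
TAGS = Path("/srv/rpms/tags")          # hypothetical store of frozen, tagged repos

def freeze(tag: str) -> dict:
    """Snapshot the incoming directory as an immutable, tagged yum repository."""
    repo_dir = TAGS / tag
    repo_dir.mkdir(parents=True)       # tags are immutable: an existing tag is an error
    for rpm in INCOMING.glob("*.rpm"):
        shutil.copy2(rpm, repo_dir)
    # createrepo generates the repodata/ metadata for the frozen directory
    subprocess.run(["createrepo", str(repo_dir)], check=True)
    manifest = {
        "tag": tag,
        "created": int(time.time()),
        "rpms": sorted(p.name for p in repo_dir.glob("*.rpm")),
    }
    # a JSON manifest gives CI systems something simple to consume
    (repo_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest

if __name__ == "__main__":
    print(json.dumps(freeze(time.strftime("build-%Y%m%d%H%M%S")), indent=2))
```

A chef-solo-driven repository definition, or any yum client, could then point its baseurl at the frozen tag directory, so every server in an environment installs from exactly the same set of RPMs.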

Monitoring-Driven Operations

RAMBLING BLOG POST ALERT

Monitoring sucks.

Following the “if it hurts, do it more often” mantra that has driven the success of patterns such as Continuous Delivery, there might be some value in jumping head-first into the world of monitoring.

I’ve been evangelising to anyone that will listen, and a lot of people that won’t, about declarative/convergent frameworks for some time now. Sometimes you have to describe how to converge (definitions may not yet exist), so I particularly enjoy working with frameworks such as Chef that enable you to easily move from declarative to imperative as the need arises.

These two trains of thought (monitoring + system convergence) collided a while back to make me think about Monitoring-Driven Operations, i.e. we declare the desired state through the monitoring system (that servers and services are available during expected hours) and converge the environment(s) on that desired state whenever there’s a gap between observed and desired. MDO for operations, TDD for developers.
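
As a conceptual sketch only (the check and convergence functions below are placeholders, not a real monitoring integration), the MDO loop boils down to something like this:

```python
import time

def service_is_up() -> bool:
    """Placeholder for a real monitoring check (ICMP ping, HTTP probe, Nagios plugin)."""
    return True  # swap in a real observation of the environment

def converge_service() -> None:
    """Placeholder for the imperative step (a Chef run, a redeploy) that closes the gap."""

# The desired state, declared as monitoring checks paired with convergence actions
DESIRED = {"web service available": (service_is_up, converge_service)}

def mdo_loop(interval_seconds: int = 60) -> None:
    while True:
        for name, (check, converge) in DESIRED.items():
            if not check():               # observed state differs from desired state
                print(f"converging: {name}")
                converge()
        time.sleep(interval_seconds)
```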

It turned out I wasn’t alone in thinking about monitoring from this perspective.

Reusing the application behaviours in production is a natural, logical extension of the testing pipeline. Were we able to ensure that the behaviours include non-functional requirements, would it be possible to use these monitored behaviours to converge an environment towards a state that passes all scenarios?

The missing link here is the imperative element. We can declare the desired state for an environment (cucumber-nagios is a good start), however we need a framework to express how we get there. I think Chef/Puppet can help with a lot of the hard labour here, but I don’t think either, in its current form, is appropriate to converge a service from a failed monitoring check.

Cloud Foundry (brilliant) uses a health manager, message bus, and cloud controller interacting to accomplish something similar. In this situation the cloud controller knows how to converge the state of the environment when the health manager observes a disparity between observed and desired states.

I’m thinking about developing a system that works in the following way (“The Escalator”). Please get in contact if you’ve got any feedback.

  • Barry McDevops declares an escalation pattern for his environments. These are the levels of escalation for convergence that failing checks can be matched to. As a crude example:
    1. Create account with IaaS provider
    2. Create core networking and support infrastructure
    3. Create tier networking and all VMs
    4. Create an individual VM
    5. Converge VM with Chef
    6. (Re)deploy application
  • Barry creates a series of monitoring checks for the non-functional requirements
    1. IaaS provider is online
    2. (per node) ICMP ping reply
    3. (per node) app and DB services are online
    4. (per tier) app and DB services are online
    5. Smoke check of application
  • Barry maps each monitoring check to an escalation level
    • Check 1 => Level 1
    • Check 2 => Level 4
    • Check 3 => Level 5
    • Check 4 => Level 3
    • Check 5 => Level 6
  • Once engaged, the monitoring system will quiesce for 30 seconds, find the most severe level of escalation required by the failing checks, and then ask the escalation system to take that action and every subsequent step.

As examples:

  • If there is nothing in place (a failed IaaS provider or a new system), an entire new environment is built from scratch (escalation steps 1 through 6).
  • If a node fails, a new node is built, converged, and deployed to (steps 4 through 6).
  • If the application fails the smoke test, the application is redeployed (step 6).
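
To make that concrete, here is a minimal Python sketch of the idea. It is only an illustration: the check lambdas and step stubs are placeholders, and this is just one way the check-to-level mapping and “most severe failing level wins” rule could be expressed.

```python
import time
from typing import Callable, Dict, List, Tuple

# Escalation steps, in order; each stub would wrap real tooling (IaaS API, Chef, cf push, ...)
STEPS: List[Callable[[], None]] = [
    lambda: print("1: create account with IaaS provider"),
    lambda: print("2: create core networking and support infrastructure"),
    lambda: print("3: create tier networking and all VMs"),
    lambda: print("4: create an individual VM"),
    lambda: print("5: converge VM with Chef"),
    lambda: print("6: (re)deploy application"),
]

# Monitoring checks mapped to the escalation level they trigger (1-indexed)
CHECKS: Dict[str, Tuple[Callable[[], bool], int]] = {
    "IaaS provider online":          (lambda: True, 1),
    "node answers ICMP ping":        (lambda: True, 4),
    "node app and DB services up":   (lambda: True, 5),
    "tier app and DB services up":   (lambda: True, 3),
    "application passes smoke test": (lambda: False, 6),  # pretend the smoke test is failing
}

QUIESCE_SECONDS = 30

def escalate_once(quiesce: int = QUIESCE_SECONDS) -> None:
    """Quiesce, find the most severe failing level, run that step and all subsequent ones."""
    time.sleep(quiesce)
    failing = [level for check, level in CHECKS.values() if not check()]
    if not failing:
        return
    start = min(failing)            # most severe failure = lowest step number
    for step in STEPS[start - 1:]:  # take that action and every subsequent step
        step()

if __name__ == "__main__":
    escalate_once(quiesce=0)  # with the stubs above, only step 6 (redeploy) runs
```

The per-step quiescence periods and failure handling described in the next paragraphs would layer on top of a skeleton like this.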

Obviously it is up to Barry to ensure his escalation steps are sensible, e.g. using multiple IaaS providers, or redeploying previous versions of applications if they persistently fail smoke tests. Each escalation step should declare a quiescence period, during which no further actions will be taken. There’s no point attempting to deploy an application if you have no nodes.

If an escalation process fails, the monitoring system could attempt a re-convergence if necessary or contact an administrator.

This overlaps significantly with a couple of discussions: here and here at the recent Opscode Community Summit, so perhaps someone else is already creating something similar with Chef.

Bridging the Gap Between Functional and Non-Functional

According to the principles behind the Agile Manifesto:

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

As I’m from an operations background, I’ve always had great trouble communicating with the customer over requirements relevant to my domain. The usual situation is that the “customer”, often the product owner, views the working (i.e. functionally complete) software on a developer’s machine, but bringing it to an operational service level is an afterthought (until it goes offline and the real customers start complaining).

Traditionally, capturing these specifications has fallen under the umbrella of “non-functional requirements”. As Tom Sulston pointed out to me, this suggests requirements that aren’t working, which is precisely the opposite of what we’re attempting to express.

I’ve sought to tackle this in a couple of ways.

  1. Spread FUD throughout the non-technical customer base.

    • “Do you want it to explode?”
    • “NO!”
    • “Well, you should ask for non-exploding software in your stories then!”
  2. Get the team in a room together and express “cross-cutting concerns” (I half-inched the idea from AOP) that span the project as a whole.

I haven’t been happy with the results of either approach, so I’d be interested to talk to anyone with a satisfactory solution in this space.

Continuous Deployment - A Vanity Metric?

I’ve recently seen a few articles/presentations carrying claims like “481,000 deployments a day!”, “Deployment every 3 seconds!”, or “We deploy more frequently than we breathe – or we sack the junior ops guy!”. Very exciting.

Having the capability to deploy frequently is important for a variety of reasons: fast feedback, quickly realising value, reducing risk deltas, increasing confidence, and so on. However, frequent deployment is useless without making use of that feedback.

Continuous Deployment is an enabler of fast feedback, but it’s not the end goal. If the feedback isn’t utilised by product owners to inform their decisions, there’s little point in creating it. The practice becomes a local optimisation.

I’ve deliberately chosen to differentiate Continuous Delivery from Continuous Deployment here, as I believe Continuous Delivery implies that value is being delivered, whereas Continuous Deployment suggests focusing on deploying frequently.

We need to optimise cycle time for the whole business, not just the dev/ops/devops/<current silo label> team.