BOSH deploys and manages all the things. It’s the best distributed-system-system I’ve ever used.
BOSH should be the deployer of Diego and other container schedulers.
I currently favour option 2 but reserve the right to change my mind as I learn over time. Previous attempts to build multi-purpose stateful/stateless PaaSs haven’t gone amazingly well. If you’re building a stateful service right now, build it for BOSH, not Diego.
We’ll be exploring the cool new future of Diego in a LoPUG hack day and looking at ideas like a Diego CPI for BOSH, persistent disks, and TCP networking in PaaS. Join us to find out more.
Today, January 5th, is the last day of the twelve days of Christmas, so this is the last blog in The Twelve Blogs of Christmas. I survived. I may not have saved the best for last, but I’ve certainly saved the most important. This post is all about people, and how to get people delivering value. Great people make great things. Great things make great profit, or so I’m hoping.
This post is about growing the culture within my company, CloudCredo, and the processes we use. I’ve spoken to many senior people in technology who believe you shouldn’t tell developers how to develop, and should just let them get on with it. I firmly disagree. I think a company should be clear about how it wants to deliver value; it will then attract people who ‘fit’ its culture. If you don’t choose a way of working you are effectively saying “We let the senior staff bully the junior staff, code like cowboys, and do whatever they like.” But that’s just my opinion.
I presented most of this content at Extreme Programmers London and had a great response. I thoroughly recommend attending their meetups.
I have been massively influenced by “The Pivotal Way”, in fact our working practices emulate it in most aspects. Imitation is the sincerest form of flattery. I had the tremendous good fortune to be invited to spend a few months in Pivotal’s Cloud Foundry Dojo (I believe I may have been the first person through it) during CloudCredo’s early days, which was an amazing experience. I’ve taken those learnings into CloudCredo and iterated on them.
We’re using the processes to deploy and develop agile platforms and infrastructure – pair-programming test-driven development with our clients. We use the Tracker-based planning process to empower authoritative product managers to prioritise bugs and features on multi-functional teams. You could call them ‘devops’ teams but I think that label has had all possible value eroded; they’re teams focused on delivering business value. You write it, you run it.
I’m obsessed with fast feedback: I view the process as a set of shrinking feedback loops, making their way inwards towards the continuous feedback of pair programming, and expanding back out again. These loops are a good starting point until you find something that might work better. Have the courage to experiment. This content is severely abridged for brevity.
0. Pre-inception – qualify readiness to start the process
1. Inception – reach team consensus on a meaningful step forward, typically about three months
2. Iteration Planning Meeting – plan the next iteration and point the stories
3. Standup – today’s work and pairs
4. TDD (for design, refactoring, confidence, and numerous other reasons)
4. Acceptance – by an empowered product manager, the ‘CEO of the product’
3. Standup – helps/interestings/blockers from yesterday
2. Retrospective – can we improve how we’re delivering? The first derivative of delivery.
1. Review – can we improve how we’re improving on delivery? The second derivative of delivery.
Over the course of my career I’ve delivered many projects using a traditional “up-front architecture” approach, heavy on Gantt charts and light on prototypes. This was necessary when procuring servers (and the data centre space for them to live in) took months if not years. The cost of change for the infrastructure was massive; similar to changing the foundations beneath a skyscraper. The architecture needed to be right, and IT processes evolved around these assumptions.
The big change in our industry has been the increased agility of infrastructure. A six-month server procurement has become a thirty-millisecond container-starting API call. Infrastructure is now defined as code. The incredibly low cost of change means we can now use a process that embraces, rather than resists, change – and that process is Extreme Programming (XP).
Six Sigma is a fantastic methodology for eliminating defects and minimising deviation. Six Sigma is easy to implement with software: cp -ra. It’s less relevant when developing and running new services, where learning is more important.
Scrum and XP are both agile methodologies although I believe XP has a greater focus on learning. XP embraces change during sprints/iterations, Scrum favours commitments. XP is a more holistic process, recommending engineering practices such as TDD and pair-programming. As mentioned above, I see value in forming explicit opinions on these practices.
Tracker-based planning is actually a software-development-domain specific implementation of Kanban. Blur your eyes and you can see the swim lanes. Achieving and maintaining stable delivery flow is very important.
As we’re building and deploying new systems and services we need to embrace learning as we make progress. We’re helping our clients to learn – and we’re using a process that facilitates learning and embraces change based on that learning. Between competing organisations the organisation that can learn fastest, and channel its efforts in the right direction, will win. We learn through feedback loops such as PDCA, OODA – both developments of scientific method.
The focus of XP is learning. Learn about what you’re building so you can adjust your plans as you find out what’s good and what’s bad about what you’re building. Learn about how you’re building it; what’s working and what needs adjustment. Use emergent velocity for data-driven planning.
There are a number of patterns we’re using to support our chosen methodology. Here’s a small selection:
This information begs an obvious question: if this is CloudCredo’s secret sauce, why would I blog about it publicly? This again comes back to XP, and two of XP’s core values – courage and communication. I have the courage to lay CloudCredo’s approach bare for all to see. If we’re not working in the right way, or there’s flaws, I hope they will be pointed out so we can improve. We’re continuously trying to improve. Please get in contact if you’ve got any feedback.
If you’ve been thoroughly confused by the UK Garage links at the end of all the Twelve Blogs of Christmas, well, maybe it’s a London thing.
This is likely to be the most nonsensical of my thoroughly nonsensical Twelve Blogs of Christmas. I wanted to capture the recurring themes of the processes I’ve employed when bringing software to life; from inception to massive scale. Everything here should be overridden by domain-specific knowledge/concerns as appropriate. Or ignored altogether :)
If you’ve had an idea for something, it’s probably already been done. Go and have a look. If your idea hasn’t been done it’s probably quite likely you can reskin/resell/repackage/repurpose a current service to deliver the new service you’re hoping people will find valuable.
If your idea hasn’t been developed before, look for two or more currently running services you can combine to create the value. Use SaaS. If you have to write software, host it in a PaaS that does as much heavy lifting as possible. Don’t start with infrastructure and build upwards. You don’t have time, and you’ve got better things to do.
Get meaningful feedback about your service as soon as possible. Your idea is probably stupid, and you should stop (or pivot). That’s the harsh reality of the situation. Continue getting feedback about your decisions as fast as possible and as frequently as possible. Your ability to gain and act on feedback will be the key determinant of the success or failure of whatever you’re building. Read ‘The Lean Startup’. Cucumber-value is a rubbish piece of software, but should make you think about how you can quantify and measure real metrics for success.
Business services are owned by small teams. Teams own one or more (micro?) services. Use Domain-Driven Design as a guide for breaking large problems into smaller services. Don’t overly engineer towards microservices, let them emerge, you can break them down further at a later date. Don’t be afraid to sacrifice your architecture if it needs replacing; you’re learning. Hire people that learn fast; they can gain knowledge from others in the team. Use ‘The Pivotal Way’ – lightweight XP. Initially create a Git repository per service; shared code will emerge into new repositories. Use GitHub or BitBucket – you’re not a Git hosting service (unless that’s your idea, and if it is, it’s already been done).
Deploy two (or more) Cloud Foundry instances per (micro)service. Use run.pivotal.io, Anynines, Fjord IT, AppFog, Bluemix, or any other hosted Cloud Foundry (not your own). Write and deploy your (micro)services quickly, in languages you can develop quickly in: we like Go and Ruby. If you need to develop in a less agile language as requirements emerge you can do that later. Don’t be afraid to use a range of languages; the cost of doing this is mitigated by the PaaS abstraction. Kill your (micro)services and create new ones as necessary.
Begin with a test for ‘Hello World’, an app that outputs ‘Hello World’ in your chosen language/framework, and a CI server that can run your test then deploy the working software to Cloud Foundry. Iterate from there. Use Travis or Circle until you need your own Jenkins/GoCD/Concourse solution.
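As a sketch of that starting point – assuming Go, and treating the greeting function, the route, and the port handling as illustrative choices rather than prescriptions – the walking skeleton might look like:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// greeting is the behaviour under test: the project's first assertion
// is simply that this returns "Hello World".
func greeting() string {
	return "Hello World"
}

func main() {
	fmt.Println(greeting())
	// Cloud Foundry injects the listen port via $PORT; when it's set,
	// serve the greeting over HTTP. When it isn't (e.g. in CI), just
	// print and exit.
	if port := os.Getenv("PORT"); port != "" {
		http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, greeting())
		})
		http.ListenAndServe(":"+port, nil)
	}
}
```

The point isn’t the code – it’s that the test, the app, and the pipeline to Cloud Foundry all exist on day one, so every subsequent story lands on working rails.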
Log when things go wrong. Measure the latency (time from request to response) when things go right. Use Papertrail for the logs. Use New Relic for the metrics. Replace Papertrail with Logsearch and New Relic with Graphite when you need to.
Use JSON to communicate between your (micro)services. Use JSON templates as lightweight contracts between (micro)services. Use HTTPS with circuit breakers for synchronous communication. Use Redis or RabbitMQ for asynchronous communication. If your code can’t get to Redis/RabbitMQ/another HTTP endpoint – die quickly rather than tying up resources.
Start with JSON in Redis. Move to JSON in MongoDB when you need to query the data with more flexibility. Place your BLOBs in AWS S3. Once you know whether your data – per (micro)service – needs to be consistent or available you can change data store as required. Cassandra is a great available data store. Postgres is a great consistent data store (if you can’t afford Oracle RAC, which you can’t). If you’re generating huge quantities of events throw them into Hadoop.
Generate events – immutable statements of fact – based upon actions. Simple, immutable, repeatable JSON events are a great start. Store the events in an event log and use them to mutate the data stores powering consuming services – separating commands from queries.
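A minimal sketch of that command/query separation in Go – the event names and the balance projection are hypothetical, chosen only to show the log being folded into a read model:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is an immutable statement of fact. The append-only log of
// events is the source of truth; read models are derived from it.
type Event struct {
	Type   string `json:"type"`
	Amount int    `json:"amount"`
}

// replay folds the event log into a current balance: the query side
// is just a projection over the command side's history, and can be
// rebuilt from scratch at any time.
func replay(log []Event) int {
	balance := 0
	for _, e := range log {
		switch e.Type {
		case "credited":
			balance += e.Amount
		case "debited":
			balance -= e.Amount
		}
	}
	return balance
}

func main() {
	events := []Event{{"credited", 100}, {"debited", 30}}
	b, _ := json.Marshal(events)
	fmt.Println(string(b), "balance:", replay(events))
}
```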
Customising Cloud Foundry is remarkably easy; do it when requirements demand. Adding buildpacks, services, even stacks (I added Docker) is straightforward. Don’t start with Cloud Foundry customisation but don’t be afraid to take the step if you need to. Deploy to your own IaaS or even your own metal if security/regulation/performance concerns necessitate it.
In summary: get fast feedback about the software you’re bringing to life. Fail fast if you’re doing the wrong thing. Don’t over-engineer; do what works, deliver quickly. Realise the opportunity cost of your (and your team’s) time. Could you be building something more valuable instead?
Platform-as-a-Service (PaaS) isn’t perfect. There are always going to be some things it does well and some things it does badly. This post takes a look at some of the things it does badly, and how we can make improvements in the future.
This excellent article highlights some key points related to problems with PaaS. I absolutely agree that the data service journey with current implementations is painful. I’ve written about CloudCredo’s plans for “Service Foundry”; this is work-in-progress and needs to be an area of greater focus. Stateful services need to become first-class citizens in the PaaS landscape.
I also agree that maintaining PaaSs is currently too difficult. I think this is a symptom of two separate issues. Firstly, the current crop of configuration management tools (Chef/Puppet/Salt/Ansible etc.) are not fit for the purpose of deploying and maintaining distributed systems, such as PaaSs. Secondly, BOSH is the right tool for the job but currently has a difficult user journey. CloudCredo have invested time and effort attempting to make BOSH easier to consume – but again it’s another work-in-progress. We need to get better at making PaaSs easier to operate, maintain, and upgrade.
Another good blog post highlights how networking concerns can block PaaS adoption. Since the writing of that post a couple of advancements have been made. There is now an easily consumable BOSH release to enable encrypted network traffic of any BOSH-deployed service – although we should always question how secure our encryption is. There is also user-configurable networking inside Cloud Foundry. I believe these additions go a long way towards mitigating user concerns, but I’d certainly be interested in further feedback related to PaaS networking.
The greatest strength of PaaS is that it’s a black box for running your applications; it allows developers to focus on delivering value rather than operating a platform. The greatest weakness of PaaS is that it’s a black box for running your applications; when things go wrong it can be difficult to work out what’s happening. If your application is performing poorly on Heroku, what do you do next? Spend more money and hope? Cloud Foundry’s new Firehose generates huge volumes of information but can prove difficult to consume for PaaS novices. Buildpack integration with monitoring systems is clearly helpful but we could still make enhancements in this area.
PaaS will only improve if we identify and expose the flaws. We need more users, more critiques, more real-world scenarios. Please get in contact if there’s any burning issues blocking your adoption of PaaS.
As I’ve mentioned a few times, Cloud Foundry has won the Platform-as-a-Service (PaaS) war for stateless 12-Factor applications. Stateless PaaS will now be a Cloud Foundry distro war. We all know mutable state is the root of all evil so I’ve been thinking about a stateful CF-like PaaS since I first interacted with Cloud Foundry. In fact, Cloud Foundry version 1 had an interesting service integration, which served to highlight the gaps in the PaaS journey for stateful application developers. I tweeted a while back about some of the work we’re doing in this area; this post will expand on our plans for ‘Service Foundry’ – a PaaS for stateful deployments.
It’s interesting to consider that you could deploy Cloud Foundry itself in such a system. Does this actually look like a manifesto for BOSH version 2? Or does this look more like OpenShift v3? The Kubernetes overlap certainly suggests something similar to RedHat’s latest IaaS+ effort.
I’ve given talks suggesting that microservices and PaaS are the future of application development. Cloud Foundry makes it incredibly easy to be tall enough for stateless microservices. I want users to have a similarly easy journey for the state powering their microservices.
If you’d like to help with this effort please do get in contact with CloudCredo.
I enjoy attending developer-focused conferences – in particular QCon and GOTO – to talk to the real consumers of PaaS. I also attend infrastructure-orientated conferences as a consumer of IaaS. I was recently on the PaaS panel at the Apache CloudStack Conference when I made what seemed a bizarre statement to many in the audience: “If your problem isn’t mutable state then you’re probably doing something wrong.” I’m taking this opportunity to explain what I meant.
I’ve actually been making statements in this vein for a long time. In my experience, when delivering solutions, you can usually find a correct answer to your problem, given time to research and engineer sufficiently. The majority of optimisation problems have occurred when I’ve been facing issues related to mutable state. These generally involve functions related to latency, consistency, availability and performance. CAP theorem is perhaps the most frequently occurring: choose availability or consistency in the face of network partitions in a distributed system. You cannot have both.
Approaches to mutable state vary across programming paradigms. Object-orientated programming favours encapsulating mutable state in objects, usually controlling concurrent access to the mutable data via mutexes or semaphores. In my experience this can lead to serious performance issues when scaling. Functional programming favours having no mutable state, instead passing immutable values between functions. This causes different issues, as it shifts the burden to the developer to model their data – in what can sometimes be an unnatural way.
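The contrast can be sketched in Go – a hypothetical mutex-guarded counter on one side, and a pure function that returns a new value instead of mutating anything on the other:

```go
package main

import (
	"fmt"
	"sync"
)

// counter encapsulates mutable state behind a mutex – the classic
// object-orientated approach. Under heavy concurrency, contention on
// this lock is exactly where the scaling pain shows up.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// add is the functional alternative: no shared state is mutated,
// the caller just receives a new value.
func add(n, delta int) int { return n + delta }

func main() {
	c := &counter{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); c.inc() }()
	}
	wg.Wait()
	fmt.Println(c.n, add(0, 100))
}
```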
Mutable state in Infrastructure-as-a-Service can also cause issues. VMWare’s various infrastructure offerings favour a consistent view of the infrastructure landscape, often using Oracle or MSSQL as an ACID-compliant database for consumers to rely upon. This makes the API easy to consume but difficult to scale to huge levels for producers. Consumers can rely on their mutations being immediately reflected across the API.
AWS EC2 offers an eventually consistent view of their landscape. This means EC2 can be delivered at a truly unprecedented scale but can cause issues for platform-level consumers. Intelligence must be built into the tooling that consumes the API, as large consumers of EC2, such as Netflix, have done.
At a Platform-as-a-Service(PaaS) level we have tended to deal with the issue of mutable state by abdicating it to external services. Stateless application hosting is a done deal – Cloud Foundry has won that battle. Stateless application developers can choose to locate their data in whichever external service best suits the data; this is the polyglot approach embraced by PaaS. The real issues arise when we attempt to provide a Cloud-Foundry-like journey for developers and operators around stateful services.
An increasing number of PaaS developers appear to have become preoccupied with scheduling. Scheduler debates are so hot right now. No PaaS conference would be complete without an Omega-style optimistic scheduler versus Mesos-style pessimistic scheduler conversation. In my experience scheduling performance is rarely the constraint blocking PaaS adoption: the constraint is usually dealing with the necessary mutable state generated/consumed by the applications running in the PaaS.
RedHat recently stated that dividing stateful and stateless services “is an arbitrary distinction”. I don’t agree with this perspective; I think it’s a very important distinction. There are some key issues here – if an application container appears to have stopped functioning, what action should the PaaS take? If the container is stateless the PaaS can request a new container be started: potentially creating a duplicate. If the container is stateful the choice is more complicated. Should a new container be started, maximising service availability, but risking split-brain scenarios in a stateful environment? Should the container remain offline, reducing availability but ensuring data consistency? Should the container be restarted following a successful fencing operation? What does fencing look like in a distributed, scheduled, containerised PaaS environment? These questions may lead to a separate breed of stateful PaaSs emerging, focused on stateful concerns, even if they share similar-looking APIs.
The “Two PaaS” debate has led to some people perceiving Cloud Foundry as a stateless PaaS and BOSH as a stateful PaaS. I believe this is an incorrect interpretation. I think BOSH is a fantastic system – it has had a greater influence on me than Cloud Foundry itself – but it is not a PaaS. BOSH is the purest embodiment of the principles of Infrastructure as Code: it fires up metal, lays down code, and attaches disks for state. It is not, in its current form, a scheduler in the manner of Mesos/Omega. BOSH is the world’s best deployer of schedulers and distributed systems. A stateful PaaS would look more like Diego/Lattice, modified to address the concerns above. BOSH would be a great way to deploy this new PaaS.
CloudCredo’s clients often ask about how to run Cloud Foundry across multiple sites for performance and resilience. This is a natural question to ask: as an application developer, if I specify that I need to have ten instances of my application available, I expect ten instances to be available – whether the underlying infrastructure is available or not. The Platform-as-a-Service (PaaS) abstraction implies that my application should continue to run as desired, and that the PaaS should be designed to handle failures in the infrastructure layer.
This idea sounds great in theory but can lead to some problems in practice. What should Cloud Foundry do, as a distributed system, in the event of a network partition? Should each partition converge on the desired state of the whole system, leading to twice the number of applications being online, and potential split-brain issues with singleton applications? Should the whole Cloud Foundry shut down to ensure application consistency? The key point to note here, which is not immediately obvious, is that whilst Cloud Foundry hosts stateless applications, it is actually a stateful system itself. This means CAP theorem must be obeyed; we can opt for consistency or availability in our Cloud Foundry deployment.
Cloud Foundry has, at its core, a Cloud Controller database maintaining the desired state of the system. If we maintain a single, consistent state of this database we will be running a CP (consistent, partition-tolerant) Cloud Foundry. If we allow multiple, divergent copies of this database we will be running an AP (available, partition-tolerant) Cloud Foundry.
Cloud Foundry’s current engineering direction seems to favour a consistent view of the desired application state. This is exhibited by the choice of Raft as a consensus algorithm, and Etcd as an implementation, for CF’s next generation Diego components. These choices pose some difficult questions about how the platform should behave during network partitions.
Running a consistent Cloud Foundry makes application deployment and management very easy; there is a single API endpoint for management. At CloudCredo we deploy consistent Cloud Foundry installations across multiple availability zones within a single region. This mitigates the latency concerns and provides a convenient management structure for application deployment.
When I’ve needed to deploy Cloud Foundry across multiple regions it has usually been to provide for a very high availability service level. To provide for this I have deployed multiple, completely separate Cloud Foundry installations, often on heterogeneous infrastructure providers, in diverse regions. An example of this kind of installation would be the donations platform for Comic Relief. The huge benefit of this strategy is that no outages in any individual Cloud Foundry can stop the service – availability is maximised. The two major downsides are that deployment and orchestration of applications becomes significantly more complex, and that the applications and data services need to be developed to handle being deployed in a distributed system of this nature.
I’m currently deploying multi-zone ‘consistent CF’ – and then multi-region ‘available CF’ if requirements demand it. This is a domain specific choice; and it takes a significant amount of work in the application (particularly around state) to bring multi-region availability.
Cloud Foundry has won the PaaS war. Heroku blazed the trail with their PaaS for Twelve-Factor Apps, and VMWare/Pivotal have built an open-source implementation and ecosystem to provide both enterprises and developers with an obvious choice. Buildpacks have become the standard for translation from application code to runnable unit.
The question is rapidly changing from “which PaaS are you using?” to “which distribution of Cloud Foundry are you using?”. Just as Hadoop became synonymous with big data so Cloud Foundry has become PaaS. The emerging battle is between the distribution vendors; Pivotal, IBM, HP and ActiveState already have offerings in the market. I’m sure we’ll see more from other players as the Cloud Foundry Foundation gains momentum.
The other fascinating aspect of the development of the Cloud Foundry ecosystem is ‘Platform as a Service’ versus ‘Platform as a Service as a Product’ (PaaSaaP). Some vendors are offering Cloud Foundry as installable, supported software – where the onus is on the customer to deploy the software to their chosen infrastructure in order to provide a service. Other vendors are deploying and running Cloud Foundry on behalf of their users to provide a true PaaS experience. A few vendors are offering both. Some PaaS purists have denigrated Cloud Foundry for offering this flexibility, but I see this as one of Cloud Foundry’s greatest strengths. Developers can minimise Time-to-Value by deploying quickly to a vendor’s cloud-based solution – and then deploy Cloud Foundry to their own infrastructure when non-functional requirements emerge to make a custom deployment necessary.
We will also see domain-specific Cloud Foundry implementations for particular markets. The ‘core’ Cloud Foundry specification, provided by the CF Foundation, will define a key set of capabilities and an API for developers to work against – but we will see extensions providing for additional requirements and innovation. Time will tell which flavours of Cloud Foundry are successful and which are left behind.
The big paradigm shift for developers using Platform-as-a-Service (PaaS) is understanding “The Twelve-Factor App”; a set of patterns developed by the team at Heroku enabling applications to be orchestrated and distributed at scale. By adopting these patterns developers can take advantage of PaaS, via services such as Heroku and Cloud Foundry, reducing their operational responsibilities so they can focus on delivering value.
I’ve had a long-running debate with various members of the PaaS community about 12-Factor’s relevance to enterprises. I’ve heard many claims that enterprises don’t want to adopt these patterns, and would rather mix their state, config, and application together as a tangled ball of mud. At a board level the enterprises I’ve worked for, and interacted with, have been seeking organisational agility – the kind of fast delivery and iteration PaaS brings to software development. Somehow this message often gets lost in middle management, leading to a resistance to change and clinging to legacy practices like a safety blanket.
We need to stop telling enterprises that modern patterns, such as 12 Factor and Microservices, won’t work for them. We need to help enterprises to lower their time-to-value and increase their operational efficiency. The only winners from keeping enterprises stuck in the dark ages are the incumbent vendors, happy to continue charging extortionate prices for outdated systems and software.
Enterprises can adopt 12 Factor, Microservices, and PaaS. They want to be more agile, not less. Let’s help them.
At a simple level, Infrastructure-as-a-Service deals with virtual machines, Platform-as-a-Service deals with applications, and Software-as-a-Service deals with users. I’ve spoken a few times about Containers-as-a-Service (CaaS) – the idea that containers will become a new meaningful unit of currency in cloud computing. I’ve also written about the ramifications of CaaS for PaaS.
The hottest debate in the container ecosystem seems to be whether to use containers as single processes or multi-process virtual machines. The Docker folks seem to encourage the use of single process containers – a view I’m supportive of, following principles from software development, such as single responsibility and inversion of control. The alternative approach is to use containers as virtual machines with many processes. The virtual machine approach seems to resonate well with the current generation of operating systems and configuration management tooling.
I believe both approaches will continue to thrive for some time. The true value in Docker lies in the packaging and portability of containers, and this holds true whether you’re packaging filesystems for a single process or a number of processes. Containers used as virtual machines will quickly erode the IaaS market – if only for their portability, speed, and density. Consumers and producers can gain benefits when choosing the CaaS abstraction over IaaS.
Over time there will be a move towards per-process containers, away from virtual machine-like containers, as the deployment and orchestration benefits from this approach become clearer. Just as SOLID and TDD took time to gain momentum in software development, so the correct patterns for containers will gain traction over time.
We will also see increasing numbers of customised operating systems, designed to run with the Linux kernel but providing a subset of the functionality, tuned for single-purpose containers. OSv is a great example of this. I also think we’re likely to see an increasing number of container host OSs under development, presenting something that looks like a Linux kernel to hosted containers. The boundaries between hypervisor and namespaced kernel will blur.
The other main areas for innovation will be storage and networking. Providing containers with reliable storage for mutable state is challenging. The ClusterHQ folks have made great progress in this area using ZFS-on-Linux in Flocker. There are also exciting developments in the networking space from SocketPlane and Weave – simplifying multi-host container networking.
CaaS solutions have already emerged from the big cloud players; you can easily host containers on GCE, AWS, and Azure. Rancher looks to enable CaaS on any infrastructure. Pivotal’s Lattice enables operators to deliver their own CaaS, as does its big brother Cloud Foundry. These solutions are currently largely focused towards Docker but I’m sure we’ll see Rocket and alternatives emerge.
Whilst the future may look bright for containers, there are some warning signs. Multi-tenant container isolation and security are currently far from production ready. It also seems that choosing a layered filesystem beneath your containers can be a risky business; I’ve been bitten by a few bugs.
I think CaaS will grow exponentially over the next few years. Tutum are building the first pure CaaS I’ve seen. They attracted $2.65M in seed funding this year. I’m sure they’ll have plenty of competition in the near future.
I enjoy regularly visiting and speaking at tech conferences. They’re a great way to stay abreast of trends and keep a fresh perspective on how the industry is delivering value. Unfortunately I’m usually dismayed at the homogeneous composition of the conference attendees; the overwhelming majority are white males, like myself.
As a company CEO I see this as a tremendous waste. The key role I play in the company is the recruitment and retention of the members of our great team. Enlarging the pool of resource I can draw from will only increase the quality of the team, the diversity of opinions, and create a balanced working environment. I’m not trying to get on my moral high horse; I want a better team so I can make more money.
If you’re interested in PaaS, distributed systems, and Extreme Programming, please do contact CloudCredo – especially if you don’t ‘fit the mould’. We’re looking for fast learners and good communicators. I’d love to think there’s a pool of untapped skills out there I can use to help build our team. I can’t promise to run the perfect company, but I can promise to try.
I wrote this blog back in June but didn’t have time to publish it. No time like the present.
Now the dust has settled following the CF Summit I thought it a good time to note some of the lasting impressions.
CF Summit was a great conference. The buzz and energy around the place was tangible. This year’s host, Andrew Clay Shafer, gave the event a friendly, personable feel, and also has (had) great hair. The big hitters from the Cloud Foundry community were present, along with newer users from a diverse range of organisations.
Personally, I felt the Summit represented the emergence of Cloud Foundry as the leading enterprise PaaS. I backed Cloud Foundry from day one, and I’ve been vocal about the shortcomings of other PaaSes that fall short of CF. Witnessing some of the biggest names in tech queuing up to take the stage to endorse Cloud Foundry was immensely gratifying. The corollary to those endorsements was the quantity of real world Cloud Foundry success stories in the talks; Cloud Foundry has broken through, and proven its worth.
One of the chief reasons for the dramatic rise in enterprise adoption of Cloud Foundry has been the formation of the CF Foundation. The Foundation members were well represented at CF Summit, from small innovators such as CloudCredo, to tech giants such as IBM and Cisco. It’s fantastic to see the Cloud Foundry ecosystem crystallise as a Foundation to move the project forward.
The noticeable trend amongst the Summit attendees was big businesses realising the importance of speed. Canopy are a shining example of this; formed from large enterprises but able to quickly and effectively make use of outstanding new technologies. I fear for their competition; it’s striking when organisations decisively deliver capability at this scale.
We were overjoyed with the Summit from a CloudCredo perspective. The ecosystem’s response to our work with them was fantastic. We are proud our work was mentioned in the following talks:
We’d love to be able to add your organisation’s name to that list. Please get in contact to talk about how we can help you deliver with Cloud Foundry.
I’ve got a backlog of half-formed blog posts (backblog?) that have been hanging around, like bad smells, for a while. I’ve decided to get them all done and out over Christmas. I’m cheating by imposing the following restrictions:
I’ve noticed the vast majority of my traffic is from the USA. As part of my single-handed mission to bring back UK Garage, and further its influence in the world, I will be linking to some of the UKG greats at the end of each blog post.
Let the blogging commence!
As you’re probably aware by now – CoreOS have released a new container runtime called ‘Rocket’. Rocket will inevitably be perceived as a reaction, and competitor, to Docker. The language in CoreOS’s Rocket blog post suggests Docker have deviated from CoreOS’s preferred path, and the tone suggests an accusation of betrayal. The timing of CoreOS’s announcement, the week before DockerCon EU, implies a deliberate attempt to undermine Docker’s marketing.
GigaOM have published a great summary of the community’s reaction here. Given Docker’s amazing ecosystem and adoption levels I’m actually quite surprised more people haven’t come out in strong support of Docker. Solomon Hykes has exactly three things to say (followed by a subsequent thirteen) here.
I spoke at DockerCon 2014 about the Cloud Foundry/Docker integration project I’d been working on. This work has now been subsumed into the Docker on Diego work which is beginning to form the core of Cloud Foundry V3. While working on the initial proof of concept I had regular communication with the people at Docker. I found them friendly and open to the idea of using Docker containers as another deployable unit within Cloud Foundry.
Once Docker raised their $40M investment the tone changed. Docker became a ‘platform’. Docker’s collaboration with Cloud Foundry, to use Docker inside Cloud Foundry, seemed to stall. It appeared Docker were trying to eat their ecosystem. Was investor pressure for a huge return causing Docker to try to capture too many markets rather than focusing on their core?
Cloud Foundry will continue to orchestrate various units of currency: applications via buildpacks, containers via Docker, and potentially containers via Rocket. My company, CloudCredo, is already looking at what a Rocket integration with Cloud Foundry would look like.

cf push https://storage-mirror.example.com/webapp-1.0.0.aci

is on the way.
The role of infrastructure should be to provide reliable, consumable bricks to enable innovation at higher levels. If we create beautiful, unique, novel bricks it becomes impossible to build houses. This is a problem I see regularly with OpenStack deployments; they’re amazing, wonderful, unique, organic. I cannot, however, easily deploy platforms to them.
This issue manifests itself in deployments orchestrated by configuration management.
Configuration management is too focused on innovation at the server level rather than thinking about the entire system. Devops has become a silo.
There are some tools and patterns emerging to tackle these problems.
We need SOLID for infrastructure. We need to develop standardised, commoditised, loosely coupled, single-responsibility components from which we can build higher-order systems and services. Only then will we be enabling innovation higher up the value chain.
Devops should be about enabling the business to deliver effectively. Instead, we’ve got stuck up our own arses with configuration management.
Having played with Heroku, and being dismayed at not being able to play with my own Heroku, I was overjoyed when VMware released Cloud Foundry. I created a Vagrant box on the Cloud Foundry release day and began distributing it to clients. I worked with one of my clients at the time, OpenCredo, to develop and deploy one of their new services to a Cloud Foundry installation I created. I believe this was the first SLA-led production deployment of Cloud Foundry globally.
I spoke about the rationale behind running/developing your own PaaS at QCon London 2012. I also discussed some of the use cases I’d fulfilled using PaaS, including OpenCredo’s.
QCon 2012 – Lessons Learned Deploying PaaS
I was similarly happy when I heard RedHat had bought Makara, a product I’d briefly experimented with, and were looking at producing their own PaaS. I’ve used RedHat-based systems for many years with great success, have always found YUM/RPM great to use, and was apparently the 7th person in the UK to achieve RedHat Architect status. A RedHat-delivered PaaS would surely be the panacea for all my problems.
I was scoping a large project at this time with availability as a prime concern. It occurred to me that I could use two turnkey PaaSes simultaneously, Cloud Foundry and OpenShift, such that if there was an issue with either I could simply direct all traffic to the other PaaS. I discussed the deployment and progress at DevopsDays.
DevopsDays Rome 2012 – How I Learned to Stop Worrying and Love the PaaS – I start about 32 minutes in.
Unfortunately, the project didn’t quite work out as planned. We had a number of issues with OpenShift which meant we had no choice but to withdraw it from production usage. Scalability was an enormous problem: bringing an application to production scale was a sub-twenty-second operation in Cloud Foundry, but took forty-eight hours plus in OpenShift. We had to write our own deployment and orchestration layer for OpenShift based on Chef and shell – Cloud Foundry has the fantastic BOSH tool enabling deployment, scaling, and upgrades. These reasons, alongside some nasty bugs and outages, meant we were unable to use OpenShift for our deployment.
Beyond this I feel OpenShift, like many ‘PaaSish’ systems, has got the focus wrong. There seems to be a plethora of container-orchestration systems being produced at the moment, which are really just a slight re-focus on the IaaS abstraction layer. OpenShift is in danger of falling into this trap. PaaS needs to remain focused on the application as the unit of currency, and not the container or virtual machine. It looks entirely possible (and would make an interesting project) to run Cloud Foundry inside OpenShift, illustrating the conceptual difference.
We settled on using distributed Cloud Foundry instances across diverse IaaS providers to deliver the project; it was a great success. I blogged about it for Cloud Foundry’s blog.
CloudFoundry.com blog post – UK Charity Raises Record Donations Powered by Cloud Foundry
I’ve remained supportive of RedHat’s efforts to deliver a PaaS solution but fear they’re not quite there yet. I organised a London PaaS User Group meetup to help RedHat to put their side of the case across to the London community. It sounded like they had some exciting developments in the pipeline but even with the new features it’s likely we would have been unable to deliver the enterprise-grade services our project required.
Perhaps RedHat’s customer base are largely systems administrators rather than developers. Perhaps RedHat have more experience at deploying and managing servers than applications. For whatever reason, I think it would be a denigration of PaaS to allow it to be misconstrued as container-based IaaS. Containers can be a by-product of service provision but should not be the focus of PaaS.
Cloud Foundry isn’t perfect but, at the moment, is the only PaaS product I’d recommend to anyone looking to make a long-term investment.
I’ve always aimed to make system builds reproducible, but with little success. Gem, pear, pecl, rpm, license agreements, configure/make/make install: they all take their toll. This can lead to inconsistent builds between environments – or even in a single tier – due to scaling up/down.
As I’ve tended to use RPM-based systems (misspent youth), I’ve attempted, wherever possible, to get all non-configuration files on a server into RPMs. I’ve been more promiscuous with configuration management, moving from home-grown tooling to Cfengine, via Puppet, to Chef. I’m currently using chef-solo, with tooling such as Noah and MCollective for orchestration. Don’t even mention the number of deployment/ALM tooling solutions I’ve been through (although Capistrano has never annoyed me to any great extent).
Even with long term usage of RPMs, build reproducibility has been far from simple. RH Satellite/Spacewalk should make this easy, but unfortunately it’s a bloated mess. I’ve usually resorted to simple apache/createrepo, but this poses its own problems. Do you have a repo per environment? How do you track which servers were built against which repo? How do you roll out updates in a manageable fashion?
I’ve created a simple setup called Yumtags! to address some of these issues. The basic idea is that you can drop RPMs into a directory, and then “freeze” the directory at that point in time by creating and storing repository metadata against a tag. This tag can then be used, perhaps in a chef-solo-driven repository definition, to update, build, and reproduce systems in a known state. It currently features simple JSON-driven integration for CI systems, so RPM-based integration pipelines can be easily automated. There are a million and one things missing from it, but now that it does the basic story, I’ve shared it for others to hack on.
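The freeze-a-directory-against-a-tag idea can be sketched in a few lines of Python. This is a hypothetical illustration rather than Yumtags itself: the function name, the JSON layout, and the `tags` directory are my assumptions, and the real tool also generates yum repository metadata (e.g. via createrepo) for the frozen set.

```python
import hashlib
import json
import os
import time


def freeze_repo(repo_dir, tag, tags_dir="tags"):
    """Snapshot the RPMs in repo_dir under a named tag.

    Records each package's filename and checksum, so the exact set
    can be reproduced later regardless of what gets dropped into the
    directory afterwards.
    """
    packages = {}
    for name in sorted(os.listdir(repo_dir)):
        if not name.endswith(".rpm"):
            continue  # only package files belong in the snapshot
        with open(os.path.join(repo_dir, name), "rb") as f:
            packages[name] = hashlib.sha256(f.read()).hexdigest()
    snapshot = {"tag": tag, "created": time.time(), "packages": packages}
    # Persist the snapshot as JSON so CI systems can fetch it.
    os.makedirs(tags_dir, exist_ok=True)
    with open(os.path.join(tags_dir, tag + ".json"), "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot
```

A chef-solo-driven repository definition could then resolve a tag name to the exact package set recorded in the JSON, so every node built against, say, `release-1` sees the same RPMs however the directory has changed since.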
Monitoring sucks.
Following the “if it hurts, do it more often” mantra that has driven the success of patterns such as Continuous Delivery, there might be some value in jumping head-first into the world of monitoring.
I’ve been evangelising to anyone that will listen, and a lot of people that won’t, about declarative/convergent frameworks for some time now. Sometimes you have to describe how to converge (definitions may not yet exist), so I particularly enjoy working with frameworks such as Chef that enable you to easily move from declarative to imperative as the need arises.
These two trains of thought (monitoring and system convergence) collided a while back to make me think about Monitoring-Driven Operations: we declare the status of the monitoring systems (that servers and services are available during expected hours) and converge the environment(s) on this desired state whenever there’s a gap between observed and desired states. MDO for operations, TDD for developers.
It turned out I wasn’t alone in thinking about monitoring from this perspective.
Reusing the application behaviours in production is a natural, logical extension of the testing pipeline. Were we able to ensure that the behaviours include non-functional requirements, would it be possible to use these monitored behaviours to converge an environment towards a state that passes all scenarios?
The missing link here is the imperative element. We can declare the desired state for an environment (cucumber-nagios is a good start), however we need a framework to express how we get there. I think Chef/Puppet can help with a lot of the hard labour here, but I don’t think that either, in their current formats, are appropriate to converge a service from a failed monitoring check.
Cloud Foundry (brilliant) uses a health manager, message bus, and cloud controller interacting to accomplish something similar. In this situation the cloud controller knows how to converge the state of the environment when the health manager observes a disparity between observed and desired states.
I’m thinking about developing a system that works in the following way (“The Escalator”). Please get in contact if you’ve got any feedback.
As examples:
– If there is nothing in place (a failed IaaS provider or a new system), an entire new environment is built from scratch (escalation steps 1 through 6).
– If a node fails, a new node is built, converged, and deployed to (steps 4 through 6).
– If the application fails the smoke test, the application is redeployed (step 6).
Obviously it is up to Barry to ensure his escalation steps are sensible, i.e. use multiple IaaS providers, redeploy previous versions of applications if they persistently fail smoke tests. Each escalation step should declare a quiescence period, during which no further actions will be taken. There’s no point attempting to deploy an application if you have no nodes.
If an escalation process fails, the monitoring system could attempt a re-convergence if necessary or contact an administrator.
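The Escalator’s ladder of remediation can be sketched as a simple convergence routine: find the first failed step, then run every step from there to the end, respecting each step’s quiescence period. Everything here is an assumption for illustration – the step names, quiescence values, and `converge` signature are invented, not taken from any existing tool.

```python
import time

# Hypothetical escalation ladder: each entry is (step name, quiescence
# period in seconds). Earlier steps repair larger failures; running a
# step implies running every step after it too.
ESCALATION_STEPS = [
    ("provision_iaas", 300),  # 1. stand up infrastructure
    ("build_network", 120),   # 2. networking
    ("build_storage", 120),   # 3. storage
    ("build_node", 60),       # 4. build a node
    ("converge_node", 60),    # 5. converge node configuration
    ("deploy_app", 30),       # 6. deploy the application
]


def converge(first_failed_step, actions, sleep=time.sleep):
    """Run every escalation step from the first failure onwards.

    `actions` maps step names to callables that perform the step.
    After each step we observe its quiescence period, during which
    no further remediation is attempted.
    """
    ran = []
    started = False
    for name, quiescence in ESCALATION_STEPS:
        if name == first_failed_step:
            started = True
        if started:
            actions[name]()
            ran.append(name)
            sleep(quiescence)
    return ran
```

With this shape, a failed smoke test maps to `converge("deploy_app", actions)`, while a dead node maps to `converge("build_node", actions)` – which also re-runs the converge and deploy steps, matching the escalation examples above.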
This overlaps significantly with a couple of discussions: here and here at the recent Opscode Community Summit, so perhaps someone else is already creating something similar with Chef.
Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
As I’m from an operations background, I’ve always had great trouble communicating with the customer over requirements relevant to my domain. The usual situation is that the “customer”, often the product owner, views the working (i.e. functionally complete) software on a developer’s machine; bringing this to an operational service level is an afterthought (until it goes offline and the real customers start complaining).
Traditionally, capturing these specifications has fallen under the umbrella of “non-functional requirements”. As Tom Sulston pointed out to me, this suggests requirements that aren’t working, which is precisely the opposite of what we’re attempting to express.
I’ve sought to tackle this in a couple of ways.
Spread FUD throughout the non-technical customer base.
Get the team in a room together and express “cross-cutting concerns” (I half-inched the idea from AOP) that span the project as a whole.
I haven’t been happy with the results of either approach, so I’d be interested to talk to anyone with a satisfactory solution in this space.
Having the capability to deploy frequently is important for a variety of reasons; fast feedback, quickly realising value, reducing risk deltas, increasing confidence, and so on. However, I also think that frequent deployment is useless without making use of that feedback.
Continuous Deployment is an enabler of fast feedback, but it’s not the end goal. If the feedback isn’t utilised by product owners to inform their decisions, there’s little point in creating it. The practice becomes a local optimisation.
I’ve deliberately chosen to differentiate Continuous Delivery from Continuous Deployment here, as I believe Continuous Delivery implies that value is being delivered, whereas Continuous Deployment suggests focusing on deploying frequently.
We need to optimise cycle time for the whole business, not just the dev/ops/devops/<current silo label> team.