
Blog Feed Post

Why you should build an Immutable Infrastructure

Why you should build an Immutable Infrastructure – by Florian Motlik, CTO of Codeship

Some of the major challenges today when building infrastructure are predictability, scalability and automated recovery. A predictable system promotes the exact same artifact that you tested into your production system, so an intermittent failure during deployment cannot change what ends up in production. A scalable system makes it trivial, ideally automatic, to deal with any rise in traffic. And automated recovery makes sure your team can focus on building a better product and sleep during the night instead of constantly maintaining infrastructure.

At Codeship, we’ve found that an infrastructure made up of immutable components has helped us tremendously with these goals.

Julian Dunn from Chef recently released a blog post about their stance on immutable infrastructure.

Chad Fowler summed it up very well in a tweet

Instead of going over every piece of the article, I want to present an overview of the experience we – and others – have had in making parts of our infrastructure immutable.

What is Immutable Infrastructure

Immutable infrastructure is composed of immutable components that are replaced for every deployment, rather than being updated in place. Those components are started from a common image that is built once per deployment and can be tested and validated. The common image can be built through automation, but doesn’t have to be. Immutability is independent of any tool or workflow for building the images.

Its best use case is in a cloud or virtualized environment. While it’s possible in non-virtualized environments, the benefit doesn’t outweigh the effort.

State Isolation

The main criticism against immutable infrastructure – as stated in the Chef blog post – is that there is always state somewhere in the system and, therefore, the whole system isn’t immutable. That misses the point of immutable components. The main advantage when it comes to state in immutable infrastructure is that it is siloed. The boundaries between layers storing state and the layers that are ephemeral are clearly drawn and no leakage can possibly happen between those layers. There simply is no way to mix state into different components when you can’t expect them to be up and running the next minute.

Atomic Deployments and Validation

Updating an existing server can easily have unintended consequences. That’s why Chef, Puppet, CFEngine or other such tools exist – to take care of consistency across your infrastructure. A central system is necessary to manage the expected state of each server and to take action to ensure compliance. Deployment is not an atomic action but a transition that can go wrong and lead to an unknown state. This becomes very hard and complex to debug, as the exact state you are in is hard to know. Chef, Puppet or CFEngine are very complex systems as they have to deal with an overly complex problem.

Another solution to that problem is to build completely new images and servers that contain the application and the environment every time you want to deploy. In that case, the deployment doesn’t depend on the status the servers were in before, so the result is much more predictable and repeatable. Any third-party issues that may cause the deployment to fail can be caught by validating the new image and ensuring no production system was impacted. This one image can then be used to start any number of servers and switch atomically from the old machines to the new ones by changing the load balancer, for example.
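The flow described above (validate the new image, start fresh servers from it, then switch the load balancer) can be sketched in a few lines. This is only an illustration under stated assumptions: `Image`, `LoadBalancer` and `deploy` are hypothetical names made up for this example, and the load balancer is an in-memory stand-in rather than a real cloud API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Image:
    """An immutable, versioned machine image (hypothetical model)."""
    version: str


@dataclass
class LoadBalancer:
    """In-memory stand-in for a real load balancer."""
    active: List[str] = field(default_factory=list)


def deploy(image: Image, lb: LoadBalancer,
           validate: Callable[[Image], bool], count: int = 2) -> bool:
    """Validate the image, start new servers from it, then switch traffic."""
    if not validate(image):
        # Validation failed before anything touched production.
        return False
    # Every server is started from the same validated image.
    new_servers = [f"{image.version}-server-{i}" for i in range(count)]
    # One assignment: traffic goes to the old set or the new set, never a mix.
    lb.active = new_servers
    return True
```

If validation fails, `lb.active` still points at the old servers, which is the property that keeps a broken image away from production.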

There are, of course, downsides to rebuilding your images with every deployment. A full rebuild of the system takes a lot longer than simply updating and restarting the application. You can optimize this by layering your deployment: for example, build a base image from its own repository and, for each deployment, only add your application on top of that base image. It will still be a slower process, though.

Another problem is that you introduce dependencies on third parties during deployment. If you install packages in the system and your apt repository is slow or down, this can fail the deployment. While this could be a problem in a non-immutable infrastructure as well, you typically interact less with third-party systems when you just push new code into an already provisioned system.

By deploying from a pre-provisioned base image and updating that base image regularly you can soften that problem, but it’s still there and might fail a deployment from time to time.
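One way to picture the layering is to fingerprint everything that goes into the base image and to rebuild the base only when that fingerprint changes, so most deployments only add the application layer. The sketch below is a toy model, not a real image builder: `ImageBuilder`, `base_fingerprint` and the returned image names are assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, List


def base_fingerprint(packages: List[str], os_release: str) -> str:
    """Stable hash of everything that goes into the base image."""
    spec = {"os": os_release, "packages": sorted(packages)}
    raw = json.dumps(spec, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:12]


class ImageBuilder:
    """Toy builder that caches base images by fingerprint (hypothetical)."""

    def __init__(self) -> None:
        self.base_cache: Dict[str, str] = {}
        self.base_builds = 0  # counts slow, full base rebuilds

    def base_image(self, packages: List[str], os_release: str) -> str:
        key = base_fingerprint(packages, os_release)
        if key not in self.base_cache:
            # Slow path: packages are installed here, so third-party
            # repositories are only contacted when the base spec changes.
            self.base_builds += 1
            self.base_cache[key] = f"base-{key}"
        return self.base_cache[key]

    def deployment_image(self, packages: List[str], os_release: str,
                         app_version: str) -> str:
        # Fast path: put the application on top of the cached base.
        base = self.base_image(packages, os_release)
        return f"{base}+app-{app_version}"
```

Two deployments with the same package list share one base build, while a changed package list triggers a new one. The package and OS names passed in are placeholders.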

Building the automation still takes more time at the beginning of a project, as the tools for building immutable infrastructure are still new or need to be developed. It is definitely a bigger investment up front, but it starts paying off as soon as it is in place.

You can still use Chef, Puppet, CFEngine or Ansible to build your images, but as they aren’t built for an immutable infrastructure workflow they tend to be more complex than necessary.

Fast Recovery by preserving History

As all deployments are done by building new images, history is preserved automatically for rollback when necessary. The same process and automation that is used to deploy the next version can be used to roll back, which ensures the process of rolling back will work. By automating the creation of the images, you can even recreate historical images and branch off from earlier points in the history of the infrastructure.
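The point that a rollback reuses the deployment path can be made concrete with a small sketch. It assumes a hypothetical `Releases` class and an `activate` callback that stands in for whatever actually starts servers from an image; neither comes from a real tool.

```python
from typing import Callable, List


class Releases:
    """Records every activated image so a rollback is just another deployment."""

    def __init__(self, activate: Callable[[str], None]) -> None:
        self._activate = activate      # the one routine used for all switches
        self.history: List[str] = []   # image IDs in activation order

    def deploy(self, image_id: str) -> None:
        self._activate(image_id)
        self.history.append(image_id)

    def rollback_to(self, image_id: str) -> None:
        if image_id not in self.history:
            raise ValueError(f"unknown image: {image_id}")
        # Same code path as a normal deployment, so it is exercised
        # on every release rather than only during an emergency.
        self.deploy(image_id)
```

Because `rollback_to` simply calls `deploy`, the history stays append-only and shows both the forward releases and the rollbacks.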

Data schema changes are a potential problem, but that’s a general issue with rollbacks. Backwards compatibility and zero downtime deployments are a way to make sure this will work regardless of the changes.

Simple Experimentation

As you control the whole environment and application, any experiments with new versions of the language, operating system or dependencies are easy. With strict testing and validation in place, and the ability to roll back if necessary, most of the fear of upgrading a dependency is removed. Experimentation becomes an integral and trivial part of building your infrastructure.

Makes you collect your logs and metrics in a central location

With immutable components in place, it’s easy to simply kill a misbehaving server. Errors are often just a product of the environment, for example a third-party system misbehaving, and can be ignored, but some will keep coming up. Not having access to the servers gives the team the right incentive to collect and store logs and system metrics externally. This way, debugging can happen long after the server is gone.

If logs and metrics are missing to properly debug an issue, it’s easy to add more data collection to the infrastructure and replace all existing servers. Then once the error comes up again you can debug it fully from the data stored on an external system.
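As a minimal sketch of what shipping logs off the server can look like, the example below uses Python's standard `logging` module to emit one JSON object per log line, tagged with the server and image it came from. The `JsonFormatter` and `build_logger` names, the field names, and the `io.StringIO` buffer standing in for an external log collector are all assumptions for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Formats each record as JSON, tagged with server and image IDs."""

    def __init__(self, server_id: str, image_id: str) -> None:
        super().__init__()
        self.server_id = server_id
        self.image_id = image_id

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "server": self.server_id,
            "image": self.image_id,
            "level": record.levelname,
            "message": record.getMessage(),
        })


def build_logger(stream, server_id: str, image_id: str) -> logging.Logger:
    """Return a logger that writes tagged JSON lines to `stream`."""
    logger = logging.getLogger(f"app.{server_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(server_id, image_id))
    logger.handlers = [handler]  # replace any handlers from earlier calls
    return logger


# Usage: the buffer stands in for an external log collector.
collector = io.StringIO()
log = build_logger(collector, "web-1", "img-42")
log.error("upstream request failed: %s", "timeout")
```

Tagging every line with the image ID means an error can still be traced to the exact image that produced it after the server has been replaced.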

Conclusions

Immutable components as part of your infrastructure are a way to reduce inconsistency in your infrastructure and improve trust in your deployment process. Atomic deployments, combined with validation of the image and easy rollback, make managing your infrastructure a lot easier.

Immutability forces teams to silo data and to expect the failures that are inherent when building on top of a cloud infrastructure, or when building systems in general. This increases resilience and trains you in a process for withstanding problems, especially in an automated fashion. Furthermore, it helps with building simple and independent components that are easy to deploy and scale.

And it’s not a theoretical idea. At Codeship, we’ve built our infrastructure this way for a long time. Heroku and other PaaS providers are built as immutable components and lots of companies – small and very large – have used immutability as a core concept of their infrastructure.

Tools like Packer have made building immutable components very easy. Together with existing cloud infrastructure they are a powerful concept to help you build better and safer infrastructure. Let me know in the comments if you have any questions or interesting insights to share.

Thanks

I got great feedback from the following people on this article. Thanks for taking the time and helping me make it much clearer and simply better.


More Stories By Manuel Weiss

I am the cofounder of Codeship – a hosted Continuous Integration and Deployment platform for web applications. On the Codeship blog we love to write about Software Testing, Continuous Integration and Deployment. Also check out our weekly screencast series 'Testing Tuesday'!
