Docker is an Immature Technology on AWS ECS

I’ve been working on converting some backend API development environments from a Vagrant VM-based platform to Docker containers, provisioned using a Chef configuration that also builds the EC2 instances that production services are deployed onto.

Originally, I intended to move containerised production deployments onto Amazon’s Elastic Container Service (ECS) but backed out when I realised how immature the technology is. I’m sure it won’t stay that way with the velocity that AWS is moving at, but right now ECS is under-developed.

I’d expected ECS to be similar to the Lambda platform, which allows functions to be pushed onto AWS for deployment in a hidden pool of compute resource that can scale on demand. I imagined we’d be pushing Docker containers into a similar pool of essentially infinite resource for which we’d pay a simple usage fee.

Unfortunately that’s not how it is. On ECS you define and manage a pool of EC2 resource that containers are deployed onto.

For some use-cases this is probably not an impediment, but when you’re dealing with a large number of projects that rarely require more than a handful of servers, the requirement to manage both the container layer and the underlying compute resource layer is an overhead that surprised me.
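To make the two layers concrete, here’s a rough sketch in Python with boto3 of what standing up a service on ECS involves. The cluster, AMI and task names are placeholders rather than a real configuration: first you create and feed the EC2 pool, and only then do you get to describe the containers themselves.

    # Sketch only: cluster name, AMI ID and task names are illustrative placeholders.
    import boto3

    ecs = boto3.client('ecs')
    ec2 = boto3.client('ec2')

    # Layer 1: the compute pool you own and manage yourself.
    ecs.create_cluster(clusterName='api-cluster')
    ec2.run_instances(
        ImageId='ami-xxxxxxxx',          # an ECS-optimised AMI
        InstanceType='t2.medium',
        MinCount=2, MaxCount=2,
        IamInstanceProfile={'Name': 'ecsInstanceRole'},
        # The ECS agent reads this file to decide which cluster to join.
        UserData='#!/bin/bash\necho ECS_CLUSTER=api-cluster >> /etc/ecs/ecs.config',
    )

    # Layer 2: the containers you actually care about.
    ecs.register_task_definition(
        family='my-api',
        containerDefinitions=[{
            'name': 'my-api',
            'image': 'example/my-api:1.0.0',
            'memory': 512,
            'portMappings': [{'containerPort': 8080}],
        }],
    )
    ecs.create_service(
        cluster='api-cluster',
        serviceName='my-api',
        taskDefinition='my-api',
        desiredCount=2,
    )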

I can’t wait for deployment of containers on the AWS platform to remove the need to manage compute resource beyond the simple specification of operating characteristics to service an expected workload.

Making Enterprise Architecture Relevant in the Land of DevOps

In 5 Tips to Automate Your Cloud Infrastructure I mentioned automated compliance testing. This is the idea that once your entire solution (both infrastructure and software) is defined using artefacts under version control, the way is open to checking those artefacts for compliance against an enterprise-wide set of rules… in an automated workflow… not in meetings with an “enterprise architecture team” who wield the fiery blame hammer to punish teams ignorant of the latest PowerPoint edict from on high.

That’s a bit of a harsh characterisation. Harsh, but unfortunately true for some of the large businesses I’ve worked with over the years, where it seems that the traditional enterprise architecture role has managed to avoid the magnetic pull towards automation that drives efficient solution delivery. Things are sometimes so toxic that EA is seen as an optional extra, or worse, an impediment for delivery teams.

And so there may be an unhealthy tension between delivery teams who see themselves as “actually doing stuff” and the EA team, which is often far removed from the concerns of project teams working with specific business units to automate their processes.

But there is a useful role for a central authority that understands the wide landscape and its history: the context in which the totality of IT solutions supports the business as efficiently as possible right now, with a view to the likely flight path it will need to support in the future.

The EA role can be a helpful force for project teams when it becomes integrated into the automated workflows that modern delivery uses. When it sits outside the automation of infrastructure build and deployment it becomes a cumbersome obstacle, but what if compliance against the enterprise architecture were simply part of the automated testing? A build that failed the EA tests would instantly surface an issue that could be addressed there and then. The alternative, of non-compliance going unnoticed until the next ‘architecture review board’ months in the future, looks barbaric and not very helpful at all.

Of course this approach depends entirely on automation of the entire project delivery from infrastructure through build and deployment, which only a “cloud” approach (i.e. API-driven infrastructure) can bring.

How can compliance be automated? By building and maintaining common “recipes” (to use a Chef term) that projects use, which are regularly audited and tweaked by the EA function. By attaching metadata to project repositories that bespoke tools can scrape and analyse. By running tools such as Amazon Inspector and any one of the myriad third-party tools. By using AWS Config to ensure that infrastructure changes are compliant. And on it goes. We’ll return to this topic in the future.
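As a flavour of what compliance as part of the build might look like, here is a minimal sketch in Python with boto3 that fails a build when AWS Config reports non-compliant resources. The rule names are hypothetical stand-ins for enterprise-wide policies; in practice something like this would run as one step in the CI pipeline.

    # Sketch only: the Config rule names below are hypothetical examples of EA policies.
    import sys
    import boto3

    REQUIRED_RULES = ['required-tags', 'encrypted-volumes', 'restricted-ssh']

    config = boto3.client('config')
    result = config.describe_compliance_by_config_rule(ConfigRuleNames=REQUIRED_RULES)

    failures = [
        rule['ConfigRuleName']
        for rule in result['ComplianceByConfigRules']
        if rule['Compliance']['ComplianceType'] != 'COMPLIANT'
    ]

    if failures:
        print('EA compliance check failed: ' + ', '.join(failures))
        sys.exit(1)  # a non-zero exit fails the build straight away

    print('EA compliance check passed')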

This is a brave new world in which EA is a helpful enabler, not a grumpy librarian. A world in which EA is intimately bound to the project teams and their processes. An aid for projects, able to bring relief to stretched teams by identifying potential for reuse of existing solutions (infrastructure recipes, micro-services, APIs, components, COTS). And a source of compliance checks to ensure that teams are building value within the growing IT estate.

5 Tips to Automate Your Cloud Infrastructure

This is the one about how “moving to the cloud” isn’t the same as moving your infrastructure into someone else’s data-centre.

Managing physical infrastructure is difficult. All that buying and installing tons of physical kit and making sure it all continues to work. Inventory management, replacement cycles, engineers on call, physical security… and on it goes.

Hard work. And frankly, if your business is not in the business of managing physical hardware then the finance folk should make it very difficult to justify keeping this commodity activity in-house rather than outsourcing it: what’s the ROI on managing physical kit? Your business is probably best focussed on whatever its unique domain is, as the resource-strapped startups running cloud-based workloads know very well.

But after landing on a public cloud, it’s easy to fall into the trap. The trap of sitting in front of the AWS Console and provisioning kit by clicking around a plethora of web pages. Create a server here, bring up a load balancer there, configure the SSL, upload a certificate, set up the ports and proxy rules, type in some domain names to hook it all together. SSH in and install the required dependencies. git pull the code down. Done?

Then the same thing for the next project a couple of weeks later. And again and again. But all the instances are slightly different and the naming conventions evolve over time. Until two years down the line there’s a big old muddle of resources that no one really understands.

So when Amazon sends you the email that says one of your servers has been retired, you’re stuck trying to recreate the custom snowflake instance that evolved over time. Good luck. Or maybe you saved a machine image along the way, so some of the platform can be recovered without having to remember how it was built.

Just say no.

Because the real win that the public cloud platforms bring is the tooling to break the cycle of manual configuration management. Projects become deployable by running one script that creates the infrastructure via the cloud provider’s API in the same way every time, provisions the platform using a configuration management tool and pulls down a tagged version of the software. Quick, repeatable and safely under version control.
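As an illustration of what that one script can look like, here is a sketch in Python with boto3 driving CloudFormation. The stack name, template path and parameters are made up for the example, and a Chef run (or equivalent) would still provision the instances the stack creates; the point is that everything the script needs lives in the project repository.

    # Sketch only: stack name, template path and parameter names are illustrative.
    import boto3

    cfn = boto3.client('cloudformation')

    # The template and this script both live in the project repository,
    # so the infrastructure is versioned alongside the software it runs.
    with open('infrastructure/stack.json') as f:
        template = f.read()

    cfn.create_stack(
        StackName='my-api-staging',
        TemplateBody=template,
        Parameters=[
            # Deploy a tagged version of the software, not "whatever is on the box".
            {'ParameterKey': 'AppVersion', 'ParameterValue': '1.4.2'},
        ],
        Capabilities=['CAPABILITY_IAM'],
    )

    # Block until the environment is fully built, the same way every time.
    cfn.get_waiter('stack_create_complete').wait(StackName='my-api-staging')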

Let’s summarise with five tips that follow from adopting the public cloud:

  1. automate your infrastructure (automation is good, do things once centrally). See Chef, Puppet, Ansible, CloudFormation
  2. put your infrastructure under version control (software without version control is unthinkable; same for the infrastructure supporting the software solutions)
  3. make deployments repeatable (painless temporary environments)
  4. make deployments rapid and frictionless (keep delivery teams moving)
  5. automate compliance testing (banish ‘snowflake’ systems; more on this later)