Best practice in cloud architecture

Organisations using the cloud need to run it according to best practice to get the most business benefit. Fortunately, there are well-defined roadmaps for achieving this.

But it starts with a holistic view. It’s a mistake to consider cloud architecture purely through a technical lens. Instead, organisations conducting a thorough review need to look at their entire operations – what does the business do day-to-day? What are all of its business units and goals in the short, medium and long term? What problems is it facing and what successes has it had? It is useful to get C-suite buy-in here, though this can sometimes be challenging if an organisation has CEOs (chief executive officers) and CFOs (chief financial officers) who see the exercise as a technical one, the domain solely of the CTO (chief technology officer) or infrastructure manager.

Once this review is complete, however, it’s possible to drill down into the individual components. Greg Cockburn, Principal Practice Lead, AC3, recommends a measured approach. “Both Azure and AWS suggest that you do a review on a per-workload basis and that you don’t make it too big,” he says, “because otherwise you can be missing the weeds for the forest.”

It’s also possible to streamline the review in areas where processes are the same across business units. Incident response, for example, may follow similar processes across the organisation: regardless of which unit deploys an application, incidents will be funnelled to the same internal incident management team. The answers one unit gives in a review can therefore be reused for all similar workloads.

The Pillars

Once the scope of the cloud architecture review has been decided, there are a number of pillars that need to be considered, including:

Reliability

Organisations need to know how robust their cloud architecture will be in the face of unexpected disruptions. This pillar covers areas such as disaster recovery (DR) planning, including testing and running through scenarios where human interaction is required. Where possible, recovery should be automated, either by using services and components that fail over automatically or by running multiple systems in a highly available configuration.

Security

All the traditional elements in the security space need to be defined, measured and monitored – identity and access management (keeping a careful check on the removal of any users who no longer require access), confidentiality, data integrity, encryption, good password policies and two- or multi-factor authentication. General system hygiene includes such areas as code reviews, system patching and checking for vulnerabilities in any in-house libraries if the organisation is developing its own applications. Clear and comprehensive logging means that, if an incident does occur, as much information as possible can be handed over to the authorities to enable a better understanding of what has taken place.

Performance Efficiency

Many services can scale out and up, or simple automation can be implemented to do this. But this pillar also relates to monitoring and operation. It’s important to size resources on actual usage rather than simply watching for a CPU threshold. Buying an instance that is two or three times larger than what is needed is simply a waste of money and resources. Instead, it’s advisable to build applications with scale sets or autoscaling groups, so that if further capacity is required, adding it is easy. Performance efficiency has the added benefit of offering extra reliability too.
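As a rough illustration of sizing on actual usage, the sketch below estimates how many instances a group would need to keep average CPU near a chosen target. It is a simplified proportional calculation, not the algorithm of any particular cloud provider, and the function name, parameters and default limits are all assumptions made for this example.

```python
import math


def desired_instance_count(current_instances: int,
                           observed_cpu_percent: float,
                           target_cpu_percent: float,
                           min_instances: int = 1,
                           max_instances: int = 10) -> int:
    """Estimate the instance count that keeps average CPU near the target.

    Hypothetical helper: assumes load spreads evenly across instances, so
    total load is proportional to instances * observed utilisation.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")

    # Scale the current count by how far actual usage is from the target.
    raw = current_instances * observed_cpu_percent / target_cpu_percent

    # Round up so the group is never sized below the estimated need,
    # then clamp to the configured limits.
    return max(min_instances, min(max_instances, math.ceil(raw)))
```

With these assumptions, four instances averaging 10% CPU against a 50% target would shrink to one, which is the kind of over-provisioning the paragraph above warns about.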

Cost Optimisation

There are a few key things to consider in this pillar, such as ensuring that applications and processes aren’t left running needlessly. Development environments are a classic example. Developers don’t tend to work 24/7, so their environments don’t need to either. A smart way to optimise this is to define the environment in an AWS CloudFormation template, giving the user the ability to spin it up on demand. It’s even possible to build in automated teardowns.
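An automated teardown needs a rule for deciding when an environment should be up. The sketch below is one minimal way to express that rule, assuming a weekday working window of 08:00 to 18:00; the function name and the hours are illustrative assumptions, and the actual creation or deletion of the environment is left to whatever tooling the organisation uses.

```python
from datetime import datetime, time


def should_be_running(now: datetime,
                      start: time = time(8, 0),
                      end: time = time(18, 0)) -> bool:
    """Return True if a development environment should be up at `now`.

    Hypothetical policy: weekdays only, within the [start, end) window.
    `now` is assumed to be in the team's local time zone.
    """
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start <= now.time() < end
```

A scheduled job could call this every hour and tear the environment down whenever it returns False.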

Then there are considerations such as unattached Amazon EBS volumes, which should be eliminated. It’s also a simple process to build lifecycle policies in Amazon S3 so that objects that are no longer being accessed are moved into Amazon S3 Glacier.
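Both checks can be sketched in a few lines. The example below works on plain dictionaries shaped like the `Volumes` entries returned by boto3's `describe_volumes` and like the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`; it makes no AWS calls itself. The function names, the rule ID and the idea of using object age as a stand-in for "not being accessed" are assumptions for illustration, since S3 lifecycle transitions are triggered by age rather than by access patterns.

```python
def unattached_volume_ids(volumes: list[dict]) -> list[str]:
    """Return the IDs of EBS volumes that are not attached to any instance.

    An unattached volume reports the state "available" and has no
    attachments.
    """
    return [
        v["VolumeId"]
        for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def glacier_transition_rule(days: int, prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects to Glacier.

    Objects under `prefix` transition once they are `days` old.
    The rule ID is a made-up name for this example.
    """
    if days < 0:
        raise ValueError("days must not be negative")
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }
```

Volumes flagged this way are worth reviewing before deletion, as some may hold data that should be snapshotted first.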

Keeping on top of new developments, capabilities and services also helps with cost optimisation, as it allows users to make sure they are getting the best use of their resources.

Operational Excellence

This pillar focuses on monitoring and observability, incident response, automation and the processes around looking after the system. Many organisations may find that much of this is learned by osmosis within teams, but a structured approach that documents and monitors key metric objectives from an infrastructure, application or business perspective gives clear parameters and goals to work towards. If a stated goal is defined, it can then be measured and ultimately improved. Even small companies can pick 10 key objectives to write down that can then be referred back to by all stakeholders.
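To make the idea of stated, measurable objectives concrete, the sketch below records each objective with a target and checks measured values against it. The class, the function and the example objective names are hypothetical, chosen only to illustrate the approach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """A written-down objective with a measurable target."""
    name: str
    target: float
    higher_is_better: bool = True  # e.g. availability; False for latency

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def review(objectives: list[Objective],
           measurements: dict[str, float]) -> dict[str, bool]:
    """Report which objectives are met, skipping any without a measurement."""
    return {
        o.name: o.is_met(measurements[o.name])
        for o in objectives
        if o.name in measurements
    }


# Example objectives; the names and numbers are invented for illustration.
objectives = [
    Objective("availability_percent", 99.9),
    Objective("p95_latency_ms", 300, higher_is_better=False),
]
report = review(objectives, {"availability_percent": 99.95,
                             "p95_latency_ms": 420})
```

Here `report` flags the latency objective as missed, giving stakeholders a specific figure to improve.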

During a review, each of these pillars needs to be examined and weighed individually according to its importance to the organisation and its business goals and requirements. Identifying priorities can involve trade-offs between the different pillars – security against cost optimisation, for example.

Another key reason for implementing a robust cloud architecture review program is that the cloud is always evolving. “It moves so fast that everybody is head down, concentrating on what they’re doing and they’re not listening,” says Cockburn.