The AWS Well-Architected Framework Guide

July 12, 2023 TH Author

Not so easy, huh? Luckily AWS has created several reports on the Well-Architected Framework to explain cloud architectural design principals that can help guide you through the process. For example, in the case of an Amazon S3 bucket, you need to remember to disallow public read access, ensure logging is enabled, use customer-provided keys to ensure encryption is on, and so on.

With so many cloud services and resources, it’s tough to remember what to do and what configurations should be there. However, as you can see from links to articles on infrastructure configuration, Trend Micro has lots of information about what should be done to build cloud architecture to best practice levels. The Trend Cloud One™ – Conformity Knowledge Base contains 1,000 best practice articles to help you understand each cloud best practice, how to audit, and how to remediate the misconfiguration.

Cloud infrastructure misconfiguration automation
Automation is an essential step towards minimizing the risk of a breach, always scanning and providing feedback to stay ahead of threat actors. For those building in the cloud, having an automated tool to continuously scan your cloud infrastructure for misconfigurations is a thing of beauty. It can ensure you are always complying with those 1,000 best practices, without the heavy lifting.

If you’d like to be relieved from manually checking for adherence to well-architected design principals, sign up for a free trial of Conformity.

Alternately, if you’d to see how well-architected your infrastructure is, check out the free guided public cloud risk self-assessment to get personalized results in minutes.

The Six Pillars of a Well-Architected Framework
Conformity and its Knowledge Base are built around the AWS Well-Architected Framework, which is defined by six pillars:

Operational Excellence: Running and monitoring systems
Security: Protecting information and systems
Reliability: Ensuring a workload performs as it should
Performance Efficiency: Efficient use of IT
Cost Optimization: Avoiding unnecessary costs
Sustainability: Environmental impacts

Each of these pillars has its own set of design principals, which are extremely useful for evaluating your architecture and determining if you have implemented design principles that allow you to scale over time.

1. Operational Excellence Pillar

Starting with the Operational Excellence pillar, creating the most effective and efficient cloud infrastructure is a natural goal. So, when creating or changing the infrastructure, it is critical to follow the path of best practices outlined in the AWS Operational Excellence pillar.

The Operational Excellence pillar focuses on two business objectives:

Running workloads in the most efficient way possible.
Understanding your efficiency to be able to improve processes and procedures on an ongoing basis.

To achieve these objectives, there are five critical design principles can be utilized:

Perform operations as code, so you can apply engineering principles to your entire cloud environment. Applications, infrastructure, and so on, can all be defined as code and updated as code.
Make frequent, small, reversible changes, as opposed to large changes that make it difficult to determine the cause of the failure—if one were to occur. It also requires development and operations teams to be prepared to reverse the change that was just made in the event of a failure.
Refine operations procedures frequently by reviewing them with the entire team to ensure everyone is familiar with them and determine if they can be updated.
Anticipate failure to ensure that the sources of future failures are found and removed. A pre-mortem exercise should be conducted to determine how things can go wrong to be prepared.
Learn from all operational failures and share them across all teams. This allows teams to evolve and continue to increase procedures and skills.

CI/CD is good, but to ensure operational excellence, there must be proper controls on the process and procedures for building and deploying software, which include a plan for failure. It is always best to plan for the worst, and hope for the best. So if there is a failure, we will be ready for it.

2. Security Pillar

With data storage and processing in the cloud, especially in today’s regulatory environment, it is critical to ensure we build security into our environment from the beginning.

There are several critical design principles that strengthen our ability to keep our data and business secure, however, here are the seven recommended based on the Security pillar:

Implement a strong identity foundation to control access using core security concepts, such as the principle of least privilege and separation of duties.
Enable traceability through logging and metrics throughout the cloud infrastructure. It is only with logs that we know what has happened.
Apply security at all layers throughout the entire cloud infrastructure using multiple security controls with defense in depth. This applies to compute, network, and storage.
Automate security best practices to help scale rapidly and securely in the cloud. Utilizing controls managed as code in version-controlled templates makes it easier to scale securely.
Always protect data in transit and at rest, using appropriate controls based on sensitivity. These controls include (but not limited to) access control, tokenization, and encryption.
Keep people away from data to reduce the chance of mishandling, modification, or human error.
Prepare for security events by having incident response plans and teams in place. Incidents will occur and it is essential to ensure that a business is prepared.

Five areas to configure in the cloud to help achieve a well-architected infrastructure
There are several security tools that enable us to fulfill on the design principles, above. AWS has broken security into five areas that we should configure in the cloud:

Identity and access management (IAM), which involves the establishment of identities and permissions for humans and machines. It is critical to manage this through the lifecycle of the identity.
Detection of an attack. A common challenge faced by businesses, detection attacks often arise from user error. Enablement of logging features, as well as the delivery of those logs to the SIEM is essential. Once the SIEM has detected a malicious action, alerts should be sent out.
Infrastructure protection of the network and the compute resources is critical. This is done through a variety of tools and mechanisms. This comprises of either infrastructure tools or code protection, included (but not limited to) virtual private clouds (VPCs), code review, vulnerability assessments, gateways, firewalls, load balancers, hardening, and code signing.
Data protection in transit and at rest is also critical. This is primarily done with IAM and encryption. Most discussions of encryption review what algorithms are used and the key size. The most important piece to discuss, in relationship to encryption, is the location of the key and who has control over it. It is also important to determine the authenticity of the public key certificates.
Incident response is the ability to respond immediately and effectively when an adverse agent occurs. As the saying goes “failing to plan is planning to fail,” not having incident responses planned and practiced can lead to a costly incident.

What is essential to remember is that security of a cloud ecosystem is a split responsibility. AWS has defined where responsibility lies with them versus where it lies with the consumer. It is good to review the AWS shared responsibility models to ensure you are upholding your end of the deal.

3. Reliability Pillar

Reliability is important for any IT-related system, as it must provide the services that users and customers need, when they need it. Reliability involves understanding the level of availability that your business requires from any given system.

When it comes to the Reliability pillar, AWS has defined five critical design principles:

Automatically recover from failure. Depending on business needs, it may be essential that there are automated recovery controls in place, as the time it takes a human to intervene may be longer than a business can tolerate.
Test recovery procedures. Backing up the data from an Amazon S3 bucket is good first step, but the process is not complete until the restoration procedure is verified. If the data cannot be restored, then it has not been successfully backed up.
Scale horizontally to increase aggregate workload availability as an alternate way to envision a cloud infrastructure. If the business is using a virtual machine with a large amount of CPU capacity to handle all user requests, you may want to consider breaking it down into multiple, smaller virtual machines that are load balanced. That way, if a machine failed, the impact is not a denial of service, and if planned well, the users may never know there was a failure.
Stop guessing capacity. Careful planning and capacity management is critical to the reliability of an IT environment and may save you money where you are spending on unnecessary capacity needs.
Manage change and automation so alternations to the cloud do not interfere with the reliability of the infrastructure. Change management is core to ITIL. Changes should not be made unless they are planned, documented, tested, and approved. There must also be a backup plan for if/when a change breaks your environment.

With availability being at the core of this pillar, it is good to understand its definition. AWS defines availability as:

You May Also Like

DevOps vs SRE: Differences & Similarities

Application Security 101

Virtual Patching 101