AWS Well-Architected Framework



Cost Optimization Pillar

COST 10. How do you manage and/or consider the adoption of new services?

Meet regularly with your AWS Solutions Architect, Consultant, or Account Team, and consider which new services or features you could adopt to save money.

COST 9. Did you consider data-transfer charges when designing your architecture?

Balance the data transfer costs of your architecture with your high availability (HA) and reliability needs.

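Analyze your data transfer patterns and, where it is justified, use AWS Direct Connect to reduce data transfer costs and improve performance.

Architect to optimize data transfer (application design, WAN acceleration, etc.).

Use a content delivery network (CDN), such as Amazon CloudFront.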

COST 8. Do you decommission resources that you no longer need or stop resources that are temporarily not needed?

Reconcile decommissioned resources using either an automated system or a defined process.

Have a process in place to identify and decommission orphaned resources.

Design your system to handle instance termination gracefully, so that you can decommission non-critical or unneeded instances and resources with low utilization as you identify them.
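As a minimal sketch of the "identify and decommission orphaned resources" process above, the following Python flags decommission candidates from an inventory you have already collected. The `Resource` record, its fields, the `Owner` tag convention, and the 5% CPU floor are all hypothetical; in practice the inventory would come from your own tooling or the AWS APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical inventory record; not an AWS API type."""
    resource_id: str
    kind: str                                  # "instance" or "volume" in this sketch
    tags: dict = field(default_factory=dict)
    attached: bool = True                      # only meaningful for volumes
    avg_cpu_percent: float = 0.0               # only meaningful for instances (e.g., a 14-day average)


def decommission_candidates(resources, cpu_floor=5.0, required_tag="Owner"):
    """Return {resource_id: [reasons]} for resources that look orphaned or idle."""
    candidates = {}
    for r in resources:
        reasons = []
        if required_tag not in r.tags:
            reasons.append(f"missing {required_tag} tag")
        if r.kind == "volume" and not r.attached:
            reasons.append("unattached volume")
        if r.kind == "instance" and r.avg_cpu_percent < cpu_floor:
            reasons.append(f"average CPU below {cpu_floor}%")
        if reasons:
            candidates[r.resource_id] = reasons
    return candidates


inventory = [
    Resource("i-web", "instance", {"Owner": "team-a"}, avg_cpu_percent=42.0),
    Resource("vol-old", "volume", {"Owner": "team-b"}, attached=False),
    Resource("i-forgotten", "instance", {}, avg_cpu_percent=1.5),
]
print(decommission_candidates(inventory))
```

Flagged resources are better fed into a review step than deleted automatically, which is why the system should tolerate instance termination gracefully.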

COST 7. How are you monitoring usage and spending?

Finance-driven chargeback method: Use this to allocate instances and resources to cost centers (e.g., by tagging).

Use AWS Cost Explorer.

Notifications: Let key members of your team know if your spending moves outside well-defined limits.

Monitoring: Monitor usage and spend regularly using Amazon CloudWatch or a third-party provider (for example, Cloudability or CloudCheckr).

Cost-efficient architecture: Have a plan for both usage and spending, expressed per unit (e.g., per user or per gigabyte of data).

Review Detailed Billing Reports: Have a standard process to load and interpret the Detailed Billing Reports.

Tag all resources: Tag resources so that you can correlate changes in your bill with changes in your infrastructure and usage.
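A minimal Python sketch of tag-based chargeback and threshold notifications follows. It assumes billing line items have already been exported into dictionaries holding a cost and a tag map; the `CostCenter` tag key, the budgets, and the 80% alert ratio are hypothetical. Within AWS, the Cost and Usage Report (the successor to the Detailed Billing Reports) and AWS Budgets cover this natively.

```python
from collections import defaultdict


def chargeback(line_items, tag_key="CostCenter"):
    """Sum cost per cost center; items without the tag land in 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        center = item.get("tags", {}).get(tag_key, "untagged")
        totals[center] += item["cost"]
    return dict(totals)


def budget_alerts(totals, budgets, alert_ratio=0.8):
    """Return the cost centers whose spend has reached alert_ratio of their budget."""
    return {
        center: totals.get(center, 0.0)
        for center, budget in budgets.items()
        if totals.get(center, 0.0) >= alert_ratio * budget
    }


# Hypothetical line items, e.g., rows exported from a billing report.
items = [
    {"cost": 120.0, "tags": {"CostCenter": "web"}},
    {"cost": 30.5, "tags": {"CostCenter": "web"}},
    {"cost": 40.25, "tags": {"CostCenter": "data"}},
    {"cost": 9.75, "tags": {}},
]
totals = chargeback(items)
print(totals)
print(budget_alerts(totals, {"web": 180.0, "data": 100.0}))
```

Keeping an explicit "untagged" bucket makes gaps in the tagging policy visible instead of silently dropping that spend.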

COST 6. What access controls and procedures do you have in place to govern AWS usage?

Track project lifecycle: Track, measure, and audit the life cycle of projects, teams, and environments to avoid using and paying for unnecessary resources.

Establish groups and roles (for example, Dev/Test/Prod); use AWS governance mechanisms such as IAM to control who can spin up instances and resources in each group. (This applies to AWS services or third-party solutions.)
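For the groups-and-roles practice above, one hedged way to express "only the Dev group can launch Dev instances" is an IAM policy that requires an `Environment` tag at launch. The sketch below generates such a policy in Python. It is modelled on AWS's documented tag-on-create examples, the `Environment` tag key is an assumption, and the exact resource list should be checked against the current EC2 IAM documentation before use.

```python
import json


def tagged_launch_policy(environment: str) -> dict:
    """IAM policy allowing EC2 launches only when the new instance and its volumes
    are tagged Environment=<environment> (e.g., "dev", "test", "prod")."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Resources a launch refers to without being restricted by the tag.
                "Sid": "AllowLaunchDependencies",
                "Effect": "Allow",
                "Action": "ec2:RunInstances",
                "Resource": [
                    "arn:aws:ec2:*::image/*",
                    "arn:aws:ec2:*:*:subnet/*",
                    "arn:aws:ec2:*:*:network-interface/*",
                    "arn:aws:ec2:*:*:security-group/*",
                    "arn:aws:ec2:*:*:key-pair/*",
                ],
            },
            {   # The instance and its volumes must carry the environment tag at creation.
                "Sid": "RequireEnvironmentTagOnLaunch",
                "Effect": "Allow",
                "Action": "ec2:RunInstances",
                "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ec2:*:*:volume/*"],
                "Condition": {"StringEquals": {"aws:RequestTag/Environment": environment}},
            },
            {   # Permit tagging only as part of the RunInstances call itself.
                "Sid": "AllowTagOnCreate",
                "Effect": "Allow",
                "Action": "ec2:CreateTags",
                "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ec2:*:*:volume/*"],
                "Condition": {"StringEquals": {"ec2:CreateAction": "RunInstances"}},
            },
        ],
    }


print(json.dumps(tagged_launch_policy("dev"), indent=2))
```

Attaching one such policy per group (Dev, Test, Prod) keeps each team's spend attributable through the same tag used for chargeback.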

COST 5. Are there managed services (higher-level services than Amazon EC2, Amazon EBS, Amazon S3) you can use to improve your ROI?

Consider AWS CloudFormation, AWS Elastic Beanstalk, or AWS OpsWorks: Use AWS CloudFormation templates, AWS Elastic Beanstalk, or AWS OpsWorks to achieve the benefits of standardization and cost control.

Consider other application-level services: Use Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), and Amazon Simple Email Service (SES) where appropriate.

Consider appropriate databases: Use Amazon Relational Database Service (RDS) (PostgreSQL, MySQL, SQL Server, Oracle) or Amazon DynamoDB (or other key-value stores and NoSQL alternatives) where appropriate.

Analyze services: Analyze application-level services to see which ones you can use.

COST 4. Have you selected the appropriate pricing model to meet your cost targets?

Consider cost: Factor costs into Region selection.

Automated action: Design your architecture so that you can turn off unused instances (e.g., use Auto Scaling to scale down during non-business hours).

Sell Reserved Instances: As your needs change, sell Reserved Instances you no longer need on the Reserved Instance Marketplace, and purchase others.

Analyze usage: Regularly analyze usage and purchase Reserved Instances accordingly.

Spot: Use Spot Instances for suitable workloads.
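The "analyze usage and purchase Reserved Instances accordingly" step reduces to a break-even calculation. The Python sketch below compares a hypothetical Reserved Instance offer (upfront fee plus a recurring hourly charge) with an On-Demand rate. All prices are made up, and the model ignores volume discounts, tiered pricing, and resale on the Marketplace.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class ReservedOffer:
    """Hypothetical Reserved Instance offer; prices are illustrative only."""
    upfront: float       # one-time payment (USD)
    hourly: float        # recurring charge (USD), billed for every hour of the term
    term_years: int = 1


def annual_reserved_cost(offer: ReservedOffer) -> float:
    # Amortize the upfront payment over the term and add a full year of recurring charges.
    return offer.upfront / offer.term_years + offer.hourly * HOURS_PER_YEAR


def break_even_utilization(on_demand_hourly: float, offer: ReservedOffer) -> float:
    """Fraction of the year an instance must run for the reservation to pay off."""
    return annual_reserved_cost(offer) / (on_demand_hourly * HOURS_PER_YEAR)


def recommend(expected_utilization: float, on_demand_hourly: float, offer: ReservedOffer) -> str:
    if expected_utilization >= break_even_utilization(on_demand_hourly, offer):
        return "reserved"
    return "on-demand"


offer = ReservedOffer(upfront=300.0, hourly=0.03)
print(round(break_even_utilization(0.10, offer), 3))
print(recommend(0.9, 0.10, offer), recommend(0.3, 0.10, offer))
```

With these illustrative numbers the reservation pays off once the instance runs roughly 64% of the year; below that, On-Demand (or Spot, for suitable workloads) is the cheaper model.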

COST 3. Have you selected the appropriate resources to meet your cost targets?

Profiled applications: Profile your applications so you know which Amazon EBS volume type to use (Magnetic, General Purpose (SSD), or Provisioned IOPS). Use EBS-optimized instances only when necessary.

Custom metrics: Install custom scripts that report memory usage, and inspect that usage in CloudWatch (EC2 does not report memory usage to CloudWatch by default).

Amazon CloudWatch: Use CloudWatch to determine processor load.

Third-party products: Use third-party products, such as CopperEgg or New Relic, to determine appropriate instance types.

Match instance profile to need: Match on workload and instance description, for example whether the workload is compute-, memory-, or storage-intensive.
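A rough Python sketch of matching the instance profile to need: given the peak vCPU and memory requirements measured while profiling, pick a broad instance category. The thresholds are a heuristic assumption based on the memory-to-vCPU ratios of recent C, M, and R generations (about 2, 4, and 8 GiB per vCPU), not an AWS rule.

```python
def suggest_instance_family(vcpus: float, memory_gib: float, storage_intensive: bool = False) -> str:
    """Map a profiled workload to a broad EC2 instance category (heuristic only)."""
    if vcpus <= 0 or memory_gib <= 0:
        raise ValueError("vcpus and memory_gib must be positive")
    if storage_intensive:
        return "storage optimized (e.g., I family)"
    gib_per_vcpu = memory_gib / vcpus
    if gib_per_vcpu <= 2:      # compute-heavy: little memory per core
        return "compute optimized (e.g., C family)"
    if gib_per_vcpu <= 4:      # balanced
        return "general purpose (e.g., M family)"
    return "memory optimized (e.g., R family)"


print(suggest_instance_family(8, 16))   # CPU-heavy API tier
print(suggest_instance_family(4, 64))   # in-memory cache
```

The output narrows the search to a family; the specific size should still come from load testing.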

COST 2. How are you optimizing your usage of AWS services?

Service-specific optimizations: Examples include minimizing I/O for Amazon EBS, avoiding uploading too many small files into Amazon S3, and using Spot Instances extensively for Amazon EMR.

COST 1. How do you make sure your capacity matches but does not substantially exceed what you need?

Appropriately provisioned: Appropriately provision throughput, sizing, and storage for services such as Amazon DynamoDB, Amazon EBS (Provisioned IOPS), Amazon RDS, and Amazon EMR.

Time-based approach: Examples include following the sun, turning off Dev/Test instances over the weekend, and following quarterly or annual schedules (e.g., Black Friday).

Queue-based approach: Run your own Amazon Simple Queue Service (SQS) queue, and spin up or shut down instances based on demand.

Demand-based approach: Use Auto Scaling to respond to variable demand.
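The queue-based approach can be sketched as a sizing rule: read the queue backlog (for SQS, for example, the `ApproximateNumberOfMessagesVisible` CloudWatch metric), divide it by what one instance can process within a target drain window, and clamp the result to fleet bounds. The per-instance throughput, drain window, and bounds below are hypothetical parameters.

```python
import math


def desired_workers(backlog: int, msgs_per_worker_per_min: float, drain_minutes: float,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Size a worker fleet so the current queue backlog drains within drain_minutes."""
    per_worker = msgs_per_worker_per_min * drain_minutes   # messages one worker clears in the window
    needed = math.ceil(backlog / per_worker) if backlog > 0 else 0
    return max(min_workers, min(max_workers, needed))      # clamp to fleet bounds


print(desired_workers(backlog=1000, msgs_per_worker_per_min=10, drain_minutes=10))
```

Keeping a minimum of one worker avoids cold-start delays, and the maximum caps spend if the backlog spikes unexpectedly.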

The ability to avoid or eliminate unneeded cost or suboptimal resources

Performance Efficiency Pillar

PERF 16. How do you ensure the proximity and caching solutions you have match demand?

Planned: Plan future proximity or caching solutions based on metrics and/or planned events.

Monitor: Monitor cache usage and demand over time.

Periodic review: Periodically review cache usage and demand.

PERF 15. How do you monitor your proximity and caching solutions to ensure performance is as expected?

Amazon CloudWatch monitoring: Use CloudWatch to monitor instances.

Third-party monitoring: Use third-party tools to monitor systems.

Alarm-based notifications: Plan for your monitoring systems to automatically alert you if metrics are out of safe bounds.
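Alarm-based notifications usually avoid firing on a single noisy datapoint. The Python sketch below implements a simplified "M out of N datapoints" rule, similar in spirit to CloudWatch's datapoints-to-alarm setting; it omits the INSUFFICIENT_DATA state and missing-data handling, and the threshold and sample values are hypothetical.

```python
from collections import deque


class ThresholdAlarm:
    """Alarm when at least datapoints_to_alarm of the last evaluation_periods values breach."""

    def __init__(self, threshold: float, datapoints_to_alarm: int, evaluation_periods: int):
        if not 1 <= datapoints_to_alarm <= evaluation_periods:
            raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
        self.threshold = threshold
        self.datapoints_to_alarm = datapoints_to_alarm
        self.window = deque(maxlen=evaluation_periods)  # keeps only the last N breach flags
        self.state = "OK"

    def observe(self, value: float) -> str:
        self.window.append(value > self.threshold)
        self.state = "ALARM" if sum(self.window) >= self.datapoints_to_alarm else "OK"
        return self.state


# Hypothetical latency samples (ms); alarm when 3 of the last 5 exceed 200 ms.
alarm = ThresholdAlarm(threshold=200, datapoints_to_alarm=3, evaluation_periods=5)
states = [alarm.observe(v) for v in [100, 250, 300, 150, 260, 100, 100]]
print(states)
```

The sliding window also lets the alarm return to OK on its own once enough healthy datapoints replace the breaching ones.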

PERF 14. How do you ensure you continue to have the most appropriate proximity and caching solutions as new solutions are launched?

Review: Cyclically reselect a new instance type and size based on predicted resource needs.

Benchmarking: After each new instance type is released, carry out a load test of a known workload on AWS, and use that to estimate the best selection.

Load test: After each relevant new instance type is released, deploy the latest version of the system on AWS, use monitoring to capture performance metrics, and then select based on a calculation of performance/cost.

Proactive monitoring (Amazon CloudWatch): Use Amazon CloudWatch to monitor proximity and caching solutions.

Proactive monitoring (third-party): Use third-party tools to monitor proximity and caching solutions.

Alarm-based notifications: Plan for your monitoring systems to automatically alert you if metrics are out of safe bounds.

Trigger-based actions: Plan for alarms to cause automated actions to remediate or escalate issues.

PERF 13. How do you select the appropriate proximity and caching solutions for your system?

Policy/reference architecture: Select instance type and size based on predicted resource needs and an internal governance standard.

Cost/budget: Select instance type and size based on predicted resource needs and internal cost controls.

Benchmarking: Load test a known workload on AWS and use that to estimate the best selection; that is, test a known performance benchmark against a known workload.

Guidance from AWS or from an APN Partner: Select a proximity and caching solution based on best-practice advice.

Load test: Deploy the latest version of your system on AWS using different instance types and sizes, use monitoring to capture performance metrics, and then make a selection based on a calculation of performance/cost.
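The "calculation of performance/cost" used in the load-test and benchmarking practices could be as simple as the Python sketch below: discard candidates that miss a latency target, then rank the rest by throughput per dollar. The candidate names, measurements, and prices are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadTestResult:
    instance_type: str          # hypothetical candidate label
    requests_per_second: float  # sustained throughput measured in the load test
    p99_latency_ms: float       # tail latency measured in the load test
    hourly_cost: float          # price per instance-hour (illustrative)


def best_value(results, max_p99_ms: float) -> Optional[LoadTestResult]:
    """Among candidates meeting the latency target, pick the highest throughput per dollar."""
    eligible = [r for r in results if r.p99_latency_ms <= max_p99_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.requests_per_second / r.hourly_cost)


results = [
    LoadTestResult("candidate-a", 1000, 80, 0.20),   # about 5000 req/s per dollar-hour
    LoadTestResult("candidate-b", 1800, 60, 0.40),   # about 4500 req/s per dollar-hour
    LoadTestResult("candidate-c", 2600, 150, 0.40),  # best ratio, but misses a 100 ms target
]
print(best_value(results, max_p99_ms=100).instance_type)
```

Filtering on the latency target first keeps the cheapest-per-request option from winning when it cannot meet the requirement.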

PERF 12. How do you ensure the capacity and throughput of your databases match demand?

Planned: Plan for future capacity and throughput based on metrics and/or planned events.

Automated: Automate capacity adjustments based on metrics.
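Automating database capacity against metrics usually means target tracking: keep consumed capacity at a chosen percentage of provisioned capacity, as DynamoDB auto scaling does. The Python sketch below shows only the core formula; cooldowns, scale-in limits, and the actual API calls are left out, and the target percentage and bounds are hypothetical.

```python
def target_tracking_capacity(consumed_units: int, target_utilization_pct: int,
                             min_units: int, max_units: int) -> int:
    """Provisioned capacity that puts utilization at (or just under) the target percentage.

    Integer ceiling division avoids floating-point rounding pushing the result up by one.
    """
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target_utilization_pct must be in (0, 100]")
    desired = (consumed_units * 100 + target_utilization_pct - 1) // target_utilization_pct
    return max(min_units, min(max_units, desired))


print(target_tracking_capacity(consumed_units=700, target_utilization_pct=70, min_units=5, max_units=4000))
```

A target below 100% leaves headroom for bursts while the scaling action catches up.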

PERF 11. How do you monitor your databases to ensure performance is as expected?

Amazon CloudWatch monitoring: Use CloudWatch to monitor databases.

Third-party monitoring: Use third-party tools to monitor databases.

Periodic review: Periodically review your monitoring dashboards.

Alarm-based notifications: Plan to have your monitoring systems automatically alert you if metrics are out of safe bounds.

Trigger-based actions: Plan to have alarms cause automated actions to remediate or escalate issues.

PERF 10. How do you ensure that you continue to have the most appropriate database solution and features as new database solutions and features are launched?

Review: Cyclically reselect the instance type and size based on predicted resource needs.

Load test: After each relevant new instance type is released, deploy the latest version of the system on AWS, use monitoring to capture performance metrics, and then make a selection based on a calculation of performance/cost.

PERF 9. How do you select the appropriate database solution for your system?

Policy/reference architecture: Select instance type and size based on predicted resource needs and an internal governance standard.

Cost/budget: Select instance type and size based on predicted resource needs and internal cost controls.

Benchmarking: Load test a known workload on AWS and use that to estimate the best selection; that is, test a known performance benchmark against a known workload.

Guidance from AWS or from an APN Partner: Select a solution based on best-practice advice.

PERF 8. How do you ensure that the capacity and throughput of your storage solutions match demand?

PERF 7. How do you monitor your storage solution to ensure it is performing as expected?

PERF 6. How do you ensure that you continue to have the most appropriate storage solution as new storage solutions and features are launched?

PERF 5. How do you select the appropriate storage solution for your system?

PERF 4. How do you ensure that the quantity of your instances matches demand?

PERF 3. How do you monitor your instances post-launch to ensure they are performing as expected?

PERF 2. How do you ensure that you continue to have the most appropriate instance type as new instance types and features are introduced?

PERF 1. How do you select the appropriate instance type for your system?

The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.

Reliability Pillar

REL 9. How are you planning for recovery?

Automated recovery implemented: Use AWS and/or third-party tools to automate system recovery.

DR tested and validated: Regularly test failover to your DR site to ensure that RTO and RPO are met.

Service limits: Request service limit increases at the DR site so that it can accommodate a failover.

Configuration drift: Ensure that Amazon Machine Images (AMIs) and the system configuration state are up to date at the DR site or Region.

Disaster recovery: Establish a DR strategy.

Objectives defined: Define the recovery time objective (RTO) and recovery point objective (RPO).
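To make "DR tested and validated" concrete, each failover test can record three timestamps and be checked against the defined objectives. The Python sketch below does that; the record layout and sample times are hypothetical, and it assumes the data-loss window runs from the last write that reached the DR site to the moment of the outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrTestRun:
    outage_start: datetime           # when the primary was declared unavailable
    service_restored: datetime       # when the DR site was serving traffic again
    last_replicated_write: datetime  # newest write present at the DR site


def evaluate_dr_test(run: DrTestRun, rto: timedelta, rpo: timedelta) -> dict:
    recovery_time = run.service_restored - run.outage_start           # compared against RTO
    data_loss_window = run.outage_start - run.last_replicated_write   # compared against RPO
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


run = DrTestRun(
    outage_start=datetime(2024, 3, 1, 12, 0),
    service_restored=datetime(2024, 3, 1, 12, 45),
    last_replicated_write=datetime(2024, 3, 1, 11, 50),
)
print(evaluate_dr_test(run, rto=timedelta(hours=1), rpo=timedelta(minutes=15)))
```

Storing these results for every test run gives a trend line that shows whether recovery is drifting toward the objectives.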

REL 8. How does your system withstand component failures?

Notification: Plan to receive notifications of any significant events.

Monitoring: Continuously monitor the health of your system.

Auto healing: Use automated capabilities to detect failures and take action to remediate them.

Multi-AZ/Region: Distribute applications across multiple Availability Zones or Regions.

Load balancing: Use a load balancer in front of a pool of resources.

REL 7. How are you backing up your data?

Periodic recovery testing: Validate through a recovery test that the backup process implementation meets RTO and RPO.

Backups are secured and/or encrypted: See the AWS Security Best Practices whitepaper.

Automated backups: Use AWS features, AWS Marketplace solutions, or third-party software to automate backups.

Data is backed up: Back up important data using Amazon S3, Amazon EBS snapshots, or third-party software to meet RPO.
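A simple automated check of whether a backup schedule can actually meet the RPO: the worst-case data loss is the largest gap between successful backups, or the time since the latest one. The Python sketch below assumes you already have a list of successful backup timestamps (for example, from your EBS snapshot history); the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def max_exposure(backup_times, now: datetime) -> timedelta:
    """Longest period in which a failure would have lost data: the largest gap
    between consecutive backups, or the time since the latest one."""
    times = sorted(backup_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(now - times[-1])
    return max(gaps)


def meets_rpo(backup_times, now: datetime, rpo: timedelta) -> bool:
    return bool(backup_times) and max_exposure(backup_times, now) <= rpo


backups = [datetime(2024, 3, 1, h) for h in (0, 4, 8, 12)]  # hypothetical snapshot times
now = datetime(2024, 3, 1, 13)
print(max_exposure(backups, now), meets_rpo(backups, now, timedelta(hours=4)))
```

Only successful backups should be counted; a failed snapshot silently widens the gap, which is exactly what this check is meant to catch.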

REL 6. How are you executing change management?

Change management automated: Automate deployments and patching.

REL 5. How are you monitoring AWS resources?

Monitoring: Monitor your applications with Amazon CloudWatch or third-party tools.

Notification: Plan to receive notifications when significant events occur.

Automated response: Use automation to take action when a failure is detected, e.g., to replace failed components.

Review: Perform frequent reviews of the system based on significant events to evaluate the architecture.

REL 4. How does your system adapt to changes in demand?

Load test: Adopt a load testing methodology to measure whether scaling activity will meet application requirements.

Automated scaling: Use automatically scalable services, e.g., AWS Elastic Beanstalk, Amazon DynamoDB, Auto Scaling, Amazon CloudFront, and Amazon S3.

REL 3. Do you have an escalation path to deal with technical issues?

Leverage AWS Support APIs: Integrate the AWS Support API with your internal monitoring and ticketing systems.

Planned: Maintain an ongoing engagement or relationship with AWS Support or an APN Partner.

REL 2. How are you planning your network topology on AWS?

IP subnet allocation: Individual Amazon VPC IP address ranges should be large enough to accommodate an application's requirements, including future expansion and the allocation of IP addresses to subnets across Availability Zones.

Non-overlapping private IP ranges: The IP address ranges and subnets in your virtual private cloud should not overlap each other, other cloud environments, or your on-premises environments.

Highly available connectivity to the system: Use highly available load balancing and/or proxies, DNS-based solutions, AWS Marketplace appliances, etc.

Highly available connectivity to AWS: Use multiple AWS Direct Connect (DX) circuits, multiple VPN tunnels, or AWS Marketplace appliances.
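Both IP-planning practices can be checked with Python's standard `ipaddress` module. The sketch below detects overlapping ranges and carves one subnet per Availability Zone out of a VPC range; the CIDR blocks and AZ names are placeholders. It also accounts for the five addresses AWS reserves in every VPC subnet (the first four and the last).

```python
import ipaddress
from itertools import combinations

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last address of each VPC subnet


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks that overlap (VPCs, on-premises ranges, other clouds)."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


def allocate_subnets(vpc_cidr: str, azs, new_prefix: int) -> dict:
    """Carve one equally sized subnet per Availability Zone from the start of the VPC range."""
    candidates = list(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix))
    if len(candidates) < len(azs):
        raise ValueError("VPC range too small for the requested subnets")
    return {az: str(net) for az, net in zip(azs, candidates)}


def usable_addresses(cidr: str) -> int:
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


print(find_overlaps(["10.0.0.0/16", "10.1.0.0/16", "10.0.128.0/17", "192.168.0.0/24"]))
print(allocate_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"], 20))
print(usable_addresses("10.0.0.0/20"))
```

Allocating /20 subnets from a /16 leaves room for later subnet tiers or additional Availability Zones, which is the "future expansion" the practice asks for.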

REL 1. How are you managing AWS Service Limits for your account?

Be aware of fixed service limits: Be aware of service limits that cannot be changed, and architect around them.

Set up automated monitoring: Implement tools (e.g., built with the AWS SDKs) to alert you when thresholds are being approached.

Monitor and manage limits: Evaluate your potential usage on AWS, increase your regional limits appropriately, and allow for planned growth in usage.
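Automated limit monitoring can start as a periodic job that compares usage with the applicable limits. The Python sketch below assumes usage and limit values have already been fetched (for example, from AWS Trusted Advisor or Service Quotas); the limit names, numbers, and 80% threshold are hypothetical.

```python
def limits_near_threshold(usage: dict, limits: dict, threshold: float = 0.8) -> dict:
    """Return {limit_name: utilization} for limits at or above the alert threshold."""
    alerts = {}
    for name, limit in limits.items():
        utilization = usage.get(name, 0) / limit
        if utilization >= threshold:
            alerts[name] = round(utilization, 2)
    return alerts


# Hypothetical per-Region limits and current usage.
limits = {"running-instances": 20, "elastic-ips": 5, "vpcs": 5}
usage = {"running-instances": 17, "elastic-ips": 5, "vpcs": 2}
print(limits_near_threshold(usage, limits))
```

Alerting well before 100% leaves time to request an increase, which matters most in a DR Region that must absorb a failover.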

The ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

General Design Principles

Automate to make architectural experimentation easier

Automation allows you to create and replicate your systems at low cost (no manual effort). You can track changes to your automation, audit the impact, and revert to previous parameters when necessary.

Allow for evolutionary architectures

In a traditional environment, architectural decisions are often implemented as a static, one-time event, with a few major versions of a system during its lifetime. As a business and its context continue to change, these initial decisions may hinder the system’s ability to deliver changing business requirements. In the cloud, the capability to automate and test on demand lowers the risk of impact from design changes. This allows systems to evolve over time so that businesses can take advantage of new innovations as a standard practice.

Test systems at production scale

In a traditional, non-cloud environment, it is usually cost-prohibitive to create a duplicate environment solely for testing. Consequently, most test environments are not tested at live levels of production demand. In the cloud, you can create a duplicate environment on demand, complete your testing, and then decommission the resources. Because you only pay for the test environment when it is running, you can simulate your live environment for a fraction of the cost of testing on premises.

Lower the risk of architecture change

Because you can automate the creation of test environments that emulate your production configurations, you can carry out testing easily. You can also remove the test serialization that occurs in on-premises environments where teams have to queue to use the test resources.

Stop guessing your capacity needs

Eliminate guessing your infrastructure capacity needs. When you make a capacity decision before you deploy a system, you might end up sitting on expensive idle resources or dealing with the performance implications of limited capacity. With cloud computing, these problems can go away. You can use as much or as little capacity as you need, and scale up and down automatically.

Security Pillar

SEC 10 How are you capturing and analyzing AWS logs?

Operating system or third-party application logs.

Other AWS service-specific log sources.

Amazon CloudWatch Logs.

Amazon S3 bucket logs.

Amazon Virtual Private Cloud (VPC) Flow Logs.

Elastic Load Balancing (ELB) logs.

AWS CloudTrail.
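As one example of analyzing AWS CloudTrail logs, the Python sketch below scans a CloudTrail log file for activity by the root user, which also supports SEC 3. The filter mirrors a commonly used CloudWatch Logs metric filter (root identity, not invoked by an AWS service, not an `AwsServiceEvent`); the sample records are synthetic.

```python
import json


def root_account_activity(log_file_json: str):
    """Return (eventTime, eventName, sourceIPAddress) for direct root-user activity."""
    records = json.loads(log_file_json).get("Records", [])
    hits = []
    for record in records:
        identity = record.get("userIdentity", {})
        if (identity.get("type") == "Root"
                and "invokedBy" not in identity                   # skip calls made by AWS services
                and record.get("eventType") != "AwsServiceEvent"):
            hits.append((record.get("eventTime"), record.get("eventName"), record.get("sourceIPAddress")))
    return hits


# Synthetic two-record log file using the same top-level "Records" layout as CloudTrail log files.
sample = json.dumps({"Records": [
    {"eventTime": "2024-03-01T09:15:00Z", "eventName": "ConsoleLogin", "eventType": "AwsConsoleSignIn",
     "sourceIPAddress": "203.0.113.10", "userIdentity": {"type": "Root"}},
    {"eventTime": "2024-03-01T09:20:00Z", "eventName": "DescribeInstances", "eventType": "AwsApiCall",
     "sourceIPAddress": "198.51.100.7", "userIdentity": {"type": "IAMUser", "userName": "alice"}},
]})
print(root_account_activity(sample))
```

Any hit is worth an alert, since the root credentials should be used only for a minimal set of activities.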

SEC 9 How are you protecting the integrity of the operating system on your Amazon EC2 instances?

Use of a solution from the AWS Marketplace or an APN Partner

Use of a custom AMI or configuration management tools (e.g., Puppet or Chef) that are secured by default.

Host-based intrusion detection controls are used for EC2 instances.

File integrity controls are used for EC2 instances.
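The core of a file integrity control is recording cryptographic hashes of important files and alerting when they change. The minimal Python sketch below shows that comparison using SHA-256; production tools (for example, a host-based IDS from the AWS Marketplace) add tamper-resistant baseline storage, scheduling, and alerting. The file paths and contents here are hypothetical.

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def snapshot(paths) -> dict:
    """Baseline: map each file path to the SHA-256 of its contents."""
    return {str(p): digest(Path(p).read_bytes()) for p in paths}


def compare(baseline: dict, current: dict) -> dict:
    """Report files that changed, disappeared, or appeared since the baseline."""
    return {
        "modified": sorted(p for p in baseline if p in current and baseline[p] != current[p]),
        "missing": sorted(p for p in baseline if p not in current),
        "added": sorted(p for p in current if p not in baseline),
    }


baseline = {"/etc/passwd": digest(b"root:x:0:0"), "/etc/hosts": digest(b"127.0.0.1 localhost")}
current = {"/etc/passwd": digest(b"root:x:0:0\nmallory:x:0:0"), "/etc/ssh/sshd_config": digest(b"PermitRootLogin no")}
print(compare(baseline, current))
```

In practice the baseline would be taken with `snapshot()` right after the hardened AMI is built, so later drift shows up as modifications.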

SEC 8 How are you enforcing AWS service level protection?

Service-specific requirements are defined and used.

Resource requirements are defined for sensitive API calls, such as requiring multi-factor authentication (MFA) and encryption.

Periodic auditing of permissions.

Separation of duties.

Credentials are configured with least privilege.

SEC 7 How are you enforcing network and host-level boundary protection?

AWS Trusted Advisor checks are regularly reviewed.

Security testing is performed regularly.

Bastion host technique is used to manage the instances.

Private connectivity to a VPC is used (e.g., VPN, AWS Direct Connect, VPC peering, etc.)

Service-specific access controls are used (e.g., bucket policies).

Subnets and network ACLs are used appropriately.

Host-based firewalls with minimal authorizations are used.

Trusted VPC access is via a private mechanism (e.g., Virtual Private Network (VPN), IPsec tunnel, AWS Direct Connect, AWS Marketplace solution, etc.).

The system runs in one or more VPCs.

Security groups with minimal authorizations are used to enforce role-based access.
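A periodic check that security groups really use minimal authorizations can be sketched in Python as below. It inspects inbound rules in the dictionary shape that EC2's `DescribeSecurityGroups` returns (`IpPermissions`) and flags anything reachable from the internet other than an allowed set of ports. The allowed set (80 and 443) is an assumption to replace with your own policy.

```python
ALLOWED_PUBLIC_PORTS = {80, 443}  # assumption: only web traffic may come from the internet


def world_open(permission: dict) -> bool:
    ipv4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in permission.get("IpRanges", []))
    ipv6 = any(r.get("CidrIpv6") == "::/0" for r in permission.get("Ipv6Ranges", []))
    return ipv4 or ipv6


def overly_permissive(ip_permissions, allowed=ALLOWED_PUBLIC_PORTS):
    """Flag inbound rules that are open to the internet beyond the allowed ports."""
    findings = []
    for perm in ip_permissions:
        if not world_open(perm):
            continue
        if perm.get("IpProtocol") == "-1":   # "-1" means all protocols and ports
            findings.append("all traffic open to the internet")
            continue
        low, high = perm["FromPort"], perm["ToPort"]
        if not set(range(low, high + 1)) <= allowed:
            findings.append(f"{perm['IpProtocol']} {low}-{high} open to the internet")
    return findings


rules = [  # same shape as the IpPermissions list in a DescribeSecurityGroups response
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306, "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    {"IpProtocol": "-1", "Ipv6Ranges": [{"CidrIpv6": "::/0"}]},
]
print(overly_permissive(rules))
```

SSH open to the world is a typical finding; with the bastion host technique listed above, port 22 only needs to be reachable from the bastion's security group.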

SEC 6 How are you managing keys and credentials?

AWS server-side techniques are used with AWS managed keys (e.g., Amazon S3 SSE).

An AWS Marketplace solution is being used (e.g., SafeNet, Trend Micro).

AWS CloudHSM is used.

An appropriate key and credential rotation policy is being used.
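Checking that an appropriate rotation policy is followed usually means comparing each credential's last-rotation date with a maximum age. The Python sketch below does that; the 90-day limit and the sample dates are assumptions, and in practice the dates could come from the `access_key_1_last_rotated` column of the IAM credential report.

```python
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)  # assumption: substitute your own rotation policy


def keys_due_for_rotation(last_rotated: dict, now: datetime, max_age: timedelta = MAX_KEY_AGE) -> list:
    """Return the names whose credential is older than max_age, sorted alphabetically."""
    return sorted(name for name, rotated in last_rotated.items() if now - rotated > max_age)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
last_rotated = {  # hypothetical values
    "alice": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "bob": datetime(2024, 5, 1, tzinfo=timezone.utc),
    "ci-deploy": datetime(2024, 3, 3, tzinfo=timezone.utc),  # exactly 90 days old, so not yet due
}
print(keys_due_for_rotation(last_rotated, now))
```

Using timezone-aware timestamps throughout avoids errors when comparing against report data recorded in UTC.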

SEC 5 How are you limiting automated access to AWS resources (e.g., applications, scripts, and/or third-party tools or services)?

OS-specific controls are used for EC2 instances.

IAM user credentials are used, but are not hardcoded into scripts and applications.

IAM roles for Amazon EC2
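To verify that credentials are not hardcoded into scripts and applications, a pre-commit or CI check can scan source files for access key IDs. The Python sketch below uses a regular expression for the documented `AKIA` (long-term) and `ASIA` (temporary) key ID prefixes. It does not detect secret access keys, and the sample script uses AWS's published example key ID.

```python
import re

# Access key IDs: "AKIA" (long-term) or "ASIA" (temporary) followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_hardcoded_keys(source: str):
    """Return (line_number, key_id) for every access key ID literal in the text."""
    return [
        (lineno, match.group())
        for lineno, line in enumerate(source.splitlines(), start=1)
        for match in ACCESS_KEY_ID.finditer(line)
    ]


script = 'import boto3\nKEY_ID = "AKIAIOSFODNN7EXAMPLE"\nregion = "us-east-1"\n'
print(find_hardcoded_keys(script))
```

On EC2, code that finds no keys to hardcode is the goal: IAM roles for Amazon EC2 supply temporary credentials automatically.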

SEC 4 How are you defining roles and responsibilities of system users to control human access to the AWS Management Console and API?

Users, groups, and roles are clearly defined and granted only the minimum privileges needed to accomplish business requirements

A solution from the AWS Marketplace (e.g., Okta, Ping Identity) or from an APN Partner

IAM roles for cross-account access

Employee life-cycle policies are defined and enforced

AWS Security Token Service (STS)

Web Identity Federation

SAML integration

IAM users and groups

SEC 3 How are you protecting access to and use of the AWS root account credentials?

An AWS Marketplace solution is being used.

An MFA hardware device is associated with the AWS root account.

The AWS root account credentials are used only for minimal required activities.

SEC 2 How are you encrypting and protecting your data in transit?

An AWS Marketplace solution is being used.

Private connectivity (e.g., AWS Direct Connect).

VPN-based solution.

SSL or equivalent is used for communication.

SSL-enabled AWS APIs are used appropriately.
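One way to enforce SSL or equivalent for data sent to Amazon S3 is a bucket policy that denies non-TLS requests through the `aws:SecureTransport` condition key. The Python sketch below generates such a policy; the bucket name is a placeholder.

```python
import json


def require_tls_bucket_policy(bucket_name: str) -> dict:
    """S3 bucket policy that denies any request not made over TLS (aws:SecureTransport is false)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [bucket_arn, f"{bucket_arn}/*"],   # the bucket itself and every object in it
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


print(json.dumps(require_tls_bucket_policy("example-bucket"), indent=2))
```

The JSON string could then be applied with the AWS CLI or an SDK, for example boto3's `put_bucket_policy(Bucket=..., Policy=...)`. An explicit Deny is used because it overrides any Allow granted elsewhere.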

SEC 1 How are you encrypting and protecting your data at rest?

Best practices

A solution from the AWS Marketplace or from an APN Partner.

Data at rest is encrypted using client-side techniques.

Data at rest is encrypted using AWS service-specific controls (e.g., Amazon EBS encrypted volumes, Amazon S3 SSE, and Amazon Relational Database Service (RDS) Transparent Data Encryption (TDE)).
