AWS Well-Architected Review Automation: Building Your Own Assessment Tools

Understanding the AWS Well-Architected Framework (WAF)

The AWS Well-Architected Framework (WAF) is a set of best practices that guide customers in building secure, high-performing, resilient, efficient, and sustainable infrastructure for their applications. The WAF consists of six pillars:

Operational Excellence: Run and monitor systems, and continuously improve processes and procedures.
Security: Protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
Reliability: Ensure a workload performs its intended function correctly and consistently when it’s expected to.
Performance Efficiency: Use computing resources efficiently and maintain efficiency as demand changes.
Cost Optimization: Avoid unnecessary costs and achieve business outcomes at the lowest price point.
Sustainability: Minimize the environmental impacts of running cloud workloads.

Rationale for Building Custom Automation Tools

While AWS provides the Well-Architected Tool, custom automation extends its capabilities, allowing for deeper integration and organization-specific checks. By automating WAF assessments, you can:

Scale & Consistency: Ensure consistent application of best practices across numerous accounts and workloads.
Continuous Compliance & Feedback: Provide real-time insights and track progress over time.
Customization & Specificity: Enforce internal compliance standards or specific interpretations of WAF principles that go beyond generic AWS checks.

Core Components of a Custom Assessment Tool

Data Collection/Discovery: Identify and retrieve configuration and runtime data for AWS resources using technologies like:
- AWS Config
- AWS APIs/SDKs (e.g., Boto3 for Python)
- CloudFormation/CDK/Terraform State Files
- CloudTrail
Assessment Engine/Rule Logic: Evaluate collected data against WAF principles and custom rules using Policy-as-Code frameworks, AWS Lambda functions, AWS Step Functions workflows, and custom code.
Reporting & Visualization: Present findings clearly, prioritize issues, and track progress over time using technologies like:
- Amazon CloudWatch Dashboards
- Amazon Athena & Amazon QuickSight
- Custom Web UI
Notifications & Alerting: Inform relevant teams or individuals about critical findings using technologies like:
- Amazon SNS (email, SMS)
- AWS Chatbot (Slack, Chime)
- Custom integrations (JIRA, ServiceNow API calls)
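The four components above can be wired together in a few dozen lines. What follows is a minimal sketch rather than a reference implementation: the Finding type, the two rules, and the hard-coded inventory are hypothetical stand-ins for data that would normally come from AWS Config or Boto3 calls.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Finding:
    resource_id: str
    pillar: str
    severity: str  # "HIGH", "MEDIUM", or "LOW"
    message: str


# A rule takes one resource description and returns a Finding or None.
Rule = Callable[[Dict], Optional[Finding]]


def collect_resources() -> List[Dict]:
    """Data collection stub. A real tool would query AWS Config or Boto3 here."""
    return [
        {"id": "bucket-a", "type": "s3", "encrypted": True, "versioning": False},
        {"id": "bucket-b", "type": "s3", "encrypted": False, "versioning": True},
    ]


def rule_encryption(resource: Dict) -> Optional[Finding]:
    if resource["type"] == "s3" and not resource["encrypted"]:
        return Finding(resource["id"], "Security", "HIGH", "Default encryption is disabled")
    return None


def rule_versioning(resource: Dict) -> Optional[Finding]:
    if resource["type"] == "s3" and not resource["versioning"]:
        return Finding(resource["id"], "Reliability", "MEDIUM", "Versioning is disabled")
    return None


def assess(resources: List[Dict], rules: List[Rule]) -> List[Finding]:
    """Assessment engine: apply every rule to every resource."""
    findings = []
    for resource in resources:
        for rule in rules:
            finding = rule(resource)
            if finding is not None:
                findings.append(finding)
    return findings


def summarize(findings: List[Finding]) -> Dict[str, int]:
    """Reporting: count findings per pillar."""
    return dict(Counter(f.pillar for f in findings))


def needs_alert(findings: List[Finding]) -> List[Finding]:
    """Notification filter: only HIGH findings would be passed on (e.g. to SNS)."""
    return [f for f in findings if f.severity == "HIGH"]


findings = assess(collect_resources(), [rule_encryption, rule_versioning])
report = summarize(findings)
alerts = needs_alert(findings)
```

Keeping each stage as a plain function makes it easy to swap the stub collector for real API calls later without touching the rules or the reporting code.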

Key AWS Services & Technologies Involved

Compute: AWS Lambda, AWS Fargate
Data Storage/Databases: Amazon S3, Amazon DynamoDB, AWS Config
Orchestration: AWS Step Functions
Messaging & Events: Amazon SQS, Amazon SNS, Amazon EventBridge (formerly CloudWatch Events)
Monitoring & Logging: Amazon CloudWatch, AWS CloudTrail
Identity & Access Management: AWS IAM
Infrastructure as Code (IaC): AWS CloudFormation, AWS CDK, HashiCorp Terraform

Frameworks & Current Trends

Compliance-as-Code: Define compliance rules in code, making them versionable, testable, and deployable.
Policy-as-Code: Similar to Compliance-as-Code, but broader, covering security, operational, and cost policies.
Event-Driven Architecture: Use Amazon EventBridge (formerly CloudWatch Events) to trigger assessments when configurations change or on a schedule.
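To make these ideas concrete, here is a rough sketch of rules written as code and dispatched by an incoming change event. It assumes an event shaped like the AWS Config configuration item change events that EventBridge delivers, trimmed to a handful of fields. The rule decorator, the registry, and the sample event are illustrative, and the exact field layout should be checked against a real event before relying on it.

```python
from typing import Callable, Dict, List

# Registry mapping a resource type to the rules that apply to it.
RULES: Dict[str, List[Callable[[Dict], List[str]]]] = {}


def rule(resource_type: str):
    """Decorator that registers a check for one resource type."""
    def register(func):
        RULES.setdefault(resource_type, []).append(func)
        return func
    return register


@rule("AWS::S3::Bucket")
def bucket_has_owner_tag(item: Dict) -> List[str]:
    if "owner" not in item.get("tags", {}):
        return ["missing required tag 'owner'"]
    return []


@rule("AWS::EC2::Volume")
def volume_is_encrypted(item: Dict) -> List[str]:
    # Simplified: assumes the item carries an "encrypted" flag in "configuration".
    if not item.get("configuration", {}).get("encrypted", False):
        return ["EBS volume is not encrypted"]
    return []


def handle_event(event: Dict) -> Dict:
    """Run the registered rules for the resource named in a change event."""
    item = event["detail"]["configurationItem"]
    violations = []
    for check in RULES.get(item["resourceType"], []):
        violations.extend(check(item))
    return {"resourceId": item["resourceId"], "violations": violations}


# Hand-written sample event, trimmed to the fields the rules read.
sample_event = {
    "source": "aws.config",
    "detail-type": "Config Configuration Item Change",
    "detail": {
        "configurationItem": {
            "resourceType": "AWS::S3::Bucket",
            "resourceId": "example-bucket",
            "tags": {"cost-center": "1234"},
        }
    },
}

result = handle_event(sample_event)
```

Because each rule is an ordinary function, it can be unit tested, reviewed in a pull request, and versioned like any other code. Resource types with no registered rules simply produce an empty violation list.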

Practical Code Examples

Example 1: Using AWS Lambda and Boto3 for WAF Checks

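import boto3
from botocore.exceptions import ClientError

# Create the client once so warm Lambda invocations can reuse it
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the S3 bucket name from the S3 event notification
    bucket_name = event['Records'][0]['s3']['bucket']['name']

    # Ask S3 whether the bucket policy makes the bucket public
    try:
        response = s3.get_bucket_policy_status(Bucket=bucket_name)
        is_public = response['PolicyStatus']['IsPublic']
    except ClientError as err:
        # A bucket with no policy cannot be public through its policy
        if err.response['Error']['Code'] != 'NoSuchBucketPolicy':
            raise
        is_public = False

    if is_public:
        return {'statusCode': 400, 'statusMessage': 'Bucket is publicly accessible'}
    return {'statusCode': 200, 'statusMessage': 'Bucket is not publicly accessible'}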

Example 2: Using AWS Step Functions and AWS Lambda for WAF Checks

# Step function definition
# AssessmentRole and LambdaFunction are assumed to be defined elsewhere in this template.
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  AssessmentStepFunction:
    Type: 'AWS::StepFunctions::StateMachine'
    Properties:
      RoleArn: !GetAtt 'AssessmentRole.Arn'
      StateMachineType: 'STANDARD'
      DefinitionSubstitutions:
        CheckFunctionArn: !GetAtt 'LambdaFunction.Arn'
      Definition:
        StartAt: 'CheckS3BucketPolicy'
        TimeoutSeconds: 3600
        States:
          CheckS3BucketPolicy:
            Type: 'Task'
            Resource: '${CheckFunctionArn}'
            Next: 'EvaluateResult'

          EvaluateResult:
            Type: 'Choice'
            Choices:
              - Variable: '$.statusCode'
                NumericEquals: 200
                Next: 'BucketNotPublic'
              - Variable: '$.statusCode'
                NumericEquals: 400
                Next: 'BucketIsPublic'

          BucketNotPublic:
            Type: 'Pass'
            End: true

          BucketIsPublic:
            Type: 'Fail'
            Error: 'BucketIsPublic'
            Cause: 'The bucket is publicly accessible'

Outputs:
  AssessmentRoleArn:
    Value: !GetAtt 'AssessmentRole.Arn'

Real-World Example

A large e-commerce company uses AWS to host its web application. The company has a complex tagging system for cost allocation, and it wants to ensure that all S3 buckets have the correct tags applied. A custom WAF assessment tool is built using AWS Lambda and Boto3 to scan S3 buckets and check their tags against the desired configuration.
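A tag check along these lines might look like the sketch below. The REQUIRED_TAGS policy is made up for illustration. The S3 client is passed in as a parameter so the logic can be exercised without AWS access, and the only S3 calls used are list_buckets and get_bucket_tagging.

```python
from typing import Dict, List

# Hypothetical tagging policy: required keys and, optionally, allowed values.
REQUIRED_TAGS: Dict[str, List[str]] = {
    "cost-center": [],  # any non-empty value is accepted
    "environment": ["dev", "staging", "prod"],
}


def fetch_tag_set(s3_client, bucket: str) -> List[Dict[str, str]]:
    """Return the bucket's TagSet, or an empty list if it has no tags."""
    try:
        return s3_client.get_bucket_tagging(Bucket=bucket)["TagSet"]
    except Exception as err:
        # boto3 raises a ClientError with code NoSuchTagSet for untagged buckets.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "NoSuchTagSet":
            return []
        raise


def tag_violations(tag_set: List[Dict[str, str]], required: Dict[str, List[str]]) -> List[str]:
    """Compare a TagSet ([{"Key": ..., "Value": ...}]) against the policy."""
    tags = {t["Key"]: t["Value"] for t in tag_set}
    problems = []
    for key, allowed in required.items():
        value = tags.get(key, "")
        if not value:
            problems.append(f"missing tag '{key}'")
        elif allowed and value not in allowed:
            problems.append(f"tag '{key}' has unexpected value '{value}'")
    return problems


def scan_buckets(s3_client, required: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each non-compliant bucket name to its list of tag problems."""
    report = {}
    for bucket in s3_client.list_buckets()["Buckets"]:
        problems = tag_violations(fetch_tag_set(s3_client, bucket["Name"]), required)
        if problems:
            report[bucket["Name"]] = problems
    return report
```

In a Lambda function this would be called as scan_buckets(boto3.client("s3"), REQUIRED_TAGS). Accounts with many buckets may need pagination on list_buckets, which this sketch leaves out.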

Best Practices

Start small & iterate: Begin with a few high-impact WAF areas and expand gradually.
Version Control: Store all assessment code, rules, and configurations in Git.
Modularity: Design the tool with modular components for easy updates and maintenance.
Clear Documentation: Document custom rules, their rationale, and remediation steps.
Regular Review & Refinement: Periodically review the rules and the tool’s effectiveness.

Troubleshooting

Common issue: False positives due to overly aggressive rules
- Solution: Adjust rule logic or add exceptions
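One way to add exceptions without weakening the rule itself is a reviewed suppression list with expiry dates. The sketch below assumes findings are dictionaries with rule_id and resource_id keys. The EXCEPTIONS table, the rule IDs, and the bucket names are all hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical exception list: (rule_id, resource_id) -> expiry date and reason.
EXCEPTIONS: Dict[Tuple[str, str], Dict] = {
    ("s3-public-access", "public-website-assets"): {
        "expires": date(2030, 1, 1),
        "reason": "Bucket intentionally serves static website content",
    },
    ("s3-public-access", "legacy-downloads"): {
        "expires": date(2020, 1, 1),
        "reason": "Temporary waiver during migration",
    },
}


def is_suppressed(rule_id: str, resource_id: str, today: date) -> bool:
    """A finding is suppressed only while its exception has not expired."""
    entry = EXCEPTIONS.get((rule_id, resource_id))
    return entry is not None and today <= entry["expires"]


def apply_exceptions(findings: List[Dict], today: Optional[date] = None) -> List[Dict]:
    """Drop findings covered by an active exception; keep everything else."""
    today = today or date.today()
    return [
        f for f in findings
        if not is_suppressed(f["rule_id"], f["resource_id"], today)
    ]


findings = [
    {"rule_id": "s3-public-access", "resource_id": "public-website-assets"},
    {"rule_id": "s3-public-access", "resource_id": "legacy-downloads"},
    {"rule_id": "s3-public-access", "resource_id": "customer-data"},
]
remaining = apply_exceptions(findings, today=date(2025, 6, 1))
```

Here the website bucket is suppressed, while the expired waiver and the unlisted bucket are still reported. Expiry dates force each exception to be reviewed again instead of hiding a finding forever.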

This post outlines how to build custom WAF assessment tools using AWS services and technologies. By adapting these examples, you can automate WAF reviews and check workloads continuously against your organization’s standards.
