AWS Solutions Architect: Key Lessons from Certification

The practice exam results email arrived on a Sunday morning: 58%. Passing threshold: 72%. I’d been building on AWS for four years — EC2, Lambda, RDS, the usual suspects. I thought the Solutions Architect Associate exam would validate what I already knew. Instead, it exposed a gap between “I can make this work” and “I can design this correctly according to AWS’s worldview.”

Three weeks of focused study later, I passed SAA-C03 with 82%. The knowledge I gained wasn’t a list of services — it was a design vocabulary. Tradeoffs I could articulate in architecture reviews. Patterns I recognized in production incidents before they became outages. A structured way to evaluate every infrastructure decision against reliability, security, performance, cost, and operational excellence.

Whether you’re studying for the cert or just want the lessons without the exam fee, here’s what actually mattered.

The Well-Architected Framework: Your Design Lens

AWS organizes good architecture around six pillars. The exam tests all of them, but more importantly, they became how I evaluate every design decision:

Operational Excellence

Run and monitor systems to deliver business value, and continuously improve supporting processes and procedures.

What this means in practice: Infrastructure as Code (not clicking in the console), automated deployments, runbooks for common incidents, postmortems without blame. The exam loves CloudFormation, Systems Manager, and CloudWatch. Production loves them too — the team that clicks through the AWS Console to provision resources is the team that can’t reproduce their infrastructure after an outage.

Security

Protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.

Exam favorites: IAM least privilege, encryption at rest (KMS) and in transit (TLS), VPC security groups vs. NACLs, AWS WAF, Secrets Manager, GuardDuty, Security Hub.

Production lesson: The exam emphasizes defense in depth — multiple layers of security. One security group rule is not a security strategy. I redesigned our staging environment after studying this pillar and found three services with 0.0.0.0/0 ingress rules that “had always been there.”
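The kind of audit described above can be sketched in a few lines of Python. This is a minimal sketch, not the script I actually ran: it assumes the input is a list of dictionaries shaped like the `SecurityGroups` list returned by EC2's DescribeSecurityGroups call, and the function name `find_open_ingress` and the sample group IDs are made up for illustration.

```python
from typing import Any

# CIDR blocks that mean "anyone on the internet" (IPv4 and IPv6).
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(security_groups: list[dict[str, Any]]) -> list[tuple]:
    """Return (group_id, protocol, from_port, to_port) for each world-open ingress rule."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            if any(cidr in OPEN_CIDRS for cidr in cidrs):
                findings.append((
                    group["GroupId"],
                    perm.get("IpProtocol"),
                    perm.get("FromPort"),
                    perm.get("ToPort"),
                ))
    return findings


# Hypothetical sample data, not real resources.
sample_groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
]

for finding in find_open_ingress(sample_groups):
    print(finding)
```

An open rule is not automatically wrong (a public HTTPS listener needs one), so the output is a review list, not a verdict.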

See the AWS Well-Architected Framework for the full pillar descriptions and best practices.

Reliability

Recover from infrastructure or service disruptions, dynamically acquire computing resources, and mitigate disruptions like misconfigurations or transient network issues.

The patterns that appear constantly:

Multi-AZ deployments for databases and critical services
Auto Scaling groups with health checks
Elastic Load Balancing across availability zones
S3 cross-region replication for disaster recovery
Route 53 health checks with failover routing

The production incident that drove this home: A single-AZ RDS instance failed during an AWS maintenance window we hadn’t noticed in the notification email. Downtime: 47 minutes. After the exam prep, I knew Multi-AZ was the answer — not because I’d never heard of it, but because I understood the RTO/RPO tradeoffs well enough to justify the cost to leadership.

Performance Efficiency

Use computing resources efficiently to meet system requirements, and maintain that efficiency as demand changes and technologies evolve.

Key concepts:

Right-sizing instances (not always the biggest)
Caching layers (ElastiCache, CloudFront, DAX for DynamoDB)
Serverless for variable workloads (Lambda, Fargate)
Selecting appropriate database types (RDS vs. DynamoDB vs. ElastiCache vs. S3)
CDN for static and dynamic content acceleration

The exam trap: choosing the most powerful service when a simpler one fits. CloudFront + S3 beats EC2 serving static files. DynamoDB beats RDS for key-value access patterns at scale. Lambda beats EC2 for sporadic, short-duration tasks.

Cost Optimization

Avoid or eliminate unneeded cost or suboptimal resources.

Exam favorites:

Reserved Instances and Savings Plans for predictable workloads
Spot Instances for fault-tolerant, flexible workloads
S3 lifecycle policies (Standard → IA → Glacier)
Right-sizing with Compute Optimizer
Data transfer cost awareness (cross-AZ, cross-region, internet egress)

The real-world win: Studying cost optimization led me to audit our S3 buckets. $340/month in Standard storage for logs nobody had queried in eighteen months. Lifecycle policy to Glacier: $28/month. The exam pays for itself in one audit.
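For reference, a lifecycle rule of this kind is just a small configuration document. The sketch below builds one as a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The `logs/` prefix, the 30 and 90 day thresholds, and the helper name `build_log_lifecycle` are illustrative assumptions, not the values from my audit.

```python
import json


def build_log_lifecycle(prefix: str, ia_after_days: int, glacier_after_days: int) -> dict:
    """Build a lifecycle configuration: Standard -> Standard-IA -> Glacier."""
    # As I understand the S3 documentation, objects must sit in Standard for
    # at least 30 days before they can transition to Standard-IA.
    if ia_after_days < 30:
        raise ValueError("Standard-IA transition needs at least 30 days")
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [{
            "ID": f"archive-{prefix.strip('/')}",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            ],
        }]
    }


config = build_log_lifecycle("logs/", ia_after_days=30, glacier_after_days=90)
print(json.dumps(config, indent=2))

# Applying it would look roughly like this (not executed here; bucket name is hypothetical):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-log-bucket", LifecycleConfiguration=config)
```

Keeping the rule in code rather than in the console also means the policy survives a bucket rebuild.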

Sustainability

Minimize the environmental impacts of running cloud workloads. The newest pillar, lighter on the exam, but increasingly relevant.

Practical actions: Right-size to avoid idle compute, use Graviton instances (better performance per watt), schedule non-production environments to shut down overnight, use managed services (AWS optimizes utilization better than your dedicated server).

Core Services: Know Them Deep, Not Wide

The exam covers hundreds of services. Production uses maybe thirty regularly. Focus depth on the core:

Compute

Service	Use When
EC2	Full control, persistent workloads, specific OS/software needs
Lambda	Event-driven, short-duration, variable traffic
ECS/Fargate	Containerized apps without managing servers
Elastic Beanstalk	Quick deployment, managed platform (exam loves this for “simplicity”)

The question pattern: “Company needs X, Y, Z” — match requirements to service. If they say “no infrastructure management” → Lambda or Fargate. If they say “legacy application, full OS access” → EC2.

Storage

Service	Use When
S3	Object storage, static assets, data lakes, backups
EBS	Block storage for EC2 instances (databases, boot volumes)
EFS	Shared file storage across multiple EC2 instances
FSx	Specialized file systems (Windows, Lustre, NetApp, OpenZFS)

Exam trap: using EBS for shared access (it’s single-instance). Using S3 for database storage (it’s object, not block). Know the access patterns.

Database

Service	Use When
RDS	Relational data, SQL, ACID transactions, complex queries
Aurora	MySQL/PostgreSQL-compatible RDS engine with higher throughput and scalability, at higher cost
DynamoDB	Key-value/document, massive scale, predictable single-digit ms latency
ElastiCache	Caching (Redis/Memcached), session stores
DocumentDB	MongoDB-compatible document database

The mistake I made twice on practice exams: choosing DynamoDB for relational queries with complex joins. DynamoDB is incredible for access patterns you design for. It’s terrible for ad-hoc queries and joins.

Networking

VPC — your private network. Subnets (public/private), route tables, internet gateways, NAT gateways
CloudFront — CDN. Cache content at edge locations globally
Route 53 — DNS. Routing policies: simple, weighted, latency-based, failover, geolocation
API Gateway — managed API layer for Lambda, HTTP, WebSocket APIs
Direct Connect / VPN — hybrid cloud connectivity

Know the difference: security groups (stateful, instance-level) vs. NACLs (stateless, subnet-level). The exam tests this repeatedly.
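The stateful/stateless distinction is easier to remember once you see it behave. Below is a toy model in Python, not how AWS implements either feature: `StatefulFilter` stands in for a security group, `StatelessFilter` for a custom NACL, and all class names, IPs, and ports are hypothetical. It shows only one thing: a stateful filter lets the reply to an allowed request out automatically, while a stateless one needs an explicit outbound rule covering the client's ephemeral port.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reply(self) -> "Packet":
        """The packet travelling back along the same connection."""
        return Packet(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


@dataclass
class StatefulFilter:
    """Security-group-like: remembers allowed connections."""
    inbound_ports: set
    outbound_ports: set = field(default_factory=set)
    connections: set = field(default_factory=set)

    def inbound(self, pkt: Packet) -> bool:
        if pkt.dst_port in self.inbound_ports:
            self.connections.add(pkt)  # track the connection
            return True
        return False

    def outbound(self, pkt: Packet) -> bool:
        # A reply to a tracked connection passes even with no outbound rule.
        return pkt.reply() in self.connections or pkt.dst_port in self.outbound_ports


@dataclass
class StatelessFilter:
    """NACL-like: each direction is judged only by its own rules."""
    inbound_ports: set
    outbound_port_ranges: list = field(default_factory=list)

    def inbound(self, pkt: Packet) -> bool:
        return pkt.dst_port in self.inbound_ports

    def outbound(self, pkt: Packet) -> bool:
        return any(lo <= pkt.dst_port <= hi for lo, hi in self.outbound_port_ranges)


request = Packet("203.0.113.7", 50123, "10.0.1.5", 443)
reply = request.reply()

sg = StatefulFilter(inbound_ports={443})
nacl = StatelessFilter(inbound_ports={443})
nacl_fixed = StatelessFilter(inbound_ports={443}, outbound_port_ranges=[(1024, 65535)])

print(sg.inbound(request), sg.outbound(reply))                   # True True
print(nacl.inbound(request), nacl.outbound(reply))               # True False
print(nacl_fixed.inbound(request), nacl_fixed.outbound(reply))   # True True
```

The middle line is the classic exam scenario: inbound 443 is allowed on the NACL, yet clients time out because the response cannot leave the subnet.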

Architecture Patterns the Exam Loves

High Availability

Users → Route 53 (failover routing)
     → CloudFront (edge caching)
     → ALB (multi-AZ, health checks)
     → Auto Scaling Group (EC2 across AZs)
     → RDS Multi-AZ (automatic failover)
     → ElastiCache (session cache, multi-AZ)

Every component has a failover story. No single point of failure. This diagram answers 30% of exam scenarios.

Disaster Recovery Strategies

Strategy	RTO	RPO	Cost	Implementation
Backup & Restore	Hours	Hours	$	S3 backups, restore when needed
Pilot Light	Tens of minutes	Minutes	$$	Minimal standby (DB replica, AMIs ready)
Warm Standby	Minutes	Seconds	$$$	Scaled-down full copy, scale up on failover
Multi-Site Active-Active	Near zero	Near zero	$$$$	Full production in two regions

The exam gives you RTO/RPO requirements and expects you to pick the cheapest strategy that meets them. “RPO of 1 hour, minimize cost” → Backup & Restore. “RTO of 5 minutes, business-critical” → Warm Standby, because Pilot Light typically needs tens of minutes to come up.
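The matching logic is mechanical enough to write down: filter to the strategies that meet both targets, then take the cheapest. The sketch below does that in Python. The numeric RTO/RPO figures are my own illustrative assumptions standing in for the table's "Hours", "Minutes", and "Near zero"; real values depend entirely on how a given strategy is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_minutes: float  # assumed achievable recovery time
    rpo_minutes: float  # assumed achievable data-loss window
    cost_rank: int      # 1 = cheapest, 4 = most expensive


# Illustrative numbers only, chosen to match the qualitative table.
STRATEGIES = [
    Strategy("Backup & Restore", rto_minutes=240, rpo_minutes=60, cost_rank=1),
    Strategy("Pilot Light", rto_minutes=30, rpo_minutes=5, cost_rank=2),
    Strategy("Warm Standby", rto_minutes=5, rpo_minutes=1, cost_rank=3),
    Strategy("Multi-Site Active-Active", rto_minutes=1, rpo_minutes=0.1, cost_rank=4),
]


def cheapest_strategy(required_rto: float, required_rpo: float) -> Optional[Strategy]:
    """Return the lowest-cost strategy meeting both targets, or None if none does."""
    candidates = [
        s for s in STRATEGIES
        if s.rto_minutes <= required_rto and s.rpo_minutes <= required_rpo
    ]
    return min(candidates, key=lambda s: s.cost_rank) if candidates else None


print(cheapest_strategy(required_rto=480, required_rpo=60).name)  # Backup & Restore
print(cheapest_strategy(required_rto=45, required_rpo=10).name)   # Pilot Light
print(cheapest_strategy(required_rto=5, required_rpo=15).name)    # Warm Standby
```

The useful habit is the order of operations: constraints first, cost second. Exam distractors are usually either too weak for the stated RTO/RPO or more expensive than necessary.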

Decoupling with Messaging

Producer → SQS Queue → Consumer (Lambda/EC2)
Producer → SNS Topic → Multiple Subscribers
Producer → Kinesis Stream → Real-time processors
Producer → EventBridge → Event-driven architecture

When to use which:

SQS — task queues, one consumer per message, ordering not critical (or FIFO if it is)
SNS — pub/sub, fan-out to multiple consumers, notifications
Kinesis — real-time streaming, analytics, ordered processing at scale
EventBridge — event bus, schema registry, cross-account events, scheduled rules

The exam scenario pattern: “Application produces events, multiple services need to react independently” → SNS. “Process orders one at a time in order” → SQS FIFO. “Analyze clickstream data in real-time” → Kinesis.
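The queue versus topic distinction comes down to who receives each message. Here is a small in-memory model in Python, purely illustrative: the `Queue` and `Topic` classes are hypothetical stand-ins, not the SQS or SNS APIs, and they ignore real-world details such as visibility timeouts, retries, and at-least-once delivery.

```python
from collections import deque
from typing import Callable, Optional


class Queue:
    """SQS-like: each message is handed to exactly one consumer, in order."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Whoever polls first gets the message; nobody else sees it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """SNS-like: every subscriber gets its own copy of each message."""

    def __init__(self) -> None:
        self._subscribers: list = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self._subscribers:
            handler(message)


# Work distribution: two workers split the orders between them.
orders = Queue()
orders.send("order-1")
orders.send("order-2")
worker_a = orders.receive()
worker_b = orders.receive()
print(worker_a, worker_b)  # order-1 order-2

# Fan-out: billing and shipping each react to the same event.
billing, shipping = [], []
events = Topic()
events.subscribe(billing.append)
events.subscribe(shipping.append)
events.publish("order-placed")
print(billing, shipping)  # ['order-placed'] ['order-placed']
```

When a scenario needs both behaviors (independent consumers that each process at their own pace), the usual answer combines them: one SNS topic fanning out to one SQS queue per consumer.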

Exam Strategy: How I Went From 58% to 82%

Reading Scenarios Correctly

Exam questions are 3-5 sentence scenarios. Read them twice. Identify:

Requirements — what must the solution do?
Constraints — cost, management overhead, compliance, existing infrastructure
Keywords — “least operational overhead” (managed services), “most cost-effective” (Spot, Reserved, S3 IA), “minimum latency” (CloudFront, ElastiCache, right region)

The trap: picking the technically best solution that violates a stated constraint. If they say “least operational overhead,” EC2 with custom autoscaling scripts loses to Elastic Beanstalk or Lambda, even if EC2 is more flexible.

Process of Elimination

Four answers. Usually two are clearly wrong, one is plausible but misses a constraint, one is correct. Eliminate the obviously wrong ones first:

DynamoDB for relational queries → eliminate
Single-AZ for “high availability” requirement → eliminate
On-premises solution for “migrate to cloud” scenario → eliminate

Then compare the remaining two against the specific constraints.

Time Management

65 questions, 130 minutes. ~2 minutes per question. Flag uncertain ones, answer everything first, review flagged ones. I spent too long on early questions during the practice exam and rushed the last fifteen.

What to Study

High ROI topics (appear frequently):

VPC design (subnets, gateways, endpoints, peering, Transit Gateway)
IAM (roles, policies, STS, cross-account access, permission boundaries)
S3 (storage classes, lifecycle, versioning, replication, presigned URLs)
RDS/Aurora (Multi-AZ, read replicas, backup, migration)
Lambda (triggers, concurrency, VPC integration, cold starts)
Route 53 (routing policies, health checks)
Disaster recovery strategies (RTO/RPO matching)
Well-Architected Framework pillars

Lower ROI (know basics, don’t deep-dive):

Individual ML services (SageMaker basics suffice)
Media services (Elastic Transcoder, etc.)
Older generation services being replaced
Service-specific configuration details

Study resources that worked for me:

AWS Skill Builder — free official courses
Adrian Cantrill’s SAA-C03 course — comprehensive, scenario-focused
Tutorials Dojo practice exams — closest to real exam format
Hands-on labs in your own AWS account — nothing replaces building

What the Cert Changed in My Daily Work

Beyond the badge:

Architecture reviews got faster. I recognize patterns — “this is a decoupling problem, SQS fits” — instead of debating abstractly.

Cost conversations got easier. I can articulate why Multi-AZ RDS costs more and what downtime costs the business. Finance understands tradeoffs when you speak in dollars.

Security audits got sharper. The defense-in-depth mindset from exam prep made me paranoid in useful ways. Found three staging security issues in the first month after certifying.

I stopped over-engineering. The exam teaches you to pick the simplest service that meets requirements. I used to default to Kubernetes. Now I default to “what’s the least operational overhead that works?”

I pushed back in a design review with confidence. A colleague proposed a multi-region active-active setup for an internal tool with 50 users. I could articulate why Pilot Light was sufficient, with specific RTO/RPO math. Before the exam, I would have nodded along.

Practical Takeaways

The AWS Solutions Architect Associate certification isn’t about memorizing services. It’s about learning to think in tradeoffs — reliability vs. cost, performance vs. simplicity, security vs. velocity.

If you’re studying:

Learn the Well-Architected Framework pillars — they’re the exam’s organizing principle
Practice scenario questions, not flashcards of service features
Build things in AWS — S3 static site, Lambda API, RDS with Multi-AZ failover
Focus on VPC, IAM, S3, RDS, Lambda — they dominate the exam
Read every answer choice against the scenario’s constraints, not just technical merit

If you’re not studying but want the knowledge:

Use the Well-Architected Framework for architecture reviews
Match disaster recovery strategy to actual RTO/RPO requirements
Default to managed services unless you have a specific reason not to
Audit costs quarterly — S3 lifecycle, right-sizing, Reserved Instances
Design for Multi-AZ before you need it, not after an outage

The practice exam score of 58% stung. The production architectures that improved afterward didn’t. The cert is a means, not an end — but the design vocabulary it teaches is worth the study hours even if you never sit for the exam.

AWS Solutions Architect lessons — December 2023. Exam content evolves; check the AWS Certification page for current SAA exam guide.