AWS Solutions Architect: Key Lessons from Certification
The practice exam results email arrived on a Sunday morning: 58%. Passing threshold: 72%. I’d been building on AWS for four years — EC2, Lambda, RDS, the usual suspects. I thought the Solutions Architect Associate exam would validate what I already knew. Instead, it exposed a gap between “I can make this work” and “I can design this correctly according to AWS’s worldview.”
Three weeks of focused study later, I passed SAA-C03 with 82%. The knowledge I gained wasn’t a list of services; it was a design vocabulary. Tradeoffs I could articulate in architecture reviews. Patterns I recognized in production incidents before they became outages. A structured way to evaluate every infrastructure decision against reliability, security, performance, cost, operational excellence, and sustainability.
Whether you’re studying for the cert or just want the lessons without the exam fee, here’s what actually mattered.
The Well-Architected Framework: Your Design Lens
AWS organizes good architecture around six pillars. The exam tests all of them, but more importantly, they became how I evaluate every design decision:
Operational Excellence
Run and monitor systems to deliver business value, and continuously improve supporting processes and procedures.
What this means in practice: Infrastructure as Code (not clicking in the console), automated deployments, runbooks for common incidents, postmortems without blame. The exam loves CloudFormation, Systems Manager, and CloudWatch. Production loves them too — the team that clicks through the AWS Console to provision resources is the team that can’t reproduce their infrastructure after an outage.
Security
Protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
Exam favorites: IAM least privilege, encryption at rest (KMS) and in transit (TLS), VPC security groups vs. NACLs, AWS WAF, Secrets Manager, GuardDuty, Security Hub.
Production lesson: The exam emphasizes defense in depth — multiple layers of security. One security group rule is not a security strategy. I redesigned our staging environment after studying this pillar and found three services with 0.0.0.0/0 ingress rules that “had always been there.”
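The kind of audit described above can be sketched in a few lines of Python. This is a minimal sketch, not the script I actually ran: the group IDs and rules are made-up sample data, and the dictionary layout is assumed to follow the shape of the EC2 DescribeSecurityGroups response (`IpPermissions`, `IpRanges`, `Ipv6Ranges`).

```python
from typing import Any

# CIDR blocks that mean "anyone on the internet" for IPv4 and IPv6.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(security_groups: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Return (group_id, port_range, cidr) for every world-open ingress rule."""
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            # IpProtocol "-1" means all protocols; such rules carry no port keys.
            if rule.get("IpProtocol") == "-1":
                ports = "all"
            else:
                ports = f"{rule.get('FromPort')}-{rule.get('ToPort')}"
            cidrs = [r["CidrIp"] for r in rule.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr in OPEN_CIDRS:
                    findings.append((group["GroupId"], ports, cidr))
    return findings


# Hypothetical sample data, not real resources.
sample_groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
]

for finding in find_open_ingress(sample_groups):
    print(finding)
```

Against a real account, the input would come from boto3's `describe_security_groups` call rather than a hand-written list. An open port 443 on a public web tier may be intentional; the point is to review every hit, not to delete them blindly.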
See the AWS Well-Architected Framework for the full pillar descriptions and best practices.
Reliability
Recover from infrastructure or service disruptions, dynamically acquire computing resources, and mitigate disruptions like misconfigurations or transient network issues.
The patterns that appear constantly:
- Multi-AZ deployments for databases and critical services
- Auto Scaling groups with health checks
- Elastic Load Balancing across availability zones
- S3 cross-region replication for disaster recovery
- Route 53 health checks with failover routing
The production incident that drove this home: A single-AZ RDS instance failed during an AWS maintenance window we hadn’t noticed in the notification email. Downtime: 47 minutes. After the exam prep, I knew Multi-AZ was the answer — not because I’d never heard of it, but because I understood the RTO/RPO tradeoffs well enough to justify the cost to leadership.
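The cost justification can be reduced to simple arithmetic. The sketch below is illustrative only: the 47-minute outage comes from the incident above, but the cost per minute, outage frequency, failover time, and Multi-AZ price premium are all hypothetical placeholders to replace with your own numbers.

```python
def annual_outage_cost(outages_per_year: float, minutes_per_outage: float,
                       cost_per_minute: float) -> float:
    """Expected yearly cost of downtime for a given outage profile."""
    return outages_per_year * minutes_per_outage * cost_per_minute


# Hypothetical inputs, except the 47-minute outage taken from the incident.
COST_PER_MINUTE = 200.0          # assumed business cost of one minute of downtime
OUTAGES_PER_YEAR = 1             # assumed frequency
SINGLE_AZ_OUTAGE_MIN = 47        # observed single-AZ downtime
MULTI_AZ_FAILOVER_MIN = 2        # assumed failover duration with a standby
MULTI_AZ_EXTRA_PER_MONTH = 300.0 # assumed price premium for the standby

single_az = annual_outage_cost(OUTAGES_PER_YEAR, SINGLE_AZ_OUTAGE_MIN, COST_PER_MINUTE)
multi_az = annual_outage_cost(OUTAGES_PER_YEAR, MULTI_AZ_FAILOVER_MIN, COST_PER_MINUTE)

avoided_loss = single_az - multi_az            # downtime cost Multi-AZ would avoid
annual_premium = MULTI_AZ_EXTRA_PER_MONTH * 12 # what the standby costs per year
net_benefit = avoided_loss - annual_premium

print(f"Avoided loss: ${avoided_loss:,.0f}/year")
print(f"Multi-AZ premium: ${annual_premium:,.0f}/year")
print(f"Net benefit: ${net_benefit:,.0f}/year")
```

With these placeholder numbers Multi-AZ comes out ahead, but a low-traffic internal tool with a small cost per minute could easily flip the result. That is the tradeoff leadership needs to see.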
Performance Efficiency
Use computing resources efficiently to meet system requirements, and maintain that efficiency as demand changes and technologies evolve.
Key concepts:
- Right-sizing instances (not always the biggest)
- Caching layers (ElastiCache, CloudFront, DAX for DynamoDB)
- Serverless for variable workloads (Lambda, Fargate)
- Selecting appropriate database types (RDS vs. DynamoDB vs. ElastiCache vs. S3)
- CDN for static and dynamic content acceleration
The exam trap: choosing the most powerful service when a simpler one fits. CloudFront + S3 beats EC2 serving static files. DynamoDB beats RDS for key-value access patterns at scale. Lambda beats EC2 for sporadic, short-duration tasks.
Cost Optimization
Avoid or eliminate unneeded cost or suboptimal resources.
Exam favorites:
- Reserved Instances and Savings Plans for predictable workloads
- Spot Instances for fault-tolerant, flexible workloads
- S3 lifecycle policies (Standard → IA → Glacier)
- Right-sizing with Compute Optimizer
- Data transfer cost awareness (cross-AZ, cross-region, internet egress)
The real-world win: Studying cost optimization led me to audit our S3 buckets. $340/month in Standard storage for logs nobody had queried in eighteen months. Lifecycle policy to Glacier: $28/month. The exam pays for itself in one audit.
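A lifecycle rule like the one described can be expressed as a plain dictionary. The sketch below assumes a `logs/` prefix and 30/90-day thresholds, which are illustrative choices rather than the values from my audit; the dictionary shape is intended to match what boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`.

```python
from typing import Optional


def build_log_lifecycle(prefix: str, ia_after_days: int = 30,
                        glacier_after_days: int = 90,
                        expire_after_days: Optional[int] = None) -> dict:
    """Build a Standard -> Standard-IA -> Glacier lifecycle configuration."""
    # To my understanding, S3 rejects transitions to STANDARD_IA before 30 days.
    if ia_after_days < 30:
        raise ValueError("objects must be at least 30 days old before moving to STANDARD_IA")
    if glacier_after_days <= ia_after_days:
        raise ValueError("the Glacier transition must come after the IA transition")

    rule = {
        "ID": f"archive-{prefix.strip('/')}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optional: delete objects entirely once they are old enough.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Hypothetical prefix and retention period.
config = build_log_lifecycle("logs/", expire_after_days=730)
print(config)
```

Applying it would look roughly like `s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=config)`, where the bucket name is a placeholder. Check retrieval fees and minimum storage durations before archiving anything you might need to query.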
Sustainability
Minimize the environmental impacts of running cloud workloads. The newest pillar, lighter on the exam, but increasingly relevant.
Practical actions: Right-size to avoid idle compute, use Graviton instances (better performance per watt), schedule non-production environments to shut down overnight, use managed services (AWS optimizes utilization better than your dedicated server).
Core Services: Know Them Deep, Not Wide
The exam covers hundreds of services. Production uses maybe thirty regularly. Focus depth on the core:
Compute
| Service | Use When |
|---|---|
| EC2 | Full control, persistent workloads, specific OS/software needs |
| Lambda | Event-driven, short-duration, variable traffic |
| ECS/Fargate | Containerized apps without managing servers |
| Elastic Beanstalk | Quick deployment, managed platform (exam loves this for “simplicity”) |
The question pattern: “Company needs X, Y, Z” — match requirements to service. If they say “no infrastructure management” → Lambda or Fargate. If they say “legacy application, full OS access” → EC2.
Storage
| Service | Use When |
|---|---|
| S3 | Object storage, static assets, data lakes, backups |
| EBS | Block storage for EC2 instances (databases, boot volumes) |
| EFS | Shared file storage across multiple EC2 instances |
| FSx | Specialized file systems (Windows, Lustre, NetApp, OpenZFS) |
Exam trap: using EBS for shared access (a volume normally attaches to a single instance; Multi-Attach on io1/io2 volumes is a narrow exception). Using S3 for database storage (it’s object, not block). Know the access patterns.
Database
| Service | Use When |
|---|---|
| RDS | Relational data, SQL, ACID transactions, complex queries |
| Aurora | RDS but faster, more scalable, higher cost |
| DynamoDB | Key-value/document, massive scale, predictable single-digit ms latency |
| ElastiCache | Caching (Redis/Memcached), session stores |
| DocumentDB | MongoDB-compatible document database |
The question I missed twice on practice exams: DynamoDB for relational queries with complex joins. DynamoDB is incredible for access patterns you design for. It’s terrible for ad-hoc SQL.
Networking
- VPC — your private network. Subnets (public/private), route tables, internet gateways, NAT gateways
- CloudFront — CDN. Cache content at edge locations globally
- Route 53 — DNS. Routing policies: simple, weighted, latency-based, failover, geolocation
- API Gateway — managed API layer for Lambda, HTTP, WebSocket APIs
- Direct Connect / VPN — hybrid cloud connectivity
Know the difference: security groups (stateful, instance-level) vs. NACLs (stateless, subnet-level). The exam tests this repeatedly.
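The stateful/stateless distinction is easier to remember with a toy model. The sketch below is a deliberate simplification and not how AWS implements either feature: it models only port-based allow rules, and the class names and packet fields are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int


@dataclass
class StatefulFilter:
    """Security-group-like: allowed inbound flows are remembered, replies pass automatically."""
    inbound_ports: set[int]
    _tracked: set[tuple[str, int, str, int]] = field(default_factory=set)

    def inbound(self, p: Packet) -> bool:
        if p.dst_port in self.inbound_ports:
            self._tracked.add((p.src, p.src_port, p.dst, p.dst_port))
            return True
        return False

    def outbound(self, p: Packet) -> bool:
        # A reply is the tracked flow reversed, so no outbound rule is needed.
        return (p.dst, p.dst_port, p.src, p.src_port) in self._tracked


@dataclass
class StatelessFilter:
    """NACL-like: each direction is checked against its own rules, nothing is remembered."""
    inbound_ports: set[int]
    outbound_ports: set[int]

    def inbound(self, p: Packet) -> bool:
        return p.dst_port in self.inbound_ports

    def outbound(self, p: Packet) -> bool:
        return p.dst_port in self.outbound_ports


request = Packet(src="client", src_port=50000, dst="server", dst_port=443)
reply = Packet(src="server", src_port=443, dst="client", dst_port=50000)

sg = StatefulFilter(inbound_ports={443})
print(sg.inbound(request), sg.outbound(reply))

nacl = StatelessFilter(inbound_ports={443}, outbound_ports=set())
print(nacl.inbound(request), nacl.outbound(reply))

# The stateless filter needs an explicit rule for the ephemeral reply ports.
nacl_fixed = StatelessFilter(inbound_ports={443}, outbound_ports=set(range(1024, 65536)))
print(nacl_fixed.outbound(reply))
```

The stateful filter lets the reply through with only an inbound rule, while the stateless one drops it until the ephemeral port range is opened outbound. That missing outbound rule is the classic NACL misconfiguration the exam likes to hide in a scenario.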
Architecture Patterns the Exam Loves
High Availability
Users → Route 53 (failover routing)
→ CloudFront (edge caching)
→ ALB (multi-AZ, health checks)
→ Auto Scaling Group (EC2 across AZs)
→ RDS Multi-AZ (automatic failover)
→ ElastiCache (session cache, multi-AZ)
Every component has a failover story. No single point of failure. This diagram answers 30% of exam scenarios.
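The "no single point of failure" claim can be made concrete with availability arithmetic: redundant components multiply their failure probabilities, while components in a chain multiply their availabilities. The per-component figures below are hypothetical, and the model assumes failures are independent, which real AZs only approximate.

```python
from math import prod


def parallel(*availabilities: float) -> float:
    """Availability of redundant copies: down only if every copy is down."""
    return 1 - prod(1 - a for a in availabilities)


def series(*availabilities: float) -> float:
    """Availability of a chain: up only if every link is up."""
    return prod(availabilities)


def downtime_hours_per_year(availability: float) -> float:
    return (1 - availability) * 365 * 24


# Hypothetical availabilities for illustration only.
LOAD_BALANCER = 0.9999
APP_SERVER = 0.99
DATABASE = 0.995

single_az = series(LOAD_BALANCER, APP_SERVER, DATABASE)
multi_az = series(
    LOAD_BALANCER,
    parallel(APP_SERVER, APP_SERVER),  # app tier across two AZs
    parallel(DATABASE, DATABASE),      # primary plus standby database
)

print(f"Single-AZ: {single_az:.4%}, about {downtime_hours_per_year(single_az):.1f} h/year down")
print(f"Multi-AZ:  {multi_az:.4%}, about {downtime_hours_per_year(multi_az):.1f} h/year down")
```

The weakest link in a series chain dominates the result, which is why duplicating the app tier alone does little if the database stays single-AZ.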
Disaster Recovery Strategies
| Strategy | RTO | RPO | Cost | Implementation |
|---|---|---|---|---|
| Backup & Restore | Hours | Hours | $ | S3 backups, restore when needed |
| Pilot Light | 10s of min | Minutes | $$ | Minimal standby (DB replica, AMIs ready) |
| Warm Standby | Minutes | Seconds | $$$ | Scaled-down full copy, scale up on failover |
| Multi-Site Active-Active | Near zero | Near zero | $$$$ | Full production in two regions |
The exam gives you RTO/RPO requirements and expects you to pick the right strategy. “RPO of 1 hour, minimize cost” → Backup & Restore. “RTO of 5 minutes, business-critical” → Warm Standby, because Pilot Light typically needs tens of minutes to scale up.
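The matching logic amounts to picking the cheapest strategy that still meets both targets. The sketch below turns the table's qualitative ranges into numeric thresholds; those numbers are my own assumptions for illustration, not AWS-published figures.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    rto_minutes: float  # assumed achievable recovery time
    rpo_minutes: float  # assumed achievable data-loss window


# Ordered cheapest first. Thresholds are illustrative assumptions.
STRATEGIES = [
    Strategy("Backup & Restore", rto_minutes=240, rpo_minutes=60),
    Strategy("Pilot Light", rto_minutes=30, rpo_minutes=5),
    Strategy("Warm Standby", rto_minutes=5, rpo_minutes=1),
    Strategy("Multi-Site Active-Active", rto_minutes=0, rpo_minutes=0),
]


def cheapest_strategy(required_rto_minutes: float, required_rpo_minutes: float) -> str:
    """Return the first (cheapest) strategy that meets both requirements."""
    for strategy in STRATEGIES:
        if (strategy.rto_minutes <= required_rto_minutes
                and strategy.rpo_minutes <= required_rpo_minutes):
            return strategy.name
    raise ValueError("requirements must be non-negative")


print(cheapest_strategy(required_rto_minutes=480, required_rpo_minutes=60))
print(cheapest_strategy(required_rto_minutes=5, required_rpo_minutes=5))
```

Ordering the list by cost is what encodes "minimize cost": the function never returns a more expensive strategy than the requirements demand.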
Decoupling with Messaging
Producer → SQS Queue → Consumer (Lambda/EC2)
Producer → SNS Topic → Multiple Subscribers
Producer → Kinesis Stream → Real-time processors
Producer → EventBridge → Event-driven architecture
When to use which:
- SQS — task queues, one consumer per message, ordering not critical (or FIFO if it is)
- SNS — pub/sub, fan-out to multiple consumers, notifications
- Kinesis — real-time streaming, analytics, ordered processing at scale
- EventBridge — event bus, schema registry, cross-account events, scheduled rules
The exam scenario pattern: “Application produces events, multiple services need to react independently” → SNS. “Process orders one at a time in order” → SQS FIFO. “Analyze clickstream data in real-time” → Kinesis.
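The queue versus pub/sub difference can be shown with a small in-memory model. This is a toy sketch with invented class names, not the SQS or SNS API: it ignores visibility timeouts, retries, and the at-least-once delivery of standard SQS queues.

```python
from collections import deque
from typing import Callable, Optional


class TaskQueue:
    """SQS-like: each message is handed to exactly one consumer."""

    def __init__(self) -> None:
        self._messages: deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class Topic:
    """SNS-like: every subscriber gets its own copy of each message."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self._subscribers:
            handler(message)


# Queue: two workers split the work, neither sees the other's message.
orders = TaskQueue()
orders.send("order-1")
orders.send("order-2")
worker_a_got = orders.receive()
worker_b_got = orders.receive()
print(worker_a_got, worker_b_got)

# Topic fan-out into per-service queues, so each service reacts independently.
billing_queue, shipping_queue = TaskQueue(), TaskQueue()
order_events = Topic()
order_events.subscribe(billing_queue.send)
order_events.subscribe(shipping_queue.send)
order_events.publish("order-3")
print(billing_queue.receive(), shipping_queue.receive())
```

The second half mirrors the common SNS-to-SQS fan-out pattern: the topic handles distribution, and each queue buffers work for one service so a slow consumer does not block the others.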
Exam Strategy: How I Went From 58% to 82%
Reading Scenarios Correctly
Exam questions are 3-5 sentence scenarios. Read them twice. Identify:
- Requirements — what must the solution do?
- Constraints — cost, management overhead, compliance, existing infrastructure
- Keywords — “least operational overhead” (managed services), “most cost-effective” (Spot, Reserved, S3 IA), “minimum latency” (CloudFront, ElastiCache, right region)
The trap: picking the technically best solution that violates a stated constraint. If they say “least operational overhead,” EC2 with custom autoscaling scripts loses to Elastic Beanstalk or Lambda, even if EC2 is more flexible.
Process of Elimination
Four answers. Usually two are clearly wrong, one is plausible but misses a constraint, one is correct. Eliminate the obviously wrong ones first:
- DynamoDB for relational queries → eliminate
- Single-AZ for “high availability” requirement → eliminate
- On-premises solution for “migrate to cloud” scenario → eliminate
Then compare the remaining two against the specific constraints.
Time Management
65 questions, 130 minutes. ~2 minutes per question. Flag uncertain ones, answer everything first, review flagged ones. I spent too long on early questions during the practice exam and rushed the last fifteen.
What to Study
High ROI topics (appear frequently):
- VPC design (subnets, gateways, endpoints, peering, Transit Gateway)
- IAM (roles, policies, STS, cross-account access, permission boundaries)
- S3 (storage classes, lifecycle, versioning, replication, presigned URLs)
- RDS/Aurora (Multi-AZ, read replicas, backup, migration)
- Lambda (triggers, concurrency, VPC integration, cold starts)
- Route 53 (routing policies, health checks)
- Disaster recovery strategies (RTO/RPO matching)
- Well-Architected Framework pillars
Lower ROI (know basics, don’t deep-dive):
- Individual ML services (SageMaker basics suffice)
- Media services (Elastic Transcoder, etc.)
- Older generation services being replaced
- Service-specific configuration details
Study resources that worked for me:
- AWS Skill Builder — free official courses
- Adrian Cantrill’s SAA-C03 course — comprehensive, scenario-focused
- Tutorials Dojo practice exams — closest to real exam format
- Hands-on labs in your own AWS account — nothing replaces building
What the Cert Changed in My Daily Work
Beyond the badge:
Architecture reviews got faster. I recognize patterns — “this is a decoupling problem, SQS fits” — instead of debating abstractly.
Cost conversations got easier. I can articulate why Multi-AZ RDS costs more and what downtime costs the business. Finance understands tradeoffs when you speak in dollars.
Security audits got sharper. The defense-in-depth mindset from exam prep made me paranoid in useful ways. Found three staging security issues in the first month after certifying.
I stopped over-engineering. The exam teaches you to pick the simplest service that meets requirements. I used to default to Kubernetes. Now I default to “what’s the least operational overhead that works?”
I rejected a proposal in a design review, and did it confidently. A colleague proposed a multi-region active-active setup for an internal tool with 50 users. I could articulate why Pilot Light was sufficient, with specific RTO/RPO math. Before the exam, I would have nodded along.
Practical Takeaways
The AWS Solutions Architect Associate certification isn’t about memorizing services. It’s about learning to think in tradeoffs — reliability vs. cost, performance vs. simplicity, security vs. velocity.
If you’re studying:
- Learn the Well-Architected Framework pillars — they’re the exam’s organizing principle
- Practice scenario questions, not flashcards of service features
- Build things in AWS — S3 static site, Lambda API, RDS with Multi-AZ failover
- Focus on VPC, IAM, S3, RDS, Lambda — they dominate the exam
- Read every answer choice against the scenario’s constraints, not just technical merit
If you’re not studying but want the knowledge:
- Use the Well-Architected Framework for architecture reviews
- Match disaster recovery strategy to actual RTO/RPO requirements
- Default to managed services unless you have a specific reason not to
- Audit costs quarterly — S3 lifecycle, right-sizing, Reserved Instances
- Design for Multi-AZ before you need it, not after an outage
The practice exam score of 58% stung. The production architectures that improved afterward didn’t. The cert is a means, not an end — but the design vocabulary it teaches is worth the study hours even if you never sit for the exam.
AWS Solutions Architect lessons — December 2023. Exam content evolves; check the AWS Certification page for current SAA exam guide.