AWS S3: Beyond Simple Storage
Most developers treat S3 as just a place to dump files, but it is a powerful building block for scalable architectures. After moving petabytes of data through S3, we have distilled the patterns that transformed how we build cloud applications.
S3 Fundamentals Done Right
Bucket Naming Strategy
# Good naming conventions
company-app-production-assets
company-app-staging-logs
company-app-backups-2016
# Avoid
my-bucket
test123
prod
Create buckets programmatically:
import boto3
s3 = boto3.client('s3')
# Create bucket with proper configuration
s3.create_bucket(
Bucket='mycompany-prod-assets',
CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
)
# Enable versioning
s3.put_bucket_versioning(
Bucket='mycompany-prod-assets',
VersioningConfiguration={'Status': 'Enabled'}
)
# Enable encryption
s3.put_bucket_encryption(
Bucket='mycompany-prod-assets',
ServerSideEncryptionConfiguration={
'Rules': [{
'ApplyServerSideEncryptionByDefault': {
'SSEAlgorithm': 'AES256'
}
}]
}
)
Object Key Design
Design keys for organization and manageability:
# Good - hierarchical prefixes scope listings, lifecycle rules, and access control
uploads/2016/06/15/user-123/avatar.jpg
logs/production/2016-06-15/app-server-01.log
backups/database/daily/2016-06-15-db-snapshot.sql.gz
# Bad - all objects in same prefix
user-123-avatar.jpg
app-server-01-2016-06-15.log
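A small helper keeps key construction consistent across services. This is a minimal sketch; the build_upload_key name and layout are illustrative, not part of any AWS API:
from datetime import datetime, timezone

def build_upload_key(user_id, filename):
    """Build a date-partitioned key like uploads/2016/06/15/user-123/avatar.jpg."""
    now = datetime.now(timezone.utc)
    return f"uploads/{now:%Y}/{now:%m}/{now:%d}/user-{user_id}/{filename}"

# e.g. build_upload_key(123, 'avatar.jpg')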
Lifecycle Policies
Automatically manage object lifecycles to reduce costs:
{
"Rules": [
{
"Id": "Move old logs to Glacier",
"Status": "Enabled",
"Prefix": "logs/",
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 90,
"StorageClass": "GLACIER"
}
],
"Expiration": {
"Days": 365
}
},
{
"Id": "Clean up incomplete multipart uploads",
"Status": "Enabled",
"Prefix": "",
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
},
{
"Id": "Delete old versions",
"Status": "Enabled",
"Prefix": "",
"NoncurrentVersionExpiration": {
"NoncurrentDays": 30
}
}
]
}
Apply via AWS CLI:
aws s3api put-bucket-lifecycle-configuration \
--bucket mycompany-prod-assets \
--lifecycle-configuration file://lifecycle.json
S3 Event Notifications
Trigger workflows when objects are created/deleted:
{
"LambdaFunctionConfigurations": [
{
"LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789:function:ProcessImage",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [
{
"Name": "prefix",
"Value": "uploads/images/"
},
{
"Name": "suffix",
"Value": ".jpg"
}
]
}
}
}
]
}
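The same configuration can be attached with boto3; a minimal sketch using the placeholder bucket name and Lambda ARN from above (the Lambda's resource policy must already allow S3 to invoke it):
import boto3

s3 = boto3.client('s3')

# Attach the Lambda notification to the bucket
s3.put_bucket_notification_configuration(
    Bucket='mycompany-prod-assets',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-west-2:123456789:function:ProcessImage',
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {
                'Key': {
                    'FilterRules': [
                        {'Name': 'prefix', 'Value': 'uploads/images/'},
                        {'Name': 'suffix', 'Value': '.jpg'}
                    ]
                }
            }
        }]
    }
)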
Lambda function to process images:
import io
import urllib.parse

import boto3
from PIL import Image

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Object keys in S3 event payloads are URL-encoded, so decode before use
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Download the uploaded image
    response = s3.get_object(Bucket=bucket, Key=key)
    image_data = response['Body'].read()

    # Create thumbnail
    image = Image.open(io.BytesIO(image_data))
    image.thumbnail((200, 200))

    # Save thumbnail to an in-memory buffer
    buffer = io.BytesIO()
    image.save(buffer, 'JPEG')
    buffer.seek(0)

    # Upload thumbnail under a prefix that does not retrigger this function
    thumbnail_key = key.replace('uploads/', 'thumbnails/', 1)
    s3.put_object(
        Bucket=bucket,
        Key=thumbnail_key,
        Body=buffer,
        ContentType='image/jpeg'
    )

    return {
        'statusCode': 200,
        'body': f'Processed {key}'
    }
Direct Upload from Browser
Secure direct uploads using presigned URLs:
# Backend API endpoint
import uuid

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
s3 = boto3.client('s3')

@app.route('/api/upload-url', methods=['POST'])
def generate_upload_url():
    data = request.json
    filename = data['filename']
    content_type = data['contentType']
    user_id = 'anonymous'  # in a real app, take this from the authenticated session

    # Generate a unique key per upload
    key = f"uploads/{user_id}/{uuid.uuid4()}/{filename}"

    # Generate presigned URL (valid for 5 minutes)
    presigned_url = s3.generate_presigned_url(
        'put_object',
        Params={
            'Bucket': 'mycompany-prod-assets',
            'Key': key,
            'ContentType': content_type
        },
        ExpiresIn=300
    )

    return jsonify({
        'uploadUrl': presigned_url,
        'key': key
    })
Frontend JavaScript:
async function uploadFile(file) {
// Get presigned URL from backend
const response = await fetch('/api/upload-url', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({
filename: file.name,
contentType: file.type
})
});
const {uploadUrl, key} = await response.json();
// Upload directly to S3
await fetch(uploadUrl, {
method: 'PUT',
body: file,
headers: {
'Content-Type': file.type
}
});
return key;
}
// Usage
document.getElementById('fileInput').addEventListener('change', async (e) => {
const file = e.target.files[0];
const key = await uploadFile(file);
console.log('Uploaded to:', key);
});
Static Website Hosting
Host static websites directly from S3:
# Enable website hosting
aws s3 website s3://mycompany-website \
--index-document index.html \
--error-document error.html
# Set bucket policy for public read
aws s3api put-bucket-policy \
--bucket mycompany-website \
--policy '{
"Version": "2012-10-17",
"Statement": [{
"Sid": "PublicReadGetObject",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::mycompany-website/*"
}]
}'
# Deploy website
aws s3 sync ./dist s3://mycompany-website \
--delete \
--cache-control "max-age=31536000"
CloudFront Integration
Serve S3 content through CDN:
{
"DistributionConfig": {
"Origins": [{
"Id": "S3-mycompany-prod-assets",
"DomainName": "mycompany-prod-assets.s3.amazonaws.com",
"S3OriginConfig": {
"OriginAccessIdentity": "origin-access-identity/cloudfront/ABCDEFG"
}
}],
"DefaultCacheBehavior": {
"TargetOriginId": "S3-mycompany-prod-assets",
"ViewerProtocolPolicy": "redirect-to-https",
"AllowedMethods": ["GET", "HEAD"],
"CachedMethods": ["GET", "HEAD"],
"ForwardedValues": {
"QueryString": false,
"Cookies": {"Forward": "none"}
},
"MinTTL": 0,
"DefaultTTL": 86400,
"MaxTTL": 31536000
},
"Enabled": true,
"Comment": "CDN for S3 assets",
"Aliases": ["assets.mycompany.com"],
"ViewerCertificate": {
"ACMCertificateArn": "arn:aws:acm:us-east-1:123456789:certificate/abc",
"SSLSupportMethod": "sni-only",
"MinimumProtocolVersion": "TLSv1.2_2016"
}
}
}
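Creating the distribution with boto3 looks much the same, except the API wraps lists in Quantity/Items structures and needs a unique CallerReference. A minimal sketch for the same origin (the OAI ID and bucket name are the placeholders used above):
import time

import boto3

cloudfront = boto3.client('cloudfront')

response = cloudfront.create_distribution(
    DistributionConfig={
        'CallerReference': str(time.time()),  # must be unique per creation request
        'Comment': 'CDN for S3 assets',
        'Enabled': True,
        'Origins': {
            'Quantity': 1,
            'Items': [{
                'Id': 'S3-mycompany-prod-assets',
                'DomainName': 'mycompany-prod-assets.s3.amazonaws.com',
                'S3OriginConfig': {
                    'OriginAccessIdentity': 'origin-access-identity/cloudfront/ABCDEFG'
                }
            }]
        },
        'DefaultCacheBehavior': {
            'TargetOriginId': 'S3-mycompany-prod-assets',
            'ViewerProtocolPolicy': 'redirect-to-https',
            'ForwardedValues': {'QueryString': False, 'Cookies': {'Forward': 'none'}},
            'TrustedSigners': {'Enabled': False, 'Quantity': 0},
            'MinTTL': 0
        }
    }
)
print(response['Distribution']['DomainName'])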
Python helper for CloudFront invalidation:
import time

import boto3

cloudfront = boto3.client('cloudfront')
def invalidate_cache(distribution_id, paths):
"""Invalidate CloudFront cache for specific paths"""
cloudfront.create_invalidation(
DistributionId=distribution_id,
InvalidationBatch={
'Paths': {
'Quantity': len(paths),
'Items': paths
},
'CallerReference': str(time.time())
}
)
# Usage
invalidate_cache('E1234ABCD', ['/images/*', '/css/*'])
Multipart Upload for Large Files
Handle large files efficiently:
import os
import threading

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Progress callback
class ProgressPercentage:
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            print(f"\r{self._filename} {percentage:.2f}% complete", end='')

# Configure multipart threshold and chunk size
config = TransferConfig(
    multipart_threshold=25 * 1024 * 1024,  # 25 MB
    max_concurrency=10,
    multipart_chunksize=25 * 1024 * 1024,  # 25 MB
    use_threads=True
)

# Upload large file
s3.upload_file(
    'large-file.zip',
    'mycompany-prod-assets',
    'uploads/large-file.zip',
    Config=config,
    Callback=ProgressPercentage('large-file.zip')
)
Cross-Region Replication
Replicate objects across regions for disaster recovery:
{
"Role": "arn:aws:iam::123456789:role/s3-replication-role",
"Rules": [{
"Status": "Enabled",
"Priority": 1,
"Filter": {"Prefix": ""},
"Destination": {
"Bucket": "arn:aws:s3:::mycompany-backup-eu-west-1",
"ReplicationTime": {
"Status": "Enabled",
"Time": {"Minutes": 15}
},
"Metrics": {
"Status": "Enabled",
"EventThreshold": {"Minutes": 15}
}
},
"DeleteMarkerReplication": {"Status": "Enabled"}
}]
}
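To attach this configuration, versioning must be enabled on both the source and destination buckets. A minimal sketch, assuming the rules above are saved as replication.json:
import json

import boto3

s3 = boto3.client('s3')

# Load the replication rules shown above and attach them to the source bucket
with open('replication.json') as f:
    replication_config = json.load(f)

s3.put_bucket_replication(
    Bucket='mycompany-prod-assets',
    ReplicationConfiguration=replication_config
)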
S3 Security Best Practices
Bucket Policies
Restrict access by IP or VPC:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::mycompany-prod-assets",
"arn:aws:s3:::mycompany-prod-assets/*"
],
"Condition": {
"NotIpAddress": {
"aws:SourceIp": [
"203.0.113.0/24",
"198.51.100.0/24"
]
}
}
}]
}
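For the VPC case, the same Deny pattern works with an aws:SourceVpce condition. A minimal sketch applied with boto3 (the endpoint ID is a placeholder; note that this also blocks requests from outside the VPC, including the console):
import json

import boto3

s3 = boto3.client('s3')

# Deny all access except requests arriving through a specific VPC endpoint
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::mycompany-prod-assets",
            "arn:aws:s3:::mycompany-prod-assets/*"
        ],
        "Condition": {
            "StringNotEquals": {"aws:SourceVpce": "vpce-1a2b3c4d"}
        }
    }]
}

s3.put_bucket_policy(
    Bucket='mycompany-prod-assets',
    Policy=json.dumps(policy)
)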
IAM Policies
Grant least-privilege access:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::mycompany-prod-assets/uploads/${aws:username}/*"
}]
}
Server-Side Encryption
Use KMS for encryption:
s3.put_object(
Bucket='mycompany-prod-assets',
Key='sensitive-data.txt',
Body=b'secret information',
ServerSideEncryption='aws:kms',
SSEKMSKeyId='arn:aws:kms:us-west-2:123456789:key/abc-123'
)
Cost Optimization
Storage Classes
# Archive old logs to Glacier
from datetime import datetime, timedelta, timezone

import boto3

def archive_old_logs():
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('mycompany-logs')
    # obj.last_modified is timezone-aware, so compare against an aware datetime
    cutoff_date = datetime.now(timezone.utc) - timedelta(days=90)

    for obj in bucket.objects.filter(Prefix='logs/'):
        if obj.last_modified < cutoff_date:
            # Copy the object onto itself with a new storage class
            obj.copy_from(
                CopySource={'Bucket': bucket.name, 'Key': obj.key},
                StorageClass='GLACIER',
                MetadataDirective='COPY'
            )
Intelligent Tiering
Enable for automatic cost optimization:
aws s3api put-bucket-intelligent-tiering-configuration \
--bucket mycompany-prod-assets \
--id EntireBucket \
--intelligent-tiering-configuration '{
"Id": "EntireBucket",
"Status": "Enabled",
"Tierings": [
{
"Days": 90,
"AccessTier": "ARCHIVE_ACCESS"
},
{
"Days": 180,
"AccessTier": "DEEP_ARCHIVE_ACCESS"
}
]
}'
Monitoring and Logging
Enable S3 Access Logging
s3.put_bucket_logging(
Bucket='mycompany-prod-assets',
BucketLoggingStatus={
'LoggingEnabled': {
'TargetBucket': 'mycompany-logs',
'TargetPrefix': 's3-access-logs/'
}
}
)
CloudWatch Metrics
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client('cloudwatch')
def get_s3_metrics(bucket_name):
response = cloudwatch.get_metric_statistics(
Namespace='AWS/S3',
MetricName='NumberOfObjects',
Dimensions=[
{'Name': 'BucketName', 'Value': bucket_name},
{'Name': 'StorageType', 'Value': 'AllStorageTypes'}
],
StartTime=datetime.utcnow() - timedelta(days=1),
EndTime=datetime.utcnow(),
Period=86400,
Statistics=['Average']
)
return response['Datapoints']
Advanced Patterns
S3 as a Message Queue
Use S3 events with SQS for reliable processing:
{
"QueueConfigurations": [{
"QueueArn": "arn:aws:sqs:us-west-2:123456789:process-uploads",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [{
"Name": "prefix",
"Value": "uploads/"
}]
}
}
}]
}
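A worker can then long-poll the queue and process objects at its own pace, with SQS handling retries and visibility timeouts. A minimal consumer sketch (the queue URL is a placeholder, and S3 test events without a Records key are skipped):
import json

import boto3

sqs = boto3.client('sqs')
queue_url = 'https://sqs.us-west-2.amazonaws.com/123456789/process-uploads'

while True:
    # Long-poll for S3 event messages
    messages = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20
    )
    for message in messages.get('Messages', []):
        body = json.loads(message['Body'])
        for record in body.get('Records', []):
            bucket = record['s3']['bucket']['name']
            key = record['s3']['object']['key']
            print(f'Processing s3://{bucket}/{key}')
            # ... do the actual work here ...
        # Delete only after successful processing so failures are retried
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message['ReceiptHandle']
        )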
Data Lake Architecture
Organize data for analytics:
s3://data-lake/
├── raw/
│ ├── year=2016/
│ │ ├── month=06/
│ │ │ ├── day=15/
│ │ │ │ └── data.parquet
├── processed/
│ ├── users/
│ │ └── year=2016/month=06/day=15/
├── analytics/
│ └── reports/
│ └── daily-summary-2016-06-15.csv
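Loading the raw zone is then just a matter of writing Hive-style partition keys (year=/month=/day=) so query engines can prune partitions. A minimal sketch, with an illustrative bucket and file name:
from datetime import datetime, timezone

import boto3

s3 = boto3.client('s3')

def upload_to_raw_zone(local_path):
    """Upload a file under Hive-style partitions, e.g. raw/year=2016/month=06/day=15/."""
    now = datetime.now(timezone.utc)
    key = f"raw/year={now:%Y}/month={now:%m}/day={now:%d}/data.parquet"
    s3.upload_file(local_path, 'data-lake', key)
    return key

# e.g. upload_to_raw_zone('data.parquet')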
Conclusion
S3 is far more than storage—it’s a platform for building scalable systems:
- Use lifecycle policies to optimize costs
- Leverage event notifications for automation
- Implement direct uploads for better UX
- Integrate CloudFront for global performance
- Apply security best practices
- Monitor usage and costs
Start simple, then layer on advanced features as needed. The patterns shown here will handle billions of objects in production.
S3 best practices from mid-2016, when these patterns were emerging as standards.