DynamoDB Data Modeling: Patterns and Best Practices

DynamoDB requires a different mindset than relational databases. After modeling complex applications in DynamoDB, I’ve learned that understanding access patterns is more important than normalization.

DynamoDB Basics

Key Concepts

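Partition Key: Hashed to decide which partition stores an item
Sort Key: Orders items within a partition and enables range queries
GSI: Global Secondary Index (its own partition/sort keys, which can differ from the table's)
LSI: Local Secondary Index (same partition key as the table, different sort key; must be defined when the table is created)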

Table Structure

import boto3

dynamodb = boto3.resource('dynamodb')

# Create table
table = dynamodb.create_table(
    TableName='Users',
    KeySchema=[
        {
            'AttributeName': 'userId',
            'KeyType': 'HASH'  # Partition key
        },
        {
            'AttributeName': 'timestamp',
            'KeyType': 'RANGE'  # Sort key
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'userId',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'timestamp',
            'AttributeType': 'N'
        }
    ],
    BillingMode='PAY_PER_REQUEST'
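)

# create_table returns before the table is usable; wait until it is ACTIVE
table.wait_until_exists()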

Access Patterns First

Design your table based on how you’ll query it:

Access Patterns:
Get user by ID
Get user's orders
Get orders by status
Get user's recent activity
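
Before creating anything, it can help to write each access pattern next to the key condition that would serve it. The following is a minimal sketch of that exercise, not a required step. The `AccessPattern` type is hypothetical, and the key formats (`USER#<userId>`, `ORDER#`, `STATUS#<status>`) are assumptions chosen to match the examples later in this article.

```python
from typing import NamedTuple, Optional


class AccessPattern(NamedTuple):
    """One row of an access-pattern worksheet (hypothetical helper type)."""
    name: str
    index: Optional[str]       # None means the base table
    partition_key: str         # key template, not a real value
    sort_key_condition: str


ACCESS_PATTERNS = [
    AccessPattern("Get user by ID", None, "USER#<userId>", "= PROFILE"),
    AccessPattern("Get user's orders", None, "USER#<userId>", "begins_with ORDER#"),
    AccessPattern("Get orders by status", "GSI1", "STATUS#<status>", "any"),
    AccessPattern("Get user's recent activity", None, "USER#<userId>",
                  "begins_with ACTIVITY#, newest first"),
]


def patterns_needing_index(patterns):
    """Return the names of patterns the base table keys cannot serve."""
    return [p.name for p in patterns if p.index is not None]
```

Any pattern that lands in `patterns_needing_index` is a candidate for a GSI, which makes the cost of each extra query shape visible early.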

Single Table Design

Denormalized Data Model

# Single table for users, orders, and activities
{
    "PK": "USER#123",           # Partition key
    "SK": "PROFILE",            # Sort key
    "GSI1PK": "USER#123",
    "GSI1SK": "PROFILE",
    "userId": "123",
    "name": "John Doe",
    "email": "john@example.com",
    "type": "user"
}

{
    "PK": "USER#123",
    "SK": "ORDER#456",
    "GSI1PK": "ORDER#456",
    "GSI1SK": "STATUS#pending",
    "orderId": "456",
    "userId": "123",
    "status": "pending",
    "total": Decimal("99.99"),  # from decimal import Decimal; boto3 rejects Python floats
    "type": "order"
}

{
    "PK": "USER#123",
    "SK": "ACTIVITY#2018-04-15T10:00:00Z",
    "GSI1PK": "ACTIVITY",
    "GSI1SK": "2018-04-15T10:00:00Z",
    "activityType": "login",
    "timestamp": "2018-04-15T10:00:00Z",
    "type": "activity"
}
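
Because every entity shares the same `PK`/`SK` attributes, building those strings by hand in many places invites typos. A small set of key-builder functions is one way to centralize the format. The function names below are hypothetical; the key layouts are copied from the items above.

```python
from datetime import datetime, timezone


def user_profile_key(user_id: str) -> dict:
    """Primary key of a user's profile item."""
    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}


def order_key(user_id: str, order_id: str) -> dict:
    """Primary key of an order stored under its owning user."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"}


def activity_key(user_id: str, when: datetime) -> dict:
    """Primary key of an activity item; `when` must be timezone-aware."""
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # ISO 8601 in UTC with a fixed width, so string order equals time order
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"PK": f"USER#{user_id}", "SK": f"ACTIVITY#{ts}"}
```

The same dictionaries can be passed as `Key` to `get_item` or merged into an item before `put_item`.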

Query Patterns

# `table` below is the single table keyed on PK/SK, not the Users table created earlier
# Get user profile
response = table.query(
    KeyConditionExpression='PK = :pk AND SK = :sk',
    ExpressionAttributeValues={
        ':pk': 'USER#123',
        ':sk': 'PROFILE'
    }
)

# Get user's orders
response = table.query(
    KeyConditionExpression='PK = :pk AND begins_with(SK, :prefix)',
    ExpressionAttributeValues={
        ':pk': 'USER#123',
        ':prefix': 'ORDER#'
    }
)

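# Look up an order by ID (using GSI1). With GSI1PK = ORDER#<id>, this returns
# one order and its status; listing all orders in a given status needs the
# status in the GSI partition key, as in the Sparse Index pattern.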
response = table.query(
    IndexName='GSI1',
    KeyConditionExpression='GSI1PK = :pk AND begins_with(GSI1SK, :prefix)',
    ExpressionAttributeValues={
        ':pk': 'ORDER#456',
        ':prefix': 'STATUS#'
    }
)
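
The queries above assume a table keyed on `PK`/`SK` with a global secondary index named `GSI1` keyed on `GSI1PK`/`GSI1SK`. The article does not show that table's definition, so here is a sketch of what the `create_table` arguments could look like. The table name `AppTable` and the `ALL` projection are my assumptions, not something the patterns require.

```python
def single_table_definition(table_name: str = "AppTable") -> dict:
    """Build create_table keyword arguments for the PK/SK + GSI1 layout."""
    return {
        "TableName": table_name,  # hypothetical name
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        # Only attributes used in a key schema are declared here
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"}
            for name in ("PK", "SK", "GSI1PK", "GSI1SK")
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GSI1",
                "KeySchema": [
                    {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
                ],
                # Assumption: copy every attribute into the index
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With a boto3 resource this would be used as `dynamodb.create_table(**single_table_definition())`. A narrower projection (`KEYS_ONLY` or `INCLUDE`) reduces index storage if the queries only need a few attributes.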

Common Patterns

Pattern 1: Entity with Metadata

# User entity
{
    "PK": "USER#123",
    "SK": "METADATA",
    "name": "John Doe",
    "email": "john@example.com"
}

# User's settings
{
    "PK": "USER#123",
    "SK": "SETTINGS#notifications",
    "emailNotifications": True,
    "pushNotifications": False
}

Pattern 2: Time-Series Data

# Sensor readings
{
    "PK": "SENSOR#temp-1",
    "SK": "2018-04-15T10:00:00Z",
    "temperature": Decimal("72.5"),  # from decimal import Decimal; boto3 rejects Python floats
    "humidity": Decimal("45.2")
}

# Query recent readings
response = table.query(
    KeyConditionExpression='PK = :pk AND SK >= :timestamp',
    ExpressionAttributeValues={
        ':pk': 'SENSOR#temp-1',
        ':timestamp': '2018-04-15T00:00:00Z'
    },
    ScanIndexForward=False  # Descending order
)
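
A single `query` call returns at most 1 MB of data, so a busy sensor can need several calls to cover one time window. The sketch below bounds the window with `BETWEEN` and follows `LastEvaluatedKey` until the results are exhausted. The helper names `query_all` and `readings_between` are hypothetical, and `table` is assumed to be a boto3 `Table` resource (anything with a compatible `query` method works).

```python
def query_all(table, **query_kwargs):
    """Yield every item matching a query, following pagination."""
    kwargs = dict(query_kwargs)
    while True:
        response = table.query(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key


def readings_between(table, sensor_id, start, end):
    """Yield a sensor's readings in [start, end], newest first.

    start and end are ISO 8601 UTC strings in the same format as SK,
    so string comparison matches chronological order.
    """
    return query_all(
        table,
        KeyConditionExpression="PK = :pk AND SK BETWEEN :start AND :end",
        ExpressionAttributeValues={
            ":pk": f"SENSOR#{sensor_id}",
            ":start": start,
            ":end": end,
        },
        ScanIndexForward=False,
    )
```

Because `query_all` is a generator, callers can stop early (for example after the first 100 readings) without fetching the remaining pages.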

Pattern 3: Many-to-Many Relationships

# User-Follow relationship
{
    "PK": "USER#123",
    "SK": "FOLLOWS#456",
    "followedUserId": "456",
    "followedAt": "2018-04-15T10:00:00Z"
}

# Reverse: Who follows user 123
{
    "PK": "USER#123",
    "SK": "FOLLOWED_BY#789",
    "followerId": "789",
    "followedAt": "2018-04-15T10:00:00Z"
}

# Or store a single item and use a GSI for the reverse lookup
{
    "PK": "USER#123",
    "SK": "FOLLOWS#456",
    "GSI1PK": "USER#456",  # Flipped for reverse lookup
    "GSI1SK": "FOLLOWED_BY#123"
}
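
If you go with the two-item approach, both directions of the relationship have to be written together or the lists drift apart. One way to keep them in sync is to generate both items from a single function. This is a sketch; `follow_items` is a hypothetical helper and the attribute names are taken from the items above.

```python
def follow_items(follower_id: str, followed_id: str, followed_at: str) -> list:
    """Return the forward and reverse items for one follow relationship."""
    forward = {
        "PK": f"USER#{follower_id}",
        "SK": f"FOLLOWS#{followed_id}",
        "followedUserId": followed_id,
        "followedAt": followed_at,
    }
    reverse = {
        "PK": f"USER#{followed_id}",
        "SK": f"FOLLOWED_BY#{follower_id}",
        "followerId": follower_id,
        "followedAt": followed_at,
    }
    return [forward, reverse]
```

The two items could then be written with `table.batch_writer()`. Note that a batch write is not atomic, so a failure between the two puts can still leave only one direction stored; the GSI variant avoids that because there is only one item.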

GSI Patterns

Sparse Index

# Only index active orders
{
    "PK": "USER#123",
    "SK": "ORDER#456",
    "GSI1PK": "STATUS#active",  # Only if active
    "GSI1SK": "2018-04-15T10:00:00Z",
    "status": "active"
}

# Query active orders
response = table.query(
    IndexName='GSI1',
    KeyConditionExpression='GSI1PK = :status',
    ExpressionAttributeValues={
        ':status': 'STATUS#active'
    }
)
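
A sparse index only stays sparse if the index key attributes are removed when an item stops qualifying: an item appears in a GSI only while it carries that index's key attributes. The sketch below builds the `update_item` arguments that change an order's status and drop it from GSI1 in the same request. The function name is hypothetical.

```python
def deactivate_order_update(user_id: str, order_id: str, new_status: str) -> dict:
    """Build update_item kwargs that set a new status and remove GSI1 keys."""
    return {
        "Key": {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"},
        # SET changes the status; REMOVE drops the item out of GSI1
        "UpdateExpression": "SET #status = :status REMOVE GSI1PK, GSI1SK",
        # Placeholder avoids clashing with DynamoDB reserved words
        "ExpressionAttributeNames": {"#status": "status"},
        "ExpressionAttributeValues": {":status": new_status},
    }
```

Usage would be `table.update_item(**deactivate_order_update("123", "456", "shipped"))`, after which the order no longer shows up in the active-orders query.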

Inverted Index

# Original
{
    "PK": "USER#123",
    "SK": "ORDER#456",
    "orderId": "456"
}

# Inverted for lookup: partition and sort key swapped so the order can be found by its ID
{
    "PK": "ORDER#456",
    "SK": "USER#123",
    "GSI1PK": "ORDER#456",
    "GSI1SK": "USER#123"
}

Batch Operations

# Batch write
with table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=item)

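# Batch get (up to 100 keys per call). Each key must contain the full
# primary key, so for the Users table both userId and timestamp.
# The timestamp values here are illustrative.
response = dynamodb.batch_get_item(
    RequestItems={
        'Users': {
            'Keys': [
                {'userId': '123', 'timestamp': 1523786400},
                {'userId': '456', 'timestamp': 1523786400},
                {'userId': '789', 'timestamp': 1523786400}
            ]
        }
    }
)
users = response['Responses']['Users']
# Throttled keys come back in response['UnprocessedKeys'] and must be retried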
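
`batch_get_item` accepts at most 100 keys per request and may return some of them in `UnprocessedKeys` when throttled. Here is a sketch of a wrapper that chunks the key list and retries unprocessed keys with a capped exponential backoff. The names `chunked` and `batch_get_all`, the retry count, and the delay values are my assumptions; `dynamodb` is assumed to be a boto3 DynamoDB resource.

```python
import time


def chunked(seq, size):
    """Yield consecutive slices of seq with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def batch_get_all(dynamodb, table_name, keys, max_retries=5, base_delay=0.05):
    """Fetch all keys from one table, 100 at a time, retrying leftovers."""
    items = []
    for chunk in chunked(keys, 100):
        request = {table_name: {"Keys": chunk}}
        attempt = 0
        while request:
            response = dynamodb.batch_get_item(RequestItems=request)
            items.extend(response.get("Responses", {}).get(table_name, []))
            # An empty or missing UnprocessedKeys ends the loop
            request = response.get("UnprocessedKeys") or {}
            if request:
                attempt += 1
                if attempt > max_retries:
                    raise RuntimeError("unprocessed keys remain after retries")
                time.sleep(min(base_delay * 2 ** attempt, 1.0))
    return items
```

Results are not guaranteed to come back in the order the keys were given, so callers that need ordering should index the returned items by key.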

Best Practices

Design for access patterns - Not normalization
Use single table - When access patterns overlap
Denormalize data - Reduce the number of queries per request
Use GSIs wisely - Each one adds write and storage cost
Avoid hot partitions - Distribute load across partition keys
Use composite keys - For complex queries
Monitor capacity - Watch for throttling
Use TTL - Auto-delete expired data
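
For the hot-partition point, one common technique is write sharding: append a suffix to a busy partition key so writes spread over several partitions, then query every shard on read. The sketch below derives the suffix from a hash of the item ID so the same item always lands on the same shard. The function names and the shard count of 10 are assumptions for illustration.

```python
import hashlib


def sharded_partition_key(base_key: str, item_id: str, shard_count: int = 10) -> str:
    """Return base_key with a stable shard suffix derived from item_id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str, shard_count: int = 10) -> list:
    """Return every shard's partition key, for fan-out reads."""
    return [f"{base_key}#{n}" for n in range(shard_count)]
```

A reader would run one query per key in `all_shard_keys("STATUS#active")` and merge the results. That extra read cost is the trade-off for spreading writes, so sharding is only worth it for keys that actually throttle.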

Conclusion

DynamoDB modeling requires:

Understanding access patterns first
Denormalization over normalization
Strategic use of GSIs
Single table design when appropriate

Start with access patterns, then design your keys. The patterns shown here handle millions of items in production.

DynamoDB data modeling from April 2018, covering production patterns.