
Database Optimization for 50k+ Users

1. Database Schema Improvements

Add Indexes

-- Primary performance indexes
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_display_name ON users(display_name);  
CREATE INDEX idx_users_updated_at ON users(updated_at);
CREATE INDEX idx_user_servers_server_id ON user_servers(server_id);

-- Composite indexes for common queries
CREATE INDEX idx_users_username_display ON users(username, display_name);

Query Optimization

-- Instead of GROUP_CONCAT, use separate queries
SELECT * FROM users WHERE user_id = ?;
SELECT server_id FROM user_servers WHERE user_id = ?;
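The two result sets can then be stitched together in application code. A minimal sketch; the helper name and the dict row shape are assumptions, not part of the existing codebase:

```python
def merge_user_servers(user_row: dict, server_rows: list) -> dict:
    """Combine a user row with its server memberships in Python,
    replacing the GROUP_CONCAT aggregation previously done in SQL."""
    user = dict(user_row)
    user["server_ids"] = [row["server_id"] for row in server_rows]
    return user
```

Doing the merge in Python keeps both queries simple enough to be fully served by the indexes above.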

2. Connection Pool Implementation

Replace the single shared connection with a proper connection pool:

import aiomysql

async def create_pool():
    return await aiomysql.create_pool(
        host='localhost',
        port=3306,
        user='user',
        password='password',
        db='database',
        minsize=5,
        maxsize=20,
        charset='utf8mb4'
    )

3. Pagination Implementation

Database Layer

async def get_users_paginated(self, offset: int = 0, limit: int = 100) -> List[UserData]:
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            await cursor.execute("""
                SELECT * FROM users
                ORDER BY user_id
                LIMIT %s OFFSET %s
            """, (limit, offset))
            rows = await cursor.fetchall()
            # Row-to-UserData conversion depends on the UserData definition
            return [UserData(*row) for row in rows]

CLI Layer

async def search_user_paginated(query: str, page: int = 1, per_page: int = 50):
    offset = (page - 1) * per_page
    users = await database.search_users(query, offset, per_page)
    # Display with pagination controls
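The offset arithmetic and page clamping can be factored into a small pure helper; this is a sketch, and `page_window` is a hypothetical name, not part of the existing CLI:

```python
def page_window(total: int, page: int, per_page: int = 50):
    """Clamp a requested page into range and return (offset, limit, pages)."""
    pages = max(1, -(-total // per_page))  # ceiling division, at least 1 page
    page = min(max(1, page), pages)        # clamp page into [1, pages]
    return (page - 1) * per_page, per_page, pages
```

Clamping up front means an out-of-range page request degrades to the last valid page instead of issuing a query that returns nothing.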

4. Search Optimization

-- Add full-text index for better search
ALTER TABLE users ADD FULLTEXT(username, display_name, bio);

-- Use full-text search instead of LIKE
SELECT * FROM users 
WHERE MATCH(username, display_name) AGAINST(? IN BOOLEAN MODE);
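Raw user input should not be passed straight into BOOLEAN MODE, since characters like `+`, `-`, and `*` are operators there. A hedged sketch of a sanitizing query builder (the helper name is an assumption):

```python
import re

def to_boolean_query(raw: str) -> str:
    """Turn free-form user input into a BOOLEAN MODE expression:
    require every term (+) and allow prefix matches (*). Splitting on
    non-word characters strips any operator characters the user typed."""
    terms = [t for t in re.split(r"\W+", raw) if t]
    return " ".join(f"+{t}*" for t in terms)
```

The result is bound as the `AGAINST(...)` parameter of the prepared statement above.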

Cached Search Results

# Cache frequent searches. Note: functools.lru_cache does NOT work on
# coroutine functions -- it would cache the coroutine object itself,
# which can only be awaited once. Use a plain dict (or an async-aware
# cache library) instead.
_search_cache: dict = {}

async def cached_user_search(query: str):
    if query not in _search_cache:
        _search_cache[query] = await database.search_users(query)
    return _search_cache[query]

5. Batch Operations

Bulk Inserts

async def save_users_batch(self, users: List[UserData]):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # Use executemany for bulk operations
            await cursor.executemany("""
                INSERT INTO users (...) VALUES (...) 
                ON DUPLICATE KEY UPDATE ...
            """, [(user.user_id, user.username, ...) for user in users])

6. Rate Limiting Improvements

Smarter Rate Limiting

import asyncio

class AdaptiveRateLimiter:
    def __init__(self):
        self.base_delay = 1.0
        self.consecutive_429s = 0

    def record_429(self):
        """Call when the API returns 429 Too Many Requests."""
        self.consecutive_429s += 1

    def record_success(self):
        """Call on any successful response to reset the backoff."""
        self.consecutive_429s = 0

    async def wait(self):
        if self.consecutive_429s > 0:
            # Exponential backoff after repeated 429s
            delay = self.base_delay * (2 ** self.consecutive_429s)
            await asyncio.sleep(min(delay, 60))  # Cap at 60 seconds
        else:
            await asyncio.sleep(self.base_delay)
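The delay rule itself can be expressed as a pure function, which makes the backoff curve easy to verify without sleeping. A sketch mirroring the limiter's logic:

```python
def backoff_delay(base: float, consecutive_429s: int, cap: float = 60.0) -> float:
    """Exponential backoff after 429 responses, capped at `cap` seconds;
    just the base delay when no 429s have been seen."""
    if consecutive_429s == 0:
        return base
    return min(base * (2 ** consecutive_429s), cap)
```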

Performance Estimates (50k users):

Current Implementation:

  • get_all_users(): ~30-60 seconds + 2-4GB RAM
  • CLI search: ~10-30 seconds per search
  • Database saves: ~5-10x slower due to locking

Optimized Implementation:

  • Paginated queries: ~0.1-0.5 seconds per page
  • Indexed search: ~0.1-1 second per search
  • Connection pool: ~2-3x faster concurrent operations
  • Memory usage: ~50-100MB instead of GB

Implementation Priority:

  1. High Priority: Connection pooling, pagination, indexes
  2. Medium Priority: Search optimization, batch operations
  3. Low Priority: Caching, adaptive rate limiting

The optimized version should handle 50k users with reasonable performance (~10x slower than 500 users instead of 100x slower).