# Database Optimization for 50k+ Users

## 1. Database Schema Improvements

### Add Indexes
```sql
-- Primary performance indexes
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_display_name ON users(display_name);
CREATE INDEX idx_users_updated_at ON users(updated_at);
CREATE INDEX idx_user_servers_server_id ON user_servers(server_id);

-- Composite index for common queries. Its leftmost prefix also serves
-- lookups on username alone, so idx_users_username above becomes
-- redundant and can be dropped.
CREATE INDEX idx_users_username_display ON users(username, display_name);
```

### Query Optimization
```sql
-- Instead of GROUP_CONCAT (which truncates at the 1024-byte
-- group_concat_max_len default and forces a join), use two cheap
-- indexed queries
SELECT * FROM users WHERE user_id = ?;
SELECT server_id FROM user_servers WHERE user_id = ?;
```

## 2. Connection Pool Implementation

Replace the single shared connection with a proper pool:

```python
import aiomysql

async def create_pool():
    return await aiomysql.create_pool(
        host='localhost',
        port=3306,
        user='user',
        password='password',
        db='database',
        minsize=5,
        maxsize=20,
        charset='utf8mb4'
    )
```

## 3. Pagination Implementation

### Database Layer
```python
async def get_users_paginated(self, offset: int = 0, limit: int = 100) -> List[UserData]:
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            await cursor.execute("""
                SELECT * FROM users
                ORDER BY user_id
                LIMIT %s OFFSET %s
            """, (limit, offset))
            rows = await cursor.fetchall()
    # Row-to-UserData mapping assumes column order matches the dataclass
    return [UserData(*row) for row in rows]
```

Note that `OFFSET` pagination scans and discards every skipped row, so deep pages get progressively slower; a keyset variant that seeks on `user_id` instead is sketched at the end of this document.

### CLI Layer
```python
async def search_user_paginated(query: str, page: int = 1, per_page: int = 50):
    offset = (page - 1) * per_page
    users = await database.search_users(query, offset, per_page)
    # Display with pagination controls
```

## 4. Search Optimization

### Full-Text Search
```sql
-- Add full-text index for better search
ALTER TABLE users ADD FULLTEXT(username, display_name, bio);

-- Use full-text search instead of LIKE. MySQL requires the MATCH()
-- column list to exactly match a FULLTEXT index definition, so all
-- three indexed columns must be listed here.
SELECT * FROM users
WHERE MATCH(username, display_name, bio) AGAINST(? IN BOOLEAN MODE);
```

### Cached Search Results
```python
# Cache frequent searches. functools.lru_cache does not work on async
# functions: it would cache the coroutine object, which can only be
# awaited once. A small dict cache works instead.
_search_cache: dict = {}

async def cached_user_search(query: str):
    if query not in _search_cache:
        if len(_search_cache) >= 1000:
            # Evict the oldest entry (FIFO; swap in an LRU if needed)
            _search_cache.pop(next(iter(_search_cache)))
        _search_cache[query] = await database.search_users(query)
    return _search_cache[query]
```

## 5. Batch Operations

### Bulk Inserts
```python
async def save_users_batch(self, users: List[UserData]):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # Use executemany for bulk operations; the column list here
            # is illustrative -- adjust it to the real users schema
            await cursor.executemany("""
                INSERT INTO users (user_id, username, display_name)
                VALUES (%s, %s, %s)
                ON DUPLICATE KEY UPDATE
                    username = VALUES(username),
                    display_name = VALUES(display_name)
            """, [(u.user_id, u.username, u.display_name) for u in users])
        # aiomysql connections are not autocommit by default
        await conn.commit()
```

## 6. Rate Limiting Improvements

### Smarter Rate Limiting

A sketch of exponential backoff driven by 429 responses; `record_response` is a hypothetical hook the HTTP client calls after each request:

```python
import asyncio

class AdaptiveRateLimiter:
    def __init__(self):
        self.base_delay = 1.0
        self.consecutive_429s = 0

    def record_response(self, status: int):
        # Back off harder after each consecutive 429; reset on success
        if status == 429:
            self.consecutive_429s += 1
        else:
            self.consecutive_429s = 0

    async def wait(self):
        if self.consecutive_429s > 0:
            delay = self.base_delay * (2 ** self.consecutive_429s)
            await asyncio.sleep(min(delay, 60))  # Cap backoff at 60 seconds
        else:
            await asyncio.sleep(self.base_delay)
```

## Performance Estimates (50k users):

### Current Implementation:
- **get_all_users()**: ~30-60 seconds and 2-4 GB of RAM
- **CLI search**: ~10-30 seconds per search
- **Database saves**: ~5-10x slower due to lock contention on the single connection

### Optimized Implementation:
- **Paginated queries**: ~0.1-0.5 seconds per page
- **Indexed search**: ~0.1-1 second per search
- **Connection pool**: ~2-3x faster concurrent operations
- **Memory usage**: ~50-100 MB instead of gigabytes

## Implementation Priority:
1. **High Priority**: Connection pooling, pagination, indexes
2. **Medium Priority**: Search optimization, batch operations
3. **Low Priority**: Caching, adaptive rate limiting

With these changes, 50k users should run at roughly 10x the cost of the 500-user baseline, instead of the current ~100x.
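As a rough end-to-end sketch of how the high-priority pieces compose — the `aiomysql` pool from section 2 feeding the paginated read from section 3 — with `UserData` and `UserDatabase` as assumed names rather than the project's actual classes:

```python
import asyncio
from dataclasses import dataclass
from typing import List

import aiomysql


@dataclass
class UserData:
    # Assumed shape -- match the real users schema
    user_id: int
    username: str
    display_name: str


class UserDatabase:
    def __init__(self, pool: aiomysql.Pool):
        self.pool = pool

    async def get_users_paginated(self, offset: int = 0,
                                  limit: int = 100) -> List[UserData]:
        # Acquire a connection only for the duration of this query
        async with self.pool.acquire() as conn:
            async with conn.cursor() as cursor:
                await cursor.execute(
                    "SELECT user_id, username, display_name FROM users"
                    " ORDER BY user_id LIMIT %s OFFSET %s",
                    (limit, offset),
                )
                rows = await cursor.fetchall()
        return [UserData(*row) for row in rows]


async def main():
    pool = await aiomysql.create_pool(
        host='localhost', port=3306, user='user', password='password',
        db='database', minsize=5, maxsize=20, charset='utf8mb4',
    )
    db = UserDatabase(pool)
    page = await db.get_users_paginated(offset=0, limit=100)
    print(f"fetched {len(page)} users")
    pool.close()
    await pool.wait_closed()


if __name__ == "__main__":
    asyncio.run(main())
```

Because each query holds a connection only for its own duration, concurrent searches no longer serialize on one connection, which is where the ~2-3x concurrency gain in the estimates above comes from.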
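The keyset-pagination variant referenced in section 3, written as another method on the hypothetical `UserDatabase` above; it seeks on the primary key instead of scanning and discarding `OFFSET` rows, so page N costs the same as page 1:

```python
    async def get_users_after(self, last_seen_id: int = 0,
                              limit: int = 100) -> List[UserData]:
        # Pass the last user_id of the previous page as the cursor;
        # the primary-key index makes this a direct seek
        async with self.pool.acquire() as conn:
            async with conn.cursor() as cursor:
                await cursor.execute(
                    "SELECT user_id, username, display_name FROM users"
                    " WHERE user_id > %s ORDER BY user_id LIMIT %s",
                    (last_seen_id, limit),
                )
                rows = await cursor.fetchall()
        return [UserData(*row) for row in rows]
```

The trade-off is that pages can only be walked forward by cursor; "jump straight to page N" still needs `OFFSET`.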
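Finally, a usage sketch for the adaptive limiter from section 6, assuming an `aiohttp` client and a hypothetical `fetch_user` endpoint — the pattern is simply wait, request, record the status:

```python
import aiohttp


async def fetch_user(session: aiohttp.ClientSession,
                     limiter: AdaptiveRateLimiter, user_id: int) -> dict:
    await limiter.wait()
    # Hypothetical endpoint; any client that exposes the HTTP status works
    async with session.get(f"https://api.example.com/users/{user_id}") as resp:
        limiter.record_response(resp.status)
        resp.raise_for_status()
        return await resp.json()
```

Recording the status before `raise_for_status()` ensures a 429 bumps the backoff counter even though the call then raises.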