# Database Optimization for 50k+ Users

## 1. Database Schema Improvements

### Add Indexes
```sql
-- Primary performance indexes
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_display_name ON users(display_name);
CREATE INDEX idx_users_updated_at ON users(updated_at);
CREATE INDEX idx_user_servers_server_id ON user_servers(server_id);

-- Composite index for common queries
CREATE INDEX idx_users_username_display ON users(username, display_name);
```
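After adding the indexes, it is worth confirming the optimizer actually uses them. A minimal sketch, assuming an aiomysql connection like the one configured in section 2 (`explain_query` is a hypothetical helper, not part of the existing codebase):

```python
async def explain_query(conn, sql: str, args: tuple):
    """Print MySQL's query plan; the chosen index shows up in the `key` column."""
    async with conn.cursor() as cursor:
        await cursor.execute("EXPLAIN " + sql, args)
        for row in await cursor.fetchall():
            print(row)

# e.g. await explain_query(conn, "SELECT * FROM users WHERE username = %s", ("alice",))
```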
### Query Optimization

```sql
-- Instead of GROUP_CONCAT, use separate queries
SELECT * FROM users WHERE user_id = ?;
SELECT server_id FROM user_servers WHERE user_id = ?;
```
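On the Python side, the two queries combine into one helper. A minimal sketch, assuming the aiomysql pool from section 2 and a simple `UserData` container (the names follow usage elsewhere in this document, but the exact fields are assumptions):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UserData:
    user_id: int
    username: str
    display_name: str
    server_ids: List[int] = field(default_factory=list)

async def get_user_with_servers(pool, user_id: int) -> Optional[UserData]:
    async with pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # First query: the user row itself (index-backed primary key lookup)
            await cursor.execute(
                "SELECT user_id, username, display_name FROM users WHERE user_id = %s",
                (user_id,),
            )
            row = await cursor.fetchone()
            if row is None:
                return None
            # Second query: the user's server memberships
            await cursor.execute(
                "SELECT server_id FROM user_servers WHERE user_id = %s",
                (user_id,),
            )
            servers = [r[0] for r in await cursor.fetchall()]
            return UserData(row[0], row[1], row[2], servers)
```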
## 2. Connection Pool Implementation

Replace the single shared connection with proper pooling:

```python
import aiomysql

async def create_pool():
    return await aiomysql.create_pool(
        host='localhost',
        port=3306,
        user='user',
        password='password',
        db='database',
        minsize=5,
        maxsize=20,
        charset='utf8mb4',
    )
```
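A sketch of how the pool might be wired into the database layer (the `Database` wrapper and its methods are hypothetical names for illustration):

```python
class Database:
    def __init__(self):
        self.pool = None

    async def connect(self):
        self.pool = await create_pool()

    async def close(self):
        self.pool.close()
        await self.pool.wait_closed()  # aiomysql pools close asynchronously

    async def get_user_count(self) -> int:
        # Connections are borrowed per operation and returned automatically
        async with self.pool.acquire() as conn:
            async with conn.cursor() as cursor:
                await cursor.execute("SELECT COUNT(*) FROM users")
                (count,) = await cursor.fetchone()
                return count
```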
## 3. Pagination Implementation

### Database Layer

```python
async def get_users_paginated(self, offset: int = 0, limit: int = 100) -> List[UserData]:
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            await cursor.execute("""
                SELECT * FROM users
                ORDER BY user_id
                LIMIT %s OFFSET %s
            """, (limit, offset))
            rows = await cursor.fetchall()
            # _row_to_user stands in for the existing row-to-UserData conversion
            return [self._row_to_user(row) for row in rows]
```
### CLI Layer

```python
async def search_user_paginated(query: str, page: int = 1, per_page: int = 50):
    offset = (page - 1) * per_page
    users = await database.search_users(query, offset, per_page)
    # Display with pagination controls (see the sketch below)
```
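The elided display step could look something like this minimal sketch (purely illustrative; `users` is the page of results from above):

```python
def display_page(users, page: int, per_page: int, total: int):
    last_page = max(1, -(-total // per_page))  # ceiling division
    for user in users:
        print(f"{user.user_id:>10}  {user.username}  ({user.display_name})")
    print(f"-- page {page}/{last_page} -- [n]ext / [p]revious / [q]uit")
```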
## 4. Search Optimization

### Full-Text Search

```sql
-- Add a full-text index for better search
ALTER TABLE users ADD FULLTEXT(username, display_name, bio);

-- Use full-text search instead of LIKE; note that the MATCH column list
-- must exactly match the columns of a FULLTEXT index
SELECT * FROM users
WHERE MATCH(username, display_name, bio) AGAINST(? IN BOOLEAN MODE);
```
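Calling this from Python could look like the sketch below. The `+term*` rewriting is one simple way to build a required-prefix boolean-mode query and is an assumption, not existing behavior; real input should also be sanitized, since boolean mode treats characters like `+`, `-`, and `*` as operators:

```python
async def fulltext_search(pool, query: str, limit: int = 50):
    # Require every term and match on prefixes: "foo bar" -> "+foo* +bar*"
    boolean_query = " ".join(f"+{term}*" for term in query.split())
    async with pool.acquire() as conn:
        async with conn.cursor() as cursor:
            await cursor.execute(
                """
                SELECT * FROM users
                WHERE MATCH(username, display_name, bio)
                      AGAINST(%s IN BOOLEAN MODE)
                LIMIT %s
                """,
                (boolean_query, limit),
            )
            return await cursor.fetchall()
```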
### Cached Search Results

```python
# Cache frequent searches. Note: functools.lru_cache does not work with
# coroutines (it would cache the coroutine object, which can only be
# awaited once), so use a plain dict cache instead.
_search_cache: dict = {}
_CACHE_MAX = 1000

async def cached_user_search(query: str):
    if query not in _search_cache:
        if len(_search_cache) >= _CACHE_MAX:
            _search_cache.pop(next(iter(_search_cache)))  # drop oldest entry (FIFO)
        _search_cache[query] = await database.search_users(query)
    return _search_cache[query]
```
## 5. Batch Operations

### Bulk Inserts

```python
async def save_users_batch(self, users: List[UserData]):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # executemany batches the inserts into far fewer round trips;
            # the column list here is illustrative and must match the schema
            await cursor.executemany("""
                INSERT INTO users (user_id, username, display_name)
                VALUES (%s, %s, %s)
                ON DUPLICATE KEY UPDATE
                    username = VALUES(username),
                    display_name = VALUES(display_name)
            """, [(u.user_id, u.username, u.display_name) for u in users])
        await conn.commit()
```
## 6. Rate Limiting Improvements

### Smarter Rate Limiting

```python
import asyncio

class AdaptiveRateLimiter:
    def __init__(self):
        self.base_delay = 1.0
        self.consecutive_429s = 0

    def record_429(self):
        # Called by the request code when the API returns HTTP 429
        self.consecutive_429s += 1

    def record_success(self):
        self.consecutive_429s = 0

    async def wait(self):
        if self.consecutive_429s > 0:
            # Exponential backoff, capped at 60 seconds
            delay = self.base_delay * (2 ** self.consecutive_429s)
            await asyncio.sleep(min(delay, 60))
        else:
            await asyncio.sleep(self.base_delay)
```
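A sketch of how the limiter might wrap an API call (aiohttp is assumed here only for illustration, and the endpoint URL is hypothetical):

```python
import aiohttp

limiter = AdaptiveRateLimiter()

async def fetch_user(session: aiohttp.ClientSession, user_id: int):
    while True:
        await limiter.wait()
        async with session.get(f"https://api.example.com/users/{user_id}") as resp:
            if resp.status == 429:
                limiter.record_429()
                continue  # back off exponentially and retry
            limiter.record_success()
            resp.raise_for_status()
            return await resp.json()
```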
## Performance Estimates (50k users)

### Current Implementation

- `get_all_users()`: ~30-60 seconds and 2-4 GB of RAM
- CLI search: ~10-30 seconds per search
- Database saves: ~5-10x slower due to locking

### Optimized Implementation

- Paginated queries: ~0.1-0.5 seconds per page
- Indexed search: ~0.1-1 second per search
- Connection pool: ~2-3x faster concurrent operations
- Memory usage: ~50-100 MB instead of gigabytes
## Implementation Priority

- **High priority:** connection pooling, pagination, indexes
- **Medium priority:** search optimization, batch operations
- **Low priority:** caching, adaptive rate limiting
The optimized version should handle 50k users with reasonable performance: roughly 10x slower than at 500 users, rather than 100x slower.