# Database Optimization for 50k+ Users
## 1. Database Schema Improvements
### Add Indexes
```sql
-- Primary performance indexes
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_display_name ON users(display_name);
CREATE INDEX idx_users_updated_at ON users(updated_at);
CREATE INDEX idx_user_servers_server_id ON user_servers(server_id);
-- Composite indexes for common queries
CREATE INDEX idx_users_username_display ON users(username, display_name);
```
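After adding the indexes, it is worth confirming the optimizer actually uses them. A minimal sketch (the `explain_query` helper is hypothetical; it assumes the aiomysql pool introduced in section 2 below):
```python
async def explain_query(pool, sql: str, args: tuple = ()):
    # Prefix the query with EXPLAIN and print the plan; the "key" column
    # of the output names the index MySQL chose (NULL means a full scan).
    async with pool.acquire() as conn:
        async with conn.cursor() as cursor:
            await cursor.execute("EXPLAIN " + sql, args)
            for row in await cursor.fetchall():
                print(row)

# e.g. await explain_query(pool, "SELECT * FROM users WHERE username = %s", ("alice",))
```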
### Query Optimization
```sql
-- Instead of GROUP_CONCAT, use separate queries
SELECT * FROM users WHERE user_id = ?;
SELECT server_id FROM user_servers WHERE user_id = ?;
```
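The two result sets are then stitched together at the application layer. A sketch of that step (`get_user_with_servers` and its return shape are assumptions, not the project's actual API):
```python
async def get_user_with_servers(self, user_id: int):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # Two cheap indexed lookups instead of one GROUP_CONCAT join
            await cursor.execute(
                "SELECT * FROM users WHERE user_id = %s", (user_id,))
            user_row = await cursor.fetchone()
            await cursor.execute(
                "SELECT server_id FROM user_servers WHERE user_id = %s", (user_id,))
            server_ids = [row[0] for row in await cursor.fetchall()]
            return user_row, server_ids
```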
## 2. Connection Pool Implementation
Replace the single shared connection with a proper connection pool:
```python
import aiomysql
async def create_pool():
return await aiomysql.create_pool(
host='localhost',
port=3306,
user='user',
password='password',
db='database',
minsize=5,
maxsize=20,
charset='utf8mb4'
)
```
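Call sites then borrow a connection per operation and shut the pool down cleanly on exit; `acquire()`, `close()`, and `wait_closed()` are the standard aiomysql pool calls (a usage sketch, using `create_pool` from above):
```python
import asyncio

async def main():
    pool = await create_pool()
    try:
        async with pool.acquire() as conn:   # borrow, auto-release on exit
            async with conn.cursor() as cursor:
                await cursor.execute("SELECT COUNT(*) FROM users")
                (count,) = await cursor.fetchone()
                print(count)
    finally:
        pool.close()                         # stop handing out connections
        await pool.wait_closed()             # wait for in-flight ones to finish

asyncio.run(main())
```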
## 3. Pagination Implementation
### Database Layer
```python
async def get_users_paginated(self, offset: int = 0, limit: int = 100) -> List[UserData]:
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            await cursor.execute("""
                SELECT * FROM users
                ORDER BY user_id
                LIMIT %s OFFSET %s
            """, (limit, offset))
            rows = await cursor.fetchall()
            # Map rows to UserData (assumes a positional constructor)
            return [UserData(*row) for row in rows]
```
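One caveat with OFFSET pagination: MySQL still scans and discards the skipped rows, so deep pages get progressively slower. For sequential passes over all 50k users, keyset (seek) pagination avoids that; a sketch under the same schema assumptions (`get_users_after` is hypothetical):
```python
async def get_users_after(self, last_user_id: int = 0, limit: int = 100):
    # Seek past the previous page's last user_id; the primary-key index
    # makes this O(limit) no matter how deep into the data set we are.
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            await cursor.execute("""
                SELECT * FROM users
                WHERE user_id > %s
                ORDER BY user_id
                LIMIT %s
            """, (last_user_id, limit))
            return await cursor.fetchall()
```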
### CLI Layer
```python
async def search_user_paginated(query: str, page: int = 1, per_page: int = 50):
    offset = (page - 1) * per_page
    users = await database.search_users(query, offset, per_page)
    # Display with pagination controls
```
## 4. Search Optimization
### Full-Text Search
```sql
-- Add full-text index for better search
ALTER TABLE users ADD FULLTEXT(username, display_name, bio);
-- Use full-text search instead of LIKE; the MATCH column list
-- must exactly match a FULLTEXT index definition
SELECT * FROM users
WHERE MATCH(username, display_name, bio) AGAINST(? IN BOOLEAN MODE);
```
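Combined with the pagination parameters from section 3, the `search_users` call used by the CLI layer could look like this (a sketch; the exact signature and row handling are assumptions):
```python
async def search_users(self, query: str, offset: int = 0, limit: int = 50):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # Hits the FULLTEXT index instead of a LIKE '%...%' table scan
            await cursor.execute("""
                SELECT * FROM users
                WHERE MATCH(username, display_name, bio) AGAINST(%s IN BOOLEAN MODE)
                LIMIT %s OFFSET %s
            """, (query, limit, offset))
            return await cursor.fetchall()
```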
### Cached Search Results
```python
# Cache frequent searches. functools.lru_cache does not work with
# coroutines: it would cache the coroutine object itself, which can
# only be awaited once. A plain dict keyed by query works instead
# (bound it or add a TTL in production).
_search_cache: dict = {}

async def cached_user_search(query: str):
    if query not in _search_cache:
        _search_cache[query] = await database.search_users(query)
    return _search_cache[query]
```
## 5. Batch Operations
### Bulk Inserts
```python
async def save_users_batch(self, users: List[UserData]):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # Use executemany for bulk operations
            await cursor.executemany("""
                INSERT INTO users (...) VALUES (...)
                ON DUPLICATE KEY UPDATE ...
            """, [(user.user_id, user.username, ...) for user in users])
        await conn.commit()  # aiomysql does not autocommit by default
```
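Filled in for a minimal assumed schema (user_id, username, display_name are illustrative column names, not necessarily the project's):
```python
async def save_users_batch(self, users: List[UserData]):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # One round trip per batch; upsert on the primary key
            await cursor.executemany("""
                INSERT INTO users (user_id, username, display_name)
                VALUES (%s, %s, %s)
                ON DUPLICATE KEY UPDATE
                    username = VALUES(username),
                    display_name = VALUES(display_name)
            """, [(u.user_id, u.username, u.display_name) for u in users])
        await conn.commit()
```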
## 6. Rate Limiting Improvements
### Smarter Rate Limiting
```python
import asyncio

class AdaptiveRateLimiter:
    def __init__(self):
        self.base_delay = 1.0
        self.consecutive_429s = 0

    async def wait(self):
        if self.consecutive_429s > 0:
            # Exponential backoff on repeated 429s
            delay = self.base_delay * (2 ** self.consecutive_429s)
            await asyncio.sleep(min(delay, 60))  # cap at 60 seconds
        else:
            await asyncio.sleep(self.base_delay)
```
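The sketch above never updates `consecutive_429s`; one way to wire it up, assuming an aiohttp client (the `fetch_user` helper and URL handling are illustrative, not the project's code):
```python
import aiohttp

limiter = AdaptiveRateLimiter()

async def fetch_user(session: aiohttp.ClientSession, url: str):
    await limiter.wait()                     # adaptive pre-request delay
    async with session.get(url) as resp:
        if resp.status == 429:
            limiter.consecutive_429s += 1    # back off harder next time
            return None
        limiter.consecutive_429s = 0         # healthy response: reset
        return await resp.json()
```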
## Performance Estimates (50k Users)
### Current Implementation
- **get_all_users()**: ~30-60 seconds + 2-4GB RAM
- **CLI search**: ~10-30 seconds per search
- **Database saves**: ~5-10x slower due to locking
### Optimized Implementation
- **Paginated queries**: ~0.1-0.5 seconds per page
- **Indexed search**: ~0.1-1 second per search
- **Connection pool**: ~2-3x faster concurrent operations
- **Memory usage**: ~50-100MB instead of GB
## Implementation Priority
1. **High Priority**: Connection pooling, pagination, indexes
2. **Medium Priority**: Search optimization, batch operations
3. **Low Priority**: Caching, adaptive rate limiting
With these changes in place, the system should handle 50k users with reasonable performance: roughly 10x slower than the 500-user baseline rather than 100x slower.