# Database Optimization for 50k+ Users

## 1. Database Schema Improvements

### Add Indexes

```sql
-- Primary performance indexes
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_display_name ON users(display_name);
CREATE INDEX idx_users_updated_at ON users(updated_at);
CREATE INDEX idx_user_servers_server_id ON user_servers(server_id);

-- Composite indexes for common queries
CREATE INDEX idx_users_username_display ON users(username, display_name);
```

### Query Optimization

```sql
-- Instead of GROUP_CONCAT, use two simple queries
SELECT * FROM users WHERE user_id = ?;
SELECT server_id FROM user_servers WHERE user_id = ?;
```
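
The two result sets are then stitched together in application code. A minimal sketch, assuming each user row arrives as a dict and the server IDs as cursor tuples (the names here are illustrative, not taken from the codebase):

```python
def attach_servers(user_row: dict, server_rows: list) -> dict:
    """Merge a user row with the server_id rows fetched for that user.

    Replaces the old GROUP_CONCAT approach: instead of parsing a
    comma-separated string, server IDs stay a proper Python list.
    """
    user = dict(user_row)
    user["server_ids"] = [row[0] for row in server_rows]
    return user

# Example with stub query results:
user_row = {"user_id": 42, "username": "alice"}
server_rows = [(101,), (102,), (103,)]
merged = attach_servers(user_row, server_rows)
# merged["server_ids"] -> [101, 102, 103]
```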

## 2. Connection Pool Implementation

Replace the single shared connection with a proper pool:

```python
import aiomysql

async def create_pool():
    return await aiomysql.create_pool(
        host='localhost',
        port=3306,
        user='user',
        password='password',
        db='database',
        minsize=5,
        maxsize=20,
        charset='utf8mb4'
    )
```

## 3. Pagination Implementation

### Database Layer

```python
async def get_users_paginated(self, offset: int = 0, limit: int = 100) -> List[UserData]:
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            await cursor.execute("""
                SELECT * FROM users
                ORDER BY user_id
                LIMIT %s OFFSET %s
            """, (limit, offset))
            rows = await cursor.fetchall()
            # Mapping rows to UserData is schema-specific; adjust as needed
            return [UserData(*row) for row in rows]
```

### CLI Layer

```python
async def search_user_paginated(query: str, page: int = 1, per_page: int = 50):
    offset = (page - 1) * per_page
    users = await database.search_users(query, offset, per_page)
    # Display results along with pagination controls
```
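
The page/offset arithmetic is easy to get wrong at the edges (page 0, an empty table, a final partial page), so it can help to isolate it in a small pure helper. A sketch, with illustrative names:

```python
import math

def page_window(page: int, per_page: int, total: int) -> tuple:
    """Return (offset, limit, total_pages) for a 1-indexed page number."""
    if page < 1:
        raise ValueError("page numbers are 1-indexed")
    total_pages = max(1, math.ceil(total / per_page))
    offset = (page - 1) * per_page
    return offset, per_page, total_pages

# 50k users at 50 per page -> 1000 pages; page 3 starts at offset 100
offset, limit, pages = page_window(3, 50, 50_000)
```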

## 4. Search Optimization

### Full-Text Search

```sql
-- Add full-text indexes for better search. Note: the MATCH() column
-- list must exactly match a FULLTEXT index, so the two-column search
-- below needs its own index.
ALTER TABLE users ADD FULLTEXT(username, display_name, bio);
ALTER TABLE users ADD FULLTEXT(username, display_name);

-- Use full-text search instead of LIKE
SELECT * FROM users
WHERE MATCH(username, display_name) AGAINST(? IN BOOLEAN MODE);
```
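
User input needs light sanitizing before it goes into a BOOLEAN MODE expression, since characters like `+`, `-`, and `*` are operators there. A minimal sketch; the character set stripped below is an assumption to adapt, not MySQL's definitive operator list:

```python
import re

def to_boolean_query(raw: str) -> str:
    """Turn free-form input into a BOOLEAN MODE expression.

    Strips boolean-mode operator characters from each word, then
    requires every word (+word*) so prefix matches still work.
    """
    words = [re.sub(r'[+\-<>()~*"@]', "", w) for w in raw.split()]
    return " ".join(f"+{w}*" for w in words if w)

# to_boolean_query("ali ce") -> "+ali* +ce*"
```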

### Cached Search Results

```python
# Cache frequent searches. Note: functools.lru_cache does not work on
# coroutines -- it would cache the coroutine object itself, which can
# only be awaited once -- so use a simple manual cache instead.
_search_cache: dict = {}

async def cached_user_search(query: str):
    if query not in _search_cache:
        _search_cache[query] = await database.search_users(query)
    return _search_cache[query]
```

## 5. Batch Operations

### Bulk Inserts

```python
async def save_users_batch(self, users: List[UserData]):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cursor:
            # executemany batches the inserts into a single round trip
            await cursor.executemany("""
                INSERT INTO users (...) VALUES (...)
                ON DUPLICATE KEY UPDATE ...
            """, [(user.user_id, user.username, ...) for user in users])
        # aiomysql does not autocommit by default
        await conn.commit()
```
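
Very large batches can exceed `max_allowed_packet` or hold locks for a long time, so it is safer to chunk the user list before each `executemany` call. A small sketch of the chunking itself:

```python
def chunked(items: list, size: int):
    """Yield consecutive chunks of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 50k users in chunks of 1000 -> 50 executemany calls
chunks = list(chunked(list(range(50_000)), 1000))
```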

## 6. Rate Limiting Improvements

### Smarter Rate Limiting

```python
import asyncio

class AdaptiveRateLimiter:
    def __init__(self):
        self.base_delay = 1.0
        self.consecutive_429s = 0

    def record_429(self):
        self.consecutive_429s += 1

    def record_success(self):
        self.consecutive_429s = 0

    async def wait(self):
        if self.consecutive_429s > 0:
            delay = self.base_delay * (2 ** self.consecutive_429s)
            await asyncio.sleep(min(delay, 60))  # Cap backoff at 60 seconds
        else:
            await asyncio.sleep(self.base_delay)
```
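
The schedule this produces can be checked without actually sleeping by computing the delays directly. With a base delay of 1.0s and the 60-second cap, consecutive 429s back off like this:

```python
def backoff_delays(base_delay: float, n_429s: int, cap: float = 60.0) -> list:
    """Delay after each of the first n consecutive 429 responses."""
    return [min(base_delay * (2 ** k), cap) for k in range(1, n_429s + 1)]

delays = backoff_delays(1.0, 7)
# -> [2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```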

## Performance Estimates (50k users)

### Current Implementation
- **get_all_users()**: ~30-60 seconds + 2-4 GB RAM
- **CLI search**: ~10-30 seconds per search
- **Database saves**: ~5-10x slower due to locking

### Optimized Implementation
- **Paginated queries**: ~0.1-0.5 seconds per page
- **Indexed search**: ~0.1-1 second per search
- **Connection pool**: ~2-3x faster concurrent operations
- **Memory usage**: ~50-100 MB instead of gigabytes

## Implementation Priority
1. **High Priority**: Connection pooling, pagination, indexes
2. **Medium Priority**: Search optimization, batch operations
3. **Low Priority**: Caching, adaptive rate limiting

The optimized version should handle 50k users with reasonable performance (roughly 10x slower than at 500 users, rather than 100x slower).