discordb/readme.md
2025-07-13 21:04:53 +03:00


# ⚠️ Warning! AI slop.
# Discord Data Collector
A Python application that collects Discord user data for research into how information propagates through Discord communities.
## Important Disclaimers
- **Terms of Service**: This application uses self-botting, which violates Discord's Terms of Service and may result in account suspension.
- **Educational Use Only**: This tool is intended solely for educational and research purposes.
- **Privacy Considerations**: Always respect user privacy and obtain proper consent when collecting data.
- **Legal Compliance**: Ensure compliance with applicable data protection laws (GDPR, CCPA, etc.).
## Features
- **User Data Collection**: Automatically collects usernames, profile pictures, bios, status, and server memberships
- **Message Monitoring**: Processes messages from monitored servers to identify active users
- **Rate Limiting**: Built-in rate limiting to avoid hitting Discord API limits
- **Flexible Configuration**: Easy configuration via TOML and environment files
- **Data Export**: Export collected data to CSV format
- **Database Management**: JSON-based storage with automatic backups
- **CLI Tools**: Command-line interface for data management and analysis
## Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd discord-data-collector
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Create configuration files**:
```bash
cp .env.example .env
# Edit .env with your Discord token
```
4. **Configure settings**:
- Edit `config.toml` to adjust collection, rate-limiting, monitoring, and logging settings
- Confirm that `.env` contains your Discord user token
## Configuration
### Environment Variables (.env)
```env
# Your Discord user token (REQUIRED)
DISCORD_TOKEN=your_discord_user_token_here
```
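A minimal sketch of how the token could be validated at startup. It assumes the `.env` file has already been loaded into the process environment (for example by a dotenv-style loader, which this README does not name); `load_token` is a hypothetical helper, not necessarily what `src/config.py` does.

```python
import os

PLACEHOLDER = "your_discord_user_token_here"  # value shipped in the example above


def load_token(env=None):
    """Return DISCORD_TOKEN from the given mapping (default: os.environ).

    Raises RuntimeError if the variable is missing, empty, or still the
    placeholder, so the collector fails fast instead of at login.
    """
    env = os.environ if env is None else env
    token = env.get("DISCORD_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError("DISCORD_TOKEN is not set; edit .env first")
    return token
```

Passing the mapping as a parameter keeps the check testable without touching the real environment.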
### Configuration File (config.toml)
```toml
[database]
path = "data/users.json"
backup_interval = 3600

[collection]
profile_pictures = true
bio = true
status = true
server_membership = true

[rate_limiting]
request_delay = 1.0
max_requests_per_minute = 30

[monitoring]
target_servers = []  # Empty = monitor all servers
monitor_all_servers = true

[logging]
level = "INFO"
file = "logs/collector.log"
```
## Usage
### Running the Collector
```bash
# Start the data collector
python main.py
```
### CLI Commands
```bash
# Show database statistics
python cli.py stats
# Search for users
python cli.py search "username"
# Export data to CSV
python cli.py export csv -o exported_data.csv
# Test Discord connection
python cli.py test
# Create manual backup
python cli.py backup
# Clean up old backups
python cli.py cleanup
```
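The command layout above maps naturally onto `argparse` subcommands. The following is only a sketch of that mapping: the `--output` long option, its default value, and the `build_parser` name are assumptions, and the real `cli.py` may use a different library entirely.

```python
import argparse


def build_parser():
    """Build a parser that accepts the commands listed in this README."""
    parser = argparse.ArgumentParser(prog="cli.py")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("stats", help="Show database statistics")

    search = sub.add_parser("search", help="Search for users")
    search.add_argument("query")

    export = sub.add_parser("export", help="Export data")
    export.add_argument("format", choices=["csv"])
    # "-o" comes from the README; "--output" and the default are assumed.
    export.add_argument("-o", "--output", default="exported_data.csv")

    for name, text in [
        ("test", "Test Discord connection"),
        ("backup", "Create manual backup"),
        ("cleanup", "Clean up old backups"),
    ]:
        sub.add_parser(name, help=text)
    return parser
```

Calling `build_parser().parse_args(["search", "username"])` yields a namespace with `command` and `query` attributes that a dispatcher can switch on.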
## Project Structure
```
discord-data-collector/
├── main.py              # Main application entry point
├── cli.py               # Command-line interface
├── config.toml          # Configuration file
├── .env                 # Environment variables
├── requirements.txt     # Python dependencies
├── src/
│   ├── __init__.py
│   ├── client.py        # Discord client implementation
│   ├── config.py        # Configuration management
│   ├── database.py      # JSON database manager
│   ├── rate_limiter.py  # Rate limiting utilities
│   └── logger.py        # Logging setup
├── data/
│   ├── users.json       # User database
│   └── backups/         # Database backups
└── logs/
    └── collector.log    # Application logs
```
## Data Structure
Each user entry contains:
```json
{
  "user_id": 123456789,
  "username": "example_user",
  "discriminator": "1234",
  "display_name": "Example User",
  "avatar_url": "https://cdn.discordapp.com/avatars/...",
  "banner_url": "https://cdn.discordapp.com/banners/...",
  "bio": "User's about me section",
  "status": "online",
  "activity": "Playing a game",
  "servers": [111111111, 222222222],
  "created_at": "2024-01-01T00:00:00",
  "updated_at": "2024-01-01T12:00:00"
}
```
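Because `servers` is a list, a record like this has to be flattened before it fits a CSV row. The sketch below shows one possible approach, joining the list with semicolons; the column selection, the separator, and the `users_to_csv` name are assumptions rather than the format `cli.py export csv` actually produces.

```python
import csv
import io

# Hypothetical column selection; keys not listed here are ignored.
FIELDS = ["user_id", "username", "display_name", "status", "servers", "updated_at"]


def users_to_csv(users):
    """Render an iterable of user dicts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for user in users:
        row = dict(user)
        # Flatten the list of server IDs into a single cell.
        row["servers"] = ";".join(str(s) for s in user.get("servers", []))
        writer.writerow(row)  # missing keys become empty cells
    return buf.getvalue()
```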
## Features in Detail
### Rate Limiting
- Configurable request delays
- Per-minute request limits
- Automatic backoff on rate limit hits
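
The first two points could be combined roughly as follows. This is a sketch under assumptions, not the code in `src/rate_limiter.py`: the `RateLimiter` class, its `wait` method, and the injectable `clock` and `sleep` parameters are hypothetical, and backoff on rate limit hits is not shown.

```python
import time
from collections import deque


class RateLimiter:
    """Enforce a minimum delay between requests and a per-minute cap."""

    def __init__(self, request_delay=1.0, max_requests_per_minute=30,
                 clock=time.monotonic, sleep=time.sleep):
        self.request_delay = request_delay
        self.max_per_minute = max_requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests in the last 60 s

    def _prune(self, now):
        # Drop timestamps that have left the 60-second window.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()

    def wait(self):
        """Block until a request is allowed, record it, return the delay."""
        now = self._clock()
        self._prune(now)
        delay = 0.0
        if self._sent:
            # Respect the fixed gap after the previous request.
            delay = max(delay, self._sent[-1] + self.request_delay - now)
        if len(self._sent) >= self.max_per_minute:
            # Window is full: wait until the oldest request expires.
            delay = max(delay, self._sent[0] + 60.0 - now)
        if delay > 0:
            self._sleep(delay)
            now = self._clock()
            self._prune(now)
        self._sent.append(now)
        return delay
```

Calling `limiter.wait()` before each request is enough to apply both limits. The clock and sleep functions are parameters only so the behavior can be checked without real waiting.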
### Data Collection
- Real-time message monitoring
- Member list scanning
- Profile updates tracking
- Server membership tracking
### Database Management
- Automatic backups
- Data deduplication
- Export capabilities
- Statistics generation
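
To make deduplication and backups concrete, here is a hedged sketch. It assumes the database is a JSON object keyed by user ID and that backups are timestamped copies in `data/backups/`; `upsert_user`, `backup`, and the `keep` parameter are hypothetical names, not the API of `src/database.py`.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def upsert_user(db, record):
    """Insert or merge a record keyed by user_id, so each user appears once."""
    key = str(record["user_id"])  # JSON object keys are always strings
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    existing = db.get(key)
    if existing is None:
        db[key] = {**record, "created_at": now, "updated_at": now}
    else:
        # Union the server lists instead of overwriting them.
        servers = sorted(set(existing.get("servers", [])) | set(record.get("servers", [])))
        existing.update(record)
        existing["servers"] = servers
        existing["updated_at"] = now
    return db[key]


def backup(db_path, backup_dir, keep=5):
    """Copy the database into backup_dir and keep only the newest `keep` copies."""
    db_path, backup_dir = Path(db_path), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    target = backup_dir / f"{db_path.stem}_{stamp}{db_path.suffix}"
    shutil.copy2(db_path, target)
    # Timestamped names sort chronologically, so the oldest come first.
    copies = sorted(backup_dir.glob(f"{db_path.stem}_*{db_path.suffix}"))
    for old in copies[: max(len(copies) - keep, 0)]:
        old.unlink()
    return target
```

Merging by key rather than appending is what keeps repeated sightings of the same user from creating duplicate entries.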
### Logging
- Configurable log levels
- File rotation
- Separate `discord.py` logging
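
One way these three points might fit together with the standard `logging` module is sketched below. The `setup_logging` function, the `collector` logger name, and the rotation sizes are assumptions; the `discord` logger name is the one the discord.py package logs under.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging(level="INFO", file="logs/collector.log",
                  max_bytes=1_000_000, backup_count=3):
    """Configure a rotating file log for the app and the discord library."""
    Path(file).parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        file, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    # Application logger (hypothetical name) at the configured level.
    app = logging.getLogger("collector")
    app.setLevel(level)
    app.addHandler(handler)

    # Library logger gets its own, quieter level but shares the file.
    lib = logging.getLogger("discord")
    lib.setLevel(logging.WARNING)
    lib.addHandler(handler)
    return app
```

The `level` and `file` arguments correspond to the `[logging]` keys in `config.toml`.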
## Future Enhancements
- MongoDB integration for better scalability
- Web dashboard for data visualization
- Advanced search and filtering
- Data analysis tools
- Network analysis features
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
This project is for educational purposes only. Use responsibly and in compliance with applicable laws and terms of service.
## Support
For issues or questions, please create an issue in the repository.
---
**Remember**: This tool is for educational research only. Always respect user privacy and platform terms of service.