# ⚠️ Warning! ai slop.

# Discord Data Collector

A Python application that collects Discord user data for research, designed to study how information propagates through Discord communities.

## Important Disclaimers

- **Terms of Service**: This application uses self-botting, which violates Discord's Terms of Service and may result in account suspension.
- **Educational Use Only**: This tool is intended solely for educational and research purposes.
- **Privacy Considerations**: Always respect user privacy and obtain proper consent when collecting data.
- **Legal Compliance**: Ensure compliance with applicable data protection laws (GDPR, CCPA, etc.).

## Features

- **User Data Collection**: Automatically collects usernames, profile pictures, bios, status, and server memberships
- **Message Monitoring**: Processes messages from monitored servers to identify active users
- **Rate Limiting**: Built-in rate limiting to avoid hitting Discord API limits
- **Flexible Configuration**: Easy configuration via TOML and environment files
- **Data Export**: Export collected data to CSV format
- **Database Management**: JSON-based storage with automatic backups
- **CLI Tools**: Command-line interface for data management and analysis

## Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd discord-data-collector
   ```

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Create configuration files**:
   ```bash
   cp .env.example .env
   # Edit .env with your Discord token
   ```

4. **Configure settings**:
   - Edit `config.toml` to adjust collection settings
   - Add your Discord user token to `.env`

## Configuration

### Environment Variables (.env)

```env
# Your Discord user token (REQUIRED)
DISCORD_TOKEN=your_discord_user_token_here
```
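
The token lookup could be sketched as below. This is a minimal illustration that assumes a plain `KEY=VALUE` file; `parse_env` and `load_token` are hypothetical helpers, and the project's own loader may differ (for example, it may rely on a third-party dotenv package).

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_token(env_path: str = ".env") -> str:
    """Return DISCORD_TOKEN from the process environment, else from the .env file."""
    token = os.environ.get("DISCORD_TOKEN")
    if not token:
        path = Path(env_path)
        if path.exists():
            token = parse_env(path.read_text(encoding="utf-8")).get("DISCORD_TOKEN")
    if not token:
        raise RuntimeError("DISCORD_TOKEN is not set; add it to .env")
    return token
```

Checking the process environment first lets a deployment override the file without editing it; that ordering is a design assumption, not something the project documents.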

### Configuration File (config.toml)

```toml
[database]
path = "data/users.json"
backup_interval = 3600

[collection]
profile_pictures = true
bio = true
status = true
server_membership = true

[rate_limiting]
request_delay = 1.0
max_requests_per_minute = 30

[monitoring]
target_servers = []  # Empty = monitor all servers
monitor_all_servers = true

[logging]
level = "INFO"
file = "logs/collector.log"
```

## Usage

### Running the Collector

```bash
# Start the data collector
python main.py
```

### CLI Commands

```bash
# Show database statistics
python cli.py stats

# Search for users
python cli.py search "username"

# Export data to CSV
python cli.py export csv -o exported_data.csv

# Test Discord connection
python cli.py test

# Create manual backup
python cli.py backup

# Clean up old backups
python cli.py cleanup
```
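
To illustrate how such a command set maps onto subcommands, here is a minimal `argparse` sketch. It is not the contents of `cli.py`: `build_parser`, the `--output` long option, and the default output path are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per CLI command listed above."""
    parser = argparse.ArgumentParser(prog="cli.py", description="Manage collected data")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("stats", help="Show database statistics")

    search = sub.add_parser("search", help="Search for users")
    search.add_argument("query", help="Text to look for")

    export = sub.add_parser("export", help="Export data")
    export.add_argument("format", choices=["csv"], help="Export format")
    # The long form and the default value are assumptions.
    export.add_argument("-o", "--output", default="exported_data.csv")

    sub.add_parser("test", help="Test Discord connection")
    sub.add_parser("backup", help="Create manual backup")
    sub.add_parser("cleanup", help="Clean up old backups")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```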

## Project Structure

```
discord-data-collector/
├── main.py              # Main application entry point
├── cli.py               # Command-line interface
├── config.toml          # Configuration file
├── .env                 # Environment variables
├── requirements.txt     # Python dependencies
├── src/
│   ├── __init__.py
│   ├── client.py        # Discord client implementation
│   ├── config.py        # Configuration management
│   ├── database.py      # JSON database manager
│   ├── rate_limiter.py  # Rate limiting utilities
│   └── logger.py        # Logging setup
├── data/
│   ├── users.json       # User database
│   └── backups/         # Database backups
└── logs/
    └── collector.log    # Application logs
```

## Data Structure

Each user entry contains:

```json
{
  "user_id": 123456789,
  "username": "example_user",
  "discriminator": "1234",
  "display_name": "Example User",
  "avatar_url": "https://cdn.discordapp.com/avatars/...",
  "banner_url": "https://cdn.discordapp.com/banners/...",
  "bio": "User's about me section",
  "status": "online",
  "activity": "Playing a game",
  "servers": [111111111, 222222222],
  "created_at": "2024-01-01T00:00:00",
  "updated_at": "2024-01-01T12:00:00"
}
```
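
In Python, an entry of this shape could be modeled as a dataclass. The sketch below is an assumption about how the record might be typed, not the project's actual model: `UserRecord` is a hypothetical name, and which fields are optional is a guess.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Optional


@dataclass
class UserRecord:
    """One user entry; only user_id and username are treated as required here."""

    user_id: int
    username: str
    discriminator: Optional[str] = None
    display_name: Optional[str] = None
    avatar_url: Optional[str] = None
    banner_url: Optional[str] = None
    bio: Optional[str] = None
    status: Optional[str] = None
    activity: Optional[str] = None
    servers: list[int] = field(default_factory=list)
    created_at: Optional[str] = None
    updated_at: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "UserRecord":
        """Build a record from a JSON object, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


record = UserRecord.from_dict({"user_id": 123456789, "username": "example_user"})
as_json_ready = asdict(record)  # plain dict, suitable for json.dump
```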

## Features in Detail

### Rate Limiting
- Configurable request delays
- Per-minute request limits
- Automatic backoff on rate limit hits
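
The three behaviors above can be sketched together as follows. This is one possible design, not the code in `src/rate_limiter.py`: `RateLimiter` and `backoff_delay` are hypothetical names, the sliding 60-second window is an assumed interpretation of "per-minute", and the exponential schedule is an assumed backoff policy.

```python
import time
from collections import deque
from typing import Optional


class RateLimiter:
    """Enforce a minimum delay between requests and a cap per 60-second window."""

    def __init__(self, request_delay=1.0, max_requests_per_minute=30,
                 clock=time.monotonic, sleep=time.sleep):
        self.request_delay = request_delay
        self.max_per_minute = max_requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed."""
        now = self._clock()
        # Forget requests that have left the 60-second window.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()
        wait = 0.0
        if self._sent:
            wait = max(wait, self._sent[-1] + self.request_delay - now)
        if len(self._sent) >= self.max_per_minute:
            wait = max(wait, self._sent[0] + 60.0 - now)
        return wait

    def acquire(self) -> None:
        """Block until a request may be sent, then record it."""
        wait = self.wait_time()
        if wait > 0:
            self._sleep(wait)
        self._sent.append(self._clock())


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Prefer the server-provided retry hint; else back off exponentially up to a cap."""
    if retry_after is not None:
        return retry_after
    return min(cap, base * 2 ** attempt)
```

Injecting `clock` and `sleep` keeps the limiter testable without real waiting. Call `acquire()` before each request, and use `backoff_delay` after a rate-limit response.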

### Data Collection
- Real-time message monitoring
- Member list scanning
- Profile update tracking
- Server membership tracking

### Database Management
- Automatic backups
- Data deduplication
- Export capabilities
- Statistics generation
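
A rough sketch of how deduplication, backups, and CSV export could work on a JSON file follows. The storage layout (one object keyed by `user_id`), the backup file naming, and the helper names `upsert_user`, `save_db`, `backup_db`, and `export_csv` are all assumptions for illustration; `src/database.py` may work differently.

```python
import csv
import json
import os
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def upsert_user(db: dict, record: dict) -> dict:
    """Insert or merge a record keyed by user_id so each user is stored once."""
    key = str(record["user_id"])  # JSON object keys are strings
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    existing = db.get(key)
    if existing is None:
        merged = dict(record)
        merged.setdefault("created_at", now)
    else:
        merged = {**existing, **record, "created_at": existing.get("created_at", now)}
        merged["servers"] = sorted(
            set(existing.get("servers", [])) | set(record.get("servers", []))
        )
    merged["updated_at"] = now
    db[key] = merged
    return merged


def save_db(db: dict, path: str) -> None:
    """Write to a temporary file, then swap it in, so a crash cannot truncate the DB."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(db, handle, indent=2)
    os.replace(tmp, target)


def backup_db(path: str, backup_dir: str) -> Path:
    """Copy the database to a timestamped file in the backup directory."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = Path(backup_dir) / f"users-{stamp}.json"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, dest)
    return dest


def export_csv(db: dict, out_path: str) -> int:
    """Write one CSV row per user and return the number of rows written."""
    rows = list(db.values())
    columns = sorted({key for row in rows for key in row})
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for row in rows:
            flat = dict(row)
            if "servers" in flat:
                # Flatten the server list into one cell.
                flat["servers"] = ";".join(str(s) for s in flat["servers"])
            writer.writerow(flat)
    return len(rows)
```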

### Logging
- Configurable log levels
- File rotation
- Separate discord.py logging
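
A logging setup along these lines could look like the sketch below. It assumes size-based rotation with the standard `RotatingFileHandler`; `setup_logging`, the `collector` logger name, and the rotation limits are hypothetical choices. The `discord` logger name is the one the discord.py library logs under.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging(level: str = "INFO", file: str = "logs/collector.log",
                  max_bytes: int = 5 * 1024 * 1024, backups: int = 3) -> logging.Logger:
    """Send application logs to a rotating file and tune library logging separately."""
    Path(file).parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        file, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    app_logger = logging.getLogger("collector")  # hypothetical application logger
    app_logger.setLevel(level)
    app_logger.addHandler(handler)

    # Keep library output quieter than the application's own logs.
    library_logger = logging.getLogger("discord")
    library_logger.setLevel(logging.WARNING)
    library_logger.addHandler(handler)
    return app_logger
```

The `level` and `file` arguments correspond to the `[logging]` table in `config.toml`.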

## Future Enhancements

- MongoDB integration for better scalability
- Web dashboard for data visualization
- Advanced search and filtering
- Data analysis tools
- Network analysis features

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

This project is for educational purposes only. Use responsibly and in compliance with applicable laws and terms of service.

## Support

For issues or questions, please create an issue in the repository.

---

**Remember**: This tool is for educational research only. Always respect user privacy and platform terms of service.