⚠️ Warning! ai slop.
Discord Data Collector
A Python application for collecting Discord user data for research purposes, specifically designed to study information propagation patterns in Discord communities.
Important Disclaimers
- Terms of Service: This application uses self-botting, which violates Discord's Terms of Service and may result in account suspension.
- Educational Use Only: This tool is intended solely for educational and research purposes.
- Privacy Considerations: Always respect user privacy and obtain proper consent when collecting data.
- Legal Compliance: Ensure compliance with applicable data protection laws (GDPR, CCPA, etc.).
Features
- User Data Collection: Automatically collects usernames, profile pictures, bios, status, and server memberships
- Message Monitoring: Processes messages from monitored servers to identify active users
- Rate Limiting: Built-in rate limiting to avoid hitting Discord API limits
- Flexible Configuration: Easy configuration via TOML and environment files
- Data Export: Export collected data to CSV format
- Database Management: JSON-based storage with automatic backups
- CLI Tools: Command-line interface for data management and analysis
Installation
1. Clone the repository:
   git clone <repository-url>
   cd discord-data-collector
2. Install dependencies:
   pip install -r requirements.txt
3. Create configuration files:
   cp .env.example .env
   # Edit .env with your Discord token
4. Configure settings:
   - Edit config.toml to adjust collection settings
   - Add your Discord user token to .env
Configuration
Environment Variables (.env)
# Your Discord user token (REQUIRED)
DISCORD_TOKEN=your_discord_user_token_here
Configuration File (config.toml)
[database]
path = "data/users.json"
backup_interval = 3600
[collection]
profile_pictures = true
bio = true
status = true
server_membership = true
[rate_limiting]
request_delay = 1.0
max_requests_per_minute = 30
[monitoring]
target_servers = [] # Empty = monitor all servers
monitor_all_servers = true
[logging]
level = "INFO"
file = "logs/collector.log"
Usage
Running the Collector
# Start the data collector
python main.py
CLI Commands
# Show database statistics
python cli.py stats
# Search for users
python cli.py search "username"
# Export data to CSV
python cli.py export csv -o exported_data.csv
# Test Discord connection
python cli.py test
# Create manual backup
python cli.py backup
# Clean up old backups
python cli.py cleanup
Project Structure
discord-data-collector/
├── main.py # Main application entry point
├── cli.py # Command-line interface
├── config.toml # Configuration file
├── .env # Environment variables
├── requirements.txt # Python dependencies
├── src/
│ ├── __init__.py
│ ├── client.py # Discord client implementation
│ ├── config.py # Configuration management
│ ├── database.py # JSON database manager
│ ├── rate_limiter.py # Rate limiting utilities
│ └── logger.py # Logging setup
├── data/
│ ├── users.json # User database
│ └── backups/ # Database backups
└── logs/
    └── collector.log # Application logs
Data Structure
Each user entry contains:
{
"user_id": 123456789,
"username": "example_user",
"discriminator": "1234",
"display_name": "Example User",
"avatar_url": "https://cdn.discordapp.com/avatars/...",
"banner_url": "https://cdn.discordapp.com/banners/...",
"bio": "User's about me section",
"status": "online",
"activity": "Playing a game",
"servers": [111111111, 222222222],
"created_at": "2024-01-01T00:00:00",
"updated_at": "2024-01-01T12:00:00"
}
Features in Detail
Rate Limiting
- Configurable request delays
- Per-minute request limits
- Automatic backoff on rate limit hits
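The first two behaviors above could be combined roughly as follows. This is a minimal sketch, not the project's `src/rate_limiter.py`: the `RateLimiter` class, its method names, and the injectable `clock`/`sleep` parameters are hypothetical, and backoff after an actual rate-limit response is left out.

```python
import time
from collections import deque


class RateLimiter:
    """Enforce a minimum delay between requests and a cap per 60-second window."""

    def __init__(self, request_delay=1.0, max_requests_per_minute=30,
                 clock=time.monotonic, sleep=time.sleep):
        self.request_delay = request_delay
        self.max_per_minute = max_requests_per_minute
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._sent = deque()     # timestamps of requests inside the current window

    def wait_time(self) -> float:
        """Return how many seconds to wait before the next request is allowed."""
        now = self._clock()
        # Forget requests that have left the 60-second window.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()
        wait = 0.0
        if self._sent:
            # Respect the fixed delay since the most recent request.
            wait = max(wait, self._sent[-1] + self.request_delay - now)
        if len(self._sent) >= self.max_per_minute:
            # Window is full: wait until the oldest request expires.
            wait = max(wait, self._sent[0] + 60.0 - now)
        return wait

    def acquire(self) -> None:
        """Block until a request may be sent, then record it."""
        wait = self.wait_time()
        if wait > 0:
            self._sleep(wait)
        self._sent.append(self._clock())
```

Calling `limiter.acquire()` before each API request would then apply both limits, with `request_delay` and `max_requests_per_minute` taken from the `[rate_limiting]` section of config.toml.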
Data Collection
- Real-time message monitoring
- Member list scanning
- Profile updates tracking
- Server membership tracking
Database Management
- Automatic backups
- Data deduplication
- Export capabilities
- Statistics generation
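Deduplication and export might look something like the sketch below, which keys records by `user_id`, merges server lists on repeat sightings, and writes selected fields as CSV. The `upsert_user` and `export_csv` helpers are hypothetical names, and the UTC timestamp format is an assumption; the project's `src/database.py` may store and merge records differently.

```python
import csv
import io
from datetime import datetime, timezone


def upsert_user(db: dict, record: dict) -> dict:
    """Insert a user record or merge it into the existing entry for that user_id."""
    key = str(record["user_id"])  # JSON object keys are always strings
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")  # assumed format
    existing = db.get(key)
    if existing is None:
        db[key] = {
            **record,
            "servers": sorted(set(record.get("servers", []))),
            "created_at": now,
            "updated_at": now,
        }
    else:
        # Union the server lists so repeat sightings never create duplicates.
        servers = set(existing.get("servers", [])) | set(record.get("servers", []))
        existing.update(
            {k: v for k, v in record.items() if k not in ("servers", "created_at")}
        )
        existing["servers"] = sorted(servers)
        existing["updated_at"] = now
    return db[key]


def export_csv(db: dict, fields=("user_id", "username", "display_name", "status")) -> str:
    """Render the selected fields of every user as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for user in db.values():
        writer.writerow(user)  # fields missing from a record are written as empty
    return buf.getvalue()
```

The resulting `db` dict can be saved with `json.dump`, and the string returned by `export_csv` can be written to whatever path is passed to the export command.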
Logging
- Configurable log levels
- File rotation
- Separate discord.py logging
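One way to get these three properties with the standard `logging` module is sketched below. The `setup_logging` function, the `"collector"` logger name, and the rotation sizes are assumptions for illustration; the `"discord"` logger name is the one discord.py itself logs under.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging(log_file="logs/collector.log", level="INFO",
                  max_bytes=1_000_000, backup_count=3) -> logging.Logger:
    """Configure a rotating file log shared by the app and discord.py loggers."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)

    # Rotate once the file reaches max_bytes, keeping backup_count old files.
    handler = RotatingFileHandler(
        log_file, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )

    # Application logger: level comes from the [logging] section of config.toml.
    app_logger = logging.getLogger("collector")  # hypothetical logger name
    app_logger.setLevel(level)
    app_logger.addHandler(handler)

    # Library logger: kept at WARNING so library chatter does not flood the file.
    discord_logger = logging.getLogger("discord")
    discord_logger.setLevel(logging.WARNING)
    discord_logger.addHandler(handler)

    return app_logger
```

Giving the library its own level is a design choice: the application log can stay at INFO or DEBUG while connection-level noise from the library is filtered out.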
Future Enhancements
- MongoDB integration for better scalability
- Web dashboard for data visualization
- Advanced search and filtering
- Data analysis tools
- Network analysis features
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
This project is for educational purposes only. Use responsibly and in compliance with applicable laws and terms of service.
Support
For issues or questions, please create an issue in the repository.
Remember: This tool is for educational research only. Always respect user privacy and platform terms of service.