# ⚠️ Warning! ai slop.

# Discord Data Collector

A Python application for collecting Discord user data for research purposes, specifically designed to study information propagation patterns in Discord communities.

## Important Disclaimers

- **Terms of Service**: This application uses self-botting, which violates Discord's Terms of Service and may result in account suspension.
- **Educational Use Only**: This tool is intended solely for educational and research purposes.
- **Privacy Considerations**: Always respect user privacy and obtain proper consent when collecting data.
- **Legal Compliance**: Ensure compliance with applicable data protection laws (GDPR, CCPA, etc.).

## Features

- **User Data Collection**: Automatically collects usernames, profile pictures, bios, status, and server memberships
- **Message Monitoring**: Processes messages from monitored servers to identify active users
- **Rate Limiting**: Built-in rate limiting to avoid hitting Discord API limits
- **Flexible Configuration**: Easy configuration via TOML and environment files
- **Data Export**: Export collected data to CSV format
- **Database Management**: JSON-based storage with automatic backups
- **CLI Tools**: Command-line interface for data management and analysis

## Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd discord-data-collector
   ```

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Create configuration files**:
   ```bash
   cp .env.example .env
   # Edit .env with your Discord token
   ```

4. **Configure settings**:
   - Edit `config.toml` to adjust collection settings
   - Add your Discord user token to `.env`

## Configuration

### Environment Variables (.env)

```env
# Your Discord user token (REQUIRED)
DISCORD_TOKEN=your_discord_user_token_here
```

### Configuration File (config.toml)

```toml
[database]
path = "data/users.json"
backup_interval = 3600

[collection]
profile_pictures = true
bio = true
status = true
server_membership = true

[rate_limiting]
request_delay = 1.0
max_requests_per_minute = 30

[monitoring]
target_servers = []  # Empty = monitor all servers
monitor_all_servers = true

[logging]
level = "INFO"
file = "logs/collector.log"
```

## Usage

### Running the Collector

```bash
# Start the data collector
python main.py
```

### CLI Commands

```bash
# Show database statistics
python cli.py stats

# Search for users
python cli.py search "username"

# Export data to CSV
python cli.py export csv -o exported_data.csv

# Test Discord connection
python cli.py test

# Create manual backup
python cli.py backup

# Clean up old backups
python cli.py cleanup
```

## Project Structure

```
discord-data-collector/
├── main.py              # Main application entry point
├── cli.py               # Command-line interface
├── config.toml          # Configuration file
├── .env                 # Environment variables
├── requirements.txt     # Python dependencies
├── src/
│   ├── __init__.py
│   ├── client.py        # Discord client implementation
│   ├── config.py        # Configuration management
│   ├── database.py      # JSON database manager
│   ├── rate_limiter.py  # Rate limiting utilities
│   └── logger.py        # Logging setup
├── data/
│   ├── users.json       # User database
│   └── backups/         # Database backups
└── logs/
    └── collector.log    # Application logs
```

## Data Structure

Each user entry contains:

```json
{
  "user_id": 123456789,
  "username": "example_user",
  "discriminator": "1234",
  "display_name": "Example User",
  "avatar_url": "https://cdn.discordapp.com/avatars/...",
  "banner_url": "https://cdn.discordapp.com/banners/...",
  "bio": "User's about me section",
  "status": "online",
  "activity": "Playing a game",
  "servers": [111111111, 222222222],
  "created_at": "2024-01-01T00:00:00",
  "updated_at": "2024-01-01T12:00:00"
}
```

## Features in Detail

### Rate Limiting

- Configurable request delays
- Per-minute request limits
- Automatic backoff on rate limit hits

### Data Collection

- Real-time message monitoring
- Member list scanning
- Profile updates tracking
- Server membership tracking

### Database Management

- Automatic backups
- Data deduplication
- Export capabilities
- Statistics generation

### Logging

- Configurable log levels
- File rotation
- Separate Discord.py logging

## Future Enhancements

- MongoDB integration for better scalability
- Web dashboard for data visualization
- Advanced search and filtering
- Data analysis tools
- Network analysis features

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

This project is for educational purposes only. Use responsibly and in compliance with applicable laws and terms of service.

## Support

For issues or questions, please create an issue in the repository.

---

**Remember**: This tool is for educational research only. Always respect user privacy and platform terms of service.