⚠️ Warning! ai slop.

Discord Data Collector

A Python application for collecting Discord user data for research purposes, specifically designed to study information propagation patterns in Discord communities.

Important Disclaimers

  • Terms of Service: This application uses self-botting, which violates Discord's Terms of Service and may result in account suspension.
  • Educational Use Only: This tool is intended solely for educational and research purposes.
  • Privacy Considerations: Always respect user privacy and obtain proper consent when collecting data.
  • Legal Compliance: Ensure compliance with applicable data protection laws (GDPR, CCPA, etc.).

Features

  • User Data Collection: Automatically collects usernames, profile pictures, bios, status, and server memberships
  • Message Monitoring: Processes messages from monitored servers to identify active users
  • Rate Limiting: Configurable request delays and per-minute caps to stay within Discord API limits
  • Flexible Configuration: Easy configuration via TOML and environment files
  • Data Export: Export collected data to CSV format
  • Database Management: JSON-based storage with automatic backups
  • CLI Tools: Command-line interface for data management and analysis

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd discord-data-collector
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Create configuration files:

    cp .env.example .env
    # Edit .env with your Discord token
    
  4. Configure settings:

    • Edit config.toml to adjust collection, rate limiting, monitoring, and logging settings

Configuration

Environment Variables (.env)

# Your Discord user token (REQUIRED)
DISCORD_TOKEN=your_discord_user_token_here

Configuration File (config.toml)

[database]
path = "data/users.json"
backup_interval = 3600

[collection]
profile_pictures = true
bio = true
status = true
server_membership = true

[rate_limiting]
request_delay = 1.0
max_requests_per_minute = 30

[monitoring]
target_servers = []  # Empty = monitor all servers
monitor_all_servers = true

[logging]
level = "INFO"
file = "logs/collector.log"

Usage

Running the Collector

# Start the data collector
python main.py

CLI Commands

# Show database statistics
python cli.py stats

# Search for users
python cli.py search "username"

# Export data to CSV
python cli.py export csv -o exported_data.csv

# Test Discord connection
python cli.py test

# Create manual backup
python cli.py backup

# Clean up old backups
python cli.py cleanup

Project Structure

discord-data-collector/
├── main.py                 # Main application entry point
├── cli.py                  # Command-line interface
├── config.toml             # Configuration file
├── .env                    # Environment variables
├── requirements.txt        # Python dependencies
├── src/
│   ├── __init__.py
│   ├── client.py          # Discord client implementation
│   ├── config.py          # Configuration management
│   ├── database.py        # JSON database manager
│   ├── rate_limiter.py    # Rate limiting utilities
│   └── logger.py          # Logging setup
├── data/
│   ├── users.json         # User database
│   └── backups/           # Database backups
└── logs/
    └── collector.log      # Application logs

Data Structure

Each user entry contains:

{
  "user_id": 123456789,
  "username": "example_user",
  "discriminator": "1234",
  "display_name": "Example User",
  "avatar_url": "https://cdn.discordapp.com/avatars/...",
  "banner_url": "https://cdn.discordapp.com/banners/...",
  "bio": "User's about me section",
  "status": "online",
  "activity": "Playing a game",
  "servers": [111111111, 222222222],
  "created_at": "2024-01-01T00:00:00",
  "updated_at": "2024-01-01T12:00:00"
}

Features in Detail

Rate Limiting

  • Configurable request delays
  • Per-minute request limits
  • Automatic backoff on rate limit hits
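
The three behaviours listed above can be sketched in a few lines. This is a minimal illustration, not the code in `src/rate_limiter.py`: the `RateLimiter` class, its parameter names (borrowed from the `[rate_limiting]` config keys), and the `backoff_delay` helper are all assumptions made for the example.

```python
import time
from collections import deque


class RateLimiter:
    """Enforces a minimum delay between requests and a per-minute cap."""

    def __init__(self, request_delay=1.0, max_requests_per_minute=30,
                 clock=time.monotonic, sleep=time.sleep):
        self.request_delay = request_delay
        self.max_per_minute = max_requests_per_minute
        self.clock = clock    # injectable so the limiter can be tested
        self.sleep = sleep
        self.sent = deque()   # timestamps of recent requests

    def wait_time(self) -> float:
        """Return how many seconds to wait before the next request."""
        now = self.clock()
        # Forget requests that fell out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        wait = 0.0
        if self.sent:
            # Minimum spacing since the most recent request.
            wait = max(wait, self.sent[-1] + self.request_delay - now)
        if len(self.sent) >= self.max_per_minute:
            # Window is full: wait until the oldest request expires.
            wait = max(wait, self.sent[0] + 60.0 - now)
        return wait

    def acquire(self) -> None:
        """Block until a request is allowed, then record it."""
        wait = self.wait_time()
        if wait > 0:
            self.sleep(wait)
        self.sent.append(self.clock())


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * 2 ** attempt)
```

A caller would invoke `acquire()` before each request and, after a rate-limit response, sleep for `backoff_delay(attempt)` before retrying. Passing `clock` and `sleep` as parameters is a design choice that lets the timing logic be exercised without real waiting.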

Data Collection

  • Real-time message monitoring
  • Member list scanning
  • Profile update tracking
  • Server membership tracking

Database Management

  • Automatic backups
  • Data deduplication
  • Export capabilities
  • Statistics generation
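
One way deduplication and safe saving might look for a JSON file keyed by user ID is sketched below. The `upsert_user` and `save_atomic` functions are hypothetical and only assume the record layout shown under Data Structure; the actual `src/database.py` may store and merge data differently.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def upsert_user(db: dict, record: dict) -> dict:
    """Insert a record or merge it into the existing entry for that user_id."""
    key = str(record["user_id"])  # JSON object keys are always strings
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    existing = db.get(key)
    if existing is None:
        entry = dict(record)
        entry["servers"] = sorted(set(record.get("servers", [])))
        entry["created_at"] = now
    else:
        entry = {**existing, **record}
        # Union the server lists instead of overwriting them.
        merged = set(existing.get("servers", [])) | set(record.get("servers", []))
        entry["servers"] = sorted(merged)
        entry["created_at"] = existing.get("created_at", now)
    entry["updated_at"] = now
    db[key] = entry
    return entry


def save_atomic(db: dict, path: str) -> None:
    """Write to a temporary file, then rename, so a crash cannot truncate the file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(db, handle, indent=2)
    os.replace(tmp_path, path)  # atomic when both paths are on one filesystem
```

Keying the dictionary by `user_id` means a repeated sighting of the same user updates one entry rather than appending a duplicate.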

Logging

  • Configurable log levels
  • File rotation
  • Separate logger for discord.py library output
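
The logging behaviour described above could be set up roughly as follows with the standard `logging` module. The `setup_logging` function, the `"collector"` logger name, and the rotation sizes are assumptions for this sketch; discord.py itself logs under the `"discord"` logger name, which is what allows its level to be set separately.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def setup_logging(level="INFO", file="logs/collector.log",
                  max_bytes=1_000_000, backup_count=3):
    """Configure a rotating log file shared by the app and library loggers."""
    os.makedirs(os.path.dirname(file) or ".", exist_ok=True)
    handler = RotatingFileHandler(
        file, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    # Application logger (the name "collector" is hypothetical).
    app_logger = logging.getLogger("collector")
    app_logger.setLevel(level)
    app_logger.addHandler(handler)

    # Library logger: kept at WARNING so it does not flood the file.
    library_logger = logging.getLogger("discord")
    library_logger.setLevel(logging.WARNING)
    library_logger.addHandler(handler)
    return app_logger
```

With these values the file rotates at about 1 MB and keeps three older copies (`collector.log.1` through `collector.log.3`).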

Future Enhancements

  • MongoDB integration for better scalability
  • Web dashboard for data visualization
  • Advanced search and filtering
  • Data analysis tools
  • Network analysis features

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is for educational purposes only. Use responsibly and in compliance with applicable laws and terms of service.

Support

For issues or questions, please create an issue in the repository.


Remember: This tool is for educational research only. Always respect user privacy and platform terms of service.