# Troubleshooting

Solutions for common issues when setting up and running GPUFlow providers.
## Container Issues

### Container won’t start

Symptoms: Container exits immediately or fails to start
Solutions:
Check container logs first:
```sh
# Docker
docker logs gpuflow-provider

# Podman
podman logs gpuflow-provider
```
Common causes and fixes:
Missing API key:
```sh
# Set the API key environment variable
docker run -d \
  --name gpuflow-provider \
  -e GPUFLOW_API_KEY="your-actual-key-here" \
  [other options...]
```
Permission errors:
```sh
# Fix data directory permissions
sudo chown -R $USER:$USER /opt/gpuflow
chmod 755 /opt/gpuflow
```
Port conflicts:
```sh
# Check what's using port 8080
sudo lsof -i :8080
sudo systemctl stop [conflicting-service]
```
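If the conflicting service can’t be stopped, an alternative is to publish the provider on a different host port. This is a sketch that assumes the provider is published with a `-p` mapping and listens on 8080 inside the container; adjust it to match your actual run command (it doesn’t apply if you deploy with `--network host`):

```sh
# Publish host port 8081 instead of the contested 8080;
# the container still listens on 8080 internally
docker run -d \
  --name gpuflow-provider \
  -p 8081:8080 \
  [other options...]
```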
### GPU not detected in container

Symptoms: Container starts but can’t see GPU
For NVIDIA GPUs:
Check GPU is visible on host:
```sh
nvidia-smi
```
Test GPU in container:
```sh
# Docker
docker run --rm --gpus all nvidia/cuda:12.0-base-ubuntu22.04 nvidia-smi

# Podman
podman run --rm --device nvidia.com/gpu=all nvidia/cuda:12.0-base-ubuntu22.04 nvidia-smi
```
If GPU test fails:
Regenerate CDI specification (Podman):
```sh
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml --force
podman restart gpuflow-provider
```
Restart Docker daemon (Docker):
```sh
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker restart gpuflow-provider
```
For AMD GPUs:
Verify ROCm installation:
```sh
rocm-smi
```
Check device permissions:
```sh
ls -la /dev/dri/

# Your user should be in the 'render' and 'video' groups
groups $USER
```
Add missing groups:
```sh
sudo usermod -aG render,video $USER
# Log out and back in
```
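Group changes only apply to new login sessions. To confirm they took effect after logging back in, a quick check (a sketch; `render` and `video` are the groups mentioned above):

```sh
# Print ok/missing for each group that GPU device access requires
user="${USER:-$(id -un)}"
for g in render video; do
  if id -nG "$user" | tr ' ' '\n' | grep -qx "$g"; then
    echo "$g: ok"
  else
    echo "$g: missing (re-run usermod, then log out and back in)"
  fi
done
```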
## Network Connectivity Issues

### Provider can’t reach GPUFlow API

Symptoms: Logs show connection errors or timeouts
Check network connectivity:
```sh
# Test basic connectivity
curl -I https://api.gpuflow.app/health

# Check DNS resolution
nslookup api.gpuflow.app

# Test from within the container
docker exec gpuflow-provider curl -I https://api.gpuflow.app/health
```
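Failures right after boot are often transient (DNS or the network stack still coming up). A small retry wrapper helps distinguish a flaky link from a hard outage; this is an illustrative helper, not part of the GPUFlow tooling:

```sh
# retry: run a command up to N times, pausing 2s between attempts
# usage: retry 5 curl -fsS https://api.gpuflow.app/health
retry() {
  max="$1"; shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $max attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 2
  done
  echo "succeeded on attempt $attempt"
}
```

If `retry 5 curl -fsS https://api.gpuflow.app/health` never succeeds, the cause is more likely firewall or DNS configuration than a transient blip.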
Firewall configuration:
```sh
# Allow outbound HTTPS
sudo ufw allow out 443
sudo ufw allow out 80

# For enterprise firewalls, whitelist:
#   api.gpuflow.app (port 443)
#   dashboard.gpuflow.app (port 443)
```
### Hardware not appearing in dashboard

Symptoms: Provider runs but the device doesn’t show in “Available to Claim”
Verify provider registration:
```sh
# Check logs for registration success
docker logs gpuflow-provider | grep -i registration
podman logs gpuflow-provider | grep -i registration
```
Check API key:
```sh
# Verify the API key is set correctly
docker inspect gpuflow-provider | grep GPUFLOW_API_KEY
podman inspect gpuflow-provider | grep GPUFLOW_API_KEY
```
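The key can also be validated from inside the container. Since the key format isn’t documented here, this sketch only checks that the variable is set and non-empty:

```sh
# check_key: fail loudly if GPUFLOW_API_KEY is missing or empty
# (run inside the container, e.g. via `docker exec`)
check_key() {
  if [ -z "${GPUFLOW_API_KEY:-}" ]; then
    echo "GPUFLOW_API_KEY is not set" >&2
    return 1
  fi
  echo "GPUFLOW_API_KEY is set (${#GPUFLOW_API_KEY} characters)"
}
```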
Force re-registration:
```sh
# Stop and remove the container
docker stop gpuflow-provider
docker rm gpuflow-provider

# Clear the data directory
rm -rf /opt/gpuflow/*

# Redeploy with fresh registration
# [Re-run your docker/podman run command]
```
## Performance Issues

### High CPU usage

Symptoms: System becomes slow, with high CPU usage from the container
Check resource limits:
```sh
# Monitor container resources
docker stats gpuflow-provider
podman stats gpuflow-provider
```
Set resource limits:
```sh
# Add limits to the container run command
docker run -d \
  --cpus="2.0" \
  --memory="4g" \
  [other options...]
```
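If the container is already running, limits can also be adjusted in place instead of recreating it. `docker update` (and recent `podman update`) support these flags, but verify against your installed version; treat this as a sketch:

```sh
# Tighten CPU and memory limits on the running container
docker update --cpus "2.0" --memory "4g" --memory-swap "4g" gpuflow-provider
```

Setting `--memory-swap` alongside `--memory` avoids the daemon rejecting a memory limit larger than the container’s existing swap ceiling.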
### GPU overheating

Symptoms: GPU temperatures over 85°C, thermal throttling
Monitor temperatures:
```sh
# NVIDIA
watch -n 1 nvidia-smi

# AMD
watch -n 1 rocm-smi
```
Solutions:
- Check case airflow and clean dust from fans
- Reduce GPU power limit if necessary
- Pause listings during high ambient temperatures
- Consider undervolting GPU for better efficiency
### Slow network performance

Symptoms: Renters report slow file transfers or high latency
Test network speed:
```sh
# Install speedtest
sudo apt install speedtest-cli   # Ubuntu/Debian
sudo dnf install speedtest-cli   # Fedora

# Run a speed test
speedtest-cli
```
Optimize network:
- Use wired ethernet instead of WiFi
- Check for ISP throttling during peak hours
- Consider business internet for better uptime
- Monitor bandwidth usage in dashboard
## Payment and Account Issues

### Payments not being processed

Symptoms: Earnings not arriving, payment status stuck
Check payment settings:
- Verify wallet address is correct in dashboard
- Ensure selected network has sufficient gas for transactions
- Check minimum payout threshold ($25 equivalent)
Network-specific troubleshooting:
- Polygon: Usually processes within 5 minutes
- Ethereum: May take 15-30 minutes during network congestion
- Arbitrum/Base: Typically fast, but check for network maintenance
### Account verification issues

Common KYC problems:
Document upload fails:
- Use high-resolution photos
- Ensure all document corners are visible
- Avoid glare or shadows
- Supported formats: JPG, PNG, PDF
Address verification rejected:
- Document must be less than 3 months old
- Name must match account registration
- Utility bills, bank statements, or government mail accepted
## Hardware-Specific Issues

### NVIDIA driver problems

Driver version conflicts:
```sh
# Check the current driver version
nvidia-smi

# Remove old drivers (if needed); quote the glob so the
# shell doesn't expand it against local files
sudo apt purge 'nvidia-*'
sudo apt autoremove

# Install the latest stable driver
sudo apt install nvidia-driver-535
sudo reboot
```
CUDA version mismatches:
```sh
# Check the CUDA version
nvcc --version

# Update the CUDA toolkit if needed
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install cuda
```
### AMD ROCm issues

ROCm not detecting GPU:
```sh
# Check GPU compatibility
/opt/rocm/bin/rocminfo

# Reinstall the ROCm SMI library if needed
sudo apt remove rocm-smi-lib
sudo apt install rocm-smi-lib
```
Permission errors:
```sh
# Fix device permissions
sudo chmod 666 /dev/kfd
sudo chmod 666 /dev/dri/*
```
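Note that `chmod` on device nodes does not survive a reboot. A udev rule makes the permission persistent; the rule below follows the common ROCm convention of giving the `render` group access to `/dev/kfd` (an assumption, not taken from the GPUFlow docs):

```sh
# Persist /dev/kfd group access across reboots via a udev rule
echo 'KERNEL=="kfd", GROUP="render", MODE="0660"' | \
  sudo tee /etc/udev/rules.d/70-amdgpu-kfd.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
```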
## Getting Additional Help

### Collect diagnostic information

Before contacting support, gather this information:
```sh
# System information
uname -a
lspci | grep -i gpu
free -h
df -h

# Container information
docker --version   # or podman --version
docker ps -a
docker logs gpuflow-provider --tail=50

# GPU information
nvidia-smi   # or rocm-smi for AMD
```
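The commands above can be bundled into one script that writes everything to a single file to attach to your ticket. A sketch that skips tools which aren’t installed instead of failing:

```sh
# collect-diagnostics.sh - gather system, container, and GPU info into one file
out="gpuflow-diagnostics.txt"
{
  echo "== System =="
  uname -a
  if command -v free >/dev/null; then free -h; fi
  df -h
  if command -v lspci >/dev/null; then lspci | grep -i gpu || true; fi

  echo "== Container =="
  if command -v docker >/dev/null; then
    docker --version
    docker ps -a || true
    docker logs gpuflow-provider --tail=50 || true
  elif command -v podman >/dev/null; then
    podman --version
    podman ps -a || true
    podman logs gpuflow-provider --tail=50 || true
  fi

  echo "== GPU =="
  if command -v nvidia-smi >/dev/null; then nvidia-smi || true; fi
  if command -v rocm-smi >/dev/null; then rocm-smi || true; fi
} > "$out" 2>&1
echo "Diagnostics written to $out"
```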
### Contact support channels

Community support:
- GPUFlow Discord - Real-time community help
- Community forum - Searchable Q&A
Official support:
- Email: [email protected]
- Response time: 24-48 hours
- Include diagnostic information above
Emergency issues:
- Provider completely offline for 4+ hours
- Suspected security compromise
- Payment processing errors