
Troubleshooting

Solutions for common issues when setting up and running GPUFlow providers.

Symptoms: Container exits immediately or fails to start

Solutions:

Check container logs first:

# Docker
docker logs gpuflow-provider
# Podman
podman logs gpuflow-provider

Common causes and fixes:

Missing API key:

# Set the API key environment variable
docker run -d \
--name gpuflow-provider \
-e GPUFLOW_API_KEY="your-actual-key-here" \
[other options...]

Permission errors:

# Fix data directory permissions
sudo chown -R $USER:$USER /opt/gpuflow
chmod 755 /opt/gpuflow

Port conflicts:

# Check what's using port 8080
sudo lsof -i :8080
sudo systemctl stop [conflicting-service]
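
If the conflicting service cannot be stopped, an alternative is to map a different host port to the container. This sketch assumes the provider listens on port 8080 inside the container; keep your other options unchanged:

```shell
# Map host port 8081 to the container's internal port 8080
docker run -d \
--name gpuflow-provider \
-p 8081:8080 \
[other options...]
```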

Symptoms: Container starts but can’t see GPU

For NVIDIA GPUs:

Check GPU is visible on host:

nvidia-smi

Test GPU in container:

# Docker
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# Podman
podman run --rm --device nvidia.com/gpu=all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If GPU test fails:

Regenerate CDI specification (Podman):

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml --force
podman restart gpuflow-provider

Restart Docker daemon (Docker):

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker restart gpuflow-provider

For AMD GPUs:

Verify ROCm installation:

rocm-smi

Check device permissions:

ls -la /dev/dri/
# Your user should be in 'render' and 'video' groups
groups $USER

Add missing groups:

sudo usermod -aG render,video $USER
# Log out and back in

Symptoms: Logs show connection errors or timeouts

Check network connectivity:

# Test basic connectivity
curl -I https://api.gpuflow.app/health
# Check DNS resolution
nslookup api.gpuflow.app
# Test from within container
docker exec gpuflow-provider curl -I https://api.gpuflow.app/health

Firewall configuration:

# Allow outbound HTTPS
sudo ufw allow out 443
sudo ufw allow out 80
# For enterprise firewalls, whitelist:
# api.gpuflow.app (port 443)
# dashboard.gpuflow.app (port 443)
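
If outbound traffic must pass through a corporate proxy, most container runtimes forward the standard proxy environment variables to the containerized process. This is a sketch only, assuming the provider honors `HTTPS_PROXY`; the proxy address shown is a placeholder to replace with your own:

```shell
# Pass proxy settings into the container (proxy address is a placeholder)
docker run -d \
--name gpuflow-provider \
-e HTTPS_PROXY="http://proxy.example.com:3128" \
-e NO_PROXY="localhost,127.0.0.1" \
[other options...]
```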

Symptoms: Provider runs but device doesn’t show in “Available to Claim”

Verify provider registration:

# Check logs for registration success
docker logs gpuflow-provider | grep -i registration
podman logs gpuflow-provider | grep -i registration

Check API key:

# Verify API key is set correctly
docker inspect gpuflow-provider | grep GPUFLOW_API_KEY
podman inspect gpuflow-provider | grep GPUFLOW_API_KEY

Force re-registration:

# Stop and remove container
docker stop gpuflow-provider
docker rm gpuflow-provider
# Clear data directory
rm -rf /opt/gpuflow/*
# Redeploy with fresh registration
# [Re-run your docker/podman run command]
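
A full redeploy might look like the following; the volume mount path, internal port, and restart policy are assumptions based on defaults used elsewhere in this guide, so substitute your actual options and image name:

```shell
# Redeploy with a fresh registration (paths/ports are assumptions)
docker run -d \
--name gpuflow-provider \
--restart unless-stopped \
-e GPUFLOW_API_KEY="your-actual-key-here" \
-v /opt/gpuflow:/data \
-p 8080:8080 \
[image and other options...]
```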

Symptoms: System becomes slow, high CPU usage from container

Check resource limits:

# Monitor container resources
docker stats gpuflow-provider
podman stats gpuflow-provider

Set resource limits:

# Add limits to container run command
docker run -d \
--cpus="2.0" \
--memory="4g" \
[other options...]
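
If the container is already running, limits can also be applied in place with `docker update` (recent Podman versions support the same subcommand) instead of recreating it:

```shell
# Apply CPU and memory limits to the running container
docker update --cpus="2.0" --memory="4g" --memory-swap="4g" gpuflow-provider
```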

Symptoms: GPU temperatures over 85°C, thermal throttling

Monitor temperatures:

# NVIDIA
watch -n 1 nvidia-smi
# AMD
watch -n 1 rocm-smi

Solutions:

  • Check case airflow and clean dust from fans
  • Reduce GPU power limit if necessary
  • Pause listings during high ambient temperatures
  • Consider undervolting GPU for better efficiency
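
As an example of lowering the power limit on an NVIDIA card, the 250 W value below is purely illustrative; check your card's supported range first and note that the setting resets on reboot:

```shell
# Show the supported power range for this GPU
nvidia-smi -q -d POWER | grep -i 'power limit'
# Set a lower power limit in watts (requires root)
sudo nvidia-smi -pl 250
```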

Symptoms: Renters report slow file transfers or high latency

Test network speed:

# Install speedtest
sudo apt install speedtest-cli # Ubuntu/Debian
sudo dnf install speedtest-cli # Fedora
# Run speed test
speedtest-cli

Optimize network:

  • Use wired ethernet instead of WiFi
  • Check for ISP throttling during peak hours
  • Consider business internet for better uptime
  • Monitor bandwidth usage in dashboard
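
To spot ISP throttling patterns, one option is to log a timestamped speed test periodically and compare peak versus off-peak results. A minimal cron-based sketch, assuming speedtest-cli is installed (add the line via `crontab -e`):

```shell
# Run a speed test at the top of every hour and append results to a log
0 * * * * /usr/bin/speedtest-cli --simple >> "$HOME/speedtest.log" 2>&1
```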

Symptoms: Earnings not arriving, payment status stuck

Check payment settings:

  1. Verify wallet address is correct in dashboard
  2. Ensure selected network has sufficient gas for transactions
  3. Check minimum payout threshold ($25 equivalent)

Network-specific troubleshooting:

  • Polygon: usually processes within 5 minutes
  • Ethereum: may take 15-30 minutes during network congestion
  • Arbitrum/Base: typically fast, but check for network maintenance

Common KYC problems:

Document upload fails:

  • Use high-resolution photos
  • Ensure all document corners are visible
  • Avoid glare or shadows
  • Supported formats: JPG, PNG, PDF

Address verification rejected:

  • Document must be less than 3 months old
  • Name must match account registration
  • Utility bills, bank statements, or government mail accepted

Driver version conflicts:

# Check current driver version
nvidia-smi
# Remove old drivers (if needed)
sudo apt purge 'nvidia-*'
sudo apt autoremove
# Install latest stable driver
sudo apt install nvidia-driver-535
sudo reboot

CUDA version mismatches:

# Check CUDA version
nvcc --version
# Update CUDA toolkit if needed
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install cuda

ROCm not detecting GPU:

# Check GPU compatibility
/opt/rocm/bin/rocminfo
# Reinstall the ROCm SMI library if needed
sudo apt remove rocm-smi-lib
sudo apt install rocm-smi-lib

Permission errors:

# Fix device permissions
sudo chmod 666 /dev/kfd
sudo chmod 666 /dev/dri/*
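
The chmod above does not survive a reboot. A persistent alternative is a udev rule; this sketch assumes the standard `render` group used by ROCm setups:

```shell
# Persist device permissions with a udev rule
sudo tee /etc/udev/rules.d/70-amdgpu.rules > /dev/null <<'EOF'
KERNEL=="kfd", GROUP="render", MODE="0660"
SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="render", MODE="0660"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger
```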

Before contacting support, gather this information:

# System information
uname -a
lspci | grep -i gpu
free -h
df -h
# Container information
docker --version # or podman --version
docker ps -a
docker logs gpuflow-provider --tail=50
# GPU information
nvidia-smi # or rocm-smi for AMD


Official support:

  • Email: [email protected]
  • Response time: 24-48 hours
  • Include diagnostic information above

Emergency issues:

  • Provider completely offline for 4+ hours
  • Suspected security compromise
  • Payment processing errors