deep-learning pytorch gpu rllib

RLlib PPO training doesn't use GPU


I use the PPO algorithm in RLlib to train my deep reinforcement learning model. Training runs on an AWS p2.xlarge instance, which has 4 vCPUs and 1 GPU (a Tesla K80), and I found that PPO doesn't use the GPU.

The training log shows:

Trial status: 1 RUNNING
Current time: 2023-10-07 05:08:00. Total running time: 13s
Logical resource usage: 3.0/4 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭────────────────────────────────────────╮
│ Trial name                    status   │
├────────────────────────────────────────┤
│ PPO_CartPole-v1_74ca6_00000   RUNNING  │
╰────────────────────────────────────────╯
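
So Ray itself does detect the GPU on the node (0/1 GPUs means none of the 1 available GPU is allocated). A quick way to confirm what Ray sees, using the standard Ray API:

import ray

ray.init()
# Ray reports the node's logical resources; on this p2.xlarge it should
# include roughly {'CPU': 4.0, 'GPU': 1.0, 'accelerator_type:K80': 1.0, ...}.
print(ray.cluster_resources())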

This is my code:

from ray import tune
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.algorithms.a2c import A2C  
from ray.rllib.algorithms.appo import APPO


def train() -> None:
    # config training parameters
    train_config = {
        "env": "CartPole-v1", # MyCustomEnv_v0,
        "framework": "torch",
        "num_workers": 2,
        "num_gpus": 1,  # Add this line to specify using one GPU
        "num_envs_per_worker": 1,
        "model": {
            "fcnet_hiddens": [512, 512, 256],
            "fcnet_activation": "relu",
        },
        "lr": 3e-4,  
        "optimization": {
            "optimizer": "adam",
            "adam_epsilon": 1e-8,
            "adam_beta1": 0.9,
            "adam_beta2": 0.999,
        },  
        "gamma": 0.99,
        "num_sgd_iter": 10,  
        "sgd_minibatch_size": 500, 
        "rollout_fragment_length": 500,
        "train_batch_size": 4000,
        "prioritized_replay": True,
        "prioritized_replay_alpha": 0.6,
        "prioritized_replay_beta": 0.4, 
        "buffer_size": 500000,
        "stop": {"episodes_total": 5000000},
        "exploration_config": {},
    }
    # Placeholder values for names the snippet references but does not define.
    training_iteration = 1000  # e.g. stop after this many training iterations
    model_restore_dir = None   # set to a checkpoint path to resume, or None to start fresh
    stop_criteria = {"training_iteration": training_iteration}

    # start to train
    try:
        results = tune.run(
            PPO,
            config=train_config,
            stop=stop_criteria,
            checkpoint_at_end=True,
            checkpoint_freq=50, 
            restore=model_restore_dir,
            verbose=1,
        )
    except BaseException as e:
        print(f"training error: {e}")
    
if __name__ == "__main__":
    train()

First I trained on my custom environment "MyCustomEnv_v0", and PPO didn't use the GPU. Then I tried "CartPole-v1", and it still didn't use the GPU. After I changed the algorithm from PPO to APPO, it started to use the GPU, and A2C works too (I changed nothing else). Like this:

Current time: 2023-10-07 05:07:01. Total running time: 0s
Logical resource usage: 3.0/4 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭─────────────────────────────────────────╮
│ Trial name                     status   │
├─────────────────────────────────────────┤
│ APPO_CartPole-v1_59115_00000   PENDING  │
╰─────────────────────────────────────────╯

I checked the official RLlib documentation and confirmed that PPO supports GPU training.

Why does this happen? How can I make RLlib use the GPU for PPO training?
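
For reference, the GPU is visible to PyTorch on the instance (APPO uses it with the same "framework": "torch" setting), so the CUDA setup itself seems fine. A minimal check, using plain PyTorch calls:

import torch

# Generic PyTorch sanity check, not RLlib-specific: confirms CUDA and the
# Tesla K80 are visible from the Python environment the training runs in.
print(torch.cuda.is_available())      # expected: True
print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.get_device_name(0))  # expected: Tesla K80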


Solution

  • This has been solved. Because I have only one GPU on the machine and use 2 workers, I added the config parameter "num_gpus_per_worker": 0.5, and now the GPU is used during training.
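
For anyone hitting the same issue, below is a minimal sketch of the GPU-related part of the config. The exact fractional split is an assumption for a single-GPU machine with 2 rollout workers; the key point is that the one physical GPU has to be shared via fractional logical GPUs, and the total request (num_gpus + num_workers * num_gpus_per_worker) should not exceed the 1 GPU Ray reports.

# Sketch only: share one physical GPU between the PPO learner and 2 rollout
# workers via fractional logical GPUs. The 0.5 / 0.25 split is illustrative;
# here the total request is 0.5 + 2 * 0.25 = 1.0 GPU.
train_config = {
    "env": "CartPole-v1",
    "framework": "torch",
    "num_workers": 2,
    "num_gpus": 0.5,              # fraction of the GPU for the learner/driver
    "num_gpus_per_worker": 0.25,  # fraction of the GPU for each rollout worker
}

With a split like this, the "Logical resource usage" line should report 1.0/1 GPUs instead of 0/1 GPUs.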