deep-learning pytorch gpu rllib

RLlib PPO training doesn't use GPU


I use the PPO algorithm in RLlib to train my deep reinforcement learning model. Training runs on an AWS p2.xlarge instance, which has 4 vCPUs and 1 GPU (a Tesla K80), and I found that PPO doesn't use the GPU.

The training log shows:

Trial status: 1 RUNNING
Current time: 2023-10-07 05:08:00. Total running time: 13s
Logical resource usage: 3.0/4 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭────────────────────────────────────────╮
│ Trial name                    status   │
├────────────────────────────────────────┤
│ PPO_CartPole-v1_74ca6_00000   RUNNING  │
╰────────────────────────────────────────╯
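
So Ray itself does detect the GPU on the node (0/1 GPUs means none of the 1 available GPU is allocated). A quick way to confirm what Ray sees, using the standard Ray API:

import ray

ray.init()
# Ray reports the node's logical resources; on this p2.xlarge it should
# include roughly {'CPU': 4.0, 'GPU': 1.0, 'accelerator_type:K80': 1.0, ...}.
print(ray.cluster_resources())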

This is my code:

from ray import tune
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.algorithms.a2c import A2C  
from ray.rllib.algorithms.appo import APPO


def train() -> None:
    # config training parameters
    train_config = {
        "env": "CartPole-v1", # MyCustomEnv_v0,
        "framework": "torch",
        "num_workers": 2,
        "num_gpus": 1,  # Add this line to specify using one GPU
        "num_envs_per_worker": 1,
        "model": {
            "fcnet_hiddens": [512, 512, 256],
            "fcnet_activation": "relu",
        },
        "lr": 3e-4,  
        "optimization": {
            "optimizer": "adam",
            "adam_epsilon": 1e-8,
            "adam_beta1": 0.9,
            "adam_beta2": 0.999,
        },  
        "gamma": 0.99,
        "num_sgd_iter": 10,  
        "sgd_minibatch_size": 500, 
        "rollout_fragment_length": 500,
        "train_batch_size": 4000,
        "prioritized_replay": True,
        "prioritized_replay_alpha": 0.6,
        "prioritized_replay_beta": 0.4, 
        "buffer_size": 500000,
        "stop": {"episodes_total": 5000000},
        "exploration_config": {},
    }
    # Placeholder values for names the snippet references but does not define.
    training_iteration = 1000  # e.g. stop after this many training iterations
    model_restore_dir = None   # set to a checkpoint path to resume, or None to start fresh
    stop_criteria = {"training_iteration": training_iteration}

    # start to train
    try:
        results = tune.run(
            PPO,
            config=train_config,
            stop=stop_criteria,
            checkpoint_at_end=True,
            checkpoint_freq=50, 
            restore=model_restore_dir,
            verbose=1,
        )
    except BaseException as e:
        print(f"training error: {e}")
    
if __name__ == "__main__":
    train()

First I trained on my custom environment "MyCustomEnv_v0", and PPO didn't use the GPU. Then I tried "CartPole-v1", and it still didn't use the GPU. After I changed the algorithm from PPO to APPO, it started to use the GPU, and A2C works too (I changed nothing else). Like this:

Current time: 2023-10-07 05:07:01. Total running time: 0s
Logical resource usage: 3.0/4 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭─────────────────────────────────────────╮
│ Trial name                     status   │
├─────────────────────────────────────────┤
│ APPO_CartPole-v1_59115_00000   PENDING  │
╰─────────────────────────────────────────╯

I checked the official RLlib documentation and confirmed that PPO supports GPU training.

Why does this happen? How can I make RLlib use the GPU for PPO training?
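
For reference, the GPU is visible to PyTorch on the instance (APPO uses it with the same "framework": "torch" setting), so the CUDA setup itself seems fine. A minimal check, using plain PyTorch calls:

import torch

# Generic PyTorch sanity check, not RLlib-specific: confirms CUDA and the
# Tesla K80 are visible from the Python environment the training runs in.
print(torch.cuda.is_available())      # expected: True
print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.get_device_name(0))  # expected: Tesla K80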


Solution

  • This has been solved. Because I have only one GPU on the machine and use 2 workers, I added the config parameter "num_gpus_per_worker": 0.5, and now the GPU is used during training.
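
For anyone hitting the same issue, below is a minimal sketch of the GPU-related part of the config. The exact fractional split is an assumption for a single-GPU machine with 2 rollout workers; the key point is that the one physical GPU has to be shared via fractional logical GPUs, and the total request (num_gpus + num_workers * num_gpus_per_worker) should not exceed the 1 GPU Ray reports.

# Sketch only: share one physical GPU between the PPO learner and 2 rollout
# workers via fractional logical GPUs. The 0.5 / 0.25 split is illustrative;
# here the total request is 0.5 + 2 * 0.25 = 1.0 GPU.
train_config = {
    "env": "CartPole-v1",
    "framework": "torch",
    "num_workers": 2,
    "num_gpus": 0.5,              # fraction of the GPU for the learner/driver
    "num_gpus_per_worker": 0.25,  # fraction of the GPU for each rollout worker
}

With a split like this, the "Logical resource usage" line should report 1.0/1 GPUs instead of 0/1 GPUs.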