I'm using the PPO algorithm in RLlib to train my deep reinforcement learning model. Training runs on an AWS p2.xlarge instance, which has 4 vCPUs and 1 GPU (Tesla K80), and I found that PPO doesn't use the GPU.
The training log shows:
Trial status: 1 RUNNING
Current time: 2023-10-07 05:08:00. Total running time: 13s
Logical resource usage: 3.0/4 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭────────────────────────────────────────╮
│ Trial name status │
├────────────────────────────────────────┤
│ PPO_CartPole-v1_74ca6_00000 RUNNING │
╰────────────────────────────────────────╯
This is my code:
from ray import tune
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.algorithms.a2c import A2C
from ray.rllib.algorithms.appo import APPO
# Placeholder values so the snippet runs on its own; adjust as needed.
training_iteration = 1000
model_restore_dir = None  # checkpoint path to resume from, or None to start fresh


def train() -> None:
    # config training parameters
    train_config = {
        "env": "CartPole-v1",  # MyCustomEnv_v0,
        "framework": "torch",
        "num_workers": 2,
        "num_gpus": 1,  # Add this line to specify using one GPU
        "num_envs_per_worker": 1,
        "model": {
            "fcnet_hiddens": [512, 512, 256],
            "fcnet_activation": "relu",
        },
        "lr": 3e-4,
        "optimization": {
            "optimizer": "adam",
            "adam_epsilon": 1e-8,
            "adam_beta1": 0.9,
            "adam_beta2": 0.999,
        },
        "gamma": 0.99,
        "num_sgd_iter": 10,
        "sgd_minibatch_size": 500,
        "rollout_fragment_length": 500,
        "train_batch_size": 4000,
        "prioritized_replay": True,
        "prioritized_replay_alpha": 0.6,
        "prioritized_replay_beta": 0.4,
        "buffer_size": 500000,
        "stop": {"episodes_total": 5000000},
        "exploration_config": {},
    }
    stop_criteria = {"training_iteration": training_iteration}
    # start to train
    try:
        results = tune.run(
            PPO,
            config=train_config,
            stop=stop_criteria,
            checkpoint_at_end=True,
            checkpoint_freq=50,
            restore=model_restore_dir,
            verbose=1,
        )
    except BaseException as e:
        print(f"training error: {e}")


if __name__ == "__main__":
    train()
First, I trained on my custom environment "MyCustomEnv_v0", and PPO didn't use the GPU. Then I tried "CartPole-v1", and it still didn't use the GPU. After I switched the algorithm from PPO to APPO, it started to use the GPU; A2C works as well (I changed nothing else). Like this:
Current time: 2023-10-07 05:07:01. Total running time: 0s
Logical resource usage: 3.0/4 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:K80)
╭─────────────────────────────────────────╮
│ Trial name status │
├─────────────────────────────────────────┤
│ APPO_CartPole-v1_59115_00000 PENDING │
╰─────────────────────────────────────────╯
I checked the official RLlib documentation and confirmed that PPO supports GPU training.
Why does this happen? How can I make RLlib's PPO use the GPU during training?
This has been solved. Since there is only one GPU on the machine and I use 2 workers, the GPU has to be shared: after I added "num_gpus_per_worker": 0.5 to the config, the GPU is used during training.
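For reference, a minimal sketch of that change, assuming everything else in train_config stays exactly as posted above (the training_iteration stop value of 100 is just a placeholder for this sketch):

from ray import tune
from ray.rllib.algorithms.ppo import PPO

train_config = {
    "env": "CartPole-v1",
    "framework": "torch",
    "num_workers": 2,
    "num_gpus": 1,               # GPU share requested by the learner/driver process
    "num_gpus_per_worker": 0.5,  # the added key: each of the 2 rollout workers requests half a GPU
    # ... all remaining keys unchanged from the full config above ...
}

tune.run(
    PPO,
    config=train_config,
    stop={"training_iteration": 100},  # placeholder stop criterion for this sketch
    verbose=1,
)

In the old-style config dict, "num_gpus" controls the learner/driver's GPU share while "num_gpus_per_worker" controls the share each rollout worker requests.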