nlp gpu torch openai-whisper

How to Process Data on GPU Instead of RAM for This Python Code?


I'm currently using the following code to process audio data, but all of the work happens on the CPU and in RAM. I want to offload the processing to the GPU to improve performance. Here is my code:

def prepare_dataset(batch):
    audio = batch["audio"]
    batch["input_features"] = feature_extractor(
        audio["array"], 
        sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch

common_voice = common_voice.map(
    prepare_dataset, 
    remove_columns=common_voice.column_names["train"], 
    num_proc=1
)

How can I modify this code so the processing runs on the GPU instead of in RAM? Any guidance or specific changes would be much appreciated!


Solution

  • You can use the following code to move the processed tensors onto the GPU:

    import torch

    # Use the GPU if one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device)

    def prepare_dataset(batch):
        audio = batch["audio"]

        # Extract the features on the CPU, then move the tensor to the device
        input_features = feature_extractor(
            audio["array"], sampling_rate=audio["sampling_rate"]
        ).input_features[0]
        batch["input_features"] = torch.tensor(input_features).to(device)

        # Tokenize the transcription and move the label ids to the device
        labels = tokenizer(batch["sentence"]).input_ids
        batch["labels"] = torch.tensor(labels).to(device)
        return batch

    common_voice = common_voice.map(
        prepare_dataset,
        remove_columns=common_voice.column_names["train"],
    )
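One caveat worth noting: `datasets.map` serializes its output to an Arrow cache, so tensors placed on the GPU inside `prepare_dataset` are copied back to host memory anyway. The usual pattern is to keep the mapped dataset on the CPU and move each batch to the GPU at iteration time, e.g. in the training loop. A minimal sketch of that pattern, using synthetic stand-in data (the feature shape `(80, 3000)`, the column names, and the `collate` helper are illustrative assumptions, not part of the original code):

```python
import torch
from torch.utils.data import DataLoader

# Pick the GPU when available; the sketch also runs on CPU-only machines
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the mapped dataset: a list of feature/label dicts.
# In the real code this would be common_voice["train"] after .map().
dataset = [
    {"input_features": torch.randn(80, 3000), "labels": torch.tensor([1, 2, 3])}
    for _ in range(4)
]

def collate(examples):
    # Stack fixed-size features; pad labels to the longest sequence in the batch
    features = torch.stack([ex["input_features"] for ex in examples])
    labels = torch.nn.utils.rnn.pad_sequence(
        [ex["labels"] for ex in examples], batch_first=True, padding_value=-100
    )
    return {"input_features": features, "labels": labels}

loader = DataLoader(dataset, batch_size=2, collate_fn=collate)

for batch in loader:
    # Move each batch to the GPU only when it is about to be used
    batch = {k: v.to(device) for k, v in batch.items()}
    # model(**batch) would go here
```

This keeps host memory usage bounded to one batch at a time and avoids re-serializing GPU tensors through the dataset cache.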