I know how to remove punctuation from a single list in a cell, but I need to keep the nested structure when each cell holds a list of lists, as in [["I","need","to","remove","punctuations","."],[...],[...]]
-> [["I","need","to","remove","punctuations"],[...],[...]]
Every method I know flattens the result into this -> ["I","need","to","remove","punctuations",...]
data["clean_text"] = data["clean_text"].apply(lambda x: [', '.join([c for c in s if c not in string.punctuation]) for s in x])
data["clean_text"] = data["clean_text"].str.replace(r'[^\w\s]+', '')
...
What's the best way to do that?
Following your approach, I would just use a list comprehension with a helper function:
import string
def clean_up(lst):
    return [[w for w in sublist if w not in string.punctuation] for sublist in lst]
data["clean_text"] = [clean_up(x) for x in data["text"]]
Output:
print(data) # -- with two different columns so we can see the difference
text \
0 [[I, need, to, remove, punctuations, .], [This, is, another, list, with, commas, ,, and, periods, .]]
clean_text
0 [[I, need, to, remove, punctuations], [This, is, another, list, with, commas, and, periods]]
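One caveat: `w not in string.punctuation` is a substring test, so it only drops tokens that happen to appear verbatim inside the punctuation string. A token like "..." or "?!" would be kept, and an empty string would be dropped. If that matters for your data, here is a minimal sketch of a stricter variant. The names `is_punct_token` and `clean_up_strict` are made up for illustration, and the sample rows are invented:

```python
import string

# Set of single punctuation characters, for fast membership checks
PUNCT = set(string.punctuation)

def is_punct_token(token):
    # True only for non-empty tokens made up entirely of punctuation characters
    return bool(token) and all(ch in PUNCT for ch in token)

def clean_up_strict(lst):
    # Same nested comprehension as clean_up, so the list-of-lists shape is preserved
    return [[w for w in sublist if not is_punct_token(w)] for sublist in lst]

# Invented sample: one "cell" holding two token lists
rows = [[["I", "need", "to", "remove", "punctuations", "."],
         ["Wait", "...", "really", "?!", ""]]]
cleaned = [clean_up_strict(x) for x in rows]
print(cleaned)
```

It plugs into the DataFrame the same way as before: data["clean_text"] = [clean_up_strict(x) for x in data["text"]]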