I'm trying to lemmatize German texts that are stored in a DataFrame.
I use the german library, which handles German-specific grammatical structure: https://github.com/jfilter/german-preprocessing
My code:
    import pandas as pd
    from german import preprocess

    df = pd.read_csv('Afd.csv', sep=',')
    Lemma = open('MessageAFD_lemma.txt', 'w')
    for i in df['message']:
        preprocess(i, remove_stop=True)
        Lemma.write(i)
    Lemma.close()
The script runs without any errors in the terminal, but when I open the file "MessageAFD_lemma.txt", it contains only the original text (nothing was lemmatized).
The expected result is like:
Input:
    preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
Output:
['johannes gut schüler', 'julia trinken tee']
What is going wrong?
The preprocess function returns new, processed copies of the texts instead of modifying its input. So you need to write the result of preprocess to the file, not the original message i.
Furthermore, preprocess accepts a list of texts to process, so you must wrap your message in a list, [message], and extract the single result from the returned list with result, = ...
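    import pandas as pd
    from german import preprocess

    df = pd.read_csv('Afd.csv', sep=',')
    Lemma = open('MessageAFD_lemma.txt', 'w')
    for message in df['message']:
        # Wrap the message in a list and unpack the single returned item
        result, = preprocess([message], remove_stop=True)
        # Add a newline so the messages do not run together in the file
        Lemma.write(result + '\n')
    Lemma.close()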
    # Or, to process all messages in one go:
    with open('MessageAFD_lemma.txt', 'w') as f:
        # Convert the column to a plain list, since preprocess expects a list of texts
        for result in preprocess(df['message'].tolist(), remove_stop=True):
            f.write(result + '\n')
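To see why the original loop wrote the unmodified text, here is a minimal sketch that runs without the german library. toy_preprocess is a hypothetical stand-in, not the real preprocess: it only lowercases and strips a trailing period, but like the real function it returns new strings and leaves its input alone. The sketch also shows what the trailing comma in result, = ... does.

```python
def toy_preprocess(texts):
    """Hypothetical stand-in for preprocess: returns NEW strings, input is untouched."""
    return [text.lower().rstrip('.') for text in texts]


message = 'Julia trinkt gern Tee.'

# Calling the function and discarding its return value changes nothing,
# because Python strings are immutable.
toy_preprocess([message])
unchanged = message

# Single-element unpacking: the trailing comma pulls the one item out of the
# returned list, and raises ValueError if the list does not hold exactly one item.
result, = toy_preprocess([message])

print(unchanged)  # Julia trinkt gern Tee.
print(result)     # julia trinkt gern tee
```

The unpacking form is a slightly stricter alternative to indexing with [0], since it fails loudly if the function ever returns more or fewer items than expected.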