When I run the code and print the result, I notice that the news items for the third element of the list appear three times in a row. For the preceding and following elements of the list it works without problems. Can anyone help me with this, or does anyone know how to at least remove such duplicates?
Nachrichten = []
for row in googlenews.results():
    table_new.append({
        'City': ort,
        'Title': row['title'],
        'URL': row['link'],
        'Source': row['site'],
    })
df = pd.DataFrame(table_new)
dfges = pd.concat(nachrichten, axis='index')
Your code mixes lower and upper case in variable names, e.g. `nachrichten` vs. `Nachrichten`. Python is case-sensitive, so these are two different names.
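To illustrate the point about case sensitivity, here is a minimal sketch that reuses the two spellings from the question. It assumes a fresh interpreter session in which the lowercase name has never been defined; in a notebook that still holds an older `nachrichten` variable, the same mix-up would silently reuse stale data instead of raising an error.

```python
# Names that differ only in case are unrelated variables.
Nachrichten = []  # the list that gets initialised

try:
    # The lowercase name was never assigned in this session.
    nachrichten.append("some headline")
except NameError as err:
    message = str(err)
    print(message)

# The capitalised list is untouched.
print(len(Nachrichten))
```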
To answer your question, you can use `drop_duplicates()` to eliminate duplicates based on `'Title'`.
After that, no title occurs more than once:
dfges['Title'].value_counts().max()
>>> 1
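As a small, self-contained sketch of how `drop_duplicates` behaves with `subset` and `keep`, consider the following. The rows are made up for illustration and do not come from GoogleNews; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical rows: the title "C" appears three times in a row.
sample = pd.DataFrame({
    'City': ['Munich', 'Munich', 'Madrid', 'Madrid', 'Madrid'],
    'Title': ['A', 'B', 'C', 'C', 'C'],
    'Source': ['s1', 's2', 's3', 's4', 's5'],
})

# Rows count as duplicates when their 'Title' matches;
# keep='last' retains the final occurrence of each title.
deduped = sample.drop_duplicates(subset=['Title'], keep='last')

print(deduped)
print(deduped['Title'].value_counts().max())
```

With `keep='first'` the earliest occurrence would be kept instead. Without `inplace=True` the call returns a new DataFrame and leaves `sample` unchanged.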
Extended code:
import pandas as pd
from GoogleNews import GoogleNews

googlenews = GoogleNews()
googlenews.set_encode('utf_8')
googlenews.set_lang('en')
googlenews.set_period('7d')

orte = ["Munich", "New York", "Madrid", "London", "Los Angeles", "Frankfurt", "Rom"]
nachrichten = []

for ort in orte:
    # Clear the previous city's results before fetching the next one
    googlenews.clear()
    googlenews.get_news(ort)

    # Start a fresh list of rows for each city
    table_new = []
    for row in googlenews.results():
        table_new.append({
            'City': ort,
            'Title': row['title'],
            'Date': row['date'],
            'URL': row['link'],
            'Source': row['site'],
        })
    df = pd.DataFrame(table_new)
    nachrichten.append(df)

# Combine the per-city frames and keep only the last row for each title
dfges = pd.concat(nachrichten, axis='index')
dfges.drop_duplicates(subset=['Title'], keep='last', inplace=True)
print(dfges)
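One side note on the result: each per-city frame starts its index at 0, so the concatenated frame carries repeated index labels, and dropping rows leaves gaps. If a clean running index is wanted, something like the following sketch might help. The frames here are hypothetical stand-ins for the per-city results, and `combined` and `unique_news` are names chosen only for this example.

```python
import pandas as pd

# Hypothetical stand-ins for two per-city frames
frames = [
    pd.DataFrame({'City': ['Munich', 'Munich'], 'Title': ['A', 'B']}),
    pd.DataFrame({'City': ['Madrid', 'Madrid'], 'Title': ['C', 'B']}),
]

# ignore_index=True replaces the repeated 0, 1, 0, 1 labels with 0..3
combined = pd.concat(frames, ignore_index=True)

# Drop repeated titles, then renumber the remaining rows from 0
unique_news = (
    combined
    .drop_duplicates(subset=['Title'], keep='last')
    .reset_index(drop=True)
)

print(unique_news)
```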