
Python: DataFrame loop repeats the results for only one element


When I run the code and print the result, I notice that the news items for the third element in the list are output three times in a row. For the preceding and following elements of the list it works without problems. Can anyone help me with this, or does anyone know how to at least remove such duplicates?

Nachrichten = []
    

    
    for row in googlenews.results(): 
        table_new.append({ 
            'City': ort, 
            'Title': row['title'],  
            'URL':row['link'], 
            'Source': row['site'], }) 
    
        df = pd.DataFrame(table_new) 

dfges = pd.concat(nachrichten, axis='index')

Solution

  • Your code mixes lower and upper case in variable names, e.g. nachrichten vs. Nachrichten. Python is case-sensitive, so these are two different names.

    To answer your question, you could use drop_duplicates() to eliminate duplicates based on 'Title'.

    Afterwards, each title occurs at most once:

    >>> dfges['Title'].value_counts().max()
    1
    

    Extended code:

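    import pandas as pd
    from GoogleNews import GoogleNews

    # Configure the scraper: UTF-8 output, English results, last 7 days
    googlenews = GoogleNews()
    googlenews.set_encode('utf_8')
    googlenews.set_lang('en')
    googlenews.set_period('7d')

    orte = ["Munich", "New York", "Madrid", "London", "Los Angeles", "Frankfurt", "Rom"]
    nachrichten = []  # one DataFrame per city

    for ort in orte:
        googlenews.clear()  # discard the previous city's results
        googlenews.get_news(ort)
        table_new = []

        for row in googlenews.results():
            table_new.append({
                'City': ort,
                'Title': row['title'],
                'Date': row['date'],
                'URL': row['link'],
                'Source': row['site'],
            })

        # Build the DataFrame once per city, after the inner loop. If it is
        # built inside the loop, df keeps the previous city's rows whenever a
        # city returns no results, and those rows get appended again.
        if table_new:
            nachrichten.append(pd.DataFrame(table_new))

    dfges = pd.concat(nachrichten, axis='index')
    # Keep only the last occurrence of each title
    dfges.drop_duplicates(subset=['Title'], keep='last', inplace=True)
    print(dfges)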
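
One possible source of such repeats, separate from drop_duplicates(), is a DataFrame that is built inside the inner loop, as in the question's snippet. If a city returns no results, the inner loop never runs and df still holds the previous city's rows. The sketch below reproduces this offline. The fake_results dictionary and the helper names collect_stale and collect_fixed are made up for illustration and stand in for googlenews.results(); whether this is what happened in the original run is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for googlenews.results(): one list of rows per city.
fake_results = {
    "Munich": [{"title": "A"}, {"title": "B"}],
    "Madrid": [],  # no hits for this city
    "London": [{"title": "B"}, {"title": "C"}],
}


def collect_stale():
    """Build df inside the inner loop, as in the question."""
    frames = []
    for city, rows in fake_results.items():
        table = []
        for row in rows:
            table.append({"City": city, "Title": row["title"]})
            df = pd.DataFrame(table)  # only runs if the city has rows
        frames.append(df)  # for Madrid this is still Munich's frame
    return pd.concat(frames, ignore_index=True)


def collect_fixed():
    """Build one frame per city after the inner loop and skip empty cities."""
    frames = []
    for city, rows in fake_results.items():
        table = [{"City": city, "Title": row["title"]} for row in rows]
        if table:
            frames.append(pd.DataFrame(table))
    return pd.concat(frames, ignore_index=True)


stale = collect_stale()   # Munich's rows appear twice
fixed = collect_fixed()   # each city's rows appear once

# Titles shared between cities are still duplicated; drop them by 'Title'.
deduped = fixed.drop_duplicates(subset=["Title"], keep="last")

print(len(stale), len(fixed), len(deduped))
print(deduped)
```

If the same headline should be allowed to appear once per city, subset=["City", "Title"] can be used instead of subset=["Title"].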