python pandas function

Several progress bars in the output instead of one while applying a function to a DataFrame


I'm running a script in Jupyter that is expected to show a progress bar while applying a function to a DataFrame. In the cell output I see several bars instead of the expected one. I tried to clear the output with sys.stdout.flush() (after import sys), but that increases the run time significantly.

When I create a DataFrame with fewer rows, say 100, there is only one bar. As I increase the number of rows, more bars appear.

What is the problem, please?

import pandas as pd
import math

iterator_for_progressbar = 1

def progressBar(current, total, barLength = 20):
    percent = math.ceil(float(current) * 100 / total)
    arrow   = '■' * int(percent/100 * barLength)
    spaces  = '□' * (barLength - len(arrow))
    print('Calculating: %s%s %d %%' % (arrow, spaces, percent), end='\r')

def myf(row):
    global iterator_for_progressbar
    progressBar(iterator_for_progressbar, len(df), barLength = 20)   
    iterator_for_progressbar += 1

    row['1'] = 100

df = pd.DataFrame(index = range(0, 5000), columns = ['1','2','3','4','5'] )

df.apply(myf, axis=1)

Solution

  • You could use the tqdm library for the progress bar instead of creating your own.

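    import pandas as pd
    from tqdm import tqdm

    def myf(row):
        row['1'] = 100
        return row
    
    # Register progress_apply on pandas objects, then use it in place of apply
    tqdm.pandas(desc="Calculating")
    df = df.progress_apply(myf, axis=1)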
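If you would rather keep a hand-rolled bar, one thing to try is redrawing it only when the displayed percentage actually changes, so 5000 rows produce about 100 writes instead of 5000. The sketch below is a minimal illustration, not a confirmed fix: it assumes the extra bars are related to how often the cell output is written, and the `ProgressBar` class, its `stream` parameter and its `redraws` counter are hypothetical names introduced here.

```python
import math
import sys


class ProgressBar:
    """Text progress bar that redraws only when the percentage changes."""

    def __init__(self, total, bar_length=20, stream=sys.stdout):
        self.total = total
        self.bar_length = bar_length
        self.stream = stream
        self.current = 0
        self.last_percent = -1
        self.redraws = 0  # number of writes actually made

    def update(self):
        """Advance by one step and redraw if the visible percentage changed."""
        self.current += 1
        percent = math.ceil(self.current * 100 / self.total)
        if percent == self.last_percent:
            return  # nothing visible changed, so skip the write
        self.last_percent = percent
        filled = int(percent / 100 * self.bar_length)
        bar = '■' * filled + '□' * (self.bar_length - filled)
        # '\r' moves the cursor back to the start of the line before redrawing
        self.stream.write('\rCalculating: %s %d %%' % (bar, percent))
        self.stream.flush()
        self.redraws += 1


# Stand-in for the 5000-row apply: one update per processed row
bar = ProgressBar(total=5000)
for _ in range(5000):
    bar.update()
print()
```

In the question's code, `bar.update()` would replace the `progressBar(...)` call and the global counter inside `myf`, with `bar = ProgressBar(total=len(df))` created before `df.apply(myf, axis=1)`. Keeping the counter on the object also avoids the `global` statement.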