Tool doesn't work on big data set - Single positional indexer is out-of-bounds

I am breaking my head for a day already about the following:

I have built a tool to iterate over two df to find duplicated values and sum points if they are duplicated:

It looks like this:

df1 = pd.DataFrame(dict1.items())
df2 = pd.DataFrame(dict2.items())



a = 0
while a != len(df2):

    value_to_compare = df2.iloc[a, 0]

    b = 0

    for row in range(len(df1)):
        if value_to_compare == df1.iloc[b, 0]:
            df1.iloc[b, 1] = df1.iloc[b, 1] + df2.iloc[b, 1]
            b = b + 1

        else:
            b = b + 1

    if b == len(df1):
        df1 = df1.append(df2.iloc[a, :], ignore_index=True)
        a = a + 1



df1 = df1.drop_duplicates(subset=[0], keep='first', ignore_index=True)
print('\n\n',df1)

It works perfect on a data set from 2 dicts like these:

dict1 = {'A': 1, 'B': 1, 'C': 1, 'D': 1}

dict2 = {'a': 1, 'b': 1, 'c': 1, 'd': 1}

But as soon as I apply it to the main program where there is 2 df with a couple of hundred rows (here an example):

              word  occurance
0            labor          4
1      predictions          2
2              nfl          2
3             kids          2
4           africa          2
5         pandemic          2
6             kara          2
7             days          2
8          swisher          2
9            event          2
10             day          2
11        football          2
12          office          2
13              us          2
14        politics          2

and make dicts of them using:

keys1 = words_total['word'].tolist()
values1 = words_total['occurance'].tolist()
dict1 = dict(zip(keys1, values1))
keys2 = words_date['word'].tolist()
values2 = words_date['occurance'].tolist()
dict2 = dict(zip(keys2, values2))

I get the following error:

Traceback (most recent call last):
  File "/Users/Programowanie/PycharmProjects/pythonProject3/main.py", line 120, in <module>
    df1.iloc[b, 1] = df1.iloc[b, 1] + df2.iloc[b, 1]
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 925, in __getitem__
    return self._getitem_tuple(key)
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1506, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 754, in _has_valid_tuple
    self._validate_key(k, i)
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1409, in _validate_key
    self._validate_integer(key, axis)
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1500, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

Do you have an idea why this is happening? Thank you in advance :)

Solution

Without seeing the 2 dataframes, my assumption is that the 2 dataframes don't contain the same number of rows, which means when you call and index/row with .iloc it's going to be a value that is out of bounds. For example, if I have a dataframe of 10 rows, I can't go and call the row at index value 15.

Wouldn't it be easier to combine the 2 dataframes, then groupby the 'word' column and sum the occurance value for those?

import pandas as pd

data1 = {'word':['labor','predictions','nfl','kids','africa','pandemic','kara','days',
         'swisher','event','day','football','office','us','politics'],
 'occurance':[4,2,2,2,2,2,2,2,2,2,2,2,2,2,2]}

data2 = {'word':['labor','predictions','nfl','kids','africa','pandemic','kara','days',
         'swisher','event','day','us','politics'],
 'occurance':[1,2,8,2,2,2,1,2,2,7,2,4,5]}

    
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Combine the 2 dataframes
combined_df = pd.concat([df1, df2])


# Groupby the word column and sum the occurance column
occurances = combined_df.groupby('word').agg({"occurance": "sum"}).reset_index()

Output:

print(occurances)
           word  occurance
0        africa          4
1           day          4
2          days          4
3         event          9
4      football          2
5          kara          3
6          kids          4
7         labor          5
8           nfl         10
9        office          2
10     pandemic          4
11     politics          7
12  predictions          4
13      swisher          4
14           us          6