I want to find the number of times each number appears at each index position in a list of 6 number sets when I do not know what the numbers will be, but they will range from 0-99 only.
Example list:
data = [['22', '45', '6', '72', '1', '65'], ['2', '65', '67', '23', '98', '1'], ['13', '45', '98', '4', '12', '65']]
Eventually I will be putting the resulting counts into a pandas DataFrame to look something like this:
num numofoccurances position numoftimesinposition
01 02 04 01
01 02 05 01
02 01 00 01
04 02 03 01
06 01 02 01
12 01 04 01
13 01 00 01
and so on...
The resulting data will be a little different due to the num repeating for each time it appears in a different index position, but hopefully this helps you understand what I'm looking for.
So far, this is what I've started:
data = json.load(f)
numbers = []
contains = []
'''
This section is simply taking the data from the json file and putting it all into a list of lists containing the 6 elements I need in each list
'''
for i in data['data']:
item = [i[9], i[10]]
# print(item)
item = [words for segments in item for words in segments.split()]
numbers.append(item)
'''
This is my attempt to count to number of occurrences for each number in the range then add it to a list.
'''
x = range(1,99)
for i in numbers:
if x in i and not contains:
contains.append(x)
import pandas as pd
num_pos = [(num,pos) for i in data for pos,num in enumerate(i)]
df = pd.DataFrame(num_pos,columns = ['number','position']).assign(numoftimesinposition = 1)
df = df.astype(int).groupby(['number','position']).count().reset_index()
df1 = df.groupby('number').numoftimesinposition.sum().reset_index().\
rename(columns = {'numoftimesinposition':'numofoccurences'}).\
merge(df, on='number')
print(df1)
number numofoccurences position numoftimesinposition
0 1 2 4 1
1 1 2 5 1
4 2 1 0 1
7 4 1 3 1
9 6 1 2 1
2 12 1 4 1
3 13 1 0 1
5 22 1 0 1
6 23 1 3 1
8 45 2 1 2
10 65 3 1 1
11 65 3 5 2
12 67 1 2 1
13 72 1 3 1
14 98 2 2 1
15 98 2 4 1
if the code above feels slow, then use Counter
from collections
:
import pandas as pd
from collections import Counter
num_pos = [(int(num),pos) for i in data for pos,num in enumerate(i)]
count_data = [(num,pos,occurence) for (num,pos), occurence in Counter(num_pos).items()]
df = pd.DataFrame(count_data, columns = ['num','pos','occurence']).sort_values(by='num')
df['total_occurence'] = [Counter(df.num).get(num) for num in df.num]
print(df)