python data-mining frequency dataformat fpgrowth

Create a dictionary from a .txt file with each line as a value and a serial number as the key


I have a dataset in a .txt file. Each line is a separate transaction, and the items on a line are separated by spaces.

The dataset looks like this:

data.txt file

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
20 12 5 41 65
41 6 11 27 81 21
65 15 27 8 31 65 20 19 44 29 41

I created a dictionary whose keys are serial numbers starting from 0 and whose values are the items of each line, separated by commas, like this:

{0: '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15', 1:'20,12,5,41,65', 2:'41,6,11,27,81,21', 3: '65,15,27,8,31,65,20,19,44,29,41'} 

But I am not able to iterate through the individual items in each value. Is there a way to convert each value into a list of items for its key?

I want to find the frequency of each item across the whole dictionary and create a table:

item frequency
1 1
2 1
20 2
41 3

like the above

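my_dict = {}

with open('data.txt', 'r') as file:
    # enumerate supplies the serial number; lines.index(line) returns the
    # first match, so duplicate lines would all map to the same key
    for index, line in enumerate(file):
        # join the space-separated items with commas
        my_dict[index] = ','.join(line.split())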

This is the code I used to create the dictionary, but I am not sure what I should change. I also need to find the frequency of each value.

Any help would be appreciated. Thank you.


Solution

  • Since you're really just counting numbers over the entire file, you can read the whole file, split it on whitespace, and count each token:

    my_dict = {}
    
    with open('data.txt', 'r') as file:
        for number in file.read().split():
            my_dict[number] = my_dict.get(number, 0) + 1
    
    print(my_dict)
    

    Result:

    {'1': 1, '2': 1, '3': 1, '4': 1, '5': 2, '6': 2, '7': 1, '8': 2, '9': 1, '10': 1, '11': 2, '12': 2, '13': 1, '14': 1, '15': 2, '20': 2, '41': 3, '65': 3, '27': 2, '81': 1, '21': 1, '31': 1, '19': 1, '44': 1, '29': 1}
    

    That counts the strings representing the numbers. To count actual integers instead, convert each token with int():

    my_dict = {}  # start empty so string keys from the previous run are not mixed in

    with open('data.txt', 'r') as file:
        for number in file.read().split():
            my_dict[int(number)] = my_dict.get(int(number), 0) + 1
    

    Result:

    {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 1, 8: 2, 9: 1, 10: 1, 11: 2, 12: 2, 13: 1, 14: 1, 15: 2, 20: 2, 41: 3, 65: 3, 27: 2, 81: 1, 21: 1, 31: 1, 19: 1, 44: 1, 29: 1}
    

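    Or, on Python 3.8+, replace the loop body with an assignment expression so that int() is called only once per token (the right-hand side is evaluated first, so i is already bound when my_dict[i] is assigned):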

            my_dict[i] = my_dict.get(i := int(number), 0) + 1
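
If you would rather keep the dictionary from the question and work from it, the following is a minimal sketch of one way to do that. It splits each comma-separated value into a list of integers, counts the items with collections.Counter from the standard library, and prints the item/frequency table. The helper names split_items and item_frequencies are made up for this illustration, and the dictionary literal is copied from the question instead of being read from data.txt.

```python
from collections import Counter

# Dictionary in the shape shown in the question:
# serial number -> comma-separated items of one transaction.
transactions = {
    0: '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15',
    1: '20,12,5,41,65',
    2: '41,6,11,27,81,21',
    3: '65,15,27,8,31,65,20,19,44,29,41',
}


def split_items(transactions):
    """Hypothetical helper: turn each comma-separated string into a list of ints."""
    return {
        key: [int(item) for item in value.split(',')]
        for key, value in transactions.items()
    }


def item_frequencies(item_lists):
    """Hypothetical helper: count every occurrence of every item across all lists."""
    counts = Counter()
    for items in item_lists.values():
        counts.update(items)
    return counts


item_lists = split_items(transactions)
frequencies = item_frequencies(item_lists)

# Print the table sorted by item value.
print('item frequency')
for item, count in sorted(frequencies.items()):
    print(item, count)
```

Note that this counts every occurrence, so the item 65, which appears twice in the last transaction, is counted twice there. If you want the number of transactions containing each item instead (the usual support count in frequent-itemset mining), passing set(items) to counts.update would count each item at most once per transaction.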