pythonpandascsvtxt

How to cleanup some content from the text file


I have the following data in a CSV.

"ID","OTHER_FIELDS_2"
"87","25 R160  22  13  E"
"87","25 R165  22  08  E"
"77",""
"18","20 BA-10  12  06  2  30  S"
"18","20 BA-20  12  06  2  30  S"
"88","20 TH-42  02  02  5  30  MT"
"66","20 AD-38  12  06  B"
"66","20 AD-38  30  07  B"
"70","20 OL-45  19  11  B"
"70","20 EM-45  19  08  B"

After running my Python code, I got the following output:

18,"
","20 BA-10  12  06  2  30  S
20 BA-20  12  06  2  30  S","
",**********
66,"
","20 AD-38  12  06  B
20 AD-38  30  07  B","
",**********
70,"
","20 OL-45  19  11  B
20 EM-45  19  08  B","
",**********
77,"
",,"
",**********
87,"
","25 R160  22  13  E
25 R165  22  08  E","
",**********
88,"
",20 TH-42  02  02  5  30  MT,"
",**********

But I have to generate the following output in txt format:

18
20 BA-10  12  06  2  30  S
20 BA-20  12  06  2  30  S
**********
66
20 AD-38  12  06  B
20 AD-38  30  07  B
**********
70
20 OL-45  19  11  B
20 EM-45  19  08  B
**********
77
**********
87
25 R160  22  13  E
25 R165  22  08  E
**********
88
20 TH-42  02  02  5  30  MT
**********

Here is my code:

import pandas as pd
import csv

df = pd.read_csv('idDetails.csv')
data_rows = [] 

group_column = 'ID'
selected_columns = ['OTHER_FIELDS_2'] 
grouped_data = df.groupby(group_column)[selected_columns]

for group_name, group_df in grouped_data:
    #print(f"{group_name}")
    other_data = group_df.to_string(header=False,index=False)
    other_data_a = group_df.fillna('').dropna(axis = 0, how = 'all') 
    other_data_b = other_data_a.to_string(header=False,index=False)
    #print(other_data_b) 
    other_data_c = '**********'
    #print(other_data_c)
    data_rows.append([group_name, '\n', other_data_b, '\n', other_data_c])  
dfo = pd.DataFrame(data_rows)
dfo.to_csv('idDetailsoutput.txt', header=False, index=False)

A possible solution to get the desired output is appreciated.


Solution

  • The problem with your current code is that you're putting each group into a row of a DataFrame and then saving it with to_csv(). That adds extra quotes and weird line breaks in your output. Instead, you should just write everything directly to a text file using normal file writing — no need to use pandas.to_csv for this.

    
    import pandas as pd
    
    # Read the CSV
    df = pd.read_csv('idDetails.csv')
    
    # Group by ID
    grouped = df.groupby('ID')['OTHER_FIELDS_2']
    
    # Open the file in write mode
    with open('idDetailsoutput.txt', 'w') as f:
        for group_id, items in grouped:
            f.write(f"{group_id}\n")  # Write the ID
            for item in items:
                if pd.notna(item) and item.strip() != "":
                    f.write(f"{item.strip()}\n")  # Write each non-empty value
            f.write("**********\n")  # Separator line
    

    Output:

    18
    20 BA-10  12  06  2  30  S
    20 BA-20  12  06  2  30  S
    **********
    66
    20 AD-38  12  06  B
    20 AD-38  30  07  B
    **********
    70
    20 OL-45  19  11  B
    20 EM-45  19  08  B
    **********
    77
    **********
    87
    25 R160  22  13  E
    25 R165  22  08  E
    **********
    88
    20 TH-42  02  02  5  30  MT
    **********