What I want is to count the filled and blank cells in each column of a .csv
file.
This is my code:
import pandas as pd
file_path = r"C:\Users\andre\OneDrive\Documentos\Farmácia\Python\Cadastro_clientes\cadastro_cli.csv"
df = pd.read_csv(file_path, sep='|', header=None) # No names argument, read all columns
filled_counts = df.count() # Count non-null entries for all columns
blank_counts = df.isnull().sum() # Count null (blank) entries for all columns
summary = pd.DataFrame({
'Filled': filled_counts,
'Blank': blank_counts
})
print("\nFilled and Blank Counts:")
print(summary)
I only get this, which is not what I want at all:
Filled and Blank Counts:
Filled Blank
0 22318 0
I'm using Jupyter Notebook. Any help or tips are very appreciated!
If you want to tally the number of filled and blank cells for each column individually, use:
summary = pd.DataFrame({
'Filled': df.notnull().sum(), # Count non-null (filled) cells
'Blank': df.isnull().sum() # Count null (blank) cells
})
print(summary, "\n")
If you want to tally for the whole csv, use:
total_filled = summary['Filled'].sum()
total_blank = summary['Blank'].sum()
print(f"Filled: {total_filled}, \nBlank: {total_blank}"
Complete script:
import pandas as pd
file_path = r"C:\Users\andre\OneDrive\Documentos\Farmácia\Python\Cadastro_clientes\cadastro_cli.csv"
df = pd.read_csv(file_path, sep='|') #if csv does not have headers add 'header=None'
# Calculate filled and blank counts for each column
summary = pd.DataFrame({
'Filled': df.notnull().sum(),
'Blank': df.isnull().sum()
})
print(summary, "\n")
# Calculate totals for the entire CSV
total_filled = summary['Filled'].sum()
total_blank = summary['Blank'].sum()
print(f"Total filled: {total_filled} \nTotal blank: {total_blank}")