I'm having trouble with a Python script that converts CSV files to Excel format using pandas. The issue is that numbers in scientific notation in the CSV file are not being correctly interpreted as numbers in the resulting Excel file. Instead, they appear in an undesired scientific format. Here's a simplified version of my function:
import pandas as pd
import os
from tkinter import filedialog, messagebox
def xlsx_conversion():
file_paths = filedialog.askopenfilenames(filetypes=[("Text and CSV files", "*.txt *.csv")])
if not file_paths:
return
for file_path in file_paths:
try:
sep = '\t' if file_path.lower().endswith('.txt') else ','
df = pd.read_csv(file_path, sep=sep)
for col in df.columns:
df[col] = pd.to_numeric(df[col], errors='ignore')
excel_file_path = os.path.splitext(file_path)[0] + '.xlsx'
with pd.ExcelWriter(excel_file_path, engine='openpyxl') as writer:
df.to_excel(writer, index=False, float_format="%.3f")
except Exception as e:
messagebox.showerror("Error", f"Error converting file {file_path}: {e}")
return
messagebox.showinfo("Success", "All files have been converted successfully!")
xlsx_conversion()
Here is an example of my csv file :
And this is what i have after using the function :
The data in row 5 and below are strings, not numeric, and thus unusable for creating graphs, which is the ultimate goal.
The problem is due to the fact that your columns are not purely numeric.
If I provide this .csv file, I get numeric values in the Excel file:
Frequency,Random
5.0e+3,6.01e-4
5.0e+3,6.01e-4
5.0e+3,6.01e-4
5.0e+3,6.01e-4
Your file has some additional material before the headers, and that prevents pandas.read_csv
from detecting that the columns are numerical.
When I use this file, all my numbers end up as strings in the Excel file, just like you get:
!Keysignth,Easdf,MYA1234,A.04.00
!Date: Thur
BEGIN CH1_DATA
Frequency,Random
5.0e+3,6.01e-4
5.0e+3,6.01e-4
5.0e+3,6.01e-4
5.0e+3,6.01e-4
The key mistake was to assume that errors="ignore"
meant "ignore values that cannot be converted, but keep doing the rest of the column". That's not what happens: conversion does not happen at all if any error happens, and errors="ignore"
causes that to be silently ignored.
header=3
to skip the metadataNow, you can give read_csv
a header
parameter, which indicates where the column headers are, and then it will only start reading columnar data after the headers.
In this specific case, using df = pd.read_csv(file_path, sep=sep, header=3)
indicated that the headers were after the three rows of metadata.
Caveat: the resulting Excel file does not contain those meta data lines at the top, since read_csv
ignores all contents before the header line.
Presumably you want to keep your metadata.
You can call pd.to_numeric
with a slice, telling it to start converting only from row 3 (to skip the three rows of metadata):
df.loc[3:, col] = pd.to_numeric(df[col][3:])
and now the Excel file has all the metadata, and the numeric data is properly numeric and ready to graph.
If your number of metadata rows is not constant, you might need to write some logic to find the first numerical row.
One heuristic could be to look for the first numeric value in the first column:
for row, value in enumerate(df[df.columns[0]]):
try:
pd.to_numeric(value)
first_numeric_row = row
break
except ValueError:
pass
else:
print("Warning: no numeric data in the first column")
first_numeric_row = row + 1
And then use
df.loc[first_numeric_row:, col] = pd.to_numeric(df[col][first_numeric_row:])
to convert the numerical data.
I am making a lot of assumptions here, starting with presuming that all columns have the same range of numerical data, and that no non-numeric data exists after the first numeric row. You'll have to adjust this to your real requirements. You might want to look for BEGIN CH1_DATA
instead if you know it's always there, for example.
float_format
You're passing float_format="%.3f"
to to_excel()
. Don't do that. It will force rounding your data before writing it to the Excel file. As @chrslg pointed out, 6e-4, 6.04e-4 or 6.12e-4 will all become 0.001. Once you've converted to numeric, just let the writer handle numeric values for you.