After writing a function to generate some data, I wanted to add the ability to save it. I started with the following code, which I ran with save=True:
[in]
import csv
... (main body of code - all this works fine)
if save is True:
    print("Saving...")
    with open('dataset.csv', 'a+') as f:
        lines = f.readlines()
        for line in lines:
            linesplit = line.split(",")
            name_in_dataset = linesplit[0]
...
(... some code for the actual saving process - irrelevant)
print("Data added successfully")
[out]
Saving...
I know that the dataset file contains this name and should have saved here, so I was a little confused as to where it went wrong. I started to break down the code until I reached this:
[in]
if save is True:
    print("Saving...")
    with open('dataset.csv') as f:
        lines = f.readlines()
        print(lines)
[out]
Saving...
[]
I'm not really sure why it can't read the lines. I thought I had used the same code previously to read the lines of this very file, so I'm really confused about why it's not working now.
I've tried adding things to the code such as f.seek(0), but this has made no difference. I've also tried changing the mode in the open function to 'a' and 'r', but alas it still can't read the lines. I've searched through so many posts about .readlines() and can't find anyone experiencing this :( I feel like I've just been at work for too long and have forgotten the fundamentals of Python coding!
Thanks in advance <3
EDIT: Using the suggestions in the comments I changed the code to:
with open('(file path)/dataset.csv', 'r') as f:
    f.seek(0)
    lines = csv.reader(f)
    print(lines)
and it returned:
Saving...
<csv.reader object at 0x7f01282c7f20>
I see a lot of people new to Python and CSVs trying to use filemode append, and they usually get themselves in some trouble because of it.
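To make that trouble concrete, here is a minimal sketch of what 'a+' does on open. The scratch file demo.csv and its contents are made up for illustration. In Python, a file opened in 'a+' mode starts with its position at the end, so an immediate readlines() returns an empty list. That would explain an empty result from the first snippet in the question, though not from a plain open() in read mode, which more likely points to an empty file or a different working directory.

```python
import os
import tempfile

# Hypothetical scratch file so the demo does not touch a real dataset.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    f.write("alice,1\nbob,2\n")

with open(path, "a+", newline="", encoding="utf-8") as f:
    at_open = f.readlines()      # position starts at the end of the file
    f.seek(0)                    # rewind to the beginning
    after_seek = f.readlines()   # now the existing lines are visible
    f.write("carol,3\n")         # in append mode, writes go to the end

print(at_open)     # []
print(after_seek)  # ['alice,1\n', 'bob,2\n']
```

So 'a+' can be made to work, but only if you remember to rewind before every read, which is easy to get wrong.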
In general, I recommend reading the source CSV, modifying the rows, then writing the modified rows to another file. Once you've verified the validity of the new file, you can decide what to do with the old file.
For reading/writing CSV, I recommend using the csv module's reader and writer.
Given the CSV:
Col1,Col2
r1c1,r1c2
r2c1,r2c2
r3c1,r3c2
Use the csv.reader(some_file) function to create a row iterator for that file:
with open('input.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
The local variable reader will yield completely decoded rows. While the file is still open, a row can be returned one at a time with next(reader):
next(reader)
# ['Col1', 'Col2']
next(reader)
# ['r1c1', 'r1c2']
A row returned by reader is just a list of strings.
The iterator can also be used in a for-loop, as the documentation shows us:
for row in reader:
    print(row)
# ['r2c1', 'r2c2']
# ['r3c1', 'r3c2']
Note that the reader continued reading from where it left off after the next() calls. Also, the reader has now been exhausted: there are no more rows to decode. Trying to read from it again will raise the StopIteration exception:
next(reader)
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# StopIteration
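If you would rather not deal with that exception, the built-in next() accepts a default value as a second argument and returns it once the iterator is exhausted. A small sketch, where io.StringIO stands in for an open file (my substitution, so the example is self-contained):

```python
import csv
import io

# io.StringIO stands in for an open file; the data is a cut-down sample CSV.
f = io.StringIO("Col1,Col2\nr1c1,r1c2\n", newline="")
reader = csv.reader(f)

header = next(reader, None)     # ['Col1', 'Col2']
first = next(reader, None)      # ['r1c1', 'r1c2']
exhausted = next(reader, None)  # None, and no StopIteration is raised
```

This is also a handy way to cope with an empty file: header comes back as None instead of the read blowing up.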
To get all the rows and be able to loop over them any number of times, call list(reader) to convert the transient iterator into a permanent list of rows:
with open('input.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)
That saves the first row to its own variable, header. The rest of the rows are added to a list named rows. Since a row is a list of strings, the variable rows is a list of lists of strings.
If you want to omit the header, call next(reader) by itself (with no left-hand assignment). The reader will dutifully return the header, but it'll just go into the void.
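As a rough illustration of why keeping header and rows around is useful, the sketch below uses the sample CSV from earlier, again with io.StringIO standing in for the file. Looking up a column position with header.index() is my own addition here, not something the csv module requires:

```python
import csv
import io

sample = "Col1,Col2\nr1c1,r1c2\nr2c1,r2c2\nr3c1,r3c2\n"
reader = csv.reader(io.StringIO(sample, newline=""))
header = next(reader)
rows = list(reader)

col2 = header.index("Col2")  # position of the column named in the header
first_pass = [row[col2] for row in rows]
second_pass = [row[col2] for row in rows]  # a list can be looped over again

print(first_pass)   # ['r1c2', 'r2c2', 'r3c2']
print(second_pass)  # ['r1c2', 'r2c2', 'r3c2']
```

Unlike the reader itself, rows gives the same result on the second pass.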
Now you can do something with those rows:
for row in rows:
    name = row[0]
    # do something with name...
    name = name.lower()
    # before saving it back to the list
    row[0] = name
Finally, write the modified rows back to a CSV. Personally, I always create a new file:
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)
Once you're happy with output.csv, you can decide what to do with input.csv: leave it, trash it, or overwrite it with output.csv (os.rename('output.csv', 'input.csv')).
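Putting the steps together, here is one way the whole read, modify, write, swap cycle might look. The function name lowercase_first_column, the temp directory, and the sample data are all made up for this sketch. I've used os.replace for the final swap instead of os.rename, because os.rename raises an error on Windows when the destination already exists, while os.replace overwrites it on every platform:

```python
import csv
import os
import tempfile

def lowercase_first_column(src_path, tmp_path):
    """Read src_path, lowercase column 0, write to tmp_path, then swap."""
    with open(src_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    for row in rows:
        row[0] = row[0].lower()

    with open(tmp_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

    # Swap the new file in only after it has been fully written and closed.
    os.replace(tmp_path, src_path)

# Hypothetical scratch files for the demo.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.csv")
tmp = os.path.join(workdir, "output.csv")
with open(src, "w", newline="", encoding="utf-8") as f:
    f.write("Name,Score\nALICE,1\nBob,2\n")

lowercase_first_column(src, tmp)

with open(src, newline="", encoding="utf-8") as f:
    result = list(csv.reader(f))

print(result)  # [['Name', 'Score'], ['alice', '1'], ['bob', '2']]
```

In real use you would probably want to inspect the new file before the swap rather than replacing the original automatically.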
Good luck. :)