I'm trying to open a CSV file, decode the URL-encoded text in it (e.g. example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0), and then save the file. I can do this easily with a single string, but I'm struggling to do it with the rows of a CSV.
My attempt so far:
#reading
file1 = open('example.csv', 'r')
reader = csv.reader(file1)
url = []
for rows in reader:
    url.append = urllib.unquote(rows).decode('utf8')
    #also tried "url.append(urllib.unquote(rows).decode('utf8'))", but same error
file1.close()
#writing
file2 = open('example.csv', 'w')
writer = csv.writer(file2)
writer.writerows(url)
file2.close()
The error I'm receiving:
AttributeError: 'list' object has no attribute 'split'
There are a few mistakes in your approach.
- You don't need the csv module here; Python can read text files just fine. In fact, line-wise iteration is the default when you open a text file for reading.
- Always specify the encoding when you open() a file. Python has no magic text-encoding detector. When you don't specify an encoding, reading the file may work on your machine and break on another, because different computer configurations may have different default encodings.
- Use urlparse() to split a URL. It returns a ParseResult object that conveniently exposes the different parts of the URL as properties.
- Use parse_qs() to decode the query string. It returns a dict that you can access with keys.
- .append is a function. You can't assign to it (.append = '...'); you need to call it (.append('...')).
- Use a with block to work with files, because with blocks close the file automatically.
Compare:
from urllib.parse import urlparse, parse_qs

with open('example.txt', 'r', encoding='utf-8') as file1:
    titles = []
    for url in file1:
        parts = urlparse(url)
        # -> ParseResult(
        #      scheme='http', netloc='example.com', path='', params='',
        #      query='title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0',
        #      fragment='')
        q = parse_qs(parts.query)
        # -> {'title': ['правовая защита']}
        if 'title' in q:
            titles.append(q['title'][0])

with open('titles.txt', 'w', encoding='utf-8') as file2:
    # each title normally still ends with the newline of its input line,
    # so writelines() does not need to add one
    file2.writelines(titles)
Using list comprehensions and dropping the unnecessary comments, we can compress the above code quite a bit:
from urllib.parse import urlparse, parse_qs

with open('example.txt', 'r', encoding='utf-8') as file1:
    queries = [parse_qs(urlparse(url).query) for url in file1]

with open('titles.txt', 'w', encoding='utf-8') as file2:
    titles = [q['title'][0] for q in queries if 'title' in q]
    file2.writelines(titles)
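If the URLs really do sit in one column of a multi-column CSV, the same idea can be combined with the csv module. The following is only a minimal sketch under a few assumptions: the file is UTF-8, the URL is in the first column, and the result goes to a separate output file. The names decode_title, decode_csv and url_column are made up for illustration. Note that csv.reader yields every row as a list, so the URL has to be picked out of the row by index; passing the whole row to the unquoting function is what triggered the AttributeError in the question.

```python
import csv
from urllib.parse import urlparse, parse_qs


def decode_title(url):
    """Return the decoded 'title' query parameter, or '' if it is absent."""
    # hypothetical helper; strip() drops stray whitespace around the URL
    query = parse_qs(urlparse(url.strip()).query)
    return query['title'][0] if 'title' in query else ''


def decode_csv(src_path, dst_path, url_column=0):
    """Copy src_path to dst_path, replacing the URL column with its decoded title."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(src_path, 'r', encoding='utf-8', newline='') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            # row is a list, so decode only the cell that holds the URL
            row[url_column] = decode_title(row[url_column])
            writer.writerow(row)
```

Calling decode_csv('example.csv', 'decoded.csv') would then write one output row per input row. Writing to a second file instead of overwriting the input is a deliberate choice here, so that a failed run does not destroy the original data.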