I'm trying to trawl a newsgroup to test some text-based grouping algorithms, fetching batches of newsgroup headers and sticking them into a SQLite database. The database is about as plain as it gets, with text columns for all the data, and the header data as fetched by Python's nntplib module always gives 8 values per header. All but one of those are strings, and I convert the only non-string to a string before inserting my data into the database. Despite this, Python falls over with the rather useless "TypeError: not all arguments converted during string formatting" error, which is only a marginal step up from just saying "error: good luck, you're on your own."
Does someone who understands string formatting better than I do know what's going wrong in the following code?
```python
import nntplib, sqlite3

# newsgroup settings (modify so that this works for you =)
server = 'news.yournewsgroup.com'
port = 119
username = 'your name here'
password = 'your password here'

# set up the newsgroup and sqlite connections
connection = nntplib.NNTP(server, port, username, password)
newsgroup = "comp.graphics.algorithms"
connection.group(newsgroup)
database = sqlite3.connect(newsgroup + ".db")

# create a table definition if it doesn't exist yet
try:
    # SQLite doesn't actually have data types. Everything is stored as plain text.
    # And so is newsgroup data. Bonus!
    database.execute("""CREATE TABLE headers (articleNumber text, subject text,
                                              poster text, date text, id text,
                                              references text, size text,
                                              lines text)""")
except:
    # table definition already exists. Not actually an error.
    pass

# Get the group meta-data, and set up iterator values for running
# through the header list.
resp, count, first, last, name = connection.group(newsgroup)
total = int(last) - int(first)
step = 10000
steps = total / step;
articleRange = first + '-' + str(int(first)+step)

# grab a batch of headers
print "[FETCHING HEADERS]"
resp, list = connection.xover(first, str(int(first)+step))
print "done."

# process the fetched headers
print "[PROCESSING HEADERS]"
for entry in list:
    # Unpack immutable tuple, mutate (because the references list
    # should be a string), then repack.
    articleNumber, subject, poster, date, id, references, size, lines = entry
    argumentList = (articleNumber, subject, poster, date, id, (",".join(references)), size, lines)
    try:
        # try to chronicle the header information. THIS WILL GO WRONG AT SOME POINT.
        database.execute("""INSERT INTO headers (articleNumber, subject, poster,
                                                 date, id, reference, size, lines)
                            VALUES ('?', '?', '?',
                                    '?', '?','?', '?', '?')"""
                         % argumentList)
    except TypeError as err:
        # And here is an irking point with Python in general. Something went
        # wrong, yet all it tells us is "not all arguments converted during
        # string formatting". Despite that error being generated at a point
        # where the code knows WHICH argument was the problem.
        print err
        print type(argumentList[0]), argumentList[0]
        print type(argumentList[1]), argumentList[1]
        print type(argumentList[2]), argumentList[2]
        print type(argumentList[3]), argumentList[3]
        print type(argumentList[4]), argumentList[4]
        print type(argumentList[5]), argumentList[5]
        print type(argumentList[6]), argumentList[6]
        print type(argumentList[7]), argumentList[7]
        # A quick print set shows us that all arguments are already of type
        # "str", and none of them are empty... so it would take quite a bit
        # of work to make them fail at being legal strings... Wat?
        exit(1)
print "done."

# cleanup
database.close()
connection.quit()
```
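The error can be reproduced in isolation, which shows that the types of the header values play no part in it. Below is a minimal sketch in plain Python with no NNTP server or database; the tuple contents and the `try_format` helper are hypothetical stand-ins. Applying `%` to a string that contains no conversion specifiers fails for any non-empty tuple, whatever its element types, and only an empty tuple leaves nothing unconverted.

```python
# Minimal reproduction of the question's TypeError: no NNTP server or SQLite needed.

# Hypothetical stand-in for one header tuple (all str, as in the question).
values = ("1234", "Re: ray tracing", "someone@example.com",
          "Mon, 1 Jan 2001 00:00:00 GMT", "<abc@example.com>",
          "<parent@example.com>", "2048", "40")

# Same shape as the question's query: quoted '?' marks, but no %s, %d or any
# other %-conversion specifier.
query = "INSERT INTO headers (subject, lines) VALUES ('?', '?')"


def try_format(fmt, args):
    """Apply fmt % args; return the result, or the TypeError message on failure."""
    try:
        return fmt % args
    except TypeError as err:
        return "TypeError: %s" % err


strings_result = try_format(query, values)  # eight str values
ints_result = try_format(query, (1, 2))     # two int values
empty_result = try_format(query, ())        # no values at all

print(strings_result)  # TypeError: not all arguments converted during string formatting
print(ints_result)     # same message: the argument types are irrelevant
print(empty_result)    # the query, unchanged
```

The message compares the number of supplied values with the number of `%` specifiers in the format string, so it has no particular argument to point at: none of them is at fault.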
What that error tells you is that you supplied n values to string formatting (`%`) but the format string expected fewer than n values. Specifically, this string:

```python
"""INSERT INTO headers (articleNumber, subject, poster,
                        date, id, reference, size, lines)
   VALUES ('?', '?', '?',
           '?', '?','?', '?', '?')"""
```
does not expect any values for `%`-style string formatting. There's no `%d` in it, no `%s`, nothing. Instead, the `?` placeholders are for the parameter substitution of the DB-API. You don't invoke that with the `%` operator (you don't need it at all here); instead, pass the sequence of values as the second argument to the `execute` call. Also, you need to drop the quotes around the placeholders so that they are treated as placeholders and not as string literals which happen to contain a single question mark. In summary:
```python
database.execute("""
    INSERT INTO headers (articleNumber, subject, poster,
                         date, id, reference, size, lines)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",  # note: comma, not %
    argumentList)
```
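The corrected call can be checked end to end against a throwaway in-memory database. The following is a minimal sketch, assuming Python's built-in `sqlite3` module and made-up header values. It declares the column as `reference` so that it matches the INSERT statement above (the question's CREATE TABLE spells it `references`, so the two need to agree in real code). It walks through the three cases discussed: quoted `'?'` stores a literal question mark, bare `?` placeholders bind the values passed as the second argument, and passing values to a statement whose only `?` marks are quoted is rejected because there are no placeholders to bind them to.

```python
import sqlite3

# Throwaway in-memory database so the sketch creates no files.
database = sqlite3.connect(":memory:")
# Column "reference" (singular) matches the INSERT statements below.
database.execute("""CREATE TABLE headers (articleNumber text, subject text,
                                          poster text, date text, id text,
                                          reference text, size text,
                                          lines text)""")

# Made-up values standing in for one header from xover().
argumentList = ("1234", "Re: ray tracing", "someone@example.com",
                "Mon, 1 Jan 2001 00:00:00 GMT", "<abc@example.com>",
                "<parent@example.com>", "2048", "40")

# 1. Quoted '?' is a string literal: every column gets a literal "?",
#    and nothing is bound.
database.execute("""INSERT INTO headers (articleNumber, subject, poster,
                                         date, id, reference, size, lines)
                    VALUES ('?', '?', '?', '?', '?', '?', '?', '?')""")

# 2. Bare ? placeholders, values passed as the second argument (comma, not %).
database.execute("""INSERT INTO headers (articleNumber, subject, poster,
                                         date, id, reference, size, lines)
                    VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", argumentList)

# 3. Supplying values to a statement with zero placeholders is rejected.
error_message = None
try:
    database.execute("INSERT INTO headers (subject) VALUES ('?')", argumentList)
except sqlite3.ProgrammingError as err:
    error_message = str(err)
print(error_message)

database.commit()  # close() on its own does not commit pending changes
rows = database.execute(
    "SELECT subject, reference FROM headers ORDER BY rowid").fetchall()
print(rows)  # the literal-'?' row first, then the row with the bound values
database.close()
```

Because the values are bound rather than pasted into the SQL text, a subject line containing an apostrophe needs no escaping. For a whole `xover()` batch, `executemany()` accepts the same statement plus a sequence of such tuples.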