A little while ago, I posted on here for help using the API to download data from Tumblr blogs. birryree (https://stackoverflow.com/users/297696/birryree) was kind enough to help me correct my script and figure out where I had been going wrong, and I have been using his script with no problems since (Print more than 20 posts from Tumblr API).
This script requires that I manually input the blog name that I want to download each time. However, I need to download hundreds of blogs, so this has led to me working with hundreds of versions of the same script and is very time-consuming. I did some googling and found that it was possible to write Python scripts where you can input arguments from the command line and then they would be processed (if that's the right terminology) one by one.
I tried to write a script that I could run from the command prompt and that would then download the three blogs named in the command (in this case, prettythingsicantafford.tumblr.com, theficrecfairy.tumblr.com, and staff.tumblr.com).
The script I'm trying to run is:
import pytumblr
import sys
def get_all_posts(client, blog):
    offset = 0
    while True:
        response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
        # Get the 'posts' field of the response
        posts = response['posts']
        if not posts: return
        for post in posts:
            yield post
        # move to the next offset
        offset += 20
client = pytumblr.TumblrRestClient('SECRET')
blog = (sys.argv[1], sys.argv[2], sys.argv[3])
# use our function
with open('{}-posts.txt'.format(blog), 'w') as out_file:
    for post in get_all_posts(client, blog):
        print >>out_file, post
I am running the following command from the command prompt:
tumblr_test2.py theficrecfairy prettythingsicantafford staff
However, I get the following error message:
Traceback (most recent call last):
  File "C:\Users\izzy\test\tumblr_test2.py", line 29, in <module>
    for post in get_all_posts(client, blog):
  File "C:\Users\izzy\test\tumblr_test2.py", line 8, in get_all_posts
    response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
  File "C:\Python27\lib\site-packages\pytumblr\helpers.py", line 46, in add_dot_tumblr
    args[1] += ".tumblr.com"
TypeError: can only concatenate tuple (not "str") to tuple
I have been trying to modify my script for about two weeks now in response to this error, but I have been unable to correct my no doubt very obvious mistake and would be very grateful for any help or advice.
EDIT FOLLOWING vishes_shell's ADVICE:
I am now working with the following script:
import pytumblr
import sys
def get_all_posts(client, blogs):
    for blog in blogs:
        offset = 0
        while True:
            response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True, filter='raw')
            # Get the 'posts' field of the response
            posts = response['posts']
            if not posts: return
            for post in posts:
                yield post
            # move to the next offset
            offset += 20
client = pytumblr.TumblrRestClient('SECRET')
blog = sys.argv
# use our function
with open('{}-postsredux.txt'.format(blog), 'w') as out_file:
    for post in get_all_posts(client, blog):
        print >>out_file, post
However, I now get the following error message:
Traceback (most recent call last):
  File "C:\Users\izzy\test\tumblr_test2.py", line 27, in <module>
    with open('{}-postsredux.txt'.format(blog), 'w') as out_file:
IOError: [Errno 22] invalid mode ('w') or filename: "['C:\\\\Users\\\\izzy\\\\test\\\\tumblr_test2.py', 'prettythingsicantafford', 'theficrecfairy']-postsredux.txt"
The problem is that you are calling client.posts(blog, ...) when blog is a tuple object, declared as:
blog = (sys.argv[1], sys.argv[2], sys.argv[3])
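To see the error in isolation, here is a minimal reproduction that does not need pytumblr. The add_suffix helper is a hypothetical stand-in for the step shown in your traceback (args[1] += ".tumblr.com"); it is not the library's actual code, only an illustration of why a string works and a tuple does not.

```python
# Hypothetical stand-in for the step shown in the traceback
# (args[1] += ".tumblr.com"); this is not pytumblr's actual code.
def add_suffix(blog):
    return blog + ".tumblr.com"


# A single blog name is a string, so appending the suffix works.
print(add_suffix("staff"))

# A tuple of names cannot be concatenated with a string.
blogs = ("theficrecfairy", "prettythingsicantafford", "staff")
try:
    add_suffix(blogs)
    error = None
except TypeError as exc:
    error = exc

print(error)
```

So each call has to receive exactly one blog name as a string.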
You need to refactor your method to go over each blog separately.
def get_all_posts(client, blogs):
    for blog in blogs:
        offset = 0
        ...
        while True:
            response = client.posts(blog, ...)
            ...
...
blogs = sys.argv[1:]  # skip sys.argv[0], which is the script's own path
...
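Putting this together, here is a minimal sketch of one way to structure the whole script. It is a variant of the outline above: get_all_posts still handles a single blog, and the loop over blogs lives in a separate function so that every blog gets its own output file. It assumes client.posts(...) returns a dict with a 'posts' list, as in your script; dump_blogs and the '<blog>-posts.txt' naming are my own hypothetical choices.

```python
def get_all_posts(client, blog):
    """Yield every post of one blog, requesting 20 posts per call."""
    offset = 0
    while True:
        response = client.posts(blog, limit=20, offset=offset,
                                reblog_info=True, notes_info=True)
        posts = response['posts']
        if not posts:
            return  # no more posts for this blog
        for post in posts:
            yield post
        offset += 20  # move to the next page


def dump_blogs(client, blogs):
    """Write each blog's posts to its own '<blog>-posts.txt' file."""
    for blog in blogs:  # blog is a single name (a string) here
        # Build the file name from one string, never from the whole list.
        with open('{}-posts.txt'.format(blog), 'w') as out_file:
            for post in get_all_posts(client, blog):
                out_file.write(str(post) + '\n')


# Usage in the script (assumes pytumblr is installed and 'SECRET' is your key):
#     import sys
#     import pytumblr
#     client = pytumblr.TumblrRestClient('SECRET')
#     dump_blogs(client, sys.argv[1:])  # sys.argv[0] is the script name
```

Because the file name is formatted from one blog name at a time, it can no longer end up containing the entire argument list, and slicing with sys.argv[1:] keeps the script path out of the list of blogs.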