i am trying to convert torrent magnet urls in .torrent
files using python script.
python script connects to dht
and waits for meta data then creates torrent file from it.
e.g.
#!/usr/bin/env python
'''
Created on Apr 19, 2012
@author: dan, Faless
GNU GENERAL PUBLIC LICENSE - Version 3
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
http://www.gnu.org/licenses/gpl-3.0.txt
'''
import shutil
import tempfile
import os.path as pt
import sys
import libtorrent as lt
from time import sleep
def magnet2torrent(magnet, output_name=None):
if output_name and \
not pt.isdir(output_name) and \
not pt.isdir(pt.dirname(pt.abspath(output_name))):
print("Invalid output folder: " + pt.dirname(pt.abspath(output_name)))
print("")
sys.exit(0)
tempdir = tempfile.mkdtemp()
ses = lt.session()
params = {
'save_path': tempdir,
'duplicate_is_error': True,
'storage_mode': lt.storage_mode_t(2),
'paused': False,
'auto_managed': True,
'duplicate_is_error': True
}
handle = lt.add_magnet_uri(ses, magnet, params)
print("Downloading Metadata (this may take a while)")
while (not handle.has_metadata()):
try:
sleep(1)
except KeyboardInterrupt:
print("Aborting...")
ses.pause()
print("Cleanup dir " + tempdir)
shutil.rmtree(tempdir)
sys.exit(0)
ses.pause()
print("Done")
torinfo = handle.get_torrent_info()
torfile = lt.create_torrent(torinfo)
output = pt.abspath(torinfo.name() + ".torrent")
if output_name:
if pt.isdir(output_name):
output = pt.abspath(pt.join(
output_name, torinfo.name() + ".torrent"))
elif pt.isdir(pt.dirname(pt.abspath(output_name))):
output = pt.abspath(output_name)
print("Saving torrent file here : " + output + " ...")
torcontent = lt.bencode(torfile.generate())
f = open(output, "wb")
f.write(lt.bencode(torfile.generate()))
f.close()
print("Saved! Cleaning up dir: " + tempdir)
ses.remove_torrent(handle)
shutil.rmtree(tempdir)
return output
def showHelp():
print("")
print("USAGE: " + pt.basename(sys.argv[0]) + " MAGNET [OUTPUT]")
print(" MAGNET\t- the magnet url")
print(" OUTPUT\t- the output torrent file name")
print("")
def main():
if len(sys.argv) < 2:
showHelp()
sys.exit(0)
magnet = sys.argv[1]
output_name = None
if len(sys.argv) >= 3:
output_name = sys.argv[2]
magnet2torrent(magnet, output_name)
if __name__ == "__main__":
main()
above script takes around 1+ minutes to fetch metadata and create .torrent
file while utorrent
client only takes few seconds , why is that ?
How can i make my script faster ?
i would like to fetch metadata for around 1k+ torrents.
e.g. Magnet Link
magnet:?xt=urn:btih:BFEFB51F4670D682E98382ADF81014638A25105A&dn=openSUSE+13.2+DVD+x86_64.iso&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80
update :
i have specified known dht router urls like this in my script.
session = lt.session()
session.listen_on(6881, 6891)
session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()
but it still slow and sometimes i get errors like
DHT error [hostname lookup] (1) Host not found (authoritative)
could not map port using UPnP: no router found
update :
i have wrote this scmall script which fetches hex info hash from DB and tries to fetch meta data from dht and then inserts the torrent file in DB.
i have made it run indefinitely as i did not know how to save state , so keeping it running will get more peers and fetching meta data will quicker .
#!/usr/bin/env python
# this file will run as client or daemon and fetch torrent meta data i.e. torrent files from magnet uri
import libtorrent as lt # libtorrent library
import tempfile # for settings parameters while fetching metadata as temp dir
import sys #getting arguiments from shell or exit script
from time import sleep #sleep
import shutil # removing directory tree from temp directory
import os.path # for getting pwd and other things
from pprint import pprint # for debugging, showing object data
import MySQLdb # DB connectivity
import os
from datetime import date, timedelta
#create lock file to make sure only single instance is running
lock_file_name = "/daemon.lock"
if(os.path.isfile(lock_file_name)):
sys.exit('another instance running')
#else:
#f = open(lock_file_name, "w")
#f.close()
session = lt.session()
session.listen_on(6881, 6891)
session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()
alive = True
while alive:
db_conn = MySQLdb.connect( host = 'localhost', user = '', passwd = '', db = 'basesite', unix_socket='') # Open database connection
#print('reconnecting')
#get all records where enabled = 0 and uploaded within yesterday
subset_count = 5 ;
yesterday = date.today() - timedelta(1)
yesterday = yesterday.strftime('%Y-%m-%d %H:%M:%S')
#print(yesterday)
total_count_query = ("SELECT COUNT(*) as total_count FROM content WHERE upload_date > '"+ yesterday +"' AND enabled = '0' ")
#print(total_count_query)
try:
total_count_cursor = db_conn.cursor()# prepare a cursor object using cursor() method
total_count_cursor.execute(total_count_query) # Execute the SQL command
total_count_results = total_count_cursor.fetchone() # Fetch all the rows in a list of lists.
total_count = total_count_results[0]
print(total_count)
except:
print "Error: unable to select data"
total_pages = total_count/subset_count
#print(total_pages)
current_page = 1
while(current_page <= total_pages):
from_count = (current_page * subset_count) - subset_count
#print(current_page)
#print(from_count)
hashes = []
get_mysql_data_query = ("SELECT hash FROM content WHERE upload_date > '" + yesterday +"' AND enabled = '0' ORDER BY record_num ASC LIMIT "+ str(from_count) +" , " + str(subset_count) +" ")
#print(get_mysql_data_query)
try:
get_mysql_data_cursor = db_conn.cursor()# prepare a cursor object using cursor() method
get_mysql_data_cursor.execute(get_mysql_data_query) # Execute the SQL command
get_mysql_data_results = get_mysql_data_cursor.fetchall() # Fetch all the rows in a list of lists.
for row in get_mysql_data_results:
hashes.append(row[0].upper())
except:
print "Error: unable to select data"
print(hashes)
handles = []
for hash in hashes:
tempdir = tempfile.mkdtemp()
add_magnet_uri_params = {
'save_path': tempdir,
'duplicate_is_error': True,
'storage_mode': lt.storage_mode_t(2),
'paused': False,
'auto_managed': True,
'duplicate_is_error': True
}
magnet_uri = "magnet:?xt=urn:btih:" + hash.upper() + "&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80"
#print(magnet_uri)
handle = lt.add_magnet_uri(session, magnet_uri, add_magnet_uri_params)
handles.append(handle) #push handle in handles list
#print("handles length is :")
#print(len(handles))
while(len(handles) != 0):
for h in handles:
#print("inside handles for each loop")
if h.has_metadata():
torinfo = h.get_torrent_info()
final_info_hash = str(torinfo.info_hash())
final_info_hash = final_info_hash.upper()
torfile = lt.create_torrent(torinfo)
torcontent = lt.bencode(torfile.generate())
tfile_size = len(torcontent)
try:
insert_cursor = db_conn.cursor()# prepare a cursor object using cursor() method
insert_cursor.execute("""INSERT INTO dht_tfiles (hash, tdata) VALUES (%s, %s)""", [final_info_hash , torcontent] )
db_conn.commit()
#print "data inserted in DB"
except MySQLdb.Error, e:
try:
print "MySQL Error [%d]: %s" % (e.args[0], e.args[1])
except IndexError:
print "MySQL Error: %s" % str(e)
shutil.rmtree(h.save_path()) # remove temp data directory
session.remove_torrent(h) # remove torrnt handle from session
handles.remove(h) #remove handle from list
else:
if(h.status().active_time > 600): # check if handle is more than 10 minutes old i.e. 600 seconds
#print('remove_torrent')
shutil.rmtree(h.save_path()) # remove temp data directory
session.remove_torrent(h) # remove torrnt handle from session
handles.remove(h) #remove handle from list
sleep(1)
#print('sleep1')
print('sleep10')
sleep(10)
current_page = current_page + 1
#print('sleep20')
sleep(20)
os.remove(lock_file_name);
now i need to implement new things as suggested by Arvid.
UPDATE
i have managed to implement what Arvid suggested. and some more extension i found in deluge support forums http://forum.deluge-torrent.org/viewtopic.php?f=7&t=42299&start=10
#!/usr/bin/env python
import libtorrent as lt # libtorrent library
import tempfile # for settings parameters while fetching metadata as temp dir
import sys #getting arguiments from shell or exit script
from time import sleep #sleep
import shutil # removing directory tree from temp directory
import os.path # for getting pwd and other things
from pprint import pprint # for debugging, showing object data
import MySQLdb # DB connectivity
import os
from datetime import date, timedelta
def var_dump(obj):
for attr in dir(obj):
print "obj.%s = %s" % (attr, getattr(obj, attr))
session = lt.session()
session.add_extension('ut_pex')
session.add_extension('ut_metadata')
session.add_extension('smart_ban')
session.add_extension('metadata_transfer')
#session = lt.session(lt.fingerprint("DE", 0, 1, 0, 0), flags=1)
session_save_filename = "/tmp/new.client.save_state"
if(os.path.isfile(session_save_filename)):
fileread = open(session_save_filename, 'rb')
session.load_state(lt.bdecode(fileread.read()))
fileread.close()
print('session loaded from file')
else:
print('new session started')
session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()
alerts = []
alive = True
while alive:
a = session.pop_alert()
alerts.append(a)
print('----------')
for a in alerts:
var_dump(a)
alerts.remove(a)
print('sleep10')
sleep(10)
filewrite = open(session_save_filename, "wb")
filewrite.write(lt.bencode(session.save_state()))
filewrite.close()
kept it running for a minute and received alert
obj.msg = no router found
update :
after some testing looks like
session.add_dht_router("router.bitcomet.com", 6881)
causing
('%s: %s', 'alert', 'DHT error [hostname lookup] (1) Host not found (authoritative)')
update: i added
session.start_dht()
session.start_lsd()
session.start_upnp()
session.start_natpmp()
and got alert
('%s: %s', 'portmap_error_alert', 'could not map port using UPnP: no router found')
as MatteoItalia points out, bootstrapping the DHT is not instantaneous, and can sometimes take a while. There's no well defined point in time when the bootstrap process is complete, it's a continuum of being increasingly connected to the network.
The more connected and the more good, stable nodes you know about, the quicker lookups will be. One way to factor out most of the bootstrapping process (to get more of an apples-to-apples comparison) would be to start timing after you get the dht_bootstrap_alert (and also hold off on adding the magnet link until then).
Adding the dht bootstrap nodes will primarily make it possible to bootstrap, it's still not necessarily going to be particularly quick. You typically want about 270 nodes or so (including replacement nodes) to be considered bootstrapped.
One thing you can do to speed up the bootstrap process is to make sure you save and load the session state, which includes the dht routing table. This will reload all the nodes from the previous session into the routing table and (assuming you haven't changed IP and that everything works correctly) bootstrapping should be quicker.
Make sure you don't start the DHT in the session constructor (as the flags argument, just pass in add_default_plugins), load the state, add the router nodes and then start the dht.
Unfortunately there are a lot of moving parts involved in getting this to work internally, the ordering is important and there may be subtle issues.
Also, note that keeping the DHT running continuously would be quicker, as reloading the state still will go through a bootstrap, it will just have more nodes up front to ping and try to "connect" to.
disabling the start_default_features
flag also means UPnP and NAT-PMP won't be started, if you use those, you'll have to start those manually as well.