
Threading NNTP, how? (Newbie here)


I can't wrap my head around how I could possibly rewrite my code to be multi-threaded.

The code I'm writing is meant to automatically archive every article in a list of existing newsgroups, but I want to make use of my newsgroup plan and run up to 20 threads. I've never written threaded code before, and my attempts were in vain.

Here's my code, excluding the username and password (but you can get a free account with a maximum of 5 connections at https://my.xsusenet.com if you really want to).

Please don't judge me too hard :(


import nntplib
import sys
import datetime
import os
basetime = datetime.datetime.today()
#daysback = int(sys.argv[1])
#date_list = [basetime - datetime.timedelta(days=x) for x in range(daysback)]
s = nntplib.NNTP('free.xsusenet.com', user='USERNAME', password='PASSWORD') # I am only allowed 5 connections at a time, so try for 4.
groups = []
resp, groups_list_tuple = s.list()


def remove_non_ascii_2(string):
    return string.encode('ascii', errors='ignore').decode()


for g_tuple in groups_list_tuple:
    #print(g_tuple) # DEBUG_LINE
    # Parse group_list info
    group = g_tuple[0]
    last = g_tuple[1]
    first = g_tuple[2]
    flag = g_tuple[3]

    # Parse newsgroup info
    resp, count, first, last, name = s.group(group)
    for message_id in range(first, last):
        resp, number, mes_id = s.next()
        resp, info = s.article(mes_id)
        if os.path.exists('.\\' + group):
            pass
        else:
            os.mkdir('.\\' + group)
        print(f"Downloading: {message_id}")
        outfile = open('.\\' + group + '\\' + str(message_id), 'a', encoding="utf-8")
        for line in info.lines:
            outfile.write(remove_non_ascii_2(str(line)) + '\n')
        outfile.close()

I tried threading with a ThreadPoolExecutor so that it would use 20 threads, but it failed: it kept repeating the same work on the same message ID. The expected result was to download 20 different messages at a time.

Here's the code I tried with threading. Mind you, I tried about 6-8 variations to get it to work; this was the last one before I gave up and asked here.

import nntplib
import sys
import datetime
import os
import concurrent.futures
basetime = datetime.datetime.today()
#daysback = int(sys.argv[1])
#date_list = [basetime - datetime.timedelta(days=x) for x in range(daysback)]
s = nntplib.NNTP('free.xsusenet.com', user='USERNAME', password='PASSWORD') # I am only allowed 5 connections at a time, so try for 4.
groups = []
resp, groups_list_tuple = s.list()


def remove_non_ascii_2(string):
    return string.encode('ascii', errors='ignore').decode()

def download_nntp_file(mess_id):
    resp, count, first, last, name = s.group(group)
    message_id = range(first, last)

    resp, number, mes_id = s.next()
    resp, info = s.article(mes_id)
    if os.path.exists('.\\' + group):
        pass
    else:
        os.mkdir('.\\' + group)
    print(f"Downloading: {mess_id}")
    outfile = open('.\\' + group + '\\' + str(mess_id), 'a', encoding="utf-8")
    for line in info.lines:
        outfile.write(remove_non_ascii_2(str(line)) + '\n')
    outfile.close()


for g_tuple in groups_list_tuple:
    #print(g_tuple) # DEBUG_LINE
    # Parse group_list info
    group = g_tuple[0]
    last = g_tuple[1]
    first = g_tuple[2]
    flag = g_tuple[3]

    # Parse newsgroup info
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = executor.submit(download_nntp_file)

Solution

  • I can't test it with XSUseNet.

    I wouldn't use global variables, because when threads run at the same time they all read (and modify) the same variables, so they may end up working with the same values.

    You should pass the values to the function as parameters instead.

    Something like this:

    def download_nntp_file(g_tuple):
        # ... code which uses `g_tuple` instead of global variables ...
        pass
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        for g_tuple in groups_list_tuple:
            executor.submit(download_nntp_file, g_tuple)
    
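A fuller sketch of the submit() version, under stated assumptions. Besides the globals, the code in the question also shares one nntplib.NNTP connection (`s`) between all threads. An NNTP connection is a single socket with per-connection state (the selected group and the current article pointer that next() advances), so commands sent from several threads can interleave and interfere. The sketch therefore opens a separate connection inside each task. `download_group`, `archive_all` and the `connect` parameter are hypothetical names: `connect` is assumed to be a zero-argument callable such as `lambda: nntplib.NNTP('free.xsusenet.com', user='USERNAME', password='PASSWORD')`, and `max_workers` should stay below the server's connection limit (4 here, following the comment in the question). It also decodes the bytes in `info.lines` instead of calling str() on them, which would write the `b'...'` wrapper into the files.

```python
import concurrent.futures
import os


def download_group(g_tuple, connect, out_dir="."):
    """Save every article of one newsgroup, using a private connection.

    g_tuple is one entry of NNTP.list(): (group, last, first, flag).
    connect is a hypothetical zero-argument callable that returns a new
    nntplib.NNTP-like object (group(), article(), quit()).
    """
    group_name = g_tuple[0]
    conn = connect()  # one connection per task, never shared between threads
    try:
        _resp, count, first, last, _name = conn.group(group_name)
        if count == 0:
            return group_name, 0
        folder = os.path.join(out_dir, group_name)
        os.makedirs(folder, exist_ok=True)
        saved = 0
        # range() excludes its end, so add 1 to include the last article
        for number in range(first, last + 1):
            _resp, info = conn.article(number)
            path = os.path.join(folder, str(number))
            with open(path, "w", encoding="ascii") as outfile:
                for line in info.lines:
                    # lines are bytes: decode them, dropping non-ASCII bytes
                    outfile.write(line.decode("ascii", errors="ignore") + "\n")
            saved += 1
        return group_name, saved
    finally:
        conn.quit()


def archive_all(groups_list, connect, max_workers=4, out_dir="."):
    """Download all groups, running at most max_workers tasks at once."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(download_group, g, connect, out_dir)
                   for g in groups_list]
        for future in concurrent.futures.as_completed(futures):
            # result() re-raises any exception from the worker; without it,
            # errors (e.g. a missing argument) would pass silently
            group_name, saved = future.result()
            results[group_name] = saved
    return results
```

Real groups usually have gaps (expired or cancelled articles); for those, nntplib raises NNTPTemporaryError from article(), so you would probably want to catch it around that call and skip the number. Also keep in mind that nntplib was deprecated in Python 3.11 and removed in 3.13, so this needs Python 3.12 or older.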

    But it would be simpler to use map() instead of submit(), because it takes an iterable of arguments and doesn't need a for-loop:

    def download_nntp_file(g_tuple):
        # ... code which uses `g_tuple` instead of global variables ...
        pass
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_nntp_file, groups_list_tuple)
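Both versions above parallelize per group, so one very large group is still downloaded by a single thread. If the goal is to fetch many articles of the same group at once, one possible variation (not something the answer above proposes) is to split the group's article numbers into contiguous ranges and give each worker its own connection and range. `split_range`, `download_chunk`, `download_group_parallel` and `connect` are hypothetical names, with `connect` assumed to return a new nntplib.NNTP-like object. Note that executor.map() returns a lazy iterator: an exception raised in a worker only reaches the caller when that result is consumed, which is why the sketch wraps the call in list().

```python
import concurrent.futures
import functools
import os


def split_range(first, last, parts):
    """Split the inclusive range first..last into at most `parts`
    contiguous (start, end) pairs, each also inclusive."""
    total = last - first + 1
    if total <= 0 or parts <= 0:
        return []
    parts = min(parts, total)
    size, extra = divmod(total, parts)
    chunks = []
    start = first
    for i in range(parts):
        # the first `extra` chunks get one article more than the rest
        end = start + size - 1 + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end + 1
    return chunks


def download_chunk(group_name, chunk, connect, out_dir="."):
    """Save articles chunk[0]..chunk[1] of one group on a private connection.

    connect is a hypothetical zero-argument callable returning a new
    nntplib.NNTP-like object.
    """
    start, end = chunk
    folder = os.path.join(out_dir, group_name)
    os.makedirs(folder, exist_ok=True)
    conn = connect()
    try:
        conn.group(group_name)  # the selected group is per-connection state
        for number in range(start, end + 1):
            _resp, info = conn.article(number)
            text = "\n".join(line.decode("ascii", errors="ignore")
                             for line in info.lines)
            with open(os.path.join(folder, str(number)), "w",
                      encoding="ascii") as outfile:
                outfile.write(text + "\n")
        return end - start + 1
    finally:
        conn.quit()


def download_group_parallel(group_name, first, last, connect,
                            workers=4, out_dir="."):
    """Download one group's articles with up to `workers` connections."""
    chunks = split_range(first, last, workers)
    task = functools.partial(download_chunk, group_name,
                             connect=connect, out_dir=out_dir)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        # list() consumes every result, so a worker's exception is raised here
        counts = list(executor.map(task, chunks))
    return sum(counts)
```

For example, `download_group_parallel('alt.test', first, last, connect, workers=4)`, with `first` and `last` taken from a group() call. Each worker calls group() on its own connection, because the selected group belongs to the connection, not to the program.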