Tags: delphi, indy, ararat-synapse

How to download faster?


What is the fastest way to download webpage source into a memo component? I use Indy and HttpCli components.

The problem is that I have a listbox filled with more than 100 sites. My program downloads each page's source into a memo and parses that source for mp3 files. It is something like a Google music search program; it uses Google queries to make searching Google easier.

I started reading about threads, which led to my question: can I create an IdHttp instance in a thread, together with the parsing function, and tell it to parse half of the sites in the listbox?

So basically when a user clicks parse, the main thread should do:

for i := 0 to listbox1.items.count div 2 - 1 do
    get and parse

and the other thread should do:

for i := form1.listbox1.items.count div 2 to form1.listbox1.items.count - 1 do
    get and parse

so that both would add parsed content to form1.listbox2 at the same time. Or is it maybe easier to start two IdHttp instances in the main thread: one for the first half of the sites and the other for the second half?

For this: should I use Indy or Synapse?


Solution

  • I would create a thread that reads a single URL and processes its content. You can then decide how many of those threads you want to run at the same time. Your computer will allow quite a number of connections, so if those 100 sites have different hostnames, it is not a problem to run 10 or 20 at the same time. Too many is overkill, but too few leaves the processor idle while it waits on the network.
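
    As a rough illustration of the one-thread-per-URL idea, here is a minimal sketch. The unit name DownloadThread and the class TDownloadThread are made up for this example, and it assumes Indy's TIdHTTP is available. Each thread owns its own TIdHTTP instance and keeps the downloaded source in a plain string, not in a visual control.

    ```pascal
    unit DownloadThread;

    interface

    uses
      Classes, SysUtils, IdHTTP;

    type
      // Hypothetical worker: downloads exactly one URL and keeps the result in memory.
      TDownloadThread = class(TThread)
      private
        FUrl: string;
        FSource: string;
        FErrorMsg: string;
      protected
        procedure Execute; override;
      public
        constructor Create(const AUrl: string);
        property Url: string read FUrl;
        property Source: string read FSource;      // valid once the thread has finished
        property ErrorMsg: string read FErrorMsg;  // empty if the download succeeded
      end;

    implementation

    constructor TDownloadThread.Create(const AUrl: string);
    begin
      inherited Create(True);    // create suspended; the caller starts the thread
      FreeOnTerminate := False;  // the caller reads Source and frees the thread
      FUrl := AUrl;
    end;

    procedure TDownloadThread.Execute;
    var
      Http: TIdHTTP;
    begin
      // One TIdHTTP per thread, so no instance is shared between threads.
      Http := TIdHTTP.Create(nil);
      try
        try
          FSource := Http.Get(FUrl);
        except
          on E: Exception do
            FErrorMsg := E.Message;  // keep the error; never let it escape Execute
        end;
      finally
        Http.Free;
      end;
    end;

    end.
    ```

    The caller would create as many of these as it wants to run at once, start them (Start in newer Delphi versions, Resume in older ones), call WaitFor on each, and then read Source before freeing the thread.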

    You can tweak this process even further by having separate threads for downloading and processing, so that you can have a number of threads constantly downloading content. Downloading is not very processor intensive. It is basically waiting for a response, so you can easily have a relatively large number of download threads, while a couple of other worker threads can grab items from the pool of results and process them.
    But splitting downloading and processing will make it a little bit more complex, and I don't think you're up to that challenge yet.
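
    For reference, the "pool of results" can be as simple as a list protected by a lock. The sketch below is only one possible shape; the unit name ResultPool and the class TResultPool are invented for this example. Download threads would call Push, and processing threads would call TryPop.

    ```pascal
    unit ResultPool;

    interface

    uses
      Classes, SyncObjs;

    type
      // Hypothetical thread-safe pool of downloaded page sources.
      TResultPool = class
      private
        FLock: TCriticalSection;
        FItems: TStringList;
      public
        constructor Create;
        destructor Destroy; override;
        procedure Push(const PageSource: string);
        function TryPop(out PageSource: string): Boolean;
      end;

    implementation

    constructor TResultPool.Create;
    begin
      inherited Create;
      FLock := TCriticalSection.Create;
      FItems := TStringList.Create;
    end;

    destructor TResultPool.Destroy;
    begin
      FItems.Free;
      FLock.Free;
      inherited;
    end;

    procedure TResultPool.Push(const PageSource: string);
    begin
      FLock.Acquire;
      try
        FItems.Add(PageSource);  // each page is stored as a single item
      finally
        FLock.Release;
      end;
    end;

    function TResultPool.TryPop(out PageSource: string): Boolean;
    begin
      FLock.Acquire;
      try
        Result := FItems.Count > 0;
        if Result then
        begin
          PageSource := FItems[0];  // oldest item first
          FItems.Delete(0);
        end;
      finally
        FLock.Release;
      end;
    end;

    end.
    ```

    A processing thread that gets False from TryPop could sleep briefly and try again. A real implementation would probably use an event or semaphore to avoid polling, which is part of the extra complexity mentioned above.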

    Because currently, you have some other problems. First, you should not access VCL components from a worker thread, because the VCL is not thread-safe. If you need information from a listbox in a thread, you will either need to use Synchronize in the thread to make a 'safe' call to the main thread, or you will have to pass the information needed before you start the thread. The latter is more efficient, because code executed using Synchronize actually runs in the main thread, making your multi-threading less efficient.
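
    Both options can be combined in one small sketch. The names ParseThread and TParseThread are hypothetical, and the download and parse step is only a placeholder comment. The URLs are copied into a private list in the constructor, which runs in the main thread, and the target list is touched only inside a Synchronize call.

    ```pascal
    unit ParseThread;

    interface

    uses
      Classes;

    type
      // Hypothetical worker that never reads a VCL control directly.
      TParseThread = class(TThread)
      private
        FUrls: TStringList;  // private copy, filled before the thread starts
        FTarget: TStrings;   // e.g. ListBox2.Items; used only via Synchronize
        FFound: string;      // value handed over to the main thread
        procedure AddFound;
      protected
        procedure Execute; override;
      public
        constructor Create(AUrls, ATarget: TStrings);
        destructor Destroy; override;
      end;

    implementation

    constructor TParseThread.Create(AUrls, ATarget: TStrings);
    begin
      inherited Create(True);  // create suspended; the caller starts the thread
      FreeOnTerminate := True;
      FUrls := TStringList.Create;
      FUrls.Assign(AUrls);     // runs in the calling (main) thread, so this read is safe
      FTarget := ATarget;
    end;

    destructor TParseThread.Destroy;
    begin
      FUrls.Free;
      inherited;
    end;

    procedure TParseThread.AddFound;
    begin
      FTarget.Add(FFound);     // executed in the main thread by Synchronize
    end;

    procedure TParseThread.Execute;
    var
      I: Integer;
    begin
      for I := 0 to FUrls.Count - 1 do
      begin
        if Terminated then
          Break;
        // Placeholder: download FUrls[I] and parse it here, in the worker thread.
        FFound := FUrls[I];
        Synchronize(AddFound); // only the short UI update runs in the main thread
      end;
    end;

    end.
    ```

    From a button click, the main thread would create the thread with something like TParseThread.Create(ListBox1.Items, ListBox2.Items) and then start it. Keeping the Synchronize call as short as possible limits how long the worker blocks on the main thread.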

    But my attention was actually drawn to the first line, "download webpage source into a memo component". Don't do that! Don't load those results in a memo for processing. Automatic processing is best done in memory, outside of visual controls. Using strings, streams, or even stringlists for processing text is way faster than using a memo.
    A stringlist has some overhead as well, but it indexes lines the same way a memo does (TMemoStrings, the class behind a memo's Lines property, and TStringList both descend from TStrings), so if you have code that relies on this, it will be quite easy to convert it to TStringList.
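
    To make the in-memory approach concrete, here is a small console sketch that pulls mp3 links out of a page held in a plain string. ExtractMp3Links is a hypothetical helper written for this example, and it makes a simplifying assumption: a link is any double-quoted value ending in ".mp3". Real pages will need more careful parsing.

    ```pascal
    program Mp3LinkDemo;

    {$APPTYPE CONSOLE}

    uses
      Classes, SysUtils, StrUtils;

    // Hypothetical helper: collects every double-quoted value that ends in ".mp3".
    procedure ExtractMp3Links(const Source: string; Links: TStrings);
    var
      Lower: string;
      EndPos, StartPos: Integer;
    begin
      Lower := LowerCase(Source);  // search case-insensitively, copy from the original
      EndPos := PosEx('.mp3"', Lower, 1);
      while EndPos > 0 do
      begin
        // Walk back to the opening quote of the value.
        StartPos := EndPos - 1;
        while (StartPos > 0) and (Source[StartPos] <> '"') do
          Dec(StartPos);
        if StartPos > 0 then
          Links.Add(Copy(Source, StartPos + 1, EndPos + 3 - StartPos));
        EndPos := PosEx('.mp3"', Lower, EndPos + 1);
      end;
    end;

    var
      Links: TStringList;
      I: Integer;
    begin
      Links := TStringList.Create;
      try
        ExtractMp3Links(
          '<a href="http://example.com/a.mp3">A</a> <a href="b.MP3">B</a>', Links);
        for I := 0 to Links.Count - 1 do
          Writeln(Links[I]);
      finally
        Links.Free;
      end;
    end.
    ```

    Because the helper takes a TStrings parameter, code that currently passes Memo1.Lines could pass a TStringList instead without other changes.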