multithreading multicore multiprocessor

In a multithreaded app, would a multi-core or multiprocessor arrangement be better?


I've already read a lot on this topic, both here (e.g., stackoverflow.com/questions/1713554/threads-processes-vs-multithreading-multi-core-multiprocessor-how-they-are or multi-CPU, multi-core and hyper-thread) and elsewhere (e.g., ixbtlabs.com/articles2/cpu/rmmt-l2-cache.html or software.intel.com/en-us/articles/multi-core-introduction/), but I'm still not sure about a couple of things that seem very straightforward. So I thought I'd just ask.

(1) Is a multi-core processor in which each core has a dedicated cache effectively the same as a multiprocessor system (assuming, of course, comparable processor speed, cache size, and so on)?

(2) Let's say I have some images to analyze (i.e., computer vision), and I have these images loaded into RAM. My app spawns a thread for each image that needs to be analyzed. Will this app on a shared cache multi-core processor run slower than on a dedicated cache multi-core processor, and would the latter run at the same speed as on an equivalent single-core multiprocessor machine?

Thank you for the help!


Solution

  • The size of the cache is important. For the sake of this answer I'm assuming x86 processors and considering only the L2 cache, which is shared between the two cores on many dual-core processors.

    If you are comparing two single-core processors with one dual-core processor, and each single-core processor has the same amount of data cache as the whole dual-core processor (all running at the same speed), then the two-processor setup has more total cache. More of each image can fit in cache, so if the processing has to load and/or store the image data repeatedly, it will very likely run faster at the same clock speed.

    If you are comparing two single-core processors with one dual-core processor whose data cache is twice the size of each single-core processor's data cache, then about half of the shared cache will be used for each core's work. It is quite likely that, in addition to the image data each independent thread uses, there will be some shared data. If that shared data sits in the shared cache, the two cores can share it more easily than in the 2x single-core setup. There, for each chunk of shared data, one of the caches would hold it, and there would be a little overhead whenever the other processor needed that data.

    Dual-core machines also make it cheaper for a thread to migrate from one core to the other on the same processor module: with a shared cache, the new core's cache does not have to be filled from scratch while the old processor's cache is left holding data it no longer needs.

    I'd suggest that, whatever you end up with, you experiment with limiting the number of threads to 3 to 10 per core at any time for general use. The threads all compete for the same cache space, so with too many of them, all of one thread's data gets pushed out before that thread is rescheduled. Also, if each thread loops over a few image files, you gain a little because there are fewer stacks, which encourages each thread's stack space to stay in cache. You also reduce the amount of memory the OS needs to keep track of threads.

    Your biggest win comes from overlapping processing with slow access, such as disk, network, or human interaction, so all you need is just enough threads to keep the CPUs busy processing.
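
As a rough illustration of the last two suggestions, here is a minimal Python sketch, not a definitive implementation. It caps the number of worker threads at a small multiple of the core count, has each worker loop over several images, and lets one worker's file read overlap with the others' processing. The names `analyze`, `process_batch`, `analyze_all`, and `THREADS_PER_CORE` are hypothetical, `analyze` is only a stand-in for real image analysis, and reading the images from disk is an assumption made to show the overlap. Note that in CPython, pure-Python computation is serialized by the global interpreter lock, so the threads here mainly overlap the file reads; real per-core speedups would need analysis code that releases the lock.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumption: low end of the 3-10 threads-per-core range; tune by measuring.
THREADS_PER_CORE = 3


def analyze(pixels: bytes) -> int:
    """Hypothetical stand-in for real image analysis: sums the byte values."""
    return sum(pixels)


def process_batch(paths):
    """One worker loops over several images, so fewer thread stacks compete for cache."""
    results = {}
    for path in paths:
        # Slow access: while this thread waits on the disk, other workers keep running.
        with open(path, "rb") as f:
            pixels = f.read()
        results[path] = analyze(pixels)
    return results


def analyze_all(paths, threads_per_core=THREADS_PER_CORE):
    """Analyze every file in `paths` with a bounded number of worker threads."""
    paths = list(paths)
    if not paths:
        return {}
    cores = os.cpu_count() or 1
    # Never start more workers than there are images or than the per-core cap allows.
    workers = max(1, min(len(paths), cores * threads_per_core))
    # Deal the images round-robin into one batch per worker.
    batches = [paths[i::workers] for i in range(workers)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_batch, batches):
            merged.update(partial)
    return merged
```

Calling `analyze_all(list_of_image_paths)` returns a dictionary mapping each path to its result. Varying `threads_per_core` and timing the run is one way to do the experiment suggested above.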