Stackless Python doesn't make good use of multiple cores, so why should it be faster than Python threads or multiprocessing?
All the benchmarks compare Stackless Python tasklets against Python threads that use locks and queues. That seems unfair, because locking always adds overhead.
If you use plain single-threaded function calls without locks, it should be just as efficient as Stackless Python.
Focus on functionality first, and performance second (unless you know you have the need).
Most of the time on a server is spent waiting on I/O, so multiple cores do not help much. If your workload is mostly I/O, multi-threaded Python may be the simplest answer.
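To illustrate the I/O-bound case, here is a minimal sketch using a thread pool from the standard library. The names fake_io_request and handle_all are made up for this example, and time.sleep is only an assumed stand-in for a real blocking socket or disk call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_request(request_id, delay=0.05):
    # Hypothetical handler: time.sleep stands in for blocking I/O.
    # Blocking calls like this release the GIL, so other threads can run.
    time.sleep(delay)
    return request_id * 2


def handle_all(request_ids, workers=8):
    # The waits overlap across threads, so the total time is roughly
    # one delay per batch of `workers` requests instead of one per request.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_io_request, request_ids))


if __name__ == "__main__":
    print(handle_all(range(5)))
```

Threads are enough here because each one spends nearly all of its time waiting, not computing.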
If the server requests are CPU intensive, then having a parent process (multi-threaded or not) that hands work to child processes makes a good bit of sense.
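For the CPU-bound case, a rough sketch of the parent/child arrangement with the standard multiprocessing module could look like this. cpu_heavy and run_in_children are hypothetical names, and the sum of squares is only a placeholder for real work.

```python
from multiprocessing import Pool


def cpu_heavy(n):
    # Placeholder for a CPU-bound request: pure Python arithmetic
    # that would hold the GIL if it ran in a thread.
    return sum(i * i for i in range(n))


def run_in_children(inputs, processes=4):
    # The parent hands the inputs to child processes. Each child has its
    # own interpreter and its own GIL, so the work can use several cores.
    with Pool(processes=processes) as pool:
        return pool.map(cpu_heavy, inputs)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on
    # platforms that start children by importing the main module.
    print(run_in_children([10, 100, 1000]))
```

The cost is that arguments and results are pickled and copied between processes, so this pays off only when each task does a meaningful amount of computation.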
If you really want to scale, you could look at a different platform, like Erlang. If you really want to scale and still use Python, you could look at distributed Erlang with Python processes managed as Erlang ports on a distributed cluster.
Lots of options, but unless you are dealing with something really big, you could most likely take a simple approach.
Release early, release often.