I'm curious what the impact of thread creation is on a Netduino running the .NET Micro Framework. It's commonly understood that threads carry inherent overhead, but I'm wondering whether .NET Micro has any optimizations for this in an embedded environment, and whether anyone can give me some detail on what happens under the hood with a thread here (how much memory is allocated, how many cycles it takes to create, etc.).
In my experience there's roughly a 1K memory cost for each thread under NETMF. As for the time required to create a thread, if you're contemplating questions like that it's probably time to do a bit of reading on embedded systems best practice. I'm not mocking you; there's quite a bit of hard-won lore that can save you heartache and hassle. Case in point, the thread thing. If you want reliability you have to guarantee maximum resource demand. If you're going to say "no more than 5 threads" then you may as well start all five as part of your initialisation process, and allocate all the resources they're going to want. If you can't do that then you can't guarantee the stability of your system under load. A side effect of this is that the time required to start them is irrelevant to the responsiveness of your system, although it does affect boot time slightly.
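To make the "start everything at initialisation" idea concrete, here is a minimal sketch. The names (Worker, FixedWorkers, Signal, StopAll) and the 64-byte buffer size are made up for illustration; only the budget of five threads comes from the example above. It sticks to basic System.Threading types, which I'd expect NETMF to have, but I haven't run this exact code on a Netduino.

```csharp
using System.Threading;

// Hypothetical worker: its buffer and wake-up event are allocated once,
// in the constructor, and reused for the life of the system.
public sealed class Worker
{
    private readonly byte[] buffer;
    private readonly AutoResetEvent wake = new AutoResetEvent(false);
    private volatile bool running = true;
    private volatile int handled;

    public Worker(int bufferSize)
    {
        buffer = new byte[bufferSize];
    }

    public int Handled { get { return handled; } }

    public void Signal() { wake.Set(); }

    public void Stop() { running = false; wake.Set(); }

    public void Run()
    {
        while (true)
        {
            wake.WaitOne();          // sleep until there is work (or a stop request)
            if (!running) return;
            buffer[0]++;             // placeholder for real work in the fixed buffer
            handled++;
        }
    }
}

// Hypothetical fixed pool: every thread is created and started up front,
// so the worst-case resource demand is paid (and visible) at boot.
public static class FixedWorkers
{
    public const int MaxThreads = 5;   // the "no more than 5 threads" budget
    public const int BufferSize = 64;  // assumed per-worker scratch size

    private static readonly Worker[] workers = new Worker[MaxThreads];
    private static readonly Thread[] threads = new Thread[MaxThreads];

    public static void StartAll()
    {
        for (int i = 0; i < MaxThreads; i++)
        {
            workers[i] = new Worker(BufferSize);
            threads[i] = new Thread(new ThreadStart(workers[i].Run));
            threads[i].Start();
        }
    }

    public static void Signal(int index) { workers[index].Signal(); }

    public static int Handled(int index) { return workers[index].Handled; }

    public static void StopAll()
    {
        for (int i = 0; i < MaxThreads; i++) workers[i].Stop();
        for (int i = 0; i < MaxThreads; i++) threads[i].Join();
    }
}
```

You'd call FixedWorkers.StartAll() once at the top of Main and from then on only call Signal. If the board can't get through StartAll at boot, you find out immediately rather than under load. StopAll is only there so the sketch can shut down cleanly in a test; a real device would probably never call it.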
There is overhead for context switching. I can't give you quantified figures because I've never needed to benchmark it. NETMF is implemented right on the metal; more than likely you can get some insight from the SoC documentation which you can download from ATMEL. Or if you ask on the netduino forums there's a fair chance Chris can tell you off the cuff.
If this is a homework question then take Hans' advice and look at the source code. If you're looking to build something and are assessing the platform's suitability for an application, then it may be of interest that I have never suffered from switching lag when doing timing-sensitive things on different threads. That said, I never use more than three or four threads, and one of them services a number of logical processes (all the timing-insensitive stuff) in round-robin fashion.
Once again, the key to long term stability is to avoid dynamic allocation of anything.
An advantage of explicitly coded round robin is that you have control of sequence for the logical processes.
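For what it's worth, the round-robin thread can be as simple as the sketch below. The names (ILogicalProcess, Counter, RoundRobin) and the 10 ms sleep are invented for the example, not anything NETMF provides. The process array is built once at startup, so nothing is allocated inside the loop, and the array order is the execution order.

```csharp
using System.Threading;

// Hypothetical contract: each logical process does a small slice of work
// per call and returns promptly so the others are not starved.
public interface ILogicalProcess
{
    void Step();
}

// Hypothetical stand-in for a timing-insensitive task (logging, LEDs, etc.).
public sealed class Counter : ILogicalProcess
{
    public int Steps;

    public void Step() { Steps++; }
}

public sealed class RoundRobin
{
    private readonly ILogicalProcess[] processes;

    // The array is created once by the caller; its order is the service order.
    public RoundRobin(ILogicalProcess[] processes)
    {
        this.processes = processes;
    }

    // One pass: every process gets exactly one turn, in array order.
    public void RunOnePass()
    {
        for (int i = 0; i < processes.Length; i++)
        {
            processes[i].Step();
        }
    }

    // Body for the single thread that services all of them.
    public void RunForever()
    {
        while (true)
        {
            RunOnePass();
            Thread.Sleep(10);   // assumed pause so higher-priority threads get time
        }
    }
}
```

Hand RunForever to a single thread created at initialisation, e.g. new Thread(new ThreadStart(scheduler.RunForever)).Start(). To change which process runs first, change the order of the array.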