Multiple process or threads? And, why?

adr1an@programming.dev · 1 month ago

Multiple process or threads? And, why?

jacksilver@lemmy.world · 1 month ago

Threads all run on the same core, processes can run on different cores.

Because threads run on the same core, the only time they can improve performance is if there are non-cpu tasks in your code - usually I/O operations. Otherwise the only thing multi threading can provide is the appearance of parallelism (as the cpu jumps back and forth between threads progressing each in small steps).

On the other hand, multiprocessing allows you to run code on different cores, meaning you can take full advantage of all your processing power. However, if youre program has a lot of I/O tasks, you might end up bottlenecked by the I/O and never see any improvements.

For the example you mentioned, it’s likely threading would be the best as it’s got a little less overhead, easier to program, and you’re task is mostly I/O bound. However, if the calculations are relatively quick, it’s possible you wouldn’t see any improvement as the cpu would still end up waiting for the I/O.

xia@lemmy.sdf.org · 1 month ago

I recall reading a white paper on how multi-processing is pretty easy to debug and get right, but that multi-threading was actually impossible due to cartesian explosion of possible states and multiple writers to the same memory space.

conrad82@lemmy.world · 30 days ago

Yes it is correct. TLDR; threads run one code at the time, but can access same data. processes is like running python many times, and can run code simultaneously, but sharing data is cumbersome.

If you use multiple threads, they all run on the same python instance, and they can share memory (i.e. objects/variables can be shared). Because of GIL (explained by other comment), the threads cannot run at the same time. This is OK if you are IO bound, but not CPU bound

If you use multiprocessing, it is like running python (from terminal) multiple times. There is no shared memory, and you have a large overhead since you have to start up python many times. But if you have large calculations you can do in parallell that takes long time, it will be much faster than threads as it can use all cpu cores.

If these processes need to share data, it is more complicated. You need to use special functions to share data, like queues and pipes. If you need to share many MB of data, this takes a lot of time in my experience (10s of milliseconds).

If you need to do large calculations, using numpy functions or numba may be faster than multiple processes, due to good optimizations. But if you need to crunch a lot of data, multiprocessing is usually the way to go

pelya@lemmy.world · 1 month ago

Speed-wise, multiple processes and multiple threads should be identical, if you are using the same primitives (shared memory, system-wide semaphore).

Threads are easier to use and use less RAM, because all your memory is shared automatically, and system-wide semaphores have complicated API.

dwt@feddit.org · 1 month ago

On python, because of the Gil, multi processing should always be preferred if possible.

logging_strict@programming.dev · 17 days ago

Also logging is not isolated. Bleeds all over the place. Which is a deal breaker

Not worth the endless time doing forensics

Agree! Lets stick with multiprocessing

one thread sounds nice. Lets do much more of that