Global Interpreter Lock (GIL) in Python
What is GIL?
GIL and its impact on performance
The full form of GIL is Global Interpreter Lock and the name itself is extremely instructive. Quite literally, it means a global lock on the interpreter. Let’s get into bit more specifics.
Suppose we have 10 CPU bound threads that have to be executed.
The Python Global Interpreter Lock or GIL, in simple words, is a mutex that allows only one thread to hold the control of the Python interpreter.
This means that only one thread can be in a state of execution at any point in time. Due to which Python is not able to leverage the parallelism with multiple threads. In fact, it gives poorer performance when we use multiple threads with single core and even poorer performance with multiple cores.
#NOTE: (MUTEX => Just to recall, in multi-threaded applications, different threads sometimes share a common resource. MUTEX ensures that only one thread has access to that resource at particular point of time)
Single threaded
Multi threaded
Why this happens
Since mutex allows only one thread to hold the control of the Python interpreter, I can understand the execution time is not faster for multithreaded implementation but why is it actually slower than the single threaded implementation as ideally it should be same as that of single threaded implementation.
The main culprit of this is GIL thread signaling.
After every 100 ticks*, the interpreter:
- Releases the GIL (mutex lock)
- Signals all the runnable threads that the GIL is free, and all threads including the one that released the GIL fight for acquiring the GIL
- Suppose thread 7 is successful in acquiring the GIL. This itself will have additional overhead time: the time spend in waking up the thread 7 from its suspended state and acquiring the GIL
#NOTE: For simplicity let us assume that ticks refer to a line of instruction in byte code.
Single Core Multithreaded implementation
And this process adds additional overhead in not only in terms of resources but also in terms of time due to which we find the overall execution time is higher. The frequency of thread switching is not frequent in this implementation. According to research done by David Beazley, GIL lock may be released hundreds of times before it is actually acquired by another thread.
Multi Core Multithreaded implementation
This scenario is even worse in multi core multithreaded implementation, because apart from the above factors there are certain additional factors also.
Here waiting thread usually have to go through a lot of GIL acquisition failure before any success, because:
Of the conflicting responsibilities of Python and OS. While the OS wants both the threads to be running, Python only allows one thread to run. This increases the GIL battle.
With multiple cores, runnable threads get scheduled simultaneously (on different cores) and battle over the GIL.
Thread 2 is repeatedly signaled, but when it wakes up, the GIL is already gone (reacquired).
Performance in other scenarios:
So far this discussion has mainly revolved around CPU bound threads. Let us see how the situation will play out if some threads are CPU bound while others and I/O bounds.
Single Core Multithreaded implementation:
There is misconception that since I/O is a thread safe library that it does not acquire GIL but it does. However, there is none-to-insignificant performance degradation because there is no significant wait time.
Multi Core Multithreaded implementation:
Here, we find that the even the performance of I/O based thread is degraded and the phenomena is “priority inversion”. A CPU-bound thread (low priority) is blocking the execution of an I/O-bound thread (high priority) and it occurs because the I/O thread can't wake up fast enough to acquire the GIL before the CPU-bound thread reacquires it.
General consensus about GIL
Since the GIL allows only one thread to execute at a time optimally even in a multi-threaded architecture with more than one CPU core, the GIL has gained a reputation as an “infamous” feature of Python.
(The impact of the GIL isn’t visible to developers who execute single-threaded programs, but it can be and has been shown to be performance bottleneck in CPU-bound and multi-threaded code)
So why does GIL prevent multi threading?
Scenario:
Let us suppose we are running a multi-threaded program and python uses reference counting for memory management. It means that objects created in Python have a reference count variable that keeps track of the number of references that point to the object. When this count reaches zero, the memory occupied by the object is released.
Issue:
The problem is that this reference count variable needed protection from race conditions in case of multi threaded implementation as two threads can increase or decrease its value simultaneously. It can cause either leaked memory that is never released or, incorrectly release the memory while a reference to that object still exists or corrupt the data which can crash your python program.
How to handle:
So there are multiple approaches to handle this problem:
This reference count variable can be kept safe by adding locks to all data structures that are shared across threads so that they are not modified inconsistently.
But adding a lock to each object or groups of objects means multiple locks will exist which can cause another problem—Deadlocks (deadlocks can only happen if there is more than one lock). Another side effect would be decreased performance caused by the repeated acquisition and release of locks.
The GIL is a single lock on the interpreter itself which adds a rule that execution of any Python bytecode requires acquiring the interpreter lock. This prevents the problem of memory leak as well as deadlocks (as there is only one lock) and doesn’t introduce much performance overhead.
Reasons why GIL was selected in the first place?
Python has been around since the days when operating systems did not have a concept of threads. The idea behind Python was to be easy-to-use in order to make development quicker and attract more and more developer.
A lot of extensions were being written for the existing C libraries whose features were needed in Python. These C extensions required a thread-safe memory management which the GIL provided.
The GIL is simple to implement and was easily added to Python. It provides a performance increase to single-threaded programs as only one lock needs to be managed.
C libraries that were not thread-safe became easier to integrate. And these C extensions became one of the reasons why Python was readily adopted by different communities.
GIL was a pragmatic solution to a difficult problem that the CPython developers faced early on in Python’s life.
Efforts to improve GIL
Conditions to improve GIL
All the attempts made previously to remove the GIL broke the existing C extensions which depend heavily on the solution that the GIL provides.
And there is huge protest in python community about multi threading aspect of GIL but creator of Python, Guido van Rossum specified that: “I’d welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease”. However, this condition hasn’t been fulfilled by any of the attempts made since.
Why not in Python3?
Removing the GIL would have made Python 3 slower in comparison to Python 2 in single-threaded performance and it would have drastic impact in the Python community. And the single-threaded performance benefits of the GIL is non debatable and outweighs its limitation. So the result is that Python 3 still has the GIL.
However, there is a change
This is a patch uploaded by Antoine Pitrou. The simplicity makes you wonder why it was not attempted earlier. So, let us see step by step the logic behind the patch.
Earlier we saw how GIL was release after certain number of (100) ticks and then all the runnable threads competed with each other to acquire the GIL. It solves that problem to some extent.
Instead of ticks, there is now a global variable.
A thread runs until the value gets set to 1, after which the thread must drop the GIL.
Suppose there are two threads executing thread T1 and waiting thread T2
Waiting thread does a timed cv_wait on GIL. T2 waits to see if the GIL gets released voluntarily by T1 (e.g., if there is I/O or it goes to sleep for some reason or its execution is complete)
Now If timeout happens, T2 sets and send gil_drop_request, due to which T1 suspends its execution. Since, it is kind of a forced release T1 waits for an acknowledgement from the thread that sent gil_drop_request (T2) that it is in running state.
This eliminates the GIL Battle between all the runnable threads, which in turn does not negatively affect the performance in multi threaded scenarios. However, it still does not provide any performance boost.
But has its pitfalls, it impacts I/O performance
To handle I/O, a thread must go through the entire time-out sequence to get control which ignores the higher priority of I/O.
Workarounds to perform parallel programing
Multi-processing
The most popular way is to use a multi-processing approach where you use multiple processes instead of threads. Each Python process gets its own Python interpreter and memory space so the GIL won’t be a problem.
NOTE 1: Certain libraries while performing certain operations do not require GIL, for example certain modules of numpy do not use GIL, although I am not completely sure about the reason but my guess is it uses pure C implementation.
NOTE 2: There are multiple implementations of Python such as CPython, IronPython, RPython etc. GIL is implemented in some of them, like CPython.
References:
https://www.dabeaz.com/python/UnderstandingGIL.pdf
http://www.dabeaz.com/python/GIL.pdf
https://realpython.com/python-gil/
https://callhub.io/understanding-python-gil/