how will it be turned off

As a rule, Python developers know what the GIL is and why it is needed; questions about it come up in most interviews, and I like asking them myself. But CPython will soon be able to do without it: the core developers have set a course toward removing it.

You can’t just go and remove the GIL

My name is Denis, and I am a Python developer. Today I would like to walk you through PEP 703, “Making the Global Interpreter Lock Optional in CPython”, one of the most interesting projects in the CPython world; work on it began in January 2023.

This article may interest anyone who works with Python, as well as people interested in how programming languages are built in general. It is not an exact translation but a concise, free retelling of the concepts, without diving into implementation details. You can always read the original PEP 703 yourself.

What is the PEP about?

You probably know that active work on speeding up CPython is under way: the Faster CPython, Subinterpreters, Per-Interpreter GIL, and No-GIL projects. We will look at part of the last one.

At the moment, the GIL is the main obstacle to true thread-level parallelism. Habr already has a translation of an excellent article about the current GIL implementation.

PEP 703 lays out a detailed plan for implementing the --disable-gil build flag. With it, anyone who wants to will be able to build Python from source with the GIL disabled. The Steering Council approved the changes for implementation starting with version 3.13, with the caveat that they may be partially or fully rolled back. The document contains a large motivation section; we will consider only the proposed techniques.
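To see which kind of interpreter you are running, something like the sketch below can help. It relies on names from the released CPython 3.13 rather than from the text above: the Py_GIL_DISABLED build variable and the sys._is_gil_enabled() function. On older versions the function does not exist, so the sketch assumes the GIL is active there. The helper name gil_status is made up for this example.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on builds configured with --disable-gil;
    # on other builds it is 0 or missing (None).
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() exists only on CPython 3.13+; when it is missing,
    # assume the GIL is active.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)

    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": bool(is_gil_enabled()),
    }


if __name__ == "__main__":
    print(gil_status())
```

On a regular build both checks should report that the GIL is in place; only a free-threaded build can report it as disabled.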

Main ideas

Removing the GIL affects many parts of the language, so all the changes have been divided into four categories:

  • reference counting,

  • memory management,

  • thread safety of containers (that is, data structures such as list and dict),

  • locking and atomic APIs.

The PEP author doesn’t exactly follow this structure, so I’ll go through the points of interest simply in order.

Reference counting

Each Python object has a reference counter (ob_refcnt). Up to and including version 3.12, reference counters are changed only while the executing thread holds the GIL, which prevents true parallelism.

The reference counter changes only while the GIL is held

How it worked:

  • one of the threads acquires the GIL on the next iteration of the evaluation loop,

  • executes the next bytecode instruction,

  • changes counter values while executing the instruction,

  • checks whether other threads are waiting for the GIL and whether the switch interval has elapsed, and decides whether to release it.

This implementation rules out races between threads and ensures safe handling of Python objects, but it prevents bytecode from being executed in more than one thread at a time.
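The switch interval mentioned in the last step can be read and changed from Python with sys.getswitchinterval() and sys.setswitchinterval(). The small context manager below is only an illustration (the name switch_interval is made up here): it temporarily changes the interval and restores the previous value afterwards.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds: float):
    """Temporarily change how often a running thread is asked to release the GIL."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        # Always restore the previous value, even if the body raised.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("current interval:", sys.getswitchinterval())
    with switch_interval(0.01):
        print("inside the block:", sys.getswitchinterval())
    print("restored:", sys.getswitchinterval())
```

A longer interval means fewer thread switches and less overhead, at the price of worse responsiveness for the waiting threads.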

How does PEP 703 propose to remedy the situation? With the following approaches:

  • biased reference counting (hereinafter BRC),

  • immortalization,

  • deferred reference counting (hereinafter DRC).

Let’s analyze each of the techniques.

Biased reference counting is based on the observation that even in multi-threaded applications, most objects are accessed only by the thread that created them. Knowing this, we can make reference counting cheaper for the thread that owns the object (for example, by using ordinary non-atomic operations for it). To that end, every PyObject will get two reference counters:

  • local (ob_ref_local): for the thread that owns the object,

  • shared (ob_ref_shared): for all other threads; it has to be changed atomically.

Fast and slow paths for changing the counters from different threads
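The two paths can be modelled in pure Python. This is a toy model, not CPython's actual object layout: the class name BiasedRefCount and its fields are made up for this sketch, and an ordinary lock stands in for the atomic instruction that real code would use on the shared counter.

```python
import threading


class BiasedRefCount:
    """Toy model of biased reference counting."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()      # thread that created the object
        self.ref_local = 1                      # changed only by the owner, unsynchronized
        self.ref_shared = 0                     # changed by every other thread
        self._shared_guard = threading.Lock()   # stands in for an atomic add

    def _add(self, delta: int) -> None:
        if threading.get_ident() == self.owner:
            self.ref_local += delta             # fast path: plain arithmetic
        else:
            with self._shared_guard:            # slow path: synchronized update
                self.ref_shared += delta

    def incref(self) -> None:
        self._add(1)

    def decref(self) -> None:
        self._add(-1)

    def total(self) -> int:
        with self._shared_guard:
            return self.ref_local + self.ref_shared


if __name__ == "__main__":
    obj = BiasedRefCount()
    obj.incref()                                # owner thread: fast path
    workers = [threading.Thread(target=obj.incref) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(obj.ref_local, obj.ref_shared, obj.total())
```

The owner never pays for synchronization, which is exactly the case the observation above says is the most common one.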

Immortalization is a technique in which statically allocated objects, such as True, False, None, small integers from -5 to 256, interned strings, and so on, are marked as immortal, and reference counter operations on them become no-ops.

Counters do not change for immortal objects
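The idea fits into a few lines of Python. The sketch below is a model only: the sentinel IMMORTAL and the class Counted are invented for the example, and CPython marks immortal objects with a reserved counter value of its own.

```python
IMMORTAL = -1  # hypothetical sentinel meaning "never count, never free"


class Counted:
    """Toy object with a reference counter that can be made immortal."""

    def __init__(self, name: str, immortal: bool = False) -> None:
        self.name = name
        self.refcnt = IMMORTAL if immortal else 1

    def incref(self) -> None:
        if self.refcnt != IMMORTAL:      # no-op for immortal objects
            self.refcnt += 1

    def decref(self) -> bool:
        """Return True when the object should be deallocated."""
        if self.refcnt == IMMORTAL:      # no-op for immortal objects
            return False
        self.refcnt -= 1
        return self.refcnt == 0


if __name__ == "__main__":
    none_like = Counted("None", immortal=True)
    for _ in range(1000):
        none_like.incref()
    print(none_like.refcnt == IMMORTAL)  # the counter was never touched
```

Because the counter of an immortal object is never written, threads do not contend for it at all.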

Some objects in Python, such as global functions or imported modules, may not live for the entire lifetime of the program (for example, you can remove them from scope via del, thereby decreasing the reference counter), yet they are most often accessed from many different threads.

Immortalization cannot be applied in this case. This is where deferred reference counting comes on stage.

With deferred reference counting, the true count is computed during garbage collection

As a rule, the reference counter changes when an object is pushed onto the interpreter stack or popped from it. For objects that use DRC, some of these counter operations will be skipped, although the interpreter will still keep track of them in some way. As a result, the stored counter value for such objects is no longer accurate. The true value equals the stored value plus all skipped operations; in particular, the stored value can be negative. The true value is computed during garbage collection.

As is probably clear by now, all of the changes listed above require the garbage collector to be reworked.
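A rough model of this bookkeeping is shown below. All names (DeferredObject, ThreadStack, true_refcount) are invented for the sketch, and it assumes the simplest possible rule: pushing a deferred object onto a thread's stack does not touch its counter, and the collector later adds the stack references back.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredObject:
    name: str
    stored_refcnt: int = 0   # counts only the references that were recorded


@dataclass
class ThreadStack:
    items: list = field(default_factory=list)

    def push(self, obj: DeferredObject) -> None:
        self.items.append(obj)           # deliberately no incref

    def pop(self) -> DeferredObject:
        return self.items.pop()          # deliberately no decref


def true_refcount(obj: DeferredObject, stacks: list) -> int:
    """What the collector computes while all threads are paused."""
    skipped = sum(1 for stack in stacks for item in stack.items if item is obj)
    return obj.stored_refcnt + skipped


if __name__ == "__main__":
    func = DeferredObject("func", stored_refcnt=1)   # e.g. one reference from a module
    stack_a, stack_b = ThreadStack(), ThreadStack()
    stack_a.push(func)
    stack_b.push(func)
    print(func.stored_refcnt, true_refcount(func, [stack_a, stack_b]))
```

The stored counter stays untouched however many threads use the object, so the hot path has no shared writes; the price is that the exact value is known only at collection time.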

Memory management

CPython currently uses its own allocator, pymalloc (on this topic I recommend the documentation of Memray, an excellent memory profiler). It is well optimized for allocating small objects, but it is not thread-safe without the GIL. The PEP proposes replacing it with mimalloc, a thread-safe allocator.

The most important point here is that Python objects must be allocated only through the corresponding API and, conversely, that this API must be used only for Python objects.

Garbage collection

The garbage collector (hereinafter GC) will require the following changes:

  • using “stop-the-world” to enforce guarantees previously provided by GIL,

  • transition from GC with generations to GC without generations to reduce the number of stop-the-worlds,

  • integration with DRC and BRC.

Without the GIL we cannot guarantee that reference counters stay unchanged while garbage is being collected and reference cycles are being detected, so all threads executing bytecode have to be suspended. The current GC implementation needs two passes to detect cycles, so the new implementation will use two stop-the-world pauses.

Stop this world.

To make it possible to suspend threads, a new field, status, will be added to the PyThreadState structure. It can take the following values: ATTACHED, DETACHED, GC.

The first two behave much like acquiring and releasing the GIL: before accessing or modifying an object, a thread has to enter the appropriate state. The main difference is that more than one thread can now have access to objects, that is, be in the ATTACHED state.

During stop-the-world, the thread performing garbage collection must ensure that all other threads either are not accessing or modifying objects or have been moved to the GC state (from the DETACHED state). Threads in the ATTACHED state receive a suspension request and switch themselves to the GC status. After the garbage has been collected, the threads return to their previous states.
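The state transitions can be sketched as a small single-threaded simulation. Only the three status names come from the text; everything else (the ThreadState class, the pause_requested flag, the safe_point hook, the helper functions) is hypothetical and exists just to make the flow visible.

```python
from enum import Enum


class Status(Enum):
    ATTACHED = "attached"
    DETACHED = "detached"
    GC = "gc"


class ThreadState:
    def __init__(self, name: str, status: Status) -> None:
        self.name = name
        self.status = status
        self.previous = status
        self.pause_requested = False

    def safe_point(self) -> None:
        """Hypothetical check an ATTACHED thread runs between instructions."""
        if self.pause_requested and self.status is Status.ATTACHED:
            self.previous = Status.ATTACHED
            self.status = Status.GC          # the thread parks itself
            self.pause_requested = False


def stop_the_world(threads: list) -> None:
    for t in threads:
        if t.status is Status.DETACHED:
            t.previous = Status.DETACHED
            t.status = Status.GC             # the collector moves it directly
        elif t.status is Status.ATTACHED:
            t.pause_requested = True         # the thread must react on its own


def world_stopped(threads: list) -> bool:
    return all(t.status is Status.GC for t in threads)


def start_the_world(threads: list) -> None:
    for t in threads:
        t.status = t.previous                # everyone returns to the earlier state


if __name__ == "__main__":
    threads = [ThreadState("a", Status.ATTACHED), ThreadState("b", Status.DETACHED)]
    stop_the_world(threads)
    print(world_stopped(threads))            # "a" has not reached a safe point yet
    threads[0].safe_point()
    print(world_stopped(threads))
    start_the_world(threads)
```

The collector may start only once every thread is in the GC state, which is why attached threads have to reach a point where they can pause safely.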

Thread safety of containers

Thanks to the GIL, operations on built-in types such as list, set, and dict are thread-safe. Without it, calls like list.extend(iterable) become non-atomic, since iterable may implement the iterator protocol in Python code. To preserve the expected behavior, the PEP proposes a mutex for every container. However, this approach cannot provide exactly the same guarantees as the GIL. For example, the same list.extend(iterable) operation would require locking both containers at once.
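A per-container mutex can be imitated in Python as follows. This is a model under simplifying assumptions, not how CPython does it in C: LockedList is a made-up class, and its lock is not re-entrant, so the iterable passed to extend must not touch the same list.

```python
import threading


class LockedList:
    """Toy list that carries its own mutex."""

    def __init__(self) -> None:
        self._items = []
        self._lock = threading.Lock()

    def extend(self, iterable) -> None:
        # The lock is held for the whole call, so other operations on THIS
        # list cannot interleave with it. Nothing here protects the source
        # iterable itself.
        with self._lock:
            for item in iterable:
                self._items.append(item)

    def snapshot(self) -> list:
        with self._lock:
            return list(self._items)


if __name__ == "__main__":
    shared = LockedList()
    workers = [
        threading.Thread(target=shared.extend, args=(iter(range(1000)),))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(shared.snapshot()))
```

Each extend lands in the list as one contiguous block, but the source is left unprotected, which is the gap in guarantees described above.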

Under the new rules, the following construct becomes unsafe even at the C level, because another thread can modify the list and release item between the two calls:

PyObject *item = PyList_GetItem(list, idx);
Py_INCREF(item);

The proposed solution is to introduce new functions that return objects whose counters have already been incremented, that is, new references rather than the borrowed references returned by the existing functions. For example, PyDict_FetchItem will be used instead of PyDict_GetItem.

However, as many have probably guessed, moving mutexes down to the object level can cause deadlocks, since threads usually operate on more than one object at a time (the same list.extend(iterable)).

To deal with this, the PEP introduces the concept of “Python critical sections”, in which a mutex is implicitly released and reacquired under certain conditions. The main idea is that a thread should hold only one mutex at a time. We will not go into the details here.
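To make the deadlock concrete, here is a sketch of a different and simpler remedy than critical sections: classic lock ordering. When two containers have to be locked at once, every thread acquires the locks in the same global order (here, by id), so two threads can never wait on each other in a cycle. Container and extend_from are made-up names, and this is not a description of the PEP's mechanism.

```python
import threading


class Container:
    def __init__(self, items=()) -> None:
        self.items = list(items)
        self.lock = threading.Lock()


def extend_from(dst: Container, src: Container) -> None:
    """Append all items of src to dst while holding both locks."""
    if dst is src:
        with dst.lock:                      # one lock only, never taken twice
            dst.items.extend(list(dst.items))
        return
    # Same acquisition order in every thread, whatever the argument order is.
    first, second = sorted((dst, src), key=id)
    with first.lock, second.lock:
        dst.items.extend(src.items)


if __name__ == "__main__":
    a, b = Container([1, 2]), Container([3])
    t1 = threading.Thread(target=lambda: [extend_from(a, b) for _ in range(5)])
    t2 = threading.Thread(target=lambda: [extend_from(b, a) for _ in range(5)])
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(len(a.items), len(b.items))
```

Without the sorting step, one thread could hold the lock of a while waiting for b, and the other could hold b while waiting for a.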

Locking and Atomic APIs

For the FetchItem and GetItem methods of dict and list, there will be a way to avoid acquiring the object’s mutex (that is, to do without critical sections) when the object is not being modified by other threads at the same time. This is called optimistically avoiding locking.

It was decided to do so for the following reasons:

  • dictionaries are used to look up global functions and class methods, which are accessed by many threads. Locking in these cases would hurt the scalability of multi-threaded applications,

  • the need to reduce the cost that acquiring mutexes imposes on single-threaded programs.

As mentioned earlier, fetching an element from a container and changing its reference counter is not atomic: it takes at least two steps, which is why they are guarded by a mutex. In the two cases above, the PEP proposes a conditional increment, which is performed only if the reference counter has not dropped to zero; the mechanism is similar to read-copy-update (RCU).

If an element cannot be fetched optimistically, the attempt is repeated inside a critical section.
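The optimistic path followed by a fallback can be modelled like this. It is a sketch under assumptions: Obj, OptimisticList, and try_incref are invented names, a small lock inside Obj stands in for atomic instructions, and the container lock plays the role of the critical section.

```python
import threading


class Obj:
    def __init__(self, value) -> None:
        self.value = value
        self.refcnt = 1
        self._guard = threading.Lock()   # stands in for atomic instructions

    def try_incref(self) -> bool:
        """Conditional increment: succeeds only while the counter is non-zero."""
        with self._guard:
            if self.refcnt == 0:
                return False
            self.refcnt += 1
            return True

    def decref(self) -> None:
        with self._guard:
            self.refcnt -= 1


class OptimisticList:
    def __init__(self, objs) -> None:
        self._items = list(objs)
        self._lock = threading.Lock()    # per-container mutex, slow path only

    def fetch(self, index: int) -> Obj:
        # Fast path: read the slot without taking the container lock.
        item = self._items[index]
        if item.try_incref():
            if self._items[index] is item:   # slot unchanged, reference is valid
                return item
            item.decref()                    # lost a race: undo and fall back
        # Slow path: retry while holding the container lock.
        with self._lock:
            item = self._items[index]
            if not item.try_incref():
                raise RuntimeError("element died while the container was locked")
            return item


if __name__ == "__main__":
    live = Obj("a")
    container = OptimisticList([live])
    print(container.fetch(0) is live, live.refcnt)
```

When nobody is modifying the container, a read costs one conditional increment and one re-check, with no mutex involved.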

What’s next?

As the next step, 2-3 years after the 3.13 release (that is, in 2026-2027), the core developers propose moving the GIL switch from a build flag to runtime, for example by adding an environment variable, while keeping the GIL enabled by default.

Then, after another 2-3 years (2028-2030), the GIL will be disabled by default, although it will still be possible to enable it with a flag or a variable. This will allow all Python projects to migrate smoothly to the new paradigm. But, as you can see, that will not happen soon.

Conclusion

PEP 703 contains much more information than is covered here. Personally, I found it interesting just to get acquainted with the concepts listed above. Let’s see what CPython 3.13 brings us from all this, and whether the changes have a positive effect. I will be happy to test --disable-gil on my projects.

What do you think about all this?
