impact on multitasking and productivity

Hello, dear readers!

The GIL, or Global Interpreter Lock, has been a topic of discussion and debate among Pythonistas for decades.

What is the GIL? The GIL, short for Global Interpreter Lock, is an important concept in Python (more precisely, in CPython, the reference implementation). It is a mutex that protects access to the interpreter's internal state in multithreaded programs, allowing only one thread to execute Python bytecode at a time. This mechanism takes care of data safety and integrity, but it also becomes a stumbling block for those who want to maximize multitasking and use the full potential of multi-core processors.

When we talk about multitasking in Python, we mean using multiple threads or processes to perform different tasks. This is especially relevant for applications that require real-time data processing or simultaneous execution of a large number of tasks. However, the GIL limits this, as only one thread can hold the Python interpreter at a given time.

The earliest versions of Python had no thread support and therefore no GIL. When Python began to be used for multi-threaded applications, it became apparent that there were problems with concurrent access to shared resources. That’s why Guido van Rossum and the development team implemented the GIL to ensure safe handling of Python’s memory and objects.

The GIL was not introduced as an intentional limitation, but rather as a necessary measure to ensure safety in the midst of multitasking.

Python was designed with an emphasis on simplicity and ease of development, and many of Python’s built-in data structures, such as lists and dictionaries, are mutable: they can be modified in place during program execution. This makes Python easy to use, but also creates potential problems in a multi-threaded environment. Without the GIL, multiple threads could modify these data structures at the same time, leading to unpredictable behavior and data races.

An important milestone was the use of the GIL in version 1.5 of Python. From that point on, the GIL remained a fundamental part of the Python core. Over time, as the language evolved, developers made attempts to improve multitasking and make the GIL less restrictive.

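Python 3.2 introduced a reworked GIL implementation: instead of switching threads after a fixed number of bytecode instructions, the interpreter asks the running thread to release the GIL after a time-based switch interval (5 milliseconds by default). This gave a small performance boost in certain cases.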
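
In current CPython, a thread that holds the GIL is asked to release it after a configurable switch interval. The following minimal sketch only reads and adjusts that interval through sys.getswitchinterval() and sys.setswitchinterval(); the variable names are illustrative.

```python
import sys

# Interval (in seconds) after which the interpreter asks the running
# thread to release the GIL so that another thread can run.
default_interval = sys.getswitchinterval()
print(f"Switch interval: {default_interval} s")

# The interval can be tuned; here it is doubled and then restored.
sys.setswitchinterval(default_interval * 2)
doubled_interval = sys.getswitchinterval()
sys.setswitchinterval(default_interval)
```

Changing the interval only affects how often threads take turns; it does not let two threads run Python bytecode at the same moment.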

How the GIL works

The GIL is a mutex that allows only one thread to execute Python bytecode at a time. This means that in a multi-threaded Python program, only one thread can be actively executing Python code at any given moment.

Example:

import threading

def worker():
    for _ in range(1000000):
        pass

# Create two threads
thread1 = threading.Thread(target=worker)
thread2 = threading.Thread(target=worker)

# Start the threads
thread1.start()
thread2.start()

# Wait until both threads have finished
thread1.join()
thread2.join()

In the given example, two threads run the function worker, which simply executes a loop. However, due to the GIL, only one of the threads executes Python code at a given time. This limitation can significantly affect performance, especially in CPU-bound multitasking applications.
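
One way to observe this effect is to time the same CPU-bound loop run twice in a row and then in two threads. This is a rough sketch, not a benchmark: the helper names run_sequential and run_threaded and the iteration count are assumptions made for illustration, and the numbers depend on the machine and the Python build.

```python
import threading
import time

def worker(iterations=200_000):
    # Pure-Python loop: CPU-bound, so the thread needs the GIL to run it
    for _ in range(iterations):
        pass

def run_sequential():
    # Run the loop twice, one call after the other
    start = time.perf_counter()
    worker()
    worker()
    return time.perf_counter() - start

def run_threaded():
    # Run the same two calls in separate threads
    threads = [threading.Thread(target=worker) for _ in range(2)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start

sequential_time = run_sequential()
threaded_time = run_threaded()
print(f"Sequential:  {sequential_time:.3f} s")
print(f"Two threads: {threaded_time:.3f} s")
```

On a standard CPython build with the GIL, the two timings are usually close to each other, because the threads take turns instead of running in parallel.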

Python provides the built-in threading module for working with threads. Importantly, the GIL exists only at the level of the Python interpreter and does not depend on the operating system. Therefore, even if your operating system schedules threads across several cores, the GIL can limit the use of multiple processor cores.

To work with threads in Python, you can create instances of the Thread class from the threading module and start them. It’s important to remember that the GIL limits multitasking at the interpreter level, so Python threads are better suited to tasks that mostly wait for I/O than to data-intensive processing.

Example:

import threading

def print_numbers():
    for i in range(1, 6):
        print(f"Number: {i}")

def print_letters():
    for letter in 'abcde':
        print(f"Letter: {letter}")

# Create two threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

# Start the threads
thread1.start()
thread2.start()

# Wait until both threads have finished
thread1.join()
thread2.join()

In this example, we create two threads that print numbers and letters. The GIL has little effect here: the work is dominated by writing to the screen, which is an I/O operation, and CPython releases the GIL while a thread waits on I/O.
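
The overlap of waiting threads can be made visible with a small timing sketch. Here time.sleep stands in for a blocking I/O call; the function name wait_for_io and the delay value are assumptions chosen for illustration.

```python
import threading
import time

def wait_for_io(delay):
    # time.sleep stands in for a blocking I/O call; it releases the GIL
    time.sleep(delay)

delay = 0.2
threads = [threading.Thread(target=wait_for_io, args=(delay,)) for _ in range(4)]

start = time.perf_counter()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.perf_counter() - start

print(f"4 waits of {delay} s took {elapsed:.2f} s in total")
```

Because the waits overlap, the total should be close to a single delay rather than four times the delay.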

Even with the GIL, threads can produce unexpected results if synchronization is not taken into account. The GIL protects the interpreter itself, not your data: when several threads try to modify the same data, data races (race conditions) can still occur.

Example:

import threading

counter = 0

def increment():
    global counter
    for _ in range(1000000):
        counter += 1

# Create two threads
thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)

# Start the threads
thread1.start()
thread2.start()

# Wait until both threads have finished
thread1.join()
thread2.join()

print("Counter:", counter)

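In this example, two threads increment a shared counter. The GIL does not make counter += 1 atomic: the statement consists of separate read, add, and write steps, and a thread switch can happen between them. The final value can therefore be lower than 2000000 and may differ from run to run, depending on the Python version and thread timing.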
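
A common remedy is to guard the shared counter with an explicit lock. The sketch below is one possible variant of the example above; the names counter_lock and safe_increment are illustrative, and the iteration count is reduced to keep the run short.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def safe_increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write step
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print("Counter:", counter)
```

With the lock in place, every increment is applied, so the final value equals the total number of iterations across both threads. The price is extra locking overhead on each step.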

GIL related issues

Now that we have a better understanding of how the GIL works, it’s time to look at a number of issues surrounding its presence and impact on multitasking and performance in Python. In this section, we’ll introduce ten common problems experienced by professional developers and provide code examples to illustrate each of them.

  1. Limited multitasking: One of the most well-known problems of the GIL is the multitasking limit. Despite having many threads, only one can be actively executing at any given time.

    Example:

    import threading
    
    def count_up():
        for i in range(1000000):
            pass
    
    thread1 = threading.Thread(target=count_up)
    thread2 = threading.Thread(target=count_up)
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
  2. Performance of multitasking applications: Multitasking programs that need to efficiently use many processor cores can experience performance issues because the GIL limits parallel execution.

    Example:

    import threading
    
    def compute_square(num):
        return num * num
    
    def main():
        numbers = list(range(1000))
        results = []
        threads = []
    
        for number in numbers:
            # Bind the current number as a default argument of the lambda
            thread = threading.Thread(target=lambda num=number: results.append(compute_square(num)))
            thread.start()
            threads.append(thread)
    
        # Wait only for the threads created here
        for thread in threads:
            thread.join()
    
    if __name__ == "__main__":
        main()
    
  3. I/O-bound work: The GIL restricts I/O operations far less, because a thread releases it while waiting on blocking I/O. Programs that wait for data from files, the network, and other sources can therefore run relatively well with threads.

    Example:

    import threading
    import requests
    
    def download_url(url):
        response = requests.get(url)
        content_length = len(response.text)
        print(f"Downloaded {url} with {content_length} characters.")
    
    urls = ["https://example.com", "https://example.org", "https://example.net"]
    
    threads = []
    for url in urls:
        thread = threading.Thread(target=download_url, args=(url,))
        thread.start()
        threads.append(thread)
    
    for thread in threads:
        thread.join()
    
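  4. Difficulties with data sharing: Sharing data between threads can be difficult even with the GIL, because it does not protect compound operations on shared objects. Without explicit synchronization, such as the lock in the example below, this can lead to data races and errors.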

    Example:

    import threading
    
    shared_data = []
    lock = threading.Lock()
    
    def append_data(data):
        with lock:
            shared_data.append(data)
    
    thread1 = threading.Thread(target=append_data, args=("Hello",))
    thread2 = threading.Thread(target=append_data, args=("World",))
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
    print(shared_data)
    
  5. Unstable execution time: Due to GIL contention, execution time of code in threads can be unpredictable and vary from run to run.

    Example:

    import threading
    
    def count_up():
        total = 0
        for i in range(1000000):
            total += i
        print(f"Total: {total}")
    
    thread1 = threading.Thread(target=count_up)
    thread2 = threading.Thread(target=count_up)
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
  6. Resource limitations: The GIL also limits how much CPU time the threads of one process can use for Python code: together they get roughly the capacity of a single core, which can be problematic for multitasking applications.

    Example:

    import threading
    import time
    
    def heavy_calculation():
        result = 0
        for _ in range(100000000):
            result += 1
        time.sleep(5)
    
    thread1 = threading.Thread(target=heavy_calculation)
    thread2 = threading.Thread(target=heavy_calculation)
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
  7. Behavior on multiprocessor systems: On multiprocessor systems, the GIL can lead to inefficient use of resources because the remaining cores can sit idle.

    Example:

    import threading
    
    def cpu_bound_task():
        total = 0
        for _ in range(100000000):
            total += 1
        print(f"Total: {total}")
    
    thread1 = threading.Thread(target=cpu_bound_task)
    thread2 = threading.Thread(target=cpu_bound_task)
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
  8. Inefficient use of multi-core processors: The GIL makes Python less efficient on multi-core processors because only one core can execute Python bytecode of a given process at a time.

    Example:

    import threading
    
    def compute_squares(numbers):
        return [x * x for x in numbers]
    
    numbers = list(range(1000000))
    
    thread1 = threading.Thread(target=compute_squares, args=(numbers,))
    thread2 = threading.Thread(target=compute_squares, args=(numbers,))
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
    
  9. Difficulties with the implementation of real multitasking: Because of the GIL, implementing true multitasking in Python can be more complex and resource intensive.

    Example:

    import threading
    
    def perform_task(task_name):
        print(f"Performing task: {task_name}")
    
    tasks = ["Task 1", "Task 2", "Task 3"]
    
    threads = [threading.Thread(target=perform_task, args=(task,)) for task in tasks]
    
    for thread in threads:
        thread.start()
    
    for thread in threads:
        thread.join()
    
  10. Difficulties with parallel data processing: Parallel data processing can be difficult due to the GIL, especially when dealing with large amounts of data.

    Example:

    import threading
    
    def process_data(data):
        result = []
        for item in data:
            result.append(item * 2)
        return result
    
    data = list(range(1000000))
    
    thread1 = threading.Thread(target=process_data, args=(data,))
    thread2 = threading.Thread(target=process_data, args=(data,))
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    

It is important to understand that the GIL is not a bug, but a design decision built into Python (CPython) to provide thread safety and simplify memory management. However, it also creates a number of limitations for multitasking applications, and developers should take this into account when designing and optimizing code.

Ways to bypass the GIL

One of the most effective ways to bypass the GIL is to use multiple processes instead of threads. Since each process has its own Python interpreter and its own GIL, processes can run in parallel on different CPU cores.

An example of multiprocessing in Python using the multiprocessing module:

import multiprocessing

def worker(data):
    # Data processing happens here
    result = data * 2
    return result

# The guard is needed on platforms that start workers by re-importing this module
if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]

    # Create a pool of processes; the context manager shuts it down on exit
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        # Use the process pool to process the data in parallel
        results = pool.map(worker, data)

    print("Results:", results)

This code creates a process pool and uses it to process data in parallel. This allows for efficient multitasking and bypassing GIL limitations.

Besides multiprocessing, there are several libraries and frameworks that provide higher-level access to parallel execution. For example, concurrent.futures offers both thread pools and process pools behind a convenient interface for parallel tasks.

An example of using concurrent.futures with a thread pool:

import concurrent.futures

def worker(data):
    # Data processing happens here
    result = data * 2
    return result

data = [1, 2, 3, 4, 5]

# Create a thread pool
with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(worker, data))

print("Results:", results)

Using concurrent.futures, you can easily switch between thread pools and process pools depending on your application requirements.
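
Because both executor types share the same interface, the pool class can be passed in as a parameter. The following is a minimal sketch under that assumption; the helper name run_in_pool is hypothetical and not part of concurrent.futures.

```python
import concurrent.futures

def worker(data):
    # Data processing happens here
    return data * 2

def run_in_pool(executor_class, data):
    # executor_class is any Executor subclass from concurrent.futures
    with executor_class() as executor:
        return list(executor.map(worker, data))

results = run_in_pool(concurrent.futures.ThreadPoolExecutor, [1, 2, 3, 4, 5])
print("Results:", results)
```

Passing concurrent.futures.ProcessPoolExecutor instead should work the same way for CPU-bound work, provided the call sits under an if __name__ == "__main__": guard and the worker function is defined at module level so that it can be pickled.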

Another way to bypass the GIL is to use C extensions. Python allows you to create C extensions that release the GIL while they perform intensive operations that do not touch Python objects. Such code can call the operating system directly and run in parallel with other threads, taking full advantage of multiple cores.

An example of creating a C-extension for Python:

#include <Python.h>

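static PyObject* my_extension_function(PyObject* self, PyObject* args) {
    int result = 0;
    /* Release the GIL while running code that does not touch Python objects */
    Py_BEGIN_ALLOW_THREADS
    // Intensive computation can be performed here
    // ...
    Py_END_ALLOW_THREADS
    /* The GIL is held again here, so Python objects may be created */
    return Py_BuildValue("i", result);
}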

static PyMethodDef my_extension_methods[] = {
    {"my_extension_function", my_extension_function, METH_VARARGS, "Function description"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef my_extension_module = {
    PyModuleDef_HEAD_INIT,
    "my_extension",
    "Module description",
    -1,
    my_extension_methods
};

PyMODINIT_FUNC PyInit_my_extension(void) {
    return PyModule_Create(&my_extension_module);
}

This C extension can then be used in Python, allowing for more efficient performance of intensive operations.
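
Some standard-library modules implemented in C already follow this pattern. hashlib, for instance, is documented to release the GIL while hashing larger inputs, so hashing in several threads should be able to use more than one core. The sketch below only checks that the threaded and sequential results agree; the function name sha256_hex and the buffer sizes are assumptions made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256_hex(data):
    # The hashing itself runs in C code
    return hashlib.sha256(data).hexdigest()

# Four 1 MiB buffers with different contents (illustrative data)
buffers = [bytes([i]) * (1024 * 1024) for i in range(4)]

# Hash all buffers in a thread pool
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded_digests = list(executor.map(sha256_hex, buffers))

# Hash the same buffers one after another for comparison
sequential_digests = [sha256_hex(buffer) for buffer in buffers]
print(threaded_digests == sequential_digests)
```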

Tips for optimizing performance

If your threads block frequently, for example due to I/O operations, this can significantly degrade performance. Instead of thread blocking, you can use non-blocking I/O or asynchronous code to avoid idle threads.

An example of using non-blocking I/O operations:

import socket

def non_blocking_network_operation():
    # Create a non-blocking socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    
    try:
        # Attempt to connect without blocking
        sock.connect(("example.com", 80))
    except BlockingIOError:
        # The connection is still in progress; this is expected
        pass
    
    # Execution continues here without waiting for the connection
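
Non-blocking sockets are usually combined with a readiness check, so that the program reads only when data is available. The sketch below uses the standard selectors module with a local socket pair instead of a real network connection; the variable names are illustrative.

```python
import selectors
import socket

# socketpair() gives two connected sockets without touching the network
reader, writer = socket.socketpair()
reader.setblocking(False)
writer.setblocking(False)

selector = selectors.DefaultSelector()
selector.register(reader, selectors.EVENT_READ)

writer.send(b"ping")

received = b""
# select() returns only the sockets that are ready, so recv() will not block
for key, _events in selector.select(timeout=1):
    received += key.fileobj.recv(1024)

selector.unregister(reader)
selector.close()
reader.close()
writer.close()

print(received)
```

The same loop scales to many sockets registered with one selector, which is the mechanism that asynchronous frameworks build on.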

To optimize performance, you can split your code into independent tasks and execute them in parallel. Instead of using Python threads, which contend for the GIL, consider other mechanisms such as processes or asynchronous programming.

An example of asynchronous code using the asyncio library:

import asyncio

async def async_task():
    await asyncio.sleep(1)
    print("Running an asynchronous task")

async def main():
    tasks = [async_task() for _ in range(10)]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())

Asynchronous programming allows you to efficiently manage tasks without blocking threads.
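
To see that the tasks really overlap, the results and the total time can be collected. This is a small sketch: asyncio.sleep stands in for a non-blocking I/O wait, and the coroutine name fetch_value and the delay are assumptions made for illustration.

```python
import asyncio
import time

async def fetch_value(value, delay=0.1):
    # asyncio.sleep stands in for a non-blocking I/O wait
    await asyncio.sleep(delay)
    return value * 2

async def main():
    # gather() runs the coroutines concurrently and keeps the input order
    return await asyncio.gather(*(fetch_value(i) for i in range(10)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
print(f"Elapsed: {elapsed:.2f} s")
```

All ten waits run in a single thread, so the total time should stay close to one delay rather than ten.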

  1. Use built-in functions and methods:
    Python has many built-in functions and methods that are optimized and faster than hand-written counterparts. For example, instead of looping through a list with for, use functions such as map(), filter(), and sum().

    numbers = [1, 2, 3, 4, 5]
    
    # Bad way
    total = 0
    for num in numbers:
        total += num
    
    # Good way
    total = sum(numbers)
    
  2. Use generators:
    Python generators allow you to generate values lazily and can save memory and increase performance.

    # Bad way
    squares = []
    for num in range(1, 1000000):
        squares.append(num ** 2)
    
    # Good way
    squares = (num ** 2 for num in range(1, 1000000))
    
  3. Avoid redundant calculations:
    If you perform the same calculation multiple times, save the result and reuse it.

    # Bad way
    result1 = complex_computation(data)
    result2 = complex_computation(data)
    
    # Good way
    result = complex_computation(data)
    result1 = result
    result2 = result
    
  4. Use set instead of list for fast searching:
    If you often need to search for elements in a collection, use sets, which have much faster access times than lists.

    # Bad way
    items = [1, 2, 3, 4, 5]
    if 3 in items:
        print("Found!")
    
    # Good way
    items = {1, 2, 3, 4, 5}
    if 3 in items:
        print("Found!")
    
  5. Optimize work with files:
    When working with files, use context managers to close files automatically. Also, read and write data in chunks to reduce memory usage.

    # Bad way
    file = open("data.txt", "r")
    data = file.read()
    file.close()
    
    # Good way
    with open("data.txt", "r") as file:
        # Read the file in 1024-character chunks instead of all at once
        while chunk := file.read(1024):
            ...  # handle each chunk here
    
  6. Use functions from the standard library:
    Python has many functions and modules in the standard library for processing data, parsing XML, working with JSON, and other tasks. Instead of writing your own solutions, use already existing ones.

    # Bad way
    import my_custom_parser
    data = my_custom_parser.parse_xml(xml_data)
    
    # Good way
    import xml.etree.ElementTree as ET
    root = ET.fromstring(xml_data)
    
  7. Avoid multiple I/O operations:
    I/O operations such as reading and writing files or network requests can be expensive. When performing many such operations, combine them where possible, for example by batching data or by reusing one connection instead of opening a new one for every request.

    # Bad way: every call sets up a new connection
    for url in urls:
        response = requests.get(url)
        process_data(response.text)
    
    # Good way: a session reuses connections between requests
    with requests.Session() as session:
        for url in urls:
            response = session.get(url)
            process_data(response.text)
    
  8. Use algorithms with linear execution time:
    When choosing algorithms, prefer those with the lowest time complexity that solves the problem, ideally linear (O(n)) or better, and rely on built-ins that already implement them efficiently.

    # Bad way
    def find_max(numbers):
        max_num = numbers[0]
        for num in numbers:
            if num > max_num:
                max_num = num
        return max_num
    
    # Good way
    max_num = max(numbers)
    
  9. Use a profiler:
    Code profiling helps you identify the places where the most time is spent and focus your efforts on optimizing the important parts.

    An example of using the cProfile module:

    import cProfile
    
    def my_function():
        # Code to profile goes here
        pass
    
    cProfile.run("my_function()")
    
  10. Avoid using global variables:
    Global variables can make code less readable and manageable. Instead, use passing parameters to functions and returning results.

    # Bad way
    count = 0
    
    def increment_count():
        global count
        count += 1
    
    # Good way
    def increment_count(count):
        return count + 1
    
    count = 0
    count = increment_count(count)
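
Tip 3 above (avoiding redundant calculations) can often be automated with functools.lru_cache from the standard library, which remembers the results of earlier calls. In this sketch, complex_computation is only a stand-in for an expensive pure function, and its body is an assumption made for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def complex_computation(n):
    # Stand-in for an expensive pure function
    return sum(i * i for i in range(n))

result1 = complex_computation(10_000)
result2 = complex_computation(10_000)  # served from the cache

info = complex_computation.cache_info()
print(f"hits={info.hits}, misses={info.misses}")
```

Caching like this is only safe for functions whose result depends solely on their arguments, and the arguments must be hashable.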
    

Conclusion

The GIL is a feature of the Python interpreter that limits the simultaneous execution of multiple threads of Python code in a single process. This limitation can be a challenge for developers, especially those dealing with multitasking and parallel data processing.
