Complexity of algorithms and typical errors in Python

Complexity of algorithms and typical errors in Python

Hello everyone! I will tell you what the complexity of algorithms is and where it comes from, I will analyze typical mistakes and the most frequent mistakes of beginners. The material is intended primarily for beginner Python developers, as well as for those for whom Python is the first programming language.

What is the complexity of algorithms?

When talking about the complexity of algorithms, they often give an example of a graph on which functions are drawn, and say that this algorithm is equal in complexity to this function, and that algorithm is equal to another function, and so on. Typical examples:

  • Log(N) – binary search in an already sorted array;

  • N – operation of the code in one cycle;

  • N * Log (N) – sorting;

  • N**2 is a loop nested within another loop.

Note: assembler so close to the machine code that in fact one assembly statement corresponds to one machine instruction (1:1). Therefore, it is possible to estimate the actual complexity of the algorithm quite accurately.

Obviously, the more complex the algorithm, the faster the function grows. But what does this mean if you dig deeper? Let’s consider some algorithm with O(N) complexity on the example of C++, for example, creating an array, and see how this operation will look in disassemblers.

You can read more about how much each operation costs in the article “Complexity of algorithms and operations on the example of Python“.

Note: C/C++ is much closer to iron than high-level Python, so it’s much easier to disassemble. Clearly, a Python list works differently from the way an array works in C++, but the principles of how they work are exactly the same.

To create an array of 7 elements in C++:

int arr[] = {1, 2, 3, 4, 5, 6, 7};

you will need to perform 7 operations in the assembler:

mov     DWORD PTR [rbp-32], 1
mov     DWORD PTR [rbp-28], 2
mov     DWORD PTR [rbp-24], 3
mov     DWORD PTR [rbp-20], 4
mov     DWORD PTR [rbp-16], 5
mov     DWORD PTR [rbp-12], 6
mov     DWORD PTR [rbp-8], 7

This is what the algorithm complexity function O(N) shows – how many elements need to be processed, so many operations will be processed. For O(N*N) — the same, but the number of operations will already be squared. In Python, creating a list with 7 elements will also require at least seven operations:

l = [1, 2, 3, 4, 5, 6, 7]

Another example: to write one element of a C++ array:

arr[0] = 111;

Only one assembly operation is required:

mov     DWORD PTR [rbp-32], 111

That is why this operation has O(1) complexity. Same for Python:

l[0] = 111

This operation also has O(1) complexity by the same logic.

Now that you understand where the complexity of algorithms comes from, you can move on to common mistakes.

Example 1, working with the string type

Let’s consider the simplest demonstrative example, which results from a misunderstanding of the definition of algorithm complexity. Suppose we need to collect a string in a loop, for this we can write the following code:

line = ""
for i in range(10_000):
    line += "i"

Seemingly simple and working code that produces the correct output—in this case, a string of 10,000 “i” characters. The execution time of this code 5 ms.

At first glance, this is an algorithm with O(N) complexity, but if you look at the last line of code, you can see that the type string doesn’t actually add a character to the current value but converts the value with the new character because string is an immutable type. In other words, only one of these operations will have O(N) complexity, because to copy a value to a new string, you need to copy each element of the old string plus the new character. In addition, this operation is also performed in a loop, that is, the final complexity of this algorithm will be O(N*N).

Note: given that the variable line grows sequentially from 0 to N, it is correct to say that the complexity is O(N*M), where M

This algorithm can be improved by using an operation whose complexity is O(1), e.g append:

word = []
for i in range(10_000):
    word.append("i")
line = "".join(word)

Now the algorithm has worked for 2 msWhich is 2.5 times faster than the previous version, besides now its complexity is O(N).

It is worth noting that the append operation in Python is actually performed by shockproof O(1) as in other high-level languages ​​such as Java or c#. In other words, it is very rare, but this operation can be performed in O(N) time. Under the hood, the list contains an array that stores items. If the size of the list is not enough, before saving the new item, it will be increased twice, exactly in this case it will be O(N).

Let’s also consider an example with an assignment operation, which already really has a complexity of O(1):

N = 10_000
word = [None] * N
for i in range(N):
    word[i] = "i"
line = "".join(word)

Execution time reduced to 1.35 msWhich is a little faster than with append. Both of these algorithms will be executed in approximately the same time, the variant with assignment in practice will work more often quite a bit faster.

Example 2, array conversion

Let’s take a very large list and look for elements in it:

arr = list(range(10_000_000))
matcher = [500_000, 100_000, 1000_000_000] * 10

In our case, the array will consist of 10 million elements, and we will search for 30 numbers in it. In its simplest form, the algorithm will look like this: in a loop, we go through each of the 30 elements from matcher and we are looking for it in arr.

for i in matcher:
    if i in arr:
        pass

This algorithm has O(N*N) complexity: iterating through the loop is O(N) and searching the list is also O(N). Duration of his work 1.2 seconds.

This code can be improved since we know that the set search has O(1) complexity. How can a novice programmer rewrite the algorithm? For example, like this:

for i in matcher:
    if i in set(arr):
        pass

Yes, with clear thoughts, a programmer who is starting or does not know the complexity of algorithms can write like this. This example also has O(N*N) complexity, searching through a set is O(1), but converting a list to a set is O(N). Despite the O(N*N) complexity, as in the previous example, the algorithm is now performed by 15.2 seconds. The reason is that searching the list is O(N) in the worst case, but the conversion will always be O(N).

It would be correct to write like this:

arr_s = set(arr)
for i in matcher:
    if i in arr_s:
        pass

Now the code is executed 0.5 seconds. This error occurs surprisingly often, it is invisible, but it can slow down the algorithm and increase memory consumption. It is important to understand that unnecessary transformation of an array of elements consumes unnecessary resources.

Typical examples with conversion errors:

  • set(list(map(str, arr))) – It is necessary to immediately lead to a multitude: set(map(str, arr))

  • list(sorted(arr))sorted and so returns a list

And there can be many other errors and variations.

Example 3, errors when declaring variables

It is not for nothing that they say that it is desirable to declare variables at the beginning of a method. This is of course a recommendation, but it is useful for beginners. Often, an inexperienced programmer can make a similar mistake: suppose we have some method that performs a division operation:

def divider(a: int, b: int):
    res = a / b
    return res

Suppose that a QA engineer (aka tester) in the team found a bug in this code: he noticed that it was not divisible by zero, raised a bug, and a novice programmer fixed the code:

def divider(a: int, b: int):
    try:
        res = a / b
    except ZeroDivisionError:
        print("don't divide on zero!")
    return res

Yes, this is quite a common situation when a bug is masked by another error. In this case, division by zero will be processed correctly, but an error will immediately appear: UnboundLocalError: local variable 'res' referenced before assignmentthat is, a reference to a non-existent variable, since it will not be initialized in the block try/except due to division error.

The easiest way to avoid this is to declare variables in advance:

def divider(a: int, b: int):
    res = None
    try:
        res = a / b
    except ZeroDivisionError:
        print("don't divide on zero!")
    return res

Often variables are declared in a segment ifcode like this is not a good tone:

def divider(a: int, b: int, flag: bool):
    if flag:
        res = a / b
    return res

It is possible to declare such a variable in Python, but in Java or C# it will not work. Such code in these programming languages ​​causes an error at the compilation stage.

Example 4, input parameters

There is a well-known recommendation: do not change the input parameters. Why they say so, let’s see.

def func(l: list):
    l[0] = 10
l = [1, 2, 3]
func(l)

Seemingly simple code, but what will lie in the list after it is executed? And here’s what:

print(l)

[10, 2, 3]

How did it happen? After all, the method does not return anything, we do not redefine the list in the code, only inside the method. Consider another example:

def func2(a: int):
    a = 10
a = 0
func2(a)

Why will the variable be equal to:

print(a)

0

In the second case, the variable has not changed, why is this happening? The fact is that objects in Python are divided into those that are passed by reference. list, set, dictand those passed by value int, float, str, tuple.

Let’s consider another example:

l1 = [1, 2]
l2 = l1
l1.append(3)

In this case, both lists will be equal, even though we only changed one of them:

print(l1, l2)

([1, 2, 3], [1, 2, 3])

When passing arguments to a method, or if you want to save the initial state of the list so that there are no errors, the reference types must be copied explicitly, for example:

l2 = l1[:]
l2 = list(l1)
l2 = l1.copy()

In this case, the references will be to different memory areas, which will solve the problem.

Example 5, default value

This is probably one of the most common mistakes a beginner programmer makes. Consider an example:

def func3(val: int, l: list = []):
    l.append(val)
    print(l)

It seems like a normal method, but let’s call it several times in a row and see what happens:

func3(1)

[1]

func3(2)

[1, 2]

Yes, this is the catch: a new empty list is not created by default. That is why in subsequent calls, in which the programmer expects an empty array, as was laid down by the logic of the algorithm, elements from past calls to the method remain in the array. This is how Python already works: it always takes the same reference to the list in such a case.

If you want an empty list by default, you should write like this:

def func3(val: int, l: Optional[list] = None):
    if not l:
        l = []
    l.append(val)
    print(l)

The same rule applies to the set (set) and dictionary (dict), they also cannot be made empty in the default options.

Conclusion

In simple words and using examples, I explained what the complexity of an algorithm is, showed where this concept comes from, analyzed the typical mistakes that result from a misunderstanding of the complexity and are the most common among beginners.

Don’t be afraid to make mistakes if you are a beginner developer. Only the one who does nothing is not wrong.

Thank you for your attention.

The source code repository is on GitHub.

Related posts