Yandex’s secret party test – a night in fintech

Hello everybody!
Yandex recently held a fintech night, and since Yandex didn’t write about it here, I will. To get into the event, you had to solve a simple task.
The article has three parts: the backstory, the solution attempts, and an epilogue with my thoughts.
P.S. The whole article reflects my personal opinion and is written in a free style.

Backstory:

Late one evening I was writing code and scrolling through the VK feed while my code compiled.
Suddenly an interesting ad in the feed caught my attention. I was interested not so much in the post as in its format. “We invite (cool?) developers to a closed party for friendly conversation and hiring”: that is the gist of the post and of the colorful picture below it (unfortunately, I didn’t save the link to the ad).
Note: I marked “cool” with a question mark because the ad seemed to be aimed at mid-level developers, although the original did not say so.
Interestingly, Yandex not only spends its budget on advertising vacancies (which is commonplace) but also organizes private parties, which is something new. A skilfully designed picture and an equally bright landing page inspired confidence that the event was serious. A task served as face control, which only added to the excitement.

Task:

We wrote a strange message here, and something is clearly wrong with it 🤔
Let’s try to decipher it:

10000100 11001110 10101110 00000100 00100110 01110110 10101110 11110110 01001110 10000110 00000100 11001110 10100110 00101110 10011110 01000110 00000100 00100110 01110110 10000110 00000100 11001110 00101110 10010110 01000010

Answer field: the form can be submitted only if the answer is correct

Link to the original:
https://forms.yandex.ru/surveys/13470877.31b3577ab565e9abd42a95b2f7808e408f978b8c/?utm_source=share2&utm_content=success

Solution attempts:

Throughout the process I followed one rule: “the solution has to be very simple, 5, 15, 30 minutes at most, or they will lose conversion.”

Attempt 1 – courage

But these are ordinary ASCII characters, a classic; any first-year student can solve it!
ASCII (American Standard Code for Information Interchange) is the world’s best-known encoding for representing text as numbers.
We replace the numbers with letters and voilà! The task is solved in 30 seconds!

Result:
�ή&v��N�Φ.�F&v��.�B
(Editors may render it differently, but the point is the same: it is unreadable.)
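
A minimal sketch of this first attempt, assuming each group is read as an unsigned byte with the most significant bit first, exactly as printed. The names `BITS`, `raw` and `garbled` are mine, not part of the task. Strict ASCII decoding fails outright, while a lenient UTF-8 decode (my guess at what the editor did) produces gibberish of the same kind as the result above:

```python
# The puzzle string as given in the task: 25 groups of 8 bits.
BITS = (
    "10000100 11001110 10101110 00000100 00100110 01110110 10101110 "
    "11110110 01001110 10000110 00000100 11001110 10100110 00101110 "
    "10011110 01000110 00000100 00100110 01110110 10000110 00000100 "
    "11001110 00101110 10010110 01000010"
)

# Read every group as an unsigned 8-bit number, most significant bit first.
raw = bytes(int(group, 2) for group in BITS.split())

# Strict ASCII refuses any byte above 127.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# A lenient decode replaces undecodable bytes with U+FFFD (the "�" sign).
garbled = raw.decode("utf-8", errors="replace")

print(ascii_ok)       # False
print(repr(garbled))  # replacement characters mixed with a few letters
```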

Your face when a set of bytes turns out not to be ASCII characters

My brain probably went through all 5 stages of grief at this point:
denial, anger, bargaining, depression, acceptance.

More experienced colleagues spotted the catch right away:
ASCII has only 128 codes, while a byte has 256 possible values.

ASCII character table

Attempt 2 – ignoring

What if we just ignore the bytes that fall outside the 0–127 range? Let’s go!
The result was more gibberish made of letters, punctuation marks and control characters, only shorter.
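
A small sketch of this filtering step, under the same assumption that the groups are read most significant bit first. The names `values`, `kept` and `shorter` are illustrative only:

```python
BITS = (
    "10000100 11001110 10101110 00000100 00100110 01110110 10101110 "
    "11110110 01001110 10000110 00000100 11001110 10100110 00101110 "
    "10011110 01000110 00000100 00100110 01110110 10000110 00000100 "
    "11001110 00101110 10010110 01000010"
)

values = [int(group, 2) for group in BITS.split()]

# Keep only the values that fit into 7-bit ASCII (0..127).
kept = [v for v in values if v < 128]
shorter = "".join(chr(v) for v in kept)

print(len(values), len(kept))  # 25 13
print(repr(shorter))           # still gibberish, with \x04 control characters
```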

Attempt 3 – well, I guess I’ll have to read the task statement…

“Let’s try to decipher”
If it is a cipher, then there must be a key and an encryption algorithm.
Where could the encryption key be?

Every cryptographer knows that real ciphertext has no gaps; only a fool would leave them, so they are here on purpose. That’s a hint.

And the bits, even if this isn’t ASCII, are arranged so neatly, 8 in a row, that it is clearly a sequence of bytes, and even in columns. Maybe that’s just the page layout, or maybe it’s intentional; either way, the theory is quick to check by viewing the page source or resizing the window.

Let’s try applying some logical operation to each row, or each column, or to everything at once. After all, what else are bits for if not for Boolean operators?

It would be clearer if the task were a neat rectangle, but the lone byte on the last line breaks every assumption. What do I do with it: ignore it, apply it to everything at once, make an exception for the first column? And what is a row that consists of a single byte, then? There are a lot of ambiguities, so I decided to look only for sequences that “add up” under one of these operators:

1. AND
2. OR
3. XOR – exclusive OR
4. NAND – NOT-AND
5. NOR – NOT-OR
6. XNOR – exclusive NOR

Unfortunately, every operation came out False at the comparison stage; that is, the equations did not hold, and the columns and rows are not related to each other by logical operations.
The thought “what if we apply a combination of operations” crossed my mind, but that contradicts the principle “the solution should be very simple.”
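
The search can be sketched roughly as follows. This is only one possible reading of the idea, not the exact check I ran: it assumes we look for three neighbouring bytes where an operator applied to the first two gives the third. `OPS`, `MASK` and `find_relations` are hypothetical names made up for this sketch.

```python
MASK = 0xFF  # keep negated results within one byte

# The six operators from the list above, applied bitwise to 8-bit values.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: ~(a & b) & MASK,
    "NOR": lambda a, b: ~(a | b) & MASK,
    "XNOR": lambda a, b: ~(a ^ b) & MASK,
}

def find_relations(values):
    """Return (index, operator name) wherever op(v[i], v[i+1]) == v[i+2]."""
    hits = []
    for i in range(len(values) - 2):
        a, b, c = values[i:i + 3]
        for name, op in OPS.items():
            if op(a, b) == c:
                hits.append((i, name))
    return hits

# Synthetic example: 1100 XOR 1010 == 0110, so one relation is found.
print(find_relations([0b1100, 0b1010, 0b0110]))  # [(0, 'XOR')]
```

The same function can be fed the rows or columns of the task to see whether any of them “add up”.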

Attempt 4 – oh, how I didn’t want to dive into encodings!

Let’s say it’s not ASCII but Unicode. The most popular implementation of the Unicode standard is UTF-8, which does use the whole byte range, not just 0–127. The bad news is that UTF-8 does not use bytes 128–255 for single characters either; or rather it does, but for its own needs, such as lead bytes that specify how many bytes a character takes, and continuation bytes.
UTF-16 and UTF-32 are a dead end too: they use at least 2 bytes and exactly 4 bytes per character, respectively.
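
A short illustration of why these encodings do not help, using only Python’s built-in codecs. The sample characters and variable names are my own choice:

```python
# ASCII characters keep their single-byte form in UTF-8.
ascii_char = "A".encode("utf-8")

# A code point above 127 ("\u00e9" is e with an acute accent) becomes
# a multi-byte sequence: a lead byte followed by a continuation byte.
two_bytes = "\u00e9".encode("utf-8")

# A lone byte in the 0x80-0xBF range is a continuation byte
# and cannot be decoded on its own.
try:
    bytes([0b10000100]).decode("utf-8")
    lone_byte_ok = True
except UnicodeDecodeError:
    lone_byte_ok = False

# UTF-16 and UTF-32 spend 2 and 4 bytes even on a plain "A".
utf16_len = len("A".encode("utf-16-le"))
utf32_len = len("A".encode("utf-32-le"))

print(ascii_char, two_bytes, lone_byte_ok, utf16_len, utf32_len)
# b'A' b'\xc3\xa9' False 2 4
```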

Attempt 5 – time to get clever and look for more hints

What will statistical analysis give us?
Let’s write some Python code:

# Our string
In [1]: s="10000100 11001110 10101110 00000100 00100110 01110110 10101110 11110110 01001110 10000110 00000100 11001110 10100110 00101110 10011110 01000110 00000100 00100110 01110110 10000110 00000100 11001110 00101110 10010110 01000010"
# Make a list
In [2]: l = s.split()
In [3]: l
Out[3]:
['10000100',
'11001110',
'10101110',
'00000100',
'00100110',
'01110110',
'10101110',
'11110110',
'01001110',
'10000110',
'00000100',
'11001110',
'10100110',
'00101110',
'10011110',
'01000110',
'00000100',
'00100110',
'01110110',
'10000110',
'00000100',
'11001110',
'00101110',
'10010110',
'01000010']
# Convert to decimal (h - human readable)
In [4]: lh = [int(i, 2) for i in l]
In [5]: lh
Out[5]:
[132,
206,
174,
4,
38,
118,
174,
246,
78,
134,
4,
206,
166,
46,
158,
70,
4,
38,
118,
134,
4,
206,
46,
150,
66]
In [6]: from collections import Counter
In [7]: d = Counter(lh)
In [8]: d
Out[8]:
Counter({132: 1,
206: 3,
174: 2,
4: 4,
38: 2,
118: 2,
246: 1,
78: 1,
134: 2,
166: 1,
46: 2,
158: 1,
70: 1,
150: 1,
66: 1})

# Sort by frequency, most frequent first
In [9]: ds = dict(sorted(d.items(), key=lambda item: item[1], reverse=True))
In [10]: ds
Out[10]:
{4: 4,
206: 3,
174: 2,
38: 2,
118: 2,
134: 2,
46: 2,
132: 1,
246: 1,
78: 1,
166: 1,
158: 1,
70: 1,
150: 1,
66: 1}

Not much, but it’s something. In particular, the most frequent value is 4, which in ASCII is the EOT (End of Transmission) control character. Maybe it’s a coincidence, maybe not; let’s dig further…
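
As a side note, the counting and sorting above can be written more compactly with `Counter.most_common()`, which returns the pairs already ordered by frequency. A sketch with the same data (the names `BITS` and `top` are illustrative):

```python
from collections import Counter

BITS = (
    "10000100 11001110 10101110 00000100 00100110 01110110 10101110 "
    "11110110 01001110 10000110 00000100 11001110 10100110 00101110 "
    "10011110 01000110 00000100 00100110 01110110 10000110 00000100 "
    "11001110 00101110 10010110 01000010"
)

counts = Counter(int(group, 2) for group in BITS.split())

# The two most frequent values with their counts.
top = counts.most_common(2)
print(top)  # [(4, 4), (206, 3)]

# How many bytes fall outside the ASCII range.
outside = sum(n for value, n in counts.items() if value >= 128)
print(outside)  # 12
```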

What if the first bit of every byte in our task is a sign bit?!
Our method did not account for the signed/unsigned distinction; it treats everything as unsigned by default!
# Here "int(i, 2)" means unsigned; signed needs a completely different approach
lh = [int(i, 2) for i in l]

The devil knows what I’ll do with negative numbers, but maybe they are worth a look, because the problem is precisely the bytes that are out of range: if the first bit is interpreted as a sign, the remaining seven bits of every byte fit in the ASCII range. It also opens up new ideas: maybe the values should be sorted in ascending order, or the negative ones discarded, who knows.

Googling “python unsigned int” leads to https://stackoverflow.com/a/62696400:
int.from_bytes(n.to_bytes(length=1, byteorder="little", signed=False), byteorder="little", signed=True)
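
For illustration, here is that one-liner wrapped in a helper and applied to a few of our values. The function name `as_signed` is mine, and the interpretation is two’s complement, which is what `signed=True` means in `int.from_bytes`:

```python
def as_signed(n: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a signed one (-128..127)."""
    return int.from_bytes(
        n.to_bytes(length=1, byteorder="little", signed=False),
        byteorder="little",
        signed=True,
    )

sample = [132, 206, 4, 246]
print([as_signed(n) for n in sample])  # [-124, -50, 4, -10]
```

For a single byte the byteorder argument makes no difference to the result, but it is a required part of the call signature in older Python versions.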

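Eureka! byteorder="little": my eye involuntarily caught on that parameter.
What if our bytes are simply written “little-endian” style?! That is, they should be read from right to left, not from left to right as we are used to! (Strictly speaking, endianness describes the order of bytes within a multi-byte value; here the same idea is applied to the order of bits within each byte.)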

Attempt 6 – solution:

We test our hypothesis:

# Here you can see that all our bytes now start with 0, i.e. they stay within the ASCII range
# The c[::-1] construct simply reverses a string: qwerty -> ytrewq
# lr - list reversed
In [11]: lr = [c[::-1] for c in l]
In [12]: lr
Out[12]:
['00100001',
'01110011',
'01110101',
'00100000',
'01100100',
'01101110',
'01110101',
'01101111',
'01110010',
'01100001',
'00100000',
'01110011',
'01100101',
'01110100',
'01111001',
'01100010',
'00100000',
'01100100',
'01101110',
'01100001',
'00100000',
'01110011',
'01110100',
'01101001',
'01000010']

# Convert to the usual base 10
In [13]: lrh = [int(c, 2) for c in lr]

In [14]: lrh
Out[14]:
[33,
115,
117,
32,
100,
110,
117,
111,
114,
97,
32,
115,
101,
116,
121,
98,
32,
100,
110,
97,
32,
115,
116,
105,
66]

# Convert numbers to ASCII characters
In [15]: [chr(n) for n in lrh]
Out[15]:
['!',
's',
'u',
' ',
'd',
'n',
'u',
'o',
'r',
'a',
' ',
's',
'e',
't',
'y',
'b',
' ',
'd',
'n',
'a',
' ',
's',
't',
'i',
'B']

# And the final touch
# Reverse the order of the characters and join the list back into a string.
In [16]: ''.join([chr(n) for n in lrh][::-1])
Out[16]: 'Bits and bytes around us!'

Well, as the fable goes, “the box opened quite simply.” Indeed, there are different notations and conventions for reading order, just like in the human world.
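
The whole solution fits into a few lines. The sketch below uses my own function names, `decode` and `decode_short`. The second variant relies on the observation that reversing the entire string flips both the bit order inside each group and the order of the groups at once:

```python
BITS = (
    "10000100 11001110 10101110 00000100 00100110 01110110 10101110 "
    "11110110 01001110 10000110 00000100 11001110 10100110 00101110 "
    "10011110 01000110 00000100 00100110 01110110 10000110 00000100 "
    "11001110 00101110 10010110 01000010"
)

def decode(bits: str) -> str:
    # Reverse the bits inside every group, then read it as a character code.
    chars = [chr(int(group[::-1], 2)) for group in bits.split()]
    # The characters themselves are also stored in reverse order.
    return "".join(reversed(chars))

def decode_short(bits: str) -> str:
    # Reversing the whole string handles both reversals in one step.
    return "".join(chr(int(group, 2)) for group in bits[::-1].split())

print(decode(BITS))        # Bits and bytes around us!
print(decode_short(BITS))  # Bits and bytes around us!
```

In other words, the message is simply the ordinary ASCII bit stream written backwards.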

Epilogue

First impressions:

A cool, long-awaited solution. I did it! Even if it’s only an idiot test, it still counts! It was interesting to go back in my mind to the distant days when I studied these topics (encodings, cryptography, little/big endian) and refresh my knowledge.

The main thing:

It’s great that Yandex is not afraid to experiment and has found its own way out of the situation. This event looks like a really cool and bold move. I’m glad we have companies pushing the industry forward!
Despite the criticism in the next section, I still see this move and the event as purely positive. The implementation may be flawed, but the strategy is 100% correct.

Criticism:

  1. And they call this “deciphering”?! It’s embarrassing not to know the terminology. A public task could have been worded correctly; brr, it still makes me wince. This is not a cipher, at most steganography. The correct term would be “transform” or “convert”. If these words reveal too much, you can use a general description: “figure out” or “fix” (although the latter is not quite correct either, since the message is not “broken”).

  2. It’s hard to imagine how this task and its chain of solutions could help me or other developers in everyday work. Unfortunately, the practical benefit of this task is close to zero, and objectively it is a waste of time. Personally, I once learned little/big endian as a “fun fact” and have needed it exactly 0 times between then and today (it’s a wonder I even remembered it). There are many levels of abstraction between application development and little/big endian. I won’t judge the value of the topic; I’ll just say that there are at least 100 things more important for a developer.
    Such a task works well as a common denominator for all fields: network engineers, DevOps, kernel developers, hardware engineers, etc. (although the latter will still have an indisputable advantage).

Conclusion: a big minus for the task and its wording.

Criticizing? Then propose your own!

My take: to answer the question, you need to set aside everything superfluous and ask yourself: what kind of developer do I want to find, and which aspect of development reflects that? For me, development comes down to three things (maybe it’s different for Yandex): architecture, speed, and open source frameworks plus ready-made solutions.
This sets the direction in which to think. I won’t impose a specific solution; I’ll only give my example: code samples that violate OOP / SOLID / code style (pep, black, eslint, etc.).

Hint for managers and HR:
In my experience, having a candidate review written code is the most effective and easiest way to evaluate them. A junior will find some of the mistakes, a middle will find all of them, and a senior will find even the ones you didn’t know about. It takes minimal time and effort from everyone involved, all the errors are known in advance, and the evaluation is formalized and objective.

Maybe Yandex had strict limits on text size, the input field, deadlines, etc., but even under such conditions, I doubt this task was the right choice.

Bonus: how to do better

(Especially relevant because I’m looking for a job myself.)

Hypothesis:
What if all the known good candidates are already hired? The market revolves around the same well-known pool of developers; this is what the term “talent shortage” reflects. Perhaps HR should take a fresh look at recruitment. If you use the same metrics at the input, it is strange to expect a new result at the output.

  1. A detailed list of frameworks and tools plus examples of tasks.
    A description like “PostgreSQL, FastAPI, Redis, ClickHouse, Linux, development of new features, API integration, code review, etc.” narrows the search area well, but it’s still too shallow. Remove the company name from the title and all the vacancies look the same. Analyze what your developers actually do, isolate the common patterns, and publish them with the vacancy. Ideally, include links to or screenshots of tasks from the tracker.

  2. An opportunity to show my unique qualities and my skill tree.
    Most metrics track static characteristics: experience, algorithms, databases, OOP. That’s not bad, but it could be better. This approach doesn’t show how quickly a candidate can make sense of documentation, handle exceptions, integrate an API, use their 100+500 hotkeys, etc., which, by the way, is 60% of the work, and none of it is captured by the metrics listed above.

  3. Delays and bureaucracy.
    This one is clear: hiring is a game of telephone between development, HR, management, etc., and any positive enthusiasm breaks on these rocks. Personally, I’m always tempted to make a couple of PRs rather than read identical vacancies and cold-mail résumés. At least my code will definitely make the world a better place. I think most proactive developers feel the same.

In general, hiring problems are a big topic for a separate article; if you’re interested, I’ll post my thoughts and non-standard experience on it.

Remember, the world of development is an unplowed field; there are tasks for everyone, from juniors to seniors!

Right now you can make a PR in:

  1. FastAPI template – the most popular project for deploying a new service on the most modern Python web framework.

  2. Heroes of Might and Magic III – Open Source version of the famous game.

  3. Python Telegram Bot is the most popular framework for Python Telegram bots.

  4. The category of unstarted projects due to the healthy pessimism of maintainers.

  5. The category of outdated projects due to the healthy optimism of maintainers.

And there are literally thousands of such projects. So stagnation or regression in IT is purely a business phenomenon; the world of development itself is open to everyone!
