A disk is just a bunch of bits

Have you ever heard the statement that a disk or memory is just a bunch of bits?

I don’t know exactly where this idea came from, but it is quite reasonable and to some extent dispels the mysterious aura around computers. For example, it disproves the theory that a very flat elf lives inside my PC.

Turns out no, it contains bits encoded in electrical components.

And yet computers, as before, retain their mystery. What are these bits? What do they mean? Can we play with them, help them, understand them?

Next, I will show you that all of this is definitely possible! For your entertainment, I’ll reach into my PC, pull out a bunch of bits, and we’ll study them together.

But which bits are the best ones to study? To decide, let’s look at how a file is represented on disk.

Suppose we have a file /data/example.txt:

$ cat /data/example.txt

Hello, world!

And here a big question arises: where exactly is Hello, world!?

Among other things, you probably know that files have permissions (for example, a file can be executable), an owner, a creation timestamp, and so on. Where is this metadata stored?

I mean literally: where are the actual bits that store this information? Let’s find them and try to decode them.

But first a little theory.

▍ How do files work?

Everything below refers to the ext4 file system typically used in Linux (in fact, the entire article refers to ext4), although the same principles apply to most file systems.

So what is /data/example.txt, anyway? It is a so-called directory entry, which is just a name: example.txt.

Directory entries are stored on disk, but they are not particularly interesting because they are just names.

But a name has to name something, doesn’t it? What does example.txt name? The thing it names is called an inode (index descriptor).

Inodes are where things get interesting. When we say “the file is on disk”, we really mean “the inode is on disk”. An inode is a collection of bits that describes the file.

The descriptor stores almost all information about the file, such as the previously mentioned metadata.

That’s almost the end of the theory. You should be aware that index descriptors, files, and directory entries are all elements of the file system. A file system is software that converts bits on a disk into the files and directories we know.

Now you can start practicing.

▍ Analysis of index descriptors

Let’s start by printing the inode’s metadata with the stat command.

$ stat /data/example.txt

  File: /data/example.txt
  Size: 14              Blocks: 8          IO Block: 4096   regular file
Device: 831h/2097d      Inode: 11          Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  dmitry)   Gid: ( 1000/  dmitry)
Access: 2023-07-18 13:53:20.808536879 +0100
Modify: 2023-07-10 15:18:48.199691583 +0100
Change: 2023-07-18 14:52:26.349625767 +0100
 Birth: 2023-07-10 15:18:48.199691583 +0100

Don’t try to understand this output in detail. Just keep in mind that it shows metadata such as the file size, the owner name, and timestamps. All of this information comes from the inode (except for the name, which is taken from the directory entry, as well as the number of the inode itself).

▍ Analysis of index descriptor content

But we want to see exactly the bits of the descriptor in question. What should be done for this?

Experienced kernel developer Ted Ts’o maintains a set of file system debugging tools called e2fsprogs. With one of these tools, debugfs, we can play with the inode.

debugfs has a powerful command that prints the raw binary representation of the inode. Excerpt from the manual:

inode_dump filespec
Outputs the contents of the index descriptor in hexadecimal and ASCII formats.

So below is the promised binary representation, although not as zeros and ones like 0011000, since binary data is much easier to read in hexadecimal form:

$ sudo debugfs /dev/sdd1

debugfs:  inode_dump example.txt
0000  b481 e803 0e00 0000 408b b664 1a99 b664  ........@..d...d
0020  4813 ac64 0000 0000 e803 0100 0800 0000  H..d............
0040  0000 0800 0100 0000 0af3 0100 0400 0000  ................
0060  0000 0000 0000 0000 0100 0000 0082 0000  ................
0100  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0140  0000 0000 9933 e68b 0000 0000 0000 0000  .....3..........
0160  0000 0000 0000 0000 0000 0000 2349 0000  ............#I..
0200  2000 5e0c 9c76 5b53 fc34 9c2f bc2c c5c0   .^..v[S.4./.,..
0220  4813 ac64 fc34 9c2f 0000 0000 0000 0000  H..d.4./........
0240  0000 0000 0000 0000 0000 0000 0000 0000  ................
*

Got all that? Great, thank you for your attention!

Just kidding, of course. Seeing the actual raw data of the inode is really nice, but we still don’t know where on disk it lives or what those bits mean.

▍ Where is the descriptor located on the disk?

So, let’s look for a descriptor on the disk. Here, debugfs will help us again:

imap filespec
Prints the location of the inode data structure (in the inode table) for the inode given by filespec.

Now we will determine its location.

debugfs:  imap example.txt                                                                  
Inode 11 is part of block group 0
        located at block 73, offset 0x0a00

Let me explain: the file system is divided into blocks. In my case, the block size is 4096 bytes (the default for many Linux distributions). In other words, this output says: “From the beginning of the file system, go forward 73 blocks, that is, 73 * 4096 bytes.” In a sense, this tells us which street the inode we are looking for lives on. Its house number on that street is the offset, 0x0a00 bytes. In decimal, that is 2560 bytes (Why 2560?).

So, to find our descriptor, we need to skip 4096*73+2560=301568 bytes from the beginning of the disk partition (which is also the beginning of the file system).
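The arithmetic above is easy to get wrong by hand, so here is a minimal sketch of it in C. The function names are my own and purely illustrative (they are not part of debugfs or the kernel), and the second helper assumes the inode lives in the first block group, as it does here.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of an inode from the start of the partition, given the
 * block number and the in-block offset that `debugfs imap` reports. */
uint64_t inode_byte_offset(uint64_t block, uint32_t block_size,
                           uint32_t offset_in_block)
{
    assert(offset_in_block < block_size); /* the offset must fall inside the block */
    return block * block_size + offset_in_block;
}

/* Offset of an inode inside the inode table. Inode numbers start at 1,
 * so inode N is preceded by N - 1 records of inode_size bytes each.
 * (Assumption: the inode belongs to the first block group.) */
uint64_t inode_table_offset(uint32_t inode_number, uint32_t inode_size)
{
    assert(inode_number >= 1);
    return (uint64_t)(inode_number - 1) * inode_size;
}
```

With the values from the imap output, inode_byte_offset(73, 4096, 0x0a00) gives 301568, and inode_table_offset(11, 256) gives the 2560 bytes of the "house number".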

Let’s do exactly that: extract the raw bits from the disk and see whether they match the output of debugfs inode_dump.

$ sudo dd if=/dev/sdd1 bs=1 skip=301568 count=256 2>/dev/null | hexdump -C

00000000  b4 81 e8 03 0e 00 00 00  40 8b b6 64 1a 99 b6 64  |........@..d...d|
00000010  48 13 ac 64 00 00 00 00  e8 03 01 00 08 00 00 00  |H..d............|
00000020  00 00 08 00 01 00 00 00  0a f3 01 00 04 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  01 00 00 00 00 82 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 99 33 e6 8b  00 00 00 00 00 00 00 00  |.....3..........|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 23 49 00 00  |............#I..|
00000080  20 00 5e 0c 9c 76 5b 53  fc 34 9c 2f bc 2c c5 c0  | .^..v[S.4./.,..|
00000090  48 13 ac 64 fc 34 9c 2f  00 00 00 00 00 00 00 00  |H..d.4./........|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100

This is where the ASCII representation comes in handy: at a glance, we can see that the content looks identical to the output of debugfs inode_dump!

We have found out where the inode is on the disk!

This is already much cooler than the output of inode_dump alone. There, we asked a program written by a kernel developer to tell us what the inode looks like. Here, we found the same information directly on the disk ourselves.

But we still don’t know what these bits mean. Can we take them apart?

▍ Parsing bare bits

I was stuck on this question for several weeks. How do you get a computer to convert a bunch of bits into an inode?

Eventually it dawned on me: “This is exactly what structs are for!”

You may have come across structs before. They are a bit like objects in dynamic programming languages, but their purpose is not entirely obvious at first.

Now I think of them like this: suppose you’re looking at a bunch of bits. A struct is simply a specification that explains what those bits mean.

So the Linux kernel has to define a struct for the inode somewhere, right? And so it does!

/*
 * Structure of an inode on the disk
 */
struct ext4_inode {
	__le16	i_mode;		/* File mode */
	__le16	i_uid;		/* Low 16 bits of Owner Uid */
	__le32	i_size_lo;	/* Size in bytes */
	/* ... */
};

I’ll reiterate: it is thanks to this struct that the ext4 file system knows how to interpret the bits we saw earlier. It is a kind of decoder key.

In essence, it says: “The first 16 bits are the file’s mode (its type and permissions), the next 16 bits are its owner, the next 32 bits are the file’s size, and so on” (although padding complicates things slightly).
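To make that reading concrete, here is a minimal sketch that decodes just the first eight bytes of the dump by hand, without the kernel’s struct. The inode_head type and the function names are hypothetical helpers of mine. The one real fact they rely on is that __le16 and __le32 mark little-endian fields, so the least significant byte comes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the three fields decoded here. */
struct inode_head {
    uint16_t mode;    /* i_mode: file type and permission bits */
    uint16_t uid;     /* i_uid: low 16 bits of the owner's user ID */
    uint32_t size_lo; /* i_size_lo: low 32 bits of the size in bytes */
};

/* Assemble little-endian values byte by byte, so the result does not
 * depend on the byte order of the machine running this code. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first 8 bytes of a raw on-disk inode. */
struct inode_head decode_inode_head(const uint8_t *raw, size_t len)
{
    assert(len >= 8); /* 2 + 2 + 4 bytes are needed */
    struct inode_head h;
    h.mode = le16(raw);        /* bytes 0-1 */
    h.uid = le16(raw + 2);     /* bytes 2-3 */
    h.size_lo = le32(raw + 4); /* bytes 4-7 */
    return h;
}
```

Fed the bytes b4 81 e8 03 0e 00 00 00 from the dump, this yields a mode of 0x81b4 (octal 0100664, that is, a regular file with permissions 0664), a uid of 1000, and a size of 14, which agrees with what stat printed earlier.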

Let’s use this structure to parse the previously extracted bits.

We will write a small C program that will do the following:

  1. Ask the computer to allocate 256 bytes of memory (the on-disk size of one inode on this file system).
  2. Ask it to copy 256 bytes from /dev/sdd1, starting at offset 301568, into the allocated memory.
  3. Explain to it how to parse these bytes using the ext4_inode struct.

Here is the C program described (abbreviated to the main points):

// open the partition's device file
int fd = open("/dev/sdd1", O_RDONLY);

// seek to the inode's location on the partition
lseek(fd, 301568, SEEK_SET);

// declare the struct and copy the inode's bytes from disk into memory
struct ext4_inode candidate_inode;
read(fd, &candidate_inode, sizeof(struct ext4_inode));

// now the inode's fields can be accessed!
printf("User:  %u\n", candidate_inode.i_uid);

Here is the whole program, with error checking. You can compile it and try it on your own PC.
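To give an idea of what the error checking might look like, here is a small sketch of my own (it is not the author’s program). The helper read_exact_at is a hypothetical name. It uses the POSIX pread call to read at a given offset, which replaces the lseek and read pair, and it treats a short read as an error.

```c
#define _XOPEN_SOURCE 700 /* ask the C library to declare pread */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly len bytes starting at byte `offset` of `path` into buf.
 * Returns 0 on success and -1 on any error or short read. */
int read_exact_at(const char *path, off_t offset, void *buf, size_t len)
{
    assert(buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    size_t done = 0;
    while (done < len) {
        /* pread may return fewer bytes than requested, so loop */
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          offset + (off_t)done);
        if (n < 0) {
            perror("pread");
            close(fd);
            return -1;
        }
        if (n == 0) {
            fprintf(stderr, "unexpected end of file\n");
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    close(fd);
    return 0;
}
```

In the context of this article, the call would be something like read_exact_at("/dev/sdd1", 301568, &candidate_inode, sizeof candidate_inode). Reading a block device this way normally requires root, which is why the commands here are run with sudo.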

Now let’s run it! Excited? If everything works, we will have successfully parsed the raw bits of the inode.

$ sudo ./parse /dev/sdd1 301568

Inode: 11   Mode:  0664
User:  1000   Group:  1000   Size: 14
Links: 1   Blockcount: 8
Inode checksum: 0x0c5e4923

Hooray! We got the inode’s information!

To make sure it is correct, let’s look at the output of debugfs stat example.txt. As you can see, all the fields the two outputs share match, including the most important one, the checksum.

debugfs: stat example.txt

Inode: 11   Type: regular    Mode:  0664   Flags: 0x80000
Generation: 2347119513    Version: 0x00000000:00000001
User:  1000   Group:  1000   Project:     0   Size: 14
File ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x64b6991a:535b769c -- Tue Jul 18 14:52:26 2023
 atime: 0x64b68b40:c0c52cbc -- Tue Jul 18 13:53:20 2023
 mtime: 0x64ac1348:2f9c34fc -- Mon Jul 10 15:18:48 2023
crtime: 0x64ac1348:2f9c34fc -- Mon Jul 10 15:18:48 2023
Size of extra inode fields: 32
Inode checksum: 0x0c5e4923
EXTENTS:
(0):33280

I think this is super cool. We set out to find the inode’s bits on disk, found them, and then worked out what they mean.

▍ Memory is also a bunch of bits

But that’s not all.

At the beginning of the article I said that disks and memory are both just a bunch of bits. Our program copies the inode’s bits into memory, right?

So we should be able to find those bits in memory and make sure they are the same bits that came from the disk. (A note on struct padding.)

To do this, we will run the program under the gdb debugger (similar to pdb in Python). With its help, we will pause the program and inspect its memory.

$ sudo gdb parse
(gdb) break 167
(gdb) run /dev/sdd1 301568
(gdb) x/160xb &candidate_inode
0x7fffffffe410: 0xb4    0x81    0xe8    0x03    0x0e    0x00    0x00    0x00
0x7fffffffe418: 0x40    0x8b    0xb6    0x64    0x1a    0x99    0xb6    0x64
0x7fffffffe420: 0x48    0x13    0xac    0x64    0x00    0x00    0x00    0x00
0x7fffffffe428: 0xe8    0x03    0x01    0x00    0x08    0x00    0x00    0x00
0x7fffffffe430: 0x00    0x00    0x08    0x00    0x01    0x00    0x00    0x00
0x7fffffffe438: 0x0a    0xf3    0x01    0x00    0x04    0x00    0x00    0x00
0x7fffffffe440: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe448: 0x01    0x00    0x00    0x00    0x00    0x82    0x00    0x00
0x7fffffffe450: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe458: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe460: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe468: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe470: 0x00    0x00    0x00    0x00    0x99    0x33    0xe6    0x8b
0x7fffffffe478: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe480: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe488: 0x00    0x00    0x00    0x00    0x23    0x49    0x00    0x00
0x7fffffffe490: 0x20    0x00    0x5e    0x0c    0x9c    0x76    0x5b    0x53
0x7fffffffe498: 0xfc    0x34    0x9c    0x2f    0xbc    0x2c    0xc5    0xc0
0x7fffffffe4a0: 0x48    0x13    0xac    0x64    0xfc    0x34    0x9c    0x2f
0x7fffffffe4a8: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00

That’s not easy to read, so I used a script that makes gdb’s output look more like the output of hexdump -C:

b4 81 e8 03 0e 00 00 00 40 8b b6 64 1a 99 b6 64
48 13 ac 64 00 00 00 00 e8 03 01 00 08 00 00 00
00 00 08 00 01 00 00 00 0a f3 01 00 04 00 00 00
00 00 00 00 00 00 00 00 01 00 00 00 00 82 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 99 33 e6 8b 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 23 49 00 00
20 00 5e 0c 9c 76 5b 53 fc 34 9c 2f bc 2c c5 c0
48 13 ac 64 fc 34 9c 2f 00 00 00 00 00 00 00 00

Let’s compare it with the bits on the disk:

b4 81 e8 03 0e 00 00 00  40 8b b6 64 1a 99 b6 64
48 13 ac 64 00 00 00 00  e8 03 01 00 08 00 00 00
00 00 08 00 01 00 00 00  0a f3 01 00 04 00 00 00
00 00 00 00 00 00 00 00  01 00 00 00 00 82 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00 00 00 00 99 33 e6 8b  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 23 49 00 00
20 00 5e 0c 9c 76 5b 53  fc 34 9c 2f bc 2c c5 c0
48 13 ac 64 fc 34 9c 2f  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00

Everything matches!

The asterisk simply means that the preceding line (all zeros here) is repeated. The observant reader will also notice that the last 16 bytes of zeros are missing from gdb’s output; I believe this is a consequence of an optimization performed by the compiler.

Here we can see that the bits on the disk and in the memory match. It may seem obvious in retrospect, but we saw it firsthand.

▍ And where are the contents of the file?

I know you might be thinking, “We haven’t seen the actual contents of the file.”

Right. The inode does not store the contents. They live somewhere else.

I will briefly explain why. You can think of a file system as consisting of two components: a set of boxes where the contents of files are stored, and a database that manages those boxes. It is a bit like a distributed system where you keep records (all the metadata) in a database, but put the actual file contents in storage such as S3 or a disk.

So the inode does not contain the contents of the file; it only points to them.

Let’s get this pointer out of our inode. Excerpt from the manual:

blocks filespec
Prints to stdout the blocks used by the inode given by filespec.

debugfs:  blocks example.txt
33280

This says that the file’s contents start 33280 blocks (of 4 KiB each) from the beginning of the file system. (Could we have gotten this location directly from the inode struct?)

Let’s make a dump of the corresponding disk area.

$ sudo dd if=/dev/sdd1 skip=33280 bs=4096 count=1 2>/dev/null | hexdump -C
00000000  48 65 6c 6c 6f 2c 20 77  6f 72 6c 64 21 0a 00 00  |Hello, world!...|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

There it is! Our Hello, world!, straight from the disk.

▍ What did we learn in the end?

So what did we learn and do?

We started with the common statement that disk and memory are a bunch of bits.
Next, we set ourselves the task of getting acquainted with these bits, in particular with those that encode disk files: index descriptors.

As a result, we got to know them very closely: we found them on disk, parsed them with a program that loaded them into memory and applied a struct to them, then looked into the corresponding area of memory and saw exactly the same bits that were on the disk.

In parallel, we learned a little about the ext4 file system (and file systems in general).

For me personally, this experiment was one of the most useful computing revelations I have ever had. After it, the mystique of computing dissipated a little for me, and I hope it did for you too.

▍ Footnotes

And the number of the inode itself. The inode number (11 in this case) is also not stored in the inode. Rather, the number indicates the inode’s position in the inode table.

Why 2560? Let me remind you that the descriptor number is 11. This means that 10 other descriptors precede it on the disk. Each descriptor is 256 bytes in size, meaning they occupy 2560 bytes. ↩

Padding complicates things slightly. Technically, the compiler pads structs, that is, it inserts empty space into them. Because of this, the exact layout of the bits in a struct is not strictly defined. However, given that the inode was written on the same computer that reads it, the struct is effectively a reliable key to these otherwise random-looking bits. ↩

A note on struct padding. Earlier I said that the compiler pads structs by inserting bytes between the fields. As a result, the in-memory representation of the data is hard to compare with its on-disk representation. To deal with this, I reduced the padding as much as possible by adding __attribute__((__packed__)) to the struct definition. That is why the memory dump I showed contains only 160 bytes: that is sizeof(struct ext4_inode) when padding is disabled. ↩
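If padding is new to you, a tiny experiment shows the effect. This is only a sketch with two made-up record types (they have nothing to do with ext4), and it assumes a GCC- or Clang-compatible compiler, since the attribute syntax is an extension of those compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field record, laid out by the compiler's default rules. */
struct padded_record {
    uint8_t  tag;
    uint32_t value; /* usually aligned to 4 bytes, leaving a gap after tag */
};

/* The same fields with padding disabled (GCC and Clang attribute syntax). */
struct __attribute__((__packed__)) packed_record {
    uint8_t  tag;
    uint32_t value; /* follows tag immediately, at offset 1 */
};

/* Number of padding bytes the compiler inserted into padded_record. */
size_t padding_bytes(void)
{
    assert(sizeof(struct padded_record) >= sizeof(struct packed_record));
    return sizeof(struct padded_record) - sizeof(struct packed_record);
}
```

The packed record is 5 bytes, one for tag and four for value. On common x86-64 toolchains the padded one is typically 8 bytes, so padding_bytes() typically reports 3, but the exact figure depends on the platform's alignment rules.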

Could we have gotten the location directly from the inode? Yes: we could also have parsed the location from the inode we loaded into memory, using the i_block field. But its content is an opaque array that would require additional code to decipher. It was easier to turn to debugfs, which did it for us. If anyone is interested, the array looks like this:

(gdb) p candidate_inode->i_block
$1 = {127754, 4, 0, 0, 1, 33280, 0, 0, 0, 0, 0, 0, 0, 0, 0}

Here is the value 33280 that we saw in the debugfs output.
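For the curious, here is a rough sketch of the "additional code" for the simplest case. It rests on my reading of the kernel’s extent layout, so treat it as an assumption: a 12-byte header (eh_magic, eh_entries, eh_max, eh_depth, eh_generation) followed by 12-byte extents (ee_block, ee_len, ee_start_hi, ee_start_lo). It also assumes the 15 words are host-order integers on a little-endian machine, exactly as gdb printed them, and it handles only a tree of depth 0. The simple_extent type and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for a single decoded extent. */
struct simple_extent {
    uint32_t logical_block;  /* first file block the extent covers */
    uint16_t length;         /* number of blocks in the extent */
    uint64_t physical_block; /* first block of the extent on disk */
};

/* Decodes the first extent from the 15 words of i_block.
 * Returns 0 on success, or -1 if the header is not a depth-0 extent
 * header with at least one entry. */
int first_extent(const uint32_t i_block[15], struct simple_extent *out)
{
    assert(i_block != NULL && out != NULL);

    uint16_t magic   = (uint16_t)(i_block[0] & 0xffff); /* eh_magic */
    uint16_t entries = (uint16_t)(i_block[0] >> 16);    /* eh_entries */
    uint16_t depth   = (uint16_t)(i_block[1] >> 16);    /* eh_depth */
    if (magic != 0xF30A || entries < 1 || depth != 0)
        return -1;

    /* Words 0-2 hold the header; the first extent occupies words 3-5. */
    out->logical_block = i_block[3];                     /* ee_block */
    out->length = (uint16_t)(i_block[4] & 0xffff);       /* ee_len */
    uint64_t start_hi = i_block[4] >> 16;                /* ee_start_hi */
    out->physical_block = (start_hi << 32) | i_block[5]; /* ee_start_lo */
    return 0;
}
```

Applied to the array above, the first word 127754 is 0x1F30A, that is, the magic number 0xF30A plus an entry count of 1, and the decoded extent says that file block 0 is stored in one block starting at physical block 33280.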
