Making Rust binaries smaller by default / Habr

Have you ever tried to compile a hello world program in Rust with --release? If so, did you notice how large the resulting binary was? Suffice it to say that it is not exactly small. Or at least it wasn't until recently. In this post, I explain how I found out about this problem and how I tried to fix it in Cargo.

Analysis of the binary file

I'm a member of the newly formed #wg-binary-size working group, which is looking for ways to reduce the size of Rust programs and libraries. Since I also maintain the Rust benchmark suite, my main task in the working group is to improve the tooling for measuring and tracking the binary size of Rust programs.

As part of this work, I recently added a new command to the benchmark suite that allows you to examine and compare the sizes of individual sections and symbols in a Rust binary (or library) between two versions of the compiler.

The output of the command looks like this:

[Screenshot: command output comparing section and symbol sizes between two compiler versions]

While testing the command, I noticed something interesting. When I compiled a test binary in release mode (--release), the analysis command reported that the binary contained debug symbols (DWARF). At first I assumed there was a bug in my command and that I was accidentally compiling in debug mode. After all, Cargo doesn't add debug symbols to binaries in release mode by default. Or does it?

I spent about fifteen minutes looking for the bug until I realized that there was none in my code. Every Rust binary compiled in release mode really does contain debug symbols, and this has been the case for a long time. In fact, there is an old Cargo issue (almost seven years old) that mentions this problem.

Why is this happening?

This is a consequence of how the Rust standard library is distributed. When you compile a Rust crate, the standard library is not compiled along with it [unless you use build-std, which is unfortunately still unstable]. Instead, it comes precompiled (usually via Rustup) in the rust-std component. To reduce the amount of data that has to be downloaded [and also, as it turns out, to increase the size of Rust binaries on disk], it is not shipped in two variants (with and without debug symbols), but only in the most general variant, which includes debug symbols.

On Linux (and other platforms), debug symbols are by default embedded directly in the object files or in the library itself, rather than distributed in separate files. So when you link to the standard library, its debug symbols end up in your compiled binary as well, which inflates the binary's size.
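If you want to check a binary yourself, one approach is to list its sections (for example with `readelf -S` or `size -A`) and add up the sizes of the DWARF sections, whose names start with `.debug_`. The sketch below assumes you have already turned such a listing into (name, size) pairs; the function names are hypothetical and are not part of the benchmark suite's command.

```rust
/// Returns true for ELF section names that hold DWARF debug information,
/// e.g. `.debug_info` or `.debug_line`, including the older compressed
/// `.zdebug_*` naming scheme.
fn is_dwarf_section(name: &str) -> bool {
    name.starts_with(".debug_") || name.starts_with(".zdebug_")
}

/// Sums the sizes (in bytes) of all DWARF sections in a list of
/// (section name, section size) pairs, such as one transcribed from
/// `readelf -S` or `size -A` output.
fn dwarf_bytes(sections: &[(&str, u64)]) -> u64 {
    sections
        .iter()
        .filter(|(name, _)| is_dwarf_section(name))
        .map(|&(_, size)| size)
        .sum()
}
```

Comparing `dwarf_bytes` with the total file size gives a quick estimate of how much of a release binary is debug information.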

This actually contradicts Cargo's own documentation, which states that with debug = 0 (the default for release builds), the resulting binary will not contain any debug symbols. In practice, something quite different happens.

Addendum: I would like to clarify that what Cargo puts into your program by default is the debuginfo of the Rust standard library. In release mode, it does not include debuginfo for your own crates by default.

Why is this a problem?

If you look at a hello world Rust binary compiled in release mode on Linux, you will see that it is about 4.3 MiB in size [tested with rustc 1.75.0]. Although storage capacities are much larger today than they used to be, that is still a lot.

You might think that this is not a problem, because those who need smaller binaries will simply strip this information. That is a fair argument: after stripping the debug symbols from the hello world binary [for example, with strip --strip-debug <binary>], its size drops to just 415 KiB, i.e. to roughly 10% of the original. But the devil is in the details; in this case, in the default settings.

And defaults matter a lot! Rust advertises itself as a language that produces highly efficient and optimized code, but a hello world application that takes up more than 4 megabytes of disk space gives a rather different impression. I can easily imagine an experienced C or C++ developer deciding to try Rust, compiling a small program in release mode, noticing the size of the resulting binary, immediately giving up on the language, and going off to make fun of it on forums.

Even though this problem can be solved with a single strip invocation, in my opinion it is still a problem. Rust tries to appeal to programmers with very different backgrounds, and not everyone knows that binaries can be stripped. That is why it is important for things to work well by default.

It is worth noting that the libstd debug symbols on Linux take up about 4 MB and that this size is constant, so while they make up about 90% of a hello world binary, their relative impact is smaller in larger applications. Still, 4 MB is not negligible, considering that it is added to every compiled Rust binary by default.
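The arithmetic behind this is simple: a fixed overhead D added to a binary whose stripped size is S makes up D / (S + D) of the total. A small sketch plugging in the hello world numbers from above (4.3 MiB unstripped, 415 KiB stripped, so roughly 3.9 MiB of debug info) and a hypothetical 40 MiB application; the function and constant names are illustrative only.

```rust
/// Binary-prefixed units, in bytes.
const KIB: f64 = 1024.0;
const MIB: f64 = 1024.0 * KIB;

/// Fraction of the final binary taken up by a fixed-size overhead
/// (here: the libstd debug info), given the binary's size without it.
fn fixed_overhead_share(stripped_bytes: f64, overhead_bytes: f64) -> f64 {
    overhead_bytes / (stripped_bytes + overhead_bytes)
}
```

For hello world, `fixed_overhead_share(415.0 * KIB, 4.3 * MIB - 415.0 * KIB)` comes out at about 0.91, while for a 40 MiB (stripped) application the same debug info accounts for only about 9% of the file.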

Proposal for changes to Cargo

When I realized that this was Cargo's default behavior, I remembered that I had run into this problem before; this was already the third time. I had just never done anything about it, so I kept forgetting about it.

This time I was determined to do something about it. But where to start? Asking on the Rust Zulip is usually a good first step, so that is what I did. It turned out that I was not the first to ask about this, and that the topic had come up many times over the years. The proposed solution was to strip debug symbols from Rust programs in release mode by default, which would solve the problem of bloated binaries. In the past this had been blocked on stabilizing strip support in Cargo, but that was already done at the beginning of 2022.

So why had this never been implemented? Were there any major obstacles or challenges? Not really. When I asked on Zulip, almost everyone thought it was a good idea. There had been attempts to do it in the past, but nobody had pushed them hard enough.

In other words, it had not been done simply because nobody had done it yet. So I decided to change that. To test whether stripping debug symbols by default would work, I created a PR for the compiler and ran a performance benchmark. The binary size results (for small crates) looked quite good, which gave me hope that stripping debug symbols by default could work.

Interestingly, this change also made compilation of small crates (like hello world) on Linux almost twice as fast! How is that possible when we are doing more work by adding a stripping step to the compilation process? Well, it turns out that the default Linux linker (bfd) is terribly slow [although there was a recent attempt to speed it up], so by stripping the debug symbols from the binary we actually reduce the amount of work the linker has to do, which speeds up compilation. Unfortunately, this effect is only noticeable for very small crates.

Work is currently underway to use a faster linker (lld) by default on Linux (again, defaults matter).

When I showed these results to the Cargo maintainers, they asked me to write a proposal in the original Cargo issue. In this mini-proposal, I explained which changes I wanted to make to Cargo's defaults, how they would be implemented, and what they would affect.

For example, someone pointed out that if we strip debug symbols by default, backtraces of release builds will no longer contain debug information such as line numbers. That is indeed the case, but I would argue that they were not very useful anyway. If your binary contains debug symbols only for the standard library and not for your own code, then even though the backtrace will contain line numbers from stdlib, it won't give you any useful context (you can compare the difference here). There were also implementation questions, such as how to handle situations where debug symbols are requested only for some of the dependencies. The details can be found in the proposal.

After I submitted the proposal, it went through the FCP (final comment period) process. The Cargo team voted on it, and once it was accepted and the ten-day waiting period for final concerns had passed, I was able to implement the proposal, which turned out to be surprisingly simple.

The PR was merged a week ago, and the change is now available on nightly!

In short, the change is that Cargo now uses strip = "debuginfo" by default for the release profile (unless you explicitly request debuginfo for some dependency):

[profile.release]
# v Now used by default, unless specified otherwise
strip = "debuginfo"

In fact, the new default applies to every profile that does not enable debuginfo anywhere in its dependency chain, not just to the release profile.
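The rule can be summarized as a small decision function. This is a simplified, hypothetical model of the behavior described above (the `DebugLevel`, `Strip`, and `effective_strip` names are made up for illustration and do not come from Cargo's source), not Cargo's actual implementation:

```rust
/// Debuginfo level requested for a single compilation unit
/// (hypothetical, simplified model of Cargo's `debug` setting).
#[derive(Clone, Copy, Debug, PartialEq)]
enum DebugLevel {
    Off,
    LineTablesOnly,
    Full,
}

/// Possible values of the `strip` profile setting.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strip {
    None,
    DebugInfo,
    Symbols,
}

/// An explicit `strip` setting always wins. Otherwise, debuginfo is stripped
/// only if no unit anywhere in the build requests debuginfo.
fn effective_strip(explicit: Option<Strip>, units: &[DebugLevel]) -> Strip {
    if let Some(strip) = explicit {
        return strip;
    }
    if units.iter().all(|&level| level == DebugLevel::Off) {
        Strip::DebugInfo
    } else {
        Strip::None
    }
}
```

Modeling it this way makes the key design choice visible: requesting debuginfo for even a single dependency disables the new default for the whole build, so the debuginfo you asked for is never silently stripped.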

There is one open question regarding the use of strip on macOS, where it seems to cause some problems. The change has been on nightly for about a week now and I haven't noticed any issues, but if some do come up, we can also strip debug symbols by default only on selected platforms (such as Linux and Windows). Let us know if you run into any problems with stripping in Cargo on macOS!

Conclusion

This turned out to be yet another example of the "if you want something done, do it yourself" approach that is common in open source projects. I'm glad the change has landed, and I hope we don't find any major issues with it, so that it can reach stable Rust in the coming months.

If you have any comments or questions, feel free to post them on Reddit.
