Speeding up C++ compilation

This article is about bringing the compile time of the {fmt} library down to the level of the C input/output library.



Let's start with a little background. {fmt} is a popular open-source C++ library that provides a more efficient alternative to C++ iostreams and C stdio. It surpasses the latter in a number of respects:

  • Type safety with compile-time format string checks. These checks are enabled by default since C++20 and are available as an opt-in for C++14/17. Runtime format strings in {fmt} are also safe, which cannot be achieved with printf.
  • Extensibility. User-defined types can be made formattable, and most standard library types, such as containers and date and time types, are formattable out of the box.
  • Performance. {fmt} is much faster than any common printf implementation, sometimes by several orders of magnitude (for example, in floating-point formatting).
  • Portable Unicode support.
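
To make the first point more concrete, here is a minimal sketch of what a compile-time format string check can look like. It is not {fmt}'s actual implementation: `count_placeholders` and `check_format` are hypothetical helper names, and the only property verified is that the number of `{}` placeholders matches the number of arguments.

```cpp
#include <cstddef>

// Hypothetical helper: counts "{}" placeholders in a format string.
// Being constexpr (C++14 or later), it can run during compilation.
constexpr std::size_t count_placeholders(const char* s) {
  std::size_t n = 0;
  for (; *s != '\0'; ++s) {
    if (s[0] == '{' && s[1] == '}') {
      ++n;
      ++s;  // skip the closing brace
    }
  }
  return n;
}

// Hypothetical helper: true if the string expects exactly num_args arguments.
constexpr bool check_format(const char* s, std::size_t num_args) {
  return count_placeholders(s) == num_args;
}

// A mismatch is reported by the compiler instead of misbehaving at run time.
static_assert(check_format("Hello, {}!\n", 1), "argument count mismatch");
static_assert(!check_format("{} + {}", 1), "a missing argument is detected");
```

A real checker also has to validate format specifiers against argument types, which this sketch leaves out.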

However, one of the areas in which stdio was still ahead was compile time.

We put a lot of effort into optimizing {fmt}'s compile time by applying type erasure at the argument and output levels, restricting templates to a small top-level API layer, and adding fmt/core.h with a minimum number of dependencies.
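
The idea of type erasure behind a thin template layer can be illustrated with the following sketch. It is an assumption-laden toy, not {fmt}'s code: `erased_arg`, `vformat_into` and `format_into` are hypothetical names, only `int` and C strings are supported, and output goes to a caller-provided buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// A type-erased argument: a tag plus the value stored in a union.
// Only int and C strings are handled in this sketch.
struct erased_arg {
  enum kind_t { int_kind, cstring_kind } kind;
  union {
    int int_value;
    const char* cstring_value;
  };
  erased_arg() : kind(int_kind), int_value(0) {}
  erased_arg(int v) : kind(int_kind), int_value(v) {}
  erased_arg(const char* v) : kind(cstring_kind), cstring_value(v) {}
};

// Non-template core: compiled once, no matter how many combinations of
// argument types the callers use. Writes at most size - 1 characters.
inline std::size_t vformat_into(char* out, std::size_t size, const char* fmt,
                                const erased_arg* args, std::size_t num_args) {
  assert(size > 0);
  std::size_t pos = 0, next = 0;
  for (; *fmt != '\0' && pos + 1 < size; ++fmt) {
    if (fmt[0] == '{' && fmt[1] == '}' && next < num_args) {
      const erased_arg& arg = args[next++];
      int n = arg.kind == erased_arg::int_kind
                  ? std::snprintf(out + pos, size - pos, "%d", arg.int_value)
                  : std::snprintf(out + pos, size - pos, "%s",
                                  arg.cstring_value);
      if (n < 0) break;  // encoding error reported by snprintf
      std::size_t written = static_cast<std::size_t>(n);
      // snprintf returns the untruncated length, so clamp to the buffer.
      pos += written < size - pos ? written : size - pos - 1;
      ++fmt;  // skip the closing brace
    } else {
      out[pos++] = *fmt;
    }
  }
  out[pos] = '\0';
  return pos;
}

// Thin template layer: erases the argument types and forwards to the core.
template <typename... T>
std::size_t format_into(char* out, std::size_t size, const char* fmt,
                        T... args) {
  // One extra element keeps the array non-empty when there are no arguments.
  const erased_arg erased[sizeof...(T) + 1] = {erased_arg(args)...};
  return vformat_into(out, size, fmt, erased, sizeof...(T));
}
```

With this split, a call such as `format_into(buf, sizeof buf, "Hello, {}!", "world")` instantiates only the small forwarding template, while the formatting loop itself is compiled a single time.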

In the end, {fmt} began to compile faster than C++ alternatives such as iostreams, Boost Format and Folly Format, but it still could not match stdio. We understood that the bottleneck was the dependency on <string>, but it was required for the main API, fmt::format.

Later it became clear that in some cases using std::string is not necessary. To quote Sean Middleditch's comment from GitHub:

If I'm not using std::string (and that is the case), then I don't want to pull in heavy dependencies for this header and for every translation unit that might perform some formatting (and therefore needs access to the formatter<> specializations).


In addition, {fmt} was increasingly being used for I/O and logging libraries, where std::string objects may appear only as arguments at some call sites.

And the most important use case of them all is, naturally, the Godbolt project, in which {fmt} is often used for output, especially of things that printf does not support, and where a few hundred milliseconds of overhead are noticeable.

On the other hand, in C++ it is hard to avoid <string>. If you use any part of the standard library, it will probably be pulled in transitively. In addition, the compile time was quite tolerable, and since I had other tasks, I did not deal with this issue for a long time.

However, with the release of C++20, the situation changed a lot. Take a look at the following Hello World program with simple formatted output:

#include <fmt/core.h>

int main() {
  fmt::print("Hello, {}!\n", "world");
}

In C++11 mode, it took ~225 ms to compile with Clang on my M1 MacBook Pro (here and below I show the best result of three runs):

% time c++ -c -I include -std=c++11
c++ -c -I include -std=c++11 0.17s user 0.04s system 90% cpu 0.225 total

In C++20 mode, the same compilation takes ~319 ms, which is about 40% longer:

% time c++ -c -I include -std=c++20
c++ -c -I include -std=c++20 0.26s user 0.05s system 95% cpu 0.319 total

For comparison, here is an equivalent C program (hello-stdio.c):

#include <stdio.h>

int main() {
  printf("Hello, %s!\n", "world");
}

And it compiles in just ~33ms:

% time cc -c hello-stdio.c
cc -c hello-stdio.c 0.01s user 0.01s system 68% cpu 0.033 total

It turns out that, due to the uncontrolled bloat of the standard library between C++11 and C++20, compilation became about 10 times slower than for the stdio version (~319 ms vs ~33 ms), and all because of header inclusion. Can something be done about it?

As it turned out, thanks to type erasure, fmt/core.h depended on std::string only minimally, so I decided to try to remove that dependency. But first, let's look at the compilation in more detail using a time trace:

c++ -ftime-trace -c -I include -std=c++20

We then open the resulting trace file in Chrome's trace viewer.

fmt/core.h itself accounts for only 7.5 ms; the rest of the time is spent mainly in the standard headers it includes:

  • <iterator>: ~71 ms;
  • <memory>: ~37 ms;
  • <string>: ~122 ms (highlighted in the trace above).



<string> indeed takes the longest, but what about the others? Unfortunately, removing the other headers will not change the situation, since roughly the same amount of code will still be pulled in transitively. These headers show up separately in the trace only because of the order in which they are included.

After some googling, I found out that in libc++ something can be done about this thanks to _LIBCPP_REMOVE_TRANSITIVE_INCLUDES. Let's try:

% time c++ -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -c -I include -std=c++20
c++ -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES -c -I include -std=c++20 0.18s user 0.03s system 91% cpu 0.231 total

This reduced the compile time to ~231 ms, almost the C++11 level. Not bad, although still far from stdio.

But with transitive includes out of the way, it now makes sense to get rid of <iterator> and <memory>.

<memory> is used in only one place, for std::addressof, as a workaround for the broken implementation of std::vector<bool>::reference in libc++, which provides an innovative overload of the unary operator&. This is the place:

custom.value = const_cast<value_type*>(std::addressof(val));

We can replace it with a few casts, at the cost of losing the ability to format std::vector<bool>::reference directly at compile time, which I'm perfectly fine with:

if constexpr (std::is_same<decltype(&val), T*>::value)
  custom.value = const_cast<value_type*>(&val);
if (!is_constant_evaluated())
  custom.value = const_cast<char*>(&reinterpret_cast<const char&>(val));
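
To see why such a workaround is needed at all, consider the following sketch. `odd_ref` is a hypothetical type (not the libc++ one) whose unary operator& does not return an address, and `real_address` is a hypothetical helper that recovers the address with the same kind of cast as above, without including <memory>.

```cpp
// Hypothetical type that overloads unary operator& so that &x no longer
// yields the address of the object.
struct odd_ref {
  int payload;
  int operator&() const { return 42; }  // not an address at all
};

// Obtains the real address without <memory>: view the object as a char,
// take the address of that char, and cast the pointer back.
template <typename T>
const T* real_address(const T& value) {
  return reinterpret_cast<const T*>(&reinterpret_cast<const char&>(value));
}
```

Since reinterpret_cast is not permitted in constant expressions, a helper like this cannot be used during constant evaluation, which is consistent with the trade-off described above.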

Now that <memory> is gone (and I would prefer to have no memory of this workaround), the compile time dropped to ~195 ms, already better than the initial C++11 figure.

Removing <iterator> turned out to be trickier, since we use back_insert_iterator to detect and optimize formatting into contiguous containers. Unfortunately, it cannot be detected even with SFINAE, because back_insert_iterator has the same API shape as front_insert_iterator. This problem has various solutions, for example moving the optimization to fmt/format.h. For now, I've added a simple local replacement, fmt::back_insert_iterator. Without <iterator>, the compile time dropped to ~178 ms.
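
A local replacement of this kind might look roughly as follows. This is only a sketch under assumptions: `back_inserter_lite`, `is_back_inserter_lite` and `small_buffer` are hypothetical names, not the definitions used in {fmt}. The point is that a library-owned iterator type can be recognized by its exact type through a partial specialization, so no API-shape detection is needed.

```cpp
#include <cstddef>

// Hypothetical minimal stand-in for std::back_insert_iterator that does not
// need <iterator>. It appends to any container that provides push_back.
template <typename Container>
class back_inserter_lite {
 public:
  using container_type = Container;

  explicit back_inserter_lite(Container& c) : container_(&c) {}

  back_inserter_lite& operator=(const typename Container::value_type& value) {
    container_->push_back(value);
    return *this;
  }
  back_inserter_lite& operator*() { return *this; }
  back_inserter_lite& operator++() { return *this; }
  back_inserter_lite operator++(int) { return *this; }

 private:
  Container* container_;
};

// Detection by exact type rather than by API shape.
template <typename T>
struct is_back_inserter_lite {
  static constexpr bool value = false;
};
template <typename Container>
struct is_back_inserter_lite<back_inserter_lite<Container>> {
  static constexpr bool value = true;
};

// Tiny fixed-capacity container used only to exercise the iterator.
struct small_buffer {
  using value_type = char;
  char data[16];
  std::size_t size = 0;
  void push_back(char c) {
    if (size < sizeof(data)) data[size++] = c;
  }
};
```

Writing `*out++ = 'x'` through a `back_inserter_lite<small_buffer>` appends to the buffer just as the standard iterator would.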

Now is the right time to tackle <string>, but as it turns out, we also inadvertently included <string_view> or <experimental/string_view> (sigh). It adds no direct cost because it is pulled in by <string> anyway, but we need to remove one to get rid of the other. We already have a trait class in the library for detecting an API similar to that of std::string_view, and we can apply it with some simplification:

template <typename T, typename Enable = void>
struct is_string_like : std::false_type {};

// A heuristic for detecting std::string and std::string_view.
template <typename T>
struct is_string_like<T, void_t<decltype(std::declval<T>().find_first_of(
                             typename T::value_type(), 0))>> : std::true_type {
};

This can give false positives, but they are harmless, because at worst a type that looks like a string will be formatted as a string. And you can always opt out.
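
The behaviour of such a heuristic, including a harmless false positive, can be tried out in isolation. In the sketch below the trait is reproduced inside a hypothetical `sketch` namespace with a local `void_t`, and `token` and `point` are made-up test types; none of these names come from {fmt}.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

namespace sketch {

// Local void_t so the sketch also works before C++17.
template <typename...>
struct make_void {
  using type = void;
};
template <typename... T>
using void_t = typename make_void<T...>::type;

template <typename T, typename Enable = void>
struct is_string_like : std::false_type {};

// Matches any type with a value_type and a find_first_of(value_type, pos).
template <typename T>
struct is_string_like<T, void_t<decltype(std::declval<T>().find_first_of(
                             typename T::value_type(), 0))>> : std::true_type {
};

// Made-up user type with a string-like API: it is (falsely) detected.
struct token {
  using value_type = char;
  std::size_t find_first_of(char, std::size_t) const { return 0; }
};

// Made-up type without find_first_of: it is not detected.
struct point {
  int x, y;
};

static_assert(is_string_like<token>::value, "string-like API is detected");
static_assert(!is_string_like<point>::value, "no find_first_of member");
static_assert(!is_string_like<int>::value, "int has no value_type");

}  // namespace sketch
```

Note that the sketch never includes <string> or <string_view>: the detection relies only on the shape of the type.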

Here we come to the final boss, <string>. There were very few references to std::string in fmt/core.h. However, we also had one use of std::char_traits, in the fallback implementation of string_view needed for compatibility with C++11. char_traits didn't add much value, so it was easily replaced by C functions such as strlen and their constexpr fallbacks.
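
A replacement of this kind can be sketched as follows. `cstring_length` is a hypothetical name, and the sketch assumes C++14 or later for the loop inside a constexpr function; the run-time strlen path is enabled only when the standard library advertises std::is_constant_evaluated (C++20).

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical replacement for std::char_traits<char>::length that needs
// neither <string> nor <string_view>.
constexpr std::size_t cstring_length(const char* s) {
#ifdef __cpp_lib_is_constant_evaluated
  // At run time, defer to the C library function.
  if (!std::is_constant_evaluated()) return std::strlen(s);
#endif
  // Fallback that is usable in constant expressions.
  std::size_t n = 0;
  while (s[n] != '\0') ++n;
  return n;
}

static_assert(cstring_length("world") == 5, "evaluated at compile time");
```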

The only API that used std::string was fmt::format. One option was to move it to fmt/format.h. But that would be a breaking change, so I decided to take the horrible but non-breaking step of forward-declaring std::basic_string. Such things are frowned upon, but it is no worse than what we have had to do in {fmt} to work around the limitations of the C and C++ standard libraries. Here's a slightly simplified version:

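// Forward declarations (placed inside the implementation's std namespace):
template <typename Char>
struct char_traits;
template <typename T>
class allocator;
template <typename Char, typename Traits, typename Allocator>
class basic_string;
// Fallback for other standard library implementations:
#include <string>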




The enclosing namespace declarations (omitted here) are defined depending on the standard library implementation. Both leading standard libraries are currently supported.





Of course, this didn't work with our definition of fmt::format:

template <typename... T>
FMT_NODISCARD FMT_INLINE auto format(format_string<T...> fmt, T&&... args)
    -> basic_string<char> {
  return vformat(fmt, fmt::make_format_args(args...));
}

And we got this error:

In file included from
include/fmt/core.h:2843:31: error: implicit instantiation of undefined template 'std::basic_string<char, std::char_traits<char>, std::allocator<char>>'
FMT_NODISCARD FMT_INLINE auto format(format_string<T...> fmt, T&&... args)

As is often the case in C++, the solution was to add an extra level of indirection:


template <typename... T, typename Char = char>
FMT_NODISCARD FMT_INLINE auto format(format_string<T...> fmt, T&&... args)
    -> basic_string<Char> {
  return vformat(fmt, fmt::make_format_args(args...));
}
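
The effect of making the return type depend on a template parameter can be reproduced in a self-contained sketch. Everything here is hypothetical: `basic_text` stands in for std::basic_string, and `vmake_text` and `make_text` stand in for vformat and format. Unlike in the snippet above, Char is also deduced from the argument, which keeps the example short.

```cpp
#include <cstddef>

// Hypothetical stand-in for std::basic_string: only declared at this point,
// so basic_text<char> is an incomplete type.
template <typename Char>
class basic_text;

// Declaring a function that returns an incomplete type is allowed.
basic_text<char> vmake_text(const char* s);

// Because the return type depends on Char, the compiler checks that it is
// complete only when make_text is instantiated, not when it is defined.
// Writing "-> basic_text<char>" here instead would be rejected right away.
template <typename... T, typename Char = char>
auto make_text(const Char* s, T&&...) -> basic_text<Char> {
  return vmake_text(s);
}

// Later, the type is finally defined (for std::basic_string this would
// happen when the user includes <string>).
template <typename Char>
class basic_text {
 public:
  explicit basic_text(const Char* s) : data_(s) {}
  std::size_t size() const {
    std::size_t n = 0;
    while (data_[n] != Char()) ++n;
    return n;
  }

 private:
  const Char* data_;
};

basic_text<char> vmake_text(const char* s) { return basic_text<char>(s); }
```

By the time `make_text("hi")` is called, `basic_text<char>` is complete, so the instantiation succeeds even though the template itself was defined while the type was still only declared.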

Now let’s check if it was worth it:

% time c++ -c -I include -std=c++20
c++ -c -I include -std=c++20 0.04s user 0.02s system 81% cpu 0.069 total

We've reduced the compile time from ~319 ms to ~69 ms and no longer need _LIBCPP_REMOVE_TRANSITIVE_INCLUDES. As a result of all these optimizations, {fmt} became comparable with stdio in terms of compile time: the measurements show only about a 2x difference (~69 ms vs ~33 ms). I think that is a reasonable price to pay for better safety, performance and extensibility.

▍ PS

After the optimization, another header became the second heaviest include, adding a full 5 ms to the compile time.
