Bazel, stamping, remote cache (part 2)

Bazel has two very useful features:

  • stamping: lets you embed in an artifact the commit from which an equivalent artifact can be built;

  • remote cache and remote build: let you share a cache between build machines, or even run the build itself on a build farm.

Unfortunately, these features used to be mutually exclusive. Starting with Bazel 7.0, stamping can be combined with the remote cache by means of scrubbing. Bazel 7.1, released today, also makes it possible to combine stamping with remote build.

I wrote more about this problem earlier in the article Bazel, stamping, remote cache.

What is stamping?

Stamping lets you record in an artifact the version from which an equivalent artifact can be built.

This is not necessarily the version the build was started for, but one from which you can obtain an equivalent result.

For example, take two commits that differ only in the README file. An executable built from either commit may carry the same commit as its source revision, because the changes between the two revisions do not affect it in any way.

On the one hand, this tells you which revision an equivalent artifact can be rebuilt from; on the other hand, the artifact does not have to be rebuilt for every commit.

How does stamping work?

Internally, stamping is implemented simply: the files that carry the version data to be embedded in the artifact (in Bazel's case, bazel-out/volatile-status.txt) are excluded from the cache key.

As a result, the artifact is rebuilt only if at least one input other than the version data file has changed.
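The idea can be sketched in a few lines of Python. This is a toy model and not Bazel's actual implementation: the function local_cache_key, the command line, the source file, and the BUILD_SCM_REVISION key are all hypothetical; only the path bazel-out/volatile-status.txt comes from the text above.

```python
import hashlib

VOLATILE = "bazel-out/volatile-status.txt"

def local_cache_key(command, inputs):
    """Toy cache key: hash the command and every input except the volatile status file."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(inputs):
        if path == VOLATILE:
            continue  # version data does not take part in the key
        h.update(path.encode() + b"\0" + inputs[path] + b"\0")
    return h.hexdigest()

command = ["link", "-o", "app"]
build_a = {"src/main.c": b"int main(){}", VOLATILE: b"BUILD_SCM_REVISION 1111\n"}
build_b = {"src/main.c": b"int main(){}", VOLATILE: b"BUILD_SCM_REVISION 2222\n"}
build_c = {"src/main.c": b"int main(){return 1;}", VOLATILE: b"BUILD_SCM_REVISION 2222\n"}

# Only the version data differs: the key is unchanged, so nothing is rebuilt.
same = local_cache_key(command, build_a) == local_cache_key(command, build_b)
# A real input differs: the key changes and the artifact is rebuilt.
changed = local_cache_key(command, build_a) != local_cache_key(command, build_c)
print(same, changed)  # True True
```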

What is the problem with remote cache?

Bazel has several caches. Bazel's internal cache and the remote cache use different cache keys, while the disk cache, the remote cache, and remote build all share the same key (the disk cache is a special case of the remote cache).

The problem is that the action cache key for a farm build or for the remote cache is the hash of the build task. This hash covers all inputs, and the special treatment of stamping files does not extend to it. In other words, the stamping files affect the hash of the build task.

As a result, every build gets a unique version data file, so a stamped action is never served from the cache.

The most unpleasant part is that even using tags to force the stamped rule to build locally does not fix the situation: we get the same artifact only if we hit the cache of a previous build on the same build machine.
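A minimal Python sketch of the effect, assuming a simplified key that hashes the command and every input without exception. The function remote_action_key, the command line, and the source file are hypothetical; the real key is computed by Bazel from the full action description.

```python
import hashlib

def remote_action_key(command, inputs):
    """Toy remote key: a hash over the command and every input, with no exceptions."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(inputs):
        h.update(path.encode() + b"\0" + inputs[path] + b"\0")
    return h.hexdigest()

command = ["link", "-o", "app"]
sources = {"src/main.c": b"int main(){}"}

keys = set()
for timestamp in (1700000000, 1700000060, 1700000120):
    # The sources are identical; only the version data file differs per build.
    status = f"BUILD_TIMESTAMP {timestamp}\n".encode()
    inputs = {**sources, "bazel-out/volatile-status.txt": status}
    keys.add(remote_action_key(command, inputs))

print(len(keys))  # 3: every build produces its own key, so none of them hits the cache
```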

What is scrubbing?

Bazel 7.0 introduced scrubbing. It lets you customize how the cache key for the remote cache is generated.

For example, you can:

  • add salt when hashing;

  • replace assembly arguments;

  • exclude incoming files from the cache key.

In the case of stamping, we can exclude the file bazel-out/volatile-status.txt from the cache key and get the same behavior with the remote cache as with a local build.

In addition, scrubbing solves the case where version data is embedded from some derivative of bazel-out/volatile-status.txt rather than from the file itself.
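The three kinds of transformation listed above can be modeled with a toy key function. This is a sketch under stated assumptions, not Bazel's code: scrubbed_key and its parameter names are hypothetical (chosen to mirror the list), as are the --build-id argument and the sample inputs.

```python
import hashlib
import re

def scrubbed_key(command, inputs, salt="", arg_replacements=(), omitted_inputs=()):
    """Toy scrubbed key: salt, rewritten arguments, and inputs minus the omitted ones."""
    h = hashlib.sha256()
    h.update(salt.encode() + b"\0")
    for arg in command:
        for pattern, replacement in arg_replacements:
            arg = re.sub(pattern, replacement, arg)
        h.update(arg.encode() + b"\0")
    for path in sorted(inputs):
        if any(re.search(pattern, path) for pattern in omitted_inputs):
            continue  # this input no longer influences the key
        h.update(path.encode() + b"\0" + inputs[path] + b"\0")
    return h.hexdigest()

command_a = ["link", "--build-id=101", "-o", "app"]
command_b = ["link", "--build-id=102", "-o", "app"]
inputs_a = {"src/main.c": b"int main(){}",
            "bazel-out/volatile-status.txt": b"BUILD_TIMESTAMP 1700000000\n"}
inputs_b = {"src/main.c": b"int main(){}",
            "bazel-out/volatile-status.txt": b"BUILD_TIMESTAMP 1700000060\n"}

scrub = dict(
    arg_replacements=[(r"--build-id=\d+", "--build-id=0")],
    omitted_inputs=[r"^bazel-out/volatile-status\.txt$"],
)

plain_equal = scrubbed_key(command_a, inputs_a) == scrubbed_key(command_b, inputs_b)
scrubbed_equal = (scrubbed_key(command_a, inputs_a, **scrub)
                  == scrubbed_key(command_b, inputs_b, **scrub))
salt_differs = (scrubbed_key(command_a, inputs_a, salt="v2", **scrub)
                != scrubbed_key(command_a, inputs_a, **scrub))
print(plain_equal, scrubbed_equal, salt_differs)  # False True True
```

Without scrubbing the two builds get different keys; with the argument replacement and the omitted input they share one key, and changing the salt deliberately invalidates it.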

An example of using scrubbing

To use scrubbing, you need to create a scrubbing configuration file, for example:

rules {
  matcher {
    kind: "stamping"
    mnemonic: "Example"
  }
  transform {
    omitted_inputs: "^bazel-out/volatile-status\\.txt$"
  }
}

The list of valid fields can be viewed here: https://github.com/bazelbuild/bazel/blob/master/src/main/protobuf/remote_scrubbing.proto

For each build action, the transformation of the first rule whose matcher fits the action is applied.

For the scrubbing configuration to take effect during a build, pass it via the --experimental_remote_scrubbing_config parameter.
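The first-match behavior can be illustrated with a small Python model. It is only a sketch: the Rule class and first_matching_rule are hypothetical, and treating kind and mnemonic as regular expressions that must both match in full is an assumption of this example, not something taken from Bazel's source.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Rule:
    kind: str = ".*"        # pattern for the rule kind (assumed to be a regex)
    mnemonic: str = ".*"    # pattern for the action mnemonic (assumed to be a regex)
    omitted_inputs: list = field(default_factory=list)

def first_matching_rule(rules, action_kind, action_mnemonic):
    """Return the first rule whose matcher fits the action, or None if no rule fits."""
    for rule in rules:
        if (re.fullmatch(rule.kind, action_kind)
                and re.fullmatch(rule.mnemonic, action_mnemonic)):
            return rule
    return None

rules = [
    Rule(kind="stamping", mnemonic="Example",
         omitted_inputs=[r"^bazel-out/volatile-status\.txt$"]),
    Rule(),  # catch-all rule; it is reached only when nothing above matched
]

stamped = first_matching_rule(rules, "stamping", "Example")
other = first_matching_rule(rules, "cc_binary", "CppLink")
print(stamped is rules[0], other is rules[1])  # True True
```

Because only the first matching rule is used, more specific rules belong before more general ones in the configuration file.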

What is the problem with scrubbing and remote build?

In Bazel 7.0, trying to use the --experimental_remote_scrubbing_config parameter together with remote build results in the error: Cannot combine remote cache key scrubbing with remote execution

Fortunately, Bazel 7.1 changed this behavior (https://github.com/bazelbuild/bazel/pull/21384): instead of failing globally, actions subject to scrubbing are executed on the local host.

This makes it possible to use stamping together with builds on the farm, but the matchers for the rules have to be chosen very carefully:

  • everything that uses stamping must match them, because otherwise those actions will constantly miss the cache (regardless of whether this transformation changes the value of the cache key);

  • nothing extra should match them, because whatever matches is no longer built on the farm and is built locally instead (although the remote cache is still used).
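The trade-off between the two requirements can be sketched as a toy planner in Python. Everything here is illustrative: the plan function, the outcome strings, and the StampedLink mnemonic are hypothetical, and matching on the mnemonic alone is a simplification.

```python
import re

def plan(actions, matcher):
    """Map each action to its outcome for a given matcher.

    actions: dict of mnemonic -> whether the action reads stamping data.
    matcher: regex that must match the whole mnemonic (a simplification).
    """
    result = {}
    for mnemonic, uses_stamping in actions.items():
        if re.fullmatch(matcher, mnemonic):
            result[mnemonic] = "local execution, remote cache with scrubbed key"
        elif uses_stamping:
            result[mnemonic] = "remote execution, cache miss on every build"
        else:
            result[mnemonic] = "remote execution, normal caching"
    return result

actions = {"CppCompile": False, "StampedLink": True}

too_narrow = plan(actions, "Genrule")      # misses the stamped action
too_broad = plan(actions, ".*")            # drags everything off the farm
just_right = plan(actions, "StampedLink")  # only the stamped action runs locally

for name, outcome in just_right.items():
    print(name, "->", outcome)
```

With the narrow matcher the stamped action never hits the cache, with the broad one even CppCompile leaves the farm, and only the precise matcher keeps local execution limited to the stamped action.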

Let’s summarize

With Bazel 7.1, it became possible to use stamping together with remote build, although not without problems.

I hope that in time the transformation interface for scrubbing will be stabilized and support for it will appear in the remote build protocol. That should remove the local execution restriction and allow all build tasks to run on the farm.
