Practical Bazel: Part 1

19 Apr, 2022

Bazel is a great build system that can accelerate the development of large-scale software. It trades space for time by caching the results of expensive build steps.

This article won’t cover what Bazel is or how it works in detail. Instead, I will demonstrate how to get started with Bazel quickly using extremely simple code. We don’t need Bazel’s Java or C++ rules; for this article, we will mainly use shell scripts to mock compilers.

Install `bazelisk`

Bazelisk (https://github.com/bazelbuild/bazelisk.git) is a launcher for Bazel which can handle different versions of Bazel.

After installing it, we only need to specify the version of Bazel we want to use in the .bazelversion file in the project root.
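As a minimal sketch, the file can be created from the shell in the project root. The version number 5.1.1 is only an assumed example; use whichever Bazel release your project targets.

```shell
# Pin the Bazel version that Bazelisk should download and run.
# 5.1.1 is an example value, not a recommendation.
echo "5.1.1" > .bazelversion
cat .bazelversion
```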

Create workspace

We can create a new directory with any name and create a new Bazel workspace (a WORKSPACE file) in it.

$ mkdir practical-bazel-1
$ cd practical-bazel-1
$ touch WORKSPACE

After that, we can test whether Bazelisk works (the commands below assume it is available on the PATH as bazel):

$ bazel build
Starting local Bazel server and connecting to it...
WARNING: Usage: bazel build <options> <targets>.
Invoke `bazel help build` for full description of usage and options.
Your request is correct, but requested an empty set of targets. Nothing will be built.
INFO: Analyzed 0 targets (0 packages loaded, 0 targets configured).
INFO: Found 0 targets...
INFO: Elapsed time: 1.578s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

Create a simple build rule

The next step is to create a BUILD file and add a simple genrule to it:

$ touch BUILD

Then add the following to the BUILD file:

genrule(
  name = 'hello',
  outs = ['hello.txt'],
  cmd = 'echo "Hello World" > $@',
)

As you might infer, $@ is the path of the output file, which is hello.txt in this context. A genrule is a rule that generates files by running a Bash command. It doesn’t matter what the command executes; you could even use wget or curl to download a remote file. That is not recommended, though, because the output is not guaranteed to be the same on every run (for more details, read about Hermeticity).

To test this build rule:

$ bazel build //:hello

Why we need Bazel

The build looks pretty fast in the example above. So why do we need Bazel?

Here’s a minimal example. If you add a new block to the BUILD file:

genrule(
  name = 'hello-slow',
  outs = ['hello-slow.txt'],
  cmd = 'sleep 5; echo "Hello World" > $@',
)

Without a build cache, bazel build //:hello-slow would take at least 5 seconds on every run. With Bazel’s build cache, only the first build pays that cost; the second one finishes in well under a second:

# This is the second time
$ bazel build //:hello-slow
INFO: Analyzed target //:hello-slow (1 packages loaded, 1 target configured).
INFO: Found 1 target...
Target //:hello-slow up-to-date:
  bazel-bin/hello-slow.txt
INFO: Elapsed time: 0.055s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

If you’d like to retry a clean build:

$ bazel clean --expunge

A bit into the real world

If you’re working on a real-world project, a single shell command won’t be sufficient to build the product. The compiler compiles the source code, and then the linker links the object files together into a library or executable (a binary or artifact). Conceptually:

$$ \mathrm{binary} = \mathrm{Linker}(\mathrm{Compiler}(\mathrm{code})) $$

What Bazel is doing here is nothing magic. It acts like a wrapper that adds extra features (distributed cache, build cache, parallelism, etc.):

$$ \mathrm{binary} = \mathrm{Bazel}(\mathrm{Linker}(\mathrm{Compiler}(\mathrm{code}))) $$

(Note: this is for quick understanding, not for accuracy.)

What determines whether an existing cache entry is still valid? Bazel hashes the inputs of each build step, including the source files and the command itself, and uses the result as the cache key. If the hash changes, the cached result no longer applies, and the affected target is rebuilt.
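To make the idea concrete, here is a toy sketch of a content-hash cache in shell. It is not how Bazel is implemented; the names toy_build, file_hash, and .toy-cache are made up for illustration, and the "expensive" step is just an upper-casing of the input.

```shell
#!/usr/bin/env bash
# Toy illustration of a content-hash build cache. This is NOT Bazel's implementation.

CACHE_DIR=".toy-cache"   # hypothetical cache location

# Print the SHA-256 of a file (sha256sum on Linux, shasum on macOS).
file_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# toy_build SRC OUT: run the "expensive" step only when SRC's hash is new.
toy_build() {
  local src="$1" out="$2" key
  key="$(file_hash "$src")"          # cache key = hash of the input content
  mkdir -p "$CACHE_DIR"
  if [ -f "$CACHE_DIR/$key" ]; then
    echo "cache hit"
  else
    echo "cache miss"
    # Stand-in for a slow compiler: upper-case the input.
    tr '[:lower:]' '[:upper:]' < "$src" > "$CACHE_DIR/$key"
  fi
  cp "$CACHE_DIR/$key" "$out"
}
```

Calling toy_build in.txt out.txt twice reports a cache miss and then a cache hit; editing in.txt changes the hash, so the next call is a miss again.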

Let’s make the example a bit more complex. We will add srcs to the genrule:

genrule(
  name = "hello-slow-dummy-srcs",
  srcs = ["src/dummy-src.txt"],
  outs = ["hello-slow-dummy-srcs.txt"],
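  cmd = "sleep 5; echo $$(cat $(location src/dummy-src.txt)) > $@",
)

Here $(location src/dummy-src.txt) expands to the path of the source file, so the command does not depend on the directory Bazel runs it in, and $$ escapes a literal $ for the shell. We need to create the src/dummy-src.txt file: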

$ mkdir src
$ touch src/dummy-src.txt
$ echo "content of dummy-src.txt" > src/dummy-src.txt

We can try it now:

$ bazel build //:hello-slow-dummy-srcs

The first time, it takes about 5 seconds. The second time, it is much faster. However, once you change the content of src/dummy-src.txt, the build is “slow” again, because the cached result no longer matches the input.

This is basically how Bazel works locally.

For a project with hundreds or thousands of modules, a clean build is very slow. However, most commits or Pull Requests change only a few of these modules. With Remote Caching (https://bazel.build/docs/remote-caching), Bazel can share cached build results across machines, so only the modules affected by a change need to be rebuilt and the incremental build is much faster.

This picture from https://bazel.build/basics/distributed-builds demonstrates how it works in practice:

[Figure: distributed build with a remote cache]
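As a rough sketch, pointing a workspace at a remote cache can be as small as one line in .bazelrc using Bazel's --remote_cache flag. The address cache.example.com:8080 is a placeholder, not a real endpoint; substitute the cache server your team actually runs.

```shell
# Append a remote cache setting to the workspace's .bazelrc.
# cache.example.com:8080 is a placeholder address for illustration only.
cat >> .bazelrc <<'EOF'
build --remote_cache=http://cache.example.com:8080
EOF
```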

Where can we go from here?

Bazel is a great tool, and many teams in the industry use it. However, the learning curve is very steep. When introducing Bazel to an existing large project, it is better to take a bottom-up approach: start by building the leaf nodes of the dependency graph.
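To illustrate what "leaf nodes" means here, the sketch below finds them in a small dependency list with plain shell and awk. The module names (app, lib_net, lib_ui, lib_core) are invented for the example; in a real project the edges would come from your existing build system.

```shell
# Each line is "module dependency"; the module names are made up.
edges='app lib_net
app lib_ui
lib_net lib_core
lib_ui lib_core'

# A leaf is a module that never appears in the first column,
# i.e. it depends on nothing else in the graph.
leaves="$(printf '%s\n' "$edges" | awk '
  { has_deps[$1] = 1; seen[$1] = 1; seen[$2] = 1 }
  END { for (m in seen) if (!(m in has_deps)) print m }
' | sort)"

echo "$leaves"
```

In this example the only leaf is lib_core, so it would be the first module to migrate, followed by the modules that depend only on it.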

In addition to Remote Caching, Bazel is also designed to build projects in a distributed manner, spreading the build across multiple (inexpensive) machines. Bazel’s slogan is “{ Fast, Correct } - Choose two”. In fact, I believe we can “Choose Three”: Fast, Correct, and Inexpensive.

You can find the source code of this article on this GitHub repository.
