Faster Spring Boot Startup With The New JVM Checkpoint API

Benchmarks and a project template you can use.

Michael Böckling 2023-08-29

# Introduction

Spring Boot does a lot of reflection, which means it can take a while to start up. Before you rewrite your application in Quarkus, consider a brand-new toy the JVM creators have given us: CRaC (Coordinated Restore at Checkpoint). This is a new Java API for taking a snapshot of a running JVM instance and restoring it later. The goal is to significantly reduce startup and warmup times.

Spring Boot just got support for this, so let's see how it works and how much faster it is.

The CRaC library relies on CRIU, which does most of the heavy lifting. If you have never heard of it, it is a pretty impressive piece of tech that was originally created to live-migrate processes. Since CRIU only runs on Linux, CRaC is limited to Linux as well, for now.

So when you are on a Mac, you need a Linux environment, and your first idea is probably to use Docker containers. This is a bit tricky, because CRIU needs elevated privileges to create a snapshot, and by default Docker doesn't just hand those out. It is also not straightforward to create a checkpoint in a Dockerfile as part of the image build, since the RUN command in a Dockerfile doesn't (easily) support elevated privileges. There is a way to do this using DinD (Docker in Docker), and you can check out how Spring did it, but we're using a simpler method.

# Project template

There is a bare-bones demo, but I wanted to test it with a more or less full-blown Spring Boot setup that includes HTTP networking and a database. To that end I have modified the Spring Petclinic example project and added the missing parts, so that you can check out the whole project and steal it for yourself.

The ingredients:

  • a vanilla Dockerfile that does nothing special except
    • install the CRaC-enabled Azul JDK
    • create a directory to store snapshots in
  • an entrypoint.sh that, depending on whether there is a snapshot present, boots in either "snapshot" or "restore" mode
  • a checkpoint.sh shell script that
    • builds the application
    • builds the Docker container
    • runs the container in "snapshot" mode using the --privileged flag
    • commits the container using the checkpoint tag
  • a restore.sh script that simply runs the previously created container (no --privileged necessary)
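
To make the entrypoint idea concrete, here is a minimal sketch of how such a mode switch could look. It is not the script from the project: `CRAC_DIR`, `APP_JAR`, the paths and the `RUN_ENTRYPOINT` guard are hypothetical names chosen for illustration. The only things taken from the CRaC-enabled JDK are the `-XX:CRaCCheckpointTo` and `-XX:CRaCRestoreFrom` options.

```shell
#!/usr/bin/env bash
# Sketch of an entrypoint that picks "snapshot" or "restore" mode.
# CRAC_DIR and APP_JAR are hypothetical names, not taken from the project.
CRAC_DIR="${CRAC_DIR:-/opt/crac-files}"
APP_JAR="${APP_JAR:-/opt/app/app.jar}"

# Prints "restore" if the snapshot directory contains any files,
# otherwise "snapshot".
select_mode() {
  local dir="$1"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "restore"
  else
    echo "snapshot"
  fi
}

# Prints the java command line for the given mode.
java_command() {
  local mode="$1" dir="$2" jar="$3"
  case "$mode" in
    restore)  echo "java -XX:CRaCRestoreFrom=$dir" ;;
    snapshot) echo "java -XX:CRaCCheckpointTo=$dir -jar $jar" ;;
    *)        echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Only start the JVM when RUN_ENTRYPOINT=1 (hypothetical guard), so the
# functions above can be sourced and tried out without side effects.
if [ "${RUN_ENTRYPOINT:-0}" = "1" ]; then
  mode="$(select_mode "$CRAC_DIR")"
  echo "Starting in $mode mode"
  # Unquoted on purpose: relies on word splitting, so paths must not contain spaces.
  exec $(java_command "$mode" "$CRAC_DIR" "$APP_JAR")
fi
```

In snapshot mode something still has to trigger the checkpoint once the application is up, for example `jcmd <pid> JDK.checkpoint` or Spring's `-Dspring.context.checkpoint=onRefresh` system property; which of these fits best depends on how warmed up you want the snapshot to be.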

Some error messages you will get from the JVM when trying to create a checkpoint are pretty unspecific and obscure, as they bubble up from the underlying CRIU library, but thankfully Azul has published some helpful docs that will get you unstuck. The most important part is getting the privileges right. After that it should be smooth sailing.
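
To illustrate where the privileges come in, the Docker side of the checkpoint and restore steps listed earlier could be sketched roughly as follows. The image name, container name, the `builder` tag, the port and the `EXECUTE` switch are assumptions made for this example, not the contents of the project's scripts, and the application build step is left out. By default the sketch only defines helpers that print the commands.

```shell
#!/usr/bin/env bash
# Sketch of the checkpoint flow; image and container names are hypothetical.
IMAGE="${IMAGE:-petclinic-crac}"
CONTAINER="${CONTAINER:-petclinic-crac-builder}"

# Prints the docker commands of the checkpoint flow, one per line.
checkpoint_commands() {
  local image="$1" container="$2"
  # Build the plain image (JDK, application, empty snapshot directory).
  echo "docker build -t $image:builder ."
  # --privileged gives CRIU what it needs to dump the running JVM.
  echo "docker run --privileged --name $container $image:builder"
  # Freeze the container filesystem, snapshot files included, as a new image.
  echo "docker commit $container $image:checkpoint"
  echo "docker rm $container"
}

# Prints the restore command; note the absence of --privileged.
# Port 8080 is an assumption (the Spring Boot default).
restore_command() {
  echo "docker run --rm -p 8080:8080 $1:checkpoint"
}

# Dry run by default; set EXECUTE=1 (hypothetical switch) to run the commands.
if [ "${EXECUTE:-0}" = "1" ]; then
  checkpoint_commands "$IMAGE" "$CONTAINER" | while read -r cmd; do
    echo "+ $cmd"
    # Exit codes are not checked here: the JVM is stopped as part of the
    # checkpoint, so "docker run" may well return non-zero on success.
    sh -c "$cmd" </dev/null
  done
fi
```

Calling `checkpoint_commands petclinic-crac petclinic-crac-builder` prints the four commands so you can review them before running anything. Because `docker commit` keeps the entrypoint, the committed image finds the snapshot on its next start and restores from it.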

I hope this little example will help you get started. Thanks to Sébastien Deleuze for laying the foundation.

# Benchmarks

I have not yet answered the question from the beginning: how fast is it?

Measuring with hyperfine, these are the results of the test runs on my MBP 2015:

[Images: Benchmark]

And to hammer the point home even less subtly, have a bar graph:

[Image: Benchmark bar graph]

There is a little overhead from running Spring in Docker on a Mac, but it affects both runs, so don't get hung up on the absolute numbers. Look at the relation between them instead.