Gagallium : A library to record OCaml backtraces

Sometimes using OCAMLRUNPARAM="b" returns a disappointingly unhelpful backtrace, because the program you're trying to debug has a non-trivial usage of exceptions that destroys the backtrace information you were expecting. Here is a new library that should solve this problem, by allowing you to reliably register backtraces. Read more for the source, and the juicy internal details: weak tables, semi-manual garbage collection logic, and an optional change to the OCaml runtime to improve the library performances.

The problem

When debugging OCaml programs, it is sometimes hard to get useful backtraces for exceptions. Let's take an example: you are hacking the Coq implementation, and your patch launches an exception (any resemblance to real persons, living or dead, is purely coincidental). Of course, you have no idea why this exception has been raised, so a little bit of context would be appreciated: getting a backtrace would be very useful, because it tells you where the exception has been raised, and what were the state of the call stack at that time.

There is already some support for getting backtraces in OCaml. If you compile in debug mode (eg. with the -g option), and then run your code with OCAMLRUNPARAM="b" in the environment, then you can get exception backtraces. They will be automatically printed for uncaught exceptions (those that stop the program execution), and inside your programs you can use the Printexc module of the OCaml standard library, to print the backtrace from the "raise" of the last exception that was raised up to the "try..with" that caught it (or the end of the program).

However, this system is not always satisfactory: in OCaml, exceptions are very efficient, and it is a common idiom to use them for control flow (rather than only to handle errors). As a consequence, the exception handling mechanism in big OCaml programs can be complicated, because it has to filter exceptional exceptions (which are a sign of an error), and exceptions used for control. It can also be the case that the error handlers themselves uses exceptions internally, for example some error messages formatter do this. This is problematic, because it erases the original, relevant backtrace. Asking for a backtrace after the fact will show you traces for exceptions that were raised and caught internally in the message formatter, and then you're lost, which (in the good case) makes you write frustrated blog posts such as this one.

The solution of other languages

In some other languages, like Java, every exception stores its own backtrace. So, even if you do strange things with exceptions internally, as long as you have access to the exception that has been raised, you can print its backtrace. However, this has a non-negligible memory and performance cost, because the runtime has to copy the exception backtrace into the heap.

In OCaml, it would be unpleasant to do that in the general case, because exception can be heavily used. We could imagine to do that only in debug mode and when backtraces are active, but this poses non-trivial questions, because we have to choose at compile time whether to allocate a backtrace-compliant exception object or not, for example. Besides, it would require quite invasive changes to the compiler and runtime, probably add bugs, etc. Not something you want to try over a week-end.

A less invasive approach: a user library

If we cannot add a pointer from the exception to the backtrace, we can consider building an global map from exceptions to backtraces. Each time we catch an exception, we can store its backtrace in the map. Then, when needed, we can lookup the backtrace from the exception in this map.

This has the advantage of not requiring any change to the compiler. You can implement this big table as a library, and let users call a library function, in their exception handler, to add their current exception to the table.

But this requires user intervention to work well: the user must explicitly call the registration function in each error handler. This is good, as it allows to decide whether to store traces in a fine-grained way, making performance/traceability compromises or simply not tracing exceptions that you know do not correspond to bugs. But this can be painful for legacy or third-party code for which you just want a backtrace quick: you don't really want to go over all the `try..with` handlers in Coq's implementation to add a call to our registration function, by hand. Gabriel Scherer implemented a small preprocessor (as a Camlp4 extension and a `-ppx` rewriter) to add them automatically at each handler if desired.

Implementing the map itself is more challenging: inserting in it should be efficient (while lookup operations can be slow); it should operate with physical equality, because two occurrences of Not_found should see their backtraces distinguished; it should not leak memory, that is, it should throw away the backtrace when the exception is no longer accessible -- and it should not be considered as accessible by the Garbage Collector (GC) trough this map. Finally, we need to be able to store multiple backtrace for the same exception, because an exception can be re-raised: in that case, we want to be able to get all the backtraces.

Our solution is to use two arrays: one for exceptions, the other for backtraces. At the same indices, we store corresponding elements. Inserting is very cheap in amortized time, even if it involves resizing the arrays if necessary. The lookup is in linear time, but we almost never need it. The array of exceptions is a weak array: this data structures makes the GC able to collect exceptions even if they are still referenced in the map. At the end of each major GC cycle, we compact those arrays, manually removing traces corresponding to collected exceptions.

Performances considerations

Storing all traces in this way is very slow for raise-happy programs (which are the one for which you need better traces in the first place). The performances of Coq are divided by 3 to 4 by this modification. The main reason for this cost is that the only way in the OCaml standard library to get the backtrace information is to convert it to a string, with English sentences, file names, line numbers, and so on. It takes a long time to generate such strings for each caught exception, and uses a lot of memory, stressing the GC. But still, it works, and if you really need good backtraces you are ready to pay the performance price... which only exists when you turn on backtrace recording.

The next idea was to modify the OCaml runtime and standard library, in order to expose a more low-level representation of backtraces without the processing costs. Actually, this already exists: when using the existing mechanism for recording backtraces in OCaml, the runtime generate a compact representation of the backtrace in a global variable for each raised exception. The only thing we have to do is to split the collecting-and-formatting function in two, and expose an API for copying on demand this information in the ocaml heap, for a possible future conversion to the string representation.

We implemented this as a small change to the OCaml runtime system and Printexc module -- much less invasive than the compiler-wide change we are avoiding with our user-side solution. Using this new feature, Coq only get 10% slower (tested on the compilation of CompCert). It still uses more memory (about 30%). We think this is very reasonable, given that Coq uses quite a lot of exceptions for backtracking, and that there is no cost when backtrace recording is not activated.

Of course, performances could be improved even further by letting users decide which exception handlers really correspond to errors that needs to store traces, and which are purely control-flow logic (loop exit, backtracking...) that does not need to record anything. But getting good traces out the box with reasonable performance impact is already satisfying enough for now.

The actual code

Our library for recording backtraces can be found in exceptionHandling.ml (slow version using strings), or exceptionHandling.ml.new (fast version using the new runtime functions). Both share a single interface exceptionHandling.mli.

The rewriter that adds exception handling logic to all try .. with .. in your code (to use with care) is available for Camlp4, add_smartcatch.ml, or for the new -ppx AST rewriting mechanism available in OCaml trunk, smartcatch_ppx.ml.

The patch to the available runtime is available as abstract-backtraces.patch.txt. It is mostly of historical curiosity, as the maintainers found the change non-invasive enough to accept it into the OCaml trunk, so you should be able to get fast backtrace recording in the next OCaml version.

Our quick-and-dirty patch for Coq 8.4pl1 is coq-8.4pl1.patch (it mostly hacks around the Coq build system to add an additional camlp4 preprocessing step to all files). In order to use it, you must add either version¹ of the exception handling library in the lib subdirectory, under the name exceptionHandling.ml. Then, simply type:

./configure -debug
make

And voilà, pixel-perfect backtraces!

¹: Note that for obscure camlp5 reasons, Coq will not build easily against the OCaml trunk, so you should apply the runtime patch to the 4.00 branch instead; as the patch adds a new primitive, you will need to throw a make coreboot somewhere in the OCaml compilation process. Even then, building Coq won't be for the faint of heart.