A library to record OCaml backtraces
- March 7, 2013
      Sometimes using
      OCAMLRUNPARAM="b" returns a disappointingly
      unhelpful backtrace, because the program you're trying to debug
      has a non-trivial usage of exceptions that destroys the
      backtrace information you were expecting. Here is a new library
      that should solve this problem, by allowing you to reliably
      register backtraces. Read more for the source, and the juicy
      internal details: weak tables, semi-manual garbage collection
      logic, and an optional change to the OCaml runtime to
      improve the library performances.
    
The problem
When debugging OCaml programs, it is sometimes hard to get useful backtraces for exceptions. Let's take an example: you are hacking the Coq implementation, and your patch launches an exception (any resemblance to real persons, living or dead, is purely coincidental). Of course, you have no idea why this exception has been raised, so a little bit of context would be appreciated: getting a backtrace would be very useful, because it tells you where the exception has been raised, and what were the state of the call stack at that time.
      There is already some support for getting backtraces in
      OCaml. If you compile in debug mode (eg. with
      the -g option), and then run your code
      with OCAMLRUNPARAM="b" in the environment, then you
      can get exception backtraces. They will be automatically printed
      for uncaught exceptions (those that stop the program execution),
      and inside your programs you can use
      the Printexc
      module of the OCaml standard library, to print the backtrace
      from the "raise" of the last exception that was raised up to the
      "try..with" that caught it (or the end of the program).
    
However, this system is not always satisfactory: in OCaml, exceptions are very efficient, and it is a common idiom to use them for control flow (rather than only to handle errors). As a consequence, the exception handling mechanism in big OCaml programs can be complicated, because it has to filter exceptional exceptions (which are a sign of an error), and exceptions used for control. It can also be the case that the error handlers themselves uses exceptions internally, for example some error messages formatter do this. This is problematic, because it erases the original, relevant backtrace. Asking for a backtrace after the fact will show you traces for exceptions that were raised and caught internally in the message formatter, and then you're lost, which (in the good case) makes you write frustrated blog posts such as this one.
The solution of other languages
In some other languages, like Java, every exception stores its own backtrace. So, even if you do strange things with exceptions internally, as long as you have access to the exception that has been raised, you can print its backtrace. However, this has a non-negligible memory and performance cost, because the runtime has to copy the exception backtrace into the heap.
In OCaml, it would be unpleasant to do that in the general case, because exception can be heavily used. We could imagine to do that only in debug mode and when backtraces are active, but this poses non-trivial questions, because we have to choose at compile time whether to allocate a backtrace-compliant exception object or not, for example. Besides, it would require quite invasive changes to the compiler and runtime, probably add bugs, etc. Not something you want to try over a week-end.
A less invasive approach: a user library
If we cannot add a pointer from the exception to the backtrace, we can consider building an global map from exceptions to backtraces. Each time we catch an exception, we can store its backtrace in the map. Then, when needed, we can lookup the backtrace from the exception in this map.
This has the advantage of not requiring any change to the compiler. You can implement this big table as a library, and let users call a library function, in their exception handler, to add their current exception to the table.
But this requires user intervention to work well: the user must explicitly call the registration function in each error handler. This is good, as it allows to decide whether to store traces in a fine-grained way, making performance/traceability compromises or simply not tracing exceptions that you know do not correspond to bugs. But this can be painful for legacy or third-party code for which you just want a backtrace quick: you don't really want to go over all the `try..with` handlers in Coq's implementation to add a call to our registration function, by hand. Gabriel Scherer implemented a small preprocessor (as a Camlp4 extension and a `-ppx` rewriter) to add them automatically at each handler if desired.
      Implementing the map itself is more challenging: inserting in it
      should be efficient (while lookup operations can be slow); it
      should operate with physical equality, because two occurrences
      of Not_found should see their backtraces
      distinguished; it should not leak memory, that is, it should
      throw away the backtrace when the exception is no longer
      accessible -- and it should not be considered as accessible by
      the Garbage Collector (GC) trough this map. Finally, we need to
      be able to store multiple backtrace for the same exception,
      because an exception can be re-raised: in that case, we want to
      be able to get all the backtraces.
    
Our solution is to use two arrays: one for exceptions, the other for backtraces. At the same indices, we store corresponding elements. Inserting is very cheap in amortized time, even if it involves resizing the arrays if necessary. The lookup is in linear time, but we almost never need it. The array of exceptions is a weak array: this data structures makes the GC able to collect exceptions even if they are still referenced in the map. At the end of each major GC cycle, we compact those arrays, manually removing traces corresponding to collected exceptions.
Performances considerations
      Storing raise-happy programs (which are the one for
      which you need better traces in the first place). The
      performances of Coq are divided by 3 to 4 by this
      modification. The main reason for this cost is that the only way
      in the OCaml standard library to get the backtrace information
      is to convert it to a string, with English sentences, file
      names, line numbers, and so on. It takes a long time to generate
      such strings for each caught exception, and uses a lot of
      memory, stressing the GC. But still, it works, and if you really
      need good backtraces you are ready to pay the performance
      price... which only exists when you turn on backtrace recording.
    
The next idea was to modify the OCaml runtime and standard library, in order to expose a more low-level representation of backtraces without the processing costs. Actually, this already exists: when using the existing mechanism for recording backtraces in OCaml, the runtime generate a compact representation of the backtrace in a global variable for each raised exception. The only thing we have to do is to split the collecting-and-formatting function in two, and expose an API for copying on demand this information in the ocaml heap, for a possible future conversion to the string representation.
We implemented this as a small change to the OCaml runtime system and Printexc module -- much less invasive than the compiler-wide change we are avoiding with our user-side solution. Using this new feature, Coq only get 10% slower (tested on the compilation of CompCert). It still uses more memory (about 30%). We think this is very reasonable, given that Coq uses quite a lot of exceptions for backtracking, and that there is no cost when backtrace recording is not activated.
Of course, performances could be improved even further by letting users decide which exception handlers really correspond to errors that needs to store traces, and which are purely control-flow logic (loop exit, backtracking...) that does not need to record anything. But getting good traces out the box with reasonable performance impact is already satisfying enough for now.
The actual code
Our library for recording backtraces can be found in exceptionHandling.ml (slow version using strings), or exceptionHandling.ml.new (fast version using the new runtime functions). Both share a single interface exceptionHandling.mli.
      The rewriter that adds exception handling logic to all try
      .. with .. in your code (to use with care) is available
      for Camlp4, add_smartcatch.ml,
      or for the new -ppx AST rewriting mechanism
      available in OCaml
      trunk, smartcatch_ppx.ml.
    
The patch to the available runtime is available as abstract-backtraces.patch.txt. It is mostly of historical curiosity, as the maintainers found the change non-invasive enough to accept it into the OCaml trunk, so you should be able to get fast backtrace recording in the next OCaml version.
      Our quick-and-dirty patch for Coq 8.4pl1
      is coq-8.4pl1.patch (it mostly
      hacks around the Coq build system to add an additional camlp4
      preprocessing step to all files). In order to use it, you must
      add either version¹ of the exception handling library in
      the lib subdirectory, under the
      name exceptionHandling.ml. Then, simply type:
    
./configure -debug
makeAnd voilà, pixel-perfect backtraces!
      ¹: Note that for obscure camlp5 reasons, Coq will not build
      easily against the OCaml trunk, so you should apply the runtime
      patch to the 4.00 branch instead; as the patch adds a new
      primitive, you will need to throw a make coreboot
      somewhere in the OCaml compilation process. Even then, building
      Coq won't be for the faint of heart.