A few days ago, Hugo Heuzard asked about the details of the problem with OCamlbuild’s parallelism today on mantis: PR#5754. My mantis comment quickly grew into the blog post below.

I worked last September on better parallelization for OCamlbuild, but didn’t finish the thing (I couldn’t get something mergeable in time for ICFP) and got distracted with tons on other things to do since. I just uploaded the state of my branch:

https://github.com/gasche/ocaml/tree/ocamlbuild-indirect-style

Why ocamlbuild’s parallelization is often disappointing today

Below is an explanation of my understanding of the current problem with ocamlbuild’s parallelization. OCamlbuild central form of abstraction is the “rule action”, an OCaml function that knows how to build a target. In fact, the action

  • is passed a “building function” to invoke whenever it needs to build a dependency of the target
  • returns exactly one command (think of a command-line invocation) that will finally produce the target

For example, the rule to produce a .cmo from a .ml file whose .mli file is also present is defined here in ocaml_specific.ml:

rule "ocaml: ml & cmi -> cmo"
  ~prod:"%.cmo" ~deps:["%.mli"; "%.ml"; "%.ml.depends"; "%.cmi"]
  (Ocaml_compiler.byte_compile_ocaml_implem "%.ml" "%.cmo");;

This is a library call that registers the rule action along with some metadata about it (what we know, statically, about its targets and dependencies; this is used by the ocamlbuild “solver” to understand when to invoke the rule).

The action itself is Ocaml_compiler.byte_compile_ocaml_implem. It reads as follows (with added comments):

(* to build foo.ml *)
let byte_compile_ocaml_implem ?tag ml cmo env build =
  let ml = env ml and cmo = env cmo in

  (* builds the dependencies in foo.ml.depends *)
  prepare_compile build ml;

  (* then produce the actual "ocamlc -c ..." command-line *)
  ocamlc_c (Tags.union (tags_of_pathname ml) (tags_of_pathname cmo)++"implem"+++tag) ml cmo

ml, cmo are explicitly passed to the function in ocaml_specific.ml. The action proper is the function fun env build -> ...: it takes an environment and a builder function, and returns a command. This type of actions can be found in signatures.mli:

  (** This is the type for rule actions. An action receive as argument, the
      environment lookup function (see {!env}), and a function to dynamically
      build more targets (see {!builder}). An action should return the command
      to run in order to build the rule productions using the rule dependencies.
  *)
  type action = env -> builder -> Command.t

env allows to translate from the static patterns %.ml, %.ml.depends, etc. to concrete values foo.ml, foo.ml.depends depending on which target matching the patterns is actually requested. The important part is in builder -> Command.t.

The prepare_compile function, when invoked during the action for foo.ml -> foo.cmo, will look at foo.ml.depends (which is marked as a static dependency of the rule, and hence has already been build when the action is invoked) and build all the modules it requests:

let prepare_compile build ml =
  let dir = Pathname.dirname ml in
  let include_dirs = Pathname.include_dirs_of dir in

  (* reads the module names from foo.ml.depends *)
  let modules = path_dependencies_of ml in

  (* invoke the `build` function to build them all (in parallel) *)
  let results =
    build (List.map (fun (_, x) -> expand_module include_dirs x ["cmi"]) modules) in

  (* inspects the build results to sometimes complain; returns () *)
  List.iter2 begin fun (mandatory, name) res ->
    match mandatory, res with
    | _, Good _ -> ()
    | `mandatory, Bad exn ->
        if !Options.ignore_auto then
          dprintf 3 "Warning: Failed to build the module \
                     %s requested by ocamldep" name
        else raise exn
    | `just_try, Bad _ -> ()
  end modules results

After this function is executed, all dependencies of the rules (the static ones, that were known in advance, and the dynamic ones, which are whichever was passed to build and actually succeeded to build) are present, and we only needs to return the final command:

let ocamlc_c tags arg out =
  let tags = tags++"ocaml"++"byte" in
  Cmd (S [!Options.ocamlc; A"-c"; T(tags++"compile");
          ocaml_ppflags tags; ocaml_include_flags arg; A"-o"; Px out; P arg])

(Note that the command-line has some tags, some built-in passed here and some depending on the pathname (that is, looked up in _tags), passed in the byte_compile_ocaml_implem function), which will get translated into actual command-line options by the flags declaration (built-in or in the user’s myocamlbuild.ml). That’s orthogonal to the current topic.)

The opportunities for parallelization come from the fact that the build rule takes a list of targets to build (in fact it takes a list of lists: with input [[A;Abis]; [B;Bbis]] it will try to build (either A, or Abis if it fails) and-in-parallel (either B, or Bbis if it fails)). If you ask ocamlbuild to build bar.cmo, baz.cmo and foobar.cmo, it will:

  • call the relevant rule for each, to get a command-line for each subtarget
  • call all those command-lines in parallel (this is the ocamlbuild_executor business)

Any dynamic dependency (call to build) executing before returning the final command will not be parallelized, only the final production step will.

In practice, this works reasonably well for rules that don’t take a lot of time to return a command, but whose command runs long enough (eg. .cmx production when dependency chains are flat enough). But it doesn’t work very well when a good share of the work, for each subtarget, is performed before returning that final command-line to execute.

My draft of a solution

Consider the following scenario: the target I’m currently building has dependencies A and B, and:

  • A’s action calls build on [A1; A2], and then returns the command comA
  • B’ action calls build on [B1; B2], and then returns the command comB

Today [A1; A2] are run in parallel, then [B1; B2] are run in parallel (or maybe B’s before A’s), finally and [comA; comB] are executed in parallel. What we would rather like is to have [A1; A2; B1; B2] to be built in parallel, then [comA; comB] in parallel.

We can’t do that with the current interface because actions are just functions from build to Command.t: they do not give to their caller any information about the dynamic dependencies they may decide to build. Instead of just being a function that takes a build and returns a Command.t, we need an action to produce a data-type that says either:

  • I’m done, here is the final Command.t to produce my build
  • Not yet, I want you to build those dependencies, and then resume my build with this next action (“continuation”)

As the caller of both A and B in parallel, I would get two “Not Yet” answers, with a request to build [A1;A2] on one side and [B1;B2] on the other; I can build them all in parallel, and continue with the two continuations in parallel.

That means that instead of writing rules in direct style (calling build to produce dependencies), actions should revert their control-flow to pass their dependencies to their caller. You probably have heard the music before: this screams of monads.

What I started doing in the aptly named ocamlbuild-indirect-style is exactly this: define a type of actions that is data instead of a blackbox function, use it for all built-in rule descriptions, and implement an “execution” function that evaluates such a rule to actually produce a result.

The nice thing with this change is that it can be done in a very incremental way: it’s easy to still expose the old interface where rule expects a function (as action), and on-the-side expose the new interface with a indirect_rule function whose action is the new action description format. You can then let users convert their rules if they want to, at their own leisure (it’s even possible to mix both styles in the same action description); just converting the builtin OCaml-specific rules to the indirect style should already give a nice parallelization boost.

If I remember correctly, when I stopped working on that (expecting to resume soon, of course), the missing piece was a good “evaluation” function that would take an indirection action and execute it correctly, exploiting the extra parallelism information. The naive way to do it doesn’t quite work, and you have to be careful about sharing work when two rules end up requiring the same subtarget at different times in the build – building an explicit task-dependency graph could be a good way to do it properly.

In the branch, the code I’ve shown previously looks as follows (ocamlc_c is strictly unchanged):

let prepare_compile ml =
  let dir = Pathname.dirname ml in
  let include_dirs = Pathname.include_dirs_of dir in
  let modules = path_dependencies_of ml in
  let module_build_order (mandatory, name) =
    (expand_module include_dirs name ["cmi"],
     fun result -> match mandatory, result with
       | _, Good _ -> ()
       | `mandatory, Bad exn ->
         if !Options.ignore_auto then
           dprintf 3 "Warning: Failed to build the module \
                      %s requested by ocamldep" name
         else raise exn
       | `just_try, Bad _ -> ())
  in
  build_order (List.map module_build_order modules)

let byte_compile_ocaml_implem ?tag ml cmo env =
  let ml = env ml and cmo = env cmo in
  bind (prepare_compile ml) & fun () ->
  let tags =
    Tags.union (tags_of_pathname ml) (tags_of_pathname cmo)
    ++"implem"+++tag in
  return (ocamlc_c tags ml cmo)

build_order is one of the constructors of those new indirect-style actions; it takes a pair of the dependency that has to be built (here expand_module ...), and a cleanup/validation function to be executed on the result (same checking stuff as before). The byte_compile_ocaml_implem function uses a monadic bind instead of ;, but that’s about the only change.

Remark

If you heard about Jenga, you may have noticed the remarkable similarity between the monadic version I describe here and Jenga’s rule design (or maybe you know about Shake and also found a strong link). I noticed myself when I first saw a talk about Jenga… a few months after doing this work. The indirect-style rules are only a small (and extremely natural) improvement over OCamlbuild’s current state, which is in fact quite close to Jenga’s design to start with.

In particular, one thing I am curious about is whether it would be possible to reuse Jenga’s runtime (the part that gets the dependency graph and runs it as fast as possible or does clever thing with cache of previous runs and filesystem notifiers). But this experiment would be more invasive than the work strictly necessary to just improve parallelization on OCamlbuild (which has no reason to be particularly slow on its own), so it’s not a priority.

Remark 2

If you’re interested in contributing to OCamlbuild, there is certainly interesting work to be done in the area of parallelization. My gut feeling, however, is that other issues may be more pressing (and less time-consuming), such as satisfying reasonable feature requests, or helping improve the documentation.