Conclusion and future work

7 Conclusion and future work

In join-calculus, a name definition, all receptors on that name and all possible synchronizations on that name are grouped altogether in a single join definition. This enables the compilation of synchronization between concurrent or even distributed processes, using finite-state automata. In the distributed case, a message transport phase to the machine that currently hosts the join definition (and hence the automaton) is first performed. This strengthens our point of view that the join-calculus is the core of a distributed programming language that can be compiled in practice, mainly because it restricts reception on a channel to statically known parts of the program. The same argument applied to à la ML polymorphic typing in [6].

   fib    afib    pat    qsort    count

join    32.0    14.5    37.2    9.9    16.4

jocaml    5.7    3.5    5.4    1.4    4.2

Bologna    11.9    6.2    9.4    16.8    5.3

Table 1: Some performance measures (wall-clock time, in seconds)

We performed experiments on a 200Mhz Pentium-Pro PC, taking a few benchmarks as a set of sensible join programs: fib is computing fib(27) using synchronous channels, afib is computing fib(20) using asynchronous channels and continuation-passing style, pat tests pattern-matching, qsort sorts a one-hundred elements list (lists are encoded as lists of messages), while count increments the counter of section 5.1 100000 times. We compared the join, jocaml compilers and another join implementation from the University of Bologna [12]. Note that we cannot offer a full analysis of these performance figures at the moment, however such data can be useful to other developers (the benchmarks are available at http://join.inria.fr/speed/).

Table 1 shows that jocaml and the Bologna implementation exhibit similar performance, jocaml being slightly faster, while join is noticeably slower. This is easily explained by core implementation choices: the join runtime performs bytecode interpretation in Caml, while the jocaml and Bologna bytecode interpreters are written in C. Additionally, jocaml benefits from the finely tuned Objective Caml bytecode interpreter. The table and other consideration also show that both the join and the jocaml pattern matching compilation schemes prove adequate. In particular, none of the two schemes falls into the pitfall associated to the compilation technique used, since all figures show similar variations. However, join performs poorly in the pat test and we cannot tell for the moment whether pattern-matching compilation is responsible for it or not.

In the join case, one can be afraid of code size, the technique exposed in section 5.3 successfully avoids code size explosion in practical cases. The jocaml technique appears expensive in runtime checks and thus a priori produces slow code. We choose such a scheme of implementing automata using generic code, because it can be implemented simply by adding code to the Objective Caml bytecode interpreter. Using bytecode specialized for automata manipulation would have implied more important modifications of the Objective Caml bytecode interpreter. Moreover, the jocaml system runs faster than the join system, even for pure join programs, showing that its weaker compilation of join definitions is more than compensated by its total integration in the Objective Caml system. Whether jocaml performance would benefit significantly from a more sophisticated implementation of pattern-matching automata or not remains an open question.

Comparison with the Bologna implementation [12] of the join-calculus is also instructive. This system also produces bytecode, which is interpreted by a C program. It proves faster than join and slower that jocaml on all the examples, except for qsort, where it is outperformed by both our implementations. Taking a glance at the Bologna source code reveals that it uses a scheme very similar to the one of jocaml, with two slight differences. First, status is systematically encoded as an array of integers. Second when a message arrives on a name x with an empty queue, all patterns are tested (whereas in jocaml only the patterns that contain x are tested).

Performance of a given join system depends on many factors. In particular, scheduling policy and message queue management have a dramatic impact on it, which accounts for the poor performance of the Bologna implementation on the qsort test. Furthermore, a policy that gives good results on one benchmark can be defeated by another. For these reasons, we cannot tell which pattern-matching compilation technique is the best by comparing different implementations.

Clearly, we now need to integrate all our compilation techniques in the same system, in order to compare them more thoroughly and to experiment further. However, the definition of reactivity status and the automata of section 3 provide a sound basis for these future investigations. Apart from future language development and fully implementing the failure semantics of the join-calculus, we also plan to investigate more on the implementation of threads, attempting to minimize thread suspension and creation.

The chemical semantics does not fit compilation needs. For instance, this very paper gets imprecise when it comes to relate pattern-matching by automata and join-definition firing. Or, name management by α-conversion at dilution-time does not faithfully render closure-based implementations. Thus, another important direction for future research is defining a semantics for the implementations. Such a semantics should be more deterministic than the chemical semantics, it should also handle synchronism directly. This would be a first step toward developing precise and proved semantical analyses, we plan to follow [13] for designing both concrete and abstract semantics for the implementations.

	`fib`	`afib`	`pat`	`qsort`	`count`
`join`	32.0	14.5	37.2	9.9	16.4
`jocaml`	5.7	3.5	5.4	1.4	4.2
`Bologna`	11.9	6.2	9.4	16.8	5.3