Over 70 years, semanticists¹ have developped a range of technics to
define the meaning of programming languages: small²-or-big³-step, by
substitution or environment, by continuation, through abstract machine,
or even denotational or axiomatic. This proliferation of subtle
trade-offs only increased in abundance as mechanization in proof
assistant became common. Indeed, new criteria entered: ease of
formalization in a given meta-theory, ease of instrumenting for testing,
ease of automating reasoning, ...
In simple cases, we are fairly familiar with the trade-offs. Notably,
the history of CompCert, recreated at small scall in Leroy's lecture
notes,⁴ neatly illustrates the benefits of big step over small step for
inductive reasoning, the benefits of small steps for characterizing
expressing termination-preserving transformations, and the benefits
of a semantics by continuation for capturing bounded stuttering arguments.
But that is not the case in general. Notably, some styles have become
associated with specific proof assistants, or even projects---it the
case for instance of functional big step,⁵ whose use is at the core of
the CakeML verified compiler. It is however extremely difficult to know
whether another style would have been as, if not more, effective, and
whether the nature of the proof assistant in use, Isabelle/HOL in this
case, bears responsibility in the efficacity of functional big step.
Through this internship, we therefore propose to design and build in the Rocq
Proof Assistant a semantic zoo, a carefully crafted test suite of
formalization helping to rationnally compare a well chosen suit of
styles of semantics. In particular, designing adequate criteria for
comparison and curating the right collection of approches
will be as part of the objective as carefully implementating them.
References: [1] : https://www.cin.ufpe.br/~if721/intranet/TheFormalSemanticsofProgrammingLanguages.pdf [2] : https://www.lix.polytechnique.fr/~fvalenci/papers/sos.pdf [3] : https://link.springer.com/chapter/10.1007/BFb0039592 [4] : https://github.com/xavierleroy/cdf-mech-sem [5] : https://cakeml.org/esop16.pdf
Semantic Foundations for Aeneas
Supervision and collaboration:
Yannick Zakowski (Inria Paris, Cambium Team) and Son Ho (Microsoft Research, Azure Research Team)
Aeneas is a tool for translating (a fragment of) safe Rust programs to purely functional ones, with a backend for the Lean proof assistant in particular.
Its design has two main benefits: (1) simplifying verification by alleviating the need for reasoning about the memory, and (2) leverage Lean's infrastructure for interactive proof and extensibility via custom automation. See ([1], [3]).
Aeneas is used in increasingly many contexts, notably at Microsoft, for cryptography: SymCrypt, the cryptographic library used in the Windows kernel and in Azure Linux, is being ported from its current C implementation to Rust, and being verified via Aeneas in the process [5].
This state of affair raises notably two major axes for evolution:
Applicability.
Rust is a complex language, scaling Aeneas entails to scale the translation to a variety of program features, but also to enhance its ecosystem and its automation in particular.
Translating safe Rust to purely function programs is sufficient for a wide range of applications, including cryptography. It is however not adapted when the code uses unsafe blocks, internal mutability, I/O, or concurrency.
These features are however necessary for system programming, a target of choice for verification. Initial investigations target the development of an hybrid method, sticking to functional translations wherever possible, but combining it with separation-logic-based reasoning where necessary.
Soundness.
A paper proof of soundness that Aeneas's translation implements a borrow checker exists (see [2]: if the translation succeeds, then the code is memory safe), and a mecanisation of this proof is ongoing. However, this proof does not justify the correctness of the transaltion itself, and focuses on a restricted fragment of safe Rust (notably, no nested borrows).
Extending this result to justify an hybrid translation as sketched above is likely to require a radically different approach---possibly by investigating the development of a semantic interpretation of types à la RustBelt ([4]).
A Model of Cooperative Scheduling with Monadic Trees
Supervision:
Yannick Zakowski (Inria Paris, Cambium Team)
Localisation:
Inria Paris, Cambium Team
Context:
Monadic Trees [1] are a library under development providing tools to define denotation semantics of effectfuls programs.
They extend Interaction Trees [2] and Choice Trees [3] to provide support for a wider range of effects.
Abstractly, they define the initial iterative monad supporting user specified effects, for any effect represented as polynomial monad.
Concretely, one can think of them as possibly infinite trees such that the computation of each layer of the tree may perform a specified monadic effect.
Yannick Zakowski (Inria Paris, Cambium Team) with collaboration with Steve Zdancewic (University of Pennsylvania)
Localisation:
Inria Paris, Cambium Team
Context:
Vellvm [1,2] is a formal semantics for LLVM IR, the intermediate representation at the heart of the LLVM compiling infrastructure.
The semantics is defined in a quite original fashion: it a denotation defining a monadic interpreter based on Interaction Trees.
The resulting model enjoys nice compositional reasoning properties, and is executable by extraction, leading to a certified interpreter.
Recently, Choice Trees [3] have been introduced to extend Interaction Trees with support for non-determinism.
This addition allows for the definition of an interleaving semantics for concurrency, as has been explored on
a minimal language in [4].
In this internship, we propose to adapt these techniques to first replace the existent non-deterministic parts of Vellvm with CTrees,
then extend it with a support for threads.
References: [1] : https://github.com/vellvm/vellvm [2] : https://dl.acm.org/doi/10.1145/3473572 [3] : https://dl.acm.org/doi/10.1145/3571254 [4] : https://dl.acm.org/doi/10.1145/3703595.3705890