Internship topics

Mechanized Semantics: Comparative Study

Supervision and collaboration:

Yannick Forster (Inria Paris, Cambium Team) and Yannick Zakowski (Inria Paris, Cambium Team)

Localisation:

Inria Paris, Cambium Team

Context:

Over 70 years, semanticists¹ have developped a range of technics to define the meaning of programming languages: small²-or-big³-step, by substitution or environment, by continuation, through abstract machine, or even denotational or axiomatic. This proliferation of subtle trade-offs only increased in abundance as mechanization in proof assistant became common. Indeed, new criteria entered: ease of formalization in a given meta-theory, ease of instrumenting for testing, ease of automating reasoning, ...
In simple cases, we are fairly familiar with the trade-offs. Notably, the history of CompCert, recreated at small scall in Leroy's lecture notes,⁴ neatly illustrates the benefits of big step over small step for inductive reasoning, the benefits of small steps for characterizing expressing termination-preserving transformations, and the benefits of a semantics by continuation for capturing bounded stuttering arguments.
But that is not the case in general. Notably, some styles have become associated with specific proof assistants, or even projects---it the case for instance of functional big step,⁵ whose use is at the core of the CakeML verified compiler. It is however extremely difficult to know whether another style would have been as, if not more, effective, and whether the nature of the proof assistant in use, Isabelle/HOL in this case, bears responsibility in the efficacity of functional big step.
Through this internship, we therefore propose to design and build in the Rocq Proof Assistant a semantic zoo, a carefully crafted test suite of formalization helping to rationnally compare a well chosen suit of styles of semantics. In particular, designing adequate criteria for comparison and curating the right collection of approches will be as part of the objective as carefully implementating them.
References:
[1] : https://www.cin.ufpe.br/~if721/intranet/TheFormalSemanticsofProgrammingLanguages.pdf
[2] : https://www.lix.polytechnique.fr/~fvalenci/papers/sos.pdf
[3] : https://link.springer.com/chapter/10.1007/BFb0039592
[4] : https://github.com/xavierleroy/cdf-mech-sem
[5] : https://cakeml.org/esop16.pdf

Semantic Foundations for Aeneas

Supervision and collaboration:

Yannick Zakowski (Inria Paris, Cambium Team) and Son Ho (Microsoft Research, Azure Research Team)

Localisation:

Inria Paris, Cambium Team

Details:

Internship Aeneas

Context:

Aeneas is a tool for translating (a fragment of) safe Rust programs to purely functional ones, with a backend for the Lean proof assistant in particular. Its design has two main benefits: (1) simplifying verification by alleviating the need for reasoning about the memory, and (2) leverage Lean's infrastructure for interactive proof and extensibility via custom automation. See ([1], [3]). Aeneas is used in increasingly many contexts, notably at Microsoft, for cryptography: SymCrypt, the cryptographic library used in the Windows kernel and in Azure Linux, is being ported from its current C implementation to Rust, and being verified via Aeneas in the process [5]. This state of affair raises notably two major axes for evolution:

Applicability.
- Rust is a complex language, scaling Aeneas entails to scale the translation to a variety of program features, but also to enhance its ecosystem and its automation in particular.
- Translating safe Rust to purely function programs is sufficient for a wide range of applications, including cryptography. It is however not adapted when the code uses unsafe blocks, internal mutability, I/O, or concurrency. These features are however necessary for system programming, a target of choice for verification. Initial investigations target the development of an hybrid method, sticking to functional translations wherever possible, but combining it with separation-logic-based reasoning where necessary.
Soundness. A paper proof of soundness that Aeneas's translation implements a borrow checker exists (see [2]: if the translation succeeds, then the code is memory safe), and a mecanisation of this proof is ongoing. However, this proof does not justify the correctness of the transaltion itself, and focuses on a restricted fragment of safe Rust (notably, no nested borrows). Extending this result to justify an hybrid translation as sketched above is likely to require a radically different approach---possibly by investigating the development of a semantic interpretation of types à la RustBelt ([4]).

This internship takes place in this context and propose to contribute to the semantic foundations of Aeneas. Notably, we propose here to investigate the development of a semantic interpretation of types in the style of RustBelt to guarantee that the translation remains sound in presence of unsafe code, concurrency, internal mutability, or I/O.
References:
[1] : https://dl.acm.org/doi/10.1145/3547647
[2] : https://dl.acm.org/doi/abs/10.1145/3674640
[3] : https://www.sonho.fr/papers/thesis-manuscript.pdf
[4] : https://people.mpi-sws.org/~dreyer/papers/rustbelt/paper.pdf
[5] : https://www.microsoft.com/en-us/research/blog/rewriting-symcrypt-in-rust-to-modernize-microsofts-cryptographic-library/

A Model of Cooperative Scheduling with Monadic Trees

Supervision:

Yannick Zakowski (Inria Paris, Cambium Team)

Localisation:

Inria Paris, Cambium Team

Context:

Monadic Trees [1] are a library under development providing tools to define denotation semantics of effectfuls programs. They extend Interaction Trees [2] and Choice Trees [3] to provide support for a wider range of effects. Abstractly, they define the initial iterative monad supporting user specified effects, for any effect represented as polynomial monad. Concretely, one can think of them as possibly infinite trees such that the computation of each layer of the tree may perform a specified monadic effect.

The groundwork is done for the general meta-theory: the structure is equipped with guarded iteration (for strong bisimilarity) and general iteration (for a flavour of weak bisimilarity). Little applications have been developped yet: we propose through this internship to fill this gap by studying the theory of mtrees with a combination of state and nondeterminism, and use them to give a denotational model to a language with cooperative scheduling in the style of [4].
References:
[1] : https://gitlab.inria.fr/yzakowsk/mtrees
[2] : https://dl.acm.org/doi/10.1145/3371119
[3] : https://dl.acm.org/doi/10.1145/3571254
[4] : https://homepages.inf.ed.ac.uk/gdp/publications/Semantics_Cooperative_Threads_journal.pdf

Extending Vellvm to Concurrency

Supervision:

Yannick Zakowski (Inria Paris, Cambium Team) with collaboration with Steve Zdancewic (University of Pennsylvania)

Localisation:

Inria Paris, Cambium Team

Context:

Vellvm [1,2] is a formal semantics for LLVM IR, the intermediate representation at the heart of the LLVM compiling infrastructure. The semantics is defined in a quite original fashion: it a denotation defining a monadic interpreter based on Interaction Trees. The resulting model enjoys nice compositional reasoning properties, and is executable by extraction, leading to a certified interpreter.

Recently, Choice Trees [3] have been introduced to extend Interaction Trees with support for non-determinism. This addition allows for the definition of an interleaving semantics for concurrency, as has been explored on a minimal language in [4]. In this internship, we propose to adapt these techniques to first replace the existent non-deterministic parts of Vellvm with CTrees, then extend it with a support for threads.
References:
[1] : https://github.com/vellvm/vellvm
[2] : https://dl.acm.org/doi/10.1145/3473572
[3] : https://dl.acm.org/doi/10.1145/3571254
[4] : https://dl.acm.org/doi/10.1145/3703595.3705890