# «Abstract. Considerable eﬀort has gone into the techniques of extracting executable code from formal speciﬁcations and animating them. We show how ...»

Animating the Formalised Semantics

of a Java-like Language

Andreas Lochbihler1 and Lukas Bulwahn2

Karlsruher Institut f¨r Technologie, andreas.lochbihler@kit.edu

u

Technische Universit¨t M¨nchen, bulwahn@in.tum.de

a u

Abstract. Considerable eﬀort has gone into the techniques of extracting executable code from formal speciﬁcations and animating them. We

show how to apply these techniques to the large JinjaThreads formalisation. It models a substantial subset of multithreaded Java source and bytecode in Isabelle/HOL and focuses on proofs and modularity whereas code generation was of little concern in its design. Employing Isabelle’s code generation facilities, we obtain a veriﬁed Java interpreter that is suﬃciently eﬃcient for running small Java programs. To this end, we present reﬁned implementations for common notions such as the reﬂexive transitive closure and Russell’s deﬁnite description operator. From our experience, we distill simple guidelines on how to develop future formalisations with executability in mind.

1 Introduction In the last few years, substantial work has been devoted to the techniques and tools for executing formal speciﬁcations from Isabelle/HOL, on the levels of both prover infrastructure [5,8,9] and formalisations of foundational notions and concepts [6,11,18]. But so far, applications (e.g. [4,19,20]) have been designed for executability and restricted to purely functional speciﬁcations. A benchmark to test whether the aforementioned techniques mix well and scale to large formalisations has been missing.

In this work, we study how to apply code generation techniques to the JinjaThreads project [15,16,17], which formalises a substantial subset of multithreaded Java source and bytecode. JinjaThreads constitutes a good benchmark for three reasons: (i) It is a large formalisation (70k lines of deﬁnitions and proofs) that involves a broad range of advanced Isabelle features. (ii) As a programming language, type system, and semantics, it has a built-in notion of execution. This sets the goal for what should be executable. (iii) It focuses on proofs and modularity rather than code generation, i.e. complications in speciﬁcations and proofs for the sake of direct code generation were out of the question. Hence, it tests if code generation works “in the wild” and not only for specialised developments.

Our main contribution here is to discuss what was needed to automatically generate a well-formedness checker and an interpreter for JinjaThreads programs from the Isabelle formalisation, and what the pitfalls were. Thereby, we demonstrate how to combine the diﬀerent techniques and tools such that changes to the existing formalisation stay minimal. Our contributions fall into two parts.

On the system’s side, we enhanced Isabelle’s code generator for inductive predicates (§2.1) to obtain a mature tool for our needs. It now compiles inductive deﬁnitions and ﬁrst-order predicates, interpreted as logic programs, to functional implementations. Furthermore, we present a practical method to overcome the poor integratability of Isabelle’s code generator into Isabelle’s module system (§2.2). Finally, we describe a tabled implementation of the reﬂexive transitive closure (§2.3) and an executable version of Russell’s deﬁnite description operator (§2.4), which are now part of the Isabelle/HOL library.

On JinjaThreads’ side, we animated the formalisation (see §3.1 for an overview) through code generation: Many of its inductive deﬁnitions, we had to reﬁne for compilation or, if this was impossible, implement manually (§3.2 and §3.3).

To obtain execution traces of JinjaThreads programs, we adapted the state representation and formalised two schedulers (§3.4). In §3.5, we explain how to add memoisation to avoid frequently recomputing common functions, e.g. lookup functions, without polluting the existing formalisation. Clearly, as the generated code naively interprets source code programs, we cannot expect it to be as eﬃcient as an optimising Java virutal machine (JVM). Nevertheless, we evaluated the performance of the generated interpreter (§3.6). Simple optimisations that we describe there speed up the interpreter by three orders of magnitude. Hence, it is suﬃciently eﬃcient to handle Java programs of a few hundred lines of code.

We conclude our contributions by distilling our experience into a few guidelines on how to develop formalisations to be executable ones. Rather than imposing drastic changes on the formalisation, they pinpoint common pitfalls. §4 explains why and how to avoid them.

The interpreter and the full formalisation is available online in the Archive of Formal Proofs [17]. To make the vast supply of Java programs available for experimenting and testing with the semantics, we have written the (unveriﬁed) conversion tool Java2Jinja as a plug-in to the Eclipse IDE. It converts Java class declarations into JinjaThreads

**Abstract**

syntax. The latest development version is available at http://pp.info.uni-karlsruhe.de/git/Java2Jinja/.

1.1 Related work Code generation (of functional implementations) from Isabelle/HOL is a wellestablished business. Mari´ [19] presents a formally veriﬁed implementation of c a SAT solver. In the CeTA project, Thiemann and Sternagel [20] generate a self-contained executable termination checker for term rewriting systems. The Flyspeck project uses code generation to compute the set of tame graphs [4]. All these formalisations were developed with executability in mind. Complications in proofs to obtain an eﬃciently executable implementation were willingly taken and handling them are large contributions of these projects.

Code generation in Coq [13] has been used in various developments, notably the CompCert compiler [12] and the certiﬁcate checkers in the MOBIUS project [3]. Like in Isabelle, functional speciﬁcations pose no intrinsic problems.

Although code extraction is in principle possible for any Coq speciﬁcation, mathematical theories can lead to “a nightmare in term of extracted code eﬃciency and readability” [13]. Hence, Coq’s users, too, are facing the problem of how to extract (roughly) eﬃcient code from speciﬁcations not aimed towards executability. ACL2 and PVS translate only functional implementations to Common Lisp.

In [5], we have reported on generating code from non-functional speciﬁcations. Recently, Nipkow applied code generation for inductive predicates to animate the semantics and various program analyses of an educational imperative language (personal communication). All these applications were tiny formalisations compared to JinjaThreads.

Some formalisations of the JVM in theorem provers are directly executable.

The most complete is the M6 model of a JVM by Lui and Moore [14] in ACL2, which covers the CLDC speciﬁcation. Farzan et al. [7] report on a JVM formalisation in Maude’s rewriting logic. ACL2’s and Maude’s logics are directly executable, i.e., they force the user to write only executable formalisations. While JinjaThreads studies meta-language properties like type safety for a uniﬁed model of Java and Java bytecode, these JVM formalisations aim at verifying properties of individual programs. Atkey [1] presents an executable JVM model in Coq. He concentrates on encoding defensive type checks as dependent types, but does not provide any data on the eﬃciency.

1.2 Background: the Code Generator Framework and Reﬁnement Isabelle’s code generator [9] turns a set of equational theorems into a functional program with the same equational rewrite system. As it builds on equational logic, the translation guarantees partial correctness by construction and the user may easily reﬁne programs and data without aﬀecting her formalisation globally.

Program reﬁnement can separate code generation issues from the rest of the formalisation. As any (executable) equational theorem suﬃces for code generation, the user may locally derive new (code) equations to use upon code generation.

Hence, existing deﬁnitions and proofs remain unaﬀected, which has been crucial for JinjaThreads.

For data reﬁnement, the user may replace constructors of a datatype by other constants and derive equations that pattern-match on these new (pseudo-)constructors. Neither need the new constructors be injective and pairwise disjoint, nor exhaust the type. Again, this is local as it aﬀects only code generation, but not the logical properties of the reﬁned type. Conversely, one cannot exploit the type’s new structure inside the logic. Only type constructors can be reﬁned;

some special types (such as a ⇒ b option for maps) must ﬁrst be wrapped in an (isomorphic) type of their own (e.g. ( a, b) mapping).

Isabelle’s standard library deﬁnes such special-purpose types for sets and maps with standard operations. Associative lists and red-black trees implement them via data reﬁnement. FinFuns [18] are almost-everywhere constant functions; they provide an executable universal quantiﬁer thanks to data reﬁnement to associative lists. The Isabelle Collections Framework (ICF) [11] advocates dealing with reﬁnement in the logic instead of hiding it in the code generator. Locales, i.e. Isabelle modules, specify the abstract operations, concrete implementations interpret them. This allows for executing truly underspeciﬁed functions.

2 Code Generation in Isabelle In this section, we present our contributions that JinjaThreads has motivated, but that are generally applicable. Consequently, they have been integrated into Isabelle’s main system and library. First, we present the code generator for inductive predicates and our improvements to it (§2.1). Then, we describe our approach to overcome the problematic situation with code generation and locales (§2.2). Finally, we sketch formalisations for enhanced implementations for the reﬂexive transitive closure (§2.3) and the deﬁnite description operator (§2.4), which are employed in JinjaThreads’ type system, for example.

2.1 The Predicate Compiler The predicate compiler [5] translates speciﬁcations of inductive predicates, i.e. the introduction rules, into executable equational theorems for Isabelle’s code generator. The translation is based on the notion of modes. A mode partitions the arguments into input and output. For a given predicate, the predicate compiler infers the set of possible modes such that all terms are ground during execution.

Lazy sequences handle the non-determinism of inductive predicates. By default, the equations implement a Prolog-style depth-ﬁrst execution strategy. Since its

**initial description [5], we improved the predicate compiler in four aspects:**

First, mode annotations restrict the generation of code equations to modes of interest. This is necessary because the set of modes is exponential in the number of arguments of a predicate. Therefore, the space and time consumption of the underlying mode inference algorithm grows exponentially in that number; for all applications prior to JinjaThreads, this has never posed a problem. In case of many arguments (up to 15 in JinjaThreads), the plain construction of this set of modes burns up any available hardware resource. To sidestep this limitation, modes can now be declared and hence they are not inferred, but only checked to be consistent.

Second, we also improved the compilation scheme: The previous one sequentially checked which of the introduction rules were applicable. Hence, the input values were repeatedly compared to the terms in the conclusion of each introduction rule by pattern matching. For large speciﬁcations, such as JinjaThreads’ semantics (contains 88 rules), this naive compilation made execution virtually impossible due to the large number of rules. To obtain an eﬃcient code expression, we modiﬁed the compilation scheme to partition the rules by patterns of the input values ﬁrst and then only compose the matching rules – this resembles similar techniques in Prolog compilers, such as clause indexing and switch detection.

We report on the performance improvements due to this modiﬁcation in §3.6.

Third, the predicate compiler now oﬀers non-intrusive program reﬁnement, i.e., the user can declare alternative introduction rules. For an example, see §3.3.

Fourth, the predicate compiler was originally limited to the restricted syntactic form of introduction rules. We added some preprocessing that transforms deﬁnitions in predicate logic to a set of introduction rules. Type-safe method overriding (§3.2) gives an example.

2.2 Isabelle Locales and Code Generation Locales [2] in Isabelle allow parametrised theory and proof development. In other words, locales allow to prove theorems abstractly, relative to a set of ﬁxed parameters and assumptions. Interpretation of locales transfers theorems from their abstract context to other (concrete) contexts by instantiating the parameters and proving the assumptions. JinjaThreads uses locales to abstract over diﬀerent memory consistency models (§3.3) and schedulers (§3.4), and to underspecify operations on abstract data structures.

As code generation requires equational theorems in the (foundational) theory context, theorems that reside in the context of a locale cannot serve as code equations directly, but must be transferred into the theory context. For example, consider a locale L with one parameter p, one assumption A p and one deﬁnition f =... that depends on p. Let g be a function in the theory context for which A (g z) holds for all z. We want to generate code for f where p is instantiated to g z.

**The Isabelle code generator tutorial proposes interpretation and deﬁnition:**

One instantiates p by g z and discharges the assumption with A (g z), for arbitrary z. This yields the code equation f (g z) =... which is ill-formed because the left-hand side applies f to the non-constructor constant g. For code generation, one must manually deﬁne a new function f’ by f’ z = f (g z) and derive f’ z =... as code equation. This approach is unsatisfactory for two reasons: It requires to manually re-deﬁne all dependent locale deﬁnitions in the theory context (and for each interpretation), and the interpretation must be unconditional, i.e., A (g z) must hold for all z. In JinjaThreads, the latter is often violated, e.g. g z satisﬁes A only if z is well-formed.