Symbolic Execution of Core-LLZK Programs

The goal of symbolic execution of core-LLZK is to generate an SMT2 formula that faithfully represents the big-step semantics of the witness program (also referred to simply as the witness or the program). This formula can then be used, for example, to prove that the witness and the corresponding circuit are equivalent — that is, that they compute the same function.

To illustrate the idea, consider a program that consists of two instructions: y = felt.mul y x followed by z = felt.add y z. Assuming that x, y and z are represented by constraint variables $X_{0}$ , $Y_{0}$ , and $Z_{0}$ , the first instruction is encoded as $Y_{1} = Y_{0} \cdot X_{0}$ , where $Y_{1}$ is a fresh variable representing the updated value of y. The second instruction is then encoded as $Z_{1} = Y_{1} + Z_{0}$ . The formula that represents the program is then

$Y_{1} = Y_{0} \cdot X_{0} \land Z_{1} = Y_{1} + Z_{0}$

Note that, for clarity, we write constraints as logical formulas in standard mathematical notation, where the arithmetic operations are interpreted over a finite field. Translating them into SMT2 format is straightforward.

In what follows we will describe the encoding of the different instructions of the language, how they are composed, and how a function can be encoded in a modular way as SMT2 macros. Then, in the last section, we explain where the implementation of the symbolic execution can be found, and how it can be used.

[!NOTE]

All concrete values and operations used in the rest of this document are over a finite-field with respect to a given prime p that fits in exactly k bits, unless stated otherwise. We will thus sometimes drop the term finite-field.

Assigning constraint variables to program variables

The first thing we need to model is how to relate program variables to their corresponding constraint variables. For this we will use a mapping $T$ (referred to as a symbolic environment) such that $T [x]$ , where $x$ is a program variable, returns one of the following values:

A concrete finite-field value.
A constraint variable.
A symbolic (array) environment $T^{'}$ such that $T^{'} [i]$ represents the symbolic value of the $i$ -th position of the array $x$ (the value can be either a concrete finite-field value or a constraint variable; it cannot be an array again).

Why do we need concrete finite-field values in the symbolic environment $T$ ?

They enable one of the powerful features of the translation: instructions are executed directly when all their variables have concrete values, which avoids generating new constraint variables and the corresponding formulas. They also handle aliasing: for an instruction like x = y, we do not generate a new variable for x together with a formula stating the equality, but simply assign to x the current value of y.

We will use the syntax $T^{'} = T [x \mapsto X_{i}]$ for a symbolic environment that is obtained from $T$ by setting $T [x]$ to $X_{i}$ , replacing its current value if any.

For simplicity, we abuse notation and let $T [sexp]$ , where sexp is a simple expression (a variable or a value), return sexp itself when sexp is a finite-field value.
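As a rough illustration of the symbolic environment, the following Python sketch models $T$ as a dictionary. The representation (integers for concrete values, strings for constraint variables, lists for array environments) and the helper names `lookup` and `update` are assumptions made for this example; they are not taken from the implementation.

```python
# Illustrative model of a symbolic environment T (all names are hypothetical).
# A value is an int (concrete field element), a str (constraint variable),
# or a list of ints/strs (a symbolic array environment).
P = 11  # small prime, used only for illustration

def lookup(T, sexp):
    """Return T[sexp]; a literal field value evaluates to itself."""
    if isinstance(sexp, int):
        return sexp % P
    return T[sexp]

def update(T, x, value):
    """Return T[x -> value] without mutating T."""
    T2 = dict(T)
    T2[x] = value
    return T2

T = {"x": "X_0", "y": 3, "a": [0, "A_1", 7]}
T1 = update(T, "z", lookup(T, "y"))  # aliasing z = y: no new variable, no formula
print(lookup(T, "x"), lookup(T, 5), lookup(T1, "z"), lookup(T, "a")[1])
```

Returning a new dictionary from `update` mirrors the notation $T^{'} = T [x \mapsto X_{i}]$ , where $T$ itself is left unchanged.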

Encoding of expressions

Encoding of an expression, in the context of a symbolic environment $T$ , generates a formula $F$ that encodes the result of evaluating the expression.

Encoding an expression does not modify the symbolic environment $T$ ; it simply uses the constraint variables of $T$ and binds the result to a given output variable $V_{o}$ . Note that no variable appearing in an expression may correspond to an array, since otherwise the program is ill-typed (array accesses are handled using dedicated instructions).

Next we describe the encodings of the different expressions as they are defined in the core-LLZK language.

Arithmetic

`sexp` (Identity)

The encoding is the formula $V_{o} = T [sexp]$

`felt.neg sexp` (Negation)

The encoding is the formula $V_{o} = - T [sexp]$

`felt.add sexp1 sexp2` (Addition)

The encoding is the formula $V_{o} = T [sexp1] + T [sexp2]$

`felt.sub sexp1 sexp2` (Subtraction)

The encoding is the formula $V_{o} = T [sexp1] - T [sexp2]$

`felt.mul sexp1 sexp2` (Multiplication)

The encoding is the formula $V_{o} = T [sexp1] \cdot T [sexp2]$

`felt.div sexp1 sexp2` (Multiplication by modular inverse)

The encoding is the formula $V_{o} \cdot T [sexp2] = T [sexp1]$
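Because division is stated multiplicatively, it can help to see which values of $V_{o}$ the constraint admits. The sketch below enumerates them by brute force over the small prime 11; the function name `div_solutions` is made up for this example.

```python
P = 11  # illustrative prime

def div_solutions(a, b):
    """All V_o in the field satisfying V_o * b = a (mod P)."""
    return [v for v in range(P) if (v * b) % P == a % P]

print(div_solutions(3, 4))       # a nonzero divisor admits exactly one value
print(div_solutions(3, 0))       # zero divisor, nonzero dividend: unsatisfiable
print(len(div_solutions(0, 0)))  # zero divided by zero: every field element fits
```

For a nonzero divisor the constraint pins $V_{o}$ down uniquely, which is what makes it a faithful encoding of multiplication by the modular inverse in that case.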

Bitwise

Encoding of bitwise operations heavily relies on the binary expansion of a given constraint variable $X$ (it also works when $X$ is a finite-field value). This operation is denoted by $bitify (X, n)$ , i.e., the binary expansion of $X$ using $n$ bits. We assume that it generates a formula that is a conjunction of the following constraints:

$X = \sum_{i = 0}^{n - 1} 2^{i} \cdot X_{b_{i}}$ , where $X_{b_{0}}, \dots, X_{b_{n - 1}}$ are fresh finite-field variables representing the bits of $X$ .
$⋀_{i = 0}^{n - 1} X_{b_{i}} \cdot (1 - X_{b_{i}}) = 0$ to state that each bit is either $0$ or $1$ . The constraint $X_{b_{i}} \cdot (1 - X_{b_{i}}) = 0$ can also be replaced by $range (X_{b_{i}}, 0, 1)$ when range constraints are allowed.

Recall that the finite field is defined with respect to a prime $p$ that fits in $k$ bits.
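The two $bitify$ constraints can be checked on concrete values. The following sketch uses the prime 11 with 4 bits; `bits_of` and `bitify_holds` are hypothetical helper names, and the bit list is little-endian by assumption.

```python
P, K = 11, 4  # illustrative prime and its bit width

def bits_of(x, n):
    """Little-endian binary expansion of the integer x using n bits."""
    return [(x >> i) & 1 for i in range(n)]

def bitify_holds(x, bits):
    """Check both bitify constraints for x and a candidate bit assignment."""
    sum_ok = sum((1 << i) * b for i, b in enumerate(bits)) % P == x % P
    bool_ok = all((b * (1 - b)) % P == 0 for b in bits)
    return sum_ok and bool_ok

print(bits_of(6, K), bitify_holds(6, bits_of(6, K)))
print(bitify_holds(6, [0, 3, 0, 0]))  # the sum is 6, but 3 is not a bit
```

The second call shows why the Boolean constraint is needed: without it, non-bit values could satisfy the sum.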

`bit.and sexp1 sexp2` (Bitwise AND)

Let $F_{1}$ , $F_{2}$ , and $F_{3}$ be the encodings corresponding to $bitify (T [sexp1], k)$ , $bitify (T [sexp2], k)$ , and $bitify (V_{o}, k)$ . Let $X_{b_{i}}$ denote the $i$ -th bit of $T [sexp1]$ and $Y_{b_{i}}$ the $i$ -th bit of $T [sexp2]$ , produced by the respective $bitify$ calls.

The encoding is

$F_{1} \land F_{2} \land F_{3} \land ⋀_{i = 0}^{k - 1} V_{o_{b_{i}}} = X_{b_{i}} \cdot Y_{b_{i}}$

As an optimization, when sexp1 is a constant that fits in $m$ bits, the binary expansions of $T [sexp1]$ and $V_{o}$ can be computed with respect to $m$ bits instead of $k$ bits, which yields the following encoding

$F_{1} \land F_{2} \land F_{3} \land ⋀_{i = 0}^{m - 1} V_{o_{b_{i}}} = X_{b_{i}} \cdot Y_{b_{i}}$

This is valid because all bits $V_{o_{b_{i}}}$ with $i \geq m$ are $0$ . Here we save $k - m$ variables, which can be important for scalability during the verification phase. We can apply a similar optimization for the case when sexp2 is a constant.

`bit.or sexp1 sexp2` (Bitwise OR)

Let $F_{1}$ , $F_{2}$ , and $F_{3}$ be the formulas corresponding to $bitify (T [sexp1], k)$ , $bitify (T [sexp2], k)$ , and $bitify (V_{o}, k)$ . Let $X_{b_{i}}$ denote the $i$ -th bit of $T [sexp1]$ and $Y_{b_{i}}$ the $i$ -th bit of $T [sexp2]$ , produced by the respective $bitify$ calls.

The encoding is

$F_{1} \land F_{2} \land F_{3} \land ⋀_{i = 0}^{k - 1} V_{o_{b_{i}}} = X_{b_{i}} + Y_{b_{i}} - X_{b_{i}} \cdot Y_{b_{i}}$

`bit.xor sexp1 sexp2` (Bitwise XOR)

With $F_{1}$ , $F_{2}$ , $F_{3}$ , $X_{b_{i}}$ , and $Y_{b_{i}}$ defined as for bit.or, the encoding is

$F_{1} \land F_{2} \land F_{3} \land ⋀_{i = 0}^{k - 1} V_{o_{b_{i}}} = X_{b_{i}} + Y_{b_{i}} - 2 \cdot X_{b_{i}} \cdot Y_{b_{i}}$

`bit.not sexp` (Bitwise NOT)

Let $F_{1}$ and $F_{2}$ be the formulas corresponding to $bitify (T [sexp], k)$ and $bitify (V_{o}, k)$ . Let $X_{b_{i}}$ denote the $i$ -th bit of $T [sexp]$ , produced by the respective $bitify$ call.

The encoding is

$F_{1} \land F_{2} \land ⋀_{i = 0}^{k - 1} V_{o_{b_{i}}} = 1 - X_{b_{i}}$
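The per-bit arithmetic forms used for AND, OR, XOR, and NOT can be compared against ordinary integer bitwise operations. This is a small sketch over the prime 11 with 4 bits; the helper names are illustrative, and the comparison is done on bit patterns rather than on field values.

```python
P, K = 11, 4  # illustrative prime and bit width

def bits_of(x, n):
    """Little-endian bits of the integer x."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Integer whose little-endian expansion is `bits`."""
    return sum(b << i for i, b in enumerate(bits))

# Per-bit arithmetic forms from the encodings, evaluated modulo P.
BIT_OPS = {
    "and": lambda a, b: (a * b) % P,
    "or":  lambda a, b: (a + b - a * b) % P,
    "xor": lambda a, b: (a + b - 2 * a * b) % P,
}

def bitwise(op, x, y):
    """Output bits V_o_b_i obtained by applying the per-bit form position by position."""
    return [BIT_OPS[op](a, b) for a, b in zip(bits_of(x, K), bits_of(y, K))]

def bit_not(x):
    """Output bits for bitwise NOT: 1 - X_b_i at every position."""
    return [(1 - a) % P for a in bits_of(x, K)]

print(from_bits(bitwise("and", 6, 3)), from_bits(bitwise("or", 6, 3)),
      from_bits(bitwise("xor", 6, 3)), from_bits(bit_not(6)))
```

Each arithmetic form yields 0 or 1 whenever its inputs are bits, so the output bits automatically satisfy the Boolean constraint of $bitify (V_{o}, k)$ .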

`bit.shl sexp1 sexp2` (Left shift)

The encoding of left-shift considers two cases: the first handles the case when sexp2 is a constant, and the second the case when it is not. In practice, the second case rarely occurs. Next we explain the two cases.

The case when `sexp2` is a constant

Let $F_{1}$ and $F_{2}$ be the formulas corresponding to $bitify (T [sexp1], k)$ and $bitify (V_{o}, k)$ . Let $X_{b_{i}}$ denote the $i$ -th bit of $T [sexp1]$ , produced by the respective $bitify$ call. Let the value of sexp2 be $m$ . The encoding is

$F_{1} \land F_{2} \land (⋀_{i = 0}^{m - 1} V_{o_{b_{i}}} = 0) \land (⋀_{i = m}^{k - 1} V_{o_{b_{i}}} = X_{b_{i - m}})$

The general case

The second case is more elaborate. It is based on computing the binary expansion of $T [sexp2]$ using $\lceil \log_{2} k \rceil$ bits, i.e., $bitify (T [sexp2], \lceil \log_{2} k \rceil)$ , and then iteratively left-shifting by $2^{i}$ positions whenever the $i$ -th bit of $T [sexp2]$ is 1.
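One way to read the general case is as a staged (barrel) shifter: stage $j$ shifts by $2^{j}$ positions when bit $j$ of the shift amount is 1 and otherwise keeps the value. That reading, the helper names, and the arithmetic selection between the shifted and unshifted bits are assumptions of this sketch, not a description of the actual implementation.

```python
import math

K = 4                              # illustrative bit width
S = math.ceil(math.log2(K))        # number of bits used for the shift amount

def bits_of(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def shl_const(bits, m):
    """Shift a little-endian bit list left by the constant m, dropping overflow."""
    return [0] * min(m, len(bits)) + bits[: max(len(bits) - m, 0)]

def shl_general(x_bits, s_bits):
    """Stage j selects between the value shifted by 2**j and the unshifted value."""
    cur = x_bits
    for j, s in enumerate(s_bits):
        shifted = shl_const(cur, 1 << j)
        # Arithmetic selection on bits: s * shifted + (1 - s) * current.
        cur = [s * a + (1 - s) * b for a, b in zip(shifted, cur)]
    return cur

print(from_bits(shl_general(bits_of(3, K), bits_of(2, S))))  # 3 shifted left by 2
```

The number of stages grows with $\lceil \log_{2} k \rceil$ rather than with $k$ , which keeps the number of intermediate bit variables moderate.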

`bit.shr sexp1 sexp2` (Right shift)

The encoding of right-shift considers two cases: the first handles the case when sexp2 is a constant, and the second the case when it is not. In practice, the second case rarely occurs. Next we explain the two cases.

The case when `sexp2` is a constant

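With $F_{1}$ , $F_{2}$ , $X_{b_{i}}$ , and $m$ defined as in the constant case of bit.shl, the encoding is

$F_{1} \land F_{2} \land (⋀_{i = 0}^{k - m - 1} V_{o_{b_{i}}} = X_{b_{i + m}}) \land (⋀_{i = k - m}^{k - 1} V_{o_{b_{i}}} = 0)$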
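The constant-shift encodings for both directions amount to re-indexing the bits. A small sketch, assuming 4-bit little-endian bit lists and hypothetical helper names, checks them against integer shifts:

```python
K = 4  # illustrative bit width

def bits_of(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def shl_bits(x_bits, m):
    """Left shift by constant m: the low m bits are 0, bit i takes X_b_{i-m}."""
    return [0 if i < m else x_bits[i - m] for i in range(K)]

def shr_bits(x_bits, m):
    """Right shift by constant m: bit i takes X_b_{i+m}, the high m bits are 0."""
    return [x_bits[i + m] if i < K - m else 0 for i in range(K)]

x = 0b0110
print(from_bits(shl_bits(bits_of(x, K), 1)), from_bits(shr_bits(bits_of(x, K), 1)))
```

No new arithmetic constraint is needed beyond the equalities between bit variables, which is why the constant case is much cheaper than the general one.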

The general case

It is based on the same idea as the general case of bit.shl.

Boolean

We will rely on formulas of the form $ite (F, V_{1}, V_{2})$ , interpreted as: if $F$ holds then $V_{1}$ , otherwise $V_{2}$ . When $F$ is a bit variable, this can be expressed arithmetically as $F \cdot V_{1} + (1 - F) \cdot V_{2}$ . However, keeping the $ite$ form may provide important explicit information during the verification process (the one that uses the encoding of the witness program).

Note that Boolean values are simulated using finite-field values, where $0$ represents false and any other value is true.

`bool.eq sexp1 sexp2` (Equality)

The encoding is

$V_{o} = ite (T [sexp1] = T [sexp2], 1, 0) \land V_{o} \cdot (1 - V_{o}) = 0$

`bool.neq sexp1 sexp2` (Inequality)

The encoding is

$V_{o} = ite (T [sexp1] = T [sexp2], 0, 1) \land V_{o} \cdot (1 - V_{o}) = 0$

`bool.and sexp1 sexp2` (Logical AND)

The encoding is

$V_{o} = ite (T [sexp1] = 0 \lor T [sexp2] = 0, 0, 1) \land V_{o} \cdot (1 - V_{o}) = 0$

`bool.or sexp1 sexp2` (Logical OR)

The encoding is

$V_{o} = ite (T [sexp1] = 0 \land T [sexp2] = 0, 0, 1) \land V_{o} \cdot (1 - V_{o}) = 0$

`bool.not sexp` (Logical NOT)

The encoding is

$V_{o} = ite (T [sexp] = 0, 1, 0) \land V_{o} \cdot (1 - V_{o}) = 0$
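The Boolean encodings above can be evaluated directly on concrete field values. The sketch below does so over the prime 11, treating any nonzero value as true; the function names are illustrative only.

```python
P = 11  # illustrative prime

def ite(cond, v1, v2):
    """if cond holds then v1, otherwise v2."""
    return v1 if cond else v2

def ite_arith(f, v1, v2):
    """Arithmetic form of ite, valid when f is a bit (0 or 1)."""
    return (f * v1 + (1 - f) * v2) % P

def bool_eq(a, b):  return ite(a % P == b % P, 1, 0)
def bool_neq(a, b): return ite(a % P == b % P, 0, 1)
def bool_and(a, b): return ite(a % P == 0 or b % P == 0, 0, 1)
def bool_or(a, b):  return ite(a % P == 0 and b % P == 0, 0, 1)
def bool_not(a):    return ite(a % P == 0, 1, 0)

# Nonzero values count as true, but every result is normalized to 0 or 1.
print(bool_and(7, 3), bool_and(7, 0), bool_or(0, 5), bool_not(4), bool_neq(2, 2))
```

Since every result is 0 or 1, the conjunct $V_{o} \cdot (1 - V_{o}) = 0$ in each encoding is always satisfiable and records explicitly that the output is a bit.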

`bool.lt sexp1 sexp2` (Signed less than)

First recall that we deal with signed values. Thus, comparisons interpret field elements as signed integers. The order of the field elements is defined as

$mid, ..., p - 1, 0, ..., mid - 1$

where $mid = \frac{p}{2} + 1$ , with $\frac{p}{2}$ computed by integer division (rounding down). The idea is that $mid, ..., p - 1$ represent negative numbers.

There are two special cases that we consider, which improve the overall performance of the verification process. They arise when sexp1 or sexp2 are constant values. We describe these cases first, followed by the general case when both are variables. Note that the case when both sexp1 and sexp2 are constant is handled when executing the corresponding command (since both are constant, the comparison is simply evaluated).

The case when `sexp1` is a constant

Let $v$ be the value of sexp1, so we want to encode the signed comparison $v < sexp2$ . The encoding is divided into several cases; in all of them the encoding is

$V_{o} = ite (F^{'}, 1, 0) \land V_{o} \cdot (1 - V_{o}) = 0$

where $F^{'}$ is defined separately for each case:

if $v =_{N} mid - 1$ , then $F^{'}$ is false, because $mid - 1$ represents the largest non-negative value.
if $v <_{N} mid - 1$ , then $F^{'}$ is $range (T [sexp2], v + 1, mid - 1)$ , because $T [sexp2]$ can be any positive value larger than $v$ .
if $v =_{N} p - 1$ , then $F^{'}$ is $range (T [sexp2], 0, mid - 1)$ , because $T [sexp2]$ can be any non-negative number.
if $mid \leq_{N} v <_{N} p - 1$ , then $F^{'}$ is $range (T [sexp2], v + 1, p - 1) \lor range (T [sexp2], 0, mid - 1)$ , because $v$ is negative, so $T [sexp2]$ can be any negative value larger than $v$ or any non-negative value.

The case when `sexp2` is a constant

Let $v$ be the value of sexp2, so we want to encode the signed comparison $sexp1 < v$ . The encoding is divided into several cases; in all of them the encoding is

$V_{o} = ite (F^{'}, 1, 0) \land V_{o} \cdot (1 - V_{o}) = 0$

where $F^{'}$ is defined separately for each case:

if $v =_{N} mid$ , then $F^{'}$ is false, because $mid$ represents the smallest negative value.
if $v >_{N} mid$ , then $F^{'}$ is $range (T [sexp1], mid, v - 1)$ , because $v$ is negative and thus $T [sexp1]$ is a negative number smaller than $v$ .
if $v =_{N} 0$ , then $F^{'}$ is $range (T [sexp1], mid, p - 1)$ , because $T [sexp1]$ must be negative.
if $0 <_{N} v <_{N} mid$ , then $F^{'}$ is $range (T [sexp1], 0, v - 1) \lor range (T [sexp1], mid, p - 1)$ , because $v$ is positive so $T [sexp1]$ can be negative or non-negative smaller than $v$ .
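The case analyses for a constant left or right operand can be checked exhaustively on a small field. The sketch below uses the prime 11, so $mid = 6$ , and compares each case against the signed interpretation of field elements. It assumes that $range (X, lo, hi)$ is inclusive on both ends (as suggested by $range (X_{b_{i}}, 0, 1)$ for bits); the function names are made up for this example.

```python
P = 11
MID = P // 2 + 1  # 6: the elements 6..10 stand for -5..-1

def signed(x):
    """Signed integer represented by the field element x."""
    return x - P if x >= MID else x

def in_range(x, lo, hi):
    """Inclusive range constraint."""
    return lo <= x <= hi

def lt_const_left(v, y):
    """F' for the comparison v < y, where v is the constant."""
    if v == MID - 1:
        return False
    if v < MID - 1:
        return in_range(y, v + 1, MID - 1)
    if v == P - 1:
        return in_range(y, 0, MID - 1)
    return in_range(y, v + 1, P - 1) or in_range(y, 0, MID - 1)

def lt_const_right(x, v):
    """F' for the comparison x < v, where v is the constant."""
    if v == MID:
        return False
    if v > MID:
        return in_range(x, MID, v - 1)
    if v == 0:
        return in_range(x, MID, P - 1)
    return in_range(x, 0, v - 1) or in_range(x, MID, P - 1)

ok_left = all(lt_const_left(v, y) == (signed(v) < signed(y)) for v in range(P) for y in range(P))
ok_right = all(lt_const_right(x, v) == (signed(x) < signed(v)) for x in range(P) for v in range(P))
print(ok_left, ok_right)
```

Each case needs at most two range constraints and no binary expansion, which is where the performance gain over the general case comes from.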

The general case

We assume that both sexp1 and sexp2 are not constant values, so we want to encode the signed comparison $sexp1 < sexp2$ . Let $F_{1}$ and $F_{2}$ be the formulas corresponding to $bitify (T [sexp1], k)$ and $bitify (T [sexp2], k)$ . Let $X_{b_{i}}$ denote the $i$ -th bit of $T [sexp1]$ and $Y_{b_{i}}$ the $i$ -th bit of $T [sexp2]$ , produced by the respective $bitify$ calls.

The idea is to compare the bits from the most to the least significant, until we find $i$ such that $X_{b_{i}} = 0 \land Y_{b_{i}} = 1$ , in which case the comparison is true, otherwise it is false. This can be done using the encoding

$F_{1} \land F_{2} \land V_{o} = G_{k} \land V_{o} \cdot (1 - V_{o}) = 0$

where $G_{i}$ is recursively defined as:

$G_{0} = 0$
$G_{i} = ite (X_{b_{i - 1}} = 0 \land Y_{b_{i - 1}} = 1, 1, G_{i - 1})$

`bool.gt sexp1 sexp2` (Signed greater than)

This is done by using the encoding of bool.lt sexp2 sexp1.

`bool.le sexp1 sexp2` (Signed less or equal)

It is computed as the negation of bool.lt sexp2 sexp1. Suppose $F_{1}$ is the encoding of bool.lt sexp2 sexp1 using an auxiliary output variable $V_{o}^{'}$ , then the encoding is

$F_{1} \land V_{o} = 1 - V_{o}^{'} \land V_{o} \cdot (1 - V_{o}) = 0$

`bool.ge sexp1 sexp2` (Signed greater or equal)

This is done by using the encoding of bool.le sexp2 sexp1.

Encoding of commands

Next we describe how commands and lists of commands are encoded. Any encoding of a command receives as input a command $C$ and a symbolic environment $T$ , and produces $(T, F, T^{'})$ where $F$ is the corresponding formula, and $T^{'}$ is a new symbolic environment that results from $T$ by modifying the values of some variables (due to the symbolic execution of $C$ ).

Executing a list of commands $[C_{1}, \dots, C_{n}]$ is done recursively as follows:

The symbolic execution of an empty list generates $(T, true, T)$ .
The symbolic execution of $[C_{1}, \dots, C_{n}]$ is done in two steps. We first execute $C_{1}$ using $T$ and obtain $(T, F, T^{'})$ , then recursively execute $[C_{2}, \dots, C_{n}]$ using $T^{'}$ and obtain $(T^{'}, F^{'}, T^{''})$ ; the overall encoding is then $(T, F \land F^{'}, T^{''})$ .

The symbolic execution of a function $foo$ is supposed to generate a macro that we denote as

$foo (I, O, L) = F$

where $I$ and $O$ are sequences of constraint variables obtained from the input and output parameters of function $foo$ , and $L$ is a sequence of local variables used in the formula $F$ (i.e., all variables used in $F$ that do not appear in $I$ or $O$ ). We will explain how this encoding is generated later, but for now a brief description suffices since we will rely on it when encoding function calls.

Next we describe the encodings of the different commands as they are defined in the core-LLZK language.

Assignment

The encoding of an assignment id = exp starts by trying to concretely evaluate exp, and if all variables used in exp have constant values in $T$ , the evaluation succeeds and results in a value $v$ . We then generate $T^{'} = T [id \mapsto v]$ , and the encoding is $(T, true, T^{'})$ .

If exp cannot be concretely evaluated, we symbolically evaluate exp using $T$ and a fresh output variable $V_{o}$ and obtain the encoding $F$ . Then we generate $T^{'} = T [id \mapsto V_{o}]$ , and the encoding is $(T, F, T^{'})$ .
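The interplay between concrete evaluation, fresh variables, and the composition of command lists can be sketched in a few lines. The following is a toy model, not the Lean implementation: it supports only felt.add and felt.mul, represents a command as a `(target, (op, lhs, rhs))` tuple, and returns formulas as a list of strings whose conjunction is the encoding (an empty list stands for true). All names are hypothetical.

```python
import itertools

P = 11  # illustrative prime

# Supported operations: printable symbol and concrete evaluator (a subset, for illustration).
OPS = {
    "felt.add": ("+", lambda a, b: (a + b) % P),
    "felt.mul": ("*", lambda a, b: (a * b) % P),
}

def make_fresh():
    """Return a generator of fresh constraint-variable names V_1, V_2, ..."""
    counter = itertools.count(1)
    return lambda: f"V_{next(counter)}"

def value_of(T, sexp):
    """T[sexp], where a literal value evaluates to itself."""
    return sexp if isinstance(sexp, int) else T[sexp]

def exec_assign(T, target, exp, fresh):
    """Encode `target = op lhs rhs`; returns (formulas, new environment)."""
    op, lhs, rhs = exp
    a, b = value_of(T, lhs), value_of(T, rhs)
    symbol, evaluate = OPS[op]
    if isinstance(a, int) and isinstance(b, int):
        return [], {**T, target: evaluate(a, b)}  # concrete: the formula is true
    v = fresh()
    return [f"{v} = {a} {symbol} {b}"], {**T, target: v}

def exec_list(T, commands, fresh):
    """Execute commands in order, threading the environment and conjoining formulas."""
    formulas = []
    for target, exp in commands:
        F, T = exec_assign(T, target, exp, fresh)
        formulas += F
    return formulas, T

prog = [("y", ("felt.mul", "y", "x")), ("z", ("felt.add", "y", "z"))]
F_sym, T_sym = exec_list({"x": "X_0", "y": "Y_0", "z": "Z_0"}, prog, make_fresh())
F_mix, T_mix = exec_list({"x": 2, "y": 3, "z": "Z_0"}, prog, make_fresh())
print(F_sym)
print(F_mix, T_mix["y"])
```

In the second run the multiplication is evaluated concretely, so only the addition contributes a formula and a fresh variable.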

Arrays

Creating a new array

Creating an array is done using the command array.new sexp id.

To symbolically execute this command, we first evaluate sexp to a concrete value $n$ that represents the size of the array (the size of an array must be known during symbolic execution). Then we generate a new symbolic array environment $T_{id}$ such that $T_{id} [i] = 0$ for all $i \in [0.. n - 1]$ , and set $T^{'} = T [id \mapsto T_{id}]$ . The encoding is then $(T, true, T^{'})$ .

Accessing an array element

Accessing an array element is done using the command array.read id1[sexp] id2, which retrieves the value at position sexp from array id1, and stores it in variable id2.

To symbolically execute this command, we first let $T_{id1} = T [id1]$ , which is the symbolic environment of the array id1. Then we handle two cases separately: the first when the index $T [sexp]$ is constant, and the other when it is not.

The case of a constant index

If $T [sexp]$ evaluates to a constant index $v$ , we generate $T^{'} = T [id2 \mapsto T_{id1} [v]]$ , and the encoding is then $(T, true, T^{'})$ .

The case of a non-constant index

If the index $T [sexp]$ evaluates to a variable $V_{sexp}$ , we have to consider all possible values for the index. We let $n$ be the size of the array, which is supposed to be known during symbolic execution (it is part of the environment $T [id1]$ ).

Considering all possible values for the index can be done using $G_{n}$ where:

$G_{0} = false$
$G_{i} = ite (V_{sexp} = i - 1, V_{o} = T_{id1} [i - 1], G_{i - 1})$

where $V_{o}$ is a fresh variable. Note that this simulates an if-then-else to identify which index was accessed.

Next we generate the output symbolic environment $T^{'} = T [id2 \mapsto V_{o}]$ , and let the encoding be $(T, G_{n}, T^{'})$ .
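To see what the chain $G_{n}$ admits, it can be evaluated for concrete values of the index variable and of $V_{o}$ . The sketch below does this for a small concrete array; `read_formula` is a hypothetical name, and the array contents are arbitrary.

```python
def read_formula(index, v_o, array):
    """Evaluate G_n for concrete values of the index variable and of V_o."""
    g = False  # G_0
    for i in range(1, len(array) + 1):
        # G_i = ite(index = i - 1, V_o = array[i - 1], G_{i-1})
        g = (v_o == array[i - 1]) if index == i - 1 else g
    return g

arr = [4, 7, 9]
print([v for v in range(11) if read_formula(1, v, arr)])  # values allowed for V_o at index 1
print(any(read_formula(5, v, arr) for v in range(11)))    # index outside the array
```

Because $G_{0}$ is false, an index outside the array makes the whole formula unsatisfiable rather than leaving $V_{o}$ unconstrained.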

Updating an array element

Updating an array element is done using the command array.write sexp1 id[sexp2], which updates the value at position sexp2 to the value of sexp1.

To symbolically execute this command, we first let $T_{id} = T [id]$ , which is the symbolic environment of the array id. Then we handle two cases separately: the first when the index $T [sexp2]$ is constant, and the other when it is not.

The case of a constant index

If $T [sexp2]$ evaluates to a constant index $v$ , we generate $T_{id}^{'} = T_{id} [v \mapsto T [sexp1]]$ , then $T^{'} = T [id \mapsto T_{id}^{'}]$ , and finally the encoding is $(T, true, T^{'})$ .

The case of a non-constant index

If $T [sexp2]$ evaluates to a variable $V_{sexp2}$ , we have to consider all values for the index. We let $n$ be the size of the array, which is supposed to be known during symbolic execution.

We first generate fresh variables for all positions of the array, to represent the values after the update. Let us name them $V_{id_{0}}, \dots, V_{id_{n - 1}}$ . Let $T_{id}^{'}$ be a new array environment such that $T_{id}^{'} [i] = V_{id_{i}}$ for all $i \in [0.. n - 1]$ .

We denote by $U_{i}$ a formula that simulates an update to the $i$ -th position of the array, i.e., it assigns $T [sexp1]$ to $V_{id_{i}}$ , while the remaining positions keep their old values. This can be modeled as follows:

$U_{i} = (V_{id_{i}} = T [sexp1]) \land ⋀_{j \in [0.. n - 1], j \neq i} V_{id_{j}} = T_{id} [j]$

Then, to consider all possible cases, we can use an if-then-else structure as in the following recursive definition:

$G_{0} = false$
$G_{i} = ite (V_{sexp2} = i - 1, U_{i - 1}, G_{i - 1})$

The encoding is then $(T, G_{n}, T^{'})$ , where $T^{'} = T [id \mapsto T_{id}^{'}]$ .
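The update encoding can be checked in the same style: for concrete old contents, index, and written value, exactly one assignment of the fresh variables should satisfy $G_{n}$ . The following brute-force sketch over the prime 11 uses hypothetical function names and an arbitrary three-element array.

```python
import itertools

P = 11  # illustrative prime

def update_formula(i, value, old, new):
    """U_i: position i holds the written value; all other positions are unchanged."""
    return new[i] == value and all(new[j] == old[j] for j in range(len(old)) if j != i)

def write_formula(index, value, old, new):
    """Evaluate G_n for concrete values of the index, the written value, and the new array."""
    g = False  # G_0
    for i in range(1, len(old) + 1):
        # G_i = ite(index = i - 1, U_{i-1}, G_{i-1})
        g = update_formula(i - 1, value, old, new) if index == i - 1 else g
    return g

old = [4, 7, 9]
models = [new for new in itertools.product(range(P), repeat=len(old))
          if write_formula(2, 1, old, new)]
print(models)
```

The frame conditions in $U_{i}$ are what force the untouched positions to keep their old values; without them the fresh variables at those positions would be unconstrained.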

Copying an array

Copying an array from one variable to another is done using the command array.copy id1 id2. The encoding simply updates the value of id2 (in $T$ ) to that of id1. Let $T^{'} = T [id2 \mapsto T [id1]]$ , then the encoding is $(T, true, T^{'})$ .

Conditionals

A conditional statement is of the form if sexp1==sexp2 { tb } else { te }, where tb and te are sequences of commands. The encoding is done by combining the encodings of tb and te.

Let $(T, F_{1}, T_{1})$ and $(T, F_{2}, T_{2})$ be the encodings of tb and te respectively, and let $C$ denote the condition $T [sexp1] = T [sexp2]$ . The encoding starts by creating a new environment $T^{'}$ that merges $T_{1}$ and $T_{2}$ for the variables that are live immediately after the if-statement (we infer live variables using liveness analysis). For each such live variable $x$ : if $T_{1} [x]$ and $T_{2} [x]$ agree, then $T^{'} [x] = T_{1} [x]$ ; otherwise we introduce a fresh variable $V_{x}$ , add $V_{x} = T_{1} [x]$ to $F_{1}$ and $V_{x} = T_{2} [x]$ to $F_{2}$ , and set $T^{'} [x] = V_{x}$ . Let $F_{1}^{'}$ and $F_{2}^{'}$ be the formulas obtained from $F_{1}$ and $F_{2}$ at the end of this process. Each branch formula must hold only when its branch is taken, so it is guarded by the condition, and the encoding is $(T, (C \land F_{1}^{'}) \lor (\neg C \land F_{2}^{'}), T^{'})$ .

As an important optimization, if the condition sexp1==sexp2 can be concretely evaluated to $v$ , i.e., all used variables have concrete values, then we can directly use the encoding of tb or te, depending on $v$ .
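The merge step for live variables can be sketched as follows. This toy version handles scalar values only, takes the list of live variables as given, and represents formulas as lists of strings; the function `merge` and the naming scheme for fresh variables are assumptions of the example.

```python
def merge(T1, F1, T2, F2, live, fresh):
    """Merge branch environments on live variables; returns (T', F1', F2')."""
    T_out, F1p, F2p = {}, list(F1), list(F2)
    for x in live:
        if T1[x] == T2[x]:
            T_out[x] = T1[x]              # branches agree: reuse the value
        else:
            v = fresh(x)                  # branches differ: join through a fresh variable
            F1p.append(f"{v} = {T1[x]}")
            F2p.append(f"{v} = {T2[x]}")
            T_out[x] = v
    return T_out, F1p, F2p

T_then = {"x": "X_1", "y": 3, "tmp": "W_1"}
T_else = {"x": "X_2", "y": 3, "tmp": "W_2"}
T_out, F_then, F_else = merge(T_then, ["X_1 = X_0 + 1"], T_else, ["X_2 = X_0 * 2"],
                              live=["x", "y"], fresh=lambda x: f"V_{x}")
print(T_out)
print(F_then, F_else)
```

Restricting the merge to live variables avoids fresh variables for values such as `tmp` that are never read after the conditional.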

Bounded Loops

A bounded loop is of the form repeat sexp { body }, and executes body for sexp iterations. Note that the value of sexp must be known statically.

Assume that $T [sexp] = n$ , i.e., the loop is executed $n$ times. The encoding of the loop is computed using the following recursive definition of $G_{i}$ , which represents the execution of the loop for $i$ iterations:

$G_{0}$ is simply $(T, true, T)$ since nothing is executed.
For $G_{i}$ , we first compute the encoding of body starting from $T$ , which results in $(T, F, T^{'})$ ; then we compute $G_{i - 1}$ with respect to $T^{'}$ , which results in $(T^{'}, F^{'}, T^{''})$ ; and the value of $G_{i}$ is $(T, F \land F^{'}, T^{''})$ .

The encoding of the loop is then defined as the result of $G_{n}$ .
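The recursive definition of $G_{i}$ amounts to unrolling the loop. A minimal sketch, assuming the body is given as a function from an environment to a pair of formulas and a new environment (both the representation and the example body are hypothetical):

```python
import itertools

def exec_loop(T, n, exec_body):
    """G_n: run the body once from T, then G_{n-1} from the resulting environment."""
    if n == 0:
        return [], T                       # G_0: nothing is executed
    F, T1 = exec_body(T)
    F_rest, T2 = exec_loop(T1, n - 1, exec_body)
    return F + F_rest, T2

counter = itertools.count(1)

def body(T):
    """Toy body for `x = felt.mul x y` with symbolic operands."""
    v = f"X_{next(counter)}"
    return [f"{v} = {T['x']} * {T['y']}"], {**T, "x": v}

F, T_end = exec_loop({"x": "X_0", "y": "Y_0"}, 3, body)
print(F, T_end["x"])
```

Each iteration starts from the environment produced by the previous one, so the formulas chain through the fresh variables.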

Function Calls

A function call is of the form call foo(sexp1, ..., sexpn) to id1,...,idm, where sexp1, ..., sexpn are the input parameters and id1,...,idm are the output parameters. Recall that we have assumed that a function is encoded as a macro of the form

$foo (I, O, L) = F$

where $I$ is a sequence of constraint variables corresponding to the formal input parameters of $foo$ , $O$ is a sequence of constraint variables corresponding to the formal output parameters of $foo$ , and $L$ is a sequence of auxiliary variables (those used in $F$ that are not in $I$ or $O$ ).

The function call is encoded as a call to the above macro according to the following steps:

We generate the actual input variables $I_{call}$ by concatenating the values of $T [sexp1], \dots, T [sexpn]$ . If any $T [sexp_i]$ is an array, then all its elements are inserted into $I_{call}$ .
We generate $T^{'}$ from $T$ by inserting a fresh variable for each output variable idi. For an output variable that is of array type, it is assigned an array of fresh variables.
We generate the actual output variables $O_{call}$ by concatenating the values of $T^{'} [id1], \dots, T^{'} [idm]$ . If any $T^{'} [id_i]$ is an array, then all its elements are inserted into $O_{call}$ .
We generate a sequence of fresh variables $L_{call}$ of the same length as $L$ (these are, in principle, existential variables).

The encoding of the call is then $(T, foo (I_{call}, O_{call}, L_{call}), T^{'})$ . Note that we keep it as a call to a macro, which is important when translating the formulas into SMT2 format to allow modular verification.
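The steps above can be sketched as follows. The sketch assumes that the caller knows, for each output, whether it is a scalar or an array of a given size, and how many local variables the macro expects; it prints the macro application with a single flat argument list. These conventions and all names are illustrative.

```python
import itertools

def flatten(values):
    """Concatenate values, expanding arrays into their elements."""
    out = []
    for v in values:
        out.extend(v if isinstance(v, list) else [v])
    return out

def exec_call(T, name, args, outs, out_sizes, n_locals, fresh):
    """Encode `call name(args) to outs`; out_sizes[i] is None for a scalar output."""
    I_call = flatten([a if isinstance(a, int) else T[a] for a in args])
    T2 = dict(T)
    for o, size in zip(outs, out_sizes):
        T2[o] = fresh() if size is None else [fresh() for _ in range(size)]
    O_call = flatten([T2[o] for o in outs])
    L_call = [fresh() for _ in range(n_locals)]
    formula = f"{name}({', '.join(map(str, I_call + O_call + L_call))})"
    return formula, T2

counter = itertools.count(1)
fresh = lambda: f"V_{next(counter)}"
T = {"a": ["A_0", 5], "x": "X_0"}
formula, T2 = exec_call(T, "foo", ["a", "x", 2], ["r", "s"], [None, 2], 1, fresh)
print(formula)
print(T2["r"], T2["s"])
```

Keeping the result as an application of `foo` rather than inlining its body is what allows each function to be encoded once and reused at every call site.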

Encoding of functions

A function is of the form

def foo(x1:t, ...,xn:t) -> y1:t, ..., ym:t {
  body
}

and as explained earlier, its encoding generates a corresponding macro according to the following steps:

Generate an initial symbolic environment $T$ , where each xi is mapped to a fresh variable or an array of fresh variables, depending on its type. Let $I$ be the sequence of constraint variables corresponding to x1,...,xn.
Symbolically execute body starting from $T$ , which results in $(T, F_{1}, T^{'})$ .
Generate $T^{''}$ from $T^{'}$ by inserting a fresh variable for each output variable yi. For an output variable of array type, it is assigned an array of fresh variables. Let $F_{2}$ be a conjunction of equalities of the form $T^{'} [yi] = T^{''} [yi]$ (or $⋀_{j = 0}^{l - 1} T^{'} [yi] [j] = T^{''} [yi] [j]$ when yi is an array of size $l$ ) for $i \in [1.. m]$ . Let $O$ be the sequence of constraint variables corresponding to y1,...,ym taken from $T^{''}$ .
Let $L$ be the sequence of all variables used in $F_{1} \land F_{2}$ that are not in $I$ or $O$ .

The encoding is then:

$foo (I, O, L) = F_{1} \land F_{2}$

Encoding of a program

The encoding of a program is done by encoding all its functions, as macros, and adding a top-level formula that simulates a call to the main function.

Implementation

A symbolic execution engine has been implemented in Lean following the ideas described above, and can be found under translator/lean/llzk/Llzk/SymExec.

To compile it, first move to the directory translator/lean/llzk and run lake build. It can then be executed using the following command:

.lake/build/bin/llzk_cli -zk g64 -se -o output.smt2 input.core

This generates the SMT2 encoding of input.core into output.smt2. The g64 parameter selects the prime 18446744069414584321 with 64 bits. For debugging purposes, it can be replaced by f11 to use the prime 11 with 4 bits. Omitting the -o option prints the result to standard output.

The following command pretty-prints the input program:

.lake/build/bin/llzk_cli -zk g64 -pp -o output.smt2 input.core

and is useful for debugging. Full usage information can be obtained with:

.lake/build/bin/llzk_cli --help

AVAZAR Project