Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Core LLZK Specification

The purpose of Core LLZK is to provide a language simple enough to allow a clean formalization of transforming witness generators to SMT formulas, yet rich enough to model LLZK programs.

Scope and Restrictions

To ensure feasible formalization, LLZK programs must adhere to the following restrictions:

  • Static Array Sizes: The size of an array cannot depend on input values.

  • Type Consistency: A variable accessed at a specific program point must always have the same type, regardless of the control flow path taken to reach that point. This ensures, for example, that if we access y[i], the dimensions of y is known statically.

  • No Recursion: Recursion is forbidden. This is enforced by requiring a total order on function definitions (a function may only call functions defined prior to itself).

  • Unrolled Loops: The language does not support dynamic loops. It supports bounded loops, which are intended to be unrolled during processing.

Informal Definition

We assume a prime number P and an architecture width of k bits, such that P < 2^k. For any number x, its FF-value (Finite Field value) is an integer in the range [0, P-1], calculated as x mod P. We first give the grammar, and then explain each part separately.

// numbers
N := a natural number
Z := an integer (will be interpreted as a finite field value)

// identifiers: sequence of _,a-z,A-Z,0-9,%,@,# or . (dot) that 
// does not start with # or a digit
id  := [_,a-z,A-Z,%,@,.] [_,a-z,A-Z,0-9,%,@,#,.]* 

// zero or more id separated by comma
ids := (id ("," id)*)?

// simple expression
sexp := id | Z

// zero or more simple expressions separated by comma
sexps := (sexp ("," sexp)*)?

// finite field operations
felt_bin_op := "felt.add" | "felt.sub" | "felt.mul" | "felt.div"
felt_unary_op := "felt.neg"

// bitwise operations
bit_bin_op := "bit.shl" | "bit.shr" | "bit.and" | "bit.or" | "bit.xor"
bit_unary_op := bit.not

// boolean operations -- use 0/1 for false and true
bool_bin_op := "bool.eq" | "bool.neq" | "bool.lt" | "bool.gt" | "bool.le" | "bool.ge" | "bool.and" | "bool.or"
bool_unary_op := "bool.not"

// binary, unary and no-operand operations
bin_op := felt_bin_op | bit_bin_op | bool_bin_op
unary_op := felt_unary_op | bit_unary_op | bool_unary_op

// expressions
exp := bin_op sexp sexp | unary_op sexp | sexp

// assignment
assignment := id "=" exp

// if statement
if  := "if" "(" sexp "==" sexp ")" "{" cmd* "}" [else "{" cmd* "}"]

// bounded loops
for := "repeat" sexpr "{" cmd* "}"

// array operations
narray := "array.new" sexp id
rarray := "array.read" id "[" sexp "]" id
warray := "array.write" sexp id "[" sexp "]"
carray := "array.copy" id id

// function call
fcall  := "call" id "(" sexps ")" ["to" ids]

// command
cmd := assignment | if | for | wconst | narray | rarray | warray | carray | fcall

// types
type := ff | arr<N>

// parameter
param := id ":" type

// zero or more parameters separated by comma
params := (param ("," param)*)?

// function definition
function := "func" id "(" params ")" ["->" params] "{" cmd* "}"

// program
prog := func*

Types

There are two primary types:

  1. ff: A scalar variable over the finite field.
  2. arr<N>: An array of N finite field elements (N is a constant).

Types must be declared for function input and output parameters. Local variable types are inferred dynamically upon assignment and checked upon usage.

[!NOTE]

Machine integers and booleans are simulated using the ff type.

Structure

A program consists of a set of functions. One function is designated as %main, serving as the entry point for generating the SMT formula.

Functions

A function definition follows this syntax:

def id(id1:t, ...,idn:t) -> id1:t, ..., idk:t {
  body
}
  • Parameters: Formal and return parameters follow the format %name:type.
  • Uniqueness: All parameter names are distinct. All return names are distinct.
  • Body: body is a sequence of commands.

Expressions

In what follow we explain the supported expressions by category.

Arithmetic

Semantics correspond to standard operations in the finite field .

  1. sexp (Identity)
  2. felt.neg sexp (Negation)
  3. felt.add sexp1 sexp2 (Addition)
  4. felt.sub sexp1 sexp2 (Subtraction)
  5. felt.mul sexp1 sexp2 (Multiplication)
  6. felt.div sexp1 sexp2 (Multiplication by modular inverse)
Bitwise

Semantics: The operands sexpi are converted to k-bit vectors (standard unsigned integer representation), the operation is applied, and the result is converted back to a finite field element (modulo P).

  1. bit.shl sexp1 sexp2 (Left shift)
  2. bit.shr sexp1 sexp2 (Right shift)
  3. bit.and sexp1 sexp2 (Bitwise AND)
  4. bit.or sexp1 sexp2 (Bitwise OR)
  5. bit.xor sexp1 sexp2 (Bitwise XOR)
  6. bit.not sexp1 (Bitwise NOT)

[!NOTE]

How conversion from bit-vectors to finite field should be done? Calculate the corresponding non-negative integer x and then compute x mod P?

Boolean

Comparisons interpret field elements as signed integers. The order is defined as mid, ..., P-1, 0, ..., mid-1, where mid = P/2+1.

[!NOTE]

The semantics on the right is not an encoding to SMT, it is just to write it a bit formally.

  1. bool.eq sexp1 sexp2: Equality. (sexp1=sexp2 -> result=1) and (~(sexp1=sexp2) -> result=0).
  2. bool.neq sexp1 sexp2: Inequality. (sexp1=sexp2 -> result=0) and (~(sexp1=sexp2) -> result=1).
  3. bool.gt sexp1 sexp2: Signed greater than. sexp1>sexp2 -> result=1) and (~(sexp1>sexp2) -> result=0).
  4. bool.lt sexp1 sexp2: Signed less than. (sexp1<sexp2 -> result=1) and (~(sexp1<sexp2) -> result=0).
  5. bool.ge sexp1 sexp2: Signed greater or equal. (sexp1>=sexp2 -> result=1) and (~(sexp1>=sexp2) -> result=0).
  6. bool.le sexp1 sexp2: Signed less or equal. (sexp1<=sexp2 -> result=1) and (~(sexp1<=sexp2) -> result=0).
  7. bool.not sexp: Logical NOT. (sexp=0 -> result=1) and (~(sexp=0) -> result=0).
  8. bool.or sexp1 sexp2: Logical OR. ((sexp1=0 and sexp2=0) -> result=0) and ((~(sexp1=0) or ~(sexp2=0)) -> result=1).
  9. bool.and sexp1 sexp2: Logical AND. (~(sexp1=0) and ~(sexp2=0)) -> result=1) and (((sexp1=0) or (sexp2=0)) -> result=0).

Commands

Next we describe the possible commands supported in the language.

Assignment
id = exp

Assigns the result of exp to id.

Arrays
  1. array.new sexp id: Allocates an array of sexp elements (type ff), and stores it in variable id. sexp must be a constant simple expression.
  2. array.read id1[sexp] id2: Reads the array id1 at index sexp into variable id2.
  3. array.write sexp1 id[sexp2]: Updates array id at index exp2 with value ofsexp1.
  4. array.copy id1 id2: Copies array id1 into id2. The previous value of id2 is overwritten. id1 must be an array.

[!IMPORTANT]

The size of an array must be computable at symbolic execution time, i.e., does not depend on input parameters.

[!IMPORTANT]

Efficient translation of array access/update id[sexp] to SMT formulas is only possible if the index sexp is computable at symbolic execution time.

Conditionals

if sexp1==sexp2 { body } else { body }

The else part is optional.

[!NOTE]

Types of all live variables at exit of both branches must coincide.

Bounded Loops
repeat sexp  { body }

Repeats body for sexp times. There is no loop counter, should be taken care of independently (i.e., initializing a variable before the loop and increment it inside body).

[!IMPORTANT]

'sexp' must be computable at symbolic execution time, i.e., does not depend on input parameters, otherwise we cannot unfold the loop.

Semantics:

  1. Evaluate sexp to a value N.
  2. Repeat body for N times.
Function Calls
call id(sexp1, ..., sexpn) to id1,...,idk

Executes function with name id. The return values (which may include arrays) are assigned to id1, ..., idk. The to keyword is optional if the functions does not return values.

SSA Support

The language does not natively support Static Single Assignment (SSA). It is structured code. However, the translator can simulate SSA by inserting phi-functions at control flow merge points (e.g., after if/else blocks).

Symbolic Execution of Core-LLZK Programs

The goal of symbolic execution of core-LLZK is to generate an SMT2 formula that faithfully represents the big-step semantics of the witness program (also referred to simply as the witness or the program). This formula can then be used, for example, to prove that the witness and the corresponding circuit are equivalent — that is, that they compute the same function.

To illustrate the idea, consider a program that consists of two instructions: y = felt.mul y x followed by z = felt.add y z. Assuming that x, y and z are represented by constraint variables , , and , the first instruction is encoded as , where is a fresh variable representing the updated value of y. The second instruction is then encoded as . The formula that represents the program is then

Note that, for clarity, we write constraints using standard (mathematical) format logical formulas, where the arithmetic operations are interpreted over a finite field. Translating them into SMT2 format is straightforward.

In what follows we will describe the encoding of the different instructions of the language, how they are composed, and how a function can be encoded in a modular way as SMT2 macros. Then, in the last section, we explain where the implementation of the symbolic execution can be found, and how it can be used.

[!NOTE]

All concrete values and operations used in the rest of this document are over a finite-field with respect to a given prime p that fits in exactly k bits, unless stated otherwise. We will thus sometimes drop the term finite-field.

Assigning constraint variables to program variables

The first thing we need to model is how to relate program variables to their corresponding constraint variables. For this we will use a mapping (referred to as a symbolic environment) such that , where is a program variable, returns one of the following values:

  • A concrete finite-field value.
  • A constraint variable.
  • A symbolic (array) environment such that represents the symbolic value of the -th position of the array (the value can be either a concrete finite-field value or a constraint variable; it cannot be an array again).

Why we need the concrete finite-field value in the symbolic environment ?

This is needed in order to achieve one of the powerful features of the translation, which executes the instructions when all variables have concrete values and thus avoid generating new constraint variables and corresponding formulas. It also handles aliasing, i.e., if we have an instruction like x = y, then we will not generate a new variable for x and add a formula stating the equality, but rather will assign to x the current value of y and avoid generating a new formula.

We will use the syntax for a symbolic environment that is obtained from by setting to , replacing its current value if any.

For simplicity, when we have a simple expression sexp, which can be a variable or a value, then abusing notation we assume that returns sexp itself when it is a finite-field value.

Encoding of expressions

Encoding of an expression, in the context of a symbolic environment , generates a formula that encodes the result of evaluating the expression.

Encoding of expressions does not modify the symbolic environment , it simply uses the constraint variables of and binds the result to a given output variable . Note that all variables that appear in an expression do not correspond to arrays, otherwise the program is ill-typed (array accesses are handled using dedicated instructions).

Next we describe the encodings of the different expressions as they are defined in the core-LLZK language.

Arithmetic

sexp (Identity)

The encoding is the formula

felt.neg sexp (Negation)

The encoding is the formula

felt.add sexp1 sexp2 (Addition)

The encoding is the formula

felt.sub sexp1 sexp2 (Subtraction)

The encoding is the formula

felt.mul sexp1 sexp2 (Multiplication)

The encoding is the formula

felt.div sexp1 sexp2 (Multiplication by modular inverse)

The encoding is the formula

Bitwise

Encoding of bitwise operations heavily relies on the binary expansion of a given constraint variable (it also works when is a finite-field value). This operation is denoted by , i.e., the binary expansion of using bits. We assume that it generates a formula that is a conjunction of the following constraints:

  1. , where are fresh finite-field variables representing the bits of .
  2. to state that the bits can bey either or . The constraint can also be replaced by when range constraints are allowed.

Recall the the finite-field is with respect to a prime that fits in bits.

bit.and sexp1 sexp2 (Bitwise AND)

Let , , and be the encodings corresponding to , , and . Let denote the -th bit of and the -th bit of , produced by the respective calls.

The encoding is

As an optimization, when sexp1 is a constant that fits in bits, the encoding of and can be done with respect to bits instead of bits, and then use the following encoding

This is valid because all bits with are . Here we save variables, which can be important for scalability during the verification phase. We can apply a similar optimization for the case when sexp2 is a constant.

bit.or sexp1 sexp2 (Bitwise OR)

Let , , and be the formulas corresponding to , , and . Let denote the -th bit of and the -th bit of , produced by the respective calls.

The encodingis

bit.xor sexp1 sexp2 (Bitwise XOR)

Let , , and be the formulas corresponding to , , and . Let denote the -th bit of and the -th bit of , produced by the respective calls.

The encoding is

bit.not sexp (Bitwise NOT)

Let and be the formulas corresponding to and . Let denote the -th bit of , produced by the respective call.

The encoding is

bit.shl sexp1 sexp2 (Left shift)

The encoding of left-shift considers two cases: the first handles the case when sexp2 is a value, and the other when it is not. In practice, we rarely find the second case. Next we explain the two cases.

The case when sexp2 is a constant

Let and be the formulas corresponding to and . Let denote the -th bit of , produced by the respective call. Let the value of sexp2 be . The encoding is

The general case

The second case is more elaborated, and is based computing the binary expansion of using bits, i.e., , and then iteratively left-shift by position when the corresponding bit of is 1.

bit.shr sexp1 sexp2 (Right shift)

The encoding of right-shift considers two cases: the first handles the case when sexp2 is a value, and the other when it is not. In practice, we rarely find the second case. Next we explain the two cases.

The case when sexp2 is a constant

Let and be the formulas corresponding to and . Let denote the -th bit of , produced by the respective call. Let the value of sexp2 be . The encoding is

The general case

It is based on the same idea as the general case of bit.shl.

Boolean

We will rely on formulas of the form , interpreted as: if holds then otherwise . Note that when is a bit variable this can be expressed arithmetically as , however, keeping the form may provide important explicit information during the verification process (the one that uses the encoding of the witness program).

Note that Boolean values are simulated using finite-field values, where represents false and any other value is true.

bool.eq sexp1 sexp2 (Equality)

The encoding is

bool.neq sexp1 sexp2 (Inequality)

The encoding is

bool.and sexp1 sexp2 (Logical AND)

The encoding is

bool.or sexp1 sexp2 (Logical OR)

The encoding is

bool.not sexp (Logical NOT)

The encoding is

bool.lt sexp1 sexp2 (Signed less than)

First recall that we deal with signed values. Thus, comparisons interpret field elements as signed integers. The order of the field elements is defined as

where . The idea is that represent negative numbers.

There are two special cases that we consider, which improve the overall performance of the verification process. They arise when sexp1 or sexp2 are constant values. We describe these cases first, followed by the general case when both are variables. Note that the case when both sexp1 and sexp2 are constant is handled when executing the corresponding command (since both are constant, the comparison is simply evaluated).

The case when sexp1 is a constant

Let be the value of sexp1, so we want to encode the signed comparison . The encoding is divided into several cases; in all of them the encoding is

where is:

  • if , then is false, because represents the largest non-negative value.

  • if , then is , because can be any positive value larger than .

  • if , then is , because can be any non-negative number.

  • if , then is , because is negative, so can be any positive or negative value larger than .

The case when sexp2 is a constant

Let be the value of sexp2, so we want to encode the signed comparison . The encoding is divided into several cases; in all of them the encoding is

where is defined separately for each case:

  • if , then is false, because represents the smallest negative value.

  • if , then is , because is negative and thus is a negative number smaller than .

  • if , then is , because must be negative.

  • if , then is , because is positive so can be negative or non-negative smaller than .

The general case

We assume that both sexp1 and sexp2 are not constant values, so we want to encode the signed comparison . Let and be the formulas corresponding to and . Let denote the -th bit of and the -th bit of , produced by the respective calls.

The idea is to compare the bits from the most to the least significant, until we find such that , in which case the comparison is true, otherwise it is false. This can be done using the encoding

where is recursively defined as:

bool.gt sexp1 sexp2 (Signed greater than)

This is done by using the encoding of bool.lt sexp2 sexp1.

bool.le sexp1 sexp2 (Signed less or equal)

It is computed as the negation of bool.lt sexp2 sexp1. Suppose is the encoding of bool.lt sexp2 sexp1 using an auxiliary output variable , then the encoding is

bool.ge sexp1 sexp2 (Signed greater or equal)

This is done by using the encoding of bool.le sexp2 sexp1.

Encoding of commands

Next we describe how commands and lists of commands are encoded. Any encoding of a command receives as input a command and a symbolic environment , and produces where is the corresponding formula, and is a new symbolic environment that results from by modifying the values of some variables (due to the symbolic execution of ).

Executing a list of commands is done recursively as follows:

  • The symbolic execution of an empty list generates .
  • The symbolic execution of is done in two steps. We first execute using and obtain , then recursively execute using and obtain ; the overall encoding is then .

The symbolic execution of a function is supposed to generate a macro that we denote as

where and are sequences of constraint variables obtained from the input and output parameters of function , and is a sequence of local variables used in the formula (i.e., all variables used in that do not appear in or ). We will explain how this encoding is generated later, but for now a brief description suffices since we will rely on it when encoding function calls.

Next we describe the encodings of the different commands as they are defined in the core-LLZK language.

Assignment

The encoding of an assignment id = exp starts by trying to concretely evaluate exp, and if all variables used in exp have constant values in , the evaluation succeeds and results in a value . We then generate , and the encoding is .

If exp cannot be concretely evaluated, we symbolically evaluate exp using and a fresh output variable and obtain the encoding . Then we generate , and the encoding is .

Arrays

Creating a new array

Creating an array is done using the command array.new sexp id.

To symbolically execute this command, we first evaluate sexp to a concrete value that represents the size of the array (the size of an array must be known during symbolic execution). Then we generate a new symbolic array environment such that for all , and set . The encoding is then .

Accessing an array element

Accessing an array element is done using the command array.read id1[sexp] id2, which retrieves the value at position sexp from array id1, and stores it in variable id2.

To symbolically execute this command, we first let , which is the symbolic environment of the array id1. Then we handle two cases separately: the first when the index is constant, and the other when it is not.

The case of a constant index

If evaluates to a constant index , we generate , and the encoding is then .

The case of a non-constant index

If the index evaluates to a variable , we have to consider all possible values for the index. We let be the size of the array, which is supposed to be known during symbolic execution (it is part of the environment ).

Considering all possible values for the index can be done using where:

where is a fresh variable. Note that this simulates an if-then-else to identify which index was accessed.

Next we generate the output symbolic environment , and let the encoding be .

Updating an array element

Updating an array element is done using the command array.write sexp1 id[sexp2], which updates the value at position sexp2 to the value of sexp1.

To symbolically execute this command, we first let , which is the symbolic environment of the array id. Then we handle two cases separately: the first when the index is constant, and the other when it is not.

The case of a constant index

If evaluates to a constant index , we generate , then , and finally the encoding is .

The case of a non-constant index

If evaluates to a variable , we have to consider all values for the index. We let be the size of the array, which is supposed to be known during symbolic execution.

We first generate new fresh variables for all positions of the array, to represent the values after the update. Let us name them n1. Let be a new array environment such that for all .

We denote by a formula that simulates an update to the -th position of the array, i.e., assigns to , and the rest of position keep their old values. This can be modeled as follows:

Then, to consider all possible cases, we can use an if-then-else structure as in the following recursive definition:

The encoding is then .

Copying an array

Copying an array from one variable to another is done using the command array.copy id1 id2. The encoding simply updates the value of id2 (in ) to that of id1. Let , then the encoding is .

Conditionals

A conditional statement is of the form if sexp1==sexp2 { tb } else { te }, where tb and te are sequences of commands. The encoding is done by combining the encodings of tb and te.

Let and be the encodings of tb and te respectively. The encoding starts by creating a new environment that merges and for the variables that are live immediately after the if-statement (we infer live variables using liveness analysis). For each such live variable : if and agree, then ; otherwise we introduce a fresh variable , add to and to , and set . Assuming that at the end of this process we obtain , , and , the encoding is .

As an important optimization, if the condition sexp1==sexp2 can be concretely evaluated to , i.e., all used variables have concrete values, then we can use the encoding of tb or eb depending on .

Bounded Loops

A bounded loop is of the form repeat sexp { body }, and executes body for sexp iterations. Note that the value of sexp must be known statically.

Assume that , i.e., the loop is executed times. The encoding of the loop is computed using the following recursive definition of , which represents the execution of the loop for iterations:

  • is simply since nothing is executed.
  • For , we first compute the encoding of body starting from , which results in ; then we compute with respect to , which results in ; and the value of is .

The encoding of the loop is then defined as the result of .

Function Calls

A function call is of the form call foo(sexp1, ..., sexpn) to id1,...,idm, where sexp1, ..., sexpn are the input parameters and id1,...,idm are the output parameters. Recall that we have assumed that a function is encoded as a macro of the form

where is a sequence of constraint variables corresponding to the formal input parameters of , is a sequence of constraint variables corresponding to the formal output parameters of , and is a sequence of auxiliary variables (those used in that are not in or ).

The function call is encoded as call to the above macro according to the following steps:

  • We generate the actual input variables by concatenating the values of . If any is an array, then all its elements are inserted into .

  • We generate from by inserting a fresh variable for each output variable idi. For an output variable that is of array type, it is assigned an array of fresh variables.

  • We generate the actual output variables by concatenating the values of . If any is an array, then all its elements are inserted into .

  • We generate a sequence of fresh variables of the same length as (these are, in principle, existential variables).

The encoding of the call is then . Note that we keep it as a call to a macro, which is important when translating the formulas into SMT2 format to allow modular verification.

Encoding of functions

A function is of the form

def foo(x1:t, ...,xn:t) -> y1:t, ..., ym:t {
  body
}

and as explained earlier, its encoding generates a corresponding macro according to the following steps:

  • Generate an initial symbolic environment , where each xi is mapped to a fresh variable or an array of fresh variables, depending on its type. Let be the sequence of constraint variables corresponding to x1,...,xn.

  • Symbolically execute body starting from , which results in .

  • Generate from by inserting a fresh variable for each output variable yi. For an output variable of array type, it is assigned an array of fresh variables. Let be a conjunction of equalities of the form (or when yi is an array of size ) for . Let be the sequence of constraint variables corresponding to y1,...,ym taken from .

  • Let be the sequence of all variables used in that are not in or .

The encoding is then:

Encoding of a program

The encoding of a program is done by encoding all its functions, as macros, and adding a top-level formula that simulates a call to the main function.

Implementation

A symbolic execution engine has been implemented in Lean following the ideas described above, and can be found under translator/lean/llzk/Llzk/SymExec.

To compile it, first move to the directory translator/lean/llzk and run lake build. It can then be executed using the following command:

.lake/build/bin/llzk_cli -zk g64 -se -o output.smt2 input.core

This generates the SMT2 encoding of input.core into output.smt2. The g64 parameter selects the prime 18446744069414584321 with 64 bits. For debugging purposes, it can be replaced by f11 to use the prime 11 with 4 bits. Omitting the -o option prints the result to standard output.

The following command pretty-prints the input program:

.lake/build/bin/llzk_cli -zk g64 -pp -o output.smt2 input.core

and is useful for debugging. Full usage information can be obtained with:

.lake/build/bin/llzk_cli --help

The Clustering Tool

This document explains the clustering command-line tool, what it does, and the arguments it accepts.

What the tool does

The clustering binary reads an input circuit file (.r1cs), runs decomposition/clustering logic, and writes a JSON output file with the clustered structure.

Output JSON format

The output has three main parts:

  • timing: performance metrics for each processing stage.
  • nodes: the circuit decomposed into DAG nodes (subcircuits).
  • equivalency_local and equivalency_structural: optional partitions of node IDs into equivalence classes.

Think of nodes as "what the decomposition produced" and equivalency fields as "which produced nodes are considered similar under a specific notion of similarity".

1) timing

timing reports duration (in seconds) for each high-level stage:

2) nodes

nodes is the core structural output. Each entry represents one DAG node (a grouped subcircuit).

Node fields:

  • node_id (usize): Identifier for the node inside this output.

  • constraints (usize[]): IDs of original constraints contained in this node.

  • input_signals (usize[]): Signal IDs consumed by the node from outside that node.

  • output_signals (usize[]): Signal IDs produced by the node for downstream use.

  • signals (usize[]): Signal IDs associated with the node (internal and boundary).

  • is_custom (bool): Marker for custom/special node type in the decomposition pipeline.

  • successors (usize[]): IDs of nodes that depend on this node's outputs.

  • predecessors (usize[]): IDs of nodes this node's inputs depends on.

3) Equivalence fields and notions

Both equivalence fields encode a partition of node IDs into classes.

  • Data shape: Vec<Vec<usize>>
  • Each inner vector is one equivalence class.
  • Node IDs inside one class are equivalent under that mode's criterion.

Example:

  • [[0, 4], [1], [2, 3]] means:
    • Nodes 0 and 4 are equivalent.
    • Node 1 has no partner (singleton class).
    • Nodes 2 and 3 are equivalent.

Local equivalence (equivalency_local)

Subcircuits in the same equivalence class are equivalent up to renaming, scaling, shuffling.

Structural equivalence (equivalency_structural)

Subcircuits in the same equivalence class are locally equivalent as previous, and there exists an automorphism of the circuit hierarchy that preserves local equivalence and maps one to the other.

Basic usage

clustering [OPTIONS] <FILEPATH>

An example of use case is:

./clustering file_name.r1cs -e total -x 50 -o .

Arguments and options

Required argument

  • FILEPATH
    • Type: String
    • Description: input circuit file path.

Optional flags

  • -o <OUT_DIRECTORY>

    • Type: String
    • Default: .
    • Description: output directory where the resulting JSON file is written.
  • -x, --target-size <TARGET_SIZE>

    • Type: f64
    • Description: target cluster size for modularity-based clustering.
  • -e, --equivalence <EQUIVALENCE_MODE>

    • Type: enum
    • Default: structural
    • Accepted values:
      • total: to compute both structural and local equivalence.
      • structural: to compute structural equivalence.
      • local: to compute local equivalence
      • none: to diactivate the computation of any kind of equivalence.

Built-in Clap flags

  • -h, --help: print help.
  • -V, --version: print version.

ffsol Tool

This document explains the usage of the SMT solver ffsol, specialized in finite field arithmetic.

What the tool does

ffsol is a solver designed to process formulas in the SMT-LIB2 (.smt2) format over finite fields. It checks whether a formula is satisfiable (sat), unsatisfiable (unsat), or if the result is unknown (unknown). Additionally, it allows precise configuration of which optimization and deduction subroutines are activated during the solving process. We have adopted the same SMT-LIB2 format for finite fields presented here.

Installation

The repository of ffsol can be downloaded from here. It provides an installation script, install.sh, which automatically installs all the required libraries and dependencies needed to compile and run the solver.

Basic usage

./ffsol -file <FILE_SMT2>

Usage Examples

Basic invocation

./ffsol -file file.smt2

With time limit

./ffsol -file file.smt2 -tlimit 20

Time limit is 20 seconds; default is no limit.

Writing model to file

./ffsol -file file.smt2 -model fileName

Model will be written to file fileName; default is model is not written.

avazar Tool

This document explains the avazar command-line tool, how to install it, what it does, and the arguments it accepts.

How to install the tool

After cloning the repository, initialize and download all submodule dependencies by running: git submodule update --init --recursive to install the submodule dependencies in the project. Next, build the avazar tool in release mode: cargo build --release. Depending on the SMT solver you plan to use, make sure the corresponding executable is installed and available in your $PATH: cvc5, z3, ffsol, yices.

What the tool does

avazar is a general modular verifier for ZK circuits. It reads an R1CS constraint system, optionally a pre-computed decomposition structure, and verifies the circuit node by node using an SMT solver according to the selected mode. It reports which nodes are verified, failed, or unknown.

Basic usage

avazar [OPTIONS] <INPUT>

An example:

avazar ./circuit.r1cs --solver civer --equivalence structural --clustering_size 200

Modes

The tool has three operating modes:

  • determinism (default)

    • If no mode flag is provided, the tool runs in determinism mode by default. It checks if the inputs are fixed, the outputs are unique.
  • --check_correctness <correctness.json>

    • Activates correctness-checking mode.
    • Verifies whether the input R1CS file is consistent with the given SMT-LIB (.smt2) formula. The correctness.json file must contain several fields:
      • input_signals and output_signals, two lists of signals which must match the input/output structure of the R1CS file.
      • signals, listing the auxiliar variables that appear in the SMT formula.
      • constraints, specifying the SMT formula to be checked.
  • --check_equivalence <another_file.r1cs>

    • Activates equivalence check mode.
    • Checks equivalence between the input and the given R1CS files. Both files must have the same input/output structure. Otherwise, they cannot be equivalent.

Arguments and options

Required argument

  • INPUT
    • Type: path (String)
    • Default: ./circuit.circom
    • Description: path to the R1CS constraint system to be verified.

Optional flags

  • --input_structure <PATH>

    • Type: path
    • Description: path to a pre-computed decomposition structure JSON (as produced by the clustering tool). If not given, avazar clusters the circuit internally using --clustering_size.
  • --original_structure <PATH>

    • Type: path
    • Description: original structural metadata of the circuit (E.g., in circom, template-level component info). Used to produce more meaningful error messages. Optional.
  • --timeout <MILLISECONDS>

    • Type: u64
    • Default: 5000
    • Description: timeout in milliseconds for each call to the underlying solver.
  • --solver <SOLVER>

    • Type: string
    • Default: civer
    • Accepted values: civer, ffsol, cvc5, z3, niaz3, yices.
    • Description:
  • --equivalence <MODE>

    • Type: string
    • Default: structural
    • Accepted values: none, local, structural
    • See Equivalence modes below.
  • -p, --prime <N>

    • Type: BigInt
    • Default: 21888242871839275222246405745257275088548364400416034343698204186575808495617 (BN254 scalar field prime)
    • Description: prime field modulus used during verification.
  • -c, --clustering_size <N>

    • Type: usize
    • Default: 200
    • Description: maximum size of nodes considered for internal re-clustering. Nodes larger than this threshold are split further. Set to 0 to disable internal clustering entirely.
  • -t, --target_size <N>

    • Type: usize
    • Default: 0
    • Description: target size for nodes in the internal clustering. Set to 0 to disable (default). Takes effect only when --input_structure is not provided.

Equivalence modes

avazar avoids redundant solver calls by exploiting equivalence between DAG nodes:

  • none: every node is sent to the solver independently, regardless of similarity.

  • local: nodes grouped by local equivalence (same local fingerprint/signature) share a result. If one representative in the class is verified, the others are assumed verified too.

  • structural: like local, but uses stricter structural equivalence that also accounts for the node's context in the DAG. Two nodes must agree in both content and structural position to be grouped. Consequently, two nodes can be local equivalent, but no structural equivalent.

The equivalence classes come either from the optional --input_structure file or are computed internally if that file is not provided.