Poseidon Hash Execution Flow and Code Analysis

ZKM

Jan 16, 2025

1. Sponge Structure
2. Round Function (Constant Addition, Nonlinear Transformation, Linear Transformation)
- 2.1 Full Rounds Transformation
- 2.2 Partial Rounds Transformation
- 2.3 Permutation Processy
3. Poseidon's Absorption, Permutation, Compression
4. Goldilocks Field & Optimization of Poseidon in Plonky2
- 4.1 MDS Operation Process of Poseidon
- 4. Operation Process of Poseidon on the Goldilocks Field

1. Sponge Construction

Poseidon hash function uses the following sponge structure for its design, where \( P \) is the permutation function of each round of Poseidon, and \( I \) is the initial state. Assume the initial state \( I \) is a zero vector, and a message \( m = m_1 || m_2 || m_3 || m_4 \) is given, with the output being \( h = h_1 || h_2 \). Each \( m_i \) and \( h_i \) are of length \( r \). The computation follows this pattern:

\( I \) is first set to \( m_1 \) and passed through the permutation \( P \), producing the first round output of length \( r \), which is then XORed with \( m_2 \) to obtain the next round result.
This continues until \( m_4 \) is processed, at which point the first \( r \) bits of the result, \( h_1 \), are output. \( h_1 \) is then passed through one more permutation to obtain \( h_2 \).
The final Poseidon hash output is \( h = h_1 || h_2 \).

2. Round Functions (Add Constant, Nonlinear Transformation, Linear Transformation)

The following diagram illustrates the basic process of the permutation function in Poseidon hash, which includes \( R_f \) full rounds and \( R_p \) partial rounds. In the \( R_f \) rounds, all states undergo the S-box transformation, while in the \( R_p \) rounds, only one S-box transformation is applied. The terminology used in the diagram is as follows:

AddRoundConstants, denoted by \( ARC(\cdot) \): Linear layer, where a predefined constant is added to each codeword.
SubWords, denoted by \( S\text{-}box(\cdot) \) or \( SB(\cdot) \): Nonlinear layer, where each codeword is raised to a power, typically \( X^3 \) or \( X^5 \), with \( X \) being the codeword. Plonky2 uses \( X^7 \).
MixLayer, denoted by \( M(\cdot) \): Linear layer, where multiplication with a predefined matrix is performed.

2.1 Full Rounds Transformation

#[inline]
    fn full_rounds(state: &mut [Self; SPONGE_WIDTH], round_ctr: &mut usize) {
        for _ in 0..HALF_N_FULL_ROUNDS { // Apply constant layer, S-box, and MDS
            Self::constant_layer(state, *round_ctr);  // Add constant, linear layer
            Self::sbox_layer(state); // X^3 or X^5, nonlinear layer
            *state = Self::mds_layer(state); // Matrix multiplication, linear layer
            *round_ctr += 1;
        }
    }
    // plonky2/src/hash/poseidon.rs

The code above shows the transformation corresponding to the full rounds of Poseidon hash.

2.2 Partial Rounds Transformation

fn partial_rounds(state: &mut [Self; SPONGE_WIDTH], round_ctr: &mut usize) {
        Self::partial_first_constant_layer(state);
        *state = Self::mds_partial_layer_init(state);

        for i in 0..N_PARTIAL_ROUNDS {
            state[0] = Self::sbox_monomial(state[0]);
            unsafe {
 state[0] = state[0].add_canonical_u64(Self::FAST_PARTIAL_ROUND_CONSTANTS[i]);
            }
            *state = Self::mds_partial_layer_fast(state, i);
        }
        *round_ctr += N_PARTIAL_ROUNDS;
    }
// plonky2/src/hash/poseidon.rs

The code above shows the transformation corresponding to the partial rounds of Poseidon hash.

2.3 Full Permutation Process

#[inline] // Full process of Poseidon hash permutation
    fn poseidon(input: [Self; SPONGE_WIDTH]) -> [Self; SPONGE_WIDTH] {
        let mut state = input;
        let mut round_ctr = 0;

        Self::full_rounds(&mut state, &mut round_ctr);
        Self::partial_rounds(&mut state, &mut round_ctr);
        Self::full_rounds(&mut state, &mut round_ctr);
        debug_assert_eq!(round_ctr, N_ROUNDS);

        state
    }
// plonky2/src/hash/poseidon.rs

The code above shows the complete process of one round of permutation in Poseidon hash.

3. Poseidon’s Absorption, Permutation, and Compression

The complete structure of Poseidon Hash is as follows:

Absorption Phase: The hash_n_to_hash_no_pad function is responsible for reading input data, XORing it with the internal state, and then executing the permutation.

fn hash_no_pad(input: &[F]) -> Self::Hash {
        hash_n_to_hash_no_pad::<F, Self::Permutation>(input)
    }
// plonky2/src/hash/poseidon.rs

Permutation Phase: PoseidonPermutation corresponds to P, implementing each round of the permutation, covering both full rounds and partial rounds.

pub struct PoseidonPermutation<T> {
    state: [T; SPONGE_WIDTH],
}
// plonky2/src/hash/poseidon.rs

Compression Phase: The compression phase extracts the hash result from the internal state.

/// A one-way compression function which takes two ~256 bit inputs and returns a ~256 bit output.pub fn compress<F: Field, P: PlonkyPermutation<F>>(x: HashOut<F>, y: HashOut<F>) -> HashOut<F> {
    // With some refactoring, this function could be implemented as
    // hash_n_to_m_no_pad(chain(x.elements, y.elements), NUM_HASH_OUT_ELTS).

    debug_assert_eq!(x.elements.len(), NUM_HASH_OUT_ELTS);
    debug_assert_eq!(y.elements.len(), NUM_HASH_OUT_ELTS);
    debug_assert!(P::RATE >= NUM_HASH_OUT_ELTS);

    let mut perm = P::new(core::iter::repeat(F::ZERO));
    perm.set_from_slice(&x.elements, 0);
    perm.set_from_slice(&y.elements, NUM_HASH_OUT_ELTS);

    perm.permute();

    HashOut {
        elements: perm.squeeze()[..NUM_HASH_OUT_ELTS].try_into().unwrap(),
    }
}
// plonky2/src/hash/hashing.rs

‍

pub fn hash_n_to_hash_no_pad<F: RichField, P: PlonkyPermutation<F>>(inputs: &[F]) -> HashOut<F> {
    HashOut::from_vec(hash_n_to_m_no_pad::<F, P>(inputs, NUM_HASH_OUT_ELTS))
}
// plonky2/src/hash/hashing.rs

The compress and hash_n_to_hash_no_pad functions implement the specific absorption and compression phases. hash_n_to_hash_no_pad calls PoseidonPermutation to permute the internal state and compress the data at the appropriate points.

Special case: The two_to_one function implements a Merkle tree-like hash, merging two hash values (i.e., two leaf nodes) to generate a new hash value, which is mainly used for constructing Merkle Hash calls in FRI using Poseidon Hash.

fn two_to_one(left: Self::Hash, right: Self::Hash) -> Self::Hash {
        compress::<F, Self::Permutation>(left, right)
    }
    // plonky2/src/hash/poseidon.rs

4. Goldilocks Field & Optimizations of Poseidon in Plonky2

In Plonky2, the Goldilocks field is typically chosen as the finite field \( \mathbb{F}_p \), where \( p = 2^{64} - 2^{32} + 1 \). This field allows any witness/codeword to be represented as a 64-bit number. It has the following key advantages:

Modern processors handle 64-bit numbers very efficiently, as they align perfectly with traditional bit string operations for arithmetic computations, such as multiplying or dividing by 2, which corresponds to bit shifting.
When implementing arithmetic circuits, since all operations in the Goldilocks field are arithmetic, there is no need for simulation. Compared to traditional binary-based SHA256 hash functions, this type of hash function has smaller circuit sizes and fewer constraints.

In Plonky2, the matrix used in the MDS layer has the first row as ([1; 1; 2; 1; 8; 32; 2; 256; 4096; 8; 65536; 1024]), where each element is a power of 2. This allows for the equivalence between arithmetic operations like multiplication and bit shifts, significantly improving computational efficiency. Moreover, the elements of the matrix are relatively small.

4.1 Poseidon’s MDS Operation

fn mds_layer(state: &[Self; 12]) -> [Self; 12] {
    let mut result = [GoldilocksField::ZERO; 12];

    // Using the linearity of the operations we can split the state into a low||high decomposition
    // and operate on each with no overflow and then combine/reduce the result to a field element.
    let mut state_l = [0u64; 12];
    let mut state_h = [0u64; 12];

    for r in 0..12 {
        let s = state[r].0;
        state_h[r] = s >> 32;
        state_l[r] = (s as u32) as u64;
    }

    // Use optimized FFT for calculations
    let state_h = poseidon12_mds::mds_multiply_freq(state_h);
    let state_l = poseidon12_mds::mds_multiply_freq(state_l);

    for r in 0..12 {
        let s = state_l[r] as u128 + ((state_h[r] as u128) << 32);
      // from_noncanonical_u96 converts the 96-bit input (represented as a 64-bit low part and a 32-bit high part)
        // into an element of the Goldilocks field.
        result[r] = GoldilocksField::from_noncanonical_u96((s as u64, (s >> 64) as u32));
    }

    // Add the first element with the only non-zero diagonal matrix coefficient.
    let s = Self::MDS_MATRIX_DIAG[0] as u128 * (state[0].0 as u128);
    result[0] += GoldilocksField::from_noncanonical_u96((s as u64, (s >> 64) as u32));

    result
}
// plonky2/src/hash/poseidon_goldilocks.rs

4.2 Poseidon’s Operation in the Goldilocks Field

1. Initializing the Result Array

let mut result = [GoldilocksField::ZERO; 12];

Here, a result array of length 12 is initialized to store the final MDS transformation results. The initial value is set to GoldilocksField::ZERO.

2. Separating the High and Low Parts of the State Vector

let mut state_l = [0u64; 12];
let mut state_h = [0u64; 12];

for r in 0..12 {
    let s = state[r].0;
    state_h[r] = s >> 32;
    state_l[r] = (s as u32) as u64;
}

‍

The state_l and state_h arrays store the low and high parts of each element of the state vector. • Through bit manipulation, state_h holds the top 32 bits, while state_l holds the lower 32 bits.

3. Accelerating the MDS Algorithm with FFT

let state_h = poseidon12_mds::mds_multiply_freq(state_h);
let state_l = poseidon12_mds::mds_multiply_freq(state_l);

‍

The function poseidon12_mds::mds_multiply_freq applies FFT-based MDS matrix multiplication to both state_h and state_l, accelerating the matrix multiplication operation.

4. Merging the High and Low Parts of the Results

for r in 0..12 {
    let s = state_l[r] as u128 + ((state_h[r] as u128) << 32);
    result[r] = GoldilocksField::from_noncanonical_u96((s as u64, (s >> 64) as u32));
}

‍

The high and low parts are recombined into a 96-bit integer. After left-shifting state_h by 32 bits and adding state_l, the full 96-bit number is obtained. • The function GoldilocksField::from_noncanonical_u96 converts this 96-bit integer into an element of the Goldilocks field and stores it in the result array.

5. Adding the Non-Zero Diagonal Element

let s = Self::MDS_MATRIX_DIAG[0] as u128 * (state[0].0 as u128);
result[0] += GoldilocksField::from_noncanonical_u96((s as u64, (s >> 64) as u32))

‍

Finally, to complete the MDS transformation, the function multiplies MDS_MATRIX_DIAG[0] by state[0] and adds the result to result[0].

Subscribe to the ZKM Blog to stay updated with the latest research by ZKM. If you have any questions, you can contact the ZKM team on discord: discord.com/zkm

The DEEP-FRI Protocol

The FRI protocol is an interactive proof system designed to prove the low degree (degree constraint) of a polynomial. The Prover commits to a polynomial form, and the Verifier ensures that the polynomial is of low degree by sampling several points from it, performing multiple random linear combinations, and polynomial folding. This interactive proof system is particularly important in zero-knowledge proofs, as it allows the verification of the polynomial’s properties without revealing the polynomial itself.

ZKM ECP Contributor Board — November Wrap-up

November began with an exciting development at ZKM — we launched Cohort 2 of our Early Contributor Program (ECP): Community Evolution

1. Sponge Structure
2. Round Function (Constant Addition, Nonlinear Transformation, Linear Transformation)
- 2.1 Full Rounds Transformation
- 2.2 Partial Rounds Transformation
- 2.3 Permutation Processy
3. Poseidon's Absorption, Permutation, Compression
4. Goldilocks Field & Optimization of Poseidon in Plonky2
- 4.1 MDS Operation Process of Poseidon
- 4. Operation Process of Poseidon on the Goldilocks Field

1. Sponge Construction

\( I \) is first set to \( m_1 \) and passed through the permutation \( P \), producing the first round output of length \( r \), which is then XORed with \( m_2 \) to obtain the next round result.
This continues until \( m_4 \) is processed, at which point the first \( r \) bits of the result, \( h_1 \), are output. \( h_1 \) is then passed through one more permutation to obtain \( h_2 \).
The final Poseidon hash output is \( h = h_1 || h_2 \).

2. Round Functions (Add Constant, Nonlinear Transformation, Linear Transformation)

AddRoundConstants, denoted by \( ARC(\cdot) \): Linear layer, where a predefined constant is added to each codeword.
SubWords, denoted by \( S\text{-}box(\cdot) \) or \( SB(\cdot) \): Nonlinear layer, where each codeword is raised to a power, typically \( X^3 \) or \( X^5 \), with \( X \) being the codeword. Plonky2 uses \( X^7 \).
MixLayer, denoted by \( M(\cdot) \): Linear layer, where multiplication with a predefined matrix is performed.

2.1 Full Rounds Transformation

#[inline]
    fn full_rounds(state: &mut [Self; SPONGE_WIDTH], round_ctr: &mut usize) {
        for _ in 0..HALF_N_FULL_ROUNDS { // Apply constant layer, S-box, and MDS
            Self::constant_layer(state, *round_ctr);  // Add constant, linear layer
            Self::sbox_layer(state); // X^3 or X^5, nonlinear layer
            *state = Self::mds_layer(state); // Matrix multiplication, linear layer
            *round_ctr += 1;
        }
    }
    // plonky2/src/hash/poseidon.rs

The code above shows the transformation corresponding to the full rounds of Poseidon hash.

2.2 Partial Rounds Transformation

fn partial_rounds(state: &mut [Self; SPONGE_WIDTH], round_ctr: &mut usize) {
        Self::partial_first_constant_layer(state);
        *state = Self::mds_partial_layer_init(state);

        for i in 0..N_PARTIAL_ROUNDS {
            state[0] = Self::sbox_monomial(state[0]);
            unsafe {
 state[0] = state[0].add_canonical_u64(Self::FAST_PARTIAL_ROUND_CONSTANTS[i]);
            }
            *state = Self::mds_partial_layer_fast(state, i);
        }
        *round_ctr += N_PARTIAL_ROUNDS;
    }
// plonky2/src/hash/poseidon.rs

The code above shows the transformation corresponding to the partial rounds of Poseidon hash.

2.3 Full Permutation Process

#[inline] // Full process of Poseidon hash permutation
    fn poseidon(input: [Self; SPONGE_WIDTH]) -> [Self; SPONGE_WIDTH] {
        let mut state = input;
        let mut round_ctr = 0;

        Self::full_rounds(&mut state, &mut round_ctr);
        Self::partial_rounds(&mut state, &mut round_ctr);
        Self::full_rounds(&mut state, &mut round_ctr);
        debug_assert_eq!(round_ctr, N_ROUNDS);

        state
    }
// plonky2/src/hash/poseidon.rs

The code above shows the complete process of one round of permutation in Poseidon hash.

3. Poseidon’s Absorption, Permutation, and Compression

The complete structure of Poseidon Hash is as follows:

Absorption Phase: The hash_n_to_hash_no_pad function is responsible for reading input data, XORing it with the internal state, and then executing the permutation.

fn hash_no_pad(input: &[F]) -> Self::Hash {
        hash_n_to_hash_no_pad::<F, Self::Permutation>(input)
    }
// plonky2/src/hash/poseidon.rs

Permutation Phase: PoseidonPermutation corresponds to P, implementing each round of the permutation, covering both full rounds and partial rounds.

pub struct PoseidonPermutation<T> {
    state: [T; SPONGE_WIDTH],
}
// plonky2/src/hash/poseidon.rs

Compression Phase: The compression phase extracts the hash result from the internal state.

/// A one-way compression function which takes two ~256 bit inputs and returns a ~256 bit output.pub fn compress<F: Field, P: PlonkyPermutation<F>>(x: HashOut<F>, y: HashOut<F>) -> HashOut<F> {
    // With some refactoring, this function could be implemented as
    // hash_n_to_m_no_pad(chain(x.elements, y.elements), NUM_HASH_OUT_ELTS).

    debug_assert_eq!(x.elements.len(), NUM_HASH_OUT_ELTS);
    debug_assert_eq!(y.elements.len(), NUM_HASH_OUT_ELTS);
    debug_assert!(P::RATE >= NUM_HASH_OUT_ELTS);

    let mut perm = P::new(core::iter::repeat(F::ZERO));
    perm.set_from_slice(&x.elements, 0);
    perm.set_from_slice(&y.elements, NUM_HASH_OUT_ELTS);

    perm.permute();

    HashOut {
        elements: perm.squeeze()[..NUM_HASH_OUT_ELTS].try_into().unwrap(),
    }
}
// plonky2/src/hash/hashing.rs

‍

pub fn hash_n_to_hash_no_pad<F: RichField, P: PlonkyPermutation<F>>(inputs: &[F]) -> HashOut<F> {
    HashOut::from_vec(hash_n_to_m_no_pad::<F, P>(inputs, NUM_HASH_OUT_ELTS))
}
// plonky2/src/hash/hashing.rs

fn two_to_one(left: Self::Hash, right: Self::Hash) -> Self::Hash {
        compress::<F, Self::Permutation>(left, right)
    }
    // plonky2/src/hash/poseidon.rs

4. Goldilocks Field & Optimizations of Poseidon in Plonky2

Modern processors handle 64-bit numbers very efficiently, as they align perfectly with traditional bit string operations for arithmetic computations, such as multiplying or dividing by 2, which corresponds to bit shifting.
When implementing arithmetic circuits, since all operations in the Goldilocks field are arithmetic, there is no need for simulation. Compared to traditional binary-based SHA256 hash functions, this type of hash function has smaller circuit sizes and fewer constraints.

4.1 Poseidon’s MDS Operation

fn mds_layer(state: &[Self; 12]) -> [Self; 12] {
    let mut result = [GoldilocksField::ZERO; 12];

    // Using the linearity of the operations we can split the state into a low||high decomposition
    // and operate on each with no overflow and then combine/reduce the result to a field element.
    let mut state_l = [0u64; 12];
    let mut state_h = [0u64; 12];

    for r in 0..12 {
        let s = state[r].0;
        state_h[r] = s >> 32;
        state_l[r] = (s as u32) as u64;
    }

    // Use optimized FFT for calculations
    let state_h = poseidon12_mds::mds_multiply_freq(state_h);
    let state_l = poseidon12_mds::mds_multiply_freq(state_l);

    for r in 0..12 {
        let s = state_l[r] as u128 + ((state_h[r] as u128) << 32);
      // from_noncanonical_u96 converts the 96-bit input (represented as a 64-bit low part and a 32-bit high part)
        // into an element of the Goldilocks field.
        result[r] = GoldilocksField::from_noncanonical_u96((s as u64, (s >> 64) as u32));
    }

    // Add the first element with the only non-zero diagonal matrix coefficient.
    let s = Self::MDS_MATRIX_DIAG[0] as u128 * (state[0].0 as u128);
    result[0] += GoldilocksField::from_noncanonical_u96((s as u64, (s >> 64) as u32));

    result
}
// plonky2/src/hash/poseidon_goldilocks.rs

4.2 Poseidon’s Operation in the Goldilocks Field

1. Initializing the Result Array

let mut result = [GoldilocksField::ZERO; 12];

Here, a result array of length 12 is initialized to store the final MDS transformation results. The initial value is set to GoldilocksField::ZERO.

2. Separating the High and Low Parts of the State Vector

let mut state_l = [0u64; 12];
let mut state_h = [0u64; 12];

for r in 0..12 {
    let s = state[r].0;
    state_h[r] = s >> 32;
    state_l[r] = (s as u32) as u64;
}

‍

3. Accelerating the MDS Algorithm with FFT

let state_h = poseidon12_mds::mds_multiply_freq(state_h);
let state_l = poseidon12_mds::mds_multiply_freq(state_l);

‍

The function poseidon12_mds::mds_multiply_freq applies FFT-based MDS matrix multiplication to both state_h and state_l, accelerating the matrix multiplication operation.

4. Merging the High and Low Parts of the Results

for r in 0..12 {
    let s = state_l[r] as u128 + ((state_h[r] as u128) << 32);
    result[r] = GoldilocksField::from_noncanonical_u96((s as u64, (s >> 64) as u32));
}

‍

5. Adding the Non-Zero Diagonal Element

let s = Self::MDS_MATRIX_DIAG[0] as u128 * (state[0].0 as u128);
result[0] += GoldilocksField::from_noncanonical_u96((s as u64, (s >> 64) as u32))

‍

Finally, to complete the MDS transformation, the function multiplies MDS_MATRIX_DIAG[0] by state[0] and adds the result to result[0].

Subscribe to the ZKM Blog to stay updated with the latest research by ZKM. If you have any questions, you can contact the ZKM team on discord: discord.com/zkm

Contents

1. Sponge Construction

2. Round Functions (Add Constant, Nonlinear Transformation, Linear Transformation)

2.1 Full Rounds Transformation

2.2 Partial Rounds Transformation

2.3 Full Permutation Process

3. Poseidon’s Absorption, Permutation, and Compression

4. Goldilocks Field & Optimizations of Poseidon in Plonky2

4.1 Poseidon’s MDS Operation

4.2 Poseidon’s Operation in the Goldilocks Field

3. Accelerating the MDS Algorithm with FFT

4. Merging the High and Low Parts of the Results

5. Adding the Non-Zero Diagonal Element

H1

h2

h3

H4

H5

What’s a Rich Text element?

Static and dynamic content editing

How to customize formatting for each rich text

Headings, paragraphs,

Contents

1. Sponge Construction

2. Round Functions (Add Constant, Nonlinear Transformation, Linear Transformation)

2.1 Full Rounds Transformation

2.2 Partial Rounds Transformation

2.3 Full Permutation Process

3. Poseidon’s Absorption, Permutation, and Compression

4. Goldilocks Field & Optimizations of Poseidon in Plonky2

4.1 Poseidon’s MDS Operation

4.2 Poseidon’s Operation in the Goldilocks Field

3. Accelerating the MDS Algorithm with FFT

4. Merging the High and Low Parts of the Results

5. Adding the Non-Zero Diagonal Element