Merkle Trees: Theory and Intuition

What is a Merkle Tree

A Merkle tree is a hash-based data structure that allows efficient verification of large sets of data. Instead of verifying every individual piece of data, a Merkle tree enables verification by comparing a small number of hashes.

At its core, a Merkle tree is a binary tree where:

  • Leaf nodes contain hashes of data blocks
  • Internal nodes contain hashes of the concatenation of their child hashes
  • The root hash, called the Merkle root, represents the entire dataset

If any underlying data changes, the Merkle root also changes. This property is what makes Merkle trees useful for data integrity and verification.

Why Merkle Trees Exist

Merkle trees are designed to solve three practical problems:

  1. Efficient data integrity verification
  2. Minimal data transfer during verification
  3. Tamper detection in distributed systems

Instead of re-sending or re-hashing all data, systems can exchange only the hashes required to prove correctness.

Where Merkle Trees Are Used

Merkle trees appear in many production systems, including:

  • Blockchains like Bitcoin and Ethereum
  • Distributed file systems
  • Version control systems
  • Peer-to-peer networks
  • Database replication and synchronization

Their design favors verification efficiency over storage simplicity.

How a Merkle Tree Is Built

Step 1: Hash the Data Blocks

Each data block is hashed individually. These hashes form the leaf nodes.

H1 = hash(data1)
H2 = hash(data2)
H3 = hash(data3)
H4 = hash(data4)
use sha2::{Digest, Sha256};

pub trait Hasher {
    fn hash(&self, data: &[u8]) -> Vec<u8>;
    fn hash_pair(&self, left: &[u8], right: &[u8]) -> Vec<u8>;
}

#[derive(Clone, Debug)]
pub struct Sha256Hasher;

impl Hasher for Sha256Hasher {
    fn hash(&self, data: &[u8]) -> Vec<u8> {
        let mut hasher = Sha256::new();
        hasher.update(data);
        hasher.finalize().to_vec()
    }

    fn hash_pair(&self, left: &[u8], right: &[u8]) -> Vec<u8> {
        let mut hasher = Sha256::new();
        hasher.update(left);
        hasher.update(right);
        hasher.finalize().to_vec()
    }
}

Step 2: Build Parent Nodes

Adjacent hashes are concatenated and hashed again to form parent nodes.

P1 = hash(H1 || H2)
P2 = hash(H3 || H4)

Step 3: Compute the Merkle Root

The process repeats until a single hash remains.

Root = hash(P1 || P2)

This root hash uniquely represents the entire dataset.

#[derive(Debug, Clone)]
pub struct MerkleTree<H: Hasher> {
    hasher: H,
    leaves: Vec<Vec<u8>>,
    layers: Vec<Vec<Vec<u8>>>,
}

impl<H: Hasher> MerkleTree<H> {
    pub fn new(hasher: H, data_blocks: &[Vec<u8>]) -> Self {
        let leaves = data_blocks
            .iter()
            .map(|data| hasher.hash(data))
            .collect::<Vec<_>>();

        let layers = Self::build_layers(&hasher, &leaves);

        Self {
            hasher,
            leaves,
            layers,
        }
    }

    fn build_layers(hasher: &H, leaves: &[Vec<u8>]) -> Vec<Vec<Vec<u8>>> {
        let mut layers = Vec::new();
        layers.push(leaves.to_vec());

        let mut current_layer = leaves.to_vec();

        while current_layer.len() > 1 {
            let mut next_layer = Vec::new();

            let mut i = 0;
            while i < current_layer.len() {
                let left = &current_layer[i];
                let right = if i + 1 < current_layer.len() {
                    &current_layer[i + 1]
                } else {
                    &current_layer[i]
                };

                let parent = hasher.hash_pair(left, right);
                next_layer.push(parent);

                i += 2;
            }

            layers.push(next_layer.clone());
            current_layer = next_layer;
        }

        layers
    }

    pub fn root(&self) -> Option<&[u8]> {
        self.layers
            .last()
            .and_then(|layer| layer.first())
            .map(|v| v.as_slice())
    }

    pub fn leaf_count(&self) -> usize {
        self.leaves.len()
    }

    pub fn height(&self) -> usize {
        self.layers.len()
    }
}

Merkle Tree Diagram

Below is a conceptual diagram of a Merkle tree with four data blocks.

                Merkle Root
              hash(P1 || P2)
                     |
          -----------------------
          |                     |
        P1 hash              P2 hash
     hash(H1||H2)        hash(H3||H4)
          |                     |
     -----------           -----------
     |         |           |         |
    H1        H2          H3         H4
 hash(d1)  hash(d2)   hash(d3)   hash(d4)

Odd Number of Leaves

If the number of data blocks is odd, the last hash is typically duplicated before hashing upward. This behavior depends on the implementation and protocol.

This choice affects:

  • Tree shape
  • Proof construction
  • Cross-system compatibility
let right = if i + 1 < current_layer.len() {
  &current_layer[i + 1]
} else {
  &current_layer[i]
};

Merkle Proofs

A Merkle proof allows you to verify that a piece of data exists in the tree without having the full dataset.

A proof consists of:

  • The target leaf hash
  • A list of sibling hashes along the path to the root

By recomputing hashes step by step, the verifier can compare the result with the known Merkle root.

This is not a full Merkle inclusion proof. It assumes the verifier already has the tree and its internal layers.

impl<H: Hasher> MerkleTree<H> {
    /// Checks if a piece of data matches a leaf in the tree
    pub fn verify_leaf_content(&self, data: &[u8], index: usize) -> bool {
        if index >= self.leaves.len() {
            return false;
        }
        let hashed_data = self.hasher.hash(data);
        self.leaves[index] == hashed_data
    }
}

Security Properties

Merkle trees rely on cryptographic hash functions. Their guarantees depend on:

  • Collision resistance
  • Preimage resistance
  • Second-preimage resistance

If the hash function is weak, the Merkle tree inherits those weaknesses. The tree structure itself does not add cryptographic strength.

Some systems distinguish leaf and internal hashes using prefixes

Example Usage

fn main() {
    let data = vec![
        b"tx1".to_vec(),
        b"tx2".to_vec(),
        b"tx3".to_vec(),
        b"tx4".to_vec(),
        b"tx5".to_vec(),
    ];
    
    let tree = MerkleTree::new(Sha256Hasher, &data);
    
    if let Some(root) = tree.root() {
        println!("Merkle root: {:x?}", root);
    }
    
    // Case 1: Correct data and correct index
    let is_valid = tree.verify_leaf_content(b"tx1", 0);
    println!("Is tx1 at index 0? {}", is_valid); // Should print: true

    // Case 2: Wrong data for that index
    let is_valid_wrong_data = tree.verify_leaf_content(b"tx999", 0);
    println!("Is tx999 at index 0? {}", is_valid_wrong_data); // Should print: false

    // Case 3: Correct data, but wrong index
    let is_valid_wrong_idx = tree.verify_leaf_content(b"tx1", 1);
    println!("Is tx1 at index 1? {}", is_valid_wrong_idx); // Should print: false
}

Rust - Merkle Tree