Erasure coding metrics

This document presents a reliability and a redundancy metric that can help tune the erasure coding mechanism to ensure reliability, efficiency and minimize reconstruction failures in the system.

Parameters

Name Type Description
honestPercent uint Fraction of clusters assumed to be honest
totChunks uint Total number of chunks generated (by erasure coding) from the underlying message
reqChunks uint Number of chunks required to reconstruct the underlying message
totClusters uint Total number of clusters in the system

Reliability metric

The reliability metric captures the (inverse of) the failure rate in the erasure coding system defined by the above parameters.

Combinatorics

Let nCr(n, r) denote the combinatorics operator. Assuming honestPercent of the total number of clusters totClusters are honest for propagating reqChunks, we calculate the probability of reconstruction of reqChunks out of totChunks erasure coded system. This is equivalent to selecting totChunks number of clusters from totClusters such that atleast reqChunks of clusters are honest.

Define

  • P(x) : Probability of exactly x clusters out of the selected reqChunks clusters being honest when picked from the total cluster pool of size totClusters
  • P(x+) : Probability of atleast x clusters out of the selected reqChunks clusters being honest when picked from the total cluster pool of size totClusters
  • w : Minimum honest clusters in the total cluster pool which is given by totClusters*honestPercent
  • b : Maximum dishonest clusters in the total cluster pool given by totClusters - w

P(reqChunks+) = P(reqChunks) + P(reqChunks + 1) + ..... + P(totChunks)

P(x) = (Ways of x honest clusters and totChunks - x dishonest chunks)/(Ways of picking totChunks clusters from set) = (nCr(w+x-1, x)*nCr(b+(totChunks-x)-1, totChunks-x))/(nCr(totClusters+totChunks-1, totChunks))

P(reqChunks+) =

Python reference code

from decimal import *
def fac(n):
    pdt = 1
    # iteration to avoid reaching max recursion depth
    for k in range(1, n+1):
        pdt = pdt * k
    return pdt
def choose(n, k):
    return fac(n) / fac(k) / fac(n-k)
def probOne(w, n, k, t):
    b = n - w
    return Decimal(choose(w+k-1, k)*choose(b+t-k-1, t-k))/choose(n+t-1, t)
def probAll(w, n, t, ht):
    return sum([probOne(w, n, i, t) for i  in range(ht, t+1)])

reliability = probAll(w, totClusters, totChunks, reqChunks)
print reliability

From experiements with the above script, the redundency ratio as defined below comes out to be around 2 if the honestPercentage is 66%.

Redundancy metric

The redundancy metric captures the bandwidth multiple that nodes spend in aggregate in the erasure coding system defined by the above parameters.

This is simply given by totChunks/reqChunks assuming that in the erasure coding system, the total required size of chunks is pretty close to the original message size.