What is Erasure Coding

By Bill Jones, Enterprise Architect

One of the newest features to arrive in large-scale storage is erasure coding. What catches people off-guard is that erasure coding protects against data loss. “Erasure” anything as “data protection” makes for an unexpected combination of words.

So, I’d like to take a few minutes to tell you about how we got here and why.

Why Isn’t Parity Enough?

Storage and server admins are often familiar with the word “parity.” RAID arrays can use parity to protect against a drive failure (or two). As an admin, I had known those details for a few years before I learned what parity is. Parity is a mathematical calculation that compares bits. With odd parity, the total number of ones must be odd; with even parity, the total number of ones must be even.

So, as an example, let’s use even parity to protect an 8-bit piece of information – 01101001. When we combine the data and the parity bit, we want an even number of 1s. The data contains four 1s. Four is an even number. Therefore the parity bit is 0.

If someone were to erase one of the bits from the sequence 01101001 and if we had access to the parity bit value 0, we can calculate the value of the missing bit. If the bit that is erased is a 1, then there will be an odd number of ones. The parity bit of 0 tells us that the number of 1s should be even. So, the missing bit is a 1. If the number of remaining bits is even, then the missing bit must be 0. If the parity bit is missing, we just recalculate it.

In storage systems, we calculate parity across “blocks.” These blocks are arranged in a stripe. Each block will reside on a different physical disk. The size of the block will vary from manufacturer to manufacturer. But, the example above of 8 data bits and 1 parity bit is analogous to a RAID5 (8+1) array. The 8+1 indicates that each stripe contains 8 data blocks and 1 parity block. The parity calculation is done bit-by-bit across the stripe. By that, I mean that the first bit of each data block in the stripe is used to calculate the first bit in the parity block; the second bit in each data block is used to calculate the second bit in the parity block; and so on, and so on, until all the bits in the block are protected.

One of the really cool things about this method is that the even parity calculation is a bitwise exclusive OR (a.k.a. XOR) operation, and bitwise operations like AND, OR, and XOR are pretty fast, and that helps to keep rebuild times relatively low.

There are two problems with this method. First, a single parity block per stripe can only protect against failure of a single drive. Using our example above of 0110110 with parity of 0, if we lose two data bits and have an even number of ones remaining, we cannot be certain whether we lost two 1s or two 0s. If we have an odd number of 1s, did we lose a 0 and a 1 or a 1 and a 0? Second, as drives get larger and larger, it takes longer and longer to rebuild data when a drive fails. And, if we haven’t finished rebuilding before a second drive fails, we have the same problem we would have from losing two drives at once.

To address this, storage devices starting using dual-drive protection methods. The most commonly known is RAID6; however, there are multiple methods available, and the exact method will vary based on manufacturer. For example, if the first parity block is calculated using a traditional horizontal parity calculation, and the second parity block is calculated using a diagonal pattern (within the stripe), this gives us two independent parity calculations, and we can now recover from a dual drive failure. With dual-drive protection, we can reduce the risk of a second drive failure occurring while the first drive is rebuilding.

Unfortunately, modern mechanical hard drives are large enough that rebuilds can take multiple days, and dual drive protection methods cannot protect against a triple drive failure. (And I am one of those admins who has experienced a triple drive failure on a dual drive protected array.)

Okay, So Why Not Use a Third Parity Block?

The problem with coming up with new patterns for protecting parity is… that it gets complicated. For the calculations to work, we must ensure that each parity calculation is mathematically independent of the others. This is difficult enough when doing it with two parity patterns. But, with each additional parity pattern, careful analysis must be done to ensure that this independence exists and that it exists for every supported RAID group geometry. For example, a diagonal pattern is relatively easy to envision with 2ⁿ data drives within each RAID group, but it is less easy to visualize when there are an odd number of drives. Add in the complexity of multiple, non-horizontal parity calculations, and it quickly becomes a solution set that makes people’s heads swim.

If data protection calculations are complicated with 2 parity blocks and really complicated with 3 parity blocks, what happens when hard drives get large enough that we need 4 or 5 parity blocks? Erasure coding provides a more scalable method for protecting against larger and larger numbers of drive failures.

Wait, What?

I know this is a lot to take in. I was a mathlete in school. I have two patents in video encryption technology. I do math for fun. And it still took a lot of work for me to wrap my head around all of this. So, if you’re feeling lost, the brief summary of everything so far is…

Parity is a mathematical calculation.
Parity uses XOR, which is fast!
Parity can protect against a single drive failure with a modest amount of complexity.
Parity can protect against a double drive failure with a higher degree of complexity.
Using parity to protect against a triple drive failure is really complicated. Let’s not do that.

Please hang in there. We’re almost finished.

So, Erasure Coding Does What?

Erasure coding is a mathematical calculation for protecting data. It uses a coding method to protect data from unwanted “erasures” – like hard drive failures. And that’s where it gets its name.

Erasure coding works like this:

Divide the data into m segments
Use a system of independent equations to calculate n data protection segments

It’s two steps. That’s pretty simple. If we want to protect against the “erasure” of 3 pieces of data, we need 3 independent equations. If we want to protect against 4, we need 4 equations. If we want to have better storage efficiency, we use a higher ratio of data segments to data protection segments. Erasure coding with a 9+3 algorithm data protection consumes 25% of the raw capacity. With an erasure coding algorithm of 12+3, the data protection only consumes 20%.

What makes erasure coding so powerful is that it gives us a ready method for scaling our data protection levels as hard drives continue to grow.

What Is a System of Independent Equations?

In case you’re wondering, a system of equations is a group of equations which share the same variables. The equations are independent each equation cannot be created from the other equations in the solution set. For example, these two equations are not independent:

X + Y = 10

2X + 2Y = 20

The second equation can be created by multiplying each side of the first equation by two. Every value of X and Y that solves the first equation will also solve the second equation.

These two equations are independent:

X + 2Y = 9

2X + Y = 0

Because they are independent, there cannot be more than one set of values for X and Y that solves both equations. In this case, the only valid solution is (X = -3) and (Y = 6).

Is There Anything Else I Should Know?

Yes. Depending on which solution you go with, the erasure coding may occur on a per file (or per object) basis or may occur as part of the RAID implementation. A given workload or data type may be better suited to one of these methods than it is to another method. But, that topic is big enough that it probably deserves to be in its own blog post.

In Conclusion…

To sum everything up (no pun intended), erasure coding and parity are both the names of mathematical algorithms which are designed to recover data that is erased or lost. As hard drives get bigger, the rebuild times have grown from hours to days, and bigger hard drives are on the horizon. Parity-based solutions are relatively fast, but scaling has proven difficult. Erasure coding gives us a ready model to scale beyond double-drive failures.

If you would like to learn more about enterprise storage solutions, SAN storage, object storage, etc., please contact [email protected] to setup an appointment.

Postscript: Please Show Me the Math

As I said above, I do math for fun. So, in the hopes that somebody wants to see the math, here you go…

Let’s look at Erasure Coding 9+3 – that is 9 data segments and 3 data protection segments. To make the math easier to follow, we’ll use integers instead of 4KB data blocks. Integers are a lot easier to type, and the principles are largely the same. Also, we’ll solve these equations using algebraic substitutions. Computers would probably use multi-dimensional matrix-based math, and while matrices can make computation faster, they make explaining what is going on much more difficult.

For our 9 data values, we’ll use

A=1; B=3; C=5; D=7; E=2; F=4; G=6; H=8; & I=0

For our system of n independent equations, we’ll use:

P₁ = A+B+C+D+E+F+G+H+I

P₂ = A+2B+3C+4D+5E+6F+7G+8H+9I

P₃ = A+4B+9C+16D+25E+36F+49G+64H+81I

So, for our data:

P₁ = 36

P₂ = 190

P₃ = 1170

Now, what happens when the data for A, B, and C is “erased”? If we recalculate P_1,P_2,& P_3,we get:

P₁ = A+B+C+27

P₂ = A+2B+3C+168

P₃ = A+3B+9C+1112

Substituting our previous calculated values for P_1,P_2,& P_3,we get:

36 = A+B+C+27

190 = A+2B+3C+168

1170 = A+4B+9C+1112

Subtracting the integers from each side, we get:

9 = A+B+C

22 = A+2B+3C

58 = A+4B+9C

If we subtract (9=A+B+C) from the bottom two equations, we get:

13 = B+2C

49 = 3B+8C

If we multiply (13=B+2C) by 3 and then subtract it from the bottom equation, we get:

10 = 2C

If we then divide both sides by 2, we get:

C = 5

Having solved for C, we can then substitute in 5 for C and get:

9 = A+B+5

22 = A+2B+15

Subtracting the integers from each side, we get:

4 = A+B

7 = A+2B

If we subtract (4=A+B) from the bottom equations, we get:

3 = B

Having solved for B, we can then substitute in 3 for B and get:

4 = A+3

Subtracting the integers from each side, we get:

1 = A

The math would be different depending on which pieces of data were lost. The point of the exercise above is to show that we can recover from the “erasure” of three pieces of data using an erasure coding of 9+3. The equations we use to calculate the data protection values do not need to be the ones I supplied here. They can be any set of mathematically independent equations.