Saturday, December 29, 2012

OS that supports ZFS

The next step for me is to look for OS that supports ZFS. Here are a list some of OS that supports ZFS at the time of writing this post:
  • Solaris 10
  • Solaris 11
  • OpenSolaris
  • OpenIndiana
  • OS X
  • NetBSD
  • FreeBSD
  • FreeNAS
  • NAS4Free
  • Linux
Please note that different OS at their different release version supports different ZFS version.

Just by looking at the list above, we can immediately identify 2 OS that might be useful to us, FreeNAS and NAS4Free. The striking similarity from their name is because they once share the same code under the name FreeNAS, but now they are being developed separately. Both of them are based on FreeBSD. Shall post more about them soon.

Thursday, December 27, 2012

Data Redundancy - Some available Software RAID

Here are some available software RAID that we can use to build our NAS. I will briefly talk about it. My choice of software RAID is ZFS.

Windows' Storage Spaces
Storage Spaces supports creation of 3 types of RAID, striping (RAID 0), mirror (RAID 1), and parity (RAID 5). It has thin provisioning which allows us to declare a storage space larger than our physical storage, and will notify us when we are out of physical storage, then all we need to do is to slot in additional harddisk into our system and add it to the storage pool.

Linux MD
It supports RAID 0, RAID 1, RAID 4, RAID 5 and RAID 6. All Linux distro, i.e., ubuntu, redhat, all supports linux MD. The advantage here over Windows' Storage Spaces is that it has RAID 6, and Linux in general is more stable than Windows.

unRaid
Basically it is implemented as a custom Linux MD driver. It uses a custom RAID similar to RAID 4 (RAID 4 defers from RAID 5 only that all parity is stored in a drive instead of distributed). Generally it gives drobo like features, but the free version only supports up to 3 harddisk.

ZFS
ZFS is a combined file system and logical volume manager that includes features such as data integrity, continuous integrity checking and automatic repair, RAID 0, RAID 1, RAID-Z (RAID 5), RAID-Z2 (RAID 6), RAID-Z3 (allow up to 3 harddisk failure). It is open source so there are many free OS that supports it. It also completely eliminates the RAID write hole, a problem in low end RAID card that cause data corruption.

Data Redundancy - Software RAID or Hardware RAID

There are basically two type of RAID that can be used to build a NAS, software RAID and hardware RAID. Here are some of the reasons why I prefer software RAID over hardware RAID.
  1. Hardware RAID creates more points of failure since RAID card can fail, and when they do fail, replace the card can be a problem if you want the data to be intact.
  2. Software RAID can maximize storage capacity when using different size disk.
  3. Software RAID only requires harddisk, while hardware RAID requires RAID controller. In other words, even if your motherboard does not have build in RAID controller, software RAID will also work just fine.
  4. Software RAID is hardware independent, changing a piece of hardware, i.e. SATA expansion card, will not affecting the RAID.
  5. Main role of my NAS is archiving, so performance is not really an issue.
Feel free to correct me if I made a mistake.

Wednesday, December 26, 2012

Data Redundancy - Nested RAID

Nested RAID is basically combining different RAID level together. I will briefly give 3 example, RAID 01 (RAID 0 + 1), RAID 10 (RAID 1 + 0) and RAID 50 (RAID 5 + 0).


RAID 01
RAID is basically mirroring two or more stripe volumes together, each stripe can contains 2 or more disk. Below show an example of 8 disk in RAID 10.


RAID 10
RAID 10 is basically striping multiple (2 or more) mirrors together. Mirror can have more than 2 disk. Below show an example of 8 disk in RAID 10.


If you have noticed, RAID 01 and RAID 10 are actually very similar. The above 2 RAID 01 and RAID 10 will give the same amount of usable space for data, but the main difference is fault tolerance. RAID 10 provides better fault tolerance. For example if Disk 1 fails, the probability that RAID 10 fails is 1/7, if disk 2 fails. But for RAID 01, the probability is 4/7, if either disk 5 6 7 8 fails, it will fail entirely.

RAID 50
In RAID 50, we are basically striping multiple RAID 5 volumes together.



Tuesday, December 25, 2012

Data Redundancy - RAID 6 parity

In the previous post I talked about different RAID levels and in this post I will explain about the math about RAID 6. It is a little complicated so please feel free to skip this post. Just remember that RAID 6 allow up to 2 disk to fail while having all data intact.

The math behind RAID 6 is Galois field, GF(2^8) (finite fields of characteristic 2 with 8 terms). By using GF(2^8), we can have up to a total of 257 disk, 255 data disk and 2 parity disk. Here are a summary of things we will be using, it is not an exhaustive properties.
  1. Additive identity is 0.
  2. Addition and subtraction is simply a bitwise XOR operation, thus A + B = A - B. *Note that A + A = A - A = 0.
  3. Multiplicative identity is 1. A * 1 = 1 * A = A
  4. Multiplication is implemented by the the following relations:
    • bit 7 of (A * 2) = bit 6 of (A)
    • bit 6 of (A * 2) = bit 5 of (A)
    • bit 5 of (A * 2) = bit 4 of (A)
    • bit 4 of (A * 2) = bit 3 of (A) + bit 7 of (A)
    • bit 3 of (A * 2) = bit 2 of (A) + bit 7 of (A)
    • bit 2 of (A * 2) = bit 1 of (A) + bit 7 of (A)
    • bit 1 of (A * 2) = bit 0 of (A)
    • bit 0 of (A * 2) = bit 7 of (A)
  5. Basic algebra applies, i.e., A * B = B * A, A * ( B + C) = A* B + A * C.
  6. For all A such that A is not 0, there exist an inverse A (1/A or A^-1) such that A * A^-1 = 1.
  7. A/B = A * 1/B = A * B^-1, for B not equal 0. Thus if A * B = C, then C/B = A for B not equal 0.
  8. Anything multiply by 0 is 0.
  9. Raising power can apply the distributive law:
    • *Number are in hexadecimal*
    • 02^2 = 02^1 * 02 = 04
    • 02^3 = 02^2 * 02 = 04 * 2 = 08
    • 02^4 = 02^3 * 02 = 08 * 2 = 10
    • 02^5 = 02^4 * 02 = 10 * 2 = 20
    • 02^6 = 02^5 * 02 = 20 * 2 = 40
    • 02^7 = 02^6 * 02 = 40 * 2 = 80
    • 02^8 = 02^7 * 02 = 80 * 2 = 1d (see number 4)
    • If you are multiplying by a number that is not a power of 2, for example 1F, you can express it like this: 1E = 10 + 08 + 04 + 02, thus A * 1E = A*10 + A*08 + A*04 + A*02.
  10. Raising power, the power is congruent mod 255. Base cannot be 0.
    • A^256 = A^(256 mod 255) = A^1 = A
    • A^255 = A^(255 mod 255) = A^0 = 1
    • A^254 = A^(255-1) = A^255 / A^1 = 1/A
    • A^-2 = 1 / A^2 = A^255 / A^2 = A^253
      • A^-x = A^(255-x)
  11. Element, g, is called generator if g is an element in GF(q) and all non zero element in GF(q) can be written as (g^n mod q)
    • 2 is a generator for GF(3). GF(3) has 2 non zero element, 1 2.
      • 2^1 mod 3 = 2
      • 2^2 mod 3 = 1
    • 2 is not a generator for GF(7). There exists no i such that (2^i mod 7) = 3.
  12. Any generator g, defines a function from non zero elements in GF(2^8) to elements in 255 (0 to 254 mod 255). For example g = 02 (Hex), 02^6 = 40, log02 40 = 6.
  13. For any non zero element A and B in Galois field
    • *logg => log base g, (+) and (-) are normal addition and subtraction followed by mod 255*
    • A*B = C <==> logg A (+) logg B = logg C
    • thus A*B = C <==> C = g^(logg A (+) logg B)
    • A/B = C <==> logg A (-) logg B = logg C
    • thus A/B = C <==> C = g^(logg A (-) logg B)
Please note that I am only be simply listing the properties for the sole purpose to explain how RAID 6 works, if you are interested how the properties are derived, you can read up on finite fields.

How RAID 6 works? RAID 6 requires two parity data, P and Q, to ensure all data are intact even after suffering up to 2 disk failure. Please note that P is the same as even parity of RAID 5.

Formula of P and Q, for n is the number of data disk and n <= 255 disk:
P = Disk0 + Disk1 + Disk2 + ... + Disk(n-1)
Q = (g^0 * Disk0) + (g^1 * Disk1) + (g^2 * Disk2) + ... +(g^(n-1) * Disk(n-1))
*g is any generator and we will be operating byte by byte

For example a 3 data disk setup, and using 02 (Hex) for our generator:
P = Disk0 + Disk1 + Disk2
= 1010 1010 + 1100 1100 + 0001 0110
= 0111 0000

g^0 = 02^0 = 0000 0001

g^1 = 02^1 = 0000 0010
g^2 = 02^2 = 0000 0100

Q = (g^0 * Disk0) + (g^1 * Disk1) + (g^2 * Disk2)
= (0000 0001 * 1010 1010) + (0000 0010 * 1100 1100) + (0000 0100 * 0001 0110)
= 1010 1010 + 1000 0101 + 0101 1000
= 0111 0111

Disk 0: 1010 1010
Disk 1: 1100 1100
Disk 2: 0001 0110
Disk 3: 0111 0000 (P)
Disk 4: 0111 0111 (Q)

For 1 disk failure:

If we lose a data disk, we can reconstruct the lost disk using P (see previous post about RAID 5 parity).
If we lose P or Q, we can reconstruct them using the data disks.

For 2 disk failure:
If we lose P and Q, we can simply reconstruct them using the data disks.

If we lose a data disk and Q, we can reconstruct the data disk using P (see previous post about RAID 5 parity), then reconstruct Q using the data disks.

If we lose a data disk and P, we can reconstruct the data disk using Q, then reconstruct P using the data disks

To reconstruct the data using Q, we first construct a Q' by assuming the lost data disk contains 0000 0000
By property 1, 5 and 8, we get:
Q' + (g^x * Dx) = Q (x denote the lost drive number, Dx is the data)

By property 2, we can rewrite it as:
(g^x * Dx) = Q + Q'

By property 6 and 7 and g^x is non since g is 02 (Hex), we can rewrite it as:
Dx = (Q + Q')/g^x = (Q + Q') * g^-x

By property 10, we can rewrite it as:
Dx = (Q + Q') * g^(255-x)

Here, Q, Q' and x are known, so we solve Dx

Here is an example:
Assume we lose disk 2 (data) and disk 3 (P), x = 2
Q' = (0000 0001 * 1010 1010) + (0000 0010 * 1100 1100) + (0000 0100 * 0000 0000)
= 1010 1010 + 1000 0101 + 0000 0000
= 0010 1111
Q + Q' = 0111 0111 + 0010 1111
= 0101 1000
Dx = (Q + Q') * g^(255-x)
= 0101 1000 * g^(255-2)
= 0101 1000 * g^(253)
= 0101 1000 * 0100 0111 (apply property 9)
= (0101 1000 * 0100 0000) + (0101 1000 * 0000 0100) + (0101 1000 * 0000 0010) + (0101 1000 * 0000 0001)
= 1000 0011 + 0111 1101 + 1011 0000 + 0101 1000
= 0001 0110 (disk 2's data, with this we can easily compute P)

If we lose 2 data disk, we can reconstruct both the data using P and Q.
First we need to compute P' and Q' by assuming the 2 lost data disk contains 0000 0000.

By property 1, 5 and 8, we get 2 equations:
(1) P' + Dx + Dy = P (x and y denote the lost drive number, Dx and Dy denote the data)
(2) Q' + (g^x * Dx) + (g^y * Dy) = Q

We have 2 unknown variable, Dx and Dy, and we have 2 equations, so yes, simultaneous equations.
Divide (2) by g^x
(3) (Q' * g^-x) + Dx + (g^(y-x) * Dy) = (Q * g^-x)

By property 2, we rearrange (3)
Dx = (Q * g^-x) + (Q' * g^-x) + (g^(y-x) * Dy)
(4) Dx = ((Q + Q') * g^-x) + (g^(y-x) * Dy)

By property 2, we rearrange (1)
(5) Dy = P + P' + Dx

Sub (5) into (4)
Dx = ((Q + Q') * g^-x) + (g^(y-x) * (P + P' + Dx))
Dx = ((Q + Q') * g^-x) + (g^(y-x) * (P + P')) + (g^(y-x) * Dx)
Dx + (g^(y-x) * Dx) = ((Q + Q') * g^-x) + (g^(y-x) * (P + P')) (by property 2)
(6) Dx * (1 + g^(y-x)) = ((Q + Q') * g^-x) + (g^(y-x) * (P + P'))

By property 6 and 7, if (1 + g^y-x) is not equal to 0, we can divide by it. So, g^(y-x) cannot equal to 1 which means y and x are not congruent mod 255. Since we do not allow more than 255 disk, and by our assumption, the 2 disk are different disk, we do not violate the constraint.

Now we rewrite (6)
(7) Dx = (((Q + Q') * g^-x) + (g^(y-x) * (P + P'))) / (1 + g^(y-x))

All the variable on the right hand side of (7) are all known variable, thus we can solve Dx. After we solve Dx, we can simply use (5) to solve for Dy.

Here is an example:
Assume we lose disk 1 (data) and disk 2 (data), x = 1, y = 2
Q' = (0000 0001 * 1010 1010) + (0000 0010 * 0000 0000) + (0000 0100 * 0000 0000)
= 1010 1010 + 0000 0000 + 0000 0000
= 1010 1010
Q + Q' = 0111 0111 + 1010 1010
= 1101 1101
P' = 1010 1010 + 0000 0000 + 0000 0000
= 1010 1010
P + P' = 0111 0000 + 1010 1010
= 1101 1010
(Q + Q') * g^-x = 1101 1101 * g^-1
= 1101 1101 * g^255-1
= 1101 1101 * g^254
= 1101 1101 * 1000 1110 (apply property 9)
= (1101 1101 * 1000 0000) + (1101 1101 * 0000 1000) + (1101 1101 * 0000 0100) + (1101 1101 * 0000 0010)
= 1011 0010 + 1010 0110 + 0101 0011 + 1010 0111
= 1110 0000
g^(y-x) * (P + P') = g^(2-1) * 1101 1010
= g^1 * 1101 1010
= 0000 0010 * 1101 1010
= 1010 1001
Dx = (((Q + Q') * g^-x) + (g^(y-x) * (P + P'))) / (1 + g^(y-x))
= 0100 1001 / 0000 0011 (apply property 6 and 7)
= 0100 1001 * 1111 0100 (apply property 9)
= 0100 0000 * 1111 0100 + 0000 1000 * 1111 0100 + 0000 0001 * 1111 0100
= 1100 1011 + 1111 0011 + 1111 0100
= 1100 1100 (Disk 1 data, with this we can easily compute Disk 2)

Monday, December 24, 2012

Data Redundancy - RAID

We will be using RAID to provide us with redundancy. So what is RAID?

Redundant Array of Independent Disks (RAID), is basically a storage technology that can be used to provide safer storage by using an array of disk. Several RAID configurations that we are interested in, are RAID 0, RAID 1, RAID 5, RAID 6. For more details about RAID, you can refer to Wikipedia.

RAID 0 (AKA striping)
This configuration does not provide any data redundancy. It is basically using 2 or more disk together like a single disk. If one of the disk fail, the whole array will fail, all your data will be lost.


RAID 1 (AKA mirror)
Basically you use two or more disk together all storing the same data. If you have 3 drive, and 2 of them failed, you still have all your data intact. For n number of drive, you can have up to n-1 drive to fail while still having you data intact.


RAID 5
Basically you use three or more disk together. 1 disk worth of space used for parity while the rest is for data. In other words, if you have n drive, you can only use up to n-1 drive worth of space. The parity is distributed into all drive. In this configuration you can have 1 disk fail and still keeping all data intact.


Some might be wondering what is parity, so let me explain a little about parity. For example, using even parity (the column sums to an even number), a file is stored in disk 1 2 and 3, then the parity is stored in disk 4.
Disk 1's content: 1 1 1 0 0 1 0 1 0 1 0 0 0
Disk 2's content: 1 0 1 0 1 0 1 0 1 0 1 0 1
Disk 3's content: 0 0 1 1 0 0 1 1 0 0 1 0 1
Disk 4's content: 0 1 1 1 1 1 0 0 1 1 0 0 0 (even parity)

If Disk 2 fail, we can rebuild the content because column must sum to an even number.

Disk 1's content: 1 1 1 0 0 1 0 1 0 1 0 0 0
Disk 3's content: 0 0 1 1 0 0 1 1 0 0 1 0 1
Disk 4's content: 0 1 1 1 1 1 0 0 1 1 0 0 0 (even parity)


First column is 1 + 0 + 0 = 1, so the first bit in disk 2 must be 1.
Second column is 1 + 0 + 1 = 2, so the second bit is disk 2 must be 0.
So on and so forth.

RAID 6
RAID 6 is similar to RAID 5 but it has 2 sets of parity thus using 2 disk worth of space for parity and you need four or more disk. In this configuration you can have 2 disk fail and still keeping all data intact and if you have n drive, you can only use up to n-2 drive worth of space.


The parity math here gets a little more complicated so I will post the details in the next post. Merry Christmas!

Please note that RAID requires all the disk to be of the same size. If you use different size, the smallest size among them will be used for building the array. For example, four disk, 200 GB, 400 GB, 400 GB, 400 GB in RAID 5, it will be viewed as four 200 GB disk the total usable space will be 600 GB (4-1 disk worth of space). All the pictures are taken from Wikipedia, I take no credit in them.

Sunday, December 23, 2012

First Post

If anyone had a hard disk crash before, you will understand how dangerous it is to rely on a single hard disk. Even with Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T), my hard disk still crashed without triggering any warning. Haha. So after some consideration, I decided that I need a Network Attached Storage (NAS) with data redundancy but the problem is the current NAS in the market is very expensive. A five bay NAS can cost up to thousands, so I decided to DIY a NAS for myself.

One of the most impressive NAS I came across is Drobo. It has really impressive features. My ideal build will be to be able to have features like drobo and also have low power consumption. A green NAS. Haha.

I will be sharing my experience of building one through this blog. So stay around if you are interested. :)