Introduction of Hamming Code
Either computer data storage or telecommunication, regardless of the data storages and transmission, is non-zero probabilities that the data could be changed while it's being stored or transmitted. There is always a code-word with block length without free bit-errors. That means the data probably could be changed while it is being processed or transmitted. If the machine can't locate the position of the error and correct it, the information might be lost forever.
Introduction
Early, Richard Hamming, the mathematician and computer scientist, grew increasingly fustrated with having to restart his programs from scratch due to detected errors in the relays. After that and over the few years, he worked on the problem of error-correction, developing an increasingly powerful array of algorithms, which is now known as Hamming code. This invented code still remains in use today in application such as Error correction code memory.
In this post, I would like to try to walk through how Hamming code were invented mathematically, and also the fundamental theory of error detection and error correction slightly.
Error Detection & Error Correction
Error detection and error correction are different techniques that enable reliable delivery of digital data over unreliable communications channels. Usually error detection is much simpler whereas error correction could be more complicated. In addition, error detection and error correction does not always work. That means all error-detection and error correction methods only work below a certain error rate. Above that rate, the line is simply not usable.
Error Detecting Code
Parity
Parity bit appended to a binary number provides the simplest form of error detecting code. If a single bit in the resulting value is changed, then it will no longer have the correct parity: changing a bit in the orginal number gives it a different parity than the recorded one, and changing the parity bit while not changing the number it was derived from again produces an incorrect result.
The following is a simple example of attaching 1 parity bit to 7 bits of data, making the bit string to always have even number of 1s. Therefore, the XOR of the 8-bit data is always 0.
Data (7 Bits) | Count of 1-Bits | Parity Bit | Data Including Parity (8 Bits) | XOR |
---|---|---|---|---|
0000000 | 0 | 0 | 00000000 | 0 |
1010001 | 3 | 1 | 10100011 | 0 |
1101001 | 4 | 0 | 11010010 | 0 |
1111111 | 7 | 1 | 11111111 | 0 |
If the bit string has one single error, either in the data or at the parity bit position, the number of 1st in the bit string will not be even, and XOR will not be 0. However, if there are even number of errors in the bit string, the error could never be detected, as XOR would remain 0.
Moreover, parity does not indicate which bit contained the error, even when it can detect it. The data must be discarded entirely and re-transmitted from scratch, which could be inconvenient and time-consuming.
Error Correcting Code
Repetitions
The repetition code is one of the most basic error-correcting codes. In order to transmit a message over a noisy channel that may corrupt the transmission in a few places, the idea of the repetition code is to just repeat the message several times in order to ensure that it was sent correctly.
The following is a simple example of repeating every bit from 3 bits of data 3 times.
Data (3 Bits) | Number of Repetitions | Data Including Repetitions |
---|---|---|
000 | 3 | 000000000 |
010 | 3 | 000111000 |
101 | 3 | 111000111 |
111 | 3 | 111111111 |
If the data including repetitions received are not identical, an error occurred during transmission. If the channel is clean enough, most of the time only one bit will change in each triple. Therefore, 001, 010, and 100 each correspond to 0 bit, while 110, 101, and 001 correspond to a 1 bit, with the greater quantity of digits that are the same ('0' or a '1') indicating what the data bit should be.
However, such repetition code cannot correctly repair all errors, but could be mitigated by using larger number of repetitions. The more bit repetitions to vote, the more robust the error correction code to error rate. The drawback is that it will reduce throughput and the efficiency drops drastically as we increase the number of times each bit is duplicated.
Hamming Codes
Hamming codes can detect one-bit and two-bit errors, or correct
one-bit errors without detection of uncorrested errors. In mathematical
terms, Hamming codes are a class of binary linear code. For each integer
Here is an example, for
consists of 4 data bits,
The Binary Hamming(7,4) code (with r = 3)
Binary Linear Algebra
The encoding and decoding of Hamming codes could be represented as linear block codes. The property of linearity is the sum of any two codewords is also a code word, and they are applied to the soruce bits in blocks. Let's try to be the genius Richard Hamming in 1950s and come up with a generic Hamming code for different numbers of error-correction bits.
The number of parity bits
Construction of Generator & Parity-check Matrix
Hamming code could be represented using (binary) linear algebra.
Consider Hamming(7,4) code, the data bits could be represented using a
column vector
There are two matrics: generator matrix G and parity-check matrix H for linear block codes. The construction of G and H is in standard (or systematic) form:
The generator matrix of a linear (n, k) code:
The parity-check matrix of a linear (n, k) code:
Regardless of form, G and H also must satisfy:
Encoding:
To encode the data bits d, the codeword
x is given by the standard matrix product
Note: the generator matrix G could be easily derived from the bit sequence and parity table.
The Hamming(7,4) encoded bits x would be
Error Detection:
Suppose that the reveived data is x', and x' may or may not equal to x. The error detection is just parity checking by applying the parity-check matrix H on x', where
The property of H is that each column in H is actually binary index sequence, 1, 2, 3, etc. This property will be very useful for error correction. From the definition of the parity-check matrix is directly follows the minimum distance of the code is the minimum number d while there exist d columns of H that are linearly dependent.
Error Correction:
Suppose that the actual Hamming encoded code reveived would be
Thus, parity checking would just be
The resulting value would be non-zero if there is a bit error. In order to fix the bit error, just simply flip the bit value at address x.
Now, as you can see, the Hamming code could be derived from the bit sequence and parity table. The way Richard Hamming worked for error correction must be some mathematical motivations behind. However, I did not get time to reserach and explore. Thay's why Richard Hamming was a famous mathematician but I am not. :)
Reference
Thanks for reading! Feel free to leave the comments below or email to me. Any pieces of advice or discussions are always welcome. :)